linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-09-20 17:41:09 +00:00

Author	SHA1	Message	Date
Linus Torvalds	9c37f95936	Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking tree changes from Ingo Molnar: "Two changes: a documentation update and a ticket locks live lock fix" * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ticketlock: Fix spin_unlock_wait() livelock locking/lglocks: Add documentation of current lglocks implementation	2014-12-09 19:59:22 -08:00
Linus Torvalds	a0e4467726	asm-generic: asm/io.h rewrite While there normally is no reason to have a pull request for asm-generic but have all changes get merged through whichever tree needs them, I do have a series for 3.19. There are two sets of patches that change significant portions of asm/io.h, and this branch contains both in order to resolve the conflicts: - Will Deacon has done a set of patches to ensure that all architectures define {read,write}{b,w,l,q}_relaxed() functions or get them by including asm-generic/io.h. These functions are commonly used on ARM specific drivers to avoid expensive L2 cache synchronization implied by the normal {read,write}{b,w,l,q}, but we need to define them on all architectures in order to share the drivers across architectures and to enable CONFIG_COMPILE_TEST configurations for them - Thierry Reding has done an unrelated set of patches that extends the asm-generic/io.h file to the degree necessary to make it useful on ARM64 and potentially other architectures. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIVAwUAVIdwNmCrR//JCVInAQJWuw/9FHt2ThMnI1J1Jqy4CVwtyjWTSa6Y/uVj xSytS7AOvmU/nw1quSoba5mN9fcUQUtK9kqjqNcq71WsQcDE6BF9SFpi9cWtjWcI ZfWsC+5kqry/mbnuHefENipem9RqBrLbOBJ3LARf5M8rZJuTz1KbdZs9r9+1QsCX ou8jeqVvNKUn9J1WyekJBFSrPOtZ4bCUpeyh23JHRfPtJeAHNOuPuymj6WceAz98 uMV1icRaCBMySsf9HgsHRYW5HwuCm3MrrYj6ukyPpgxYz7FRq4hJLDs6GnlFtAGb 71g87NpFdB32qbW+y1ntfYaJyUryMHMVHBWcV5H9m0btdHTRHYZjoOGOPuyLHHO8 +l4/FaOQhnDL8cNDj0HKfhdlyaFylcWgs1wzj68nv31c1dGjcJcQiyCDwry9mJhr erh4EewcerUvWzbBMQ4JP1f8syKMsKwbo1bVU61a1RQJxEqVCzJMLweGSOFmqMX2 6E4ZJVWv81UFLoFTzYx+7+M45K4NWywKNQdzwKmqKHc4OQyvq4ALJI0A7SGFJdDR HJ7VqDiLaSdBitgJcJUxNzKcyXij6wE9jE1fBe3YDFE4LrnZXFVLN+MX6hs7AIFJ vJM1UpxRxQUMGIH2m7rbDNazOAsvQGxINOjNor23cNLuf6qLY1LrpHVPQDAfJVvA 6tROM77bwIQ= =xUv6 -----END PGP SIGNATURE----- Merge tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic asm/io.h rewrite from Arnd Bergmann: "While there normally is no reason to have a pull request for asm-generic but have all changes get merged through whichever tree needs them, I do have a series for 3.19. There are two sets of patches that change significant portions of asm/io.h, and this branch contains both in order to resolve the conflicts: - Will Deacon has done a set of patches to ensure that all architectures define {read,write}{b,w,l,q}_relaxed() functions or get them by including asm-generic/io.h. These functions are commonly used on ARM specific drivers to avoid expensive L2 cache synchronization implied by the normal {read,write}{b,w,l,q}, but we need to define them on all architectures in order to share the drivers across architectures and to enable CONFIG_COMPILE_TEST configurations for them - Thierry Reding has done an unrelated set of patches that extends the asm-generic/io.h file to the degree necessary to make it useful on ARM64 and potentially other architectures" * tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (29 commits) ARM64: use GENERIC_PCI_IOMAP sparc: io: remove duplicate relaxed accessors on sparc32 ARM: sa11x0: Use void __iomem * in MMIO accessors arm64: Use include/asm-generic/io.h ARM: Use include/asm-generic/io.h asm-generic/io.h: Implement generic {read,write}s*() asm-generic/io.h: Reconcile I/O accessor overrides /dev/mem: Use more consistent data types Change xlate_dev_{kmem,mem}_ptr() prototypes ARM: ixp4xx: Properly override I/O accessors ARM: ixp4xx: Fix build with IXP4XX_INDIRECT_PCI ARM: ebsa110: Properly override I/O accessors ARC: Remove redundant PCI_IOBASE declaration documentation: memory-barriers: clarify relaxed io accessor semantics x86: io: implement dummy relaxed accessor macros for writes tile: io: implement dummy relaxed accessor macros for writes sparc: io: implement dummy relaxed accessor macros for writes powerpc: io: implement dummy relaxed accessor macros for writes parisc: io: implement dummy relaxed accessor macros for writes mn10300: io: implement dummy relaxed accessor macros for writes ...	2014-12-09 17:25:00 -08:00
Joe Perches	5cccc702fd	x86: bpf_jit_comp: Remove inline from static function definitions Let the compiler decide instead. No change in object size x86-64 -O2 no profiling Signed-off-by: Joe Perches <joe@perches.com> Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-09 14:56:41 -05:00
Joe Perches	d148134be5	x86: bpf_jit_comp: Reduce is_ereg() code size Use the (1 << reg) & mask trick to reduce code size. x86-64 size difference -O2 without profiling for various gcc versions: $ size arch/x86/net/bpf_jit_comp.o* text data bss dec hex filename 9266 4 0 9270 2436 arch/x86/net/bpf_jit_comp.o.4.4.new 10042 4 0 10046 273e arch/x86/net/bpf_jit_comp.o.4.4.old 9109 4 0 9113 2399 arch/x86/net/bpf_jit_comp.o.4.6.new 9717 4 0 9721 25f9 arch/x86/net/bpf_jit_comp.o.4.6.old 8789 4 0 8793 2259 arch/x86/net/bpf_jit_comp.o.4.7.new 10245 4 0 10249 2809 arch/x86/net/bpf_jit_comp.o.4.7.old 9671 4 0 9675 25cb arch/x86/net/bpf_jit_comp.o.4.9.new 10679 4 0 10683 29bb arch/x86/net/bpf_jit_comp.o.4.9.old Signed-off-by: Joe Perches <joe@perches.com> Tested-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-09 14:56:41 -05:00
Linus Torvalds	0160928e79	EDAC updates all over the place: * Enablement for AMD F15h models 0x60 CPUs. Most notably DDR4 RAM support. Out of tree stuff is arch/x86/kernel/amd_nb.c \| 2 + include/linux/pci_ids.h \| 2 + adding the required PCI IDs. From Aravind Gopalakrishnan. * Enable amd64_edac for 32-bit due to popular demand. From Tomasz Pala. * Convert the AMD MCE injection module to debugfs, where it belongs. * Misc EDAC cleanups -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJUhZbcAAoJEBLB8Bhh3lVKd5kQAK77qsB4ebpke1rEBerQl9jQ YCVrByCKu7QTRt/xvlqU9Vyp7EvcnpNxFbRCCqzIpBcJjre9v17QVRA2/zFS0q81 QRDTOWf9uhMWbssI2Zu1hbjDNMWYiEb9+aeZHjScVtkzPDmsgYuKGdWfTSLw9dkS AG/UUZ3ojwyc2dK3i0W3pCjakKYLUsCmijyTZBfb37+u4rRGuAgrQ9G8fBn2lL+l huptgV2BsTCQqdL554zTs64Yt912PnidsJWYCPCjMubgEPSeNcWRzTTBYDUf9NIn RxFXYHnOQxetPSQqfLVXlX3V/cGNQg0yEXFZ9S0tCt5uLNbbN/D8Uumtst0rq9x3 XkJ/EGHXBFP+KwHdV/i9j6OYM5rq4z+4ql4OqbWzsvPrEDbh/4p5gRbhqd1Hhy9U zgHJjVPpD/l2t82Tpz0jIOscQruZ6VqGMDSYo3LiLnNt724pcrmr3DiN9mc6ljzJ rsNsemMH0IoH8KbBHKGtMLnBVO6HbnrtC6iKFfocNBisvo1PKKzn9s2O1pdjmsCs jHwz5njoM7Ki/ygkJhbKiSDMXPs67eggwoGIGzpNMoY4RWxrcQzYE9yKfKzRNxET Qb3xUwDWDyL8ErwHtL3xMxGwyfkhb+SZdMd5aKYA5Rdbf+TN8P6iAv05nrnfpkyk lPTv5o9EQYvr8/Tc9FZW =ULmb -----END PGP SIGNATURE----- Merge tag 'edac_for_3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp Pull EDAC updates from Borislav Petkov: "EDAC updates all over the place: - Enablement for AMD F15h models 0x60 CPUs. Most notably DDR4 RAM support. Out of tree stuff is adding the required PCI IDs. From Aravind Gopalakrishnan. - Enable amd64_edac for 32-bit due to popular demand. From Tomasz Pala. - Convert the AMD MCE injection module to debugfs, where it belongs. - Misc EDAC cleanups" * tag 'edac_for_3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: EDAC, MCE, AMD: Correct formatting of decoded text EDAC, mce_amd_inj: Add an injector function EDAC, mce_amd_inj: Add hw-injection attributes EDAC, mce_amd_inj: Enable direct writes to MCE MSRs EDAC, mce_amd_inj: Convert mce_amd_inj module to debugfs EDAC: Delete unnecessary check before calling pci_dev_put() EDAC, pci_sysfs: remove unneccessary ifdef around entire file ghes_edac: Use snprintf() to silence a static checker warning amd64_edac: Build module on x86-32 EDAC, MCE, AMD: Add decoding table for MC6 xec amd64_edac: Add F15h M60h support {mv64x60,ppc4xx}_edac,: Remove deprecated IRQF_DISABLED EDAC: Sync memory types and names EDAC: Add DDR3 LRDIMM entries to edac_mem_types x86, amd_nb: Add device IDs to NB tables for F15h M60h pci_ids: Add PCI device IDs for F15h M60h	2014-12-08 20:17:49 -08:00
Al Viro	ba00410b81	Merge branch 'iov_iter' into for-next	2014-12-08 20:39:29 -05:00
Rafael J. Wysocki	cfc75ed68b	Merge branch 'pm-cpufreq' * pm-cpufreq: (21 commits) intel_pstate: skip this driver if Sun server has _PPC method cpufreq: arm_big_little: free OPP table created during ->init() imx6q: free OPP table created during ->init() exynos5440: free OPP table created during ->init() cpufreq-dt: free OPP table created during ->init() cpufreq-dt: register cooling device from ->ready() callback cpufreq: Introduce ->ready() callback for cpufreq drivers cpufreq-dt: pass 'policy->related_cpus' to of_cpufreq_cooling_register() cpufreq: Fix formatting issues in 'struct cpufreq_driver' cpufreq: pxa2xx: Add Kconfig entry cpufreq: Ref the policy object sooner cpufreq: Kconfig: Remove architecture specific menu entries cpufreq: pcc: Enable autoload of pcc-cpufreq for ACPI processors intel_pstate: Add CPUID for BDW-H CPU intel_pstate: Add support for HWP x86: Add support for Intel HWP feature detection. cpufreq: respect the min/max settings from user space cpufreq: cpufreq-dt: Handle regulator_get_voltage() failure cpufreq: cpufreq-dt: Improve debug about matching OPP cpufreq: Loongson1: Add cpufreq driver for Loongson1B ...	2014-12-08 20:00:15 +01:00
Rafael J. Wysocki	648fcab2b0	Merge branch 'pm-cpuidle' * pm-cpuidle: cpuidle: add MAINTAINERS entry for ARM Exynos cpuidle driver drivers: cpuidle: Remove cpuidle-arm64 duplicate error messages drivers: cpuidle: Add idle-state-name description to ARM idle states drivers: cpuidle: Add status property to ARM idle states cpuidle: Invert CPUIDLE_FLAG_TIME_VALID logic	2014-12-08 20:00:09 +01:00
Mark Brown	addaeea9ee	Merge remote-tracking branches 'asoc/topic/hdmi', 'asoc/topic/intel', 'asoc/topic/jack', 'asoc/topic/jz4740' and 'asoc/topic/lm49453' into asoc-next	2014-12-08 13:12:00 +00:00
Richard Weinberger	15bae280e4	x86/kconfig/defconfig: Enable CONFIG_FHANDLE=y systemd has a hard dependency on CONFIG_FHANDLE. If you run systemd with CONFIG_FHANDLE=n it will somehow boot but fail to spawn a getty or other basic services. As systemd is now used by most x86 distributions it makes sense to enabled this by default and save kernel hackers a lot of value debugging time. Signed-off-by: Richard Weinberger <richard@nod.at> Cc: gregkh@linuxfoundation.org Cc: rafael.j.wysocki@intel.com Cc: pebolle@tiscali.nl Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1416958612-7448-1-git-send-email-richard@nod.at Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-12-08 12:04:17 +01:00
Juergen Gross	76f0a486fa	xen: annotate xen_set_identity_and_remap_chunk() with __init Commit `5b8e7d8054` removed the __init annotation from xen_set_identity_and_remap_chunk(). Add it again. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2014-12-08 10:55:30 +00:00
Juergen Gross	90fff3ea15	xen: introduce helper functions to do safe read and write accesses Introduce two helper functions to safely read and write unsigned long values from or to memory when the access may fault because the mapping is non-present or read-only. These helpers can be used instead of open coded uses of __get_user() and __put_user() avoiding the need to do casts to fix sparse warnings. Use the helpers in page.h and p2m.c. This will fix the sparse warnings when doing "make C=1". Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2014-12-08 10:53:59 +00:00
Rasmus Villemoes	3736708f03	x86: Replace seq_printf() with seq_puts() seq_puts is a lot cheaper than seq_printf, so use that to print literal strings. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Link: http://lkml.kernel.org/r/1417208622-12264-1-git-send-email-linux@rasmusvillemoes.dk Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-12-08 11:48:15 +01:00
Oleg Nesterov	78bff1c868	x86/ticketlock: Fix spin_unlock_wait() livelock arch_spin_unlock_wait() looks very suboptimal, to the point I think this is just wrong and can lead to livelock: if the lock is heavily contended we can never see head == tail. But we do not need to wait for arch_spin_is_locked() == F. If it is locked we only need to wait until the current owner drops this lock. So we could simply spin until old_head != lock->tickets.head in this case, but .head can overflow and thus we can't check "unlocked" only once before the main loop. Also, the "unlocked" check can ignore TICKET_SLOWPATH_FLAG bit. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Paul E.McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Waiman Long <Waiman.Long@hp.com> Link: http://lkml.kernel.org/r/20141201213417.GA5842@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-12-08 11:36:44 +01:00
Borislav Petkov	c7c9b3929b	x86/mce: Spell "panicked" correctly We need the additional "k" to make it a hard-c: https://en.wiktionary.org/wiki/panicked Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1417642605-15730-1-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-12-08 11:12:46 +01:00
Dave Airlie	8c86394470	Linux 3.18 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUhNLZAAoJEHm+PkMAQRiGAEcH/iclYDW7k2GKemMqboy+Ohmh +ELbQothNhlGZlS1wWdD69LBiiXkkQ+ufVYFh/hC0oy0gUdfPMt5t+bOHy6cjn6w 9zOcACtpDKnqbOwRqXZjZgNmIabk7lRjbn7GK4GQqpIaW4oO0FWcT91FFhtGSPDa tjtmGRqDmbNsqfzr18h0WPEpUZmT6MxIdv17AYDliPB1MaaRuAv1Kss05TJrXdfL Oucv+C0uwnybD9UWAz6pLJ3H/HR9VJFdkaJ4Y0pbCHAuxdd1+swoTpicluHlsJA1 EkK5iWQRMpcmGwKvB0unCAQljNpaJiq4/Tlmmv8JlYpMlmIiVLT0D8BZx5q05QQ= =oGNw -----END PGP SIGNATURE----- Merge tag 'v3.18' into drm-next Linux 3.18 Backmerge Linus tree into -next as we had conflicts in i915/radeon/nouveau, and everyone was solving them individually. * tag 'v3.18': (57 commits) Linux 3.18 watchdog: s3c2410_wdt: Fix the mask bit offset for Exynos7 uapi: fix to export linux/vm_sockets.h i2c: cadence: Set the hardware time-out register to maximum value i2c: davinci: generate STP always when NACK is received ahci: disable MSI on SAMSUNG 0xa800 SSD context_tracking: Restore previous state in schedule_user slab: fix nodeid bounds check for non-contiguous node IDs lib/genalloc.c: export devm_gen_pool_create() for modules mm: fix anon_vma_clone() error treatment mm: fix swapoff hang after page migration and fork fat: fix oops on corrupted vfat fs ipc/sem.c: fully initialize sem_array before making it visible drivers/input/evdev.c: don't kfree() a vmalloc address cxgb4: Fill in supported link mode for SFP modules xen-netfront: Remove BUGs on paged skb data which crosses a page boundary mm/vmpressure.c: fix race in vmpressure_work_fn() mm: frontswap: invalidate expired data on a dup-store failure mm: do not overwrite reserved pages counter at show_mem() drm/radeon: kernel panic in drm_calc_vbltimestamp_from_scanoutpos with 3.18.0-rc6 ... Conflicts: drivers/gpu/drm/i915/intel_display.c drivers/gpu/drm/nouveau/nouveau_drm.c drivers/gpu/drm/radeon/radeon_cs.c	2014-12-08 10:33:52 +10:00
Borislav Petkov	fbae4ba8c4	x86, microcode: Reload microcode on resume Normally, we do reapply microcode on resume. However, in the cases where that microcode comes from the early loader and the late loader hasn't been utilized yet, there's no easy way for us to go and apply the patch applied during boot by the early loader. Thus, reuse the patch stashed by the early loader for the BSP. Signed-off-by: Borislav Petkov <bp@suse.de>	2014-12-06 13:03:03 +01:00
Boris Ostrovsky	a18a0f6850	x86, microcode: Don't initialize microcode code on paravirt Paravirtual guests are not expected to load microcode into processors and therefore it is not necessary to initialize microcode loading logic. In fact, under certain circumstances initializing this logic may cause the guest to crash. Specifically, 32-bit kernels use __pa_nodebug() macro which does not work in Xen (the code path that leads to this macro happens during resume when we call mc_bp_resume()->load_ucode_ap() ->check_loader_disabled_ap()) Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: http://lkml.kernel.org/r/1417469264-31470-1-git-send-email-boris.ostrovsky@oracle.com Signed-off-by: Borislav Petkov <bp@suse.de>	2014-12-06 12:59:03 +01:00
Borislav Petkov	47768626c6	x86, microcode, intel: Drop unused parameter apply_microcode_early() doesn't use mc_saved_data, kill it. Signed-off-by: Borislav Petkov <bp@suse.de>	2014-12-06 12:58:56 +01:00
Alexei Starovoitov	769e0de647	bpf: x86: fix epilogue generation for eBPF programs classic BPF has a restriction that last insn is always BPF_RET. eBPF doesn't have BPF_RET instruction and this restriction. It has BPF_EXIT insn which can appear anywhere in the program one or more times and it doesn't have to be last insn. Fix eBPF JIT to emit epilogue when first BPF_EXIT is seen and all other BPF_EXIT instructions will be emitted as jump. Since jump offset to epilogue is computed as: jmp_offset = ctx->cleanup_addr - addrs[i] we need to change type of cleanup_addr to signed to compute the offset as: (long long) ((int)20 - (int)30) instead of: (long long) ((unsigned int)20 - (int)30) Fixes: `622582786c` ("net: filter: x86: internal BPF JIT") Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-05 21:23:54 -08:00
Linus Torvalds	beb5af4033	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "Two final fixlets for 3.18: - Prevent microcode reload wreckage on 32bit - Unbreak cross compilation" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, microcode: Limit the microcode reloading to 64-bit for now x86: Use $(OBJDUMP) instead of plain objdump	2014-12-05 10:47:19 -08:00
Juergen Gross	2e917175e1	xen: Speed up set_phys_to_machine() by using read-only mappings Instead of checking at each call of set_phys_to_machine() whether a new p2m page has to be allocated due to writing an entry in a large invalid or identity area, just map those areas read only and react to a page fault on write by allocating the new page. This change will make the common path with no allocation much faster as it only requires a single write of the new mfn instead of walking the address translation tables and checking for the special cases. Suggested-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2014-12-04 14:09:20 +00:00
Juergen Gross	054954eb05	xen: switch to linear virtual mapped sparse p2m list At start of the day the Xen hypervisor presents a contiguous mfn list to a pv-domain. In order to support sparse memory this mfn list is accessed via a three level p2m tree built early in the boot process. Whenever the system needs the mfn associated with a pfn this tree is used to find the mfn. Instead of using a software walked tree for accessing a specific mfn list entry this patch is creating a virtual address area for the entire possible mfn list including memory holes. The holes are covered by mapping a pre-defined page consisting only of "invalid mfn" entries. Access to a mfn entry is possible by just using the virtual base address of the mfn list and the pfn as index into that list. This speeds up the (hot) path of determining the mfn of a pfn. Kernel build on a Dell Latitude E6440 (2 cores, HT) in 64 bit Dom0 showed following improvements: Elapsed time: 32:50 -> 32:35 System: 18:07 -> 17:47 User: 104:00 -> 103:30 Tested with following configurations: - 64 bit dom0, 8GB RAM - 64 bit dom0, 128 GB RAM, PCI-area above 4 GB - 32 bit domU, 512 MB, 8 GB, 43 GB (more wouldn't work even without the patch) - 32 bit domU, ballooning up and down - 32 bit domU, save and restore - 32 bit domU with PCI passthrough - 64 bit domU, 8 GB, 2049 MB, 5000 MB - 64 bit domU, ballooning up and down - 64 bit domU, save and restore - 64 bit domU with PCI passthrough Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2014-12-04 14:09:15 +00:00
Juergen Gross	0aad568983	xen: Hide get_phys_to_machine() to be able to tune common path Today get_phys_to_machine() is always called when the mfn for a pfn is to be obtained. Add a wrapper __pfn_to_mfn() as inline function to be able to avoid calling get_phys_to_machine() when possible as soon as the switch to a linear mapped p2m list has been done. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2014-12-04 14:09:09 +00:00
Juergen Gross	792230c3a6	x86: Introduce function to get pmd entry pointer Introduces lookup_pmd_address() to get the address of the pmd entry related to a virtual address in the current address space. This function is needed for support of a virtual mapped sparse p2m list in xen pv domains, as we need the address of the pmd entry, not the one of the pte in that case. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2014-12-04 14:09:04 +00:00
Juergen Gross	5b8e7d8054	xen: Delay invalidating extra memory When the physical memory configuration is initialized the p2m entries for not pouplated memory pages are set to "invalid". As those pages are beyond the hypervisor built p2m list the p2m tree has to be extended. This patch delays processing the extra memory related p2m entries during the boot process until some more basic memory management functions are callable. This removes the need to create new p2m entries until virtual memory management is available. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2014-12-04 14:08:59 +00:00
Juergen Gross	97f4533a60	xen: Delay m2p_override initialization The m2p overrides are used to be able to find the local pfn for a foreign mfn mapped into the domain. They are used by driver backends having to access frontend data. As this functionality isn't used in early boot it makes no sense to initialize the m2p override functions very early. It can be done later without doing any harm, removing the need for allocating memory via extend_brk(). While at it make some m2p override functions static as they are only used internally. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2014-12-04 14:08:53 +00:00
Juergen Gross	1f3ac86b4c	xen: Delay remapping memory of pv-domain Early in the boot process the memory layout of a pv-domain is changed to match the E820 map (either the host one for Dom0 or the Xen one) regarding placement of RAM and PCI holes. This requires removing memory pages initially located at positions not suitable for RAM and adding them later at higher addresses where no restrictions apply. To be able to operate on the hypervisor supported p2m list until a virtual mapped linear p2m list can be constructed, remapping must be delayed until virtual memory management is initialized, as the initial p2m list can't be extended unlimited at physical memory initialization time due to it's fixed structure. A further advantage is the reduction in complexity and code volume as we don't have to be careful regarding memory restrictions during p2m updates. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2014-12-04 14:08:48 +00:00
Juergen Gross	7108c9ce8f	xen: use common page allocation function in p2m.c In arch/x86/xen/p2m.c three different allocation functions for obtaining a memory page are used: extend_brk(), alloc_bootmem_align() or __get_free_page(). Which of those functions is used depends on the progress of the boot process of the system. Introduce a common allocation routine selecting the to be called allocation routine dynamically based on the boot progress. This allows moving initialization steps without having to care about changing allocation calls. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2014-12-04 14:08:42 +00:00
Juergen Gross	820c4db2be	xen: Make functions static Some functions in arch/x86/xen/p2m.c are used locally only. Make them static. Rearrange the functions in p2m.c to avoid forward declarations. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2014-12-04 14:08:37 +00:00
Juergen Gross	6f58d89e6c	xen: fix some style issues in p2m.c The source arch/x86/xen/p2m.c has some coding style issues. Fix them. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2014-12-04 14:08:29 +00:00
Boris Ostrovsky	14520c92cb	xen/pci: Use APIC directly when APIC virtualization hardware is available When hardware supports APIC/x2APIC virtualization we don't need to use pirqs for MSI handling and instead use APIC since most APIC accesses (MMIO or MSR) will now be processed without VMEXITs. As an example, netperf on the original code produces this profile (collected wih 'xentrace -e 0x0008ffff -T 5'): 342 cpu_change 260 CPUID 34638 HLT 64067 INJ_VIRQ 28374 INTR 82733 INTR_WINDOW 10 NPF 24337 TRAP 370610 vlapic_accept_pic_intr 307528 VMENTRY 307527 VMEXIT 140998 VMMCALL 127 wrap_buffer After applying this patch the same test shows 230 cpu_change 260 CPUID 36542 HLT 174 INJ_VIRQ 27250 INTR 222 INTR_WINDOW 20 NPF 24999 TRAP 381812 vlapic_accept_pic_intr 166480 VMENTRY 166479 VMEXIT 77208 VMMCALL 81 wrap_buffer ApacheBench results (ab -n 10000 -c 200) improve by about 10% Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2014-12-04 13:02:43 +00:00
Boris Ostrovsky	066d79e4e2	xen/pci: Defer initialization of MSI ops on HVM guests If the hardware supports APIC virtualization we may decide not to use pirqs and instead use APIC/x2APIC directly, meaning that we don't want to set x86_msi.setup_msi_irqs and x86_msi.teardown_msi_irq to Xen-specific routines. However, x2APIC is not set up by the time pci_xen_hvm_init() is called so we need to postpone setting these ops until later, when we know which APIC mode is used. (Note that currently x2APIC is never initialized on HVM guests. This may change in the future) Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2014-12-04 13:00:51 +00:00
Stefano Stabellini	a4dba13089	xen/arm/arm64: introduce xen_arch_need_swiotlb Introduce an arch specific function to find out whether a particular dma mapping operation needs to bounce on the swiotlb buffer. On ARM and ARM64, if the page involved is a foreign page and the device is not coherent, we need to bounce because at unmap time we cannot execute any required cache maintenance operations (we don't know how to find the pfn from the mfn). No change of behaviour for x86. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2014-12-04 12:41:54 +00:00
Stefano Stabellini	a0f2dee0cd	xen: add a dma_addr_t dev_addr argument to xen_dma_map_page dev_addr is the machine address of the page. The new parameter can be used by the ARM and ARM64 implementations of xen_dma_map_page to find out if the page is a local page (pfn == mfn) or a foreign page (pfn != mfn). dev_addr could be retrieved again from the physical address, using pfn_to_mfn, but it requires accessing an rbtree. Since we already have the dev_addr in our hands at the call site there is no need to get the mfn twice. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2014-12-04 12:41:51 +00:00
Julia Lawall	a6326ba025	crypto: sha - replace memset by memzero_explicit Memset on a local variable may be removed when it is called just before the variable goes out of scope. Using memzero_explicit defeats this optimization. A simplified version of the semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ identifier x; type T; @@ { ... when any T x[...]; ... when any when exists - memset + memzero_explicit (x, -0, ...) ... when != x when strict } // </smpl> This change was suggested by Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2014-12-02 22:55:49 +08:00
Dave Airlie	e8115e79aa	Linux 3.18-rc7 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUe7l9AAoJEHm+PkMAQRiGkGcIAIryQ7NKn4IaxUtS807Lx4Ih obEnx7nNKZTXCZpD/7XQGHMMJyozMJR50PHZESJoHu4Luhv9h7EFRnyJ6MdqMlwn zla3zY0yRsHwPoJKcHbSE0CPHZz0WPQHj7IEbM+XJz2tMNJfbgTrezElmcCM4DRp c9ae+ggwZ2cyNYM0r2RSwSJ525WMh69f9dzSUE27fpvkllQgwqNs/jHYz8HNOEht FWcv5UhvzKjwJS3awULfOB3zH2QdFvVTrwAzd+kbV2Q6T6CaUoFRlhXeKUO6W2Jv pJM6oj8tMZUkdXEv7EQXT1kwEqC4DULTTTHs4tSF79O1ESmNfePiOwwBcwoM2nM= =kG1Y -----END PGP SIGNATURE----- Merge tag 'v3.18-rc7' into drm-next This fixes a bunch of conflicts prior to merging i915 tree. Linux 3.18-rc7 Conflicts: drivers/gpu/drm/exynos/exynos_drm_drv.c drivers/gpu/drm/i915/i915_drv.c drivers/gpu/drm/i915/intel_pm.c drivers/gpu/drm/tegra/dc.c	2014-12-02 10:58:33 +10:00
Steven Rostedt (Red Hat)	6a06bdbf7f	ftrace/fgraph/x86: Have prepare_ftrace_return() take ip as first parameter The function graph helper function prepare_ftrace_return() which does the work to hijack the parent pointer has that parent pointer as its first parameter. Instead, if we make it the second parameter and have ip as the first parameter (self_addr), then it can use the %rdi from save_mcount_regs that loads it already. Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1411262304010.3961@nanos Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-12-01 14:08:58 -05:00
Steven Rostedt (Red Hat)	f1ab00af81	ftrace/x86: Get rid of ftrace_caller_setup Move all the work from ftrace_caller_setup into save_mcount_regs. This simplifies the code and makes it easier to understand. Link: http://lkml.kernel.org/r/CA+55aFxUTUbdxpjVMW8X9c=o8sui7OB_MYPfcbJuDyfUWtNrNg@mail.gmail.com Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1411262304010.3961@nanos Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-12-01 14:08:43 -05:00
Steven Rostedt (Red Hat)	0687c36e45	ftrace/x86: Have save_mcount_regs macro also save stack frames if needed The save_mcount_regs macro saves and restores the required mcount regs that need to be saved before calling C code. It is done for all the function hook utilities (static tracing, dynamic tracing, regs, function graph). When frame pointers are enabled, the ftrace trampolines need to set up frames and pointers such that a back trace (dump stack) can continue passed them. Currently, a separate macro is used (create_frame) to do this, but it's only done for the ftrace_caller and ftrace_reg_caller functions. It is not done for the static tracer or function graph tracing. Instead of having a separate macro doing the recording of the frames, have the save_mcount_regs perform this task. This also has all tracers saving the frame pointers when needed. Link: http://lkml.kernel.org/r/CA+55aFwF+qCGSKdGaEgW4p6N65GZ5_XTV=1NbtWDvxnd5yYLiw@mail.gmail.com Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1411262304010.3961@nanos Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-12-01 14:08:27 -05:00
Steven Rostedt (Red Hat)	85f6f0290c	ftrace/x86: Add macro MCOUNT_REG_SIZE for amount of stack used to save mcount regs The macro save_mcount_regs saves regs onto the stack. But to uncouple the amount of stack used in that macro from the users of the macro, we need to have a define that tells all the users how much stack is used by that macro. This way we can change the amount of stack the macro uses without breaking its users. Also remove some dead code that was left over from commit `fdc841b58c` "ftrace: x86: Remove check of obsolete variable function_trace_stop". Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1411262304010.3961@nanos Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-12-01 14:08:15 -05:00
Steven Rostedt (Red Hat)	527aa75b33	ftrace/x86: Simplify save_mcount_regs on getting RIP Currently save_mcount_regs is passed a "skip" parameter to know how much stack updated the pt_regs, as it tries to keep the saved pt_regs in the same location for all users. This is rather stupid, especially since the part stored on the pt_regs has nothing to do with what is suppose to be in that location. Instead of doing that, just pass in an "added" parameter that lets that macro know how much stack was added before it was called so that it can get to the RIP. But the difference is that it will now offset the pt_regs by that "added" count. The caller now needs to take care of the offset of the pt_regs. This will make it easier to simplify the code later. Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1411262304010.3961@nanos Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-12-01 14:07:50 -05:00
Steven Rostedt (Red Hat)	094dfc5451	ftrace/x86: Have save_mcount_regs store RIP in %rdi for first parameter Instead of having save_mcount_regs store the RIP in %rdx as a temp register to place it in the proper location of the pt_regs on the stack. Use the %rdi register as the temp register. This lets us remove the extra store in the ftrace_caller_setup macro. Link: http://lkml.kernel.org/r/CA+55aFwF+qCGSKdGaEgW4p6N65GZ5_XTV=1NbtWDvxnd5yYLiw@mail.gmail.com Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1411262304010.3961@nanos Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-12-01 14:07:39 -05:00
Steven Rostedt (Red Hat)	05df710ec3	ftrace/x86: Rename MCOUNT_SAVE_FRAME and add more detailed comments The name MCOUNT_SAVE_FRAME is rather confusing as it really isn't a function frame that is saved, but just the required mcount registers that are needed to be saved before C code may be called. The word "frame" confuses it as being a function frame which it is not. Rename MCOUNT_SAVE_FRAME and MCOUNT_RESTORE_FRAME to save_mcount_regs and restore_mcount_regs respectively. Noticed the lower case, which keeps it from screaming at the reviewers. Link: http://lkml.kernel.org/r/CA+55aFwF+qCGSKdGaEgW4p6N65GZ5_XTV=1NbtWDvxnd5yYLiw@mail.gmail.com Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1411262304010.3961@nanos Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-12-01 14:07:27 -05:00
Steven Rostedt (Red Hat)	4bcdf1522f	ftrace/x86: Move MCOUNT_SAVE_FRAME out of header file Linus pointed out that MCOUNT_SAVE_FRAME is used in only a single file and that there's no reason that it should be in a header file. Move the macro to the code that uses it. Link: http://lkml.kernel.org/r/CA+55aFwF+qCGSKdGaEgW4p6N65GZ5_XTV=1NbtWDvxnd5yYLiw@mail.gmail.com Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1411262304010.3961@nanos Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-12-01 14:07:16 -05:00
Steven Rostedt (Red Hat)	76c2f13c55	ftrace/x86: Have static tracing also use ftrace_caller_setup Linus pointed out that there were locations that did the hard coded update of the parent and rip parameters. One of them was the static tracer which could also use the ftrace_caller_setup to do that work. In fact, because it did not use it, it is prone to bugs, and since the static tracer is hardly ever used (who wants function tracing code always being called?) it doesn't get tested very often. I only run a few "does it still work" tests on it. But I do not run stress tests on that code. Although, since it is never turned off, just having it on should be stressful enough. (especially for the performance folks) There's no reason that the static tracer can't also use ftrace_caller_setup. Have it do so. Link: http://lkml.kernel.org/r/CA+55aFwF+qCGSKdGaEgW4p6N65GZ5_XTV=1NbtWDvxnd5yYLiw@mail.gmail.com Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1411262304010.3961@nanos Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-12-01 14:06:52 -05:00
Borislav Petkov	2ef84b3bb9	x86, microcode, AMD: Do not use smp_processor_id() in preemtible context Hand down the cpu number instead, otherwise lockdep screams when doing echo 1 > /sys/devices/system/cpu/microcode/reload. BUG: using smp_processor_id() in preemptible [00000000] code: amd64-microcode/2470 caller is debug_smp_processor_id+0x12/0x20 CPU: 1 PID: 2470 Comm: amd64-microcode Not tainted 3.18.0-rc6+ #26 ... Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1417428741-4501-1-git-send-email-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-12-01 11:51:05 +01:00
Borislav Petkov	02ecc41abc	x86, microcode: Limit the microcode reloading to 64-bit for now First, there was this: https://bugzilla.kernel.org/show_bug.cgi?id=88001 The problem there was that microcode patches are not being reapplied after suspend-to-ram. It was important to reapply them, though, because of for example Haswell's TSX erratum which disabled TSX instructions with a microcode patch. A simple fix was `fb86b97300` ("x86, microcode: Update BSPs microcode on resume") but, as it is often the case, simple fixes are too simple. This one causes 32-bit resume to fail: https://bugzilla.kernel.org/show_bug.cgi?id=88391 Properly fixing this would require more involved changes for which it is too late now, right before the merge window. Thus, limit this to 64-bit only temporarily. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1417353999-32236-1-git-send-email-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-12-01 10:55:08 +01:00
Rafael J. Wysocki	8a497cfdc0	Merge back earlier cpufreq material for 3.19-rc1.	2014-12-01 02:46:24 +01:00
Ard Biesheuvel	d3fccc7ef8	kvm: fix kvm_is_mmio_pfn() and rename to kvm_is_reserved_pfn() This reverts commit `85c8555ff0` ("KVM: check for !is_zero_pfn() in kvm_is_mmio_pfn()") and renames the function to kvm_is_reserved_pfn. The problem being addressed by the patch above was that some ARM code based the memory mapping attributes of a pfn on the return value of kvm_is_mmio_pfn(), whose name indeed suggests that such pfns should be mapped as device memory. However, kvm_is_mmio_pfn() doesn't do quite what it says on the tin, and the existing non-ARM users were already using it in a way which suggests that its name should probably have been 'kvm_is_reserved_pfn' from the beginning, e.g., whether or not to call get_page/put_page on it etc. This means that returning false for the zero page is a mistake and the patch above should be reverted. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-26 14:40:45 +01:00
Kees Cook	4943ba16bb	crypto: include crypto- module prefix in template This adds the module loading prefix "crypto-" to the template lookup as well. For example, attempting to load 'vfat(blowfish)' via AF_ALG now correctly includes the "crypto-" prefix at every level, correctly rejecting "vfat": net-pf-38 algif-hash crypto-vfat(blowfish) crypto-vfat(blowfish)-all crypto-vfat Reported-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2014-11-26 20:06:30 +08:00
Sasha Levin	db08655437	x86/nmi: Fix use of unallocated cpumask_var_t Commit "x86/nmi: Perform a safe NMI stack trace on all CPUs" has introduced a cpumask_var_t variable: +static cpumask_var_t printtrace_mask; But never allocated it before using it, which caused a NULL ptr deref when trying to print the stack trace: [ 1110.296154] BUG: unable to handle kernel NULL pointer dereference at (null) [ 1110.296169] IP: __memcpy (arch/x86/lib/memcpy_64.S:151) [ 1110.296178] PGD 4c34b3067 PUD 4c351b067 PMD 0 [ 1110.296186] Oops: 0002 [#1] PREEMPT SMP KASAN [ 1110.296234] Dumping ftrace buffer: [ 1110.296330] (ftrace buffer empty) [ 1110.296339] Modules linked in: [ 1110.296345] CPU: 1 PID: 10538 Comm: trinity-c99 Not tainted 3.18.0-rc5-next-20141124-sasha-00058-ge2a8c09-dirty #1499 [ 1110.296348] task: ffff880152650000 ti: ffff8804c3560000 task.ti: ffff8804c3560000 [ 1110.296357] RIP: __memcpy (arch/x86/lib/memcpy_64.S:151) [ 1110.296360] RSP: 0000:ffff8804c3563870 EFLAGS: 00010246 [ 1110.296363] RAX: 0000000000000000 RBX: ffffe8fff3c4a809 RCX: 0000000000000000 [ 1110.296366] RDX: 0000000000000008 RSI: ffffffff9e254040 RDI: 0000000000000000 [ 1110.296369] RBP: ffff8804c3563908 R08: 0000000000ffffff R09: 0000000000ffffff [ 1110.296371] R10: 0000000000000000 R11: 0000000000000006 R12: 0000000000000000 [ 1110.296375] R13: 0000000000000000 R14: ffffffff9e254040 R15: ffffe8fff3c4a809 [ 1110.296379] FS: 00007f9e43b0b700(0000) GS:ffff880107e00000(0000) knlGS:0000000000000000 [ 1110.296382] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 1110.296385] CR2: 0000000000000000 CR3: 00000004e4334000 CR4: 00000000000006a0 [ 1110.296400] Stack: [ 1110.296406] ffffffff81b1e46c 0000000000000000 ffff880107e03fb8 000000000000000b [ 1110.296413] ffff880107dfffc0 ffff880107e03fc0 0000000000000008 ffffffff93f2e9c8 [ 1110.296419] 0000000000000000 ffffda0020fc07f7 0000000000000008 ffff8804c3563901 [ 1110.296420] Call Trace: [ 1110.296429] ? memcpy (mm/kasan/kasan.c:275) [ 1110.296437] ? arch_trigger_all_cpu_backtrace (include/linux/bitmap.h:215 include/linux/cpumask.h:506 arch/x86/kernel/apic/hw_nmi.c:76) [ 1110.296444] arch_trigger_all_cpu_backtrace (include/linux/bitmap.h:215 include/linux/cpumask.h:506 arch/x86/kernel/apic/hw_nmi.c:76) [ 1110.296451] ? dump_stack (./arch/x86/include/asm/preempt.h:95 lib/dump_stack.c:55) [ 1110.296458] do_raw_spin_lock (./arch/x86/include/asm/spinlock.h:86 kernel/locking/spinlock_debug.c:130 kernel/locking/spinlock_debug.c:137) [ 1110.296468] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151) [ 1110.296474] ? __page_check_address (include/linux/spinlock.h:309 mm/rmap.c:630) [ 1110.296481] __page_check_address (include/linux/spinlock.h:309 mm/rmap.c:630) [ 1110.296487] ? preempt_count_sub (kernel/sched/core.c:2615) [ 1110.296493] try_to_unmap_one (include/linux/rmap.h:202 mm/rmap.c:1146) [ 1110.296504] ? anon_vma_interval_tree_iter_next (mm/interval_tree.c:72 mm/interval_tree.c:103) [ 1110.296514] rmap_walk (mm/rmap.c:1653 mm/rmap.c:1725) [ 1110.296521] ? page_get_anon_vma (include/linux/rcupdate.h:423 include/linux/rcupdate.h:935 mm/rmap.c:435) [ 1110.296530] try_to_unmap (mm/rmap.c:1545) [ 1110.296536] ? page_get_anon_vma (mm/rmap.c:437) [ 1110.296545] ? try_to_unmap_nonlinear (mm/rmap.c:1138) [ 1110.296551] ? SyS_msync (mm/rmap.c:1501) [ 1110.296558] ? page_remove_rmap (mm/rmap.c:1409) [ 1110.296565] ? page_get_anon_vma (mm/rmap.c:448) [ 1110.296571] ? anon_vma_ctor (mm/rmap.c:1496) [ 1110.296579] migrate_pages (mm/migrate.c:913 mm/migrate.c:956 mm/migrate.c:1136) [ 1110.296586] ? _raw_spin_unlock_irq (./arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:169 kernel/locking/spinlock.c:199) [ 1110.296593] ? buffer_migrate_lock_buffers (mm/migrate.c:1584) [ 1110.296601] ? handle_mm_fault (mm/memory.c:3163 mm/memory.c:3223 mm/memory.c:3336 mm/memory.c:3365) [ 1110.296607] migrate_misplaced_page (mm/migrate.c:1738) [ 1110.296614] handle_mm_fault (mm/memory.c:3170 mm/memory.c:3223 mm/memory.c:3336 mm/memory.c:3365) [ 1110.296623] __do_page_fault (arch/x86/mm/fault.c:1246) [ 1110.296630] ? vtime_account_user (kernel/sched/cputime.c:701) [ 1110.296638] ? get_parent_ip (kernel/sched/core.c:2559) [ 1110.296646] ? context_tracking_user_exit (kernel/context_tracking.c:144) [ 1110.296656] trace_do_page_fault (arch/x86/mm/fault.c:1329 include/linux/jump_label.h:114 include/linux/context_tracking_state.h:27 include/linux/context_tracking.h:45 arch/x86/mm/fault.c:1330) [ 1110.296664] do_async_page_fault (arch/x86/kernel/kvm.c:280) [ 1110.296670] async_page_fault (arch/x86/kernel/entry_64.S:1285) [ 1110.296755] Code: 08 4c 8b 54 16 f0 4c 8b 5c 16 f8 4c 89 07 4c 89 4f 08 4c 89 54 17 f0 4c 89 5c 17 f8 c3 90 83 fa 08 72 1b 4c 8b 06 4c 8b 4c 16 f8 <4c> 89 07 4c 89 4c 17 f8 c3 66 2e 0f 1f 84 00 00 00 00 00 83 fa All code ======== 0: 08 4c 8b 54 or %cl,0x54(%rbx,%rcx,4) 4: 16 (bad) 5: f0 4c 8b 5c 16 f8 lock mov -0x8(%rsi,%rdx,1),%r11 b: 4c 89 07 mov %r8,(%rdi) e: 4c 89 4f 08 mov %r9,0x8(%rdi) 12: 4c 89 54 17 f0 mov %r10,-0x10(%rdi,%rdx,1) 17: 4c 89 5c 17 f8 mov %r11,-0x8(%rdi,%rdx,1) 1c: c3 retq 1d: 90 nop 1e: 83 fa 08 cmp $0x8,%edx 21: 72 1b jb 0x3e 23: 4c 8b 06 mov (%rsi),%r8 26: 4c 8b 4c 16 f8 mov -0x8(%rsi,%rdx,1),%r9 2b:* 4c 89 07 mov %r8,(%rdi) <-- trapping instruction 2e: 4c 89 4c 17 f8 mov %r9,-0x8(%rdi,%rdx,1) 33: c3 retq 34: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1) 3b: 00 00 00 3e: 83 fa 00 cmp $0x0,%edx Code starting with the faulting instruction =========================================== 0: 4c 89 07 mov %r8,(%rdi) 3: 4c 89 4c 17 f8 mov %r9,-0x8(%rdi,%rdx,1) 8: c3 retq 9: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1) 10: 00 00 00 13: 83 fa 00 cmp $0x0,%edx [ 1110.296760] RIP __memcpy (arch/x86/lib/memcpy_64.S:151) [ 1110.296763] RSP <ffff8804c3563870> [ 1110.296765] CR2: 0000000000000000 Link: http://lkml.kernel.org/r/1416931560-10603-1-git-send-email-sasha.levin@oracle.com Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-25 14:15:30 -05:00
Dan Carpenter	5d1b3c98ec	crypto: sha-mb - remove a bogus NULL check This can't be NULL and we dereferenced it earlier. Smatch used to ignore these things where the pointer was obviously non-NULL but I've found that sometimes the intention was to check something else so we were maybe missing bugs. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2014-11-25 22:50:43 +08:00
Andy Lutomirski	7ddc6a2199	x86/asm/traps: Disable tracing and kprobes in fixup_bad_iret and sync_regs These functions can be executed on the int3 stack, so kprobes are dangerous. Tracing is probably a bad idea, too. Fixes: `b645af2d59` ("x86_64, traps: Rework bad_iret") Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: <stable@vger.kernel.org> # Backport as far back as it would apply Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/50e33d26adca60816f3ba968875801652507d0c4.1416870125.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-25 07:26:55 +01:00
Steven Rostedt (Red Hat)	62a207d748	ftrace/x86: Have static function tracing always test for function graph New updates to the ftrace generic code had ftrace_stub not always being called when ftrace is off. This causes the static tracer to always save and restore functions. But it also showed that when function tracing is running, the function graph tracer can not. We should always check to see if function graph tracing is running even if the function tracer is running too. The function tracer code is not the only one that uses the hook to function mcount. Cc: Markos Chandras <Markos.Chandras@imgtec.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-24 15:02:25 -05:00
Kees Cook	5d26a105b5	crypto: prefix module autoloading with "crypto-" This prefixes all crypto module loading with "crypto-" so we never run the risk of exposing module auto-loading to userspace via a crypto API, as demonstrated by Mathias Krause: https://lkml.org/lkml/2013/3/4/70 Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2014-11-24 22:43:57 +08:00
Andy Lutomirski	82975bc6a6	uprobes, x86: Fix _TIF_UPROBE vs _TIF_NOTIFY_RESUME x86 call do_notify_resume on paranoid returns if TIF_UPROBE is set but not on non-paranoid returns. I suspect that this is a mistake and that the code only works because int3 is paranoid. Setting _TIF_NOTIFY_RESUME in the uprobe code was probably a workaround for the x86 bug. With that bug fixed, we can remove _TIF_NOTIFY_RESUME from the uprobes code. Reported-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-11-23 14:25:28 -08:00
Linus Torvalds	00c89b2f11	Merge branch 'x86-traps' (trap handling from Andy Lutomirski) Merge x86-64 iret fixes from Andy Lutomirski: "This addresses the following issues: - an unrecoverable double-fault triggerable with modify_ldt. - invalid stack usage in espfix64 failed IRET recovery from IST context. - invalid stack usage in non-espfix64 failed IRET recovery from IST context. It also makes a good but IMO scary change: non-espfix64 failed IRET will now report the correct error. Hopefully nothing depended on the old incorrect behavior, but maybe Wine will get confused in some obscure corner case" * emailed patches from Andy Lutomirski <luto@amacapital.net>: x86_64, traps: Rework bad_iret x86_64, traps: Stop using IST for #SS x86_64, traps: Fix the espfix64 #DF fixup and rewrite it in C	2014-11-23 13:56:55 -08:00
Andy Lutomirski	b645af2d59	x86_64, traps: Rework bad_iret It's possible for iretq to userspace to fail. This can happen because of a bad CS, SS, or RIP. Historically, we've handled it by fixing up an exception from iretq to land at bad_iret, which pretends that the failed iret frame was really the hardware part of #GP(0) from userspace. To make this work, there's an extra fixup to fudge the gs base into a usable state. This is suboptimal because it loses the original exception. It's also buggy because there's no guarantee that we were on the kernel stack to begin with. For example, if the failing iret happened on return from an NMI, then we'll end up executing general_protection on the NMI stack. This is bad for several reasons, the most immediate of which is that general_protection, as a non-paranoid idtentry, will try to deliver signals and/or schedule from the wrong stack. This patch throws out bad_iret entirely. As a replacement, it augments the existing swapgs fudge into a full-blown iret fixup, mostly written in C. It's should be clearer and more correct. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-11-23 13:56:19 -08:00
Andy Lutomirski	6f442be2fb	x86_64, traps: Stop using IST for #SS On a 32-bit kernel, this has no effect, since there are no IST stacks. On a 64-bit kernel, #SS can only happen in user code, on a failed iret to user space, a canonical violation on access via RSP or RBP, or a genuine stack segment violation in 32-bit kernel code. The first two cases don't need IST, and the latter two cases are unlikely fatal bugs, and promoting them to double faults would be fine. This fixes a bug in which the espfix64 code mishandles a stack segment violation. This saves 4k of memory per CPU and a tiny bit of code. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-11-23 13:56:19 -08:00
Andy Lutomirski	af726f21ed	x86_64, traps: Fix the espfix64 #DF fixup and rewrite it in C There's nothing special enough about the espfix64 double fault fixup to justify writing it in assembly. Move it to C. This also fixes a bug: if the double fault came from an IST stack, the old asm code would return to a partially uninitialized stack frame. Fixes: `3891a04aaf` Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-11-23 13:56:18 -08:00
Chris Clayton	e2e68ae688	x86: Use $(OBJDUMP) instead of plain objdump commit `e6023367d7` 'x86, kaslr: Prevent .bss from overlaping initrd' broke the cross compile of x86. It added a objdump invocation, which invokes the host native objdump and ignores an active cross tool chain. Use $(OBJDUMP) instead which takes the CROSS_COMPILE prefix into account. [ tglx: Massage changelog and use $(OBJDUMP) ] Fixes: `e6023367d7` 'x86, kaslr: Prevent .bss from overlaping initrd' Signed-off-by: Chris Clayton <chris2553@googlemail.com> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Borislav Petkov <bp@suse.de> Cc: Junjie Mao <eternal.n08@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/54705C8E.1080400@googlemail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-23 21:21:53 +01:00
Thomas Gleixner	280510f106	PCI/MSI: Rename mask/unmask_msi_irq treewide The PCI/MSI irq chip callbacks mask/unmask_msi_irq have been renamed to pci_msi_mask/unmask_irq to mark them PCI specific. Rename all usage sites. The conversion helper functions are kept around to avoid conflicts in next and will be removed after merging into mainline. Coccinelle assisted conversion. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: x86@kernel.org Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Jason Cooper <jason@lakedaemon.net> Cc: Murali Karicheri <m-karicheri2@ti.com> Cc: Thierry Reding <thierry.reding@gmail.com> Cc: Mohit Kumar <mohit.kumar@st.com> Cc: Simon Horman <horms@verge.net.au> Cc: Michal Simek <michal.simek@xilinx.com> Cc: Yijing Wang <wangyijing@huawei.com>	2014-11-23 13:01:45 +01:00
Jiang Liu	83a18912b0	PCI/MSI: Rename write_msi_msg() to pci_write_msi_msg() Rename write_msi_msg() to pci_write_msi_msg() to mark it as PCI specific. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Grant Likely <grant.likely@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Yingjoe Chen <yingjoe.chen@mediatek.com> Cc: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-23 13:01:45 +01:00
Jiang Liu	891d4a48f7	PCI/MSI: Rename __read_msi_msg() to __pci_read_msi_msg() Rename __read_msi_msg() to __pci_read_msi_msg() and kill unused read_msi_msg(). It's a preparation to separate generic MSI code from PCI core. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Grant Likely <grant.likely@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Yingjoe Chen <yingjoe.chen@mediatek.com> Cc: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-23 13:01:45 +01:00
Linus Torvalds	c6c9161d06	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "Misc fixes: - gold linker build fix - noxsave command line parsing fix - bugfix for NX setup - microcode resume path bug fix - _TIF_NOHZ versus TIF_NOHZ bugfix as discussed in the mysterious lockup thread" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, syscall: Fix _TIF_NOHZ handling in syscall_trace_enter_phase1 x86, kaslr: Handle Gold linker for finding bss/brk x86, mm: Set NX across entire PMD at boot x86, microcode: Update BSPs microcode on resume x86: Require exact match for 'noxsave' command line option	2014-11-21 15:46:17 -08:00
Linus Torvalds	13f5004c94	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Misc fixes: two Intel uncore driver fixes, a CPU-hotplug fix and a build dependencies fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/uncore: Fix boot crash on SBOX PMU on Haswell-EP perf/x86/intel/uncore: Fix IRP uncore register offsets on Haswell EP perf: Fix corruption of sibling list with hotplug perf/x86: Fix embarrasing typo	2014-11-21 15:44:07 -08:00
Rafael J. Wysocki	0a924200ae	Merge back earlier cpuidle material for 3.19-rc1. Conflicts: drivers/cpuidle/dt_idle_states.c	2014-11-21 16:31:42 +01:00
Andy Lutomirski	b5e212a305	x86, syscall: Fix _TIF_NOHZ handling in syscall_trace_enter_phase1 TIF_NOHZ is 19 (i.e. _TIF_SYSCALL_TRACE \| _TIF_NOTIFY_RESUME \| _TIF_SINGLESTEP), not (1<<19). This code is involved in Dave's trinity lockup, but I don't see why it would cause any of the problems he's seeing, except inadvertently by causing a different path through entry_64.S's syscall handling. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Don Zickus <dzickus@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dave Jones <davej@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/a6cd3b60a3f53afb6e1c8081b0ec30ff19003dd7.1416434075.git.luto@amacapital.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-20 23:01:53 +01:00
Masami Hiramatsu	a017784f1b	kprobes/ftrace: Recover original IP if pre_handler doesn't change it Recover original IP register if the pre_handler doesn't change it. Since current kprobes doesn't expect that another ftrace handler may change regs->ip, it sets kprobe.addr + MCOUNT_INSN_SIZE to regs->ip and returns to ftrace. This seems wrong behavior since kprobes can recover regs->ip and safely pass it to another handler. This adds code which recovers original regs->ip passed from ftrace right before returning to ftrace, so that another ftrace user can change regs->ip. Link: http://lkml.kernel.org/r/20141009130106.4698.26362.stgit@kbuild-f20.novalocal Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-20 11:42:48 -05:00
Steven Rostedt (Red Hat)	a9edc88093	x86/nmi: Perform a safe NMI stack trace on all CPUs When trigger_all_cpu_backtrace() is called on x86, it will trigger an NMI on each CPU and call show_regs(). But this can lead to a hard lock up if the NMI comes in on another printk(). In order to avoid this, when the NMI triggers, it switches the printk routine for that CPU to call a NMI safe printk function that records the printk in a per_cpu seq_buf descriptor. After all NMIs have finished recording its data, the seq_bufs are printed in a safe context. Link: http://lkml.kernel.org/p/20140619213952.360076309@goodmis.org Link: http://lkml.kernel.org/r/20141115050605.055232587@goodmis.org Tested-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 22:01:21 -05:00
Steven Rostedt (Red Hat)	467aa1f276	x86/kvm/tracing: Use helper function trace_seq_buffer_ptr() To allow for the restructiong of the trace_seq code, we need users of it to use the helper functions instead of accessing the internals of the trace_seq structure itself. Link: http://lkml.kernel.org/r/20141104160221.585025609@goodmis.org Tested-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Mark Rustad <mark.d.rustad@intel.com> Reviewed-by: Petr Mladek <pmladek@suse.cz> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 15:25:36 -05:00
Steven Rostedt (Red Hat)	aec0be2d6e	ftrace/x86/extable: Add is_ftrace_trampoline() function Stack traces that happen from function tracing check if the address on the stack is a __kernel_text_address(). That is, is the address kernel code. This calls core_kernel_text() which returns true if the address is part of the builtin kernel code. It also calls is_module_text_address() which returns true if the address belongs to module code. But what is missing is ftrace dynamically allocated trampolines. These trampolines are allocated for individual ftrace_ops that call the ftrace_ops callback functions directly. But if they do a stack trace, the code checking the stack wont detect them as they are neither core kernel code nor module address space. Adding another field to ftrace_ops that also stores the size of the trampoline assigned to it we can create a new function called is_ftrace_trampoline() that returns true if the address is a dynamically allocate ftrace trampoline. Note, it ignores trampolines that are not dynamically allocated as they will return true with the core_kernel_text() function. Link: http://lkml.kernel.org/r/20141119034829.497125839@goodmis.org Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 15:25:26 -05:00
Steven Rostedt (Red Hat)	9960efeb80	ftrace/x86: Add frames pointers to trampoline as necessary When CONFIG_FRAME_POINTERS are enabled, it is required that the ftrace_caller and ftrace_regs_caller trampolines set up frame pointers otherwise a stack trace from a function call wont print the functions that called the trampoline. This is due to a check in __save_stack_address(): #ifdef CONFIG_FRAME_POINTER if (!reliable) return; #endif The "reliable" variable is only set if the function address is equal to contents of the address before the address the frame pointer register points to. If the frame pointer is not set up for the ftrace caller then this will fail the reliable test. It will miss the function that called the trampoline. Worse yet, if fentry is used (gcc 4.6 and beyond), it will also miss the parent, as the fentry is called before the stack frame is set up. That means the bp frame pointer points to the stack of just before the parent function was called. Link: http://lkml.kernel.org/r/20141119034829.355440340@goodmis.org Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: stable@vger.kernel.org # 3.7+ Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-11-19 15:24:31 -05:00
Chen Yucong	fa92c58694	x86, mce: Support memory error recovery for both UCNA and Deferred error in machine_check_poll Uncorrected no action required (UCNA) - is a uncorrected recoverable machine check error that is not signaled via a machine check exception and, instead, is reported to system software as a corrected machine check error. UCNA errors indicate that some data in the system is corrupted, but the data has not been consumed and the processor state is valid and you may continue execution on this processor. UCNA errors require no action from system software to continue execution. Note that UCNA errors are supported by the processor only when IA32_MCG_CAP[24] (MCG_SER_P) is set. -- Intel SDM Volume 3B Deferred errors are errors that cannot be corrected by hardware, but do not cause an immediate interruption in program flow, loss of data integrity, or corruption of processor state. These errors indicate that data has been corrupted but not consumed. Hardware writes information to the status and address registers in the corresponding bank that identifies the source of the error if deferred errors are enabled for logging. Deferred errors are not reported via machine check exceptions; they can be seen by polling the MCi_STATUS registers. -- AMD64 APM Volume 2 Above two items, both UCNA and Deferred errors belong to detected errors, but they can't be corrected by hardware, and this is very similar to Software Recoverable Action Optional (SRAO) errors. Therefore, we can take some actions that have been used for handling SRAO errors to handle UCNA and Deferred errors. Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Chen Yucong <slaoub@gmail.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2014-11-19 10:56:51 -08:00
Chen Yucong	e3480271f5	x86, mce, severity: Extend the the mce_severity mechanism to handle UCNA/DEFERRED error Until now, the mce_severity mechanism can only identify the severity of UCNA error as MCE_KEEP_SEVERITY. Meanwhile, it is not able to filter out DEFERRED error for AMD platform. This patch extends the mce_severity mechanism for handling UCNA/DEFERRED error. In order to do this, the patch introduces a new severity level - MCE_UCNA/DEFERRED_SEVERITY. In addition, mce_severity is specific to machine check exception, and it will check MCIP/EIPV/RIPV bits. In order to use mce_severity mechanism in non-exception context, the patch also introduces a new argument (is_excp) for mce_severity. `is_excp' is used to explicitly specify the calling context of mce_severity. Reviewed-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: Chen Yucong <slaoub@gmail.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2014-11-19 10:55:43 -08:00
Al Viro	a455589f18	assorted conversions to %p[dD] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-11-19 13:01:20 -05:00
Dave Hansen	a1ea1c032b	x86: Cleanly separate use of asm-generic/mm_hooks.h asm-generic/mm_hooks.h provides some generic fillers for the 90% of architectures that do not need to hook some mmap-manipulation functions. A comment inside says: > Define generic no-op hooks for arch_dup_mmap and > arch_exit_mmap, to be included in asm-FOO/mmu_context.h > for any arch FOO which doesn't need to hook these. So, does x86 need to hook these? It depends on CONFIG_PARAVIRT. We conditionally include this generic header if we have CONFIG_PARAVIRT=n. That's madness. With this patch, x86 stops using asm-generic/mmu_hooks.h entirely. We use our own copies of the functions. The paravirt code provides some stubs if it is disabled, and we always call those stubs in our x86-private versions of arch_exit_mmap() and arch_dup_mmap(). Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Hansen <dave@sr71.net> Cc: x86@kernel.org Link: http://lkml.kernel.org/r/20141118182349.14567FA5@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-19 11:54:13 +01:00
Dave Hansen	68c009c413	x86 mpx: Change return type of get_reg_offset() get_reg_offset() used to return the register contents themselves instead of the register offset. When it did that, it was an unsigned long. I changed it to return an integer _offset_ instead of the register. But, I neglected to change the return type of the function or the variables in which we store the result of the call. This fixes up the code to clear up the warnings from the smatch bot: New smatch warnings: arch/x86/mm/mpx.c:178 mpx_get_addr_ref() warn: unsigned 'addr_offset' is never less than zero. arch/x86/mm/mpx.c:184 mpx_get_addr_ref() warn: unsigned 'base_offset' is never less than zero. arch/x86/mm/mpx.c:188 mpx_get_addr_ref() warn: unsigned 'indx_offset' is never less than zero. arch/x86/mm/mpx.c:196 mpx_get_addr_ref() warn: unsigned 'addr_offset' is never less than zero. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Hansen <dave@sr71.net> Cc: x86@kernel.org Link: http://lkml.kernel.org/r/20141118182343.C3E0C629@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-19 11:54:12 +01:00
Kees Cook	70b61e3621	x86, kaslr: Handle Gold linker for finding bss/brk When building with the Gold linker, the .bss and .brk areas of vmlinux are shown as consecutive instead of having the same file offset. Allow for either state, as long as things add up correctly. Fixes: `e6023367d7` ("x86, kaslr: Prevent .bss from overlaping initrd") Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Junjie Mao <eternal.n08@gmail.com> Link: http://lkml.kernel.org/r/20141118001604.GA25045@www.outflux.net Cc: stable@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-18 18:32:24 +01:00
Kees Cook	45e2a9d470	x86, mm: Set NX across entire PMD at boot When setting up permissions on kernel memory at boot, the end of the PMD that was split from bss remained executable. It should be NX like the rest. This performs a PMD alignment instead of a PAGE alignment to get the correct span of memory. Before: ---[ High Kernel Mapping ]--- ... 0xffffffff8202d000-0xffffffff82200000 1868K RW GLB NX pte 0xffffffff82200000-0xffffffff82c00000 10M RW PSE GLB NX pmd 0xffffffff82c00000-0xffffffff82df5000 2004K RW GLB NX pte 0xffffffff82df5000-0xffffffff82e00000 44K RW GLB x pte 0xffffffff82e00000-0xffffffffc0000000 978M pmd After: ---[ High Kernel Mapping ]--- ... 0xffffffff8202d000-0xffffffff82200000 1868K RW GLB NX pte 0xffffffff82200000-0xffffffff82e00000 12M RW PSE GLB NX pmd 0xffffffff82e00000-0xffffffffc0000000 978M pmd [ tglx: Changed it to roundup(_brk_end, PMD_SIZE) and added a comment. We really should unmap the reminder along with the holes caused by init,initdata etc. but thats a different issue ] Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Wang Nan <wangnan0@huawei.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20141114194737.GA3091@www.outflux.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-18 18:32:24 +01:00
Borislav Petkov	fb86b97300	x86, microcode: Update BSPs microcode on resume In the situation when we apply early microcode but do not apply late microcode, we fail to update the BSP's microcode on resume because we haven't initialized the uci->mc microcode pointer. So, in order to alleviate that, we go and dig out the stashed microcode patch during early boot. It is basically the same thing that is done on the APs early during boot so do that too here. Tested-by: alex.schnaidt@gmail.com Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=88001 Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: <stable@vger.kernel.org> # v3.9 Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20141118094657.GA6635@pd.tnic Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-18 18:32:24 +01:00
Rafael J. Wysocki	bd2a0f6754	Merge back cpufreq material for 3.19-rc1.	2014-11-18 01:22:29 +01:00
Dave Hansen	1de4fa14ee	x86, mpx: Cleanup unused bound tables The previous patch allocates bounds tables on-demand. As noted in an earlier description, these can add up to HUGE amounts of memory. This has caused OOMs in practice when running tests. This patch adds support for freeing bounds tables when they are no longer in use. There are two types of mappings in play when unmapping tables: 1. The mapping with the actual data, which userspace is munmap()ing or brk()ing away, etc... 2. The mapping for the bounds table backing the data (is tagged with VM_MPX, see the patch "add MPX specific mmap interface"). If userspace use the prctl() indroduced earlier in this patchset to enable the management of bounds tables in kernel, when it unmaps the first type of mapping with the actual data, the kernel needs to free the mapping for the bounds table backing the data. This patch hooks in at the very end of do_unmap() to do so. We look at the addresses being unmapped and find the bounds directory entries and tables which cover those addresses. If an entire table is unused, we clear associated directory entry and free the table. Once we unmap the bounds table, we would have a bounds directory entry pointing at empty address space. That address space might now be allocated for some other (random) use, and the MPX hardware might now try to walk it as if it were a bounds table. That would be bad. So any unmapping of an enture bounds table has to be accompanied by a corresponding write to the bounds directory entry to invalidate it. That write to the bounds directory can fault, which causes the following problem: Since we are doing the freeing from munmap() (and other paths like it), we hold mmap_sem for write. If we fault, the page fault handler will attempt to acquire mmap_sem for read and we will deadlock. To avoid the deadlock, we pagefault_disable() when touching the bounds directory entry and use a get_user_pages() to resolve the fault. The unmapping of bounds tables happends under vm_munmap(). We also (indirectly) call vm_munmap() to _do_ the unmapping of the bounds tables. We avoid unbounded recursion by disallowing freeing of bounds tables for bounds tables. This would not occur normally, so should not have any practical impact. Being strict about it here helps ensure that we do not have an exploitable stack overflow. Based-on-patch-by: Qiaowei Ren <qiaowei.ren@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: linux-mm@kvack.org Cc: linux-mips@linux-mips.org Cc: Dave Hansen <dave@sr71.net> Link: http://lkml.kernel.org/r/20141114151831.E4531C4A@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-18 00:58:54 +01:00
Dave Hansen	fe3d197f84	x86, mpx: On-demand kernel allocation of bounds tables This is really the meat of the MPX patch set. If there is one patch to review in the entire series, this is the one. There is a new ABI here and this kernel code also interacts with userspace memory in a relatively unusual manner. (small FAQ below). Long Description: This patch adds two prctl() commands to provide enable or disable the management of bounds tables in kernel, including on-demand kernel allocation (See the patch "on-demand kernel allocation of bounds tables") and cleanup (See the patch "cleanup unused bound tables"). Applications do not strictly need the kernel to manage bounds tables and we expect some applications to use MPX without taking advantage of this kernel support. This means the kernel can not simply infer whether an application needs bounds table management from the MPX registers. The prctl() is an explicit signal from userspace. PR_MPX_ENABLE_MANAGEMENT is meant to be a signal from userspace to require kernel's help in managing bounds tables. PR_MPX_DISABLE_MANAGEMENT is the opposite, meaning that userspace don't want kernel's help any more. With PR_MPX_DISABLE_MANAGEMENT, the kernel won't allocate and free bounds tables even if the CPU supports MPX. PR_MPX_ENABLE_MANAGEMENT will fetch the base address of the bounds directory out of a userspace register (bndcfgu) and then cache it into a new field (->bd_addr) in the 'mm_struct'. PR_MPX_DISABLE_MANAGEMENT will set "bd_addr" to an invalid address. Using this scheme, we can use "bd_addr" to determine whether the management of bounds tables in kernel is enabled. Also, the only way to access that bndcfgu register is via an xsaves, which can be expensive. Caching "bd_addr" like this also helps reduce the cost of those xsaves when doing table cleanup at munmap() time. Unfortunately, we can not apply this optimization to #BR fault time because we need an xsave to get the value of BNDSTATUS. ==== Why does the hardware even have these Bounds Tables? ==== MPX only has 4 hardware registers for storing bounds information. If MPX-enabled code needs more than these 4 registers, it needs to spill them somewhere. It has two special instructions for this which allow the bounds to be moved between the bounds registers and some new "bounds tables". They are similar conceptually to a page fault and will be raised by the MPX hardware during both bounds violations or when the tables are not present. This patch handles those #BR exceptions for not-present tables by carving the space out of the normal processes address space (essentially calling the new mmap() interface indroduced earlier in this patch set.) and then pointing the bounds-directory over to it. The tables need to be accessed and controlled by userspace because the instructions for moving bounds in and out of them are extremely frequent. They potentially happen every time a register pointing to memory is dereferenced. Any direct kernel involvement (like a syscall) to access the tables would obviously destroy performance. ==== Why not do this in userspace? ==== This patch is obviously doing this allocation in the kernel. However, MPX does not strictly require anything in the kernel. It can theoretically be done completely from userspace. Here are a few ways this could be done. I don't think any of them are practical in the real-world, but here they are. Q: Can virtual space simply be reserved for the bounds tables so that we never have to allocate them? A: As noted earlier, these tables are HUGE. An X-GB virtual area needs 4X GB of virtual space, plus 2GB for the bounds directory. If we were to preallocate them for the 128TB of user virtual address space, we would need to reserve 512TB+2GB, which is larger than the entire virtual address space today. This means they can not be reserved ahead of time. Also, a single process's pre-popualated bounds directory consumes 2GB of virtual AND* physical memory. IOW, it's completely infeasible to prepopulate bounds directories. Q: Can we preallocate bounds table space at the same time memory is allocated which might contain pointers that might eventually need bounds tables? A: This would work if we could hook the site of each and every memory allocation syscall. This can be done for small, constrained applications. But, it isn't practical at a larger scale since a given app has no way of controlling how all the parts of the app might allocate memory (think libraries). The kernel is really the only place to intercept these calls. Q: Could a bounds fault be handed to userspace and the tables allocated there in a signal handler instead of in the kernel? A: (thanks to tglx) mmap() is not on the list of safe async handler functions and even if mmap() would work it still requires locking or nasty tricks to keep track of the allocation state there. Having ruled out all of the userspace-only approaches for managing bounds tables that we could think of, we create them on demand in the kernel. Based-on-patch-by: Qiaowei Ren <qiaowei.ren@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: linux-mm@kvack.org Cc: linux-mips@linux-mips.org Cc: Dave Hansen <dave@sr71.net> Link: http://lkml.kernel.org/r/20141114151829.AD4310DE@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-18 00:58:53 +01:00
Dave Hansen	fcc7ffd679	x86, mpx: Decode MPX instruction to get bound violation information This patch sets bound violation fields of siginfo struct in #BR exception handler by decoding the user instruction and constructing the faulting pointer. We have to be very careful when decoding these instructions. They are completely controlled by userspace and may be changed at any time up to and including the point where we try to copy them in to the kernel. They may or may not be MPX instructions and could be completely invalid for all we know. Note: This code is based on Qiaowei Ren's specialized MPX decoder, but uses the generic decoder whenever possible. It was tested for robustness by generating a completely random data stream and trying to decode that stream. I also unmapped random pages inside the stream to test the "partial instruction" short read code. We kzalloc() the siginfo instead of stack allocating it because we need to memset() it anyway, and doing this makes it much more clear when it got initialized by the MPX instruction decoder. Changes from the old decoder: * Use the generic decoder instead of custom functions. Saved ~70 lines of code overall. * Remove insn->addr_bytes code (never used??) * Make sure never to possibly overflow the regoff[] array, plus check the register range correctly in 32 and 64-bit modes. * Allow get_reg() to return an error and have mpx_get_addr_ref() handle when it sees errors. * Only call insn_get_() near where we actually use the values instead if trying to call them all at once. Handle short reads from copy_from_user() and check the actual number of read bytes against what we expect from insn_get_length(). If a read stops in the middle of an instruction, we error out. * Actually check the opcodes intead of ignoring them. * Dynamically kzalloc() siginfo_t so we don't leak any stack data. * Detect and handle decoder failures instead of ignoring them. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Based-on-patch-by: Qiaowei Ren <qiaowei.ren@intel.com> Cc: linux-mm@kvack.org Cc: linux-mips@linux-mips.org Cc: Dave Hansen <dave@sr71.net> Link: http://lkml.kernel.org/r/20141114151828.5BDD0915@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-18 00:58:53 +01:00
Qiaowei Ren	57319d80e1	x86, mpx: Add MPX-specific mmap interface We have chosen to perform the allocation of bounds tables in kernel (See the patch "on-demand kernel allocation of bounds tables") and to mark these VMAs with VM_MPX. However, there is currently no suitable interface to actually do this. Existing interfaces, like do_mmap_pgoff(), have no way to set a modified ->vm_ops or ->vm_flags and don't hold mmap_sem long enough to let a caller do it. This patch wraps mmap_region() and hold mmap_sem long enough to make the modifications to the VMA which we need. Also note the 32/64-bit #ifdef in the header. We actually need to do this at runtime eventually. But, for now, we don't support running 32-bit binaries on 64-bit kernels. Support for this will come in later patches. Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: linux-mm@kvack.org Cc: linux-mips@linux-mips.org Cc: Dave Hansen <dave@sr71.net> Link: http://lkml.kernel.org/r/20141114151827.CE440F67@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-18 00:58:53 +01:00
Dave Hansen	95290cf13e	x86, mpx: Add MPX to disabled features This allows us to use cpu_feature_enabled(X86_FEATURE_MPX) as both a runtime and compile-time check. When CONFIG_X86_INTEL_MPX is disabled, cpu_feature_enabled(X86_FEATURE_MPX) will evaluate at compile-time to 0. If CONFIG_X86_INTEL_MPX=y, then the cpuid flag will be checked at runtime. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Qiaowei Ren <qiaowei.ren@intel.com> Cc: linux-mm@kvack.org Cc: linux-mips@linux-mips.org Cc: Dave Hansen <dave@sr71.net> Link: http://lkml.kernel.org/r/20141114151823.B358EAD2@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-18 00:58:53 +01:00
Dave Hansen	62e7759b1b	x86, mpx: Rename cfg_reg_u and status_reg According to Intel SDM extension, MPX configuration and status registers should be BNDCFGU and BNDSTATUS. This patch renames cfg_reg_u and status_reg to bndcfgu and bndstatus. [ tglx: Renamed 'struct bndscr_struct' to 'struct bndscr' ] Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: linux-mm@kvack.org Cc: linux-mips@linux-mips.org Cc: Dave Hansen <dave@sr71.net> Cc: Qiaowei Ren <qiaowei.ren@intel.com> Link: http://lkml.kernel.org/r/20141114151817.031762AC@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-18 00:58:53 +01:00
Dave Hansen	c04e051ccc	x86: mpx: Give bndX registers actual names Consider the bndX MPX registers. There 4 registers each containing a 64-bit lower and a 64-bit upper bound. That's 864 bits and we declare it thusly: struct bndregs_struct { u64 bndregs[8]; } Let's say you want to read the upper bound from the MPX register bnd2 out of the xsave buf. You do: bndregno = 2; upper_bound = xsave_buf->bndregs.bndregs[2bndregno+1]; That kinda sucks. Every time you access it, you need to know: 1. Each bndX register is two entries wide in "bndregs" 2. The lower comes first followed by upper. We do the +1 to get upper vs. lower. This replaces the old definition. You can now access them indexed by the register number directly, and with a meaningful name for the lower and upper bound: bndregno = 2; xsave_buf->bndreg[bndregno].upper_bound; It's now VERY clear that there are 4 registers. The programmer now doesn't have to care what order the lower and upper bounds are in, and it's harder to get it wrong. [ tglx: Changed ub/lb to upper_bound/lower_bound and renamed struct bndreg_struct to struct bndreg ] Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Qiaowei Ren <qiaowei.ren@intel.com> Cc: "Yu, Fenghua" <fenghua.yu@intel.com> Cc: Dave Hansen <dave@sr71.net> Link: http://lkml.kernel.org/r/20141031215820.5EA5E0EC@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-18 00:58:52 +01:00
Dave Hansen	6ba48ff46f	x86: Remove arbitrary instruction size limit in instruction decoder The current x86 instruction decoder steps along through the instruction stream but always ensures that it never steps farther than the largest possible instruction size (MAX_INSN_SIZE). The MPX code is now going to be doing some decoding of userspace instructions. We copy those from userspace in to the kernel and they're obviously completely untrusted coming from userspace. In addition to the constraint that instructions can only be so long, we also have to be aware of how long the buffer is that came in from userspace. This _looks_ to be similar to what the perf and kprobes is doing, but it's unclear to me whether they are affected. The whole reason we need this is that it is perfectly valid to be executing an instruction within MAX_INSN_SIZE bytes of an unreadable page. We should be able to gracefully handle short reads in those cases. This adds support to the decoder to record how long the buffer being decoded is and to refuse to "validate" the instruction if we would have gone over the end of the buffer to decode it. The kprobes code probably needs to be looked at here a bit more carefully. This patch still respects the MAX_INSN_SIZE limit there but the kprobes code does look like it might be able to be a bit more strict than it currently is. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Jim Keniston <jkenisto@us.ibm.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: x86@kernel.org Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Link: http://lkml.kernel.org/r/20141114153957.E6B01535@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-18 00:58:52 +01:00
Linus Torvalds	de55bbbff2	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Microcode fixes, a Xen fix and a KASLR boot loading fix with certain memory layouts" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, microcode, AMD: Fix ucode patch stashing on 32-bit x86/core, x86/xen/smp: Use 'die_complete' completion when taking CPU down x86, microcode: Fix accessing dis_ucode_ldr on 32-bit x86, kaslr: Prevent .bss from overlaping initrd x86, microcode, AMD: Fix early ucode loading on 32-bit	2014-11-16 11:19:25 -08:00
Linus Torvalds	3b91270a0a	x86-64: make csum_partial_copy_from_user() error handling consistent Al Viro pointed out that the x86-64 csum_partial_copy_from_user() is somewhat confused about what it should do on errors, notably it mostly clears the uncopied end result buffer, but misses that for the initial alignment case. All users should check for errors, so it's dubious whether the clearing is even necessary, and Al also points out that we should probably clean up the calling conventions, but regardless of any future changes to this function, the fact that it is inconsistent is just annoying. So make the __get_user() failure path use the same error exit as all the other errors do. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Cc: David Miller <davem@davemloft.net> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-11-16 11:00:42 -08:00
Thomas Gleixner	0dbcae8847	x86: mm: Move PAT only functions to mm/pat.c Commit `e00c8cc93c` "x86: Use new cache mode type in memtype related functions" broke the ARCH=um build. arch/x86/include/asm/cacheflush.h:67:36: error: return type is an incomplete type static inline enum page_cache_mode get_page_memtype(struct page *pg) The reason is simple. get_page_memtype() and set_page_memtype() require enum page_cache_mode now, which is defined in asm/pgtable_types.h. UM does not include that file for obvious reasons. The simple solution is to move that functions to arch/x86/mm/pat.c where the only callsites of this are located. They should have been there in the first place. Fixes: `e00c8cc93c` "x86: Use new cache mode type in memtype related functions" Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Juergen Gross <jgross@suse.com> Cc: Richard Weinberger <richard@nod.at>	2014-11-16 18:59:19 +01:00
Dave Hansen	2cd3949f70	x86: Require exact match for 'noxsave' command line option We have some very similarly named command-line options: arch/x86/kernel/cpu/common.c:__setup("noxsave", x86_xsave_setup); arch/x86/kernel/cpu/common.c:__setup("noxsaveopt", x86_xsaveopt_setup); arch/x86/kernel/cpu/common.c:__setup("noxsaves", x86_xsaves_setup); __setup() is designed to match options that take arguments, like "foo=bar" where you would have: __setup("foo", x86_foo_func...); The problem is that "noxsave" actually _matches_ "noxsaves" in the same way that "foo" matches "foo=bar". If you boot an old kernel that does not know about "noxsaves" with "noxsaves" on the command line, it will interpret the argument as "noxsave", which is not what you want at all. This makes the "noxsave" handler only return success when it finds an exact match. [ tglx: We really need to make __setup() more robust. ] Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Hansen <dave@sr71.net> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: x86@kernel.org Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20141111220133.FE053984@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-11-16 12:13:16 +01:00
Stephane Eranian	aea48559ac	perf/x86: Add support for sampling PEBS machine state registers PEBS can capture machine state regs at retiremnt of the sampled instructions. When precise sampling is enabled on an event, PEBS is used, so substitute the interrupted state with the PEBS state. Note that not all registers are captured by PEBS. Those missing are replaced by the interrupt state counter-parts. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1411559322-16548-3-git-send-email-eranian@google.com Cc: cebbert.lkml@gmail.com Cc: jolsa@redhat.com Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-16 11:41:58 +01:00
Andi Kleen	af4bdcf675	perf/x86/intel: Disallow flags for most Core2/Atom/Nehalem/Westmere events Disallow setting inv/cmask/etc. flags for all PEBS events on these CPUs, except for the UOPS_RETIRED.* events on Nehalem/Westmere, which are needed for cycles:p. This avoids an undefined situation strongly discouraged by the Intle SDM. The PLD_* events were already covered. This follows the earlier changes for Sandy Bridge and alter. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Link: http://lkml.kernel.org/r/1411569288-5627-3-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-16 11:41:56 +01:00
Andi Kleen	0dbc94796d	perf/x86/intel: Use INTEL_FLAGS_UEVENT_CONSTRAINT for PRECDIST My earlier commit: `86a04461a9` ("perf/x86: Revamp PEBS event selection") made nearly all PEBS on Sandy/IvyBridge/Haswell to reject non zero flags. However this wasn't done for the INST_RETIRED.PREC_DIST event because no suitable macro existed. Now that we have INTEL_FLAGS_UEVENT_CONSTRAINT enforce zero flags for INST_RETIRED.PREC_DIST too. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Link: http://lkml.kernel.org/r/1411569288-5627-2-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-16 11:41:55 +01:00
Andi Kleen	7550ddffe4	perf/x86: Add INTEL_FLAGS_UEVENT_CONSTRAINT Add a FLAGS_UEVENT_CONSTRAINT macro that allows us to match on event+umask, and in additional all flags. This is needed to ensure the INV and CMASK fields are zero for specific events, as this can cause undefined behavior. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Cc: Mark Davies <junk@eslaf.co.uk> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1411569288-5627-1-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-16 11:41:54 +01:00
Andi Kleen	c0737ce453	perf/x86/intel/uncore: Add scaling units to the EP iMC events Add scaling to MB/s to the memory controller read/write events for Sandy/IvyBridge/Haswell-EP similar to how the client does. This makes the events easier to use from the standard perf tool. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Link: http://lkml.kernel.org/r/1415062828-19759-2-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-11-16 11:41:52 +01:00

1 2 3 4 5 ...

20376 commits