linux-stable/kernel
Peter Zijlstra 5d01e06329 locking/qspinlock, x86: Provide liveness guarantee
commit 7aa54be297 upstream.

On x86 we cannot do fetch_or() with a single instruction and thus end up
using a cmpxchg loop, this reduces determinism. Replace the fetch_or()
with a composite operation: tas-pending + load.

Using two instructions of course opens a window we previously did not
have. Consider the scenario:

	CPU0		CPU1		CPU2

 1)	lock
	  trylock -> (0,0,1)

 2)			lock
			  trylock /* fail */

 3)	unlock -> (0,0,0)

 4)					lock
					  trylock -> (0,0,1)

 5)			  tas-pending -> (0,1,1)
			  load-val <- (0,1,0) from 3

 6)			  clear-pending-set-locked -> (0,0,1)

			  FAIL: _2_ owners

where 5) is our new composite operation. When we consider each part of
the qspinlock state as a separate variable (as we can when
_Q_PENDING_BITS == 8) then the above is entirely possible, because
tas-pending will only RmW the pending byte, so the later load is able
to observe prior tail and lock state (but not earlier than its own
trylock, which operates on the whole word, due to coherence).

To avoid this we need 2 things:

 - the load must come after the tas-pending (obviously, otherwise it
   can trivially observe prior state).

 - the tas-pending must be a full word RmW instruction, it cannot be an XCHGB for
   example, such that we cannot observe other state prior to setting
   pending.

On x86 we can realize this by using "LOCK BTS m32, r32" for
tas-pending followed by a regular load.

Note that observing later state is not a problem:

 - if we fail to observe a later unlock, we'll simply spin-wait for
   that store to become visible.

 - if we observe a later xchg_tail(), there is no difference from that
   xchg_tail() having taken place before the tas-pending.

Suggested-by: Will Deacon <will.deacon@arm.com>
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: andrea.parri@amarulasolutions.com
Cc: longman@redhat.com
Fixes: 59fb586b4a ("locking/qspinlock: Remove unbounded cmpxchg() loop from locking slowpath")
Link: https://lkml.kernel.org/r/20181003130957.183726335@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bigeasy: GEN_BINARY_RMWcc macro redo]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-12-21 14:13:09 +01:00
..
bpf bpf: Prevent memory disambiguation attack 2018-12-05 19:41:10 +01:00
cgroup cgroup: Fix dom_cgrp propagation when enabling threaded mode 2018-10-18 09:16:24 +02:00
configs ANDROID: binder: add hwbinder,vndbinder to BINDER_DEVICES. 2017-08-22 18:43:23 -07:00
debug kdb: use memmove instead of overlapping memcpy 2018-12-08 13:03:36 +01:00
events uprobes: Fix handle_swbp() vs. unregister() + register() race once more 2018-12-08 13:03:36 +01:00
gcov License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
irq genirq: Fix race on spurious interrupt detection 2018-11-13 11:15:09 -08:00
livepatch livepatch: Validate module/old func name length 2018-09-09 19:55:58 +02:00
locking locking/qspinlock, x86: Provide liveness guarantee 2018-12-21 14:13:09 +01:00
power PM / sleep: wakeup: Fix build error caused by missing SRCU support 2018-09-09 19:55:58 +02:00
printk printk: Wake klogd when passing console_lock owner 2018-12-17 09:28:55 +01:00
rcu rcu: Make need_resched() respond to urgent RCU-QS needs 2018-12-01 09:43:00 +01:00
sched sched/smt: Expose sched_smt_present static key 2018-12-05 19:41:20 +01:00
time timer/debug: Change /proc/timer_list from 0444 to 0400 2018-12-21 14:13:04 +01:00
trace tracing: Fix memory leak of instance function hash filters 2018-12-21 14:13:06 +01:00
.gitignore
acct.c kernel/acct.c: fix the acct->needcheck check in check_free_space() 2018-01-10 09:31:17 +01:00
async.c kernel/async.c: revert "async: simplify lowest_in_progress()" 2018-02-16 20:23:05 +01:00
audit.c audit: return on memory error to avoid null pointer dereference 2018-05-30 07:52:39 +02:00
audit.h ipc: mqueue: Replace timespec with timespec64 2017-09-03 20:21:24 -04:00
audit_fsnotify.c
audit_tree.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
audit_watch.c audit: fix use-after-free in audit_add_watch 2018-09-26 08:38:09 +02:00
auditfilter.c audit: allow not equal op for audit by executable 2018-08-03 07:50:39 +02:00
auditsc.c audit: fix potential null dereference 'context->module.name' 2018-08-06 16:20:49 +02:00
backtracetest.c
bounds.c kbuild: fix kernel/bounds.c 'W=1' warning 2018-11-13 11:15:08 -08:00
capability.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
compat.c compat: fix 4-byte infoleak via uninitialized struct field 2018-05-16 10:10:26 +02:00
configs.c
context_tracking.c
cpu.c x86/speculation: Rework SMT state change 2018-12-05 19:41:20 +01:00
cpu_pm.c PM / CPU: replace raw_notifier with atomic_notifier 2017-07-31 13:09:49 +02:00
crash_core.c kdump: write correct address of mem_section into vmcoreinfo 2018-01-17 09:45:27 +01:00
crash_dump.c
cred.c
delayacct.c delayacct: Use raw_spinlocks 2018-08-03 07:50:38 +02:00
dma.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
elfcore.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
exec_domain.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
exit.c kernel/exit.c: export abort() to modules 2018-02-13 10:19:49 +01:00
extable.c extable: Enable RCU if it is not watching in kernel_text_address() 2017-09-23 16:50:20 -04:00
fork.c fork: don't copy inconsistent signal handler state to child 2018-09-15 09:45:27 +02:00
freezer.c
futex.c futex: Fix OWNER_DEAD fixup 2018-02-03 17:38:47 +01:00
futex_compat.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
groups.c kernel: make groups_sort calling a responsibility group_info allocators 2017-12-20 10:10:18 +01:00
hung_task.c kernel/hung_task.c: show all hung tasks before panic 2018-08-03 07:50:23 +02:00
irq_work.c
jump_label.c sched/core: Fix cpu.max vs. cpuhotplug deadlock 2018-12-05 19:41:17 +01:00
kallsyms.c kernel/kallsyms.c: replace all_var with IS_ENABLED(CONFIG_KALLSYMS_ALL) 2017-07-10 16:32:34 -07:00
kcmp.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c kcov: ensure irq code sees a valid area 2018-08-03 07:50:22 +02:00
kexec.c kdump: protect vmcoreinfo data under the crash memory 2017-07-12 16:26:00 -07:00
kexec_core.c x86/mm, kexec: Allow kexec to be used with SME 2017-07-18 11:38:04 +02:00
kexec_file.c kexec_file: adjust declaration of kexec_purgatory 2017-07-12 16:26:02 -07:00
kexec_internal.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
kmod.c kmod: move #ifdef CONFIG_MODULES wrapper to Makefile 2017-09-08 18:26:51 -07:00
kprobes.c kprobes: Return error if we fail to reuse kprobe instead of BUG_ON() 2018-11-13 11:14:55 -08:00
ksysfs.c kexec: move vmcoreinfo out of the kernel's .bss section 2017-07-12 16:25:59 -07:00
kthread.c kthread, tracing: Don't expose half-written comm when creating kthreads 2018-08-03 07:50:21 +02:00
latencytop.c
Makefile License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
memremap.c mm: disallow mappings that conflict for devm_memremap_pages() 2018-10-20 09:48:53 +02:00
module-internal.h
module.c module: exclude SHN_UNDEF symbols from kallsyms api 2018-10-03 17:00:53 -07:00
module_signing.c
notifier.c
nsproxy.c
padata.c
panic.c locking/refcounts, x86/asm: Implement fast refcount overflow protection 2017-08-17 10:40:26 +02:00
params.c kernel/params.c: improve STANDARD_PARAM_DEF readability 2017-10-03 17:54:26 -07:00
pid.c pids: make task_tgid_nr_ns() safe 2017-08-21 12:47:31 -07:00
pid_namespace.c userns,pidns: Verify the userns for new pid namespaces 2017-07-20 07:43:58 -05:00
profile.c
ptrace.c ptrace: Remove unused ptrace_may_access_sched() and MODE_IBRS 2018-12-05 19:41:21 +01:00
range.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
reboot.c
relay.c kernel/relay.c: limit kmalloc size to KMALLOC_MAX_SIZE 2018-05-30 07:52:00 +02:00
resource.c resource: fix integer overflow at reallocation 2018-04-24 09:36:22 +02:00
seccomp.c seccomp: Move speculation migitation control to arch code 2018-05-22 18:54:04 +02:00
signal.c signal: Guard against negative signal numbers in copy_siginfo_from_user32 2018-11-13 11:15:07 -08:00
smp.c cpu/hotplug: Fix SMT supported evaluation 2018-08-15 18:13:00 +02:00
smpboot.c watchdog/core, powerpc: Lock cpus across reconfiguration 2017-10-04 10:53:54 +02:00
smpboot.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
softirq.c Mark HI and TASKLET softirq synchronous 2018-08-15 18:12:47 +02:00
stacktrace.c
stop_machine.c stop_machine: Atomically queue and wake stopper threads 2018-09-05 09:26:36 +02:00
sys.c sys: don't hold uts_sem while accessing userspace memory 2018-09-09 19:56:00 +02:00
sys_ni.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
sysctl.c namei: allow restricted O_CREAT of FIFOs and regular files 2018-12-01 09:42:59 +01:00
sysctl_binary.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
task_work.c locking/barriers: Convert users of lockless_dereference() to READ_ONCE() 2017-12-25 14:26:21 +01:00
taskstats.c
test_kprobes.c
torture.c torture: Fix typo suppressing CPU-hotplug statistics 2017-07-25 13:04:45 -07:00
tracepoint.c tracepoint: Do not warn on ENOMEM 2018-05-09 09:51:50 +02:00
tsacct.c
ucount.c
uid16.c kernel: make groups_sort calling a responsibility group_info allocators 2017-12-20 10:10:18 +01:00
umh.c kmod: split out umh code into its own file 2017-09-08 18:26:50 -07:00
up.c smp: Avoid using two cache lines for struct call_single_data 2017-08-29 15:14:38 +02:00
user-return-notifier.c
user.c
user_namespace.c userns: move user access out of the mutex 2018-09-09 19:56:00 +02:00
utsname.c
utsname_sysctl.c sys: don't hold uts_sem while accessing userspace memory 2018-09-09 19:56:00 +02:00
watchdog.c watchdog: Mark watchdog touch functions as notrace 2018-09-05 09:26:42 +02:00
watchdog_hld.c watchdog: Mark watchdog touch functions as notrace 2018-09-05 09:26:42 +02:00
workqueue.c watchdog: Mark watchdog touch functions as notrace 2018-09-05 09:26:42 +02:00
workqueue_internal.h Merge branch 'for-4.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2017-11-06 12:26:49 -08:00