linux-stable/arch/x86/kernel/cpu
Peter Newman 3030f11f27 x86/resctrl: Fix task CLOSID/RMID update race
[ Upstream commit fe1f071438 ]

When the user moves a running task to a new rdtgroup using the task's
file interface or by deleting its rdtgroup, the resulting change in
CLOSID/RMID must be immediately propagated to the PQR_ASSOC MSR on the
task(s) CPUs.

x86 allows reordering loads with prior stores, so if the task starts
running between a task_curr() check that the CPU hoisted before the
stores in the CLOSID/RMID update then it can start running with the old
CLOSID/RMID until it is switched again because __rdtgroup_move_task()
failed to determine that it needs to be interrupted to obtain the new
CLOSID/RMID.

Refer to the diagram below:

CPU 0                                   CPU 1
-----                                   -----
__rdtgroup_move_task():
  curr <- t1->cpu->rq->curr
                                        __schedule():
                                          rq->curr <- t1
                                        resctrl_sched_in():
                                          t1->{closid,rmid} -> {1,1}
  t1->{closid,rmid} <- {2,2}
  if (curr == t1) // false
   IPI(t1->cpu)

A similar race impacts rdt_move_group_tasks(), which updates tasks in a
deleted rdtgroup.

In both cases, use smp_mb() to order the task_struct::{closid,rmid}
stores before the loads in task_curr().  In particular, in the
rdt_move_group_tasks() case, simply execute an smp_mb() on every
iteration with a matching task.

It is possible to use a single smp_mb() in rdt_move_group_tasks(), but
this would require two passes and a means of remembering which
task_structs were updated in the first loop. However, benchmarking
results below showed too little performance impact in the simple
approach to justify implementing the two-pass approach.

Times below were collected using `perf stat` to measure the time to
remove a group containing a 1600-task, parallel workload.

CPU: Intel(R) Xeon(R) Platinum P-8136 CPU @ 2.00GHz (112 threads)

  # mkdir /sys/fs/resctrl/test
  # echo $$ > /sys/fs/resctrl/test/tasks
  # perf bench sched messaging -g 40 -l 100000

task-clock time ranges collected using:

  # perf stat rmdir /sys/fs/resctrl/test

Baseline:                     1.54 - 1.60 ms
smp_mb() every matching task: 1.57 - 1.67 ms

  [ bp: Massage commit message. ]

Fixes: ae28d1aae4 ("x86/resctrl: Use an IPI instead of task_work_add() to update PQR_ASSOC MSR")
Fixes: 0efc89be94 ("x86/intel_rdt: Update task closid immediately on CPU in rmdir and unmount")
Signed-off-by: Peter Newman <peternewman@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20221220161123.432120-1-peternewman@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-18 11:42:05 +01:00
..
mce x86/mce: Deduplicate exception handling 2022-07-29 17:14:18 +02:00
microcode x86/microcode/intel: Do not retry microcode reloading on the APs 2023-01-18 11:41:48 +01:00
mtrr x86/smpboot: Move rcu_cpu_starting() earlier 2022-12-19 12:24:15 +01:00
resctrl x86/resctrl: Fix task CLOSID/RMID update race 2023-01-18 11:42:05 +01:00
.gitignore
acrn.c x86/acrn: Use HYPERVISOR_CALLBACK_VECTOR for ACRN guest upcall vector 2019-06-11 21:31:31 +02:00
amd.c x86/cpu: Restore AMD's DE_CFG MSR after resume 2022-11-25 17:42:11 +01:00
aperfmperf.c x86/cpu: Disable frequency requests via aperfmperf IPI for nohz_full CPUs 2019-06-22 17:23:48 +02:00
bugs.c x86/bugs: Flush IBP in ib_prctl_set() 2023-01-18 11:41:59 +01:00
cacheinfo.c drivers: base: cacheinfo: Get rid of DEFINE_SMP_CALL_CACHE_FUNCTION() 2021-09-26 14:07:10 +02:00
centaur.c
common.c x86/speculation: Add RSB VM Exit protections 2022-10-07 09:16:56 +02:00
cpu.h x86/cpu: Fix migration safety with X86_BUG_NULL_SEL 2021-11-17 09:48:19 +01:00
cpuid-deps.c x86/cpufeatures: Enable a new AVX512 CPU feature 2019-07-22 10:38:25 +02:00
cyrix.c x86/cpu/cyrix: Use correct macros for Cyrix calls on Geode processors 2019-03-21 12:28:50 +01:00
hygon.c x86/cpu: Restore AMD's DE_CFG MSR after resume 2022-11-25 17:42:11 +01:00
hypervisor.c x86/paravirt: Remove const mark from x86_hyper_xen_hvm variable 2019-07-17 08:09:59 +02:00
intel.c x86: Fix return value of __setup handlers 2022-06-14 18:11:35 +02:00
intel_epb.c x86: intel_epb: Do not build when CONFIG_PM is unset 2019-05-30 10:58:36 +02:00
intel_pconfig.c
Makefile x86/cpu: Add a "tsx=" cmdline option with TSX disabled by default 2019-10-28 08:36:58 +01:00
match.c x86/cpu: Add a steppings field to struct x86_cpu_id 2022-10-07 09:16:54 +02:00
mkcapflags.sh x86/build: Add 'set -e' to mkcapflags.sh to delete broken capflags.c 2019-06-25 09:52:05 +02:00
mshyperv.c random: remove unused irq_flags argument from add_interrupt_randomness() 2022-06-22 14:11:06 +02:00
perfctr-watchdog.c x86/events: Add Hygon Dhyana support to PMU infrastructure 2018-09-27 18:28:57 +02:00
powerflags.c
proc.c x86: Convert some slow-path static_cpu_has() callers to boot_cpu_has() 2019-04-08 12:13:34 +02:00
rdrand.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 335 2019-06-05 17:37:06 +02:00
scattered.c x86/speculation: Disable RRSBA behavior 2022-10-07 09:16:56 +02:00
topology.c x86/topology: Make __max_die_per_package available unconditionally 2021-01-27 11:47:49 +01:00
transmeta.c
tsx.c x86/tsx: Add a feature bit for TSX control MSR support 2022-12-08 11:23:05 +01:00
umc.c
umwait.c KVM: VMX: Stop context switching MSR_IA32_UMWAIT_CONTROL 2020-06-30 15:37:07 -04:00
vmware.c x86/cpu/vmware: Use the full form of INL in VMWARE_PORT 2019-10-08 13:26:42 +02:00
zhaoxin.c x86/cpu: Create Zhaoxin processors architecture support file 2019-06-22 11:45:57 +02:00