No description
Find a file
Mel Gorman f169c62ff7 sched/numa: Complete scanning of inactive VMAs when there is no alternative
VMAs are skipped if there is no recent fault activity but this represents
a chicken-and-egg problem as there may be no fault activity if the PTEs
are never updated to trap NUMA hints. There is an indirect reliance on
scanning to be forced early in the lifetime of a task but this may fail
to detect changes in phase behaviour. Force inactive VMAs to be scanned
when all other eligible VMAs have been updated within the same scan
sequence.

Test results in general look good with some changes in performance, both
negative and positive, depending on whether the additional scanning and
faulting was beneficial or not to the workload. The autonuma benchmark
workload NUMA01_THREADLOCAL was picked for closer examination. The workload
creates two processes with numerous threads and thread-local storage that
is zero-filled in a loop. It exercises the corner case where unrelated
threads may skip VMAs that are thread-local to another thread and still
has some VMAs that inactive while the workload executes.

The VMA skipping activity frequency with and without the patch:

	6.6.0-rc2-sched-numabtrace-v1
	=============================
	    649 reason=scan_delay
	  9,094 reason=unsuitable
	 48,915 reason=shared_ro
	143,919 reason=inaccessible
	193,050 reason=pid_inactive

	6.6.0-rc2-sched-numabselective-v1
	=============================
	    146 reason=seq_completed
	    622 reason=ignore_pid_inactive

	    624 reason=scan_delay
	  6,570 reason=unsuitable
	 16,101 reason=shared_ro
	 27,608 reason=inaccessible
	 41,939 reason=pid_inactive

Note that with the patch applied, the PID activity is ignored
(ignore_pid_inactive) to ensure a VMA with some activity is completely
scanned. In addition, a small number of VMAs are scanned when no other
eligible VMA is available during a single scan window (seq_completed).
The number of times a VMA is skipped due to no PID activity from the
scanning task (pid_inactive) drops dramatically. It is expected that
this will increase the number of PTEs updated for NUMA hinting faults
as well as hinting faults but these represent PTEs that would otherwise
have been missed. The tradeoff is scan+fault overhead versus improving
locality due to migration.

On a 2-socket Cascade Lake test machine, the time to complete the
workload is as follows;

                                                 6.6.0-rc2              6.6.0-rc2
                                       sched-numabtrace-v1 sched-numabselective-v1
  Min       elsp-NUMA01_THREADLOCAL      174.22 (   0.00%)      117.64 (  32.48%)
  Amean     elsp-NUMA01_THREADLOCAL      175.68 (   0.00%)      123.34 *  29.79%*
  Stddev    elsp-NUMA01_THREADLOCAL        1.20 (   0.00%)        4.06 (-238.20%)
  CoeffVar  elsp-NUMA01_THREADLOCAL        0.68 (   0.00%)        3.29 (-381.70%)
  Max       elsp-NUMA01_THREADLOCAL      177.18 (   0.00%)      128.03 (  27.74%)

The time to complete the workload is reduced by almost 30%:

                     6.6.0-rc2   6.6.0-rc2
                  sched-numabtrace-v1 sched-numabselective-v1 /
  Duration User       91201.80    63506.64
  Duration System      2015.53     1819.78
  Duration Elapsed     1234.77      868.37

In this specific case, system CPU time was not increased but it's not
universally true.

From vmstat, the NUMA scanning and fault activity is as follows;

                                        6.6.0-rc2      6.6.0-rc2
                              sched-numabtrace-v1 sched-numabselective-v1
  Ops NUMA base-page range updates       64272.00    26374386.00
  Ops NUMA PTE updates                   36624.00       55538.00
  Ops NUMA PMD updates                      54.00       51404.00
  Ops NUMA hint faults                   15504.00       75786.00
  Ops NUMA hint local faults %           14860.00       56763.00
  Ops NUMA hint local percent               95.85          74.90
  Ops NUMA pages migrated                 1629.00     6469222.00

Both the number of PTE updates and hint faults is dramatically
increased. While this is superficially unfortunate, it represents
ranges that were simply skipped without the patch. As a result
of the scanning and hinting faults, many more pages were also
migrated but as the time to completion is reduced, the overhead
is offset by the gain.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Raghavendra K T <raghavendra.kt@amd.com>
Link: https://lore.kernel.org/r/20231010083143.19593-7-mgorman@techsingularity.net
2023-10-10 23:42:15 +02:00
arch x86/idle: Disable IBRS when CPU is offline to improve single-threaded performance 2023-10-07 11:33:28 +02:00
block block: fix kernel-doc for disk_force_media_change() 2023-09-26 00:43:34 -06:00
certs certs: Reference revocation list for all keyrings 2023-08-17 20:12:41 +00:00
crypto crypto: sm2 - Fix crash caused by uninitialized context 2023-09-20 13:10:10 +08:00
Documentation sched/topology: Change behaviour of the 'sched_energy_aware' sysctl, based on the platform 2023-10-09 17:24:44 +02:00
drivers intel_idle: Add ibrs_off module parameter to force-disable IBRS 2023-10-07 11:33:28 +02:00
fs Fourteen hotfixes, eleven of which are cc:stable. The remainder pertain 2023-10-01 13:33:25 -07:00
include sched/numa: Complete scanning of inactive VMAs when there is no alternative 2023-10-10 23:42:15 +02:00
init workqueue: Changes for v6.6 2023-09-01 16:06:32 -07:00
io_uring io_uring/fs: remove sqe->rw_flags checking from LINKAT 2023-09-29 03:07:09 -06:00
ipc Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
kernel sched/numa: Complete scanning of inactive VMAs when there is no alternative 2023-10-10 23:42:15 +02:00
lib Merge branch 'sched/urgent' into sched/core, to pick up fixes and refresh the branch 2023-10-07 11:32:24 +02:00
LICENSES LICENSES: Add the copyleft-next-0.3.1 license 2022-11-08 15:44:01 +01:00
mm Merge branch 'sched/urgent' into sched/core, to pick up fixes and refresh the branch 2023-10-07 11:32:24 +02:00
net Networking fixes for 6.6-rc2, including fixes from netfilter and bpf 2023-09-21 11:28:16 -07:00
rust Documentation work keeps chugging along; stuff for 6.6 includes: 2023-08-30 20:05:42 -07:00
samples VFIO updates for v6.6-rc1 2023-08-30 20:36:01 -07:00
scripts kbuild: remove stale code for 'source' symlink in packaging scripts 2023-10-01 23:06:06 +09:00
security selinux: fix handling of empty opts in selinux_fs_context_submount() 2023-09-12 17:31:08 -04:00
sound ASoC: Fixes for v6.6 2023-09-20 15:02:16 +02:00
tools Fourteen hotfixes, eleven of which are cc:stable. The remainder pertain 2023-10-01 13:33:25 -07:00
usr initramfs: Encode dependency on KBUILD_BUILD_TIMESTAMP 2023-06-06 17:54:49 +09:00
virt ARM: 2023-09-07 13:52:20 -07:00
.clang-format iommu: Add for_each_group_device() 2023-05-23 08:15:51 +02:00
.cocciconfig
.get_maintainer.ignore get_maintainer: add Alan to .get_maintainer.ignore 2022-08-20 15:17:44 -07:00
.gitattributes .gitattributes: set diff driver for Rust source code files 2023-05-31 17:48:25 +02:00
.gitignore kbuild: rpm-pkg: rename binkernel.spec to kernel.spec 2023-07-25 00:59:33 +09:00
.mailmap for-linus-2023083101 2023-09-01 12:31:44 -07:00
.rustfmt.toml rust: add .rustfmt.toml 2022-09-28 09:02:20 +02:00
COPYING
CREDITS USB: Remove Wireless USB and UWB documentation 2023-08-09 14:17:32 +02:00
Kbuild Kbuild updates for v6.1 2022-10-10 12:00:45 -07:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS ARM: SoC fixes for 6.6 2023-09-30 18:41:37 -07:00
Makefile Linux 6.6-rc4 2023-10-01 14:15:13 -07:00
README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.