linux-stable/kernel/sched
Mel Gorman 598f0ec0bc sched/numa: Set the scan rate proportional to the memory usage of the task being scanned
The NUMA PTE scan rate is controlled with a combination of the
numa_balancing_scan_period_min, numa_balancing_scan_period_max and
numa_balancing_scan_size. This scan rate is independent of the size
of the task and as an aside it is further complicated by the fact that
numa_balancing_scan_size controls how many pages are marked pte_numa and
not how much virtual memory is scanned.

In combination, it is almost impossible to meaningfully tune the min and
max scan periods and reasoning about performance is complex when the time
to complete a full scan is is partially a function of the tasks memory
size. This patch alters the semantic of the min and max tunables to be
about tuning the length time it takes to complete a scan of a tasks occupied
virtual address space. Conceptually this is a lot easier to understand. There
is a "sanity" check to ensure the scan rate is never extremely fast based on
the amount of virtual memory that should be scanned in a second. The default
of 2.5G seems arbitrary but it is to have the maximum scan rate after the
patch roughly match the maximum scan rate before the patch was applied.

On a similar note, numa_scan_period is in milliseconds and not
jiffies. Properly placed pages slow the scanning rate but adding 10 jiffies
to numa_scan_period means that the rate scanning slows depends on HZ which
is confusing. Get rid of the jiffies_to_msec conversion and treat it as ms.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-18-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-09 12:40:20 +02:00
..
auto_group.c sched/autogroup: Fix race with task_groups list 2013-05-28 09:40:22 +02:00
auto_group.h Revert "sched/autogroup: Fix crash on reboot when autogroup is disabled" 2012-12-11 10:23:45 +01:00
clock.c sched_clock: Prevent 64bit inatomicity on 32bit systems 2013-04-08 11:50:44 +02:00
core.c sched/numa: Initialise numa_next_scan properly 2013-10-09 12:40:19 +02:00
cpuacct.c cgroup: pass around cgroup_subsys_state instead of cgroup in file methods 2013-08-08 20:11:24 -04:00
cpuacct.h sched/cpuacct: Initialize root cpuacct earlier 2013-04-10 13:54:20 +02:00
cpupri.c sched: Fix some kernel-doc warnings 2013-07-18 09:58:21 +02:00
cpupri.h
cputime.c Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-09-05 12:36:46 -07:00
debug.c sched/debug: Take PID namespace into account 2013-09-12 19:14:16 +02:00
fair.c sched/numa: Set the scan rate proportional to the memory usage of the task being scanned 2013-10-09 12:40:20 +02:00
features.h Revert "mm: sched: numa: Delay PTE scanning until a task is scheduled on a new node" 2013-10-09 12:40:17 +02:00
idle_task.c sched: Keep at least 1 tick per second for active dynticks tasks 2013-05-04 08:32:02 +02:00
Makefile sched: Factor out load calculation code from sched/core.c --> sched/proc.c 2013-05-07 13:14:50 +02:00
proc.c sched: Change get_rq_runnable_load() to static and inline 2013-06-27 10:07:44 +02:00
rt.c sched/rt: Remove redundant nr_cpus_allowed test 2013-10-06 11:28:40 +02:00
sched.h sched/balancing: Consider max cost of idle balance per sched domain 2013-09-20 12:03:44 +02:00
stats.c fix a leak in /proc/schedstats 2013-04-29 15:41:45 -04:00
stats.h sched: Micro-optimize by dropping unnecessary task_rq() calls 2013-09-25 13:51:06 +02:00
stop_task.c sched: Use an accessor to read the rq clock 2013-05-28 09:40:27 +02:00