linux-stable/kernel
Steven Rostedt 5181f4a46a sched: Use pushable_tasks to determine next highest prio
Hillf Danton proposed a patch (see link) that cleaned up the
sched_rt code that calculates the priority of the next highest priority
task to be used in finding run queues to pull from.

His patch removed the calculating of the next prio to just use the current
prio when deteriming if we should examine a run queue to pull from. The problem
with his patch was that it caused more false checks. Because we check a run
queue for pushable tasks if the current priority of that run queue is higher
in priority than the task about to run on our run queue. But after grabbing
the locks and doing the real check, we find that there may not be a task
that has a higher prio task to pull. Thus the locks were taken with nothing to
do.

I added some trace_printks() to record when and how many times the run queue
locks were taken to check for pullable tasks, compared to how many times we
pulled a task.

With the current method, it was:

  3806 locks taken vs 2812 pulled tasks

With Hillf's patch:

  6728 locks taken vs 2804 pulled tasks

The number of times locks were taken to pull a task went up almost double with
no more success rate.

But his patch did get me thinking. When we look at the priority of the highest
task to consider taking the locks to do a pull, a failure to pull can be one
of the following: (in order of most likely)

 o RT task was pushed off already between the check and taking the lock
 o Waiting RT task can not be migrated
 o RT task's CPU affinity does not include the target run queue's CPU
 o RT task's priority changed between the check and taking the lock

And with Hillf's patch, the thing that caused most of the failures, is
the RT task to pull was not at the right priority to pull (not greater than
the current RT task priority on the target run queue).

Most of the above cases we can't help. But the current method does not check
if the next highest prio RT task can be migrated or not, and if it can not,
we still grab the locks to do the test (we don't find out about this fact until
after we have the locks). I thought about this case, and realized that the
pushable task plist that is maintained only holds RT tasks that can migrate.
If we move the calculating of the next highest prio task from the inc/dec_rt_task()
functions into the queuing of the pushable tasks, then we only measure the
priorities of those tasks that we push, and we get this basically for free.

Not only does this patch make the code a little more efficient, it cleans it
up and makes it a little simpler.

Thanks to Hillf Danton for inspiring me on this patch.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Gregory Haskins <ghaskins@novell.com>
Link: http://lkml.kernel.org/r/BANLkTimQ67180HxCx5vgMqumqw1EkFh3qg@mail.gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-08-14 12:00:55 +02:00
..
debug Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb 2011-08-01 13:39:40 -10:00
events perf: Remove perf_event_attr::type check 2011-07-21 20:41:55 +02:00
gcov gcov: disable CONSTRUCTORS for UML 2011-07-26 16:49:45 -07:00
irq Merge branch 'imx/dt' into next/dt 2011-07-28 15:25:46 +00:00
power Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-07-25 13:56:39 -07:00
time Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-07-22 16:52:18 -07:00
trace Merge branch 'linus' into perf/urgent 2011-08-05 10:33:55 +02:00
.gitignore
acct.c
async.c async: Fixed an include coding style issue 2011-06-14 22:48:46 -04:00
audit.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
audit.h
audit_tree.c audit_tree,rcu: Convert call_rcu(__put_tree) to kfree_rcu() 2011-07-20 14:10:11 -07:00
audit_watch.c kill path_lookup() 2011-03-14 09:15:23 -04:00
auditfilter.c netlink: kill loginuid/sessionid/sid members from struct netlink_skb_parms 2011-03-03 10:55:40 -08:00
auditsc.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
backtracetest.c
bounds.c memcg: remove direct page_cgroup-to-page pointer 2011-03-23 19:46:28 -07:00
capability.c Merge branch 'master' into next 2011-05-19 18:51:57 +10:00
cgroup.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 2011-07-27 19:26:38 -07:00
cgroup_freezer.c cgroups: add per-thread subsystem callbacks 2011-05-26 17:12:34 -07:00
compat.c Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6 2011-07-30 00:08:53 -07:00
configs.c kernel/configs.c: include MODULE_*() when CONFIG_IKCONFIG_PROC=n 2011-07-25 20:57:15 -07:00
cpu.c Fix common misspellings 2011-03-31 11:26:23 -03:00
cpuset.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
crash_dump.c crash_dump: export is_kdump_kernel to modules, consolidate elfcorehdr_addr, setup_elfcorehdr and saved_max_pfn 2011-03-23 19:47:19 -07:00
cred.c move RLIMIT_NPROC check from set_user() to do_execve_common() 2011-08-11 11:24:42 -07:00
delayacct.c KVM: Steal time implementation 2011-07-14 12:59:14 +03:00
dma.c
elfcore.c
exec_domain.c
exit.c ipc: introduce shm_rmid_forced sysctl 2011-07-26 16:49:44 -07:00
extable.c extable, core_kernel_data(): Make sure all archs define _sdata 2011-05-20 08:56:56 +02:00
fork.c move RLIMIT_NPROC check from set_user() to do_execve_common() 2011-08-11 11:24:42 -07:00
freezer.c Freezer: Use SMP barriers 2011-05-17 23:19:17 +02:00
futex.c Merge branch 'linus' into core/urgent 2011-08-04 09:09:27 +02:00
futex_compat.c userns: user namespaces: convert several capable() calls 2011-03-23 19:47:08 -07:00
groups.c userns: user namespaces: convert several capable() calls 2011-03-23 19:47:08 -07:00
hrtimer.c hrtimers: Fix typo causing erratic timers 2011-05-25 15:31:58 -07:00
hung_task.c watchdog, hung_task_timeout: Add Kconfig configurable default 2011-04-28 09:13:17 +02:00
irq_work.c irq_work: Use per cpu atomics instead of regular atomics 2010-12-18 15:54:48 +01:00
itimer.c
jump_label.c jump_label: Fix jump_label update for modules 2011-06-29 09:59:17 -04:00
kallsyms.c Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-03-25 17:52:22 -07:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks arch:Kconfig.locks Remove unused config option. 2011-04-10 17:01:05 +02:00
Kconfig.preempt sched: Isolate preempt counting in its own config option 2011-06-10 15:15:40 +02:00
kexec.c treewide: Convert uses of struct resource to resource_size(ptr) 2011-06-10 14:55:36 +02:00
kfifo.c
kmod.c Boot up with usermodehelper disabled 2011-08-03 22:03:29 -10:00
kprobes.c kprobes: Return -ENOENT if probe point doesn't exist 2011-07-15 15:11:47 -04:00
ksysfs.c kernel/ksysfs.c: expose file_caps_enabled in sysfs 2011-04-19 16:45:51 -07:00
kthread.c cpuset: Fix cpuset_cpus_allowed_fallback(), don't update tsk->rt.nr_cpus_allowed 2011-05-28 17:02:57 +02:00
latencytop.c Fix common misspellings 2011-03-31 11:26:23 -03:00
lockdep.c lockdep: Clear whole lockdep_map on initialization 2011-08-04 10:17:56 +02:00
lockdep_internals.h
lockdep_proc.c lockdep: Remove unused 'factor' variable from lockdep_stats_show() 2011-03-23 13:54:47 +01:00
lockdep_states.h
Makefile jump label: Reduce the cycle count by changing the link order 2011-08-05 23:57:33 +02:00
module.c module: add /sys/module/<name>/uevent files 2011-07-24 22:06:04 +09:30
mutex-debug.c mutex: Use p->on_cpu for the adaptive spin 2011-04-14 08:52:33 +02:00
mutex-debug.h mutex: Use p->on_cpu for the adaptive spin 2011-04-14 08:52:33 +02:00
mutex.c lockdep, mutex: provide mutex_lock_nest_lock 2011-05-25 08:39:17 -07:00
mutex.h mutex: Use p->on_cpu for the adaptive spin 2011-04-14 08:52:33 +02:00
notifier.c notifiers: sys: move reboot notifiers into reboot.h 2011-07-25 20:57:14 -07:00
nsproxy.c make sure that nsproxy_cache is initialized early enough 2011-07-20 01:44:07 -04:00
padata.c Fix common misspellings 2011-03-31 11:26:23 -03:00
panic.c panic: panic=-1 for immediate reboot 2011-07-26 16:49:45 -07:00
params.c module: add /sys/module/<name>/uevent files 2011-07-24 22:06:04 +09:30
pid.c rcu: treewide: Do not use rcu_read_lock_held when calling rcu_dereference_check 2011-07-08 22:21:58 +02:00
pid_namespace.c pidns: call pid_ns_prepare_proc() from create_pid_namespace() 2011-03-23 19:46:58 -07:00
pm_qos_params.c plist: Remove the need to supply locks to plist heads 2011-07-08 14:02:53 +02:00
posix-cpu-timers.c hrtimers: Avoid touching inactive timer bases 2011-05-23 13:59:54 +02:00
posix-timers.c posix-timers: RCU conversion 2011-05-24 12:10:51 +02:00
printk.c cap_syslog: don't use WARN_ONCE for CAP_SYS_ADMIN deprecation warning 2011-08-09 18:22:22 -07:00
profile.c kernel/profile.c: remove some duplicate code from profile_hits() 2011-05-26 17:12:37 -07:00
ptrace.c connector: add an event for monitoring process tracers 2011-07-18 21:38:33 +02:00
range.c kernel/range.c: fix clean_sort_range() for the case of full array 2010-11-12 07:55:31 -08:00
rcupdate.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
rcutiny.c sanitize <linux/prefetch.h> usage 2011-05-20 12:50:29 -07:00
rcutiny_plugin.h rcu: Converge TINY_RCU expedited and normal boosting 2011-05-05 23:16:58 -07:00
rcutorture.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
rcutree.c rcu: Prevent RCU callbacks from executing before scheduler initialized 2011-07-13 08:17:56 -07:00
rcutree.h rcu: Move RCU_BOOST #ifdefs to header file 2011-06-16 16:12:05 -07:00
rcutree_plugin.h softirq,rcu: Inform RCU of irq_exit() activity 2011-07-20 10:50:12 -07:00
rcutree_trace.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
relay.c Clean up relay_alloc_page_array() slightly by using vzalloc rather than vmalloc and memset 2010-11-05 08:21:34 -07:00
res_counter.c memcg: res_counter_read_u64(): fix potential races on 32-bit machines 2011-03-23 19:46:22 -07:00
resource.c resources: Add lookup_resource() 2011-07-30 21:21:39 +02:00
rtmutex-debug.c rtmutex: Simplify PI algorithm and make highest prio task get lock 2011-01-27 21:13:51 -05:00
rtmutex-debug.h
rtmutex-tester.c rtmutex: tester: Remove the remaining BKL leftovers 2011-02-22 22:07:22 +01:00
rtmutex.c plist: Remove the need to supply locks to plist heads 2011-07-08 14:02:53 +02:00
rtmutex.h
rtmutex_common.h rtmutex: Simplify PI algorithm and make highest prio task get lock 2011-01-27 21:13:51 -05:00
rwsem.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
sched.c sched: fix broken SCHED_RESET_ON_FORK handling 2011-08-14 12:00:43 +02:00
sched_autogroup.c Fix common misspellings 2011-03-31 11:26:23 -03:00
sched_autogroup.h sched: Skip autogroup when looking for all rt sched groups 2011-07-01 10:39:08 +02:00
sched_clock.c sched: Add some clock info to sched_debug 2010-11-23 10:29:08 +01:00
sched_cpupri.c
sched_cpupri.h
sched_debug.c sched: Get rid of lock_depth 2011-04-24 13:18:38 +02:00
sched_fair.c sched: Remove noop in lowest_flag_domain() 2011-08-14 12:00:46 +02:00
sched_features.h sched: Kill WAKEUP_PREEMPT 2011-08-14 12:00:41 +02:00
sched_idletask.c sched: Drop the rq argument to sched_class::select_task_rq() 2011-04-14 08:52:36 +02:00
sched_rt.c sched: Use pushable_tasks to determine next highest prio 2011-08-14 12:00:55 +02:00
sched_stats.h sched: More sched_domain iterations fixes 2011-05-28 17:02:54 +02:00
sched_stoptask.c sched: Drop the rq argument to sched_class::select_task_rq() 2011-04-14 08:52:36 +02:00
seccomp.c
semaphore.c
signal.c signals: sys_ssetmask/sys_rt_sigsuspend should use set_current_blocked() 2011-07-27 12:53:36 -07:00
smp.c generic-ipi: Fix kexec boot crash by initializing call_single_queue before enabling interrupts 2011-06-17 10:17:12 +02:00
softirq.c softirq,rcu: Inform RCU of irq_exit() activity 2011-07-20 10:50:12 -07:00
spinlock.c
srcu.c rcu: demote SRCU_SYNCHRONIZE_DELAY from kernel-parameter status 2011-01-14 04:56:49 -08:00
stacktrace.c stack_trace: Add weak save_stack_trace_regs() 2011-06-14 22:48:52 -04:00
stop_machine.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
sys.c move RLIMIT_NPROC check from set_user() to do_execve_common() 2011-08-11 11:24:42 -07:00
sys_ni.c ipc: Add missing sys_ni entries for ipc/compat.c functions 2011-05-20 13:53:02 -07:00
sysctl.c sysctl,rcu: Convert call_rcu(free_head) to kfree 2011-07-20 14:10:18 -07:00
sysctl_binary.c open-style analog of vfs_path_lookup() 2011-03-14 09:15:28 -04:00
sysctl_check.c sysctl_check: drop dead code 2011-03-23 19:46:51 -07:00
taskstats.c taskstats: add_del_listener() should ignore !valid listeners 2011-08-03 14:25:20 -10:00
test_kprobes.c
time.c Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-03-15 18:53:35 -07:00
timeconst.pl
timer.c timers: Consider slack value in mod_timer() 2011-06-03 15:02:32 +02:00
tracepoint.c jump label: Introduce static_branch() interface 2011-04-04 12:48:08 -04:00
tsacct.c
uid16.c userns: user namespaces: convert several capable() calls 2011-03-23 19:47:08 -07:00
up.c
user-return-notifier.c Fix common misspellings 2011-03-31 11:26:23 -03:00
user.c userns: add a user_namespace as creator/owner of uts_namespace 2011-03-23 19:46:59 -07:00
user_namespace.c user_ns: improve the user_ns on-the-slab packaging 2011-01-13 08:03:18 -08:00
utsname.c ns proc: Add support for the uts namespace 2011-05-10 14:35:35 -07:00
utsname_sysctl.c
wait.c Fix common misspellings 2011-03-31 11:26:23 -03:00
watchdog.c perf, x86: P4 PMU - Introduce event alias feature 2011-07-14 17:25:04 -04:00
workqueue.c Merge branch 'for-3.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2011-07-22 15:07:15 -07:00
workqueue_sched.h