linux-stable/kernel
Milton Miller 7378547f2c sched: fix sched_domain sysctl registration again
commit  029190c515 (cpuset
sched_load_balance flag) was not tested SCHED_DEBUG enabled as
committed as it dereferences NULL when used and it reordered
the sysctl registration to cause it to never show any domains
or their tunables.

Fixes:

1) restore arch_init_sched_domains ordering
	we can't walk the domains before we build them

	presently we register cpus with empty directories (no domain
	directories or files).

2) make unregister_sched_domain_sysctl do nothing when already unregistered
	detach_destroy_domains is now called one set of cpus at a time
	unregister_syctl dereferences NULL if called with a null.

	While the the function would always dereference null if called
	twice, in the previous code it was always called once and then
	was followed a register.  So only the hidden bug of the
	sysctl_root_table not being allocated followed by an attempt to
	free it would have shown the error.

3) always call unregister and register in partition_sched_domains
	The code is "smart" about unregistering only needed domains.
	Since we aren't guaranteed any calls to unregister, always 
	unregister.   Without calling register on the way out we
	will not have a table or any sysctl tree.

4) warn if register is called without unregistering
	The previous table memory is lost, leaving pointers to the
	later freed memory in sysctl and leaking the memory of the
	tables.

Before this patch on a 2-core 4-thread box compiled for SMT and NUMA,
the domains appear empty (there are actually 3 levels per cpu).  And as
soon as two domains a null pointer is dereferenced (unreliable in this
case is stack garbage):

bu19a:~# ls -R /proc/sys/kernel/sched_domain/
/proc/sys/kernel/sched_domain/:
cpu0  cpu1  cpu2  cpu3

/proc/sys/kernel/sched_domain/cpu0:

/proc/sys/kernel/sched_domain/cpu1:

/proc/sys/kernel/sched_domain/cpu2:

/proc/sys/kernel/sched_domain/cpu3:

bu19a:~# mkdir /dev/cpuset
bu19a:~# mount -tcpuset cpuset /dev/cpuset/
bu19a:~# cd /dev/cpuset/
bu19a:/dev/cpuset# echo 0 > sched_load_balance 
bu19a:/dev/cpuset# mkdir one
bu19a:/dev/cpuset# echo 1 > one/cpus               
bu19a:/dev/cpuset# echo 0 > one/sched_load_balance 
Unable to handle kernel paging request for data at address 0x00000018
Faulting instruction address: 0xc00000000006b608
NIP: c00000000006b608 LR: c00000000006b604 CTR: 0000000000000000
REGS: c000000018d973f0 TRAP: 0300   Not tainted  (2.6.23-bml)
MSR: 9000000000009032 <EE,ME,IR,DR>  CR: 28242442  XER: 00000000
DAR: 0000000000000018, DSISR: 0000000040000000
TASK = c00000001912e340[1987] 'bash' THREAD: c000000018d94000 CPU: 2
..
NIP [c00000000006b608] .unregister_sysctl_table+0x38/0x110
LR [c00000000006b604] .unregister_sysctl_table+0x34/0x110
Call Trace:
[c000000018d97670] [c000000007017270] 0xc000000007017270 (unreliable)
[c000000018d97720] [c000000000058710] .detach_destroy_domains+0x30/0xb0
[c000000018d977b0] [c00000000005cf1c] .partition_sched_domains+0x1bc/0x230
[c000000018d97870] [c00000000009fdc4] .rebuild_sched_domains+0xb4/0x4c0
[c000000018d97970] [c0000000000a02e8] .update_flag+0x118/0x170
[c000000018d97a80] [c0000000000a1768] .cpuset_common_file_write+0x568/0x820
[c000000018d97c00] [c00000000009d95c] .cgroup_file_write+0x7c/0x180
[c000000018d97cf0] [c0000000000e76b8] .vfs_write+0xe8/0x1b0
[c000000018d97d90] [c0000000000e810c] .sys_write+0x4c/0x90
[c000000018d97e30] [c00000000000852c] syscall_exit+0x0/0x40

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-24 18:23:48 +02:00
..
irq Fix synchronize_irq races with IRQ handler 2007-10-23 09:01:31 -07:00
power trivial copy_data_pages() tidy up 2007-10-20 02:26:04 +02:00
time Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 2007-10-19 13:12:46 -07:00
.gitignore
acct.c whitespace fixes: process accounting 2007-10-18 14:37:24 -07:00
audit.c [PATCH] audit: watching subtrees 2007-10-21 02:37:45 -04:00
audit.h [PATCH] audit: watching subtrees 2007-10-21 02:37:45 -04:00
audit_tree.c [PATCH] audit: watching subtrees 2007-10-21 02:37:45 -04:00
auditfilter.c [PATCH] audit: watching subtrees 2007-10-21 02:37:45 -04:00
auditsc.c auditsc: fix kernel-doc param warnings 2007-10-22 19:40:02 -07:00
capability.c Uninline find_pid etc set of functions 2007-10-19 11:53:41 -07:00
cgroup.c cgroup: kill unused variable 2007-10-23 21:28:39 -04:00
cgroup_debug.c Task Control Groups: simple task cgroup debug info subsystem 2007-10-19 11:53:36 -07:00
compat.c Merge ssh://master.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt 2007-10-18 15:12:41 -07:00
configs.c use simple_read_from_buffer in kernel/ 2007-05-09 12:30:49 -07:00
cpu.c CPU HOTPLUG: avoid hotadd when proper possible_map isn't specified 2007-10-19 11:53:44 -07:00
cpu_acct.c Task Control Groups: example CPU accounting subsystem 2007-10-19 11:53:36 -07:00
cpuset.c hotplug cpu: migrate a task within its cpuset 2007-10-19 11:53:44 -07:00
delayacct.c Add scaled time to taskstats based process accounting 2007-10-18 14:37:28 -07:00
dma.c whitespace fixes: DMA channel allocator 2007-10-18 14:37:24 -07:00
exec_domain.c whitespace fixes: execution domains 2007-10-18 14:37:26 -07:00
exit.c Uninline fork.c/exit.c 2007-10-19 11:53:56 -07:00
extable.c
fork.c Uninline fork.c/exit.c 2007-10-19 11:53:56 -07:00
futex.c Uninline find_task_by_xxx set of functions 2007-10-19 11:53:40 -07:00
futex_compat.c Uninline find_task_by_xxx set of functions 2007-10-19 11:53:40 -07:00
hrtimer.c fix comment: unlock_hrtimer_base is the counterpart of lock_hrtimer_base 2007-10-20 01:56:53 +02:00
itimer.c whitespace fixes: interval timers 2007-10-18 14:37:26 -07:00
kallsyms.c kallsyms: make KSYM_NAME_LEN include space for trailing '\0' 2007-07-17 10:23:03 -07:00
Kconfig.hz [PATCH] HZ: 300Hz support 2006-12-07 08:39:36 -08:00
Kconfig.instrumentation Linux Kernel Markers 2007-10-19 11:53:54 -07:00
Kconfig.preempt Move PREEMPT_NOTIFIERS into an always-included Kconfig 2007-10-17 08:42:55 -07:00
kexec.c Extended crashkernel command line 2007-10-19 11:53:49 -07:00
kfifo.c is_power_of_2: kernel/kfifo.c 2007-07-16 09:05:50 -07:00
kmod.c Restore call_usermodehelper_pipe() behaviour 2007-09-11 17:21:20 -07:00
kprobes.c kprobes: support kretprobe blacklist 2007-10-16 09:43:10 -07:00
ksysfs.c add-vmcore: cleanup the coding style according to Andrew's comments 2007-10-17 08:42:54 -07:00
kthread.c kthread: silence bogus section mismatch warning 2007-07-31 15:39:42 -07:00
latency.c [PATCH] severing module.h->sched.h 2006-12-04 02:00:22 -05:00
lockdep.c Use helpers to obtain task pid in printks 2007-10-19 11:53:43 -07:00
lockdep_internals.h [PATCH] lockdep: more chains 2006-12-07 08:39:43 -08:00
lockdep_proc.c lockdep: Avoid /proc/lockdep & lock_stat infinite output 2007-10-11 22:11:11 +02:00
Makefile [PATCH] audit: watching subtrees 2007-10-21 02:37:45 -04:00
marker.c Linux Kernel Markers 2007-10-19 11:53:54 -07:00
module.c Linux Kernel Markers 2007-10-19 11:53:54 -07:00
mutex-debug.c [PATCH] remove many unneeded #includes of sched.h 2007-02-14 08:09:54 -08:00
mutex-debug.h
mutex.c lockdep: fixup mutex annotations 2007-10-11 22:11:12 +02:00
mutex.h
notifier.c Add kernel/notifier.c 2007-10-19 11:53:34 -07:00
ns_cgroup.c cgroups: implement namespace tracking subsystem 2007-10-19 11:53:37 -07:00
nsproxy.c pid namespaces: allow cloning of new namespace 2007-10-19 11:53:39 -07:00
panic.c trivial comment wording/typo fix regarding taint flags 2007-10-20 00:30:06 +02:00
params.c param_sysfs_builtin memchr argument fix 2007-10-18 14:37:21 -07:00
pid.c Uninline the task_xid_nr_ns() calls 2007-10-19 11:53:41 -07:00
posix-cpu-timers.c Isolate some explicit usage of task->tgid 2007-10-19 11:53:40 -07:00
posix-timers.c Isolate some explicit usage of task->tgid 2007-10-19 11:53:40 -07:00
printk.c serial: turn serial console suspend a boot rather than compile time option 2007-10-18 14:37:19 -07:00
profile.c make kernel/profile.c:time_hook static 2007-10-17 08:42:55 -07:00
ptrace.c Isolate some explicit usage of task->tgid 2007-10-19 11:53:40 -07:00
rcupdate.c Clean up duplicate includes in kernel/ 2007-10-17 08:42:48 -07:00
rcutorture.c Make rcutorture RNG use temporal entropy 2007-10-17 08:42:53 -07:00
relay.c whitespace fixes: relayfs 2007-10-18 14:37:24 -07:00
resource.c memory unplug: memory hotplug cleanup 2007-10-16 09:43:01 -07:00
rtmutex-debug.c Use helpers to obtain task pid in printks 2007-10-19 11:53:43 -07:00
rtmutex-debug.h
rtmutex-tester.c Freezer: make kernel threads nonfreezable by default 2007-07-17 10:23:02 -07:00
rtmutex.c Use helpers to obtain task pid in printks 2007-10-19 11:53:43 -07:00
rtmutex.h
rtmutex_common.h FUTEX: Tidy up the code 2007-07-16 09:05:49 -07:00
rwsem.c lockstat: hook into spinlock_t, rwlock_t, rwsem and mutex 2007-07-19 10:04:49 -07:00
sched.c sched: fix sched_domain sysctl registration again 2007-10-24 18:23:48 +02:00
sched_debug.c sched: reduce schedstat variable overhead a bit 2007-10-18 21:32:56 +02:00
sched_fair.c sched: fix new task startup crash 2007-10-17 16:55:11 +02:00
sched_idletask.c sched: mark scheduling classes as const 2007-10-15 17:00:12 +02:00
sched_rt.c sched: tidy up SCHED_RR 2007-10-15 17:00:13 +02:00
sched_stats.h sched: reduce schedstat variable overhead a bit 2007-10-18 21:32:56 +02:00
seccomp.c make seccomp zerocost in schedule 2007-07-16 09:05:50 -07:00
signal.c Use helpers to obtain task pid in printks 2007-10-19 11:53:43 -07:00
softirq.c [KERNEL]: Unexport raise_softirq_irqoff 2007-10-10 16:49:18 -07:00
softlockup.c Use helpers to obtain task pid in printks 2007-10-19 11:53:43 -07:00
spinlock.c lockstat: hook into spinlock_t, rwlock_t, rwsem and mutex 2007-07-19 10:04:49 -07:00
srcu.c
stacktrace.c
stop_machine.c Fix stop_machine_run problem with naughty real time process 2007-07-16 09:05:41 -07:00
sys.c Isolate the explicit usage of signal->pgrp 2007-10-19 11:53:43 -07:00
sys_ni.c kernel/sys_ni.c: add dummy sys_ni_syscall() prototype 2007-10-17 08:42:55 -07:00
sysctl.c pid namespaces: changes to show virtual ids to user 2007-10-19 11:53:40 -07:00
sysctl_check.c Fix appletalk sysctl entry name 2007-10-22 19:15:59 -07:00
taskstats.c Fix misspellings of "system", "controller", "interrupt" and "necessary". 2007-10-19 23:10:43 +02:00
time.c whitespace fixes: time syscalls 2007-10-18 14:37:24 -07:00
timer.c pid namespaces: changes to show virtual ids to user 2007-10-19 11:53:40 -07:00
tsacct.c Add scaled time to taskstats based process accounting 2007-10-18 14:37:28 -07:00
uid16.c header cleaning: don't include smp_lock.h when not used 2007-05-08 11:15:07 -07:00
user.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched 2007-10-17 09:11:18 -07:00
user_namespace.c Fix user namespace exiting OOPs 2007-09-19 11:24:18 -07:00
utsname.c Fix UTS corruption during clone(CLONE_NEWUTS) 2007-09-19 11:24:17 -07:00
utsname_sysctl.c remove CONFIG_UTS_NS and CONFIG_IPC_NS 2007-07-16 09:05:47 -07:00
wait.c Fix occurrences of "the the " 2007-05-09 08:57:56 +02:00
workqueue.c Use helpers to obtain task pid in printks 2007-10-19 11:53:43 -07:00