linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-10-28 23:24:50 +00:00

Author	SHA1	Message	Date
Tejun Heo	e2bd416f62	cgroup: fix deadlock on cgroup_mutex via drop_parsed_module_refcounts() `eb178d0633` ("cgroup: grab cgroup_mutex in drop_parsed_module_refcounts()") made drop_parsed_module_refcounts() grab cgroup_mutex to make lockdep assertion in for_each_subsys() happy. Unfortunately, cgroup_remount() calls the function while holding cgroup_mutex in its failure path leading to the following deadlock. # mount -t cgroup -o remount,memory,blkio cgroup blkio cgroup: option changes via remount are deprecated (pid=525 comm=mount) ============================================= [ INFO: possible recursive locking detected ] 3.10.0-rc4-work+ #1 Not tainted --------------------------------------------- mount/525 is trying to acquire lock: (cgroup_mutex){+.+.+.}, at: [<ffffffff8110a3e1>] drop_parsed_module_refcounts+0x21/0xb0 but task is already holding lock: (cgroup_mutex){+.+.+.}, at: [<ffffffff8110e4e1>] cgroup_remount+0x51/0x200 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(cgroup_mutex); lock(cgroup_mutex); * DEADLOCK * May be due to missing lock nesting notation 4 locks held by mount/525: #0: (&type->s_umount_key#30){+.+...}, at: [<ffffffff811e9a0d>] do_mount+0x2bd/0xa30 #1: (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [<ffffffff8110e4d3>] cgroup_remount+0x43/0x200 #2: (cgroup_mutex){+.+.+.}, at: [<ffffffff8110e4e1>] cgroup_remount+0x51/0x200 #3: (cgroup_root_mutex){+.+.+.}, at: [<ffffffff8110e4ef>] cgroup_remount+0x5f/0x200 stack backtrace: CPU: 2 PID: 525 Comm: mount Not tainted 3.10.0-rc4-work+ #1 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 ffffffff829651f0 ffff88000ec2fc28 ffffffff81c24bb1 ffff88000ec2fce8 ffffffff810f420d 0000000000000006 0000000000000001 0000000000000056 ffff8800153b4640 ffff880000000000 ffffffff81c2e468 ffff8800153b4640 Call Trace: [<ffffffff81c24bb1>] dump_stack+0x19/0x1b [<ffffffff810f420d>] __lock_acquire+0x15dd/0x1e60 [<ffffffff810f531c>] lock_acquire+0x9c/0x1f0 [<ffffffff81c2a805>] mutex_lock_nested+0x65/0x410 [<ffffffff8110a3e1>] drop_parsed_module_refcounts+0x21/0xb0 [<ffffffff8110e63e>] cgroup_remount+0x1ae/0x200 [<ffffffff811c9bb2>] do_remount_sb+0x82/0x190 [<ffffffff811e9d41>] do_mount+0x5f1/0xa30 [<ffffffff811ea203>] SyS_mount+0x83/0xc0 [<ffffffff81c2fb82>] system_call_fastpath+0x16/0x1b Fix it by moving the drop_parsed_module_refcounts() invocation outside cgroup_mutex. Signed-off-by: Tejun Heo <tj@kernel.org>	2013-06-27 19:37:23 -07:00
Tejun Heo	a4ea1cc906	cgroup: always use RCU accessors for protected accesses kernel/cgroup.c still has places where a RCU pointer is set and accessed directly without going through RCU_INIT_POINTER() or rcu_dereference_protected(). They're all properly protected accesses so nothing is broken but it leads to spurious sparse RCU address space warnings. Substitute direct accesses with RCU_INIT_POINTER() and rcu_dereference_protected(). Note that %true is specified as the extra condition for all derference updates. This isn't ideal as all it does is suppressing warning without actually policing synchronization rules; however, most are scheduled to be removed pretty soon along with css_id itself, so no reason to be more elaborate. Combined with the previous changes, this removes all RCU related sparse warnings from cgroup. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Acked-by; Li Zefan <lizefan@huawei.com>	2013-06-26 10:48:48 -07:00
Tejun Heo	a8ad805cfd	cgroup: fix RCU accesses around task->cgroups There are several places in kernel/cgroup.c where task->cgroups is accessed and modified without going through proper RCU accessors. None is broken as they're all lock protected accesses; however, this still triggers sparse RCU address space warnings. * Consistently use task_css_set() for task->cgroups dereferencing. * Use RCU_INIT_POINTER() to clear task->cgroups to &init_css_set on exit. * Remove unnecessary rcu_dereference_raw() from cset->subsys[] dereference in cgroup_exit(). Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Acked-by: Li Zefan <lizefan@huawei.com>	2013-06-26 10:48:38 -07:00
Tejun Heo	14611e51a5	cgroup: fix RCU accesses to task->cgroups task->cgroups is a RCU pointer pointing to struct css_set. A task switches to a different css_set on cgroup migration but a css_set doesn't change once created and its pointers to cgroup_subsys_states aren't RCU protected. task_subsys_state[_check]() is the macro to acquire css given a task and subsys_id pair. It RCU-dereferences task->cgroups->subsys[] not task->cgroups, so the RCU pointer task->cgroups ends up being dereferenced without read_barrier_depends() after it. It's broken. Fix it by introducing task_css_set[_check]() which does RCU-dereference on task->cgroups. task_subsys_state[_check]() is reimplemented to directly dereference ->subsys[] of the css_set returned from task_css_set[_check](). This removes some of sparse RCU warnings in cgroup. v2: Fixed unbalanced parenthsis and there's no need to use rcu_dereference_raw() when !CONFIG_PROVE_RCU. Both spotted by Li. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Acked-by: Li Zefan <lizefan@huawei.com> Cc: stable@vger.kernel.org	2013-06-26 10:42:46 -07:00
Tejun Heo	eb178d0633	cgroup: grab cgroup_mutex in drop_parsed_module_refcounts() This isn't strictly necessary as all subsystems specified in @subsys_mask are guaranteed to be pinned; however, it does spuriously trigger lockdep warning. Let's grab cgroup_mutex around it. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-06-26 10:40:10 -07:00
Tejun Heo	1672d04070	cgroup: fix cgroupfs_root early destruction path cgroupfs_root used to have ->actual_subsys_mask in addition to ->subsys_mask. `a8a648c4ac` ("cgroup: remove cgroup->actual_subsys_mask") removed it noting that the subsys_mask is essentially temporary and doesn't belong in cgroupfs_root; however, the patch made it impossible to tell whether a cgroupfs_root actually has the subsystems bound or just have the bits set leading to the following BUG when trying to mount with subsystems which are already mounted elsewhere. kernel BUG at kernel/cgroup.c:1038! invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC ... CPU: 1 PID: 7973 Comm: mount Tainted: G W 3.10.0-rc7-next-20130625-sasha-00011-g1c1dc0e #1105 task: ffff880fc0ae8000 ti: ffff880fc0b9a000 task.ti: ffff880fc0b9a000 RIP: 0010:[<ffffffff81249b29>] [<ffffffff81249b29>] rebind_subsystems+0x409/0x5f0 ... Call Trace: [<ffffffff8124bd4f>] cgroup_kill_sb+0xff/0x210 [<ffffffff813d21af>] deactivate_locked_super+0x4f/0x90 [<ffffffff8124f3b3>] cgroup_mount+0x673/0x6e0 [<ffffffff81257169>] cpuset_mount+0xd9/0x110 [<ffffffff813d2580>] mount_fs+0xb0/0x2d0 [<ffffffff81404afd>] vfs_kern_mount+0xbd/0x180 [<ffffffff814070b5>] do_new_mount+0x145/0x2c0 [<ffffffff814085d6>] do_mount+0x356/0x3c0 [<ffffffff8140873d>] SyS_mount+0xfd/0x140 [<ffffffff854eb600>] tracesys+0xdd/0xe2 We still want rebind_subsystems() to take added/removed masks, so let's fix it by marking whether a cgroupfs_root has finished binding or not. Also, document what's going on around ->subsys_mask initialization so that similar mistakes aren't repeated. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Sasha Levin <sasha.levin@oracle.com> Acked-by: Li Zefan <lizefan@huawei.com>	2013-06-26 10:39:46 -07:00
Tejun Heo	fc76df7061	cgroup: reserve ID 0 for dummy_root and 1 for unified hierarchy Before `1a57423166` ("cgroup: make hierarchy_id use cyclic idr"), hierarchy IDs were allocated from 0. As the dummy hierarchy was always the one first initialized, it got assigned 0 and all other hierarchies from 1. The patch accidentally changed the minimum useable ID to 2. Let's restore ID 0 for dummy_root and while at it reserve 1 for unified hierarchy. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Cc: stable@vger.kernel.org	2013-06-25 11:53:37 -07:00
Tejun Heo	30159ec7a9	cgroup: implement for_each_[builtin_]subsys() There are quite a few places where all loaded [builtin] subsys are iterated. Implement for_each_[builtin_]subsys() and replace manual iterations with those to simplify those places a bit. The new iterators automatically skip NULL subsystems. This shouldn't cause any functional difference. Iteration loops which scan all subsystems and then skipping modular ones explicitly are converted to use for_each_builtin_subsys(). While at it, reorder variable declarations and adjust whitespaces a bit in the affected functions. v2: Add lockdep_assert_held() in for_each_subsys() and add comments about synchronization as suggested by Li. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-06-25 11:53:37 -07:00
Tejun Heo	82fe9b0da0	cgroup: move init_css_set initialization inside cgroup_mutex cgroup_init() was doing init_css_set initialization outside cgroup_mutex, which is fine but we want to add lockdep annotation on subsystem iterations and cgroup_init() will trigger it spuriously. Move init_css_set initialization inside cgroup_mutex. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-06-25 11:53:37 -07:00
Tejun Heo	5549c49791	cgroup: s/for_each_subsys()/for_each_root_subsys()/ for_each_subsys() walks over subsystems attached to a hierarchy and we're gonna add iterators which walk over all available subsystems. Rename for_each_subsys() to for_each_root_subsys() so that it's more appropriately named and for_each_subsys() can be used to iterate all subsystems. While at it, remove unnecessary underbar prefix from macro arguments, put them inside parentheses, and adjust indentation for the two for_each_*() macros. This patch is purely cosmetic. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-06-24 15:21:48 -07:00
Tejun Heo	b326f9d0db	cgroup: clean up find_css_set() and friends find_css_set() passes uninitialized on-stack template[] array to find_existing_css_set() which sets the entries for all subsystems. Passing around an uninitialized array is a bit icky and we want to introduce an iterator which only iterates loaded subsystems. Let's initialize it on definition. While at it, also make the following cosmetic cleanups. * Convert to proper /** comments. * Reorder variable declarations. * Replace comment on synchronization with lockdep_assert_held(). This patch doesn't make any functional differences. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-06-24 15:21:48 -07:00
Tejun Heo	a8a648c4ac	cgroup: remove cgroup->actual_subsys_mask cgroup curiously has two subsystem masks, ->subsys_mask and ->actual_subsys_mask. The latter only exists because the new target subsys_mask is passed into rebind_subsystems() via @root>subsys_mask. rebind_subsystems() needs to know what the current mask is to decide how to reach the target mask so ->actual_subsys_mask is used as the temp location to remember the current state. Adding a temporary field to a permanent data structure is rather silly and can be misleading. Update rebind_subsystems() to take @added_mask and @removed_mask instead and remove @root->actual_subsys_mask. This patch shouldn't introduce any behavior changes. v2: Comment and description updated as suggested by Li. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-06-24 15:21:47 -07:00
Tejun Heo	9871bf9550	cgroup: prefix global variables with "cgroup_" Global variable names in kernel/cgroup.c are asking for trouble - subsys, roots, rootnode and so on. Rename them to have "cgroup_" prefix. * s/subsys/cgroup_subsys/ * s/rootnode/cgroup_dummy_root/ * s/dummytop/cgroup_cummy_top/ * s/roots/cgroup_roots/ * s/root_count/cgroup_root_count/ This patch is purely cosmetic. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-06-24 15:21:47 -07:00
Tejun Heo	02c402d985	cgroup: convert CFTYPE_* flags to enums Purely cosmetic. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-06-24 15:21:47 -07:00
Li Zefan	03c78cbebb	cgroup: rename cont to cgrp Cont is short for container. control group was named process container at first, but then people found container already has a meaning in linux kernel. Clean up the leftover variable name @cont. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-06-19 01:22:50 -07:00
Tejun Heo	00356bd5f0	cgroup: clean up cgroup_serial_nr_cursor cgroup_serial_nr_cursor was created atomic64_t because I thought it was never gonna used for anything other than assigning unique numbers to cgroups and didn't want to worry about synchronization; however, now we're using it as an event-stamp to distinguish cgroups created before and after certain point which assumes that it's protected by cgroup_mutex. Let's make it clear by making it a u64. Also, rename it to cgroup_serial_nr_next and make it point to the next nr to allocate so that where it's pointing to is clear and more conventional. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Li Zefan <lizefan@huawei.com>	2013-06-18 11:17:32 -07:00
Li Zefan	e8c82d20a9	cgroup: convert cgroup_cft_commit() to use cgroup_for_each_descendant_pre() We used root->allcg_list to iterate cgroup hierarchy because at that time cgroup_for_each_descendant_pre() hasn't been invented. tj: In cgroup_cfts_commit(), s/@serial_nr/@update_upto/, move the assignment right above releasing cgroup_mutex and explain what's going on there. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-06-18 11:14:22 -07:00
Li Zefan	794611a1df	cgroup: make serial_nr_cursor available throughout cgroup.c The next patch will use it to determine if a cgroup is newly created while we're iterating the cgroup hierarchy. tj: Rephrased the comment on top of cgroup_serial_nr_cursor. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-06-18 11:14:21 -07:00
Li Zefan	f57947d277	cgroup: fix memory leak in cgroup_rm_cftypes() The memory allocated in cgroup_add_cftypes() should be freed. The effect of this bug is we leak a bit memory everytime we unload cfq-iosched module if blkio cgroup is enabled. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-06-18 09:04:30 -07:00
Li Zefan	1c8158eeae	cgroup: fix umount vs cgroup_event_remove() race commit `5db9a4d99b` Author: Tejun Heo <tj@kernel.org> Date: Sat Jul 7 16:08:18 2012 -0700 cgroup: fix cgroup hierarchy umount race This commit fixed a race caused by the dput() in css_dput_fn(), but the dput() in cgroup_event_remove() can also lead to the same BUG(). Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org	2013-06-18 09:04:30 -07:00
Li Zefan	084457f284	cgroup: fix umount vs cgroup_cfts_commit() race cgroup_cfts_commit() uses dget() to keep cgroup alive after cgroup_mutex is dropped, but dget() won't prevent cgroupfs from being umounted. When the race happens, vfs will see some dentries with non-zero refcnt while umount is in process. Keep running this: mount -t cgroup -o blkio xxx /cgroup umount /cgroup And this: modprobe cfq-iosched rmmod cfs-iosched After a while, the BUG() in shrink_dcache_for_umount_subtree() may be triggered: BUG: Dentry xxx{i=0,n=blkio.yyy} still in use (1) [umount of cgroup cgroup] Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org	2013-06-18 09:04:30 -07:00
Tejun Heo	6db8e85c5c	cgroup: disallow rename(2) if sane_behavior cgroup's rename(2) isn't a proper migration implementation - it can't move the cgroup to a different parent in the hierarchy. All it can do is swapping the name string for that cgroup. This isn't useful and can mislead users to think that cgroup supports proper cgroup-level migration. Disallow rename(2) if sane_behavior. v2: Fail with -EPERM instead of -EINVAL so that it matches the vfs return value when ->rename is not implemented as suggested by Li. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-06-18 08:14:23 -07:00
Tejun Heo	f63674fd0d	cgroup: update sane_behavior documentation `f12dc02014` ("cgroup: mark "tasks" cgroup file as insane") and `cc5943a781` ("cgroup: mark "notify_on_release" and "release_agent" cgroup files insane") forgot to update the changed behavior documentation in cgroup.h. Update it. Signed-off-by: Tejun Heo <tj@kernel.org>	2013-06-13 20:02:26 -07:00
Tejun Heo	d3daf28da1	cgroup: use percpu refcnt for cgroup_subsys_states A css (cgroup_subsys_state) is how each cgroup is represented to a controller. As such, it can be used in hot paths across the various subsystems different controllers are associated with. One of the common operations is reference counting, which up until now has been implemented using a global atomic counter and can have significant adverse impact on scalability. For example, css refcnt can be gotten and put multiple times by blkcg for each IO request. For highops configurations which try to do as much per-cpu as possible, the global frequent refcnting can be very expensive. In general, given the various and hugely diverse paths css's end up being used from, we need to make it cheap and highly scalable. In its usage, css refcnting isn't very different from module refcnting. This patch converts css refcnting to use the recently added percpu_ref. css_get/tryget/put() directly maps to the matching percpu_ref operations and the deactivation logic is no longer necessary as percpu_ref already has refcnt killing. The only complication is that as the refcnt is per-cpu, percpu_ref_kill() in itself doesn't ensure that further tryget operations will fail, which we need to guarantee before invoking ->css_offline()'s. This is resolved collecting kill confirmation using percpu_ref_kill_and_confirm() and initiating the offline phase of destruction after all css refcnt's are confirmed to be seen as killed on all CPUs. The previous patches already splitted destruction into two phases, so percpu_ref_kill_and_confirm() can be hooked up easily. This patch removes css_refcnt() which is used for rcu dereference sanity check in css_id(). While we can add a percpu refcnt API to ask the same question, css_id() itself is scheduled to be removed fairly soon, so let's not bother with it. Just drop the sanity check and use rcu_dereference_raw() instead. v2: - init_cgroup_css() was calling percpu_ref_init() without checking the return value. This causes two problems - the obvious lack of error handling and percpu_ref_init() being called from cgroup_init_subsys() before the allocators are up, which triggers warnings but doesn't cause actual problems as the refcnt isn't used for roots anyway. Fix both by moving percpu_ref_init() to cgroup_create(). - The base references were put too early by percpu_ref_kill_and_confirm() and cgroup_offline_fn() put the refs one extra time. This wasn't noticeable because css's go through another RCU grace period before being freed. Update cgroup_destroy_locked() to grab an extra reference before killing the refcnts. This problem was noticed by Kent. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Kent Overstreet <koverstreet@google.com> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: "Alasdair G. Kergon" <agk@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Glauber Costa <glommer@gmail.com>	2013-06-13 19:43:12 -07:00
Tejun Heo	2b0e53a7c8	Merge branch 'for-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu into for-3.11 This is to receive percpu_refcount which will replace atomic_t reference count in cgroup_subsys_state. Signed-off-by: Tejun Heo <tj@kernel.org>	2013-06-13 19:42:22 -07:00
Tejun Heo	ea15f8ccdb	cgroup: split cgroup destruction into two steps Split cgroup_destroy_locked() into two steps and put the latter half into cgroup_offline_fn() which is executed from a work item. The latter half is responsible for offlining the css's, removing the cgroup from internal lists, and propagating release notification to the parent. The separation is to allow using percpu refcnt for css. Note that this allows for other cgroup operations to happen between the first and second halves of destruction, including creating a new cgroup with the same name. As the target cgroup is marked DEAD in the first half and cgroup internals don't care about the names of cgroups, this should be fine. A comment explaining this will be added by the next patch which implements the actual percpu refcnting. As RCU freeing is guaranteed to happen after the second step of destruction, we can use the same work item for both. This patch renames cgroup->free_work to ->destroy_work and uses it for both purposes. INIT_WORK() is now performed right before queueing the work item. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-06-13 19:27:42 -07:00
Tejun Heo	455050d23e	cgroup: reorder the operations in cgroup_destroy_locked() This patch reorders the operations in cgroup_destroy_locked() such that the userland visible parts happen before css offlining and removal from the ->sibling list. This will be used to make css use percpu refcnt. While at it, split out CGRP_DEAD related comment from the refcnt deactivation one and correct / clarify how different guarantees are met. While this patch changes the specific order of operations, it shouldn't cause any noticeable behavior difference. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-06-13 19:27:41 -07:00
Tejun Heo	dbece3a0f1	percpu-refcount: implement percpu_tryget() along with percpu_ref_kill_and_confirm() Implement percpu_tryget() which stops giving out references once the percpu_ref is visible as killed. Because the refcnt is per-cpu, different CPUs will start to see a refcnt as killed at different points in time and tryget() may continue to succeed on subset of cpus for a while after percpu_ref_kill() returns. For use cases where it's necessary to know when all CPUs start to see the refcnt as dead, percpu_ref_kill_and_confirm() is added. The new function takes an extra argument @confirm_kill which is invoked when the refcnt is guaranteed to be viewed as killed on all CPUs. While this isn't the prettiest interface, it doesn't force synchronous wait and is much safer than requiring the caller to do its own call_rcu(). v2: Patch description rephrased to emphasize that tryget() may continue to succeed on some CPUs after kill() returns as suggested by Kent. v3: Function comment in percpu_ref_kill_and_confirm() updated warning people to not depend on the implied RCU grace period from the confirm callback as it's an implementation detail. Signed-off-by: Tejun Heo <tj@kernel.org> Slightly-Grumpily-Acked-by: Kent Overstreet <koverstreet@google.com>	2013-06-13 19:23:53 -07:00
Tejun Heo	bc497bd33b	percpu-refcount: implement percpu_ref_cancel_init() Normally, percpu_ref_init() initializes and percpu_ref_kill() initiates destruction which completes asynchronously. The asynchronous destruction can be problematic in init failure path where the caller wants to destroy half-constructed object - distinguishing half-constructed objects from the usual release method can be painful for complex objects. This patch implements percpu_ref_cancel_init() which synchronously destroys the percpu_ref without invoking release. To avoid unintentional misuses, the function requires the ref to have finished percpu_ref_init() but never used and triggers WARN otherwise. v2: Explain the weird name and usage restriction in the function comment. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Kent Overstreet <koverstreet@google.com>	2013-06-13 11:08:27 -07:00
Tejun Heo	acac7883ee	percpu-refcount: add __must_check to percpu_ref_init() and don't use ACCESS_ONCE() in percpu_ref_kill_rcu() Two small changes. * Unlike most init functions, percpu_ref_init() allocates memory and may fail. Let's mark it with __must_check in case the caller forgets. * percpu_ref_kill_rcu() is unnecessarily using ACCESS_ONCE() to dereference @ref->pcpu_count, which can be misleading. The pointer is guaranteed to be valid and visible and can't change underneath the function. Drop ACCESS_ONCE(). Signed-off-by: Tejun Heo <tj@kernel.org>	2013-06-13 11:08:26 -07:00
Tejun Heo	6f3d828f0f	cgroup: remove cgroup->count and use cgroup->count tracks the number of css_sets associated with the cgroup and used only to verify that no css_set is associated when the cgroup is being destroyed. It's superflous as the destruction path can simply check whether cgroup->cset_links is empty instead. Drop cgroup->count and check ->cset_links directly from cgroup_destroy_locked(). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-06-13 10:55:18 -07:00
Tejun Heo	ddd69148bd	cgroup: drop unnecessary RCU dancing from __put_css_set() __put_css_set() does RCU read access on @cgrp across dropping @cgrp->count so that it can continue accessing @cgrp even if the count reached zero and destruction of the cgroup commenced. Given that both sides - __css_put() and cgroup_destroy_locked() - are cold paths, this is unnecessary. Just making cgroup_destroy_locked() grab css_set_lock while checking @cgrp->count is enough. Remove the RCU read locking from __put_css_set() and make cgroup_destroy_locked() read-lock css_set_lock when checking @cgrp->count. This will also allow removing @cgrp->count. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-06-13 10:55:18 -07:00
Tejun Heo	54766d4a1d	cgroup: rename CGRP_REMOVED to CGRP_DEAD We will add another flag indicating that the cgroup is in the process of being killed. REMOVING / REMOVED is more difficult to distinguish and cgroup_is_removing()/cgroup_is_removed() are a bit awkward. Also, later percpu_ref usage will involve "kill"ing the refcnt. s/CGRP_REMOVED/CGRP_DEAD/ s/cgroup_is_removed()/cgroup_is_dead() This patch is purely cosmetic. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-06-13 10:55:18 -07:00
Tejun Heo	5de0107e63	cgroup: clean up css_[try]get() and css_put() * __css_get() isn't used by anyone. Fold it into css_get(). * Add proper function comments to all css reference functions. This patch is purely cosmetic. v2: Typo fix as per Li. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-06-13 10:55:17 -07:00
Tejun Heo	f4f4be2bd2	cgroup: use kzalloc() instead of kmalloc() There's no point in using kmalloc() instead of the clearing variant for trivial stuff. We can live dangerously elsewhere. Use kzalloc() instead and drop 0 inits. While at it, do trivial code reorganization in cgroup_file_open(). This patch doesn't introduce any functional changes. v2: I was caught in the very distant past where list_del() didn't poison and the initial version converted list_del()s to list_del_init()s too. Li and Kent took me out of the stasis chamber. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Kent Overstreet <koverstreet@google.com> Acked-by: Li Zefan <lizefan@huawei.com>	2013-06-13 10:55:17 -07:00
Tejun Heo	69d0206c79	cgroup: bring some sanity to naming around cg_cgroup_link cgroups and css_sets are mapped M:N and this M:N mapping is represented by struct cg_cgroup_link which forms linked lists on both sides. The naming around this mapping is already confusing and struct cg_cgroup_link exacerbates the situation quite a bit. >From cgroup side, it starts off ->css_sets and runs through ->cgrp_link_list. From css_set side, it starts off ->cg_links and runs through ->cg_link_list. This is rather reversed as cgrp_link_list is used to iterate css_sets and cg_link_list cgroups. Also, this is the only place which is still using the confusing "cg" for css_sets. This patch cleans it up a bit. * s/cgroup->css_sets/cgroup->cset_links/ s/css_set->cg_links/css_set->cgrp_links/ s/cgroup_iter->cg_link/cgroup_iter->cset_link/ * s/cg_cgroup_link/cgrp_cset_link/ * s/cgrp_cset_link->cg/cgrp_cset_link->cset/ s/cgrp_cset_link->cgrp_link_list/cgrp_cset_link->cset_link/ s/cgrp_cset_link->cg_link_list/cgrp_cset_link->cgrp_link/ * s/init_css_set_link/init_cgrp_cset_link/ s/free_cg_links/free_cgrp_cset_links/ s/allocate_cg_links/allocate_cgrp_cset_links/ * s/cgl[12]/link[12]/ in compare_css_sets() * s/saved_link/tmp_link/ s/tmp/tmp_links/ and a couple similar adustments. * Comment and whiteline adjustments. After the changes, we have list_for_each_entry(link, &cont->cset_links, cset_link) { struct css_set cset = link->cset; instead of list_for_each_entry(link, &cont->css_sets, cgrp_link_list) { struct css_set cset = link->cg; This patch is purely cosmetic. v2: Fix broken sentences in the patch description. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-06-13 10:55:17 -07:00
Tejun Heo	5abb885573	cgroup: consistently use @cset for struct css_set variables cgroup.c uses @cg for most struct css_set variables, which in itself could be a bit confusing, but made much worse by the fact that there are places which use @cg for struct cgroup variables. compare_css_sets() epitomizes this confusion - @[old_]cg are struct css_set while @cg[12] are struct cgroup. It's not like the whole deal with cgroup, css_set and cg_cgroup_link isn't already confusing enough. Let's give it some sanity by uniformly using @cset for all struct css_set variables. * s/cg/cset/ for all css_set variables. * s/oldcg/old_cset/ s/oldcgrp/old_cgrp/. The same for the ones prefixed with "new". * s/cg/cgrp/ for cgroup variables in compare_css_sets(). * s/css/cset/ for the cgroup variable in task_cgroup_from_root(). * Whiteline adjustments. This patch is purely cosmetic. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-06-13 10:55:17 -07:00
Tejun Heo	3fc3db9a3a	cgroup: remove now unused css_depth() Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-06-13 10:55:17 -07:00
Tejun Heo	ac899061a9	percpu-refcount: cosmetic updates * s/percpu_ref_release/percpu_ref_func_t/ as it's customary to have _t postfix for types and the type is gonna be used for a different type of callback too. * Add @ARG to function comments. * Drop unnecessary and unaligned indentation from percpu_ref_init() function comment. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Kent Overstreet <koverstreet@google.com>	2013-06-12 20:43:06 -07:00
Tejun Heo	6a24474da8	percpu-refcount: consistently use plain (non-sched) RCU percpu_ref_get/put() are using preempt_disable/enable() while percpu_ref_kill() is using plain call_rcu() instead of call_rcu_sched(). This is buggy as grace periods of the two may not match. Fix it by using plain RCU in percpu_ref_get/put(). (I suggested using sched RCU in the first place but there's no actual benefit in doing so unless we're gonna introduce different variants of get/put to be called while preemption is alredy disabled, which we definitely shouldn't.) Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Kent Overstreet <koverstreet@google.com>	2013-06-12 20:43:06 -07:00
Tejun Heo	d5c56ced77	cgroup: clean up the cftype array for the base cgroup files * Rename it from files[] (really?) to cgroup_base_files[]. * Drop CGROUP_FILE_GENERIC_PREFIX which was defined as "cgroup." and used inconsistently. Just use "cgroup." directly. * Collect insane files at the end. Note that only the insane ones are missing "cgroup." prefix. This patch doesn't introduce any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-06-05 12:00:33 -07:00
Tejun Heo	cc5943a781	cgroup: mark "notify_on_release" and "release_agent" cgroup files insane The empty cgroup notification mechanism currently implemented in cgroup is tragically outdated. Forking and execing userland process stopped being a viable notification mechanism more than a decade ago. We're gonna have a saner mechanism. Let's make it clear that this abomination is going away. Mark "notify_on_release" and "release_agent" with CFTYPE_INSANE. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-06-05 12:00:33 -07:00
Tejun Heo	f12dc02014	cgroup: mark "tasks" cgroup file as insane Some resources controlled by cgroup aren't per-task and cgroup core allowing threads of a single thread_group to be in different cgroups forced memcg do explicitly find the group leader and use it. This is gonna be nasty when transitioning to unified hierarchy and in general we don't want and won't support granularity finer than processes. Mark "tasks" with CFTYPE_INSANE. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Balbir Singh <bsingharora@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: cgroups@vger.kernel.org Cc: Vivek Goyal <vgoyal@redhat.com>	2013-06-05 12:00:33 -07:00
Kent Overstreet	c1ae6e9b4d	percpu-refcount: Don't use silly cmpxchg() The cmpxchg() was just to ensure the debug check didn't race, which was a bit excessive. The caller is supposed to do the appropriate synchronization, which means percpu_ref_kill() can just do a simple store. Signed-off-by: Kent Overstreet <koverstreet@google.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-06-03 16:04:04 -07:00
Kent Overstreet	215e262f2a	percpu: implement generic percpu refcounting This implements a refcount with similar semantics to atomic_get()/atomic_dec_and_test() - but percpu. It also implements two stage shutdown, as we need it to tear down the percpu counts. Before dropping the initial refcount, you must call percpu_ref_kill(); this puts the refcount in "shutting down mode" and switches back to a single atomic refcount with the appropriate barriers (synchronize_rcu()). It's also legal to call percpu_ref_kill() multiple times - it only returns true once, so callers don't have to reimplement shutdown synchronization. [akpm@linux-foundation.org: fix build] [akpm@linux-foundation.org: coding-style tweak] Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-06-03 15:36:41 -07:00
Linus Torvalds	042dd60ca6	regulator: Fixes for v3.10 A few small fixes for v3.10, documentation things in the core and a few driver bugs. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJRrLnIAAoJELSic+t+oim99hQP/j3SRcv5hgUorXjK3X06Lw3o QhQh9uH1YvEASJcqt+eXCskHV6LE+lutb+glyOu/nS0BaNRs5KJjKMMLMyKLYOlp P3Y3jgn3K2Joq+o6b8u7cSrTV1ps2spKvHxoEeam5JTJF8QslNW1FUJF9C2qdDYx NHmKCNhz56B3yhf2N3nZuM9P85xPUH8pj48a1+fQiC0EbYyFX0LP3A20D9uH6PAl WGqN3BXctqtelBMkMfCRHRF3PqqBMSpeZ9gUAG+ckg7rn2ctToeB6rCh7/nGJfs1 Jc0yzN3UtPfBZoALWqtXP/tRXnrtrK4qFAUTmNLt/TAYk3FiPzn5YC8mOCWduA+K awqRtZVOhpQ3kigVmHgKyvl7b/rZuTDaFAMH9PtGvltRoKo9eRIIqbZTfgAvgnOh S8FL7DdikMQxjz9E+eF7Rzngo79qMswq+RWA7ACemtxJD5QR9etITQZdAyuPopwL jnDE4ICjqnNywWibo330nmrWWJoOFmRpKCoaKvwFPflUmjnQ3TrvdhIgybDytQAh LnNUK4vi9tjKmaIl1ZL/LUF0J7rDwCfgQWzrRpRv4yfXDfl1UD57dwqRwJwvpYfx 1UGxDHztLw2/Q53NEHVQxRIqIo/hrlg22JGkBIvGqBxPZKorKmIaOjhtcBqh6mfE eTluTNnxmWLhsk3fCMRW =yE/I -----END PGP SIGNATURE----- Merge tag 'regulator-v3.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fixes from Mark Brown: "A few small fixes for v3.10, documentation things in the core and a few driver bugs." * tag 'regulator-v3.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: palmas: Fix "enable_reg" to point to the correct reg for SMPS10 regulator: palmas: Fix incorrect condition regulator: core: Correct spelling mistake in comment regulator: dbx500: Make local symbol static regulator: Fix kernel-doc generation warnings.	2013-06-04 06:34:51 +09:00
Linus Torvalds	0ab60871b4	A couple jfs bug fixes for 3.10-rc5 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.20 (GNU/Linux) iQIcBAABAgAGBQJRrLmSAAoJEDaohF61QIxkakoP/jo8fo2KaixVHRag1h7pBwVG 7UJrgYmojoFUsmem+k34rXOzwc8eqv5BzTwGJe81ktatUjPkEQELdwryB4Rw6YYI F8/Mrc2AjHEavuTnojmENP22+YkPOW90r+k00dVk+cY9FQsc1/AhNYtzpVg9hSRX RBhGcgfnQQCAlIf09vr7R19oBLiHfrFNnUYJeUg/d0bxW8uiTimX7ENDTOtn61WF hLEABnL2tC0UtK3z/B0qNIpDE5DHHaLo8WcGjlXE3Pte4JA2rPDJwaHaUgvCEo6R iYCw5Dq6bL7Ce7wMP1GzY+LHgaeUg5Nknnkl5vEKZSZuy7dfC2eBlYbziqyJm73A 0e7UgypkaNGa4p4MzM0J5OW+4bRRPzNir99BnpU8Bm6dJTY8ZeIMTTQFX2Fla8+G SCo6dMjEUfF+yIykDwmeZYjHl5XeKM6B3imoJ8+yD9vZLzg6MY4SUmgPEKL7BFoi U4aiID6N67LYAMb5vapzpcl+uHZbrw7CBnI2FanqdsAoPjath1CgrEvseBAUowpy EUeCLaQ05xR8ynm44tQCPQFgqBpRxAiVtpy9nPulA3cQCbwelEuLKrZB17DBfK9i w7f5e8nNG8gXfRuL5idGzS7g6Hp90JUhBWKh0dJC820OM7idWnXGCLbYTm7t6pvo m931/xUofSsXCLNwEo88 =JZIK -----END PGP SIGNATURE----- Merge tag 'jfs-3.10-rc5' of git://github.com/kleikamp/linux-shaggy Pull jfs bugfixes from David Kleikamp: "A couple jfs bug fixes for 3.10-rc5" * tag 'jfs-3.10-rc5' of git://github.com/kleikamp/linux-shaggy: fs/jfs: Add check if journaling to disk has been disabled in lbmRead() jfs: Several bugs in jfs_freeze() and jfs_unfreeze()	2013-06-04 06:33:44 +09:00
Linus Torvalds	aa4f608478	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k Pull m68k fix from Geert Uytterhoeven: "A boot lock-up on Mac, also destined for stable" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: m68k/mac: Fix unexpected interrupt with CONFIG_EARLY_PRINTK	2013-06-03 18:09:42 +09:00
Linus Torvalds	286e050bc0	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Martin Schwidefsky: "Recent bug fixes, one of them touches a common code file. It adds two #ifndef/#endif pairs to asm-generic/io.h to be able to override xlate_dev_kmem_ptr and xlate_dev_mem_ptr." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/pgtable: Fix gmap notifier address s390/dasd: fix handling of gone paths s390/pgtable: Fix check for pgste/storage key handling arch: s390: appldata: using strncpy() and strnlen() instead of sprintf() s390/smp: lost IPIs on cpu hotplug kernel: Fix s390 absolute memory access for /dev/mem s390/dma: do not call debug_dma after free	2013-06-03 18:04:07 +09:00
Linus Torvalds	7d80fea426	Merge branch 'for-3.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: - Fix for yet another xattr bug which may lead to NULL deref. - A subtle bug in for_each_descendant_pre(). This bug requires quite specific conditions to trigger and isn't too likely to actually happen in the wild, but maybe that just makes it that much more nastier. - A warning message added for silly cgroup re-mount (not -o remount, but unmount followed by mount) behavior. * 'for-3.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: warn about mismatching options of a new mount of an existing hierarchy cgroup: fix a subtle bug in descendant pre-order walk cgroup: initialize xattr before calling d_instantiate()	2013-06-03 17:57:16 +09:00

1 2 3 4 5 ...

376638 commits