Commit graph

25 commits

Author SHA1 Message Date
Tejun Heo
9e804d1f58 percpu_ref: rename things to prepare for decoupling percpu/atomic mode switch
percpu_ref will be restructured so that percpu/atomic mode switching
and reference killing are dedoupled.  In preparation, do the following
renames.

* percpu_ref->confirm_kill	-> percpu_ref->confirm_switch
* __PERCPU_REF_DEAD		-> __PERCPU_REF_ATOMIC
* __percpu_ref_alive()		-> __ref_is_percpu()

This patch is pure rename and doesn't introduce any functional
changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Kent Overstreet <kmo@daterainc.com>
2014-09-24 13:31:48 -04:00
Tejun Heo
eecc16ba9a percpu_ref: replace pcpu_ prefix with percpu_
percpu_ref uses pcpu_ prefix for internal stuff and percpu_ for
externally visible ones.  This is the same convention used in the
percpu allocator implementation.  It works fine there but percpu_ref
doesn't have too much internal-only stuff and scattered usages of
pcpu_ prefix are confusing than helpful.

This patch replaces all pcpu_ prefixes with percpu_.  This is pure
rename and there's no functional change.  Note that PCPU_REF_DEAD is
renamed to __PERCPU_REF_DEAD to signify that the flag is internal.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Kent Overstreet <kmo@daterainc.com>
2014-09-24 13:31:48 -04:00
Tejun Heo
6251f9976a percpu_ref: minor code and comment updates
* Some comments became stale.  Updated.
* percpu_ref_tryget() unnecessarily initializes @ret.  Removed.
* A blank line removed from percpu_ref_kill_rcu().
* Explicit function name in a WARN format string replaced with __func__.
* WARN_ON() in percpu_ref_reinit() converted to WARN_ON_ONCE().

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Kent Overstreet <kmo@daterainc.com>
2014-09-24 13:31:48 -04:00
Tejun Heo
a223737019 percpu_ref: relocate percpu_ref_reinit()
percpu_ref is gonna go through restructuring.  Move
percpu_ref_reinit() after percpu_ref_kill_and_confirm().  This will
make later changes easier to follow and result in cleaner
organization.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Kent Overstreet <kmo@daterainc.com>
2014-09-24 13:31:48 -04:00
Tejun Heo
9eca80461a Revert "blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe"
This reverts commit 0a30288da1, which
was a temporary fix for SCSI blk-mq stall issue.  The following
patches will fix the issue properly by introducing atomic mode to
percpu_ref.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
2014-09-24 13:07:33 -04:00
Tejun Heo
d06efebf0c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block into for-3.18
This is to receive 0a30288da1 ("blk-mq, percpu_ref: implement a
kludge for SCSI blk-mq stall during probe") which implements
__percpu_ref_kill_expedited() to work around SCSI blk-mq stall.  The
commit reverted and patches to implement proper fix will be added.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
2014-09-24 13:00:21 -04:00
Tejun Heo
0a30288da1 blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe
blk-mq uses percpu_ref for its usage counter which tracks the number
of in-flight commands and used to synchronously drain the queue on
freeze.  percpu_ref shutdown takes measureable wallclock time as it
involves a sched RCU grace period.  This means that draining a blk-mq
takes measureable wallclock time.  One would think that this shouldn't
matter as queue shutdown should be a rare event which takes place
asynchronously w.r.t. userland.

Unfortunately, SCSI probing involves synchronously setting up and then
tearing down a lot of request_queues back-to-back for non-existent
LUNs.  This means that SCSI probing may take more than ten seconds
when scsi-mq is used.

This will be properly fixed by implementing a mechanism to keep
q->mq_usage_counter in atomic mode till genhd registration; however,
that involves rather big updates to percpu_ref which is difficult to
apply late in the devel cycle (v3.17-rc6 at the moment).  As a
stop-gap measure till the proper fix can be implemented in the next
cycle, this patch introduces __percpu_ref_kill_expedited() and makes
blk_mq_freeze_queue() use it.  This is heavy-handed but should work
for testing the experimental SCSI blk-mq implementation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Christoph Hellwig <hch@infradead.org>
Link: http://lkml.kernel.org/g/20140919113815.GA10791@lst.de
Fixes: add703fda9 ("blk-mq: use percpu_ref for mq usage count")
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Tested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-24 08:29:36 -06:00
Tejun Heo
e625305b39 percpu-refcount: make percpu_ref based on longs instead of ints
percpu_ref is currently based on ints and the number of refs it can
cover is (1 << 31).  This makes it impossible to use a percpu_ref to
count memory objects or pages on 64bit machines as it may overflow.
This forces those users to somehow aggregate the references before
contributing to the percpu_ref which is often cumbersome and sometimes
challenging to get the same level of performance as using the
percpu_ref directly.

While using ints for the percpu counters makes them pack tighter on
64bit machines, the possible gain from using ints instead of longs is
extremely small compared to the overall gain from per-cpu operation.
This patch makes percpu_ref based on longs so that it can be used to
directly count memory objects or pages.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
2014-09-20 01:27:25 -04:00
Tejun Heo
a34375ef9e percpu-refcount: add @gfp to percpu_ref_init()
Percpu allocator now supports allocation mask.  Add @gfp to
percpu_ref_init() so that !GFP_KERNEL allocation masks can be used
with percpu_refs too.

This patch doesn't make any functional difference.

v2: blk-mq conversion was missing.  Updated.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
Cc: Jens Axboe <axboe@kernel.dk>
2014-09-08 09:51:30 +09:00
Tejun Heo
2d7227828e percpu-refcount: implement percpu_ref_reinit() and percpu_ref_is_zero()
Now that explicit invocation of percpu_ref_exit() is necessary to free
the percpu counter, we can implement percpu_ref_reinit() which
reinitializes a released percpu_ref.  This can be used implement
scalable gating switch which can be drained and then re-opened without
worrying about memory allocation failures.

percpu_ref_is_zero() is added to be used in a sanity check in
percpu_ref_exit().  As this function will be useful for other purposes
too, make it a public interface.

v2: Use smp_read_barrier_depends() instead of smp_load_acquire().  We
    only need data dep barrier and smp_load_acquire() is stronger and
    heavier on some archs.  Spotted by Lai Jiangshan.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
2014-06-28 08:10:14 -04:00
Tejun Heo
9a1049da9b percpu-refcount: require percpu_ref to be exited explicitly
Currently, a percpu_ref undoes percpu_ref_init() automatically by
freeing the allocated percpu area when the percpu_ref is killed.
While seemingly convenient, this has the following niggles.

* It's impossible to re-init a released reference counter without
  going through re-allocation.

* In the similar vein, it's impossible to initialize a percpu_ref
  count with static percpu variables.

* We need and have an explicit destructor anyway for failure paths -
  percpu_ref_cancel_init().

This patch removes the automatic percpu counter freeing in
percpu_ref_kill_rcu() and repurposes percpu_ref_cancel_init() into a
generic destructor now named percpu_ref_exit().  percpu_ref_destroy()
is considered but it gets confusing with percpu_ref_kill() while
"exit" clearly indicates that it's the counterpart of
percpu_ref_init().

All percpu_ref_cancel_init() users are updated to invoke
percpu_ref_exit() instead and explicit percpu_ref_exit() calls are
added to the destruction path of all percpu_ref users.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
Cc: Li Zefan <lizefan@huawei.com>
2014-06-28 08:10:14 -04:00
Tejun Heo
7d74207512 percpu-refcount: use unsigned long for pcpu_count pointer
percpu_ref->pcpu_count is a percpu pointer with a status flag in its
lowest bit.  As such, it always goes through arithmetic operations
which is very cumbersome to do on a pointer.  It has to be first
casted to unsigned long and then back.

Let's just make the field unsigned long so that we can skip the first
casts.  While at it, rename it to pcpu_counter_ptr to clarify that
it's a pointer value.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
2014-06-28 08:10:13 -04:00
Tejun Heo
eae7975ddf percpu-refcount: add helpers for ->percpu_count accesses
* All four percpu_ref_*() operations implemented in the header file
  perform the same operation to determine whether the percpu_ref is
  alive and extract the percpu pointer.  Factor out the common logic
  into __pcpu_ref_alive().  This doesn't change the generated code.

* There are a couple places in percpu-refcount.c which masks out
  PCPU_REF_DEAD to obtain the percpu pointer.  Factor it out into
  pcpu_count_ptr().

* The above changes make the WARN_ON_ONCE() conditional at the top of
  percpu_ref_kill_and_confirm() the only user of REF_STATUS().  Test
  PCPU_REF_DEAD directly and remove REF_STATUS().

This patch doesn't introduce any functional change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
2014-06-28 08:10:13 -04:00
Tejun Heo
d630dc4c9a percpu-refcount: one bit is enough for REF_STATUS
percpu-refcount currently reserves two lowest bits of its percpu
pointer to indicate its state; however, only one bit is used for
PCPU_REF_DEAD.

Simplify it by removing PCPU_STATUS_BITS/MASK and testing
PCPU_REF_DEAD directly.  This also allows the compiler to choose a
more efficient instruction depending on the architecture.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
2014-06-28 08:10:12 -04:00
Tejun Heo
315c5554c4 Merge branch 'for-3.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu.git into for-3.16
Pull percpu/for-3.15-fixes into percpu/for-3.16 to receive
0c36b390a5 ("percpu-refcount: fix usage of this_cpu_ops").

The merge doesn't produce any conflict but the automatic merge is
still incorrect because 4fb6e25049 ("percpu-refcount: implement
percpu_ref_tryget()") added another use of __this_cpu_inc() which
should also be converted to this_cpu_ince().

This commit pulls in percpu/for-3.15-fixes and converts the newly
added __this_cpu_inc() to this_cpu_inc().

Signed-off-by: Tejun Heo <tj@kernel.org>
2014-06-04 12:50:47 -04:00
Sebastian Ott
0c36b390a5 percpu-refcount: fix usage of this_cpu_ops
The percpu-refcount infrastructure uses the underscore variants of
this_cpu_ops in order to modify percpu reference counters.
(e.g. __this_cpu_inc()).

However the underscore variants do not atomically update the percpu
variable, instead they may be implemented using read-modify-write
semantics (more than one instruction).  Therefore it is only safe to
use the underscore variant if the context is always the same (process,
softirq, or hardirq). Otherwise it is possible to lose updates.

This problem is something that Sebastian has seen within the aio
subsystem which uses percpu refcounters both in process and softirq
context leading to reference counts that never dropped to zeroes; even
though the number of "get" and "put" calls matched.

Fix this by using the non-underscore this_cpu_ops variant which
provides correct per cpu atomic semantics and fixes the corrupted
reference counts.

Cc: Kent Overstreet <kmo@daterainc.com>
Cc: <stable@vger.kernel.org> # v3.11+
Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
References: http://lkml.kernel.org/g/alpine.LFD.2.11.1406041540520.21183@denkbrett
2014-06-04 12:12:29 -04:00
Tejun Heo
4fb6e25049 percpu-refcount: implement percpu_ref_tryget()
Implement percpu_ref_tryget() which fails if the refcnt already
reached zero.  Note that this is different from the recently renamed
percpu_ref_tryget_live() which fails if the refcnt has been killed and
is draining the remaining references.  percpu_ref_tryget() succeeds on
a killed refcnt as long as its current refcnt is above zero.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Kent Overstreet <kmo@daterainc.com>
2014-05-09 15:42:35 -04:00
Tejun Heo
2070d50e1c percpu-refcount: rename percpu_ref_tryget() to percpu_ref_tryget_live()
percpu_ref_tryget() is different from the usual tryget semantics in
that it fails if the refcnt is in its dying stage even if the refcnt
hasn't reached zero yet.  We're about to introduce the more
conventional tryget and the current one has only one user.  Let's
rename it to percpu_ref_tryget_live() so that it explicitly signifies
the peculiarities of its semantics.

This is pure rename.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Kent Overstreet <kmo@daterainc.com>
2014-05-09 15:42:15 -04:00
Tejun Heo
a4244454df percpu-refcount: use RCU-sched insted of normal RCU
percpu-refcount was incorrectly using preempt_disable/enable() for RCU
critical sections against call_rcu().  6a24474da8 ("percpu-refcount:
consistently use plain (non-sched) RCU") fixed it by converting the
preepmtion operations with rcu_read_[un]lock() citing that there isn't
any advantage in using sched-RCU over using the usual one; however,
rcu_read_[un]lock() for the preemptible RCU implementation -
CONFIG_TREE_PREEMPT_RCU, chosen when CONFIG_PREEMPT - are slightly
more expensive than preempt_disable/enable().

In a contrived microbench which repeats the followings,

 - percpu_ref_get()
 - copy 32 bytes of data into percpu buffer
 - percpu_put_get()
 - copy 32 bytes of data into percpu buffer

rcu_read_[un]lock() used in percpu_ref_get/put() makes it go slower by
about 15% when compared to using sched-RCU.

As the RCU critical sections are extremely short, using sched-RCU
shouldn't have any latency implications.  Convert to RCU-sched.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Kent Overstreet <koverstreet@google.com>
Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Rusty Russell <rusty@rustcorp.com.au>
2013-06-16 16:12:26 -07:00
Tejun Heo
dbece3a0f1 percpu-refcount: implement percpu_tryget() along with percpu_ref_kill_and_confirm()
Implement percpu_tryget() which stops giving out references once the
percpu_ref is visible as killed.  Because the refcnt is per-cpu,
different CPUs will start to see a refcnt as killed at different
points in time and tryget() may continue to succeed on subset of cpus
for a while after percpu_ref_kill() returns.

For use cases where it's necessary to know when all CPUs start to see
the refcnt as dead, percpu_ref_kill_and_confirm() is added.  The new
function takes an extra argument @confirm_kill which is invoked when
the refcnt is guaranteed to be viewed as killed on all CPUs.

While this isn't the prettiest interface, it doesn't force synchronous
wait and is much safer than requiring the caller to do its own
call_rcu().

v2: Patch description rephrased to emphasize that tryget() may
    continue to succeed on some CPUs after kill() returns as suggested
    by Kent.

v3: Function comment in percpu_ref_kill_and_confirm() updated warning
    people to not depend on the implied RCU grace period from the
    confirm callback as it's an implementation detail.

Signed-off-by: Tejun Heo <tj@kernel.org>
Slightly-Grumpily-Acked-by: Kent Overstreet <koverstreet@google.com>
2013-06-13 19:23:53 -07:00
Tejun Heo
bc497bd33b percpu-refcount: implement percpu_ref_cancel_init()
Normally, percpu_ref_init() initializes and percpu_ref_kill()
initiates destruction which completes asynchronously.  The
asynchronous destruction can be problematic in init failure path where
the caller wants to destroy half-constructed object - distinguishing
half-constructed objects from the usual release method can be painful
for complex objects.

This patch implements percpu_ref_cancel_init() which synchronously
destroys the percpu_ref without invoking release.  To avoid
unintentional misuses, the function requires the ref to have finished
percpu_ref_init() but never used and triggers WARN otherwise.

v2: Explain the weird name and usage restriction in the function
    comment.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Kent Overstreet <koverstreet@google.com>
2013-06-13 11:08:27 -07:00
Tejun Heo
acac7883ee percpu-refcount: add __must_check to percpu_ref_init() and don't use ACCESS_ONCE() in percpu_ref_kill_rcu()
Two small changes.

* Unlike most init functions, percpu_ref_init() allocates memory and
  may fail.  Let's mark it with __must_check in case the caller
  forgets.

* percpu_ref_kill_rcu() is unnecessarily using ACCESS_ONCE() to
  dereference @ref->pcpu_count, which can be misleading.  The pointer
  is guaranteed to be valid and visible and can't change underneath
  the function.  Drop ACCESS_ONCE().

Signed-off-by: Tejun Heo <tj@kernel.org>
2013-06-13 11:08:26 -07:00
Tejun Heo
ac899061a9 percpu-refcount: cosmetic updates
* s/percpu_ref_release/percpu_ref_func_t/ as it's customary to have _t
  postfix for types and the type is gonna be used for a different type
  of callback too.

* Add @ARG to function comments.

* Drop unnecessary and unaligned indentation from percpu_ref_init()
  function comment.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Kent Overstreet <koverstreet@google.com>
2013-06-12 20:43:06 -07:00
Tejun Heo
6a24474da8 percpu-refcount: consistently use plain (non-sched) RCU
percpu_ref_get/put() are using preempt_disable/enable() while
percpu_ref_kill() is using plain call_rcu() instead of
call_rcu_sched().  This is buggy as grace periods of the two may not
match.  Fix it by using plain RCU in percpu_ref_get/put().

(I suggested using sched RCU in the first place but there's no actual
 benefit in doing so unless we're gonna introduce different variants
 of get/put to be called while preemption is alredy disabled, which we
 definitely shouldn't.)

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Kent Overstreet <koverstreet@google.com>
2013-06-12 20:43:06 -07:00
Kent Overstreet
215e262f2a percpu: implement generic percpu refcounting
This implements a refcount with similar semantics to
atomic_get()/atomic_dec_and_test() - but percpu.

It also implements two stage shutdown, as we need it to tear down the
percpu counts.  Before dropping the initial refcount, you must call
percpu_ref_kill(); this puts the refcount in "shutting down mode" and
switches back to a single atomic refcount with the appropriate
barriers (synchronize_rcu()).

It's also legal to call percpu_ref_kill() multiple times - it only
returns true once, so callers don't have to reimplement shutdown
synchronization.

[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: coding-style tweak]
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-06-03 15:36:41 -07:00