Commit graph

196 commits

Author SHA1 Message Date
Mikulas Patocka
09ee96b214 dm snapshot: suspend merging snapshot when doing exception handover
The "dm snapshot: suspend origin when doing exception handover" commit
fixed a exception store handover bug associated with pending exceptions
to the "snapshot-origin" target.

However, a similar problem exists in snapshot merging.  When snapshot
merging is in progress, we use the target "snapshot-merge" instead of
"snapshot-origin".  Consequently, during exception store handover, we
must find the snapshot-merge target and suspend its associated
mapped_device.

To avoid lockdep warnings, the target must be suspended and resumed
without holding _origins_lock.

Introduce a dm_hold() function that grabs a reference on a
mapped_device, but unlike dm_get(), it doesn't crash if the device has
the DMF_FREEING flag set, it returns an error in this case.

In snapshot_resume() we grab the reference to the origin device using
dm_hold() while holding _origins_lock (_origins_lock guarantees that the
device won't disappear).  Then we release _origins_lock, suspend the
device and grab _origins_lock again.

NOTE to stable@ people:
When backporting to kernels 3.18 and older, use dm_internal_suspend and
dm_internal_resume instead of dm_internal_suspend_fast and
dm_internal_resume_fast.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2015-02-27 14:53:16 -05:00
Mikulas Patocka
b735fede8d dm snapshot: suspend origin when doing exception handover
In the function snapshot_resume we perform exception store handover.  If
there is another active snapshot target, the exception store is moved
from this target to the target that is being resumed.

The problem is that if there is some pending exception, it will point to
an incorrect exception store after that handover, causing a crash due to
dm-snap-persistent.c:get_exception()'s BUG_ON.

This bug can be triggered by repeatedly changing snapshot permissions
with "lvchange -p r" and "lvchange -p rw" while there are writes on the
associated origin device.

To fix this bug, we must suspend the origin device when doing the
exception store handover to make sure that there are no pending
exceptions:
- introduce _origin_hash that keeps track of dm_origin structures.
- introduce functions __lookup_dm_origin, __insert_dm_origin and
  __remove_dm_origin that manipulate the origin hash.
- modify snapshot_resume so that it calls dm_internal_suspend_fast() and
  dm_internal_resume_fast() on the origin device.

NOTE to stable@ people:

When backporting to kernels 3.12-3.18, use dm_internal_suspend and
dm_internal_resume instead of dm_internal_suspend_fast and
dm_internal_resume_fast.

When backporting to kernels older than 3.12, you need to pick functions
dm_internal_suspend and dm_internal_resume from the commit
fd2ed4d252.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2015-02-27 14:49:47 -05:00
Mikulas Patocka
22aa66a3ee dm snapshot: fix a possible invalid memory access on unload
When the snapshot target is unloaded, snapshot_dtr() waits until
pending_exceptions_count drops to zero.  Then, it destroys the snapshot.
Therefore, the function that decrements pending_exceptions_count
should not touch the snapshot structure after the decrement.

pending_complete() calls free_pending_exception(), which decrements
pending_exceptions_count, and then it performs up_write(&s->lock) and it
calls retry_origin_bios() which dereferences  s->origin.  These two
memory accesses to the fields of the snapshot may touch the dm_snapshot
struture after it is freed.

This patch moves the call to free_pending_exception() to the end of
pending_complete(), so that the snapshot will not be destroyed while
pending_complete() is in progress.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2015-02-18 09:41:54 -05:00
NeilBrown
743162013d sched: Remove proliferation of wait_on_bit() action functions
The current "wait_on_bit" interface requires an 'action'
function to be provided which does the actual waiting.
There are over 20 such functions, many of them identical.
Most cases can be satisfied by one of just two functions, one
which uses io_schedule() and one which just uses schedule().

So:
 Rename wait_on_bit and        wait_on_bit_lock to
        wait_on_bit_action and wait_on_bit_lock_action
 to make it explicit that they need an action function.

 Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
 which are *not* given an action function but implicitly use
 a standard one.
 The decision to error-out if a signal is pending is now made
 based on the 'mode' argument rather than being encoded in the action
 function.

 All instances of the old wait_on_bit and wait_on_bit_lock which
 can use the new version have been changed accordingly and their
 action functions have been discarded.
 wait_on_bit{_lock} does not return any specific error code in the
 event of a signal so the caller must check for non-zero and
 interpolate their own error code as appropriate.

The wait_on_bit() call in __fscache_wait_on_invalidate() was
ambiguous as it specified TASK_UNINTERRUPTIBLE but used
fscache_wait_bit_interruptible as an action function.
David Howells confirms this should be uniformly
"uninterruptible"

The main remaining user of wait_on_bit{,_lock}_action is NFS
which needs to use a freezer-aware schedule() call.

A comment in fs/gfs2/glock.c notes that having multiple 'action'
functions is useful as they display differently in the 'wchan'
field of 'ps'. (and /proc/$PID/wchan).
As the new bit_wait{,_io} functions are tagged "__sched", they
will not show up at all, but something higher in the stack.  So
the distinction will still be visible, only with different
function names (gds2_glock_wait versus gfs2_glock_dq_wait in the
gfs2/glock.c case).

Since first version of this patch (against 3.15) two new action
functions appeared, on in NFS and one in CIFS.  CIFS also now
uses an action function that makes the same freezer aware
schedule call as NFS.

Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: David Howells <dhowells@redhat.com> (fscache, keys)
Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2)
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steve French <sfrench@samba.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-16 15:10:39 +02:00
Linus Torvalds
0e04c641b1 . Add dm_accept_partial_bio interface to DM core to allow DM targets
to only process a portion of a bio, the remainder being sent in the
   next bio.  This enables the old dm snapshot-origin target to only
   split write bios on chunk boundaries, read bios are now sent to the
   origin device unchanged.
 
 . Add DM core support for disabling WRITE SAME if the underlying SCSI
   layer disables it due to command failure.
 
 . Reduce lock contention in DM's bio-prison.
 
 . A few small cleanups and fixes to dm-thin and dm-era.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJTmavIAAoJEMUj8QotnQNay/EIAJI6lEFlQK3gG830+Yaw1m2U
 7mnnX/Rd/N6RnccyhmFqk7Xu0REM7gJEgWicTQjR58La2DKFi072N0mwgHgHfB8f
 1oOvUKN5Nb/a1CmRcVzSO0sbYcJn9I1r+k0buqfFHivU68wuedG+MrVya3YzOjvC
 63MQiu4+3icDprcToxn+etz75FhrFps5QAsS0cH6t1VZqFCGIzxqgUKgY8zGo1CH
 P9hkYpJhhJe2aDh4vlFvpFYVFXt9zPoR+MkqXFW6Dn9GbR36gGaldTwnyhapla4N
 XYHDEdETcTqm2srOM5wW+jD0p+Id9/Lmd5Ld5J2zyARCYn/EG/SDCbiaVqOxEnc=
 =tv1B
 -----END PGP SIGNATURE-----

Merge tag 'dm-3.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:
 "This pull request is later than I'd have liked because I was waiting
  for some performance data to help finally justify sending the
  long-standing dm-crypt cpu scalability improvements upstream.

  Unfortunately we came up short, so those dm-crypt changes will
  continue to wait, but it seems we're not far off.

   . Add dm_accept_partial_bio interface to DM core to allow DM targets
     to only process a portion of a bio, the remainder being sent in the
     next bio.  This enables the old dm snapshot-origin target to only
     split write bios on chunk boundaries, read bios are now sent to the
     origin device unchanged.

   . Add DM core support for disabling WRITE SAME if the underlying SCSI
     layer disables it due to command failure.

   . Reduce lock contention in DM's bio-prison.

   . A few small cleanups and fixes to dm-thin and dm-era"

* tag 'dm-3.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm thin: update discard_granularity to reflect the thin-pool blocksize
  dm bio prison: implement per bucket locking in the dm_bio_prison hash table
  dm: remove symbol export for dm_set_device_limits
  dm: disable WRITE SAME if it fails
  dm era: check for a non-NULL metadata object before closing it
  dm thin: return ENOSPC instead of EIO when error_if_no_space enabled
  dm thin: cleanup noflush_work to use a proper completion
  dm snapshot: do not split read bios sent to snapshot-origin target
  dm snapshot: allocate a per-target structure for snapshot-origin target
  dm: introduce dm_accept_partial_bio
  dm: change sector_count member in clone_info from sector_t to unsigned
2014-06-12 13:33:29 -07:00
Mikulas Patocka
298eaa89b0 dm snapshot: do not split read bios sent to snapshot-origin target
Change the snapshot-origin target so that only write bios are split on
chunk boundary.  Read bios are passed unchanged to the underlying
device, so they don't have to be split.

Later, we could change the target so that it accepts a larger write bio
if it spans an area that is completely covered by snapshot exceptions.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-06-03 13:44:07 -04:00
Mikulas Patocka
599cdf3bfb dm snapshot: allocate a per-target structure for snapshot-origin target
Allocate a per-target dm_origin structure.  This is a prerequisite for
the next commit ("dm snapshot: do not split read bios sent to
snapshot-origin target") which adds a new member to this structure.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-06-03 13:44:07 -04:00
Peter Zijlstra
4e857c58ef arch: Mass conversion of smp_mb__*()
Mostly scripted conversion of the smp_mb__* barriers.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-18 14:20:48 +02:00
Linus Torvalds
f568849eda Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-block
Pull core block IO changes from Jens Axboe:
 "The major piece in here is the immutable bio_ve series from Kent, the
  rest is fairly minor.  It was supposed to go in last round, but
  various issues pushed it to this release instead.  The pull request
  contains:

   - Various smaller blk-mq fixes from different folks.  Nothing major
     here, just minor fixes and cleanups.

   - Fix for a memory leak in the error path in the block ioctl code
     from Christian Engelmayer.

   - Header export fix from CaiZhiyong.

   - Finally the immutable biovec changes from Kent Overstreet.  This
     enables some nice future work on making arbitrarily sized bios
     possible, and splitting more efficient.  Related fixes to immutable
     bio_vecs:

        - dm-cache immutable fixup from Mike Snitzer.
        - btrfs immutable fixup from Muthu Kumar.

  - bio-integrity fix from Nic Bellinger, which is also going to stable"

* 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits)
  xtensa: fixup simdisk driver to work with immutable bio_vecs
  block/blk-mq-cpu.c: use hotcpu_notifier()
  blk-mq: for_each_* macro correctness
  block: Fix memory leak in rw_copy_check_uvector() handling
  bio-integrity: Fix bio_integrity_verify segment start bug
  block: remove unrelated header files and export symbol
  blk-mq: uses page->list incorrectly
  blk-mq: use __smp_call_function_single directly
  btrfs: fix missing increment of bi_remaining
  Revert "block: Warn and free bio if bi_end_io is not set"
  block: Warn and free bio if bi_end_io is not set
  blk-mq: fix initializing request's start time
  block: blk-mq: don't export blk_mq_free_queue()
  block: blk-mq: make blk_sync_queue support mq
  block: blk-mq: support draining mq queue
  dm cache: increment bi_remaining when bi_end_io is restored
  block: fixup for generic bio chaining
  block: Really silence spurious compiler warnings
  block: Silence spurious compiler warnings
  block: Kill bio_pair_split()
  ...
2014-01-30 11:19:05 -08:00
Mikulas Patocka
119bc54736 dm snapshot: use GFP_KERNEL when initializing exceptions
The list of initial exceptions is loaded in the target constructor.  We
are allowed to allocate memory with GFP_KERNEL at this point.  So,
change alloc_completed_exception to use GFP_KERNEL when being called
from the constructor.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-01-14 11:18:16 -05:00
Jens Axboe
b28bc9b38c Linux 3.13-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJSwLfoAAoJEHm+PkMAQRiGi6QH/1U1B7lmHChDTw3jj1lfm9gA
 189Si4QJlnxFWCKHvKEL+pcaVuACU+aMGI8+KyMYK4/JfuWVjjj5fr/SvyHH2/8m
 LdSK8aHMhJ46uBS4WJ/l6v46qQa5e2vn8RKSBAyKm/h4vpt+hd6zJdoFrFai4th7
 k/TAwOAEHI5uzexUChwLlUBRTvbq4U8QUvDu+DeifC8cT63CGaaJ4qVzjOZrx1an
 eP6UXZrKDASZs7RU950i7xnFVDQu4PsjlZi25udsbeiKcZJgPqGgXz5ULf8ZH8RQ
 YCi1JOnTJRGGjyIOyLj7pyB01h7XiSM2+eMQ0S7g54F2s7gCJ58c2UwQX45vRWU=
 =/4/R
 -----END PGP SIGNATURE-----

Merge tag 'v3.13-rc6' into for-3.14/core

Needed to bring blk-mq uptodate, since changes have been going in
since for-3.14/core was established.

Fixup merge issues related to the immutable biovec changes.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflicts:
	block/blk-flush.c
	fs/btrfs/check-integrity.c
	fs/btrfs/extent_io.c
	fs/btrfs/scrub.c
	fs/logfs/dev_bdev.c
2013-12-31 09:51:02 -07:00
Mikulas Patocka
230c83afdd dm snapshot: avoid snapshot space leak on crash
There is a possible leak of snapshot space in case of crash.

The reason for space leaking is that chunks in the snapshot device are
allocated sequentially, but they are finished (and stored in the metadata)
out of order, depending on the order in which copying finished.

For example, supposed that the metadata contains the following records
SUPERBLOCK
METADATA (blocks 0 ... 250)
DATA 0
DATA 1
DATA 2
...
DATA 250

Now suppose that you allocate 10 new data blocks 251-260. Suppose that
copying of these blocks finish out of order (block 260 finished first
and the block 251 finished last). Now, the snapshot device looks like
this:
SUPERBLOCK
METADATA (blocks 0 ... 250, 260, 259, 258, 257, 256)
DATA 0
DATA 1
DATA 2
...
DATA 250
DATA 251
DATA 252
DATA 253
DATA 254
DATA 255
METADATA (blocks 255, 254, 253, 252, 251)
DATA 256
DATA 257
DATA 258
DATA 259
DATA 260

Now, if the machine crashes after writing the first metadata block but
before writing the second metadata block, the space for areas DATA 250-255
is leaked, it contains no valid data and it will never be used in the
future.

This patch makes dm-snapshot complete exceptions in the same order they
were allocated, thus fixing this bug.

Note: when backporting this patch to the stable kernel, change the version
field in the following way:
* if version in the stable kernel is {1, 11, 1}, change it to {1, 12, 0}
* if version in the stable kernel is {1, 10, 0} or {1, 10, 1}, change it
  to {1, 10, 2}
Userspace reads the version to determine if the bug was fixed, so the
version change is needed.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2013-12-10 16:34:25 -05:00
Kent Overstreet
196d38bccf block: Generic bio chaining
This adds a generic mechanism for chaining bio completions. This is
going to be used for a bio_split() replacement, and it turns out to be
very useful in a fair amount of driver code - a fair number of drivers
were implementing this in their own roundabout ways, often painfully.

Note that this means it's no longer to call bio_endio() more than once
on the same bio! This can cause problems for drivers that save/restore
bi_end_io. Arguably they shouldn't be saving/restoring bi_end_io at all
- in all but the simplest cases they'd be better off just cloning the
bio, and immutable biovecs is making bio cloning cheaper. But for now,
we add a bio_endio_nodec() for these cases.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
2013-11-23 22:33:56 -08:00
Kent Overstreet
4f024f3797 block: Abstract out bvec iterator
Immutable biovecs are going to require an explicit iterator. To
implement immutable bvecs, a later patch is going to add a bi_bvec_done
member to this struct; for now, this patch effectively just renames
things.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Ed L. Cashin" <ecashin@coraid.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Yehuda Sadeh <yehuda@inktank.com>
Cc: Sage Weil <sage@inktank.com>
Cc: Alex Elder <elder@inktank.com>
Cc: ceph-devel@vger.kernel.org
Cc: Joshua Morris <josh.h.morris@us.ibm.com>
Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux390@de.ibm.com
Cc: Boaz Harrosh <bharrosh@panasas.com>
Cc: Benny Halevy <bhalevy@tonian.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Dave Kleikamp <shaggy@kernel.org>
Cc: Joern Engel <joern@logfs.org>
Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Ben Myers <bpm@sgi.com>
Cc: xfs@oss.sgi.com
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Guo Chao <yan@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchand@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Peng Tao <tao.peng@emc.com>
Cc: Andy Adamson <andros@netapp.com>
Cc: fanchaoting <fanchaoting@cn.fujitsu.com>
Cc: Jie Liu <jeff.liu@oracle.com>
Cc: Sunil Mushran <sunil.mushran@gmail.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Namjae Jeon <namjae.jeon@samsung.com>
Cc: Pankaj Kumar <pankaj.km@samsung.com>
Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: Mel Gorman <mgorman@suse.de>6
2013-11-23 22:33:47 -08:00
Mikulas Patocka
60e356f381 dm-snapshot: fix performance degradation due to small hash size
LVM2, since version 2.02.96, creates origin with zero size, then loads
the snapshot driver and then loads the origin.  Consequently, the
snapshot driver sees the origin size zero and sets the hash size to the
lower bound 64.  Such small hash table causes performance degradation.

This patch changes it so that the hash size is determined by the size of
snapshot volume, not minimum of origin and snapshot size.  It doesn't
make sense to set the snapshot size significantly larger than the origin
size, so we do not need to take origin size into account when
calculating the hash size.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2013-09-20 10:36:34 -04:00
Wei Yongjun
09e8b81389 dm snapshot: fix error return code in snapshot_ctr
Return -ENOMEM instead of success if unable to allocate pending
exception mempool in snapshot_ctr.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: stable@vger.kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-05-10 14:37:15 +01:00
Mikulas Patocka
df5d2e9089 dm kcopyd: introduce configurable throttling
This patch allows the administrator to reduce the rate at which kcopyd
issues I/O.

Each module that uses kcopyd acquires a throttle parameter that can be
set in /sys/module/*/parameters.

We maintain a history of kcopyd usage by each module in the variables
io_period and total_period in struct dm_kcopyd_throttle. The actual
kcopyd activity is calculated as a percentage of time equal to
"(100 * io_period / total_period)".  This is compared with the user-defined
throttle percentage threshold and if it is exceeded, we sleep.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:49 +00:00
Mikulas Patocka
23cb21092e dm snapshot: add missing module aliases
Add module aliases so that autoloading works correctly if the user
tries to activate "snapshot-origin" or "snapshot-merge" targets.

Reference: https://bugzilla.redhat.com/889973

Reported-by: Chao Yang <chyang@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:47 +00:00
Alasdair G Kergon
55a62eef8d dm: rename request variables to bios
Use 'bio' in the name of variables and functions that deal with
bios rather than 'request' to avoid confusion with the normal
block layer use of 'request'.

No functional changes.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:47 +00:00
Mikulas Patocka
fd7c092e71 dm: fix truncated status strings
Avoid returning a truncated table or status string instead of setting
the DM_BUFFER_FULL_FLAG when the last target of a table fills the
buffer.

When processing a table or status request, the function retrieve_status
calls ti->type->status. If ti->type->status returns non-zero,
retrieve_status assumes that the buffer overflowed and sets
DM_BUFFER_FULL_FLAG.

However, targets don't return non-zero values from their status method
on overflow. Most targets returns always zero.

If a buffer overflow happens in a target that is not the last in the
table, it gets noticed during the next iteration of the loop in
retrieve_status; but if a buffer overflow happens in the last target, it
goes unnoticed and erroneously truncated data is returned.

In the current code, the targets behave in the following way:
* dm-crypt returns -ENOMEM if there is not enough space to store the
  key, but it returns 0 on all other overflows.
* dm-thin returns errors from the status method if a disk error happened.
  This is incorrect because retrieve_status doesn't check the error
  code, it assumes that all non-zero values mean buffer overflow.
* all the other targets always return 0.

This patch changes the ti->type->status function to return void (because
most targets don't use the return code). Overflow is detected in
retrieve_status: if the status method fills up the remaining space
completely, it is assumed that buffer overflow happened.

Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:44 +00:00
Sasha Levin
b67bfe0d42 hlist: drop the node parameter from iterators
I'm not sure why, but the hlist for each entry iterators were conceived

        list_for_each_entry(pos, head, member)

The hlist ones were greedy and wanted an extra parameter:

        hlist_for_each_entry(tpos, pos, head, member)

Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.

Besides the semantic patch, there was some manual work required:

 - Fix up the actual hlist iterators in linux/list.h
 - Fix up the declaration of other iterators based on the hlist ones.
 - A very small amount of places were using the 'node' parameter, this
 was modified to use 'obj->member' instead.
 - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
 properly, so those had to be fixed up manually.

The semantic patch which is mostly the work of Peter Senna Tschudin is here:

@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

type T;
expression a,c,d,e;
identifier b;
statement S;
@@

-T b;
    <+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
    ...+>

[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 19:10:24 -08:00
Mikulas Patocka
7de3ee57da dm: remove map_info
This patch removes map_info from bio-based device mapper targets.
map_info is still used for request-based targets.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:41 +00:00
Mikulas Patocka
ee18026ac6 dm snapshot: do not use map_context
Eliminate struct map_info from dm-snap.

map_info->ptr was used in dm-snap to indicate if the bio was tracked.
If map_info->ptr was non-NULL, the bio was linked in tracked_chunk_hash.

This patch removes the use of map_info->ptr. We determine if the bio was
tracked based on hlist_unhashed(&c->node). If hlist_unhashed is true,
the bio is not tracked, if it is false, the bio is tracked.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:41 +00:00
Mikulas Patocka
ddbd658f64 dm: move target request nr to dm_target_io
This patch moves target_request_nr from map_info to dm_target_io and
makes it accessible with dm_bio_get_target_request_nr.

This patch is a preparation for the next patch that removes map_info.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:39 +00:00
Mikulas Patocka
42bc954f2a dm snapshot: use per_bio_data
Replace tracked_chunk_pool with per_bio_data in dm-snap.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:38 +00:00
Mikulas Patocka
9aa0c0e60f dm snapshot: optimize track_chunk
track_chunk is always called with interrupts enabled. Consequently, we
do not need to save and restore interrupt state in "flags" variable.
This patch changes spin_lock_irqsave to spin_lock_irq and
spin_unlock_irqrestore to spin_unlock_irq.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:33 +00:00
Alasdair G Kergon
1f4e0ff079 dm thin: commit before gathering status
Commit outstanding metadata before returning the status for a dm thin
pool so that the numbers reported are as up-to-date as possible.

The commit is not performed if the device is suspended or if
the DM_NOFLUSH_FLAG is supplied by userspace and passed to the target
through a new 'status_flags' parameter in the target's dm_status_fn.

The userspace dmsetup tool will support the --noflush flag with the
'dmsetup status' and 'dmsetup wait' commands from version 1.02.76
onwards.

Tested-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-27 15:08:16 +01:00
Mike Snitzer
542f903814 dm: support non power of two target max_io_len
Remove the restriction that limits a target's specified maximum incoming
I/O size to be a power of 2.

Rename this setting from 'split_io' to the less-ambiguous 'max_io_len'.
Change it from sector_t to uint32_t, which is plenty big enough, and
introduce a wrapper function dm_set_target_max_io_len() to set it.
Use sector_div() to process it now that it is not necessarily a power of 2.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-27 15:08:00 +01:00
Alasdair G Kergon
70c4861102 dm snapshot: remove redundant assignment in merge fn
Remove redundant bvm->bi_sector self-assignment in dm snapshot's
origin_merge().

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-27 15:07:59 +01:00
Mikulas Patocka
a6e50b409d dm snapshot: skip reading origin when overwriting complete chunk
If we write a full chunk in the snapshot, skip reading the origin device
because the whole chunk will be overwritten anyway.

This patch changes the snapshot write logic when a full chunk is written.
In this case:
  1. allocate the exception
  2. dispatch the bio (but don't report the bio completion to device mapper)
  3. write the exception record
  4. report bio completed

Callbacks must be done through the kcopyd thread, because callbacks must not
race with each other.  So we create two new functions:

  dm_kcopyd_prepare_callback: allocate a job structure and prepare the callback.
  (This function must not be called from interrupt context.)

  dm_kcopyd_do_callback: submit callback.
  (This function may be called from interrupt context.)

Performance test (on snapshots with 4k chunk size):
  without the patch:
    non-direct-io sequential write (dd):    17.7MB/s
    direct-io sequential write (dd):        20.9MB/s
    non-direct-io random write (mkfs.ext2): 0.44s

  with the patch:
    non-direct-io sequential write (dd):    26.5MB/s
    direct-io sequential write (dd):        33.2MB/s
    non-direct-io random write (mkfs.ext2): 0.27s

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02 12:32:04 +01:00
Jonathan Brassow
a2d2b0345a dm snapshot: style cleanups
Coding style cleanups.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
2011-08-02 12:32:03 +01:00
Mikulas Patocka
aa3f0794d2 dm snapshot: remove unused definitions
Remove a couple of unused #defines.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02 12:32:03 +01:00
Mikulas Patocka
fa34ce7307 dm kcopyd: return client directly and not through a pointer
Return client directly from dm_kcopyd_client_create, not through a
parameter, making it consistent with dm_io_client_create.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-05-29 13:03:13 +01:00
Mikulas Patocka
5f43ba2950 dm kcopyd: reserve fewer pages
Reserve just the minimum of pages needed to process one job.

Because we allocate pages from page allocator, we don't need to reserve
a large number of pages.  The maximum job size is SUB_JOB_SIZE and we
calculate the number of reserved pages based on this.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-05-29 13:03:11 +01:00
Milan Broz
024d37e95e dm: fix opening log and cow devices for read only tables
If a table is read-only, also open any log and cow devices it uses read-only.

Previously, even read-only devices were opened read-write internally.
After patch 75f1dc0d07
  block: check bdev_read_only() from blkdev_get()
was applied, loading such tables began to fail.  The patch
was reverted by e51900f7d3
  block: revert block_dev read-only check
but this patch fixes this part of the code to work with the original patch.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-03-24 13:52:14 +00:00
Mike Snitzer
b83b2f295a dm snapshot: avoid storing private suspended state
Use dm_suspended() rather than having each snapshot target maintain a
private 'suspended' flag in struct dm_snapshot.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-01-13 19:59:59 +00:00
Tejun Heo
fecec20e55 dm snapshot: remove unused dm_snapshot queued_bios_work
dm_snapshot->queued_bios_work isn't used.  Remove ->queued_bios[_work]
from dm_snapshot structure, the flush_queued_bios work function and
ksnapd workqueue.

The DM snapshot changes that were going to use the ksnapd workqueue were
either superseded (fix for origin write races) or never completed
(deallocation of invalid snapshot's memory via workqueue).

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-01-13 19:59:56 +00:00
Linus Torvalds
a2887097f2 Merge branch 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block: (46 commits)
  xen-blkfront: disable barrier/flush write support
  Added blk-lib.c and blk-barrier.c was renamed to blk-flush.c
  block: remove BLKDEV_IFL_WAIT
  aic7xxx_old: removed unused 'req' variable
  block: remove the BH_Eopnotsupp flag
  block: remove the BLKDEV_IFL_BARRIER flag
  block: remove the WRITE_BARRIER flag
  swap: do not send discards as barriers
  fat: do not send discards as barriers
  ext4: do not send discards as barriers
  jbd2: replace barriers with explicit flush / FUA usage
  jbd2: Modify ASYNC_COMMIT code to not rely on queue draining on barrier
  jbd: replace barriers with explicit flush / FUA usage
  nilfs2: replace barriers with explicit flush / FUA usage
  reiserfs: replace barriers with explicit flush / FUA usage
  gfs2: replace barriers with explicit flush / FUA usage
  btrfs: replace barriers with explicit flush / FUA usage
  xfs: replace barriers with explicit flush / FUA usage
  block: pass gfp_mask and flags to sb_issue_discard
  dm: convey that all flushes are processed as empty
  ...
2010-10-22 17:07:18 -07:00
Martin K. Petersen
c8bf133682 Consolidate min_not_zero
We have several users of min_not_zero, each of them using their own
definition.  Move the define to kernel.h.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@carl.home.kernel.dk>
2010-09-10 20:07:38 +02:00
Tejun Heo
d87f4c14f2 dm: implement REQ_FLUSH/FUA support for bio-based dm
This patch converts bio-based dm to support REQ_FLUSH/FUA instead of
now deprecated REQ_HARDBARRIER.

* -EOPNOTSUPP handling logic dropped.

* Preflush is handled as before but postflush is dropped and replaced
  with passing down REQ_FUA to member request_queues.  This replaces
  one array wide cache flush w/ member specific FUA writes.

* __split_and_process_bio() now calls __clone_and_map_flush() directly
  for flushes and guarantees all FLUSH bio's going to targets are zero
`  length.

* It's now guaranteed that all FLUSH bio's which are passed onto dm
  targets are zero length.  bio_empty_barrier() tests are replaced
  with REQ_FLUSH tests.

* Empty WRITE_BARRIERs are replaced with WRITE_FLUSHes.

* Dropped unlikely() around REQ_FLUSH tests.  Flushes are not unlikely
  enough to be marked with unlikely().

* Block layer now filters out REQ_FLUSH/FUA bio's if the request_queue
  doesn't support cache flushing.  Advertise REQ_FLUSH | REQ_FUA
  capability.

* Request based dm isn't converted yet.  dm_init_request_based_queue()
  resets flush support to 0 for now.  To avoid disturbing request
  based dm code, dm->flush_error is added for bio based dm while
  requested based dm continues to use dm->barrier_error.

Lightly tested linear, stripe, raid1, snap and crypt targets.  Please
proceed with caution as I'm not familiar with the code base.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: dm-devel@redhat.com
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10 12:35:38 +02:00
Mike Snitzer
57cba5d365 dm: rename map_info flush_request to target_request_nr
'target_request_nr' is a more generic name that reflects the fact that
it will be used for both flush and discard support.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2010-08-12 04:14:04 +01:00
Mikulas Patocka
b1d5552838 dm snapshot: implement merge
Implement merge method for the snapshot origin to improve read
performance.

Without merge method, dm asks the upper layers to submit smallest possible
bios --- one page. Submitting such small bios impacts performance negatively
when reading or writing the origin device.

Without this patch, CPU consumption when reading the origin on lvm on md-raid0
was 6 to 12%, with this patch, it drops to 1 to 4%.

Note: in my testing, it actually degraded performance in some settings, I
traced it to Maxtor disks having problems with > 512-sector requests.
Reducing the number of sectors to /sys/block/sd*/queue/max_sectors_kb to
256 fixed the read performance. I think we don't have to care about weird
disks that actually degrade performance because of large requests being
sent to them.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2010-08-12 04:14:02 +01:00
Mikulas Patocka
c241104506 dm snapshot: test chunk size against both origin and snapshot
Validate chunk size against both origin and snapshot sector size

Don't allow chunk size smaller than either origin or snapshot logical
sector size. Reading or writing data not aligned to sector size is not
allowed and causes immediate errors.

This requires us to open the origin before initialising the
exception store and to export dm_snap_origin.

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2010-08-12 04:13:51 +01:00
Mikulas Patocka
1e5554c842 dm snapshot: iterate origin and cow devices
Iterate both origin and snapshot devices

iterate_devices method should call the callback for all the devices where
the bio may be remapped. Thus, snapshot_iterate_devices should call the callback
for both snapshot and origin underlying devices because it remaps some bios
to the snapshot and some to the origin.

snapshot_iterate_devices called the callback only for the origin device.
This led to badly calculated device limits if snapshot and origin were placed
on different types of disks.

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2010-08-12 04:13:50 +01:00
Mike Snitzer
924e600d41 dm: eliminate some holes data structures
Eliminate a 4-byte hole in 'struct dm_io_memory' by moving 'offset' above the
'ptr' to which it applies (size reduced from 24 to 16 bytes).  And by
association, 1-4 byte hole is eliminated in 'struct dm_io_request' (size
reduced from 56 to 48 bytes).

Eliminate all 6 4-byte holes and 1 cache-line in 'struct dm_snapshot' (size
reduced from 392 to 368 bytes).

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2010-03-06 02:32:33 +00:00
Nikanth Karthikesan
8215d6ec5f dm table: remove unused dm_get_device range parameters
Remove unused parameters(start and len) of dm_get_device()
and fix the callers.

Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2010-03-06 02:32:27 +00:00
Mikulas Patocka
d2fdb776e0 dm snapshot: use merge origin if snapshot invalid
If the snapshot we are merging became invalid (e.g. it ran out of
space) redirect all I/O directly to the origin device.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-12-10 23:52:36 +00:00
Mike Snitzer
d8ddb1cfff dm snapshot: report merge failure in status
Set 'merge_failed' flag if a snapshot fails to merge.  Update
snapshot_status() to report "Merge failed" if 'merge_failed' is set.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-12-10 23:52:35 +00:00
Mike Snitzer
8a2d528620 dm snapshot: merge consecutive chunks together
s->store->type->prepare_merge returns the number of chunks that can be
copied linearly working backwards from the returned chunk number.

For example, if it returns 3 chunks with old_chunk == 10 and new_chunk
== 20, then chunk 20 can be copied to 10, chunk 19 to 9 and 18 to 8.

Until now kcopyd only copied one chunk at a time.  This patch now copies
the full set at once.

Consequently, snapshot_merge_process() needs to delay the merging of all
chunks if any have writes in progress, not just the first chunk in the
region that is to be merged.

snapshot-merge's performance is now comparable to the original
snapshot-origin target.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-12-10 23:52:34 +00:00
Mikulas Patocka
73dfd078cf dm snapshot: trigger exceptions in remaining snapshots during merge
When there is one merging snapshot and other non-merging snapshots,
snapshot_merge_process() must make exceptions in the non-merging
snapshots.

Use a sequence count to resolve the race between I/O to chunks that are
about to be merged.  The count increases each time an exception
reallocation finishes.  Use wait_event() to wait until the count
changes.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-12-10 23:52:34 +00:00
Mikulas Patocka
17aa03326d dm snapshot: delay merging a chunk until writes to it complete
Track writes to chunks that are currently being merged and delay merging
a chunk until all writes to that chunk finish.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-12-10 23:52:33 +00:00
Mikulas Patocka
9fe8625488 dm snapshot: queue writes to chunks being merged
While a set of chunks is being merged, any overlapping writes need to be
queued.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-12-10 23:52:33 +00:00
Mikulas Patocka
1e03f97e43 dm snapshot: add merging
Merging is started when origin is resumed and it is stopped when
origin is suspended or when the merging snapshot is destroyed or
errors are detected.

Merging is not yet interlocked with writes: this will be handled in
subsequent patches.

The code relies on callbacks from a private kcopyd thread.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-12-10 23:52:32 +00:00
Mikulas Patocka
9d3b15c4c7 dm snapshot: permit only one merge at once
Merging more than one snapshot is not supported, so prevent
this happening.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-12-10 23:52:32 +00:00
Mike Snitzer
10b8106a70 dm snapshot: support barriers in snapshot merge target
Sets num_flush_requests=2 to support flushing both the origin and cow
devices used by the snapshot-merge target.

Also, snapshot_ctr() now gets the origin device using FMODE_WRITE if the
target is snapshot-merge (which writes to the origin device).

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-12-10 23:52:31 +00:00
Mikulas Patocka
3452c2a1eb dm snapshot: avoid allocating exceptions in merge
The snapshot-merge target should not allocate new exceptions because the
intent is to merge all of its exceptions as quickly and safely as
possible.

This patch introduces the snapshot-merge mapping function and updates
__origin_write() so that it doesn't allocate exceptions on any snapshots
that are being merged.

If a write request to a merging snapshot device is to be dispatched
directly to the origin (because the chunk is not remapped or was already
merged), snapshot_merge_map() must make exceptions in other snapshots so
calls do_origin().

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-12-10 23:52:31 +00:00
Mikulas Patocka
515ad66cc4 dm snapshot: rework writing to origin
To track the completion of exceptions relating to the same location on
the device, the current code selects one exception as primary_pe, links
the other exceptions to it and uses reference counting to wait until all
the reallocations are complete.

It is considered too complicated to extend this code to handle the new
snapshot-merge target, where sets of non-overlapping chunks would also
need to become linked.

Instead, a simpler (but less efficient) approach is taken.  Bios are
linked to one exception.  When it completes, bios are simply retried,
and if other related exceptions are still outstanding, they'll get
queued again to wait for another one.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-12-10 23:52:30 +00:00
Mikulas Patocka
d698aa4500 dm snapshot: add merge target
The snapshot-merge target allows a snapshot to be merged back into the
snapshot's origin device.

One anticipated use of snapshot merging is the rollback of filesystems
to back out problematic system upgrades.

This patch adds snapshot-merge target management to both
dm_snapshot_init() and dm_snapshot_exit().  As an initial place-holder,
snapshot-merge is identical to the snapshot target.  Documentation is
provided.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-12-10 23:52:30 +00:00
Mike Snitzer
615d1eb9ca dm snapshot: create function for chunk_is_tracked wait
Move the __chunk_is_tracked() loop into a separate function as we will
also need to call it from the write path in the rare case of conflicting
writes to the same chunk.

Originally introduced in commit a8d41b59f3
("dm snapshot: fix race during exception creation").

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-12-10 23:52:29 +00:00
Mikulas Patocka
9eaae8ffbc dm snapshot: make bio optional in __origin_write
To support the merging of snapshots back into their origin we need
to trigger exceptions in other snapshots not being merged without
any incoming bio on the origin device.  The bio parameter to
__origin_write() becomes optional and the sector needs supplying
separately.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-12-10 23:52:28 +00:00
Mike Snitzer
c1f0c183f6 dm snapshot: allow live exception store handover between tables
Permit in-use snapshot exception data to be 'handed over' from one
snapshot instance to another.  This is a pre-requisite for patches
that allow the changes made in a snapshot device to be merged back into
its origin device and also allows device resizing.

The basic call sequence is:

  dmsetup load new_snapshot (referencing the existing in-use cow device)
     - the ctr code detects that the cow is already in use and allows the
       two snapshot target instances to be linked together
  dmsetup suspend original_snapshot
  dmsetup resume new_snapshot
     - the new_snapshot becomes live, and if anything now tries to access
       the original one it will receive -EIO
  dmsetup remove original_snapshot

(There can only be two snapshot targets referencing the same cow device
simultaneously.)

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-12-10 23:52:24 +00:00
Mike Snitzer
c26655ca3c dm snapshot: track suspended state in target
Keep track of whether or not the device is suspended within the snapshot
target module, the same as we do in dm-raid1.

We will use this later to enforce the correct sequence of ioctls to
transfer the in-core exceptions from a snapshot target instance in
one table to a replacement one capable of merging them back
into the origin.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-12-10 23:52:12 +00:00
Mike Snitzer
fc56f6fbcc dm snapshot: move cow ref from exception store to snap core
Store the reference to the snapshot cow device in the core snapshot
code instead of each exception store.  It can be accessed through the
new function dm_snap_cow().  Exception stores should each now maintain a
reference to their parent snapshot struct.

This is cleaner and makes part of the forthcoming snapshot merge code simpler.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Reviewed-by: Jonathan Brassow <jbrassow@redhat.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>
2009-12-10 23:52:12 +00:00
Mike Snitzer
985903bb3a dm snapshot: add allocated metadata to snapshot status
Add number of sectors used by metadata to the end of the snapshot's status
line.

Renamed dm_exception_store_type's 'fraction_full' to 'usage'.  Renamed
arguments to be clearer about what is being returned.  Also added
'metadata_sectors'.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-12-10 23:52:11 +00:00
Jon Brassow
3510cb94ff dm snapshot: rename exception functions
Rename exception functions.  Preparing to pull them out of
dm-snap.c for broader use.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-12-10 23:52:11 +00:00
Jon Brassow
191437a53c dm snapshot: rename exception_table to dm_exception_table
Rename exception_table for broader use outside dm-snap.c

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-12-10 23:52:10 +00:00
Jon Brassow
1d4989c858 dm snapshot: rename dm_snap_exception to dm_exception
The exception structure is not necessarily just a snapshot
element (especially after we pull it out of dm-snap.c).

Renaming appropriately.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-12-10 23:52:10 +00:00
Jon Brassow
d32a6ea65f dm snapshot: consolidate insert exception functions
Consolidate the insert_*exception functions.  'insert_completed_exception'
already contains all the logic to handle 'insert_exception' (via
check for a hash_shift of 0), so remove redundant function.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-12-10 23:52:09 +00:00
Mikulas Patocka
7e201b3513 dm snapshot: abstract minimum_chunk_size fn
The origin needs to find minimum chunksize of all snapshots.  This logic is
moved to a separate function because it will be used at another place in
the snapshot merge patches.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-12-10 23:52:08 +00:00
Mikulas Patocka
8e87b9b81b dm snapshot: cope with chunk size larger than origin
Under some special conditions the snapshot hash_size is calculated as zero.
This patch instead sets a minimum value of 64, the same as for the
pending exception table.

rounddown_pow_of_two(0) is an undefined operation (it expands to shift
by -1).  init_exception_table with an argument of 0 would fail with -ENOMEM.

The way to trigger the problem is to create a snapshot with a chunk size
that is larger than the origin device.

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-12-10 23:51:54 +00:00
Mikulas Patocka
94e76572b5 dm snapshot: only take lock for statustype info not table
Take snapshot lock only for STATUSTYPE_INFO, not STATUSTYPE_TABLE.

Commit 4c6fff445d
(dm-snapshot-lock-snapshot-while-supplying-status.patch)
introduced this use of the lock, but userspace applications using
libdevmapper have been found to request STATUSTYPE_TABLE while the device
is suspended and the lock is already held, leading to deadlock.  Since
the lock is not necessary in this case, don't try to take it.

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-12-10 23:51:53 +00:00
Mikulas Patocka
df96eee679 dm snapshot: use unsigned integer chunk size
Use unsigned integer chunk size.

Maximum chunk size is 512kB, there won't ever be need to use 4GB chunk size,
so the number can be 32-bit. This fixes compiler failure on 32-bit systems
with large block devices.

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-10-16 23:18:17 +01:00
Mikulas Patocka
4c6fff445d dm snapshot: lock snapshot while supplying status
This patch locks the snapshot when returning status.  It fixes a race
when it could return an invalid number of free chunks if someone
was simultaneously modifying it.

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-10-16 23:18:16 +01:00
Mikulas Patocka
3f2412dc85 dm snapshot: require non zero chunk size by end of ctr
If we are creating snapshot with memory-stored exception store, fail if
the user didn't specify chunk size. Zero chunk size would probably crash
a lot of places in the rest of snapshot code.

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Jonathan Brassow <jbrassow@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-10-16 23:18:16 +01:00
Jonathan Brassow
034a186d29 dm snapshot: free exception store on init failure
While initializing the snapshot module, if we fail to register
the snapshot target then we must back-out the exception store
module initialization.

Cc: stable@kernel.org
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-10-16 23:18:14 +01:00
Mikulas Patocka
6d45d93ead dm snapshot: sort by chunk size to fix race
Avoid a race causing corruption when snapshots of the same origin have
different chunk sizes by sorting the internal list of snapshots by chunk
size, largest first.
  https://bugzilla.redhat.com/show_bug.cgi?id=182659

For example, let's have two snapshots with different chunk sizes. The
first snapshot (1) has small chunk size and the second snapshot (2) has
large chunk size.  Let's have chunks A, B, C in these snapshots:
snapshot1: ====A====   ====B====
snapshot2: ==========C==========

(Chunk size is a power of 2. Chunks are aligned.)

A write to the origin at a position within A and C comes along. It
triggers reallocation of A, then reallocation of C and links them
together using A as the 'primary' exception.

Then another write to the origin comes along at a position within B and
C.  It creates pending exception for B.  C already has a reallocation in
progress and it already has a primary exception (A), so nothing is done
to it: B and C are not linked.

If the reallocation of B finishes before the reallocation of C, because
there is no link with the pending exception for C it does not know to
wait for it and, the second write is dispatched to the origin and causes
data corruption in the chunk C in snapshot2.

To avoid this situation, we maintain snapshots sorted in descending
order of chunk size.  This leads to a guaranteed ordering on the links
between the pending exceptions and avoids the problem explained above -
both A and B now get linked to C.

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-10-16 23:18:14 +01:00
Mike Snitzer
8811f46c1f dm snapshot: implement iterate devices
Implement the .iterate_devices for the origin and snapshot targets.
dm-snapshot's lack of .iterate_devices resulted in the inability to
properly establish queue_limits for both targets.

With 4K sector drives: an unfortunate side-effect of not establishing
proper limits in either targets' DM device was that IO to the devices
would fail even though both had been created without error.

Commit af4874e03e ("dm target:s introduce
iterate devices fn") in 2.6.31-rc1 should have implemented .iterate_devices
for dm-snap.c's origin and snapshot targets.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-09-04 20:40:19 +01:00
Mikulas Patocka
494b3ee7d4 dm snapshot: support barriers
Flush support for dm-snapshot target.

This patch just forwards the flush request to either the origin or the snapshot
device.  (It doesn't flush exception store metadata.)

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:25 +01:00
Christoph Hellwig
8f3d8ba20e block: move bio list helpers into bio.h
It's used by DM and MD and generally useful, so move the bio list
helpers into bio.h.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-04-15 08:28:09 +02:00
Jonathan Brassow
1e302a929e dm snapshot: move status to exception store
Let the exception store types print out their status through
the new API, rather than having the snapshot code do it.

Adjust the buffer position to allow for the preceding DMEMIT in the
arguments to type->status().

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:35 +01:00
Jonathan Brassow
fee1998e9c dm snapshot: move ctr parsing to exception store
First step of having the exception stores parse their own arguments -
generalizing the interface.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:34 +01:00
Jonathan Brassow
2e4a31df2b dm snapshot: use DMEMIT macro for status
Use DMEMIT in place of snprintf.  This makes it easier later when
other modules are helping to populate our status output.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:34 +01:00
Jonathan Brassow
ccc45ea8ae dm snapshot: remove dm_snap header
Move some of the last bits from dm-snap.h into dm-snap.c where they
belong and remove dm-snap.h.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:34 +01:00
Jonathan Brassow
71fab00a6b dm snapshot: remove dm_snap header use
Move useful functions out of dm-snap.h and stop using dm-snap.h.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:33 +01:00
Jonathan Brassow
49beb2b87a dm exception store: move cow pointer
Move COW device from snapshot to exception store.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:33 +01:00
Jonathan Brassow
d021684951 dm exception store: move chunk_fields
Move chunk fields from snapshot to exception store.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:32 +01:00
Jonathan Brassow
0cea9c7827 dm exception store: move dm_target pointer
Move target pointer from snapshot to exception store.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:32 +01:00
Jonathan Brassow
493df71c64 dm exception store: introduce registry
Move exception stores into a registry.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:31 +01:00
Jonathan Brassow
b2a1146529 dm exception store: separate type from instance
Introduce struct dm_exception_store_type.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:30 +01:00
Mikulas Patocka
35bf659b00 dm snapshot: avoid having two exceptions for the same chunk
We need to check if the exception was completed after dropping the lock.

After regaining the lock, __find_pending_exception checks if the exception
was already placed into &s->pending hash.

But we don't check if the exception was already completed and placed into
&s->complete hash. If the process waiting in alloc_pending_exception was
delayed at this point because of a scheduling latency and the exception
was meanwhile completed, we'd miss that and allocate another pending
exception for already completed chunk.

It would lead to a situation where two records for the same chunk exist
and potential data corruption because multiple snapshot I/Os to the
affected chunk could be redirected to different locations in the
snapshot.

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:26 +01:00
Mikulas Patocka
c66213921c dm snapshot: avoid dropping lock in __find_pending_exception
It is uncommon and bug-prone to drop a lock in a function that is called with
the lock held, so this is moved to the caller.

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:25 +01:00
Mikulas Patocka
2913808eb5 dm snapshot: refactor __find_pending_exception
Move looking-up of a pending exception from __find_pending_exception to another
function.

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:25 +01:00
Jonathan Brassow
a159c1ac5f dm snapshot: extend exception store functions
Supply dm_add_exception as a callback to the read_metadata function.
Add a status function ready for a later patch and name the functions
consistently.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06 03:05:19 +00:00
Alasdair G Kergon
4db6bfe02b dm snapshot: split out exception store implementations
Move the existing snapshot exception store implementations out into
separate files.  Later patches will place these behind a new
interface in preparation for alternative implementations.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06 03:05:17 +00:00
Jonathan Brassow
aea53d92f7 dm snapshot: separate out exception store interface
Pull structures that bridge the gap between snapshot and
exception store out of dm-snap.h and put them in a new
.h file - dm-exception-store.h.  This file will define the
API for new exception stores.

Ultimately, dm-snap.h is unnecessary, since only dm-snap.c
should be using it.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06 03:05:15 +00:00
Mikulas Patocka
10d3bd09a3 dm: consolidate target deregistration error handling
Change dm_unregister_target to return void and use BUG() for error
reporting.

dm_unregister_target can only fail because of programming bug in the
target driver. It can't fail because of user's behavior or disk errors.

This patch changes unregister_target to return void and use BUG if
someone tries to unregister non-registered target or unregister target
that is in use.

This patch removes code duplication (testing of error codes in all dm
targets) and reports bugs in just one place, in dm_unregister_target. In
some target drivers, these return codes were ignored, which could lead
to a situation where bugs could be missed.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06 03:04:58 +00:00
Mikulas Patocka
90fa1527bd dm snapshot: change yield to msleep
Change yield() to msleep(1). If the thread had realtime priority,
yield() doesn't really yield, so the yielding process would loop
indefinitely and cause machine lockup.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06 03:04:54 +00:00
Mikulas Patocka
879129d208 dm snapshot: wait for chunks in destructor
If there are several snapshots sharing an origin and one is removed
while the origin is being written to, the snapshot's mempool may get
deleted while elements are still referenced.

Prior to dm-snapshot-use-per-device-mempools.patch the pending
exceptions may still have been referenced after the snapshot was
destroyed, but this was not a problem because the shared mempool
was still there.

This patch fixes the problem by tracking the number of mempool elements
in use.

The scenario:
- You have an origin and two snapshots 1 and 2.
- Someone writes to the origin.
- It creates two exceptions in the snapshots, snapshot 1 will be primary
exception, snapshot 2's pending_exception->primary_pe will point to the
exception in snapshot 1.
- The exceptions are being relocated, relocation of exception 1 finishes
(but it's pending_exception is still allocated, because it is referenced
by an exception from snapshot 2)
- The user lvremoves snapshot 1 --- it calls just suspend (does nothing)
and destructor. md->pending is zero (there is no I/O submitted to the
snapshot by md layer), so it won't help us.
- The destructor waits for kcopyd jobs to finish on snapshot 1 --- but
there are none.
- The destructor on snapshot 1 cleans up everything.
- The relocation of exception on snapshot 2 finishes, it drops reference
on primary_pe. This frees its primary_pe pointer. Primary_pe points to
pending exception created for snapshot 1. So it frees memory into
non-existing mempool.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2008-10-30 13:33:16 +00:00
Mikulas Patocka
60c856c8e2 dm snapshot: fix register_snapshot deadlock
register_snapshot() performs a GFP_KERNEL allocation while holding
_origins_lock for write, but that could write out dirty pages onto a
device that attempts to acquire _origins_lock for read, resulting in
deadlock.

So move the allocation up before taking the lock.

This path is not performance-critical, so it doesn't matter that we
allocate memory and free it if we find that we won't need it.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2008-10-30 13:33:12 +00:00
Mikulas Patocka
f68d4f3d39 dm snapshot: drop unused last_percent
The last_percent field is unused - remove it.
(It dates from when events were triggered as each X% filled up.)

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2008-10-21 17:44:53 +01:00