Commit graph

138 commits

Author SHA1 Message Date
Philipp Reisner
599377acb7 drbd: Avoid NetworkFailure state during disconnect
Disconnecting is a cluster wide state change. In case the peer node agrees
to the state transition, it sends back the fact on the meta-data connection
and closes both sockets.

In case the node node that initiated the state transfer sees the closing
action on the data-socket, before the P_STATE_CHG_REPLY packet, it was
going into one of the network failure states.

At least with the fencing option set to something else thatn "dont-care",
the unclean shutdown of the connection causes a short IO freeze or
a fence operation.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-10-30 08:39:17 +01:00
Lars Ellenberg
02b91b5526 drbd: introduce stop-sector to online verify
We now can schedule only a specific range of sectors for online verify,
or interrupt a running verify without interrupting the connection.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-10-30 08:39:01 +01:00
Philipp Reisner
9f2247bb9b drbd: Protect accesses to the uuid set with a spinlock
There is at least the worker context, the receiver context, the context of
receiving netlink packts and processes reading a sysfs attribute that access
the uuids.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-10-30 08:39:01 +01:00
Philipp Reisner
d1aa4d04da drbd: Write all pages of the bitmap after an online resize
We need to write the whole bitmap after we moved the meta data
due to an online resize operation.

With the support for one peta byte devices bitmap IO was optimized
to only write out touched pages. This optimization must be turned
off when writing the bitmap after an online resize.

This issue was introduced with drbd-8.3.10.

The impact of this bug is that after an online resize, the next
resync could become larger than expected.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-08-16 17:17:35 +02:00
Lars Ellenberg
db141b2f42 drbd: fix max_bio_size to be unsigned
We capped our max_bio_size respectively max_hw_sectors with
min_t(int, lower level limit, our limit);
unfortunately, some drivers, e.g. the kvm virtio block driver, initialize their
limits to "-1U", and that is of course a smaller "int" value than our limit.

Impact: we started to request 16 MB resync requests,
which lead to protocol error and a reconnect loop.

Fix all relevant constants and parameters to be unsigned int.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-07-24 15:14:00 +02:00
Lars Ellenberg
7ee1fb93f3 drbd: flush drbd work queue before invalidate/invalidate remote
If you do back to back wait-sync/invalidate on a Primary in a tight loop,
during application IO load, you could trigger a race:
  kernel: block drbd6: FIXME going to queue 'set_n_write from StartingSync'
	but 'write from resync_finished' still pending?

Fix this by changing the order of the drbd_queue_work() and
the wake_up() in dec_ap_pending(), and adding the additional
drbd_flush_workqueue() before requesting the full sync.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-07-24 14:15:58 +02:00
Lars Ellenberg
c2ba686f35 drbd: report congestion if we are waiting for some userland callback
If the drbd worker thread is synchronously waiting for some userland
callback, we don't want some casual pageout to block on us.
Have drbd_congested() report congestion in that case.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-07-24 14:07:18 +02:00
Lars Ellenberg
383606e0de drbd: differentiate between normal and forced detach
Aborting local requests (not waiting for completion from the lower level
disk) is dangerous: if the master bio has been completed to upper
layers, data pages may be re-used for other things already.
If local IO is still pending and later completes,
this may cause crashes or corrupt unrelated data.

Only abort local IO if explicitly requested.
Intended use case is a lower level device that turned into a tarpit,
not completing io requests, not even doing error completion.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-07-24 14:06:18 +02:00
Lars Ellenberg
d264580145 drbd: cleanup, remove two unused global flags
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-07-24 14:02:41 +02:00
Lars Ellenberg
9476f39d66 drbd: introduce a bio_set to allocate housekeeping bios from
Don't rely on availability of bios from the global fs_bio_set,
we should use our own bio_set for meta data IO.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-05-09 15:17:07 +02:00
Lars Ellenberg
4281808fb3 drbd: add page pool to be used for meta data IO
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-05-09 15:17:02 +02:00
Lars Ellenberg
0e8488ade2 drbd: allow bitmap to change during writeout from resync_finished
Symptom: messages similar to
 "FIXME asender in bm_change_bits_to,
  bitmap locked for 'write from resync_finished' by worker"

If a resync or verify is finished (or aborted), a full bitmap writeout
is triggered.  If we have ongoing local IO, the bitmap may still change
during that writeout, pending and not yet processed acks may cause bits
to be cleared, while new writes may cause bits to be to be set.

To fix this, introduce the drbd_bm_write_copy_pages() variant.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-05-09 15:17:00 +02:00
Lars Ellenberg
ba280c092e drbd: fix resend/resubmit of frozen IO
DRBD can freeze IO, due to fencing policy (fencing resource-and-stonith),
or because we lost access to data (on-no-data-accessible suspend-io).

Resuming from there (re-connect, or re-attach, or explicit admin
intervention) should "just work".

Unfortunately, if the re-attach/re-connect did not happen within
the timeout, since the commit
  drbd: Implemented real timeout checking for request processing time
if so configured, the request_timer_fn() would timeout and
detach/disconnect virtually immediately.

This change tracks the most recent attach and connect, and does not
timeout within <configured timeout interval> after attach/connect.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-05-09 15:16:58 +02:00
Philipp Reisner
197296ffed drbd: Delay/reject other state changes while establishing a connection
Changes to the role and disk state should be delayed or rejected
while we establish a connection.

This is necessary, since the peer will base its resync decision
on the UUIDs and the state we sent in the drbd_connect() function.

The most prominent example for this race is becoming primary after
sending state and UUIDs and before the state changes to C_WF_CONNECTION.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-05-09 15:16:55 +02:00
Lars Ellenberg
7ffcaa7194 drbd: remove unused static helper function
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-05-09 15:16:44 +02:00
Lars Ellenberg
a5d214f621 drbd: remove some very outdated comments
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-05-09 15:16:43 +02:00
Lars Ellenberg
671a74e749 drbd: remove now unused seq_num member from struct drbd_request
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-05-09 15:16:40 +02:00
Philipp Reisner
7caacb69ac drbd: Consider the disk-timeout also for meta-data IO operations
If the backing device is already frozen during attach, we failed
to recognize that. The current disk-timeout code works on top
of the drbd_request objects. During attach we do not allow IO
and therefore never generate a drbd_request object but block
before that in drbd_make_request().

This patch adds the timeout to all drbd_md_sync_page_io().

Before this patch we used to go from D_ATTACHING directly
to D_DISKLESS if IO failed during attach. We can no longer
do this since we have to stay in D_FAILED until all IO
ops issued to the backing device returned.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-05-09 15:16:30 +02:00
Lars Ellenberg
f479ea0661 drbd: send intermediate state change results to the peer
DRBD state changes schedule after_state_ch() actions to a worker thread,
which decides on the old and new states of that change, whether to send
an informational state update packet (P_STATE) to the peer.
If it decides to drbd_send_state(), it would however always send the
_curent_ state, which, if a second state change happens before the
after_state_ch() of the first ran, may "fast-forward" the peer's view
about this node.  In most cases that is harmless, but sometimes this can
confuse DRBD, for example into not actually starting a necessary resync
if you do a very tight detach/attach loop on a Connected Secondary.

Fix this by always sending the "new" state of the respective state
transition which scheduled this after_state_ch() work.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-05-09 15:15:56 +02:00
Philipp Reisner
5ca1de0384 drbd: Allow new IOs while the local disk in in FAILED state
The last bunch of commits prepared the 'detach from tar pit' feature.
With that we can be for long time in disk state FAILED. We need
to accept new IO requests during that time.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-05-09 15:10:34 +02:00
Philipp Reisner
0c46442515 drbd: Implemented wait_until_done_or_disk_failure()
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-05-09 10:26:51 +02:00
Philipp Reisner
e17117310b drbd: Replaced md_io_mutex by an atomic: md_io_in_use
The new function drbd_md_get_buffer() aborts waiting for the buffer
in case the disk failes in the meantime.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-05-09 10:22:31 +02:00
Philipp Reisner
cc94c65015 drbd: moved md_io into mdev
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-05-09 10:17:24 +02:00
Philipp Reisner
6d7e32f568 drbd: Keep a reference to barrier acked requests
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-05-09 10:15:28 +02:00
David Howells
5f138ce01a DRBD: Fix comparison always false warning due to long/long long compare
Fix warnings of the following nature in the drbd header:

In file included from drivers/block/drbd/drbd_bitmap.c:32:
drivers/block/drbd/drbd_int.h: In function 'drbd_get_syncer_progress':
drivers/block/drbd/drbd_int.h:2234: warning: comparison is always false due to limited range of data

where mdev->rs_total (an unsigned long) is being compared to 1ULL << 32, which
is always false on a 32-bit machine.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2012-05-09 10:03:19 +02:00
Rusty Russell
90ab5ee941 module_param: make bool parameters really bool (drivers & misc)
module_param(bool) used to counter-intuitively take an int.  In
fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
trick.

It's time to remove the int/unsigned int option.  For this version
it'll simply give a warning, but it'll break next kernel version.

Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-13 09:32:20 +10:30
Linus Torvalds
b4fdcb02f1 Merge branch 'for-3.2/core' of git://git.kernel.dk/linux-block
* 'for-3.2/core' of git://git.kernel.dk/linux-block: (29 commits)
  block: don't call blk_drain_queue() if elevator is not up
  blk-throttle: use queue_is_locked() instead of lockdep_is_held()
  blk-throttle: Take blkcg->lock while traversing blkcg->policy_list
  blk-throttle: Free up policy node associated with deleted rule
  block: warn if tag is greater than real_max_depth.
  block: make gendisk hold a reference to its queue
  blk-flush: move the queue kick into
  blk-flush: fix invalid BUG_ON in blk_insert_flush
  block: Remove the control of complete cpu from bio.
  block: fix a typo in the blk-cgroup.h file
  block: initialize the bounce pool if high memory may be added later
  block: fix request_queue lifetime handling by making blk_queue_cleanup() properly shutdown
  block: drop @tsk from attempt_plug_merge() and explain sync rules
  block: make get_request[_wait]() fail if queue is dead
  block: reorganize throtl_get_tg() and blk_throtl_bio()
  block: reorganize queue draining
  block: drop unnecessary blk_get/put_queue() in scsi_cmd_ioctl() and blk_get_tg()
  block: pass around REQ_* flags instead of broken down booleans during request alloc/free
  block: move blk_throtl prototypes to block/blk.h
  block: fix genhd refcounting in blkio_policy_parse_and_set()
  ...

Fix up trivial conflicts due to "mddev_t" -> "struct mddev" conversion
and making the request functions be of type "void" instead of "int" in
 - drivers/md/{faulty.c,linear.c,md.c,md.h,multipath.c,raid0.c,raid1.c,raid10.c,raid5.c}
 - drivers/staging/zram/zram_drv.c
2011-11-04 17:06:58 -07:00
Jesper Juhl
e5de063016 Remove unneeded version.h includes from drivers/block/
It was pointed out by 'make versioncheck' that some includes of
linux/version.h are not needed in drivers/block/.
This patch removes them.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-09-15 14:57:06 +02:00
Joe Perches
1d273b929c drbd: Use angle brackets for system includes
Use the normal include style.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-09-15 14:02:57 +02:00
Christoph Hellwig
5a7bbad27a block: remove support for bio remapping from ->make_request
There is very little benefit in allowing to let a ->make_request
instance update the bios device and sector and loop around it in
__generic_make_request when we can archive the same through calling
generic_make_request from the driver and letting the loop in
generic_make_request handle it.

Note that various drivers got the return value from ->make_request and
returned non-zero values for errors.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-09-12 12:12:01 +02:00
Linus Torvalds
929cfdd5d3 Merge branch 'for-2.6.40/drivers' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.40/drivers' of git://git.kernel.dk/linux-2.6-block: (110 commits)
  loop: handle on-demand devices correctly
  loop: limit 'max_part' module param to DISK_MAX_PARTS
  drbd: fix warning
  drbd: fix warning
  drbd: Fix spelling
  drbd: fix schedule in atomic
  drbd: Take a more conservative approach when deciding max_bio_size
  drbd: Fixed state transitions after async outdate-peer-handler returned
  drbd: Disallow the peer_disk_state to be D_OUTDATED while connected
  drbd: Fix for the connection problems on high latency links
  drbd: fix potential activity log refcount imbalance in error path
  drbd: Only downgrade the disk state in case of disk failures
  drbd: fix disconnect/reconnect loop, if ping-timeout == ping-int
  drbd: fix potential distributed deadlock
  lru_cache.h: fix comments referring to ts_ instead of lc_
  drbd: Fix for application IO with the on-io-error=pass-on policy
  xen/p2m: Add EXPORT_SYMBOL_GPL to the M2P override functions.
  xen/p2m/m2p/gnttab: Support GNTMAP_host_map in the M2P override.
  xen/blkback: don't fail empty barrier requests
  xen/blkback: fix xenbus_transaction_start() hang caused by double xenbus_transaction_end()
  ...
2011-05-25 09:15:35 -07:00
Andrew Morton
0ddf72be4e drbd: fix warning
In file included from drivers/block/drbd/drbd_main.c:54:                        drivers/block/drbd/drbd_int.h:1190: warning: parameter has incomplete type

Forward declarations of enums do not work.

Fix it unpleasantly by moving the prototype.

Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Lars Ellenberg <drbd-dev@lists.linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2011-05-24 10:38:33 +02:00
Bart Van Assche
24c4830c8e drbd: Fix spelling
Found these with the help of ispell -l.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2011-05-24 10:21:29 +02:00
Lars Ellenberg
9a0d9d0389 drbd: fix schedule in atomic
An administrative detach used to request a state change directly to D_DISKLESS,
first suspending IO to avoid the last put_ldev() occuring from an endio handler,
potentially in irq context.

This is not enough on the receiving side (typically secondary), we may miss
some peer_req on the way to local disk, which then may do the last put_ldev()
from their drbd_peer_request_endio().

This patch makes the detach always go through the intermediate D_FAILED state.
We may consider to rename it D_DETACHING.

Alternative approach would be to create yet an other work item to be scheduled
on the worker, do the destructor work from there, and get the timing right.

manually picked commit 564040f from the drbd 8.4 branch.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-05-24 10:14:32 +02:00
Philipp Reisner
99432fcc52 drbd: Take a more conservative approach when deciding max_bio_size
The old (optimistic) implementation could shrink the bio size
on an primary device.

Shrinking the bio size on a primary device is bad. Since there
we might get BIOs with the old (bigger) size shortly after
we published the new size.

The new implementation is more conservative, and eventually
increases the max_bio_size on a primary device (which is valid).
It does so, when it knows the local limit AND the remote limit.

 We cache the last seen max_bio_size of the peer in the meta
 data, and rely on that, to make the operation of single
 nodes more efficient.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-05-24 10:08:58 +02:00
Philipp Reisner
d2e17807e3 drbd: Only downgrade the disk state in case of disk failures
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-05-24 10:05:48 +02:00
Philipp Reisner
738a84b25c drbd: Fix for application IO with the on-io-error=pass-on policy
In case a write failes on the local disk, go into D_INCONSISTENT
disk state. That causes future reads of that block to be shipped
to the peer.

Read retry remote was already in place.

Actually the documentation needs to get fixed now. Since the
application is still shielded from the error. (as long as we have
only a single disk failing) The difference to detach is that
we keep the disk. And therefore might keep all the other, still
working sectors up to date.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-05-24 09:59:49 +02:00
Paul Gortmaker
70c7160619 Add appropriate <linux/prefetch.h> include for prefetch users
After discovering that wide use of prefetch on modern CPUs
could be a net loss instead of a win, net drivers which were
relying on the implicit inclusion of prefetch.h via the list
headers showed up in the resulting cleanup fallout.  Give
them an explicit include via the following $0.02 script.

 =========================================
 #!/bin/bash
 MANUAL=""
 for i in `git grep -l 'prefetch(.*)' .` ; do
 	grep -q '<linux/prefetch.h>' $i
 	if [ $? = 0 ] ; then
 		continue
 	fi

 	(	echo '?^#include <linux/?a'
 		echo '#include <linux/prefetch.h>'
 		echo .
 		echo w
 		echo q
 	) | ed -s $i > /dev/null 2>&1
 	if [ $? != 0 ]; then
 		echo $i needs manual fixup
 		MANUAL="$i $MANUAL"
 	fi
 done
 echo ------------------- 8\<----------------------
 echo vi $MANUAL
 =========================================

Signed-off-by: Paul <paul.gortmaker@windriver.com>
[ Fixed up some incorrect #include placements, and added some
  non-network drivers and the fib_trie.c case    - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-22 21:41:57 -07:00
Lucas De Marchi
25985edced Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 11:26:23 -03:00
Philipp Reisner
7fde2be930 drbd: Implemented real timeout checking for request processing time
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10 11:48:16 +01:00
Lars Ellenberg
20ceb2b22e drbd: describe bitmap locking for bulk operation in finer detail
Now that we do no longer in-place endian-swap the bitmap, we allow
selected bitmap operations (testing bits, sometimes even settting bits)
during some bulk operations.

This caused us to hit a lot of FIXME asserts similar to
	FIXME asender in drbd_bm_count_bits,
	bitmap locked for 'write from resync_finished' by worker
Which now is nonsense: looking at the bitmap is perfectly legal
as long as it is not being resized.

This cosmetic patch defines some flags to describe expectations in finer
detail, so the asserts in e.g. bm_change_bits_to() can be skipped if
appropriate.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10 11:48:02 +01:00
Lars Ellenberg
62b0da3a24 drbd: log UUIDs whenever they change
All decisions about sync, sync direction, and wether or not to
allow a connect or attach are based on our set of UUIDs to tag a
data generation.

Log changes to the UUIDs whenever they occur,
logging "new current UUID P:Q:R:S" is more useful
than "Creating new current UUID".

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10 11:48:01 +01:00
Philipp Reisner
370a43e798 drbd: Work on the Ahead -> SyncSource transition
The test if rs_pending_cnt == 0 was too weak. Using Test for
unacked_cnt == 0 instead. Moved that into the worker.

Since unacked_cnt gets already increased when an P_RS_DATA_REQ
comes in.

Also using a timer to make Ahead -> SyncSource -> Ahead cycles
slower...

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10 11:45:36 +01:00
Philipp Reisner
4a23f26496 drbd: Do not full sync if a P_SYNC_UUID packet gets lost
See also commit from 2009-08-15
"drbd_uuid_compare(): Do not full sync in case a P_SYNC_UUID packet gets lost."

We saw cases where the History UUIDs where not as expected. So the
detection of the special case did not trigger. With the sync UUID
no longer being a random number, but deducible from the previous
bitmap UUID, the detection of this special case becomes more
reliable.

The SyncUUID now is the previous bitmap UUID + 0x1000000000000.

Rule 5a:
Cs = H1p & H1p + Offset = Bp
  Connection was lost before SyncUUID Packet came through.
  Corrent (peer) UUIDs:
   Bp = H1p
   H1p = H2p
   H2p = 0
  Become Sync target.

Rule 7a:
Cp = H1s & H1s + Offset = Bs
  Connection was lost before SyncUUID Packet came through.
  Correct (own) UUIDs:
   Bs = H1s
   H1s = H2s
   H2s = 0
  Become Sync source.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10 11:45:32 +01:00
Andreas Gruenbacher
110a204a35 drbd: Remove useless / wrong comments
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10 11:45:29 +01:00
Philipp Reisner
794abb753e drbd: Cleaned up the resync timer logic
Besides removed a few lines of code, this moves the inspection
of the state from before the queuing process to after the queuing.
I.e. more closely to the actual invocation of the work.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10 11:45:28 +01:00
Philipp Reisner
d612d309e4 drbd: No longer answer P_RS_DATA_REQUEST packets when in C_AHEAD mode
When the sync source node replies to a P_RS_DATA_REQUEST packet
when it is already in ahead mode. I.e. those two packets
crossed each other on the wire, that may lead to diverging
bitmaps.

  This never happens in a well-tuned-system. In a well-tuned-
  system the resync controller has reduced the resync speed
  to zero long before we got into ahead-mode.

But we have to be prepared for the not-well-tuned-system
of course as well.
Because -> diverging bitmaps = non terminating resync.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10 11:45:25 +01:00
Lars Ellenberg
5a22db8968 drbd: serialize sending of resync uuid with pending w_send_oos
To improve the latency of IO requests during bitmap exchange,
we recently allowed writes while waiting for the bitmap, sending "set
out-of-sync" information packets for any newly dirtied bits.

We have to make sure that the new resync-uuid does not overtake
these "set oos" packets. Once the resync-uuid is received, the
sync target starts the resync process, and expects the bitmap to
only be cleared, not re-set.

If we use this protocol extension, we queue the generation and sending
of the resync-uuid on the worker, which naturally serializes with all
previously queued packets.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10 11:43:35 +01:00
Lars Ellenberg
4b0715f096 drbd: allow petabyte storage on 64bit arch
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10 11:43:24 +01:00
Lars Ellenberg
19f843aa08 drbd: bitmap keep track of changes vs on-disk bitmap
When we set or clear bits in a bitmap page,
also set a flag in the page->private pointer.

This allows us to skip writes of unchanged pages.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10 11:43:19 +01:00