linux-stable/drivers/md
Song Liu a39f7afde3 md/r5cache: write-out phase and reclaim support
There are two limited resources, stripe cache and journal disk space.
For better performance, we priotize reclaim of full stripe writes.
To free up more journal space, we free earliest data on the journal.

In current implementation, reclaim happens when:
1. Periodically (every R5C_RECLAIM_WAKEUP_INTERVAL, 30 seconds) reclaim
   if there is no reclaim in the past 5 seconds.
2. when there are R5C_FULL_STRIPE_FLUSH_BATCH (256) cached full stripes,
   or cached stripes is enough for a full stripe (chunk size / 4k)
   (r5c_check_cached_full_stripe)
3. when there is pressure on stripe cache (r5c_check_stripe_cache_usage)
4. when there is pressure on journal space (r5l_write_stripe, r5c_cache_data)

r5c_do_reclaim() contains new logic of reclaim.

For stripe cache:

When stripe cache pressure is high (more than 3/4 stripes are cached,
or there is empty inactive lists), flush all full stripe. If fewer
than R5C_RECLAIM_STRIPE_GROUP (NR_STRIPE_HASH_LOCKS * 2) full stripes
are flushed, flush some paritial stripes. When stripe cache pressure
is moderate (1/2 to 3/4 of stripes are cached), flush all full stripes.

For log space:

To avoid deadlock due to log space, we need to reserve enough space
to flush cached data. The size of required log space depends on total
number of cached stripes (stripe_in_journal_count). In current
implementation, the writing-out phase automatically include pending
data writes with parity writes (similar to write through case).
Therefore, we need up to (conf->raid_disks + 1) pages for each cached
stripe (1 page for meta data, raid_disks pages for all data and
parity). r5c_log_required_to_flush_cache() calculates log space
required to flush cache. In the following, we refer to the space
calculated by r5c_log_required_to_flush_cache() as
reclaim_required_space.

Two flags are added to r5conf->cache_state: R5C_LOG_TIGHT and
R5C_LOG_CRITICAL. R5C_LOG_TIGHT is set when free space on the log
device is less than 3x of reclaim_required_space. R5C_LOG_CRITICAL
is set when free space on the log device is less than 2x of
reclaim_required_space.

r5c_cache keeps all data in cache (not fully committed to RAID) in
a list (stripe_in_journal_list). These stripes are in the order of their
first appearance on the journal. So the log tail (last_checkpoint)
should point to the journal_start of the first item in the list.

When R5C_LOG_TIGHT is set, r5l_reclaim_thread starts flushing out
stripes at the head of stripe_in_journal. When R5C_LOG_CRITICAL is
set, the state machine only writes data that are already in the
log device (in stripe_in_journal_list).

This patch includes a fix to improve performance by
Shaohua Li <shli@fb.com>.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-18 13:26:48 -08:00
..
bcache block: export bio_free_pages to other modules 2016-09-22 07:48:03 -06:00
persistent-data dm array: introduce cursor api 2016-09-22 11:15:04 -04:00
bitmap.c md/bitmap: add blktrace event for writes to the bitmap 2016-11-18 09:34:45 -08:00
bitmap.h md-cluster: sync bitmap when node received RESYNCING msg 2016-05-04 12:39:35 -07:00
dm-bio-prison.c block: add a bi_error field to struct bio 2015-07-29 08:55:15 -06:00
dm-bio-prison.h
dm-bio-record.h
dm-bufio.c . various fixes and cleanups for request-based DM core 2016-10-09 17:16:18 -07:00
dm-bufio.h
dm-builtin.c dm: move request-based code out to dm-rq.[hc] 2016-06-10 15:15:44 -04:00
dm-cache-block-types.h
dm-cache-metadata.c dm cache metadata: switch to using the new cursor api for loading metadata 2016-09-22 11:15:05 -04:00
dm-cache-metadata.h dm cache: make sure every metadata function checks fail_io 2016-03-10 17:12:12 -05:00
dm-cache-policy-cleaner.c dm cache: speed up writing of the hint array 2016-09-22 11:15:02 -04:00
dm-cache-policy-internal.h dm cache: speed up writing of the hint array 2016-09-22 11:15:02 -04:00
dm-cache-policy-smq.c dm cache policy smq: distribute entries to random levels when switching to smq 2016-09-22 11:15:03 -04:00
dm-cache-policy.c
dm-cache-policy.h dm cache: speed up writing of the hint array 2016-09-22 11:15:02 -04:00
dm-cache-target.c block: rename bio bi_rw to bi_opf 2016-08-07 14:41:02 -06:00
dm-core.h dm: move request-based code out to dm-rq.[hc] 2016-06-10 15:15:44 -04:00
dm-crypt.c . various fixes and cleanups for request-based DM core 2016-10-09 17:16:18 -07:00
dm-delay.c dm: rename target's per_bio_data_size to per_io_data_size 2016-02-22 22:34:37 -05:00
dm-era-target.c block: rename bio bi_rw to bi_opf 2016-08-07 14:41:02 -06:00
dm-exception-store.c - Revert a dm-multipath change that caused a regression for unprivledged 2015-11-04 21:19:53 -08:00
dm-exception-store.h dm snapshot: fix hung bios when copy error occurs 2016-01-08 20:03:05 -05:00
dm-flakey.c dm flakey: fix reads to be issued if drop_writes configured 2016-08-24 21:55:05 -04:00
dm-io.c block: rename bio bi_rw to bi_opf 2016-08-07 14:41:02 -06:00
dm-ioctl.c dm: allow bio-based table to be upgraded to bio-based with DAX support 2016-07-20 23:49:52 -04:00
dm-kcopyd.c dm: move request-based code out to dm-rq.[hc] 2016-06-10 15:15:44 -04:00
dm-linear.c libnvdimm for 4.8 2016-07-28 17:38:16 -07:00
dm-log-userspace-base.c dm: drop NULL test before kmem_cache_destroy() and mempool_destroy() 2015-10-31 19:06:00 -04:00
dm-log-userspace-transfer.c
dm-log-userspace-transfer.h
dm-log-writes.c Merge branch 'for-4.9/block' of git://git.kernel.dk/linux-block 2016-10-07 14:42:05 -07:00
dm-log.c dm log: fix unitialized bio operation flags 2016-08-24 21:55:05 -04:00
dm-mpath.c dm mpath: always return reservation conflict without failing over 2016-09-29 10:57:07 -04:00
dm-mpath.h
dm-path-selector.c
dm-path-selector.h dm path selector: remove 'repeat_count' return from .select_path hook 2016-02-22 22:34:42 -05:00
dm-queue-length.c dm path selector: remove 'repeat_count' return from .select_path hook 2016-02-22 22:34:42 -05:00
dm-raid.c dm raid: fix activation of existing raid4/10 devices 2016-10-17 16:41:31 -04:00
dm-raid1.c dm mirror: use all available legs on multiple failures 2016-10-14 11:55:17 -04:00
dm-region-hash.c block: rename bio bi_rw to bi_opf 2016-08-07 14:41:02 -06:00
dm-round-robin.c dm round robin: do not use this_cpu_ptr() without having preemption disabled 2016-08-15 09:23:14 -04:00
dm-rq.c - A couple DM raid and DM mirror fixes 2016-10-28 09:27:58 -07:00
dm-rq.h dm rq: introduce dm_mq_kick_requeue_list() 2016-09-15 11:16:05 -04:00
dm-service-time.c dm path selector: remove 'repeat_count' return from .select_path hook 2016-02-22 22:34:42 -05:00
dm-snap-persistent.c dm: use bio op accessors 2016-06-07 13:41:38 -06:00
dm-snap-transient.c dm snapshot: fix hung bios when copy error occurs 2016-01-08 20:03:05 -05:00
dm-snap.c block: rename bio bi_rw to bi_opf 2016-08-07 14:41:02 -06:00
dm-stats.c dm: move request-based code out to dm-rq.[hc] 2016-06-10 15:15:44 -04:00
dm-stats.h
dm-stripe.c block: rename bio bi_rw to bi_opf 2016-08-07 14:41:02 -06:00
dm-switch.c dm switch: simplify conditional in alloc_region_table() 2015-10-31 19:06:06 -04:00
dm-sysfs.c dm: move request-based code out to dm-rq.[hc] 2016-06-10 15:15:44 -04:00
dm-table.c dm table: fix missing dm_put_target_type() in dm_table_add_target() 2016-10-24 11:17:46 -04:00
dm-target.c libnvdimm for 4.8 2016-07-28 17:38:16 -07:00
dm-thin-metadata.c dm thin: fix a race condition between discarding and provisioning a block 2016-07-20 12:43:35 -04:00
dm-thin-metadata.h dm thin: fix a race condition between discarding and provisioning a block 2016-07-20 12:43:35 -04:00
dm-thin.c block: rename bio bi_rw to bi_opf 2016-08-07 14:41:02 -06:00
dm-uevent.c
dm-uevent.h
dm-verity-fec.c dm verity fec: fix block calculation 2016-07-01 23:29:08 -04:00
dm-verity-fec.h dm verity: add support for forward error correction 2015-12-10 10:39:03 -05:00
dm-verity-target.c dm: rename target's per_bio_data_size to per_io_data_size 2016-02-22 22:34:37 -05:00
dm-verity.h dm verity: add ignore_zero_blocks feature 2015-12-10 10:39:03 -05:00
dm-zero.c block: rename bio bi_rw to bi_opf 2016-08-07 14:41:02 -06:00
dm.c - A couple DM raid and DM mirror fixes 2016-10-28 09:27:58 -07:00
dm.h dm: add infrastructure for DAX support 2016-07-20 23:49:49 -04:00
faulty.c MD: rename some functions 2016-01-20 13:52:20 -08:00
Kconfig dm: add missing newline between DM_DEBUG_BLOCK_STACK_TRACING and DM_BUFIO 2016-03-10 17:12:11 -05:00
linear.c md: add block tracing for bio_remapping 2016-11-18 09:32:50 -08:00
linear.h
Makefile dm: move request-based code out to dm-rq.[hc] 2016-06-10 15:15:44 -04:00
md-cluster.c md-cluster: make resync lock also could be interruptted 2016-09-21 09:09:44 -07:00
md-cluster.h md-cluster: gather resync infos and enable recv_thread after bitmap is ready 2016-05-09 09:24:03 -07:00
md.c md: add blktrace event for writes to superblock 2016-11-18 09:47:57 -08:00
md.h md: define mddev flags, recovery flags and r1bio state bits using enums 2016-11-09 12:53:52 -08:00
multipath.c md/multipath: replace printk() with pr_*() 2016-11-07 15:08:22 -08:00
multipath.h
raid0.c md: add block tracing for bio_remapping 2016-11-18 09:32:50 -08:00
raid0.h block: kill merge_bvec_fn() completely 2015-08-13 12:31:57 -06:00
raid1.c md/raid1, raid10: add blktrace records when IO is delayed 2016-11-18 09:35:37 -08:00
raid1.h md: define mddev flags, recovery flags and r1bio state bits using enums 2016-11-09 12:53:52 -08:00
raid5-cache.c md/r5cache: write-out phase and reclaim support 2016-11-18 13:26:48 -08:00
raid5.c md/r5cache: write-out phase and reclaim support 2016-11-18 13:26:48 -08:00
raid5.h md/r5cache: write-out phase and reclaim support 2016-11-18 13:26:48 -08:00
raid10.c md/raid1, raid10: add blktrace records when IO is delayed 2016-11-18 09:35:37 -08:00
raid10.h raid10: improve random reads performance 2016-07-19 15:20:28 -07:00