linux-stable/fs
Dave Chinner 638f44163d xfs: recovery of swap extents operations for CRC filesystems
This is the recovery side of the btree block owner change operation
performed by swapext on CRC enabled filesystems. We detect that an
owner change is needed by the flag that has been placed on the inode
log format flag field. Because the inode recovery is being replayed
after the buffers that make up the BMBT in the given checkpoint, we
can walk all the buffers and directly modify them when we see the
flag set on an inode.

Because the inode can be relogged and hence present in multiple
checkpoints with the "change owner" flag set, we could do multiple
passes across the inode to do this change. While this isn't optimal,
we can't ignore the flag on later passes, as there may be multiple
independent swap extent operations being replayed on the same inode
in different checkpoints.

Further, because the owner change operation uses ordered buffers, we
might have buffers that are newer on disk than the current
checkpoint and so already have the owner changed in them. Hence we
cannot just peek at a buffer in the tree and check that it has the
correct owner and assume that the change was completed.

So, for the moment just brute force the owner change every time we
see an inode with the flag set. Note that we have to be careful here
because the owner of the buffers may point to either the old owner
or the new owner. Currently the verifier can't verify the owner
directly, so there is no failure case here right now. If we verify
the owner exactly in future, then we'll have to take this into
account.
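
As a rough illustration of the brute force approach, here is a toy
model in bash. It is only a sketch: force_owner is a hypothetical
helper, the block list stands in for the BMBT buffers, and 0x83 is a
made-up old owner (0x44 is the temp inode from the test below). The
first block already carries the new owner, as an ordered buffer that
reached disk early would, which is why peeking at one block cannot
prove the change completed.

```bash
#!/bin/bash
# Toy model only: each argument after the first is the owner recorded
# in one btree block. Every block is rewritten unconditionally.
force_owner() {
    local new_owner=$1
    shift
    local out=()
    local owner
    for owner in "$@"; do
        # The current owner is deliberately not inspected: it may be
        # the old owner or already the new one.
        out+=("$new_owner")
    done
    echo "${out[*]}"
}

# First block already has the new owner, the rest still have the old one.
force_owner 0x44 0x44 0x83 0x83
```

Running force_owner again on its own output gives the same result, so
replaying the change once per checkpoint is wasteful but harmless.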

This was tested in terms of normal operation via xfstests - all of
the fsr tests now pass without failure. However, we really need to
modify xfs/227 to stress v3 inodes correctly to ensure we fully
cover this case for v5 filesystems.

In terms of recovery testing, I used a hacked version of xfs_fsr
that held the temp inode open for a few seconds before exiting so
that the filesystem could be shut down with an outstanding owner
change recovery flag set on at least the temp inode. fsr leaves the temp
inode unlinked and in btree format, so this was necessary for the
owner change to be reliably replayed.

logprint confirmed the tmp inode in the log had the correct flag set:

INO: cnt:3 total:3 a:0x69e9e0 len:56 a:0x69ea20 len:176 a:0x69eae0 len:88
        INODE: #regs:3   ino:0x44  flags:0x209   dsize:88
	                                 ^^^^^

0x200 is set, indicating a data fork owner change needed to be
replayed on inode 0x44.  A printk in the recovery code confirmed that
the inode change was recovered:

XFS (vdc): Mounting Filesystem
XFS (vdc): Starting recovery (logdev: internal)
recovering owner change ino 0x44
XFS (vdc): Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!
Use of these features in this kernel is at your own risk!
XFS (vdc): Ending recovery (logdev: internal)
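
The flag check on the logprint output can be reproduced by hand. The
following is a minimal sketch: needs_owner_change and
OWNER_CHANGE_DATA are hypothetical names used for illustration, and
only the 0x200 value and the flags:0x209 sample come from the text
above.

```bash
#!/bin/bash
# Illustrative name; the 0x200 value is the data fork owner change
# bit described in the commit message.
OWNER_CHANGE_DATA=0x200

# Print "yes" if the given inode log format flags have the bit set.
needs_owner_change() {
    local flags=$1
    # Bash arithmetic accepts 0x-prefixed hex values.
    if (( (flags & OWNER_CHANGE_DATA) != 0 )); then
        echo yes
    else
        echo no
    fi
}

needs_owner_change 0x209    # flags value from the logprint output
needs_owner_change 0x009    # same value with the 0x200 bit cleared
```

The first call prints "yes" and the second prints "no".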

The script used to test this was:

$ cat ./recovery-fsr.sh
#!/bin/bash

dev=/dev/vdc
mntpt=/mnt/scratch
testfile=$mntpt/testfile

umount $mntpt
mkfs.xfs -f -m crc=1 $dev
mount $dev $mntpt
chmod 777 $mntpt

for i in `seq 10000 -1 0`; do
        xfs_io -f -d -c "pwrite $(($i * 4096)) 4096" $testfile > /dev/null 2>&1
done
xfs_bmap -vp $testfile |head -20

xfs_fsr -d -v $testfile &
sleep 10
/home/dave/src/xfstests-dev/src/godown -f $mntpt
wait
umount $mntpt

xfs_logprint -t $dev |tail -20
time mount $dev $mntpt
xfs_bmap -vp $testfile
umount $mntpt
$

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-10 12:49:57 -05:00
..
9p Second round of 9p patches for the 3.11 merge window. 2013-07-11 10:21:23 -07:00
adfs Don't pass inode to ->d_hash() and ->d_compare() 2013-06-29 12:57:36 +04:00
affs Don't pass inode to ->d_hash() and ->d_compare() 2013-06-29 12:57:36 +04:00
afs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-07-03 09:10:19 -07:00
autofs4 helper for reading ->d_count 2013-07-05 18:59:33 +04:00
befs [readdir] convert befs 2013-06-29 12:56:55 +04:00
bfs [readdir] convert bfs 2013-06-29 12:56:33 +04:00
btrfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2013-07-09 12:33:09 -07:00
cachefiles mm: remove lru parameter from __pagevec_lru_add and remove parts of pagevec API 2013-07-03 16:07:31 -07:00
ceph Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client 2013-07-09 12:39:10 -07:00
cifs CIFS: Fix a deadlock when a file is reopened 2013-07-11 18:05:41 -05:00
coda helper for reading ->d_count 2013-07-05 18:59:33 +04:00
configfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-07-14 11:42:26 -07:00
cramfs [readdir] convert cramfs 2013-06-29 12:56:46 +04:00
debugfs debugfs: write_file_bool() - ensure strtobool() operates on valid data 2013-06-03 13:55:02 -07:00
devpts
dlm dlm: Avoid LVB truncation 2013-06-26 11:38:02 -05:00
ecryptfs Code cleanups and improved buffer handling during page crypto operations 2013-07-11 10:20:18 -07:00
efivarfs efivarfs: we can use simple_lookup() now 2013-07-14 17:48:35 +04:00
efs [readdir] convert efs 2013-06-29 12:56:31 +04:00
exofs Lots of bug fixes, cleanups and optimizations. In the bug fixes 2013-07-02 09:39:34 -07:00
exportfs [readdir] constify ->actor 2013-06-29 12:57:05 +04:00
ext2 [O_TMPFILE] it's still short a few helpers, but infrastructure should be OK now... 2013-06-29 12:57:10 +04:00
ext3 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2013-07-09 12:08:43 -07:00
ext4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-07-03 09:10:19 -07:00
f2fs f2fs: fix readdir incorrectness 2013-07-08 13:35:48 +04:00
fat fatfs: add FAT_IOCTL_GET_VOLUME_ID 2013-07-09 10:33:25 -07:00
freevxfs [readdir] convert freevxfs 2013-06-29 12:56:53 +04:00
fscache FS-Cache: Don't use spin_is_locked() in assertions 2013-06-19 14:16:47 +01:00
fuse mm: use totalram_pages instead of num_physpages at runtime 2013-07-03 16:07:35 -07:00
gfs2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-07-03 09:10:19 -07:00
hfs Don't pass inode to ->d_hash() and ->d_compare() 2013-06-29 12:57:36 +04:00
hfsplus Don't pass inode to ->d_hash() and ->d_compare() 2013-06-29 12:57:36 +04:00
hostfs [readdir] convert hostfs 2013-06-29 12:56:59 +04:00
hpfs Merge branch 'hpfs' from Mikulas Patocka 2013-07-04 11:22:55 -07:00
hppfs clean up scary strncpy(dst, src, strlen(src)) uses 2013-07-03 16:07:41 -07:00
hugetlbfs hugetlbfs: fix mmap failure in unaligned size request 2013-05-07 18:38:27 -07:00
isofs Don't pass inode to ->d_hash() and ->d_compare() 2013-06-29 12:57:36 +04:00
jbd jbd: change journal_invalidatepage() to accept length 2013-05-21 23:26:36 -04:00
jbd2 jbd2: invalidate handle if jbd2_journal_restart() fails 2013-07-01 08:12:41 -04:00
jffs2 [readdir] convert jffs2 2013-06-29 12:56:47 +04:00
jfs A couple cleanups to JFS for 3.11 2013-07-11 10:19:34 -07:00
lockd drivers: avoid parsing names as kthread_run() format strings 2013-07-03 16:07:41 -07:00
logfs Lots of bug fixes, cleanups and optimizations. In the bug fixes 2013-07-02 09:39:34 -07:00
minix minix: bug widening a binary "not" operation 2013-06-29 12:57:35 +04:00
ncpfs ncpfs: fix error return code in ncp_parse_options() 2013-07-09 10:33:25 -07:00
nfs NFS: Allow nfs_updatepage to extend a write under additional circumstances 2013-07-09 19:32:50 -04:00
nfs_common
nfsd Merge branch 'for-3.11' of git://linux-nfs.org/~bfields/linux 2013-07-11 10:17:13 -07:00
nilfs2 helper for reading ->d_count 2013-07-05 18:59:33 +04:00
nls
notify fsnotify: update comments concerning locking scheme 2013-07-09 10:33:20 -07:00
ntfs Lots of bug fixes, cleanups and optimizations. In the bug fixes 2013-07-02 09:39:34 -07:00
ocfs2 ocfs2: fix NULL pointer dereference when traversing o2hb_all_regions 2013-07-03 16:07:25 -07:00
omfs [readdir] convert omfs 2013-06-29 12:56:37 +04:00
openpromfs [readdir] convert openpromfs 2013-06-29 12:56:32 +04:00
proc fs/proc/kcore.c: using strlcpy() instead of strncpy() 2013-07-03 16:08:02 -07:00
pstore Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc 2013-07-04 10:29:23 -07:00
qnx4 [readdir] convert qnx4 2013-06-29 12:56:38 +04:00
qnx6 [readdir] convert qnx6 2013-06-29 12:56:39 +04:00
quota quota: Add a new quotactl command Q_XGETQSTATV 2013-08-20 16:53:58 -05:00
ramfs
reiserfs Lots of bug fixes, cleanups and optimizations. In the bug fixes 2013-07-02 09:39:34 -07:00
romfs [readdir] convert romfs 2013-06-29 12:56:29 +04:00
squashfs [readdir] convert squashfs 2013-06-29 12:56:28 +04:00
sysfs Driver core patches for 3.11-rc1 2013-07-02 11:44:19 -07:00
sysv Don't pass inode to ->d_hash() and ->d_compare() 2013-06-29 12:57:36 +04:00
ubifs Only a single patch which fixes a message. 2013-07-05 12:08:47 -07:00
udf udf: provide ->tmpfile() 2013-06-29 12:57:12 +04:00
ufs [readdir] simple local unixlike: switch to ->iterate() 2013-06-29 12:46:47 +04:00
xfs xfs: recovery of swap extents operations for CRC filesystems 2013-09-10 12:49:57 -05:00
aio.c aio: fix wrong comment in aio_complete() 2013-07-03 16:08:06 -07:00
anon_inodes.c
attr.c
bad_inode.c [readdir] ->readdir() is gone 2013-06-29 12:57:04 +04:00
binfmt_aout.c mm: remove free_area_cache 2013-07-10 18:11:34 -07:00
binfmt_elf.c mm: remove free_area_cache 2013-07-10 18:11:34 -07:00
binfmt_elf_fdpic.c Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc 2013-05-02 10:16:16 -07:00
binfmt_em86.c
binfmt_flat.c new helper: read_code() 2013-04-29 15:40:23 -04:00
binfmt_misc.c binfmt_misc: reuse string_unescape_inplace() 2013-04-30 17:04:03 -07:00
binfmt_script.c
binfmt_som.c
bio-integrity.c
bio.c Merge branch 'for-3.10/core' of git://git.kernel.dk/linux-block 2013-05-08 10:13:35 -07:00
block_dev.c Merge branch 'for-3.11/core' of git://git.kernel.dk/linux-block 2013-07-11 13:03:24 -07:00
buffer.c mm: vmscan: take page buffers dirty and locked state into account 2013-07-03 16:07:29 -07:00
char_dev.c
compat.c [readdir] constify ->actor 2013-06-29 12:57:05 +04:00
compat_binfmt_elf.c
compat_ioctl.c compat.c: LOOP_CLR_FD is taken care of in loop.c itself... 2013-06-29 12:46:44 +04:00
coredump.c coredump: '% at the end' shouldn't bypass core_uses_pid logic 2013-07-03 16:08:02 -07:00
coredump.h
dcache.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-07-03 09:10:19 -07:00
dcookies.c
direct-io.c Merge branch 'for-3.10/core' of git://git.kernel.dk/linux-block 2013-05-08 10:13:35 -07:00
drop_caches.c
eventfd.c
eventpoll.c Merge branch 'akpm' (updates from Andrew Morton) 2013-07-03 17:12:13 -07:00
exec.c fs/exec.c:de_thread: mt-exec should update ->real_start_time 2013-07-03 16:08:03 -07:00
fcntl.c
fhandle.c
file.c don't bother with deferred freeing of fdtables 2013-05-01 17:31:42 -04:00
file_table.c fput: turn "list_head delayed_fput_list" into llist_head 2013-07-13 13:29:10 +04:00
filesystems.c
fs-writeback.c mm/writeback: don't check force_wait to handle bdi->work_list 2013-07-09 10:33:22 -07:00
fs_struct.c
generic_acl.c
inode.c allow the temp files created by open() to be linked to 2013-06-29 12:57:11 +04:00
internal.h constify rw_verify_area() 2013-06-29 12:57:34 +04:00
ioctl.c
ioprio.c
Kconfig efivarfs: Move to fs/efivarfs 2013-04-17 13:25:09 +01:00
Kconfig.binfmt fs: make binfmt support for #! scripts modular and removable 2013-04-30 17:04:04 -07:00
libfs.c make simple_lookup() usable for filesystems that set ->s_d_op 2013-07-14 17:43:25 +04:00
locks.c locks: move file_lock_list to a set of percpu hlist_heads and convert file_lock_lock to an lglock 2013-07-08 13:36:42 +04:00
Makefile Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-05-01 17:51:54 -07:00
mbcache.c
mount.h get rid of full-hash scan on detaching vfsmounts 2013-04-09 14:12:52 -04:00
mpage.c
namei.c Safer ABI for O_TMPFILE 2013-07-13 13:26:37 +04:00
namespace.c create_mnt_ns: unidiomatic use of list_add() 2013-05-04 15:18:53 -04:00
no-block.c
open.c Safer ABI for O_TMPFILE 2013-07-13 13:26:37 +04:00
pipe.c aio: don't include aio.h in sched.h 2013-05-07 20:16:25 -07:00
pnode.c vfs: Fix invalid ida_remove() call 2013-05-31 15:16:33 -04:00
pnode.h Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-05-01 17:51:54 -07:00
posix_acl.c
proc_namespace.c
read_write.c vfs: export lseek_execute() to modules 2013-07-03 16:23:27 +04:00
readdir.c [readdir] constify ->actor 2013-06-29 12:57:05 +04:00
select.c net: rename include/net/ll_poll.h to include/net/busy_poll.h 2013-07-10 17:08:27 -07:00
seq_file.c seq_file: add seq_list_*_percpu helpers 2013-07-08 13:36:41 +04:00
signalfd.c
splice.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-07-03 09:10:19 -07:00
stack.c
stat.c
statfs.c
super.c
sync.c
timerfd.c timerfd: Add alarm timers 2013-05-29 12:57:34 -07:00
utimes.c
xattr.c
xattr_acl.c