linux-stable/fs
Dave Chinner 07428d7f0c xfs: fix attr tree double split corruption
In certain circumstances, a double split of an attribute tree is
needed to insert or replace an attribute. In rare situations, this
can go wrong, leaving the attribute tree corrupted. In this case,
the attr being replaced is the last attr in a leaf node, and the
replacement is larger so doesn't fit in the same leaf node.
When we have the initial condition of a node format attribute
btree with two leaves at index 1 and 2. Call them L1 and L2.  The
leaf L1 is completely full, there is not a single byte of free space
in it. L2 is mostly empty.  The attribute being replaced - call it X
- is the last attribute in L1.

The way an attribute replace is executed is that the replacement
attribute - call it Y - is first inserted into the tree, but has an
INCOMPLETE flag set on it so that list traversals ignore it. Once
this transaction is committed, a second transaction it run to
atomically mark Y as COMPLETE and X as INCOMPLETE, so that a
traversal will now find Y and skip X. Once that transaction is
committed, attribute X is then removed.

So, the initial condition is:

     +--------+     +--------+
     |   L1   |     |   L2   |
     | fwd: 2 |---->| fwd: 0 |
     | bwd: 0 |<----| bwd: 1 |
     | fsp: 0 |     | fsp: N |
     |--------|     |--------|
     | attr A |     | attr 1 |
     |--------|     |--------|
     | attr B |     | attr 2 |
     |--------|     |--------|
     ..........     ..........
     |--------|     |--------|
     | attr X |     | attr n |
     +--------+     +--------+


So now we go to replace X, and see that L1:fsp = 0 - it is full so
we can't insert Y in the same leaf. So we record the the location of
attribute X so we can track it for later use, then we split L1 into
L1 and L3 and reblance across the two leafs. We end with:


     +--------+     +--------+     +--------+
     |   L1   |     |   L3   |     |   L2   |
     | fwd: 3 |---->| fwd: 2 |---->| fwd: 0 |
     | bwd: 0 |<----| bwd: 1 |<----| bwd: 3 |
     | fsp: M |     | fsp: J |     | fsp: N |
     |--------|     |--------|     |--------|
     | attr A |     | attr X |     | attr 1 |
     |--------|     +--------+     |--------|
     | attr B |                    | attr 2 |
     |--------|                    |--------|
     ..........                    ..........
     |--------|                    |--------|
     | attr W |                    | attr n |
     +--------+                    +--------+


And we track that the original attribute is now at L3:0.

We then try to insert Y into L1 again, and find that there isn't
enough room because the new attribute is larger than the old one.
Hence we have to split again to make room for Y. We end up with
this:


     +--------+     +--------+     +--------+     +--------+
     |   L1   |     |   L4   |     |   L3   |     |   L2   |
     | fwd: 4 |---->| fwd: 3 |---->| fwd: 2 |---->| fwd: 0 |
     | bwd: 0 |<----| bwd: 1 |<----| bwd: 4 |<----| bwd: 3 |
     | fsp: M |     | fsp: J |     | fsp: J |     | fsp: N |
     |--------|     |--------|     |--------|     |--------|
     | attr A |     | attr Y |     | attr X |     | attr 1 |
     |--------|     + INCOMP +     +--------+     |--------|
     | attr B |     +--------+                    | attr 2 |
     |--------|                                   |--------|
     ..........                                   ..........
     |--------|                                   |--------|
     | attr W |                                   | attr n |
     +--------+                                   +--------+

And now we have the new (incomplete) attribute @ L4:0, and the
original attribute at L3:0. At this point, the first transaction is
committed, and we move to the flipping of the flags.

This is where we are supposed to end up with this:

     +--------+     +--------+     +--------+     +--------+
     |   L1   |     |   L4   |     |   L3   |     |   L2   |
     | fwd: 4 |---->| fwd: 3 |---->| fwd: 2 |---->| fwd: 0 |
     | bwd: 0 |<----| bwd: 1 |<----| bwd: 4 |<----| bwd: 3 |
     | fsp: M |     | fsp: J |     | fsp: J |     | fsp: N |
     |--------|     |--------|     |--------|     |--------|
     | attr A |     | attr Y |     | attr X |     | attr 1 |
     |--------|     +--------+     + INCOMP +     |--------|
     | attr B |                    +--------+     | attr 2 |
     |--------|                                   |--------|
     ..........                                   ..........
     |--------|                                   |--------|
     | attr W |                                   | attr n |
     +--------+                                   +--------+

But that doesn't happen properly - the attribute tracking indexes
are not pointing to the right locations. What we end up with is both
the old attribute to be removed pointing at L4:0 and the new
attribute at L4:1.  On a debug kernel, this assert fails like so:

XFS: Assertion failed: args->index2 < be16_to_cpu(leaf2->hdr.count), file: fs/xfs/xfs_attr_leaf.c, line: 2725

because the new attribute location does not exist. On a production
kernel, this goes unnoticed and the code proceeds ahead merrily and
removes L4 because it thinks that is the block that is no longer
needed. This leaves the hash index node pointing to entries
L1, L4 and L2, but only blocks L1, L3 and L2 to exist. Further, the
leaf level sibling list is L1 <-> L4 <-> L2, but L4 is now free
space, and so everything is busted. This corruption is caused by the
removal of the old attribute triggering a join - it joins everything
correctly but then frees the wrong block.

xfs_repair will report something like:

bad sibling back pointer for block 4 in attribute fork for inode 131
problem with attribute contents in inode 131
would clear attr fork
bad nblocks 8 for inode 131, would reset to 3
bad anextents 4 for inode 131, would reset to 0

The problem lies in the assignment of the old/new blocks for
tracking purposes when the double leaf split occurs. The first split
tries to place the new attribute inside the current leaf (i.e.
"inleaf == true") and moves the old attribute (X) to the new block.
This sets up the old block/index to L1:X, and newly allocated
block to L3:0. It then moves attr X to the new block and tries to
insert attr Y at the old index. That fails, so it splits again.

With the second split, the rebalance ends up placing the new attr in
the second new block - L4:0 - and this is where the code goes wrong.
What is does is it sets both the new and old block index to the
second new block. Hence it inserts attr Y at the right place (L4:0)
but overwrites the current location of the attr to replace that is
held in the new block index (currently L3:0). It over writes it with
L4:1 - the index we later assert fail on.

Hopefully this table will show this in a foramt that is a bit easier
to understand:

Split		old attr index		new attr index
		vanilla	patched		vanilla	patched
before 1st	L1:26	L1:26		N/A	N/A
after 1st	L3:0	L3:0		L1:26	L1:26
after 2nd	L4:0	L3:0		L4:1	L4:0
                ^^^^			^^^^
		wrong			wrong

The fix is surprisingly simple, for all this analysis - just stop
the rebalance on the out-of leaf case from overwriting the new attr
index - it's already correct for the double split case.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-11-13 14:45:29 -06:00
..
9p The following changes since commit 4cbe5a555f: 2012-10-12 09:59:23 +09:00
adfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
affs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
afs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
autofs4 autofs4 - fix reset pending flag on mount fail 2012-10-11 10:21:16 +09:00
befs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
bfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
btrfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2012-10-13 13:23:39 -07:00
cachefiles fs: cachefiles: add support for large files in filesystem caching 2012-07-30 17:25:21 -07:00
ceph tmpfs,ceph,gfs2,isofs,reiserfs,xfs: fix fh_len checking 2012-10-09 23:33:55 -04:00
cifs Merge branch 'modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux 2012-10-14 13:39:34 -07:00
coda fs: push rcu_barrier() from deactivate_locked_super() to filesystems 2012-10-02 21:35:55 -04:00
configfs userns: Convert configfs to use kuid and kgid where appropriate 2012-09-18 01:01:37 -07:00
cramfs userns: Convert cramfs to use kuid/kgid where appropriate 2012-09-21 03:13:08 -07:00
debugfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2012-10-02 11:11:09 -07:00
devpts VFS: Pass mount flags to sget() 2012-07-14 16:38:34 +04:00
dlm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2012-10-02 13:38:27 -07:00
ecryptfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
efs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
exofs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-12 10:52:03 +09:00
exportfs switch dentry_open() to struct path, make it grab references itself 2012-07-23 00:01:29 +04:00
ext2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
ext3 ext3: drop lock/unlock super 2012-10-09 23:33:38 -04:00
ext4 mm: kill vma flag VM_CAN_NONLINEAR 2012-10-09 16:22:17 +09:00
fat fat: drop lock/unlock super 2012-10-09 23:33:38 -04:00
freevxfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
fscache
fuse mm: kill vma flag VM_CAN_NONLINEAR 2012-10-09 16:22:17 +09:00
gfs2 tmpfs,ceph,gfs2,isofs,reiserfs,xfs: fix fh_len checking 2012-10-09 23:33:55 -04:00
hfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
hfsplus Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
hostfs Merge branch 'for-linus-37rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml 2012-10-10 11:15:20 +09:00
hpfs hpfs: drop lock/unlock super 2012-10-09 23:33:38 -04:00
hppfs hppfs: fix the return value of get_inode() 2012-10-09 22:34:52 +02:00
hugetlbfs mm: replace vma prio_tree with an interval tree 2012-10-09 16:22:39 +09:00
isofs tmpfs,ceph,gfs2,isofs,reiserfs,xfs: fix fh_len checking 2012-10-09 23:33:55 -04:00
jbd jbd: Fix assertion failure in commit code due to lacking transaction credits 2012-09-12 15:52:03 +02:00
jbd2 The big new feature added this time is supporting online resizing 2012-10-08 06:36:39 +09:00
jffs2 UAPI Disintegration 2012-10-09 2012-10-09 15:04:25 +01:00
jfs JFS TRIM support and some minor fixes 2012-10-03 08:48:21 -07:00
lockd Merge branch 'for-3.7' of git://linux-nfs.org/~bfields/linux 2012-10-13 10:53:54 +09:00
logfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
minix Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
ncpfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
nfs Merge branch 'for-3.7' of git://linux-nfs.org/~bfields/linux 2012-10-13 10:53:54 +09:00
nfs_common
nfsd UAPI Disintegration 2012-10-09 2012-10-09 18:35:22 -04:00
nilfs2 mm: kill vma flag VM_CAN_NONLINEAR 2012-10-09 16:22:17 +09:00
nls nls: fix (and rename) mac NLS table files and config options 2012-06-01 19:51:22 -07:00
notify switch simple cases of fget_light to fdget 2012-09-26 22:20:08 -04:00
ntfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
ocfs2 mm: kill vma flag VM_CAN_NONLINEAR 2012-10-09 16:22:17 +09:00
omfs omfs: convert to use beXX_add_cpu() 2012-10-06 03:05:31 +09:00
openpromfs fs: push rcu_barrier() from deactivate_locked_super() to filesystems 2012-10-02 21:35:55 -04:00
proc procfs: don't need a PATH_MAX allocation to hold a string representation of an int 2012-10-12 20:15:10 -04:00
pstore pstore: Avoid recursive spinlocks in the oops_in_progress case 2012-09-20 17:04:50 -07:00
qnx4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
qnx6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
quota vfs: define struct filename and have getname() return it 2012-10-12 20:14:55 -04:00
ramfs don't pass nameidata to ->create() 2012-07-14 16:34:47 +04:00
reiserfs tmpfs,ceph,gfs2,isofs,reiserfs,xfs: fix fh_len checking 2012-10-09 23:33:55 -04:00
romfs fs: push rcu_barrier() from deactivate_locked_super() to filesystems 2012-10-02 21:35:55 -04:00
squashfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
sysfs sysfs: Fix comment typo "sysf_create_link". 2012-09-04 16:11:31 -07:00
sysv sysv: drop lock/unlock super 2012-10-09 23:33:39 -04:00
ubifs mm: kill vma flag VM_CAN_NONLINEAR 2012-10-09 16:22:17 +09:00
udf Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2012-10-04 09:14:01 -07:00
ufs ufs: drop lock/unlock super 2012-10-09 23:33:39 -04:00
xfs xfs: fix attr tree double split corruption 2012-11-13 14:45:29 -06:00
aio.c aio: now fput() is OK from interrupt context; get rid of manual delayed __fput() 2012-07-22 23:57:59 +04:00
anon_inodes.c
attr.c ima: add inode_post_setattr call 2012-09-07 14:57:46 -04:00
bad_inode.c don't pass nameidata to ->create() 2012-07-14 16:34:47 +04:00
binfmt_aout.c coredump: pass siginfo_t* to do_coredump() and below, not merely signr 2012-10-06 03:05:16 +09:00
binfmt_elf.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal 2012-10-10 12:02:25 +09:00
binfmt_elf_fdpic.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal 2012-10-10 12:02:25 +09:00
binfmt_em86.c
binfmt_flat.c coredump: pass siginfo_t* to do_coredump() and below, not merely signr 2012-10-06 03:05:16 +09:00
binfmt_misc.c
binfmt_script.c
binfmt_som.c
bio-integrity.c block: Ues bi_pool for bio_integrity_alloc() 2012-09-09 10:35:38 +02:00
bio.c block: makes bio_split support bio without data 2012-09-28 10:38:48 +02:00
block_dev.c Merge branch 'for-3.7/core' of git://git.kernel.dk/linux-block 2012-10-11 09:04:23 +09:00
buffer.c The big new feature added this time is supporting online resizing 2012-10-08 06:36:39 +09:00
char_dev.c
compat.c vfs: define struct filename and have getname() return it 2012-10-12 20:14:55 -04:00
compat_binfmt_elf.c coredump: extend core dump note section to contain file names of mapped files 2012-10-06 03:05:17 +09:00
compat_ioctl.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
coredump.c coredump: pass siginfo_t* to do_coredump() and below, not merely signr 2012-10-06 03:05:16 +09:00
coredump.h coredump: update coredump-related headers 2012-10-06 03:05:15 +09:00
dcache.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
dcookies.c
direct-io.c block: move down direct IO plugging 2012-08-09 15:23:09 +02:00
drop_caches.c
eventfd.c eventfd: change int to __u64 in eventfd_signal() 2012-05-31 17:49:32 -07:00
eventpoll.c epoll: support for disabling items, and a self-test app 2012-10-06 03:05:00 +09:00
exec.c vfs: make path_openat take a struct filename pointer 2012-10-12 20:15:09 -04:00
fcntl.c Fix F_DUPFD_CLOEXEC breakage 2012-10-09 15:52:31 +09:00
fhandle.c switch simple cases of fget_light to fdget 2012-09-26 22:20:08 -04:00
fifo.c fifo: Do not restart open() if it already found a partner 2012-07-16 08:33:14 -07:00
file.c dup3: Return an error when oldfd == newfd. 2012-10-09 23:33:38 -04:00
file_table.c lglock: add DEFINE_STATIC_LGLOCK() 2012-10-10 01:15:44 -04:00
filesystems.c vfs: define struct filename and have getname() return it 2012-10-12 20:14:55 -04:00
fs-writeback.c Merge branch 'writeback-for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux 2012-10-12 10:46:03 +09:00
fs_struct.c get rid of ->mnt_longterm 2012-07-14 16:32:47 +04:00
generic_acl.c userns: Pass a userns parameter into posix_acl_to_xattr and posix_acl_from_xattr 2012-09-18 01:01:35 -07:00
inode.c mm: replace vma prio_tree with an interval tree 2012-10-09 16:22:39 +09:00
internal.h vfs: make path_openat take a struct filename pointer 2012-10-12 20:15:09 -04:00
ioctl.c switch simple cases of fget_light to fdget 2012-09-26 22:20:08 -04:00
ioprio.c
Kconfig
Kconfig.binfmt coredump: make core dump functionality optional 2012-10-06 03:05:15 +09:00
libfs.c vfs: fix kerneldoc for generic_fh_to_parent() 2012-09-05 10:59:30 +02:00
locks.c UAPI Disintegration 2012-10-09 2012-10-09 18:35:22 -04:00
Makefile coredump: make core dump functionality optional 2012-10-06 03:05:15 +09:00
mbcache.c
mount.h get rid of magic in proc_namespace.c 2012-07-14 16:32:48 +04:00
mpage.c
namei.c vfs: embed struct filename inside of names_cache allocation if possible 2012-10-12 20:15:10 -04:00
namespace.c vfs: define struct filename and have getname() return it 2012-10-12 20:14:55 -04:00
no-block.c
open.c vfs: make path_openat take a struct filename pointer 2012-10-12 20:15:09 -04:00
pipe.c pipe(2) - race-free error recovery 2012-09-26 21:08:52 -04:00
pnode.c VFS: Make clone_mnt()/copy_tree()/collect_mounts() return errors 2012-07-14 16:37:27 +04:00
pnode.h
posix_acl.c userns: Convert vfs posix_acl support to use kuids and kgids 2012-09-18 01:01:35 -07:00
proc_namespace.c get rid of magic in proc_namespace.c 2012-07-14 16:32:48 +04:00
read_write.c compat: fs: Generic compat_sys_sendfile implementation 2012-10-02 21:35:55 -04:00
read_write.h compat: fs: Generic compat_sys_sendfile implementation 2012-10-02 21:35:55 -04:00
readdir.c switch simple cases of fget_light to fdget 2012-09-26 22:20:08 -04:00
select.c switch simple cases of fget_light to fdget 2012-09-26 22:20:08 -04:00
seq_file.c userns: Make seq_file's user namespace accessible 2012-08-14 21:47:55 -07:00
signalfd.c switch simple cases of fget_light to fdget 2012-09-26 22:20:08 -04:00
splice.c switch simple cases of fget_light to fdget 2012-09-26 22:20:08 -04:00
stack.c
stat.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
statfs.c switch simple cases of fget_light to fdget 2012-09-26 22:20:08 -04:00
super.c vfs: drop lock/unlock super 2012-10-09 23:33:39 -04:00
sync.c switch simple cases of fget_light to fdget 2012-09-26 22:20:08 -04:00
timerfd.c switch simple cases of fget_light to fdget 2012-09-26 22:20:08 -04:00
utimes.c switch simple cases of fget_light to fdget 2012-09-26 22:20:08 -04:00
xattr.c audit: set the name_len in audit_inode for parent lookups 2012-10-12 00:32:01 -04:00
xattr_acl.c userns: Fix posix_acl_file_xattr_userns gid conversion 2012-10-12 13:16:48 -07:00