linux-stable/fs
Dave Chinner c854363e80 xfs: Use delayed write for inodes rather than async V2
We currently do background inode flush asynchronously, resulting in
inodes being written in whatever order the background writeback
issues them. Not only that, there are also blocking and non-blocking
asynchronous inode flushes, depending on where the flush comes from.

This patch completely removes asynchronous inode writeback. It
removes all the strange writeback modes and replaces them with
either a synchronous flush or a non-blocking delayed write flush.
That is, inode flushes will only issue IO directly if they are
synchronous, and background flushing may do nothing if the operation
would block (e.g. on a pinned inode or buffer lock).

Delayed write flushes will now result in the inode buffer sitting in
the delwri queue of the buffer cache to be flushed by either an AIL
push or by the xfsbufd timing out the buffer. This will allow
accumulation of dirty inode buffers in memory and allow optimisation
of inode cluster writeback at the xfsbufd level where we have much
greater queue depths than the block layer elevators. We will also
get adjacent inode cluster buffer IO merging for free when a later
patch in the series allows sorting of the delayed write buffers
before dispatch.

This effectively means that any inode that is written back by
background writeback will be seen as flush locked during AIL
pushing, and will result in the buffers being pushed from there.
This writeback path is currently non-optimal, but the next patch
in the series will fix that problem.

A side effect of this delayed write mechanism is that background
inode reclaim will no longer directly flush inodes, nor can it wait
on the flush lock. The result is that inode reclaim must leave the
inode in the reclaimable state until it is clean. Hence attempts to
reclaim a dirty inode in the background will simply skip the inode
until it is clean and this allows other mechanisms (i.e. xfsbufd) to
do more optimal writeback of the dirty buffers. As a result, the
inode reclaim code has been rewritten so that it no longer relies on
the ambiguous return values of xfs_iflush() to determine whether it
is safe to reclaim an inode.

Portions of this patch are derived from patches by Christoph
Hellwig.

Version 2:
- cleanup reclaim code as suggested by Christoph
- log background reclaim inode flush errors
- just pass sync flags to xfs_iflush

Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-02-06 12:39:36 +11:00
..
9p 9p: fix build breakage introduced by FS-Cache 2009-12-01 07:35:11 -08:00
adfs adfs: remove redundant test on unsigned 2009-09-24 07:21:05 -07:00
affs
afs afs: remove manual O_SYNC handling 2009-12-10 15:02:50 +01:00
autofs trivial: remove unnecessary semicolons 2009-09-21 15:14:58 +02:00
autofs4 autofs4: always use lookup for lookup 2009-12-16 07:19:58 -08:00
befs fs: Make unload_nls() NULL pointer safe 2009-09-24 07:47:42 -04:00
bfs
btrfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable 2009-12-17 16:01:03 -08:00
cachefiles Untangling ima mess, part 2: deal with counters 2009-12-16 12:16:47 -05:00
cifs Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 2009-12-31 15:17:26 -08:00
coda sysctl: Drop & in front of every proc_handler. 2009-11-18 08:37:40 -08:00
configfs writeback: add name to backing_dev_info 2009-09-11 09:20:26 +02:00
cramfs
debugfs debugfs: fix create mutex racy fops and private data 2009-12-11 11:24:53 -08:00
devpts devpts_get_tty() should validate inode 2009-12-11 15:18:05 -08:00
dlm Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm 2009-12-10 09:33:59 -08:00
ecryptfs fsstack/ecryptfs: remove unused get_nlinks param to fsstack_copy_attr_all 2009-12-17 10:57:30 -05:00
efs
exofs exofs: simple_write_end does not mark_inode_dirty 2010-01-05 09:14:32 +02:00
exportfs nfs: new subdir Documentation/filesystems/nfs 2009-10-27 19:34:04 -04:00
ext2 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2009-12-16 12:04:02 -08:00
ext3 ext3: Replace lock/unlock_super() with an explicit lock for resizing 2009-12-23 13:44:12 +01:00
ext4 ext4: Calculate metadata requirements more accurately 2010-01-01 02:41:30 -05:00
fat Merge git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6 2009-12-16 10:29:26 -08:00
freevxfs
fscache FS-Cache: Avoid maybe-used-uninitialised warning on variable 2009-12-16 07:20:13 -08:00
fuse fuse: reject O_DIRECT flag also in fuse_create 2009-11-27 16:37:13 +01:00
gfs2 GFS2: Use MAX_LFS_FILESIZE for meta inode size 2010-01-11 08:57:55 +00:00
hfs hfs: fix a potential buffer overflow 2009-12-15 08:53:10 -08:00
hfsplus hfsplus: refuse to mount volumes larger than 2TB 2009-10-29 07:39:27 -07:00
hostfs
hpfs hpfs: use bitmap_weight() 2009-12-16 07:20:06 -08:00
hppfs
hugetlbfs Untangling ima mess, part 1: alloc_file() 2009-12-16 12:16:47 -05:00
isofs Merge branch 'for-2.6.33' of git://linux-nfs.org/~bfields/linux 2009-12-16 10:43:34 -08:00
jbd jbd: jbd-debug and jbd2-debug should be writable 2009-12-23 13:44:13 +01:00
jbd2 jbd2: don't use __GFP_NOFAIL in journal_init_common() 2009-12-23 08:05:15 -05:00
jffs2 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2009-12-16 12:04:02 -08:00
jfs jfs: Fix 32bit build warning 2009-12-22 12:27:35 -05:00
lockd Merge branch 'for-2.6.33' of git://linux-nfs.org/~bfields/linux 2009-12-16 10:43:34 -08:00
minix V3 minixfs: add missing directory type checking 2009-09-23 07:39:57 -07:00
ncpfs tree-wide: fix assorted typos all over the place 2009-12-04 15:39:55 +01:00
nfs nfs: fix oops in nfs_rename() 2010-01-06 18:48:26 -05:00
nfs_common
nfsd Merge branch 'for-2.6.33' of git://linux-nfs.org/~bfields/linux 2010-01-06 18:10:15 -08:00
nilfs2 nilfs2: Storage class should be before const qualifier 2009-12-25 13:01:50 +09:00
nls Merge git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6 2009-09-30 09:31:14 -07:00
notify switch alloc_file() to passing struct path 2009-12-16 12:16:42 -05:00
ntfs kill I_LOCK 2009-12-17 11:03:25 -05:00
ocfs2 ocfs2: Handle O_DIRECT when writing to a refcounted cluster. 2009-12-30 19:53:35 -08:00
omfs tree-wide: fix assorted typos all over the place 2009-12-04 15:39:55 +01:00
openpromfs
partitions partitions: read whole sector with EFI GPT header 2009-11-23 09:29:58 +01:00
proc smaps: fix wrong rss count 2010-01-11 09:34:07 -08:00
qnx4 qnx4: use hweight8 2009-12-16 07:20:18 -08:00
quota quota: Fix dquot_transfer for filesystems different from ext4 2010-01-11 13:06:41 +01:00
ramfs nommu: ramfs: remove unused local var 2009-12-17 15:45:31 -08:00
reiserfs Merge branch 'reiserfs/kill-bkl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing 2010-01-08 14:03:55 -08:00
romfs ROMFS: fix length used with romfs_dev_strnlen() function 2009-10-11 11:33:56 -07:00
smbfs fs: Make unload_nls() NULL pointer safe 2009-09-24 07:47:42 -04:00
squashfs const: mark remaining super_operations const 2009-09-22 07:17:24 -07:00
sysfs sysfs: Add lockdep annotations for the sysfs active reference 2010-01-04 12:34:46 -08:00
sysv
ubifs lib: Introduce generic list_sort function 2010-01-12 21:02:00 -08:00
udf udf: Avoid IO in udf_clear_inode 2009-12-14 21:40:04 +01:00
ufs ufs: NFS support 2009-12-16 07:20:06 -08:00
xfs xfs: Use delayed write for inodes rather than async V2 2010-02-06 12:39:36 +11:00
Kconfig Revert "task_struct: make journal_info conditional" 2009-12-17 13:23:24 -08:00
Kconfig.binfmt
Makefile
aio.c aio: remove unused field 2009-12-16 07:20:13 -08:00
anon_inodes.c Sanitize f_flags helpers 2009-12-22 12:27:34 -05:00
attr.c truncate: new helpers 2009-09-24 08:41:47 -04:00
bad_inode.c
binfmt_aout.c mm: introduce coredump parameter structure 2009-12-17 15:45:31 -08:00
binfmt_elf.c mm: introduce coredump parameter structure 2009-12-17 15:45:31 -08:00
binfmt_elf_fdpic.c FDPIC: Respect PT_GNU_STACK exec protection markings when creating NOMMU stack 2010-01-06 18:16:02 -08:00
binfmt_em86.c
binfmt_flat.c mm: introduce coredump parameter structure 2009-12-17 15:45:31 -08:00
binfmt_misc.c
binfmt_script.c
binfmt_som.c mm: introduce coredump parameter structure 2009-12-17 15:45:31 -08:00
bio-integrity.c
bio.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2009-12-09 19:43:33 -08:00
block_dev.c Merge branch 'for-linus' into for-2.6.33 2009-11-03 21:14:39 +01:00
buffer.c Merge branch 'writeback' of git://git.kernel.dk/linux-2.6-block 2009-09-25 09:27:30 -07:00
char_dev.c fs/char_dev.c: remove useless loop 2009-09-24 07:21:03 -07:00
compat.c compat.c: Remove dependence on nfsd private headers 2009-12-14 18:12:10 -05:00
compat_binfmt_elf.c
compat_ioctl.c fs/compat_ioctl.c: fix build error when !BLOCK 2009-12-22 12:27:33 -05:00
dcache.c libfs: move EXPORT_SYMBOL for d_alloc_name 2009-12-16 12:16:48 -05:00
dcookies.c
direct-io.c dio: fix use-after-free 2009-12-17 04:52:13 -05:00
drop_caches.c sysctl: remove "struct file *" argument of ->proc_handler 2009-09-24 07:21:04 -07:00
eventfd.c anonfd: Allow making anon files read-only 2009-12-22 12:27:34 -05:00
eventpoll.c anonfd: Allow making anon files read-only 2009-12-22 12:27:34 -05:00
exec.c mm: introduce coredump parameter structure 2009-12-17 15:45:31 -08:00
fcntl.c fcntl: rename F_OWNER_GID to F_OWNER_PGRP 2009-11-17 17:40:33 -08:00
fifo.c
file.c headers: remove sched.h from interrupt.h 2009-10-11 11:20:58 -07:00
file_table.c alloc_file(): simplify handling of mnt_clone_write() errors 2009-12-22 12:27:33 -05:00
filesystems.c
fs-writeback.c writeback: add missing kernel-doc notation 2010-01-02 10:09:44 -08:00
fs_struct.c
generic_acl.c make generic_acl slightly more generic 2009-12-16 12:16:49 -05:00
inode.c kill I_LOCK 2009-12-17 11:03:25 -05:00
internal.h Fix f_flags/f_mode in case of lookup_instantiate_filp() from open(pathname, 3) 2009-12-22 12:27:34 -05:00
ioctl.c __generic_block_fiemap(): fix for files bigger than 4GB 2009-11-12 07:26:01 -08:00
ioprio.c
libfs.c libfs: move EXPORT_SYMBOL for d_alloc_name 2009-12-16 12:16:48 -05:00
locks.c const: make lock_manager_operations const 2009-09-22 07:17:25 -07:00
mbcache.c
mpage.c
namei.c generic_permission: MAY_OPEN is not write access 2009-12-30 12:35:44 -08:00
namespace.c Revert "fix mismerge with Trond's stuff (create_mnt_ns() export is gone now)" 2009-12-17 12:51:05 -08:00
nfsctl.c vfs: nfsctl.c un-used nfsd #includes 2009-12-14 18:12:11 -05:00
no-block.c
open.c Sanitize f_flags helpers 2009-12-22 12:27:34 -05:00
pipe.c fs: no games with DCACHE_UNHASHED 2009-12-17 10:51:40 -05:00
pnode.c
pnode.h
posix_acl.c
read_write.c sendfile(): check f_op.splice_write() rather than f_op.sendpage() 2009-11-04 09:09:52 +01:00
read_write.h
readdir.c
select.c headers: remove sched.h from poll.h 2009-10-04 15:05:10 -07:00
seq_file.c vfs: seq_file: add helpers for data filling 2009-09-24 07:47:35 -04:00
signalfd.c anonfd: Allow making anon files read-only 2009-12-22 12:27:34 -05:00
splice.c sendfile(): check f_op.splice_write() rather than f_op.sendpage() 2009-11-04 09:09:52 +01:00
stack.c VFS/fsstack: handle 32-bit smp + preempt + large files in fsstack_copy_inode_size 2009-12-17 10:58:17 -05:00
stat.c Add unlocked version of inode_add_bytes() function 2009-12-23 13:33:54 +01:00
super.c vfs: get_sb_single() - do not pass options twice 2009-12-23 11:23:43 -08:00
sync.c fold do_sync_file_range into sys_sync_file_range 2009-12-17 11:03:25 -05:00
timerfd.c anonfd: Allow making anon files read-only 2009-12-22 12:27:34 -05:00
utimes.c
xattr.c sanitize xattr handler prototypes 2009-12-16 12:16:49 -05:00
xattr_acl.c VFS: Use GFP_NOFS in posix_acl_from_xattr() 2009-12-03 11:48:07 +00:00