linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-11-01 17:08:10 +00:00

Author	SHA1	Message	Date
Trond Myklebust	e4eff1a622	SUNRPC: Clean up the sillyrename code Fix a couple of bugs: - Don't rely on the parent dentry still being valid when the call completes. Fixes a race with shrink_dcache_for_umount_subtree() - Don't remove the file if the filehandle has been labelled as stale. Fix a couple of inefficiencies - Remove the global list of sillyrenamed files. Instead we can cache the sillyrename information in the dentry->d_fsdata - Move common code from unlink_setup/unlink_done into fs/nfs/unlink.c Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-19 15:21:39 -04:00
Trond Myklebust	4fdc17b2a7	NFS: Introduce struct nfs_removeargs+nfs_removeres We need a common structure for setting up an unlink() rpc call in order to fix the asynchronous unlink code. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-19 15:21:39 -04:00
Trond Myklebust	3062c532ad	NFS: Use dentry->d_time to store the parent directory verifier. This will free up the d_fsdata field for other use. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-19 15:21:39 -04:00
Trond Myklebust	e3a535e173	NFSv4: Fix the nfsv4 readlink reply buffer alignment Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-19 15:09:04 -04:00
Trond Myklebust	d6ac02dfaa	NFSv4: Fix the readdir reply buffer alignment Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-19 15:09:04 -04:00
Trond Myklebust	9104a55dc3	NFSv4: More NFSv4 xdr cleanups Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-19 15:09:04 -04:00
Trond Myklebust	9936781d01	NFSv4: Try to recover from getfh failures in nfs4_xdr_dec_open Try harder to recover the open state if the server failed to return a filehandle. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-19 15:09:03 -04:00
Trond Myklebust	56659e9926	NFSv4: 'constify' lookup arguments. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-19 15:09:03 -04:00
Trond Myklebust	365c8f589a	NFSv4: Don't fail nfs4_xdr_dec_open if decode_restorefh() failed We can already easily recover from that inside _nfs4_proc_open(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-19 15:09:03 -04:00
Trond Myklebust	6f220ed5a8	NFSv4: Fix open state recovery Ensure that opendata->state is always initialised when we do state recovery. Ensure that we set the filehandle in the case where we're doing an "OPEN_CLAIM_PREVIOUS" call due to a server reboot. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-19 15:09:03 -04:00
Trond Myklebust	8cd69e1bc7	NFSD/SUNRPC: Fix the automatic selection of RPCSEC_GSS Bruce's patch broke the ability to compile RPCSEC_GSS as a module. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-19 15:09:02 -04:00
Andrew Morton	275afcac99	afs build fix Bruce and David's patches clashed. fs/afs/flock.c: In function 'afs_do_getlk': fs/afs/flock.c:459: error: void value not ignored as it ought to be Cc: "J. Bruce Fields" <bfields@fieldses.org> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:57 -07:00
J. Bruce Fields	c7d51402d2	knfsd: clean up EX_RDONLY Share a little common code, reverse the arguments for consistency, drop the unnecessary "inline", and lowercase the name. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Acked-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:52 -07:00
J. Bruce Fields	e22841c637	knfsd: move EX_RDONLY out of header EX_RDONLY is only called in one place; just put it there. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Acked-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:52 -07:00
J. Bruce Fields	5d3dbbeaf5	nfsd: remove unnecessary NULL checks from nfsd_cross_mnt We can now assume that rqst_exp_get_by_name() does not return NULL; so clean up some unnecessary checks. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Acked-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:52 -07:00
J. Bruce Fields	9a25b96c1f	nfsd: return errors, not NULL, from export functions I converted the various export-returning functions to return -ENOENT instead of NULL, but missed a few cases. This particular case could cause actual bugs in the case of a krb5 client that doesn't match any ip-based client and that is trying to access a filesystem not exported to krb5 clients. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Acked-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:52 -07:00
J. Bruce Fields	a280df32db	nfsd: fix possible read-ahead cache and export table corruption The value of nperbucket calculated here is too small--we should be rounding up instead of down--with the result that the index j in the following loop can overflow the raparm_hash array. At least in my case, the next thing in memory turns out to be export_table, so the symptoms I see are crashes caused by the appearance of four zeroed-out export entries in the first bucket of the hash table of exports (which were actually entries in the readahead cache, a pointer to which had been written to the export table in this initialization code). It looks like the bug was probably introduced with commit `fce1456a19` ("knfsd: make the readahead params cache SMP-friendly"). Cc: <stable@kernel.org> Cc: Greg Banks <gnb@melbourne.sgi.com> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:52 -07:00
Yoann Padioleau	dd00cc486a	some kmalloc/memset ->kzalloc (tree wide) Transform some calls to kmalloc/memset to a single kzalloc (or kcalloc). Here is a short excerpt of the semantic patch performing this transformation: @@ type T2; expression x; identifier f,fld; expression E; expression E1,E2; expression e1,e2,e3,y; statement S; @@ x = - kmalloc + kzalloc (E1,E2) ... when != \(x->fld=E;\\|y=f(...,x,...);\\|f(...,x,...);\\|x=E;\\|while(...) S\\|for(e1;e2;e3) S\) - memset((T2)x,0,E1); @@ expression E1,E2,E3; @@ - kzalloc(E1 * E2,E3) + kcalloc(E1,E2,E3) [akpm@linux-foundation.org: get kcalloc args the right way around] Signed-off-by: Yoann Padioleau <padator@wanadoo.fr> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Acked-by: Russell King <rmk@arm.linux.org.uk> Cc: Bryan Wu <bryan.wu@analog.com> Acked-by: Jiri Slaby <jirislaby@gmail.com> Cc: Dave Airlie <airlied@linux.ie> Acked-by: Roland Dreier <rolandd@cisco.com> Cc: Jiri Kosina <jkosina@suse.cz> Acked-by: Dmitry Torokhov <dtor@mail.ru> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org> Acked-by: Pierre Ossman <drzeus-list@drzeus.cx> Cc: Jeff Garzik <jeff@garzik.org> Cc: "David S. Miller" <davem@davemloft.net> Acked-by: Greg KH <greg@kroah.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: "Antonino A. Daplas" <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:50 -07:00
Jan Harkes	5b7f13bd26	coda: update module information Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:49 -07:00
Jan Harkes	3cf01f28c3	coda: remove statistics counters from /proc/fs/coda Similar information can easily be obtained with strace -c. Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:48 -07:00
Jan Harkes	a1b0aa8764	coda: remove struct coda_sb_info The sb_info structure only contains a single pointer to the character device, there is no need for the added indirection. Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:48 -07:00
Jan Harkes	5fd31e9a67	coda: cleanup downcall handler Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:48 -07:00
Jan Harkes	ed36f72367	coda: cleanup coda_lookup, use dsplice_alias Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:48 -07:00
Jan Harkes	970648eb03	coda: ignore returned values when upcalls return errors Venus returns an ENOENT error on open, so we shouldn't try to grab the filehandle for the returned fd. Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:48 -07:00
Jan Harkes	37461e1957	coda: replace upc_alloc/upc_free with kmalloc/kfree Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:48 -07:00
Jan Harkes	978752534e	coda: avoid lockdep warning in coda_readdir Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:48 -07:00
Jan Harkes	d9664c95af	coda: block signals during upcall processing We ignore signals for about 30 seconds to give userspace a chance to see the upcall. As we did not block signals we ended up in a busy loop for the remainder of the period when a signal is received. Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:48 -07:00
Jan Harkes	fe71b5f387	coda: cleanup for upcall handling path Make the code that processes upcall responses more straightforward, uncovered at least one bad assumption. We trusted that vc_inuse would be 0 when upcalls are aborted, however the device may have been reopened. Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:48 -07:00
Jan Harkes	8706551963	coda: cleanup /dev/cfs open and close handling - Make sure device index is not a negative number. - Unlink queued requests when the device is closed to avoid passing them to the next opener. Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:48 -07:00
Jan Harkes	ed31a7dd63	coda: use ilookup5 Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:48 -07:00
Jan Harkes	fac1f0e340	coda: coda doesn't track atime Set MS_NOATIME flag to avoid unnecessary calls when the coda inode is accessed. Also, set statfs.f_bsize to 4k. 1k is obviously too small for the suggested IO size. Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:48 -07:00
Jan Harkes	8c6d215284	coda: allow removal of busy directories A directory without children may still be busy when it is the cwd for some process. We can safely remove such a directory because the VFS prevents further operations. Also we don't need to call d_delete as it is already called in vfs_rmdir. Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:48 -07:00
Jan Harkes	d728900cd5	coda: fix nlink updates for directories The Coda client sets the directory link count to 1 when it isn't sure how many subdirectories we have. In this case we shouldn't change the link count in the kernel when a subdirectory is created or removed. Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:48 -07:00
Jan Harkes	56ee354794	coda: correctly invalidate cached access rights Change the epoch value to forces a refresh instead of clearing the cached rights mask and block all further accesses to the object. Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:48 -07:00
Jan Harkes	38c2e4370d	coda: do not grab an uninitialized fd when the open upcall returns an error When open fails the fd in the response is uninitialized and we ended up taking a reference on the file struct and never released it. Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:48 -07:00
Mingming Cao	b38bd33a6b	fix ext4/JBD2 build warnings Looking at the current linus-git tree jbd_debug() define in include/linux/jbd2.h extern u8 journal_enable_debug; #define jbd_debug(n, f, a...) \ do { \ if ((n) <= journal_enable_debug) { \ printk (KERN_DEBUG "(%s, %d): %s: ", \ __FILE__, __LINE__, __FUNCTION__); \ printk (f, ## a); \ } \ } while (0) > fs/ext4/inode.c: In function âext4_write_inodeâ: > fs/ext4/inode.c:2906: warning: comparison is always true due to limited > range of data type > > fs/jbd2/recovery.c: In function âjbd2_journal_recoverâ: > fs/jbd2/recovery.c:254: warning: comparison is always true due to > limited range of data type > fs/jbd2/recovery.c:257: warning: comparison is always true due to > limited range of data type > > fs/jbd2/recovery.c: In function âjbd2_journal_skip_recoveryâ: > fs/jbd2/recovery.c:301: warning: comparison is always true due to > limited range of data type > Noticed all warnings are occurs when the debug level is 0. Then found the "jbd2: Move jbd2-debug file to debugfs" patch http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0f49d5d019afa4e94253bfc92f0daca3badb990b changed the jbd2_journal_enable_debug from int type to u8, makes the jbd_debug comparision is always true when the debugging level is 0. Thus the compile warning occurs. Thought about changing the jbd2_journal_enable_debug data type back to int, but can't, because the jbd2-debug is moved to debug fs, where calling debugfs_create_u8() to create the debugfs entry needs the value to be u8 type. Even if we changed the data type back to int, the code is still buggy, kernel should not print jbd2 debug message if the jbd2_journal_enable_debug is set to 0. But this is not the case. The fix is change the level of debugging to 1. The same should fixed in ext3/JBD, but currently ext3 jbd-debug via /proc fs is broken, so we probably should fix it all together. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Cc: Jeff Garzik <jeff@garzik.org> Cc: Theodore Tso <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:47 -07:00
Kawai, Hidehiro	ee78b0a61f	coredump masking: ELF-FDPIC: enable core dump filtering This patch enables core dump filtering for ELF-FDPIC-formatted core file. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:47 -07:00
Kawai, Hidehiro	e2e00906a0	coredump masking: ELF-FDPIC: remove an unused argument This patch removes an unused argument from elf_fdpic_dump_segments(). Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:47 -07:00
Kawai, Hidehiro	a1b59e802f	coredump masking: ELF: enable core dump filtering This patch enables core dump filtering for ELF-formatted core file. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:47 -07:00
Kawai, Hidehiro	3cb4a0bb1e	coredump masking: add an interface for core dump filter This patch adds an interface to set/reset flags which determines each memory segment should be dumped or not when a core file is generated. /proc/<pid>/coredump_filter file is provided to access the flags. You can change the flag status for a particular process by writing to or reading from the file. The flag status is inherited to the child process when it is created. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:47 -07:00
Kawai, Hidehiro	6c5d523826	coredump masking: reimplementation of dumpable using two flags This patch changes mm_struct.dumpable to a pair of bit flags. set_dumpable() converts three-value dumpable to two flags and stores it into lower two bits of mm_struct.flags instead of mm_struct.dumpable. get_dumpable() behaves in the opposite way. [akpm@linux-foundation.org: export set_dumpable] Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:46 -07:00
Josef 'Jeff' Sipek	f79c20f525	fs: remove path_walk export Signed-off-by: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Neil Brown <neilb@suse.de> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:45 -07:00
Josef 'Jeff' Sipek	c4a7808fc3	fs: mark link_path_walk static Signed-off-by: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Neil Brown <neilb@suse.de> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:45 -07:00
Josef 'Jeff' Sipek	16b6287a52	nfsctl: use vfs_path_lookup use vfs_path_lookup instead of open-coding the necessary functionality. Signed-off-by: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu> Acked-by: NeilBrown <neilb@suse.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:45 -07:00
Josef 'Jeff' Sipek	16f1820028	fs: introduce vfs_path_lookup Stackable file systems, among others, frequently need to lookup paths or path components starting from an arbitrary point in the namespace (identified by a dentry and a vfsmount). Currently, such file systems use lookup_one_len, which is frowned upon [1] as it does not pass the lookup intent along; not passing a lookup intent, for example, can trigger BUG_ON's when stacking on top of NFSv4. The first patch introduces a new lookup function to allow lookup starting from an arbitrary point in the namespace. This approach has been suggested by Christoph Hellwig [2]. The second patch changes sunrpc to use vfs_path_lookup. The third patch changes nfsctl.c to use vfs_path_lookup. The fourth patch marks link_path_walk static. The fifth, and last patch, unexports path_walk because it is no longer unnecessary to call it directly, and using the new vfs_path_lookup is cleaner. For example, the following snippet of code, looks up "some/path/component" in a directory pointed to by parent_{dentry,vfsmnt}: err = vfs_path_lookup(parent_dentry, parent_vfsmnt, "some/path/component", 0, &nd); if (!err) { /* exits / ... / once done, release the references / path_release(&nd); } else if (err == -ENOENT) { / doesn't exist / } else { / other error */ } VFS functions such as lookup_create can be used on the nameidata structure to pass the create intent to the file system. Signed-off-by: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Neil Brown <neilb@suse.de> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:45 -07:00
Ollie Wild	b6a2fea393	mm: variable length argument support Remove the arg+env limit of MAX_ARG_PAGES by copying the strings directly from the old mm into the new mm. We create the new mm before the binfmt code runs, and place the new stack at the very top of the address space. Once the binfmt code runs and figures out where the stack should be, we move it downwards. It is a bit peculiar in that we have one task with two mm's, one of which is inactive. [a.p.zijlstra@chello.nl: limit stack size] Signed-off-by: Ollie Wild <aaw@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <linux-arch@vger.kernel.org> Cc: Hugh Dickins <hugh@veritas.com> [bunk@stusta.de: unexport bprm_mm_init] Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:45 -07:00
Peter Zijlstra	bdf4c48af2	audit: rework execve audit The purpose of audit_bprm() is to log the argv array to a userspace daemon at the end of the execve system call. Since user-space hasn't had time to run, this array is still in pristine state on the process' stack; so no need to copy it, we can just grab it from there. In order to minimize the damage to audit_log_*() copy each string into a temporary kernel buffer first. Currently the audit code requires that the full argument vector fits in a single packet. So currently it does clip the argv size to a (sysctl) limit, but only when execve auditing is enabled. If the audit protocol gets extended to allow for multiple packets this check can be removed. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ollie Wild <aaw@google.com> Cc: <linux-audit@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:45 -07:00
Rusty Russell	cf914a7d65	readahead: split ondemand readahead interface into two functions Split ondemand readahead interface into two functions. I think this makes it a little clearer for non-readahead experts (like Rusty). Internally they both call ondemand_readahead(), but the page argument is changed to an obvious boolean flag. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:44 -07:00
Fengguang Wu	d8983910a4	readahead: pass real splice size Pass real splice size to page_cache_readahead_ondemand(). The splice code works in chunks of 16 pages internally. The readahead code should be told of the overall splice size, instead of the internal chunk size. Otherwize bad things may happen. Imagine some 17-page random splice reads. The code before this patch will result in two readahead calls: readahead(16); readahead(1); That leads to one 16-page I/O and one 32-page I/O: one extra I/O and 31 readahead miss pages. Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:44 -07:00
Fengguang Wu	431a4820bf	readahead: move synchronous readahead call out of splice loop Move synchronous page_cache_readahead_ondemand() call out of splice loop. This avoids one pointless page allocation/insertion in case of non-zero ra_pages, or many pointless readahead calls in case of zero ra_pages. Note that if a user sets ra_pages to less than PIPE_BUFFERS=16 pages, he will not get expected readahead behavior anyway. The splice code works in batches of 16 pages, which can be taken as another form of synchronous readahead. Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:44 -07:00
Fengguang Wu	dc7868fcb9	readahead: convert ext3/ext4 invocations Convert ext3/ext4 dir reads to use on-demand readahead. Readahead for dirs operates _not_ on file level, but on blockdev level. This makes a difference when the data blocks are not continuous. And the read routine is somehow opaque: there's no handy info about the status of current page. So a simplified call scheme is employed: to call into readahead whenever the current page falls out of readahead windows. Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Cc: Steven Pratt <slpratt@austin.ibm.com> Cc: Ram Pai <linuxram@us.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:44 -07:00
Fengguang Wu	a08a166fe7	readahead: convert splice invocations Convert splice reads to use on-demand readahead. Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Cc: Steven Pratt <slpratt@austin.ibm.com> Cc: Ram Pai <linuxram@us.ibm.com> Cc: Jens Axboe <axboe@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:44 -07:00
Michael Halcrow	64ee4808a7	eCryptfs: ecryptfs_setattr() bugfix There is another bug recently introduced into the ecryptfs_setattr() function in 2.6.22. eCryptfs will attempt to treat special files like regular eCryptfs files on chmod, chown, and so forth. This leads to a NULL pointer dereference. This patch validates that the file is a regular file before proceeding with operations related to the inode's crypt_stat. Thanks to Ryusuke Konishi for finding this bug and suggesting the fix. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:43 -07:00
Ravikiran G Thirumalai	4004c69ad6	Avoid too many remote cpu references due to /proc/stat Optimize show_stat to collect per-irq information just once. On x86_64, with newer kernel versions, kstat_irqs is a bit of a problem. On every call to kstat_irqs, the process brings in per-cpu data from all online cpus. Doing this for NR_IRQS, which is now 256 + 32 * NR_CPUS results in (256+3263) 63 remote cpu references on a 64 cpu config. Considering the fact that we already compute this value per-cpu, we can save on the remote references as below. Signed-off-by: Alok N Kataria <alok.kataria@calsoftinc.com> Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:43 -07:00
Akinobu Mita	e53252d97e	unregister_chrdev() return void unregister_chrdev() does not return meaningful value. This patch makes it return void like most unregister_* functions. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:43 -07:00
Cyrill Gorcunov	cb00ea3528	UDF: coding style conversion - lindent This patch converts UDF coding style to kernel coding style using Lindent. Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Jan Kara <jack@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:43 -07:00
Nick Piggin	83c54070ee	mm: fault feedback #2 This patch completes Linus's wish that the fault return codes be made into bit flags, which I agree makes everything nicer. This requires requires all handle_mm_fault callers to be modified (possibly the modifications should go further and do things like fault accounting in handle_mm_fault -- however that would be for another patch). [akpm@linux-foundation.org: fix alpha build] [akpm@linux-foundation.org: fix s390 build] [akpm@linux-foundation.org: fix sparc build] [akpm@linux-foundation.org: fix sparc64 build] [akpm@linux-foundation.org: fix ia64 build] Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ian Molton <spyro@f2s.com> Cc: Bryan Wu <bryan.wu@analog.com> Cc: Mikael Starvik <starvik@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Greg Ungerer <gerg@uclinux.org> Cc: Matthew Wilcox <willy@debian.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Cc: Richard Curnow <rc@rc0.org.uk> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp> Cc: Chris Zankel <chris@zankel.net> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ Still apparently needs some ARM and PPC loving - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:41 -07:00
Nick Piggin	d0217ac04c	mm: fault feedback #1 Change ->fault prototype. We now return an int, which contains VM_FAULT_xxx code in the low byte, and FAULT_RET_xxx code in the next byte. FAULT_RET_ code tells the VM whether a page was found, whether it has been locked, and potentially other things. This is not quite the way he wanted it yet, but that's changed in the next patch (which requires changes to arch code). This means we no longer set VM_CAN_INVALIDATE in the vma in order to say that a page is locked which requires filemap_nopage to go away (because we can no longer remain backward compatible without that flag), but we were going to do that anyway. struct fault_data is renamed to struct vm_fault as Linus asked. address is now a void __user * that we should firmly encourage drivers not to use without really good reason. The page is now returned via a page pointer in the vm_fault struct. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:41 -07:00
Nick Piggin	54cb8821de	mm: merge populate and nopage into fault (fixes nonlinear) Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes the virtual address -> file offset differently from linear mappings. ->populate is a layering violation because the filesystem/pagecache code should need to know anything about the virtual memory mapping. The hitch here is that the ->nopage handler didn't pass down enough information (ie. pgoff). But it is more logical to pass pgoff rather than have the ->nopage function calculate it itself anyway (because that's a similar layering violation). Having the populate handler install the pte itself is likewise a nasty thing to be doing. This patch introduces a new fault handler that replaces ->nopage and ->populate and (later) ->nopfn. Most of the old mechanism is still in place so there is a lot of duplication and nice cleanups that can be removed if everyone switches over. The rationale for doing this in the first place is that nonlinear mappings are subject to the pagefault vs invalidate/truncate race too, and it seemed stupid to duplicate the synchronisation logic rather than just consolidate the two. After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in pagecache. Seems like a fringe functionality anyway. NOPAGE_REFAULT is removed. This should be implemented with ->fault, and no users have hit mainline yet. [akpm@linux-foundation.org: cleanup] [randy.dunlap@oracle.com: doc. fixes for readahead] [akpm@linux-foundation.org: build fix] Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:41 -07:00
Nick Piggin	d00806b183	mm: fix fault vs invalidate race for linear mappings Fix the race between invalidate_inode_pages and do_no_page. Andrea Arcangeli identified a subtle race between invalidation of pages from pagecache with userspace mappings, and do_no_page. The issue is that invalidation has to shoot down all mappings to the page, before it can be discarded from the pagecache. Between shooting down ptes to a particular page, and actually dropping the struct page from the pagecache, do_no_page from any process might fault on that page and establish a new mapping to the page just before it gets discarded from the pagecache. The most common case where such invalidation is used is in file truncation. This case was catered for by doing a sort of open-coded seqlock between the file's i_size, and its truncate_count. Truncation will decrease i_size, then increment truncate_count before unmapping userspace pages; do_no_page will read truncate_count, then find the page if it is within i_size, and then check truncate_count under the page table lock and back out and retry if it had subsequently been changed (ptl will serialise against unmapping, and ensure a potentially updated truncate_count is actually visible). Complexity and documentation issues aside, the locking protocol fails in the case where we would like to invalidate pagecache inside i_size. do_no_page can come in anytime and filemap_nopage is not aware of the invalidation in progress (as it is when it is outside i_size). The end result is that dangling (->mapping == NULL) pages that appear to be from a particular file may be mapped into userspace with nonsense data. Valid mappings to the same place will see a different page. Andrea implemented two working fixes, one using a real seqlock, another using a page->flags bit. He also proposed using the page lock in do_no_page, but that was initially considered too heavyweight. However, it is not a global or per-file lock, and the page cacheline is modified in do_no_page to increment _count and _mapcount anyway, so a further modification should not be a large performance hit. Scalability is not an issue. This patch implements this latter approach. ->nopage implementations return with the page locked if it is possible for their underlying file to be invalidated (in that case, they must set a special vm_flags bit to indicate so). do_no_page only unlocks the page after setting up the mapping completely. invalidation is excluded because it holds the page lock during invalidation of each page (and ensures that the page is not mapped while holding the lock). This also allows significant simplifications in do_no_page, because we have the page locked in the right place in the pagecache from the start. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:41 -07:00
David Chinner	c32676eea1	[XFS] Fix inode size update before data write in xfs_setattr When changing the file size by a truncate() call, we log the change in the inode size. However, we do not flush any outstanding data that might not have been written to disk, thereby violating the data/inode size update order. This can leave files full of NULLs on crash. Hence if we are truncating the file, flush any unwritten data that may lie between the curret on disk inode size and the new inode size that is being logged to ensure that ordering is preserved. SGI-PV: 966308 SGI-Modid: xfs-linux-melb:xfs-kern:29174a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-19 19:52:05 +10:00
David Chinner	91ebecc74e	[XFS] Allow punching holes to free space when at ENOSPC Make the free file space transaction able to dip into the reserved blocks to ensure that we can successfully free blocks when the filesystem is at ENOSPC. SGI-PV: 967788 SGI-Modid: xfs-linux-melb:xfs-kern:29167a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Vlad Apostolov <vapo@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-19 19:51:46 +10:00
David Chinner	4f57dbc6b5	[XFS] Implement ->page_mkwrite in XFS. Hook XFS up to ->page_mkwrite to ensure that we know about mmap pages being written to. This allows use to do correct delayed allocation and ENOSPC checking as well as remap unwritten extents so that they get converted correctly during writeback. This is done via the generic block_page_mkwrite code. SGI-PV: 940392 SGI-Modid: xfs-linux-melb:xfs-kern:29149a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-19 19:51:21 +10:00
David Chinner	5417169026	[FS] Implement block_page_mkwrite. Many filesystems need a ->page-mkwrite callout to correctly set up pages that have been written to by mmap. This is especially important when mmap is writing into holes as it allows filesystems to correctly account for and allocate space before the mmap write is allowed to proceed. Protection against truncate races is provided by locking the page and checking to see whether the page mapping is correct and whether it is beyond EOF so we don't end up allowing allocations beyond the current EOF or changing EOF as a result of a mmap write. SGI-PV: 940392 SGI-Modid: 2.6.x-xfs-melb:linux:29146a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-19 19:50:50 +10:00
Mark Fasheh	385820a38d	ocfs2: ->fallocate() support Plug ocfs2 into the ->fallocate() callback. This just re-uses the existing preallocation code. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-19 00:23:55 -07:00
Linus Torvalds	789c56b7f7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: (24 commits) [CIFS] merge conflict in fs/cifs/export.c [CIFS] Allow disabling CIFS Unix Extensions as mount option [CIFS] More whitespace/formatting fixes (noticed by checkpatch) [CIFS] Typo in previous patch [CIFS] zero_user_page() conversions [CIFS] use simple_prepare_write to zero page data [CIFS] Fix build break - inet.h not included when experimental ifdef off [CIFS] Add support for new POSIX unlink [CIFS] whitespace/formatting fixes [CIFS] Fix oops in cifs_create when nfsd server exports cifs mount [CIFS] whitespace cleanup [CIFS] Fix packet signatures for NTLMv2 case [CIFS] more whitespace fixes [CIFS] more whitespace cleanup [CIFS] whitespace cleanup [CIFS] whitespace cleanup [CIFS] ipv6 support no longer experimental [CIFS] Mount should fail if server signing off but client mount option requires it [CIFS] whitespace fixes [CIFS] Fix sign mount option and sign proc config setting ...	2007-07-18 18:32:28 -07:00
Linus Torvalds	29e7ee378e	Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: sysfs: cosmetic clean up on node creation failure paths sysfs: kill an extra put in sysfs_create_link() failure path Driver core: check return code of sysfs_create_link() HOWTO: Add the knwon_regression URI to the documentation dev_vdbg() documentation dev_vdbg(), available with -DVERBOSE_DEBUG sysfs: make sysfs_init_inode() static sysfs: fix sysfs root inode nlink accounting Documentation fix devres.txt: lib/iomap.c -> lib/devres.c sysfs: avoid kmem_cache_free(NULL) PM: remove deprecated dpm_runtime_* routines PM: Remove deprecated sysfs files Driver core: accept all valid action-strings in uevent-trigger debugfs: remove rmdir() non-empty complaint	2007-07-18 18:28:08 -07:00
Linus Torvalds	a8dcf12f9e	Merge branch 'for-linus' of git://linux-nfs.org/~bfields/linux * 'for-linus' of git://linux-nfs.org/~bfields/linux: locks: fix vfs_test_lock() comment locks: make posix_test_lock() interface more consistent nfs: disable leases over NFS gfs2: stop giving out non-cluster-coherent leases locks: export setlease to filesystems locks: provide a file lease method enabling cluster-coherent leases locks: rename lease functions to reflect locks.c conventions locks: share more common lease code locks: clean up lease_alloc() locks: convert an -EINVAL return to a BUG leases: minor break_lease() comment clarification	2007-07-18 18:27:00 -07:00
Steve French	1ff8392c32	Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: fs/cifs/export.c	2007-07-19 00:38:57 +00:00
Steve French	70b315b0dd	[CIFS] merge conflict in fs/cifs/export.c Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-07-19 00:32:25 +00:00
Steve French	c18c842b1f	[CIFS] Allow disabling CIFS Unix Extensions as mount option Previously the only way to do this was to umount all mounts to that server, turn off a proc setting (/proc/fs/cifs/LinuxExtensionsEnabled). Fixes Samba bugzilla bug number: 4582 (and also 2008) Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-07-18 23:21:09 +00:00
J. Bruce Fields	6924c55492	locks: fix vfs_test_lock() comment Thanks to Doug Chapman for pointing out that the comment here is inconsistent with the function prototype. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-07-18 19:17:19 -04:00
J. Bruce Fields	6d34ac199a	locks: make posix_test_lock() interface more consistent Since posix_test_lock(), like fcntl() and ->lock(), indicates absence or presence of a conflict lock by setting fl_type to, respectively, F_UNLCK or something other than F_UNLCK, the return value is no longer needed. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-07-18 19:17:19 -04:00
J. Bruce Fields	370f6599e8	nfs: disable leases over NFS As Peter Staubach says elsewhere (http://marc.info/?l=linux-kernel&m=118113649526444&w=2): > The problem is that some file system such as NFSv2 and NFSv3 do > not have sufficient support to be able to support leases correctly. > In particular for these two file systems, there is no over the wire > protocol support. > > Currently, these two file systems fail the fcntl(F_SETLEASE) call > accidentally, due to a reference counting difference. These file > systems should fail more consciously, with a proper error to > indicate that the call is invalid for them. Define an nfs setlease method that just returns -EINVAL. If someone can demonstrate a real need, perhaps we could reenable them in the presence of the "nolock" mount option. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Cc: Peter Staubach <staubach@redhat.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-18 19:17:19 -04:00
Marc Eshel	60446067ba	gfs2: stop giving out non-cluster-coherent leases Since gfs2 can't prevent conflicting opens or leases on other nodes, we probably shouldn't allow it to give out leases at all. Put the newly defined lease operation into use in gfs2 by turning off lease, unless we're using the "nolock' locking module (in which case all locking is local anyway). Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Steven Whitehouse <swhiteho@redhat.com>	2007-07-18 19:17:19 -04:00
J. Bruce Fields	4698afe8e3	locks: export setlease to filesystems Export setlease so it can used by filesystems to implement their lease methods. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-07-18 19:17:06 -04:00
J. Bruce Fields	f9ffed26d6	locks: provide a file lease method enabling cluster-coherent leases Currently leases are only kept locally, so there's no way for a distributed filesystem to enforce them against multiple clients. We're particularly interested in the case of nfsd exporting a cluster filesystem, in which case nfsd needs cluster-coherent leases in order to implement delegations correctly. Also add some documentation. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2007-07-18 19:14:47 -04:00
J. Bruce Fields	a9933cea7a	locks: rename lease functions to reflect locks.c conventions We've been using the convention that vfs_foo is the function that calls a filesystem-specific foo method if it exists, or falls back on a generic method if it doesn't; thus vfs_foo is what is called when some other part of the kernel (normally lockd or nfsd) wants to get a lock, whereas foo is what filesystems call to use the underlying local functionality as part of their lock implementation. So rename setlease to vfs_setlease (which will call a filesystem-specific setlease after a later patch) and __setlease to setlease. Also, vfs_setlease need only be GPL-exported as long as it's only needed by lockd and nfsd. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-07-18 19:14:12 -04:00
J. Bruce Fields	6d5e8b05ca	locks: share more common lease code Share more code between setlease (used by nfsd) and fcntl. Also some minor cleanup. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Acked-by: Christoph Hellwig <hch@infradead.org>	2007-07-18 19:09:27 -04:00
J. Bruce Fields	e32b8ee27b	locks: clean up lease_alloc() Return the newly allocated structure as the return value instead of using a struct ** parameter. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2007-07-18 19:09:27 -04:00
J. Bruce Fields	d2ab0b0c4c	locks: convert an -EINVAL return to a BUG There's no point trying to return an error in these cases, which all represent bugs in the callers. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2007-07-18 19:09:27 -04:00
david m. richter	87250dd26a	leases: minor break_lease() comment clarification clarify that break_lease() checks for presence of any lock, not just leases. Signed-off-by: David M. Richter <richterd@citi.umich.edu> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-07-18 19:09:27 -04:00
Tejun Heo	967e35dcc9	sysfs: cosmetic clean up on node creation failure paths Node addition failure is detected by testing return value of sysfs_addfm_finish() which returns the number of added and removed nodes. As the function is called as the last step of addition right on top of error handling block, the if blocks looked like the following. if (sysfs_addrm_finish(&acxt)) success handling, usually return; /* fall through to error handling */ This is the opposite of usual convention in sysfs and makes the code difficult to understand. This patch inverts the test and makes those blocks look more like others. Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: Gabriel C <nix.or.die@googlemail.com> Cc: Miles Lane <miles.lane@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-18 15:49:50 -07:00
Tejun Heo	a1da4dfe35	sysfs: kill an extra put in sysfs_create_link() failure path There is a subtle bug in sysfs_create_link() failure path. When symlink creation fails because there's already a node with the same name, the target sysfs_dirent is put twice - once by failure path of sysfs_create_link() and once more when the symlink is released. Fix it by making only the symlink node responsible for putting target_sd. Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: Gabriel C <nix.or.die@googlemail.com> Cc: Miles Lane <miles.lane@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-18 15:49:50 -07:00
Tejun Heo	bc37e28303	sysfs: make sysfs_init_inode() static With sysfs_fill_super() converted to use sysfs_get_inode(), there is no user of sysfs_init_inode() outside of fs/sysfs/inode.c. Make it static. Signed-off-by: Tejun Heo <htejun@gmail.com> Acked-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-18 15:49:49 -07:00
Tejun Heo	e080e436f6	sysfs: fix sysfs root inode nlink accounting While making sysfs indoes hashed, sysfs root inode was left out. Now that nlink accounting depends on the inode being on the hash, sysfs root inode nlink isn't adjusted properly. Put sysfs root inode on the inode hash by allocating it using sysfs_get_inode() like other sysfs inodes. While at it, massage comments a bit. Signed-off-by: Tejun Heo <htejun@gmail.com> Acked-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-18 15:49:49 -07:00
Akinobu Mita	01da2425f3	sysfs: avoid kmem_cache_free(NULL) kmem_cache_free() with NULL is not allowed. But it may happen if out of memory error is triggered in sysfs_new_dirent(). This patch fixes that error handling. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-18 15:49:49 -07:00
Jens Axboe	a6bb340da3	debugfs: remove rmdir() non-empty complaint Hi, This patch kills the pointless debugfs rmdir() printk() when called on a non-empty directory. blktrace will sometimes have to call it a few times when forcefully ending a trace, which polutes the log with pointless warnings. Rationale: - It's more code to work-around this "problem" in the debugfs users, and you would have to add code to check for empty directories to do so (or assume that debugfs is using simple_ helpers, but that would be a layering violation). - Other rmdir() implementations don't complain about something this silly. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-18 15:49:48 -07:00
Linus Torvalds	d756d10e24	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: extent macros cleanup Fix compilation with EXT_DEBUG, also fix leXX_to_cpu conversions. ext4: remove extra IS_RDONLY() check ext4: Use is_power_of_2() Use zero_user_page() in ext4 where possible ext4: Remove 65000 subdirectory limit ext4: Expand extra_inodes space per the s_{want,min}_extra_isize fields ext4: Add nanosecond timestamps jbd2: Move jbd2-debug file to debugfs jbd2: Fix CONFIG_JBD_DEBUG ifdef to be CONFIG_JBD2_DEBUG ext4: Set the journal JBD2_FEATURE_INCOMPAT_64BIT on large devices ext4: Make extents code sanely handle on-disk corruption ext4: copy i_flags to inode flags on write ext4: Enable extents by default Change on-disk format to support 2^15 uninitialized extents write support for preallocated blocks fallocate support in ext4 sys_fallocate() implementation on i386, x86_64 and powerpc	2007-07-18 10:32:00 -07:00
Jeremy Fitzhardinge	86313c488a	usermodehelper: Tidy up waiting Rather than using a tri-state integer for the wait flag in call_usermodehelper_exec, define a proper enum, and use that. I've preserved the integer values so that any callers I've missed should still work OK. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Andi Kleen <ak@suse.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: David Howells <dhowells@redhat.com>	2007-07-18 08:47:40 -07:00
Dmitry Monakhov	e9f410b1c0	ext4: extent macros cleanup Use the EXT_LAST_INDEX macro; that's what it's there for. Clean up ext4_ext_ext_grow_indepth() so the correct EXT_FIRST_INDEX or EXT_FIRST_MACRO is used as necessary. The two macros are equivalent, so the C will collapse the if statement out, but it makes the code much more readable. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Acked-by: Alex Tomas <alex@clusterfs.com> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Singed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-07-18 09:09:15 -04:00
Dmitry Monakhov	26d535ed24	Fix compilation with EXT_DEBUG, also fix leXX_to_cpu conversions. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Acked-by: Alex Tomas <alex@clusterfs.com> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-07-18 08:33:37 -04:00
Dave Hansen	d699594dc1	ext4: remove extra IS_RDONLY() check ext4_change_inode_journal_flag() is only called from one location: ext4_ioctl(EXT3_IOC_SETFLAGS). That ioctl case already has a IS_RDONLY() call in it so this one is superfluous. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-07-18 08:33:51 -04:00
Vignesh Babu	1330593eb2	ext4: Use is_power_of_2() Replace (n & (n-1)) in the context of power of 2 checks with is_power_of_2() Signed-off-by: Vignesh Babu <vignesh.babu@wipro.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-07-18 09:11:02 -04:00
Eric Sandeen	fc0e15a667	Use zero_user_page() in ext4 where possible Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-07-18 09:20:44 -04:00
Andreas Dilger	f8628a14a2	ext4: Remove 65000 subdirectory limit This patch adds support to ext4 for allowing more than 65000 subdirectories. Currently the maximum number of subdirectories is capped at 32000. If we exceed 65000 subdirectories in an htree directory it sets the inode link count to 1 and no longer counts subdirectories. The directory link count is not actually used when determining if a directory is empty, as that only counts subdirectories and not regular files that might be in there. A EXT4_FEATURE_RO_COMPAT_DIR_NLINK flag has been added and it is set if the subdir count for any directory crosses 65000. A later fsck will clear EXT4_FEATURE_RO_COMPAT_DIR_NLINK if there are no longer any directory with >65000 subdirs. Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Kalpak Shah <kalpak@clusterfs.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-07-18 08:38:01 -04:00
Kalpak Shah	6dd4ee7cab	ext4: Expand extra_inodes space per the s_{want,min}_extra_isize fields We need to make sure that existing ext3 filesystems can also avail the new fields that have been added to the ext4 inode. We use s_want_extra_isize and s_min_extra_isize to decide by how much we should expand the inode. If EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE feature is set then we expand the inode by max(s_want_extra_isize, s_min_extra_isize , sizeof(ext4_inode) - EXT4_GOOD_OLD_INODE_SIZE) bytes. Actually it is still an open question about whether users should be able to set s_*_extra_isize smaller than the known fields or not. This patch also adds the functionality to expand inodes to include the newly added fields. We start by trying to expand by s_want_extra_isize bytes and if its fails we try to expand by s_min_extra_isize bytes. This is done by changing the i_extra_isize if enough space is available in the inode and no EAs are present. If EAs are present and there is enough space in the inode then the EAs in the inode are shifted to make space. If enough space is not available in the inode due to the EAs then 1 or more EAs are shifted to the external EA block. In the worst case when even the external EA block does not have enough space we inform the user that some EA would need to be deleted or s_min_extra_isize would have to be reduced. Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Kalpak Shah <kalpak@clusterfs.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-07-18 09:19:57 -04:00
Kalpak Shah	ef7f38359e	ext4: Add nanosecond timestamps This patch adds nanosecond timestamps for ext4. This involves adding *time_extra fields to the ext4_inode to extend the timestamps to 64-bits. Creation time is also added by this patch. These extended fields will fit into an inode if the filesystem was formatted with large inodes (-I 256 or larger) and there are currently no EAs consuming all of the available space. For new inodes we always reserve enough space for the kernel's known extended fields, but for inodes created with an old kernel this might not have been the case. So this patch also adds the EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE feature flag(ro-compat so that older kernels can't create inodes with a smaller extra_isize). which indicates if the fields fitting inside s_min_extra_isize are available or not. If the expansion of inodes if unsuccessful then this feature will be disabled. This feature is only enabled if requested by the sysadmin. None of the extended inode fields is critical for correct filesystem operation. Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Kalpak Shah <kalpak@clusterfs.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-07-18 09:15:20 -04:00
Jose R. Santos	0f49d5d019	jbd2: Move jbd2-debug file to debugfs The jbd2-debug file used to be located in /proc/sys/fs/jbd2-debug, but it incorrectly used create_proc_entry() instead of the sysctl routines, and no proc entry was ever created. Instead of fixing this we might as well move the jbd2-debug file to debugfs which would be the preferred location for this kind of tunable. The new location is now /sys/kernel/debug/jbd2/jbd2-debug. Signed-off-by: Jose R. Santos <jrs@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-07-18 08:50:18 -04:00
Jose R. Santos	e23291b912	jbd2: Fix CONFIG_JBD_DEBUG ifdef to be CONFIG_JBD2_DEBUG When the JBD code was forked to create the new JBD2 code base, the references to CONFIG_JBD_DEBUG where never changed to CONFIG_JBD2_DEBUG. This patch fixes that. Signed-off-by: Jose R. Santos <jrs@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-07-18 08:57:06 -04:00

1 2 3 4 5 ...

6182 commits