linux-stable

Commit Graph

Author	SHA1	Message	Date
Miklos Szeredi	d70b67c8bc	[patch] vfs: fix lookup on deleted directory Lookup can install a child dentry for a deleted directory. This keeps the directory dentry alive, and the inode pinned in the cache and on disk, even after all external references have gone away. This isn't a big problem normally, since memory pressure or umount will clear out the directory dentry and its children, releasing the inode. But for UBIFS this causes problems because its orphan area can overflow. Fix this by returning ENOENT for all lookups on a S_DEAD directory before creating a child dentry. Thanks to Zoltan Sogor for noticing this while testing UBIFS, and Artem for the excellent analysis of the problem and testing. Reported-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Tested-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-07-26 20:53:05 -04:00
Theodore Ts'o	00b32b7fb6	ext4: unexport jbd2_journal_update_superblock Remove the unused EXPORT_SYMBOL(jbd2_journal_update_superblock). Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-07-26 17:33:53 -04:00
Theodore Ts'o	2b2d6d0197	ext4: Cleanup whitespace and other miscellaneous style issues Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-07-26 16:15:44 -04:00
Linus Torvalds	4b1fefaca9	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: When verifying the decoded header before decoding the object identifier [CIFS] Fix warnings from checkpatch [CIFS] Fix improper endian conversion of ACL subauth field [CIFS] Fix possible double free if search immediately after search rewind fails [CIFS] remove checkpatch warning Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> cifs: assorted endian annotations [CIFS] break ATTR_SIZE changes out into their own function lockdep: annotate cifs in-kernel sockets [CIFS] Fix compiler warning on 64-bit	2008-07-26 12:45:32 -07:00
Roland McGrath	ebcb67341f	/proc/PID/syscall This adds /proc/PID/syscall and /proc/PID/task/TID/syscall magic files. These use task_current_syscall() to show the task's current system call number and argument registers, stack pointer and PC. For a task blocked but not in a syscall, the file shows "-1" in place of the syscall number, followed by only the SP and PC. For a task that's not blocked, it shows "running". Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Reviewed-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:10 -07:00
Roland McGrath	0d094efeb1	tracehook: tracehook_tracer_task This adds the tracehook_tracer_task() hook to consolidate all forms of "Who is using ptrace on me?" logic. This is used for "TracerPid:" in /proc and for permission checks. We also clean up the selinux code the called an identical accessor. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Reviewed-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:08 -07:00
Roland McGrath	6341c393fc	tracehook: exec This moves all the ptrace hooks related to exec into tracehook.h inlines. This also lifts the calls for tracing out of the binfmt load_binary hooks into search_binary_handler() after it calls into the binfmt module. This change has no effect, since all the binfmt modules' load_binary functions did the call at the end on success, and now search_binary_handler() does it immediately after return if successful. We consolidate the repeated code, and binfmt modules no longer need to import ptrace_notify(). Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Reviewed-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:08 -07:00
Arjan van de Ven	267e2a9c71	Use WARN() in fs/proc/ Use WARN() instead of a printk+WARN_ON() pair; this way the message becomes part of the warning section for better reporting/collection. This way, the entire if() {} section can collapse into the WARN() as well. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:08 -07:00
Arjan van de Ven	99fcd77d15	Use WARN() in fs/sysfs Use WARN() instead of a printk+WARN_ON() pair; this way the message becomes part of the warning section for better reporting/collection. Also, with this, one fo the if() sections collapses entirely into the WARN(). Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:07 -07:00
Arjan van de Ven	5c752ad9f3	Use WARN() in fs/ Use WARN() instead of a printk+WARN_ON() pair; this way the message becomes part of the warning section for better reporting/collection. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:07 -07:00
Alexey Dobriyan	51cc50685a	SL*B: drop kmem cache argument from constructor Kmem cache passed to constructor is only needed for constructors that are themselves multiplexeres. Nobody uses this "feature", nor does anybody uses passed kmem cache in non-trivial way, so pass only pointer to object. Non-trivial places are: arch/powerpc/mm/init_64.c arch/powerpc/mm/hugetlbpage.c This is flag day, yes. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Christoph Lameter <cl@linux-foundation.org> Cc: Jon Tollefson <kniht@linux.vnet.ibm.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Matt Mackall <mpm@selenic.com> [akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c] [akpm@linux-foundation.org: fix mm/slab.c] [akpm@linux-foundation.org: fix ubifs] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:07 -07:00
Nick Piggin	19fd623127	mm: spinlock tree_lock mapping->tree_lock has no read lockers. convert the lock from an rwlock to a spinlock. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Hugh Dickins <hugh@veritas.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:06 -07:00
Nick Piggin	bc40d73c95	splice: use get_user_pages_fast Use get_user_pages_fast in splice. This reverts some mmap_sem batching there, however the biggest problem with mmap_sem tends to be hold times blocking out other threads rather than cacheline bouncing. Further: on architectures that implement get_user_pages_fast without locks, mmap_sem can be avoided completely anyway. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andi Kleen <andi@firstfloor.org> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Zach Brown <zach.brown@oracle.com> Cc: Jens Axboe <jens.axboe@oracle.com> Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:06 -07:00
Nick Piggin	f5dd33c494	dio: use get_user_pages_fast Use get_user_pages_fast in the common/generic block and fs direct IO paths. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andi Kleen <andi@firstfloor.org> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Zach Brown <zach.brown@oracle.com> Cc: Jens Axboe <jens.axboe@oracle.com> Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:06 -07:00
Bob Copeland	63ca8ce2a2	omfs: update kbuild to include OMFS Adds OMFS to the fs Kconfig and Makefile Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:05 -07:00
Bob Copeland	36cc410a67	omfs: add bitmap routines Add block allocation and block bitmap management routines for OMFS. Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:05 -07:00
Bob Copeland	8f09e98768	omfs: add file routines Add functions for reading and manipulating the storage of file data in the extent-based OMFS. Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:05 -07:00
Bob Copeland	a3ab7155ea	omfs: add directory routines Add lookup and directory management routines for OMFS. The filesystem uses hashing based on the filename and stores collisions, unordered, in siblings of files' inode structures. To support telldir, the current position in the hash table is encoded in fpos. Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:05 -07:00
Bob Copeland	555e3775ce	omfs: add inode routines Add basic superblock and inode handling routines for OMFS Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:05 -07:00
Bob Copeland	1b002d7b17	omfs: define filesystem structures Add header files containing OMFS on-disk and memory structures. Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:05 -07:00
Dmitri Vorobiev	3f165e4cf2	bfs: kill BKL Replace the BKL-based locking scheme used in the bfs driver by a private filesystem-wide mutex. Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@movial.fi> Cc: Tigran Aivazian <tigran_aivazian@symantec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:03 -07:00
Dmitri Vorobiev	75b25b4cab	bfs: assorted cleanups This patch makes the following cleanups: o removing an unused variable from bfs_fill_super(); o removing unneeded blank spaces from pointer definitions. Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@movial.fi> Cc: Tigran Aivazian <tigran_aivazian@symantec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:03 -07:00
Matthias Kaehlcke	7d135a5d50	affs: convert s_bmlock into a mutex The semaphore s_bmlock is used as a mutex. Convert it to the mutex API. Signed-off-by: Matthias Kaehlcke <matthias@kaehlcke.net> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:03 -07:00
Kentaro Makita	f3c6ba986a	vfs: add cond_resched_lock while scanning dentry LRU lists Add cond_resched_lock(&dcache_lock) while scanning LRU lists on superblocks in __shrink_dcache_sb() Signed-off-by: Kentaro Makita <k-makita@np.css.fujitsu.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:02 -07:00
Linus Torvalds	5047887caf	Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (34 commits) powerpc: Wireup new syscalls Move update_mmu_cache() declaration from tlbflush.h to pgtable.h powerpc/pseries: Remove kmalloc call in handling writes to lparcfg powerpc/pseries: Update arch vector to indicate support for CMO ibmvfc: Add support for collaborative memory overcommit ibmvscsi: driver enablement for CMO ibmveth: enable driver for CMO ibmveth: Automatically enable larger rx buffer pools for larger mtu powerpc/pseries: Verify CMO memory entitlement updates with virtual I/O powerpc/pseries: vio bus support for CMO powerpc/pseries: iommu enablement for CMO powerpc/pseries: Add CMO paging statistics powerpc/pseries: Add collaborative memory manager powerpc/pseries: Utilities to set firmware page state powerpc/pseries: Enable CMO feature during platform setup powerpc/pseries: Split retrieval of processor entitlement data into a helper routine powerpc/pseries: Add memory entitlement capabilities to /proc/ppc64/lparcfg powerpc/pseries: Split processor entitlement retrieval and gathering to helper routines powerpc/pseries: Remove extraneous error reporting for hcall failures in lparcfg powerpc: Fix compile error with binutils 2.15 ... Fixed up conflict in arch/powerpc/platforms/52xx/Kconfig manually.	2008-07-25 11:08:17 -07:00
Miklos Szeredi	48e90761b5	fuse: lockd support If fuse filesystem doesn't define it's own lock operations, then allow the lock manager to work with fuse. Adding lockd support for remote locking is also possible, but more rarely used, so leave it till later. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Matthew Wilcox <matthew@wil.cx> Cc: David Teigland <teigland@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:48 -07:00
Miklos Szeredi	33670fa296	fuse: nfs export special lookups Implement the get_parent export operation by sending a LOOKUP request with ".." as the name. Implement looking up an inode by node ID after it has been evicted from the cache. This is done by seding a LOOKUP request with "." as the name (for all file types, not just directories). The filesystem can set the FUSE_EXPORT_SUPPORT flag in the INIT reply, to indicate that it supports these special lookups. Thanks to John Muir for the original implementation of this feature. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Matthew Wilcox <matthew@wil.cx> Cc: David Teigland <teigland@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:48 -07:00
Miklos Szeredi	c180eebe13	fuse: add fuse_lookup_name() helper Add a new helper function which sends a LOOKUP request with the supplied name. This will be used by the next patch to send special LOOKUP requests with "." and ".." as the name. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:48 -07:00
Miklos Szeredi	dbd561d236	fuse: add export operations Implement export_operations, to allow fuse filesystems to be exported to NFS. This feature has been in the out-of-tree fuse module, and is widely used and tested. It has not been originally merged into mainline, because doing the NFS export in userspace was thought to be a cleaner and more efficient way of doing it, than through the kernel. While that is true, it would also have involved a lot of duplicated effort at reimplementing NFS exporting (all the different versions of the protocol). This effort was unfortunately not undertaken by anyone, so we are left with doing it the easy but less efficient way. If this feature goes in, the out-of-tree fuse module can go away, which would have several advantages: - not having to maintain two versions - less confusion for users - no bugs due to kernel API changes Comment from hch: - Use the same fh_type values as XFS, since we use the same fh encoding. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:48 -07:00
Miklos Szeredi	0de6256daa	fuse: prepare lookup for nfs export Use d_splice_alias() instead of d_add() in fuse lookup code, to allow NFS exporting. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:48 -07:00
Miklos Szeredi	764c76b371	locks: allow ->lock() to return FILE_LOCK_DEFERRED Allow filesystem's ->lock() method to call posix_lock_file() instead of posix_lock_file_wait(), and return FILE_LOCK_DEFERRED. This makes it possible to implement a such a ->lock() function, that works with the lock manager, which needs the call to be asynchronous. Now the vfs_lock_file() helper can be used, so this is a cleanup as well. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Matthew Wilcox <matthew@wil.cx> Cc: David Teigland <teigland@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:48 -07:00
Miklos Szeredi	b648a6de00	locks: cleanup code duplication Extract common code into a function. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Matthew Wilcox <matthew@wil.cx> Cc: David Teigland <teigland@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:48 -07:00
Miklos Szeredi	bde74e4bc6	locks: add special return value for asynchronous locks Use a special error value FILE_LOCK_DEFERRED to mean that a locking operation returned asynchronously. This is returned by posix_lock_file() for sleeping locks to mean that the lock has been queued on the block list, and will be woken up when it might become available and needs to be retried (either fl_lmops->fl_notify() is called or fl_wait is woken up). f_op->lock() to mean either the above, or that the filesystem will call back with fl_lmops->fl_grant() when the result of the locking operation is known. The filesystem can do this for sleeping as well as non-sleeping locks. This is to make sure, that return values of -EAGAIN and -EINPROGRESS by filesystems are not mistaken to mean an asynchronous locking. This also makes error handling in fs/locks.c and lockd/svclock.c slightly cleaner. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Matthew Wilcox <matthew@wil.cx> Cc: David Teigland <teigland@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:47 -07:00
Miklos Szeredi	cc77b1521d	lockd: dont return EAGAIN for a permanent error Fix nlm_fopen() to return NLM_FAILED (or NLM_LCK_DENIED_NOLOCKS) instead of NLM_LCK_DENIED. The latter means the lock request failed because of a conflicting lock (i.e. a temporary error), which is wrong in this case. Also fix the client to return ENOLCK instead of EAGAIN if a blocking lock request returns with NLM_LOCK_DENIED. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Matthew Wilcox <matthew@wil.cx> Cc: David Teigland <teigland@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:47 -07:00
Andrea Righi	297c5d9263	task IO accounting: provide distinct tgid/tid I/O statistics Report per-thread I/O statistics in /proc/pid/task/tid/io and aggregate parent I/O statistics in /proc/pid/io. This approach follows the same model used to account per-process and per-thread CPU times. As a practial application, this allows for example to quickly find the top I/O consumer when a process spawns many child threads that perform the actual I/O work, because the aggregated I/O statistics can always be found in /proc/pid/io. [ Oleg Nesterov points out that we should check that the task is still alive before we iterate over the threads, but also says that we can do that fixup on top of this later. - Linus ] Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: Andrea Righi <righi.andrea@gmail.com> Cc: Matt Heaton <matt@hostmonster.com> Cc: Shailabh Nagar <nagar@watson.ibm.com> Acked-by-with-comments: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:47 -07:00
Alexey Dobriyan	6eedf8d30d	proc: move Kconfig to fs/proc/Kconfig Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:45 -07:00
Alexey Dobriyan	a9bd4a3e07	proc: remove pathetic remount code MS_RMT_MASK will unmask changes in do_remount_sb() anyway. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:45 -07:00
Alexey Dobriyan	881adb8535	proc: always do ->release Current two-stage scheme of removing PDE emphasizes one bug in proc: open rmmod remove_proc_entry close ->release won't be called because ->proc_fops were cleared. In simple cases it's small memory leak. For every ->open, ->release has to be done. List of openers is introduced which is traversed at remove_proc_entry() if neeeded. Discussions with Al long ago (sigh). Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:44 -07:00
Adrian Bunk	6e644c3126	move proc_kmsg_operations to fs/proc/internal.h This patch moves the extern of struct proc_kmsg_operations to fs/proc/internal.h and adds an #include "internal.h" to fs/proc/kmsg.c so that the latter sees the former. Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:44 -07:00
Thomas Gleixner	d991696263	fs/partitions/efi: convert to pr_debug convert the local Dprintk() compile time debug printk wrappers to the generic pr_debug() wrapper. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Matt Domsch <Matt_Domsch@dell.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:44 -07:00
Abdel Benamrouche	04ebd4aee5	block/ioctl.c and fs/partition/check.c: check value returned by add_partition() Now that add_partition() has been aught to propagate errors, let's check them. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Abdel Benamrouche <draconux@gmail.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:44 -07:00
Abdel Benamrouche	d805dda412	fs/partition/check.c: fix return value warning fs/partitions/check.c:381: warning: ignoring return value of ___device_add___, declared with attribute warn_unused_result [akpm@linux-foundation.org: multiple-return-statements-per-function are evil] Signed-off-by: Abdel Benamrouche <draconux@gmail.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:44 -07:00
Edgar E. Iglesias	79885b2277	elf: use ELF_CORE_EFLAGS for kcore ELF header flags ELF_CORE_EFLAGS is already used by the binfmt_elf coredumper to set correct arch specific ELF header flags on coredumps. Use it for kcore dumps as well. At the moment, this affects the CRIS and the H8300 arch. Signed-off-by: Edgar E. Iglesias <edgar@axis.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:42 -07:00
Oleg Nesterov	565b9b14e7	coredump: format_corename: fix the "core_uses_pid" logic I don't understand why the multi-thread coredump implies the core_uses_pid behaviour, but we shouldn't use mm->mm_users for that. This counter can be incremented by get_task_mm(). Use the valued returned by coredump_wait() instead. Also, remove the "const char *pattern" argument, format_corename() can use core_pattern directly. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:40 -07:00
Oleg Nesterov	a94e2d408e	coredump: kill mm->core_done Now that we have core_state->dumper list we can use it to wake up the sub-threads waiting for the coredump completion. This uglifies the code and .text grows by 47 bytes, but otoh mm_struct lessens by sizeof(struct completion). Also, with this change we can decouple exit_mm() from the coredumping code. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:40 -07:00
Oleg Nesterov	182c515fd2	coredump: elf_fdpic_core_dump: use core_state->dumper list Kill the nasty rcu_read_lock() + do_each_thread() loop, use the list encoded in mm->core_state instead, s/GFP_ATOMIC/GFP_KERNEL/. This patch allows futher cleanups in binfmt_elf_fdpic.c. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Roland McGrath <roland@redhat.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:40 -07:00
Oleg Nesterov	83914441f9	coredump: elf_core_dump: use core_state->dumper list Kill the nasty rcu_read_lock() + do_each_thread() loop, use the list encoded in mm->core_state instead, s/GFP_ATOMIC/GFP_KERNEL/. This patch allows futher cleanups in binfmt_elf.c, in particular we can kill the parallel info->threads list. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:40 -07:00
Oleg Nesterov	b564daf806	coredump: construct the list of coredumping threads at startup time binfmt->core_dump() has to iterate over the all threads in system in order to find the coredumping threads and construct the list using the GFP_ATOMIC allocations. With this patch each thread allocates the list node on exit_mm()'s stack and adds itself to the list. This allows us to do further changes: - simplify ->core_dump() - change exit_mm() to clear ->mm first, then wait for ->core_done. this makes the coredumping process visible to oom_kill - kill mm->core_done Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:40 -07:00
Oleg Nesterov	9d5b327bf1	coredump: make mm->core_state visible to ->core_dump() Move the "struct core_state core_state" from coredump_wait() to do_coredump(), this makes mm->core_state visible to binfmt->core_dump(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:40 -07:00
Oleg Nesterov	c5f1cc8c18	coredump: turn core_state->nr_threads into atomic_t Turn core_state->nr_threads into atomic_t and kill now unneeded down_write(&mm->mmap_sem) in exit_mm(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:39 -07:00
Oleg Nesterov	8cd9c24912	coredump: simplify core_state->nr_threads calculation Change zap_process() to return int instead of incrementing mm->core_state->nr_threads directly. Change zap_threads() to set mm->core_state only on success. This patch restores the original size of .text, and more importantly now ->nr_threads is used in two places only. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:39 -07:00
Oleg Nesterov	999d9fc167	coredump: move mm->core_waiters into struct core_state Move mm->core_waiters into "struct core_state" allocated on stack. This shrinks mm_struct a little bit and allows further changes. This patch mostly does s/core_waiters/core_state. The only essential change is that coredump_wait() must clear mm->core_state before return. The coredump_wait()'s path is uglified and .text grows by 30 bytes, this is fixed by the next patch. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:39 -07:00
Oleg Nesterov	32ecb1f26d	coredump: turn mm->core_startup_done into the pointer to struct core_state mm->core_startup_done points to "struct completion startup_done" allocated on the coredump_wait()'s stack. Introduce the new structure, core_state, which holds this "struct completion". This way we can add more info visible to the threads participating in coredump without enlarging mm_struct. No changes in affected .o files. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:39 -07:00
Oleg Nesterov	24d5288f06	coredump: elf_core_dump: skip kernel threads linux_binfmt->core_dump() runs before the process does exit_aio(), this means that we can hit the kernel thread which shares the same ->mm. Afaics, nothing really bad can happen, but perhaps it makes sense to fix this minor bug. It is sad we have to iterate over all threads in system and use GFP_ATOMIC. Hopefully we can kill theses ugly do_each_thread()s, but this needs some nontrivial changes in mm_struct and do_coredump. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:39 -07:00
Oleg Nesterov	15b9f360c0	coredump: zap_threads() must skip kernel threads The main loop in zap_threads() must skip kthreads which may use the same mm. Otherwise we "kill" this thread erroneously (for example, it can not fork or exec after that), and the coredumping task stucks in the TASK_UNINTERRUPTIBLE state forever because of the wrong ->core_waiters count. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:39 -07:00
Oleg Nesterov	246bb0b1de	kill PF_BORROWED_MM in favour of PF_KTHREAD Kill PF_BORROWED_MM. Change use_mm/unuse_mm to not play with ->flags, and do s/PF_BORROWED_MM/PF_KTHREAD/ for a couple of other users. No functional changes yet. But this allows us to do further fixes/cleanups. oom_kill/ptrace/etc often check "p->mm != NULL" to filter out the kthreads, this is wrong because of use_mm(). The problem with PF_BORROWED_MM is that we need task_lock() to avoid races. With this patch we can check PF_KTHREAD directly, or use a simple lockless helper: /* The result must not be dereferenced !!! / struct mm_struct __get_task_mm(struct task_struct *tsk) { if (tsk->flags & PF_KTHREAD) return NULL; return tsk->mm; } Note also ecard_task(). It runs with ->mm != NULL, but it's the kernel thread without PF_BORROWED_MM. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:39 -07:00
Oleg Nesterov	7b34e4283c	introduce PF_KTHREAD flag Introduce the new PF_KTHREAD flag to mark the kernel threads. It is set by INIT_TASK() and copied to the forked childs (we could set it in kthreadd() along with PF_NOFREEZE instead). daemonize() was changed as well. In that case testing of PF_KTHREAD is racy, but daemonize() is hopeless anyway. This flag is cleared in do_execve(), before search_binary_handler(). Probably not the best place, we can do this in exec_mmap() or in start_thread(), or clear it along with PF_FORKNOEXEC. But I think this doesn't matter in practice, and if do_execve() fails kthread should die soon. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:39 -07:00
Oleg Nesterov	e4901f92a8	coredump: zap_threads: comments && use while_each_thread() No changes in fs/exec.o The for_each_process() loop in zap_threads() is very subtle, it is not clear why we don't race with fork/exit/exec. Add the fat comment. Also, change the code to use while_each_thread(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:38 -07:00
Jan Kara	657d3bfa98	quota: implement sending information via netlink about user below quota Sometimes it may be useful for userspace to know (e.g. for some hosting guys) that some user stopped exceeding his hardlimit or softlimit in quotas. Implement sending of such events to userspace via quota netlink protocol so that they don't have to poll for such events. Based on idea and initial implementation by Vladislav Bogdanov. Cc: Vladislav Bogdanov <slava@nsys.by> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:35 -07:00
Jan Kara	74abb9890d	quota: move function-macros from quota.h to quotaops.h Move declarations of some macros, which should be in fact functions to quotaops.h. This way they can be later converted to inline functions because we can now use declarations from quota.h. Also add necessary includes of quotaops.h to a few files. [akpm@linux-foundation.org: fix JFS build] [akpm@linux-foundation.org: fix UFS build] [vegard.nossum@gmail.com: fix QUOTA=n build] Signed-off-by: Jan Kara <jack@suse.cz> Cc: Vegard Nossum <vegard.nossum@gmail.com> Cc: Arjen Pool <arjenpool@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:35 -07:00
Jan Kara	02a55ca871	quota: cleanup loop in sync_dquots() Make loop in sync_dquots() checking whether there's something to write more readable, remove useless variable and macro info_any_dirty() which is used only in this place. Signed-off-by: Jan Kara <jack@suse.cz> Cc: "Vegard Nossum" <vegard.nossum@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:35 -07:00
Jan Kara	b85f4b87a5	quota: rename quota functions from upper case, make bigger ones non-inline Cleanup quotaops.h: Rename functions from uppercase to lowercase (and define backward compatibility macros), move larger functions to dquot.c and make them non-inline. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:35 -07:00
Jan Kara	b48d380541	quota: fix possible infinite loop in quota code When quota structure is going to be dropped and it is dirty, quota code tries to write it. If the write fails for some reason (e. g. transaction cannot be started because the journal is aborted), we try writing again and again and again... Fix the problem by clearing the dirty bit even if the write failed. (akpm: for 2.6.27, 2.6.26.x and 2.6.25.x) Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: dingdinghua <dingdinghua85@gmail.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:34 -07:00
Joe Peterson	b271e067c8	fatfs: add UTC timestamp option Provide a new mount option ("tz=UTC") for DOS (vfat/msdos) filesystems, allowing timestamps to be in coordinated universal time (UTC) rather than local time in applications where doing this is advantageous. In particular, portable devices that use fat/vfat (such as digital cameras) can benefit from using UTC in their internal clocks, thus avoiding daylight saving time errors and general time ambiguity issues. The user of the device does not have to worry about changing the time when moving from place or when daylight saving changes. The new mount option, when set, disables the counter-adjustment that Linux currently makes to FAT timestamp info in anticipation of the normal userspace time zone correction. When used in this new mode, all daylight saving time and time zone handling is done in userspace as is normal for many other filesystems (like ext3). The default mode, which remains unchanged, is still appropriate when mounting volumes written in Windows (because of its use of local time). I originally based this patch on one submitted last year by Paul Collins, but I updated it to work with current source and changed variable/option naming. Ogawa Hirofumi (who maintains these filesystems) and I discussed this patch at length on lkml, and he suggested using the option name in the attached version of the patch. Barry Bouwsma pointed out a good addition to the patch as well. Signed-off-by: Joe Peterson <joe@skyrush.com> Signed-off-by: Paul Collins <paul@ondioline.org> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Barry Bouwsma <free_beer_for_all@yahoo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:34 -07:00
Adrian Bunk	e8938a62a8	remove unused #include <linux/dirent.h>'s Remove some unused #include <linux/dirent.h>'s. Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:34 -07:00
Rene Scharfe	7557bc66be	msdos fs: remove unsettable atari option It has been impossible to set the option 'atari' of the MSDOS filesystem for several years. Since nobody seems to have missed it, let's remove its remains. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:34 -07:00
OGAWA Hirofumi	dcd8c53f13	fat: small optimization to __fat_readdir() This removes unnecessary parsing for directory entries. If short_only, we don't need to parse longname. And if !both and it found the longname, we don't need shortname. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:34 -07:00
OGAWA Hirofumi	98a1516004	fat: use same logic in fat_search_long() and __fat_readdir() This uses uses stack for shortname, and uses __getname() for longname in fat_search_long() and __fat_readdir(). By this, it removes unneeded __getname() for shortname. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:34 -07:00
OGAWA Hirofumi	d688611674	fat: cleanup fs/fat/dir.c This is no logic changes, just cleans fs/fat/dir.c up. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:34 -07:00
Adrian Bunk	531f710f8e	fat/dir.c: switch to struct __fat_dirent struct __fat_dirent is what was formerly the kernel struct dirent (that was different from the userspace struct dirent). Converting all fat users to struct __fat_dirent will allow us to get rid of the conflicting struct dirent definition. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:34 -07:00
OGAWA Hirofumi	8d44d9741f	fat: fix parse_options() Current parse_options() exits too early. We need to run the code of bottom in this function even if users doesn't specify options. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:34 -07:00
Shen Feng	3264d4ded4	reiserfs: remove double definitions of xattr macros remove the definitions of macros: XATTR_SECURITY_PREFIX XATTR_TRUSTED_PREFIX XATTR_USER_PREFIX since they are defined in linux/xattr.h Signed-off-by: Shen Feng <shen@cn.fujitsu.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:33 -07:00
Jeff Mahoney	90415deac7	reiserfs: convert j_commit_lock to mutex j_commit_lock is a semaphore but uses it as if it were a mutex. This patch converts it to a mutex. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Chris Mason <chris.mason@oracle.com> Cc: Edward Shishkin <edward.shishkin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:33 -07:00
Jeff Mahoney	afe7025907	reiserfs: convert j_flush_sem to mutex j_flush_sem is a semaphore but uses it as if it were a mutex. This patch converts it to a mutex. [akpm@linux-foundation.org: fix mutex_trylock retval treatment] Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Chris Mason <chris.mason@oracle.com> Cc: Edward Shishkin <edward.shishkin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:33 -07:00
Jeff Mahoney	f68215c464	reiserfs: convert j_lock to mutex j_lock is a semaphore but uses it as if it were a mutex. This patch converts it to a mutex. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Chris Mason <chris.mason@oracle.com> Cc: Edward Shishkin <edward.shishkin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:33 -07:00
Jan Kara	00b441970a	reiserfs: correct mount option parsing to detect when quota options can be changed We should not allow user to change quota mount options when quota is just suspended. It would make mount options and internal quota state inconsistent. Also we should not allow user to change quota format when quota is turned on. On the other hand we can just silently ignore when some option is set to the value it already has (some mount versions do this on remount). Finally, we should not discard current quota options if parsing of mount options fails. Cc: <reiserfs-devel@vger.kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:33 -07:00
Jan Kara	4506567b24	reiserfs: fix typos in messages and comments (journalled -> journaled) Cc: <reiserfs-devel@vger.kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:33 -07:00
Jan Kara	5d4f7fddf8	reiserfs: fix synchronization of quota files in journal=data mode In journal=data mode, it is not enough to do write_inode_now() as done in vfs_quota_on() to write all data to their final location (which is needed for quota_read to work correctly). Calling journal_end_sync() before calling vfs_quota_on() does it's job because transactions are committed to the journal and data marked as dirty in memory so write_inode_now() writes them to their final locations. Cc: <reiserfs-devel@vger.kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:33 -07:00
Matthias Kaehlcke	895c23f8c3	hfsplus: convert the extents_lock in a mutex Apple Extended HFS file system: The semaphore extents lock is used as a mutex. Convert it to the mutex API. Signed-off-by: Matthias Kaehlcke <matthias@kaehlcke.net> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:33 -07:00
Matthias Kaehlcke	39f8d472f2	hfs: convert extents_lock in a mutex Apple Macintosh file system: The semaphore extens_lock is used as a mutex. Convert it to the mutex API Signed-off-by: Matthias Kaehlcke <matthias@kaehlcke.net> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:33 -07:00
Matthias Kaehlcke	3084b72de7	hfs: convert bitmap_lock in a mutex Apple Macintosh file system: The semaphore bitmap_lock is used as a mutex. Convert it to the mutex API Signed-off-by: Matthias Kaehlcke <matthias@kaehlcke.net> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:33 -07:00
Adrian Bunk	de0ca06a99	coda: remove CODA_FS_OLD_API While fixing CONFIG_ leakages to the userspace kernel headers I ran into CODA_FS_OLD_API. After five years, are there still people using the old API left? Especially considering that you have to choose at compile time which API to support in the kernel (and distributions tend to offer the new API for some time). Jan: "The old API can definitely go. Around the time the new interface went in there were some non-Coda userspace file system implementations that took a while longer to convert to the new API, but by now they all switched to the new interface or in some cases to a FUSE-based solution." Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:33 -07:00
Adam Greenblatt	c0a1633b62	isofs: fix minor filesystem corruption Some iso9660 images contain files with rockridge data that is either incorrect or incompletely parsed. Prior to commit `f2966632a1` ("[PATCH] rock: handle directory overflows") (included with kernel 2.6.13) the kernel ignored the rockridge data for these files, while still allowing the files to be accessed under their non-rockridge names. That commit inadvertently changed things so that files with invalid rockridge data could not be accessed at all. (I ran across the problem when comparing some old CDs with hard disk copies I had made long ago under kernel 2.4: a few of the files on the hard disk copies were no longer visible on the CDs.) This change reverts to the pre-2.6.13 behavior. Signed-off-by: Adam Greenblatt <adam.greenblatt@gmail.com> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: <stable@kernel.org> [2.6.25.x, 2.6.26.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:33 -07:00
Duane Griffin	275c0a8f12	ext3: validate directory entry data before use ext3_dx_find_entry uses ext3_next_entry without verifying that the entry is valid. If its rec_len == 0 this causes an infinite loop. Refactor the loop to check the validity of entries before checking whether they match and moving onto the next one. There are other uses of ext3_next_entry in this file which also look problematic. They should be reviewed and fixed if/when we have a test-case that triggers them. This patch fixes the first case (image hdb.25.softlockup.gz) reported in http://bugzilla.kernel.org/show_bug.cgi?id=10882. Signed-off-by: Duane Griffin <duaneg@dghda.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:33 -07:00
Hidehiro Kawai	cbe5f466f6	jbd: don't abort if flushing file data failed In ordered mode, the current jbd aborts the journal if a file data buffer has an error. But this behavior is unintended, and we found that it has been adopted accidentally. This patch undoes it and just calls printk() instead of aborting the journal. Additionally, set AS_EIO into the address_space object of the failed buffer which is submitted by journal_do_submit_data() so that fsync() can get -EIO. Missing error checkings are also added to inform errors on file data buffers to the user. The following buffers are targeted. (a) the buffer which has already been written out by pdflush (b) the buffer which has been unlocked before scanned in the t_locked_list loop [akpm@linux-foundation.org: improve grammar in a printk] Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Acked-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:32 -07:00
Li Zefan	8ef2720397	ext3: kill 2 useless magic numbers dx_root_limit() will never return 20, and I can't figure out what 20 stands for. This function has never changed since htree directory indexing was merged. Similar for dx_node_limit() and the magic 22. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Andreas Dilger <adilger@sun.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:32 -07:00
Toshiyuki Okajima	fc80c44277	jbd: positively dispose the unmapped data buffers in journal_commit_transaction() After ext3-ordered files are truncated, there is a possibility that the pages which cannot be estimated still remain. Remaining pages can be released when the system has really few memory. So, it is not memory leakage. But the resource management software etc. may not work correctly. It is possible that journal_unmap_buffer() cannot release the buffers, and the pages to which they belong because they are attached to a commiting transaction and journal_unmap_buffer() cannot release them. To release such the buffers and the pages later, journal_unmap_buffer() leaves it to journal_commit_transaction(). (journal_unmap_buffer() puts the mark 'BH_Freed' to the buffers so that journal_commit_transaction() can identify whether they can be released or not.) In the journalled mode and the writeback mode, jbd does with only metadata buffers. But in the ordered mode, jbd does with metadata buffers and also data buffers. Actually, journal_commit_transaction() releases only the metadata buffers of which release is demanded by journal_unmap_buffer(), and also releases the pages to which they belong if possible. As a result, the data buffers of which release is demanded by journal_unmap_buffer() remain after a transaction commits. And also the pages to which they belong remain. Such the remained pages don't have mapping any longer. Due to this fact, there is a possibility that the pages which cannot be estimated remain. The metadata buffers marked 'BH_Freed' and the pages to which they belong can be released at 'JBD: commit phase 7'. Therefore, by applying the same code into 'JBD: commit phase 2' (where the data buffers are done with), journal_commit_transaction() can also release the data buffers marked 'BH_Freed' and the pages to which they belong. As a result, all the buffers marked 'BH_Freed' can be released, and also all the pages to which these buffers belong can be released at journal_commit_transaction(). So, the page which cannot be estimated is lost. <<Excerpt of code at 'JBD: commit phase 7'>> > spin_lock(&journal->j_list_lock); > while (commit_transaction->t_forget) { > transaction_t cp_transaction; > struct buffer_head bh; > > jh = commit_transaction->t_forget; >... > if (buffer_freed(bh)) { > ^^^^^^^^^^^^^^^^^^^^^^^^ > clear_buffer_freed(bh); > ^^^^^^^^^^^^^^^^^^^^^^^^ > clear_buffer_jbddirty(bh); > } > > if (buffer_jbddirty(bh)) { > JBUFFER_TRACE(jh, "add to new checkpointing trans"); > __journal_insert_checkpoint(jh, commit_transaction); > JBUFFER_TRACE(jh, "refile for checkpoint writeback"); > __journal_refile_buffer(jh); > jbd_unlock_bh_state(bh); > } else { > J_ASSERT_BH(bh, !buffer_dirty(bh)); > ... > JBUFFER_TRACE(jh, "refile or unfile freed buffer"); > __journal_refile_buffer(jh); > if (!jh->b_transaction) { > jbd_unlock_bh_state(bh); > /* needs a brelse / > journal_remove_journal_head(bh); > release_buffer_page(bh); > ^^^^^^^^^^^^^^^^^^^^^^^^ > } else > } *************************************************************** * Apply the code of "^^^^^^" lines into 'JBD: commit phase 2' * **************************************************************** At journal_commit_transaction() code, there is one extra message in the series of jbd debug messages. ("JBD: commit phase 2") This patch fixes it, too. Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Acked-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:32 -07:00
Adrian Bunk	a10320e8f7	jbd: unexport journal_update_superblock Remove the unused EXPORT_SYMBOL(journal_update_superblock). Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:32 -07:00
Duane Griffin	3ccc3167b0	ext3: handle deleting corrupted indirect blocks While freeing indirect blocks we attach a journal head to the parent buffer head, free the blocks, then journal the parent. If the indirect block list is corrupted and points to the parent the journal head will be detached when the block is cleared, causing an OOPS. Check for that explicitly and handle it gracefully. This patch fixes the third case (image hdb.20000057.nullderef.gz) reported in http://bugzilla.kernel.org/show_bug.cgi?id=10882. Immediately above the change, in the ext3_free_data function, we call ext3_clear_blocks to clear the indirect blocks in this parent block. If one of those blocks happens to actually be the parent block it will clear b_private / BH_JBD. I did the check at the end rather than earlier as it seemed more elegant. I don't think there should be much practical difference, although it is possible the FS may not be quite so badly corrupted if we did it the other way (and didn't clear the block at all). To be honest, I'm not convinced there aren't other similar failure modes lurking in this code, although I couldn't find any with a quick review. [akpm@linux-foundation.org: fix printk warning] Signed-off-by: Duane Griffin <duaneg@dghda.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:32 -07:00
Hidehiro Kawai	95450f5a7e	ext3: don't read inode block if the buffer has a write error A transient I/O error can corrupt inode data. Here is the scenario: (1) update inode_A at the block_B (2) pdflush writes out new inode_A to the filesystem, but it results in write I/O error, at this point, BH_Uptodate flag of the buffer for block_B is cleared and BH_Write_EIO is set (3) create new inode_C which located at block_B, and __ext3_get_inode_loc() tries to read on-disk block_B because the buffer is not uptodate (4) if it can read on-disk block_B successfully, inode_A is overwritten by old data This patch makes __ext3_get_inode_loc() not read the inode block if the buffer has BH_Write_EIO flag. In this case, the buffer should have the latest information, so setting the uptodate flag to the buffer (this avoids WARN_ON_ONCE() in mark_buffer_dirty().) According to this change, we would need to test BH_Write_EIO flag for the error checking. Currently nobody checks write I/O errors on metadata buffers, but it will be done in other patches I'm working on. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: sugita <yumiko.sugita.yf@hitachi.com> Cc: Satoshi OSHIMA <satoshi.oshima.fk@hitachi.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Jan Kara <jack@ucw.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:32 -07:00
Duane Griffin	ae76dd9a6b	ext3: handle corrupted orphan list at mount If the orphan node list includes valid, untruncatable nodes with nlink > 0 the ext3_orphan_cleanup loop which attempts to delete them will not do so, causing it to loop forever. Fix by checking for such nodes in the ext3_orphan_get function. This patch fixes the second case (image hdb.20000009.softlockup.gz) reported in http://bugzilla.kernel.org/show_bug.cgi?id=10882. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: printk warning fix] Signed-off-by: Duane Griffin <duaneg@dghda.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:32 -07:00
Shen Feng	ef1afd3951	ext3: remove double definitions of xattr macros remove the definitions of macros: XATTR_TRUSTED_PREFIX XATTR_USER_PREFIX since they are defined in linux/xattr.h Signed-off-by: Shen Feng <shen@cn.fujitsu.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:32 -07:00
Mingming Cao	3f31fddfa2	jbd: fix race between free buffer and commit transaction journal_try_to_free_buffers() could race with jbd commit transaction when the later is holding the buffer reference while waiting for the data buffer to flush to disk. If the caller of journal_try_to_free_buffers() request tries hard to release the buffers, it will treat the failure as error and return back to the caller. We have seen the directo IO failed due to this race. Some of the caller of releasepage() also expecting the buffer to be dropped when passed with GFP_KERNEL mask to the releasepage()->journal_try_to_free_buffers(). With this patch, if the caller is passing the __GFP_WAIT and __GFP_FS to indicating this call could wait, in case of try_to_free_buffers() failed, let's waiting for journal_commit_transaction() to finish commit the current committing transaction, then try to free those buffers again. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Mingming Cao <cmm@us.ibm.com> Reviewed-by: Badari Pulavarty <pbadari@us.ibm.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:32 -07:00
Shen Feng	9ebfbe9f92	ext3: improve some code in rb tree part of dir.c - remove unnecessary code in free_rb_tree_fname - rename free_rb_tree_fname to ext3_htree_create_dir_info since it and ext3_htree_free_dir_info are a pair - replace kmalloc with kzalloc in ext3_htree_free_dir_info Signed-off-by: Shen Feng <shen@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:32 -07:00
Duane Griffin	1984bb763c	jbd: tidy up revoke cache initialisation and destruction Make revocation cache destruction safe to call if initialisation fails partially or entirely. This allows it to be used to cleanup in the case of initialisation failure, simplifying that code slightly. Signed-off-by: Duane Griffin <duaneg@dghda.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:32 -07:00
Duane Griffin	f4d79ca2fa	jbd: eliminate duplicated code in revocation table init/destroy functions The revocation table initialisation/destruction code is repeated for each of the two revocation tables stored in the journal. Refactoring the duplicated code into functions is tidier, simplifies the logic in initialisation in particular, and slightly reduces the code size. There should not be any functional change. Signed-off-by: Duane Griffin <duaneg@dghda.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:32 -07:00
Duane Griffin	3850f7a521	jbd: replace potentially false assertion with if block If an error occurs during jbd cache initialisation it is possible for the journal_head_cache to be NULL when journal_destroy_journal_head_cache is called. Replace the J_ASSERT with an if block to handle the situation correctly. Note that even with this fix things will break badly if jbd is statically compiled in and cache initialisation fails. Signed-off-by: Duane Griffin <duaneg@dghda.com Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:32 -07:00
Jan Kara	d06bf1d252	ext3: correct mount option parsing to detect when quota options can be changed We should not allow user to change quota mount options when quota is just suspended. I would make mount options and internal quota state inconsistent. Also we should not allow user to change quota format when quota is turned on. On the other hand we can just silently ignore when some option is set to the value it already has (mount does this on remount). Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:32 -07:00
Jan Kara	99aeaf639f	ext3: fix typos in messages and comments (journalled -> journaled) Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:31 -07:00
Jan Kara	9cfe7b9010	ext3: fix synchronization of quota files in journal=data mode In journal=data mode, it is not enough to do write_inode_now as done in vfs_quota_on() to write all data to their final location (which is needed for quota_read to work correctly). Calling journal_flush() does its job. Reported-by: Nick <gentuu@gmail.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:31 -07:00
Shen Feng	f905f06fca	ext2: remove double definitions of xattr macros remove the definitions of macros: XATTR_TRUSTED_PREFIX XATTR_USER_PREFIX since they are defined in linux/xattr.h Signed-off-by: Shen Feng <shen@cn.fujitsu.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:31 -07:00
Adrian Bunk	fb523f3227	minix: remove !NO_TRUNCATE code This patch removes the !NO_TRUNCATE code that anyway required a manual editing of the code for being used. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:30 -07:00
Hugh Dickins	ba92a43dba	exec: remove some includes fs/exec.c used to need mman.h pagemap.h swap.h and rmap.h when it did mm-ish stuff in install_arg_page(); but no need for them after 2.6.22. [akpm@linux-foundation.org: unbreak arm] Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:28 -07:00
Harvey Harrison	b7bbf8fa6b	fs: ldm.[ch] use get_unaligned_* helpers Replace the private BE16/BE32/BE64 macros with direct calls to get_unaligned_be16/32/64. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:26 -07:00
David Woodhouse	ff877ea80e	Merge branch 'linux-next' of git://git.infradead.org/~dedekind/ubi-2.6	2008-07-25 10:40:14 -04:00
Nathan Lynch	483fad1c3f	ELF loader support for auxvec base platform string Some IBM POWER-based platforms have the ability to run in a mode which mostly appears to the OS as a different processor from the actual hardware. For example, a Power6 system may appear to be a Power5+, which makes the AT_PLATFORM value "power5+". This means that programs are restricted to the ISA supported by Power5+; Power6-specific instructions are treated as illegal. However, some applications (virtual machines, optimized libraries) can benefit from knowledge of the underlying CPU model. A new aux vector entry, AT_BASE_PLATFORM, will denote the actual hardware. For example, on a Power6 system in Power5+ compatibility mode, AT_PLATFORM will be "power5+" and AT_BASE_PLATFORM will be "power6". The idea is that AT_PLATFORM indicates the instruction set supported, while AT_BASE_PLATFORM indicates the underlying microarchitecture. If the architecture has defined ELF_BASE_PLATFORM, copy that value to the user stack in the same manner as ELF_PLATFORM. Signed-off-by: Nathan Lynch <ntl@pobox.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2008-07-25 15:44:39 +10:00
Adrian Bunk	fb2e405fc1	fix fs/nfs/nfsroot.c compilation This fixes the following compile error caused by commit `f9247273cb` ("UFS: add const to parser token table"): CC fs/nfs/nfsroot.o /home/bunk/linux/kernel-2.6/git/linux-2.6/fs/nfs/nfsroot.c:130: error: tokens causes a section type conflict make[3]: *** [fs/nfs/nfsroot.o] Error 1 Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 17:32:41 -07:00
Chris Wright	e2d2867ff8	When verifying the decoded header before decoding the object identifier (expecting a SPNEGO pseudo-mechanism oid), the test to verify it is a primitive encoding is compared against the asn1 class. Primitive is not a class. This brings check in line with similar check for krb/ntlmssp oid. Signed-off-by: Chris Wright <chrisw@sous-sol.org> Cc: Steven French <sfrench@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-07-24 20:43:34 +00:00
Steven Whitehouse	f9247273cb	UFS: add const to parser token table This patch adds a "const" to the parser token table. I've done an allmodconfig build to see if this produces any warnings/failures and the patch includes a fix for the only warning that was produced. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Alexander Viro <aviro@redhat.com> Acked-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 11:50:15 -07:00
Ian Kent	aa55ddf340	autofs4: remove unused ioctls The ioctls AUTOFS_IOC_TOGGLEREGHOST and AUTOFS_IOC_ASKREGHOST were added several years ago but what they were intended for has never been implemented (as far as I'm aware noone uses them) so remove them. Signed-off-by: Ian Kent <raven@themaw.net> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:33 -07:00
Ian Kent	06a3598552	autofs4: reorganize expire pending wait function calls This patch re-orgnirzes the checking for and waiting on active expires and elininates redundant checks. Signed-off-by: Ian Kent <raven@themaw.net> Cc: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:32 -07:00
Ian Kent	ec6e8c7d3f	autofs4: fix direct mount pending expire race - correction Appologies, somehow I seem to have sent an out dated version of this patch. Here is an additional patch that brings the patch up to date. Signed-off-by: Ian Kent <raven@themaw.net> Cc: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:32 -07:00
Ian Kent	6e60a9ab5f	autofs4: fix direct mount pending expire race For direct and offset type mounts that are covered by another mount we cannot check the AUTOFS_INF_EXPIRING flag during a path walk which leads to lookups walking into an expiring mount while it is being expired. For example, for the direct multi-mount map entry with a couple of offsets: /race/mm1 / <server1>:/<path1> /om1 <server2>:/<path2> /om2 <server1>:/<path3> an autofs trigger mount is mounted on /race/mm1 and when accessed it is over mounted and trigger mounts made for /race/mm1/om1 and /race/mm1/om2. So it isn't possible for path walks to see the expiring flag at all and they happily walk into the file system while it is expiring. When expiring these mounts follow_down() must stop at the autofs mount and all processes must block in the ->follow_link() method (except the daemon) until the expire is complete. This is done by decrementing the d_mounted field of the autofs trigger mount root dentry until the expire is completed. In ->follow_link() all processes wait on the expire and the mount following is completed for the daemon until the expire is complete. Signed-off-by: Ian Kent <raven@themaw.net> Cc: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:32 -07:00
Ian Kent	97e7449a7a	autofs4: fix indirect mount pending expire race The selection of a dentry for expiration and the setting of the AUTOFS_INF_EXPIRING flag isn't done atomically which can lead to lookups walking into an expiring mount. What happens is that an expire is initiated by the daemon and a dentry is selected for expire but, since there is no lock held between the selection and setting of the expiring flag, a process may find the flag clear and continue walking into the mount tree at the same time the daemon attempts the expire it. Signed-off-by: Ian Kent <raven@themaw.net> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:32 -07:00
Ian Kent	26e81b3142	autofs4: fix pending checks There are two cases for which a dentry that has a pending mount request does not wait for completion. One is via autofs4_revalidate() and the other via autofs4_follow_link(). In revalidate, after the mount point directory is created, but before the mount is done, the check in try_to_fill_dentry() can can fail to send the dentry to the wait queue since the dentry is positive and the lookup flags may contain only LOOKUP_FOLLOW. Although we don't trigger a mount for the LOOKUP_FOLLOW flag, if ther's one pending we might as well wait and use the mounted dentry for the lookup. In autofs4_follow_link() the dentry is not checked to see if it is pending so it may fail to call try_to_fill_dentry() and not wait for mount completion. A dentry that is pending must always be sent to the wait queue. Signed-off-by: Ian Kent <raven@themaw.net> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:32 -07:00
Ian Kent	ff9cd499d6	autofs4: cleanup redundant readir code The mount triggering functionality of readdir and related functions is no longer used (and is quite broken as well). The unused portions have been removed. Signed-off-by: Ian Kent <raven@themaw.net> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:32 -07:00
Ian Kent	c72305b547	autofs4: indirect dentry must almost always be positive We have been seeing mount requests comming to the automount daemon for keys of the form "<map key>/<non key directory>" which are lookups for invalid map keys. But we can check for this in the kernel module and return a fail immediately, without having to send a request to the daemon. It is possible to recognise these requests are invalid based on whether the request dentry is negative and its relation to the autofs file system root. For example, given the indirect multi-mount map entry: idm1 \ /mm1 <server>:/<path1> /mm2 <server>:/<path2> For a request to mount idm1, IS_ROOT((idm1)->d_parent) will be always be true and the dentry may be negative. But directories idm1/mm1 and idm1/mm2 will always be created as part of the mount request for idm1. So any mount request within idm1 itself must have a positive dentry otherwise the map key is invalid. In version 4 these multi-mount entries are all mounted and umounted as a single request and in version 5 the directories idm1/mm1 and idm1/mm2 are created and an autofs fs mounted on them to act as a mount trigger so the above is also true. This also holds true for the autofs version 4 pseudo direct mount feature. When this feature is used without the "--ghost" option automount(8) will create internal submounts as we go down the map key paths which are essentially normal indirect mounts for which the above holds. If the "--ghost" option is given the directories for map keys are created at daemon startup so valid map entries correspond to postive dentries in the autofs fs. autofs version 5 direct mount maps are similar except that the IS_ROOT check is not needed. This has been addressed in a previous patch tittled "autofs4 - detect invalid direct mount requests". For example, given the direct multi-mount map entry: /test/dm1 \ /mm1 <server>:/<path1> /mm2 <server>:/<path2> An autofs fs is mounted on /test/dm1 as a trigger mount and when a mount is triggered for /test/dm1, the multi-mount offset directories /test/dm1/mm1 and /test/dm1/mm2 are created and an autofs fs is mounted on them to act as mount triggers. So valid direct mount requests must always have a positive dentry if they correspond to a valid map entry. Signed-off-by: Ian Kent <raven@themaw.net> Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:32 -07:00
Ian Kent	eb3b176796	autofs4: detect invalid direct mount requests autofs v5 direct and offset mounts within an autofs filesystem are triggered by existing autofs triger mounts so the mount point dentry must be positive. If the mount point dentry is negative then the trigger doesn't exist so we can return fail immediately. Signed-off-by: Ian Kent <raven@themaw.net> Cc: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:32 -07:00
Ian Kent	296f7bf78b	autofs4: fix waitq memory leak If an autofs mount becomes catatonic before autofs4_wait_release() is called the wait queue counter will not be decremented down to zero and the entry will never be freed. There are also races decrementing the wait counter in the wait release function. To deal with this the counter needs to be updated while holding the wait queue mutex and waiters need to be woken up unconditionally when the wait is removed from the queue to ensure we eventually free the wait. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:32 -07:00
Ian Kent	e64be33cca	autofs4: check kernel communication pipe is valid for write It is possible for an autofs mount to become catatonic (and for the daemon communication pipe to become NULL) after a wait has been initiallized but before the request has been sent to the daemon. We need to check for this before sending the request packet. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:32 -07:00
Ian Kent	f4c7da0261	autofs4: add missing kfree It see that the patch tittled "autofs4 - fix pending mount race" is missing a change that I had recently made. It's missing a kfree for the case mutex_lock_interruptible() fails to aquire the wait queue mutex. Signed-off-by: Ian Kent <raven@themaw.net> Cc: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:32 -07:00
Ian Kent	a1362fe92f	autofs4: fix pending mount race Close a race between a pending mount that is about to finish and a new lookup for the same directory. Process P1 triggers a mount of directory foo. It sets DCACHE_AUTOFS_PENDING in the ->lookup routine, creates a waitq entry for 'foo', and calls out to the daemon to perform the mount. The autofs daemon will then create the directory 'foo', using a new dentry that will be hashed in the dcache. Before the mount completes, another process, P2, tries to walk into the 'foo' directory. The vfs path walking code finds an entry for 'foo' and calls the revalidate method. Revalidate finds that the entry is not PENDING (because PENDING was never set on the dentry created by the mkdir), but it does find the directory is empty. Revalidate calls try_to_fill_dentry, which sets the PENDING flag and then calls into the autofs4 wait code to trigger or wait for a mount of 'foo'. The wait code finds the entry for 'foo' and goes to sleep waiting for the completion of the mount. Yet another process, P3, tries to walk into the 'foo' directory. This process again finds a dentry in the dcache for 'foo', and calls into the autofs revalidate code. The revalidate code finds that the PENDING flag is set, and so calls try_to_fill_dentry. a) try_to_fill_dentry sets the PENDING flag redundantly for this dentry, then calls into the autofs4 wait code. b) the autofs4 wait code takes the waitq mutex and searches for an entry for 'foo' Between a and b, P1 is woken up because the mount completed. P1 takes the wait queue mutex, clears the PENDING flag from the dentry, and removes the waitqueue entry for 'foo' from the list. When it releases the waitq mutex, P3 (eventually) acquires it. At this time, it looks for an existing waitq for 'foo', finds none, and so creates a new one and calls out to the daemon to mount the 'foo' directory. Now, the reason that three processes are required to trigger this race is that, because the PENDING flag is not set on the dentry created by mkdir, the window for the race would be way to slim for it to ever occur. Basically, between the testing of d_mountpoint(dentry) and the taking of the waitq mutex, the mount would have to complete and the daemon would have to be woken up, and that in turn would have to wake up P1. This is simply impossible. Add the third process, though, and it becomes slightly more likely. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:32 -07:00
Ian Kent	5a11d4d0ee	autofs4: fix waitq locking The autofs4_catatonic_mode() function accesses the wait queue without any locking but can be called at any time. This could lead to a possible double free of the name field of the wait and a double fput of the daemon communication pipe or an fput of a NULL file pointer. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:32 -07:00
Jeff Moyer	70b52a0a50	autofs4: use struct qstr in waitq.c The autofs_wait_queue already contains all of the fields of the struct qstr, so change it into a qstr. This patch, from Jeff Moyer, has been modified a liitle by myself. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:32 -07:00
Ian Kent	6d5cb926fa	autofs4: use lookup intent flags to trigger mounts When an open(2) call is made on an autofs mount point directory that already exists and the O_DIRECTORY flag is not used the needed mount callback to the daemon is not done. This leads to the path walk continuing resulting in a callback to the daemon with an incorrect key. open(2) is called without O_DIRECTORY by the "find" utility but this should be handled properly anyway. This happens because autofs needs to use the lookup flags to decide when to callback to the daemon to perform a mount to prevent mount storms. For example, an autofs indirect mount map that has the "browse" option will have the mount point directories are pre-created and the stat(2) call made by a color ls against each directory will cause all these directories to be mounted. It is unfortunate we need to resort to this but mount maps can be quite large. Additionally, if a user manually umounts an autofs indirect mount the directory isn't removed which also leads to this situation. To resolve this autofs needs to use the lookup intent flags to enable it to make this decision. This patch adds this check and triggers a call back if any of the lookup intent flags are set as all these calls warrant a mount attempt be requested. I know that external VFS code which uses the lookup flags is something that the VFS would like to eliminate but I have no choice as I can't see any other way to do this. A VFS dentry or inode operation callback which returns the lookup "type" (requires a definition) would be sufficient. But this change is needed now and I'm not aware of the form that coming VFS changes will take so I'm not willing to propose anything along these lines. If anyone can provide an alternate method I would be happy to use it. [akpm@linux-foundation.org: fix build for concurrent VFS changes] Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:31 -07:00
Ian Kent	c432c2586a	autofs4: don't release directory mutex if called in oz_mode Since we now delay hashing of dentrys until the ->mkdir() call, droping and re-taking the directory mutex within the ->lookup() function when we are being called by user space is not needed. This can lead to a race when other processes are attempting to access the same directory during mount point directory creation. In this case we need to hang onto the mutex to ensure we don't get user processes trying to create a mount request for a newly created dentry after the mount point entry has already been created. This ensures that when we need to check a dentry passed to autofs4_wait(), if it is hashed, it is always the mount point dentry and not a new dentry created by another lookup during directory creation. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:31 -07:00
Ian Kent	ef581a7428	autofs4: fix symlink name allocation The length of the symlink name has been moved but it needs to be set before allocating space for it in the dentry info struct. This corrects a mistake in a recent patch. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:31 -07:00
Ian Kent	2576737873	autofs4: use look aside list for lookups A while ago a patch to resolve a deadlock during directory creation was merged. This delayed the hashing of lookup dentrys until the ->mkdir() (or ->symlink()) operation completed to ensure we always went through ->lookup() instead of also having processes go through ->revalidate() so our VFS locking remained consistent. Now we are seeing a couple of side affects of that change in situations with heavy mount activity. Two cases have been identified: 1) When a mount request is triggered, due to the delayed hashing, the directory created by user space for the mount point doesn't have the DCACHE_AUTOFS_PENDING flag set. In the case of an autofs multi-mount where a tree of mount point directories are created this can lead to the path walk continuing rather than the dentry being sent to the wait queue to wait for request completion. This is because, if the pending flag isn't set, the criteria for deciding this is a mount in progress fails to hold, namely that the dentry is not a mount point and has no subdirectories. 2) A mount request dentry is initially created negative and unhashed. It remains this way until the ->mkdir() callback completes. Since it is unhashed a fresh dentry is used when the user space mount request creates the mount point directory. This leaves the original dentry negative and unhashed. But revalidate has no way to tell the VFS that the dentry has changed, other than to force another ->lookup() by returning false, which is at best wastefull and at worst not possible. This results in an -ENOENT return from the original path walk when in fact the mount succeeded. To resolve this we need to ensure that the same dentry is used in all calls to ->lookup() during the course of a mount request. This patch achieves that by adding the initial dentry to a look aside list and removes it at ->mkdir() or ->symlink() completion (or when the dentry is released), since these are the only create operations autofs4 supports. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:31 -07:00
Ian Kent	caf7da3d5d	autofs4: revert - redo lookup in ttfd This patch series enables the use of a single dentry for lookups prior to the dentry being hashed and so we no longer need to redo the lookup. This patch reverts the patch of commit `033790449b`. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:31 -07:00
Ian Kent	5f6f4f28b6	autofs4: don't make expiring dentry negative Correct the error of making a positive dentry negative after it has been instantiated. The code that makes this error attempts to re-use the dentry from a concurrent expire and mount to resolve a race and the dentry used for the lookup must be negative for mounts to trigger in the required cases. The fact is that the dentry doesn't need to be re-used because all that is needed is to preserve the flag that indicates an expire is still incomplete at the time of the mount request. This change uses the the dentry to check the flag and wait for the expire to complete then discards it instead of attempting to re-use it. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:31 -07:00
Michael Halcrow	391b52f98c	eCryptfs: Make all persistent file opens delayed There is no good reason to immediately open the lower file, and that can cause problems with files that the user does not intend to immediately open, such as device nodes. This patch removes the persistent file open from the interpose step and pushes that to the locations where eCryptfs really does need the lower persistent file, such as just before reading or writing the metadata stored in the lower file header. Two functions are jumping to out_dput when they should just be jumping to out on error paths. This patch also fixes these. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:31 -07:00
Michael Halcrow	72b55fffd6	eCryptfs: do not try to open device files on mknod When creating device nodes, eCryptfs needs to delay actually opening the lower persistent file until an application tries to open. Device handles may not be backed by anything when they first come into existence. [Valdis.Kletnieks@vt.edu: build fix] Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Cc: <Valdis.Kletnieks@vt.edu} Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:31 -07:00
Harvey Harrison	0a688ad713	ecryptfs: inode.c mmap.c use unaligned byteorder helpers Fixe sparse warnings: fs/ecryptfs/inode.c:368:15: warning: cast to restricted __be64 fs/ecryptfs/mmap.c:385:12: warning: incorrect type in assignment (different base types) fs/ecryptfs/mmap.c:385:12: expected unsigned long long [unsigned] [assigned] [usertype] file_size fs/ecryptfs/mmap.c:385:12: got restricted __be64 [usertype] <noident> fs/ecryptfs/mmap.c:428:12: warning: incorrect type in assignment (different base types) fs/ecryptfs/mmap.c:428:12: expected unsigned long long [unsigned] [assigned] [usertype] file_size fs/ecryptfs/mmap.c:428:12: got restricted __be64 [usertype] <noident> Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:31 -07:00
Harvey Harrison	29335c6a41	ecryptfs: crypto.c use unaligned byteorder helpers Fixes the following sparse warnings: fs/ecryptfs/crypto.c:1036:8: warning: cast to restricted __be32 fs/ecryptfs/crypto.c:1038:8: warning: cast to restricted __be32 fs/ecryptfs/crypto.c:1077:10: warning: cast to restricted __be32 fs/ecryptfs/crypto.c:1103:6: warning: incorrect type in assignment (different base types) fs/ecryptfs/crypto.c:1105:6: warning: incorrect type in assignment (different base types) fs/ecryptfs/crypto.c:1124:8: warning: incorrect type in assignment (different base types) fs/ecryptfs/crypto.c:1241:21: warning: incorrect type in assignment (different base types) fs/ecryptfs/crypto.c:1244:30: warning: incorrect type in assignment (different base types) fs/ecryptfs/crypto.c:1414:23: warning: cast to restricted __be32 fs/ecryptfs/crypto.c:1417:32: warning: cast to restricted __be16 Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:31 -07:00
Miklos Szeredi	8f2368095e	ecryptfs: string copy cleanup Clean up overcomplicated string copy, which also gets rid of this bogus warning: fs/ecryptfs/main.c: In function 'ecryptfs_parse_options': include/asm/arch/string_32.h:75: warning: array subscript is above array bounds Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:31 -07:00
Eric Sandeen	982363c97f	ecryptfs: propagate key errors up at mount time Mounting with invalid key signatures should probably fail, if they were specifically requested but not available. Also fix case checks in process_request_key_err() for the right sign of the errnos, as spotted by Jan Tluka. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Jan Tluka <jtluka@redhat.com> Acked-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:31 -07:00
Tyler Hicks	6c4c17b073	ecryptfs: discard ecryptfsd registration messages in miscdev The userspace eCryptfs daemon sends HELO and QUIT messages to the kernel for per-user daemon (un)registration. These messages are required when netlink is used as the transport, but (un)registration is handled by opening and closing the device file when miscdev is the transport. These messages should be discarded in the miscdev transport so that a daemon isn't registered twice. Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:31 -07:00
Michael Halcrow	746f1e558b	eCryptfs: Privileged kthread for lower file opens eCryptfs would really like to have read-write access to all files in the lower filesystem. Right now, the persistent lower file may be opened read-only if the attempt to open it read-write fails. One way to keep from having to do that is to have a privileged kthread that can open the lower persistent file on behalf of the user opening the eCryptfs file; this patch implements this functionality. This patch will properly allow a less-privileged user to open the eCryptfs file, followed by a more-privileged user opening the eCryptfs file, with the first user only being able to read and the second user being able to both read and write. eCryptfs currently does this wrong; it will wind up calling vfs_write() on a file that was opened read-only. This is fixed in this patch. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:30 -07:00
Ulrich Drepper	9fe5ad9c8c	flag parameters add-on: remove epoll_create size param Remove the size parameter from the new epoll_create syscall and renames the syscall itself. The updated test program follows. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <stdio.h> #include <time.h> #include <unistd.h> #include <sys/syscall.h> #ifndef __NR_epoll_create2 # ifdef __x86_64__ # define __NR_epoll_create2 291 # elif defined __i386__ # define __NR_epoll_create2 329 # else # error "need __NR_epoll_create2" # endif #endif #define EPOLL_CLOEXEC O_CLOEXEC int main (void) { int fd = syscall (__NR_epoll_create2, 0); if (fd == -1) { puts ("epoll_create2(0) failed"); return 1; } int coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if (coe & FD_CLOEXEC) { puts ("epoll_create2(0) set close-on-exec flag"); return 1; } close (fd); fd = syscall (__NR_epoll_create2, EPOLL_CLOEXEC); if (fd == -1) { puts ("epoll_create2(EPOLL_CLOEXEC) failed"); return 1; } coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if ((coe & FD_CLOEXEC) == 0) { puts ("epoll_create2(EPOLL_CLOEXEC) set close-on-exec flag"); return 1; } close (fd); puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:29 -07:00
Ulrich Drepper	e38b36f325	flag parameters: check magic constants This patch adds test that ensure the boundary conditions for the various constants introduced in the previous patches is met. No code is generated. [akpm@linux-foundation.org: fix alpha] Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:29 -07:00
Ulrich Drepper	510df2dd48	flag parameters: NONBLOCK in inotify_init This patch adds non-blocking support for inotify_init1. The additional changes needed are minimal. The following test must be adjusted for architectures other than x86 and x86-64 and in case the syscall numbers changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <stdio.h> #include <unistd.h> #include <sys/syscall.h> #ifndef __NR_inotify_init1 # ifdef __x86_64__ # define __NR_inotify_init1 294 # elif defined __i386__ # define __NR_inotify_init1 332 # else # error "need __NR_inotify_init1" # endif #endif #define IN_NONBLOCK O_NONBLOCK int main (void) { int fd = syscall (__NR_inotify_init1, 0); if (fd == -1) { puts ("inotify_init1(0) failed"); return 1; } int fl = fcntl (fd, F_GETFL); if (fl == -1) { puts ("fcntl failed"); return 1; } if (fl & O_NONBLOCK) { puts ("inotify_init1(0) set non-blocking mode"); return 1; } close (fd); fd = syscall (__NR_inotify_init1, IN_NONBLOCK); if (fd == -1) { puts ("inotify_init1(IN_NONBLOCK) failed"); return 1; } fl = fcntl (fd, F_GETFL); if (fl == -1) { puts ("fcntl failed"); return 1; } if ((fl & O_NONBLOCK) == 0) { puts ("inotify_init1(IN_NONBLOCK) set non-blocking mode"); return 1; } close (fd); puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:29 -07:00
Ulrich Drepper	be61a86d72	flag parameters: NONBLOCK in pipe This patch adds O_NONBLOCK support to pipe2. It is minimally more involved than the patches for eventfd et.al but still trivial. The interfaces of the create_write_pipe and create_read_pipe helper functions were changed and the one other caller as well. The following test must be adjusted for architectures other than x86 and x86-64 and in case the syscall numbers changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <stdio.h> #include <unistd.h> #include <sys/syscall.h> #ifndef __NR_pipe2 # ifdef __x86_64__ # define __NR_pipe2 293 # elif defined __i386__ # define __NR_pipe2 331 # else # error "need __NR_pipe2" # endif #endif int main (void) { int fds[2]; if (syscall (__NR_pipe2, fds, 0) == -1) { puts ("pipe2(0) failed"); return 1; } for (int i = 0; i < 2; ++i) { int fl = fcntl (fds[i], F_GETFL); if (fl == -1) { puts ("fcntl failed"); return 1; } if (fl & O_NONBLOCK) { printf ("pipe2(0) set non-blocking mode for fds[%d]\n", i); return 1; } close (fds[i]); } if (syscall (__NR_pipe2, fds, O_NONBLOCK) == -1) { puts ("pipe2(O_NONBLOCK) failed"); return 1; } for (int i = 0; i < 2; ++i) { int fl = fcntl (fds[i], F_GETFL); if (fl == -1) { puts ("fcntl failed"); return 1; } if ((fl & O_NONBLOCK) == 0) { printf ("pipe2(O_NONBLOCK) does not set non-blocking mode for fds[%d]\n", i); return 1; } close (fds[i]); } puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:29 -07:00
Ulrich Drepper	6b1ef0e60d	flag parameters: NONBLOCK in timerfd_create This patch adds support for the TFD_NONBLOCK flag to timerfd_create. The additional changes needed are minimal. The following test must be adjusted for architectures other than x86 and x86-64 and in case the syscall numbers changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <stdio.h> #include <time.h> #include <unistd.h> #include <sys/syscall.h> #ifndef __NR_timerfd_create # ifdef __x86_64__ # define __NR_timerfd_create 283 # elif defined __i386__ # define __NR_timerfd_create 322 # else # error "need __NR_timerfd_create" # endif #endif #define TFD_NONBLOCK O_NONBLOCK int main (void) { int fd = syscall (__NR_timerfd_create, CLOCK_REALTIME, 0); if (fd == -1) { puts ("timerfd_create(0) failed"); return 1; } int fl = fcntl (fd, F_GETFL); if (fl == -1) { puts ("fcntl failed"); return 1; } if (fl & O_NONBLOCK) { puts ("timerfd_create(0) set non-blocking mode"); return 1; } close (fd); fd = syscall (__NR_timerfd_create, CLOCK_REALTIME, TFD_NONBLOCK); if (fd == -1) { puts ("timerfd_create(TFD_NONBLOCK) failed"); return 1; } fl = fcntl (fd, F_GETFL); if (fl == -1) { puts ("fcntl failed"); return 1; } if ((fl & O_NONBLOCK) == 0) { puts ("timerfd_create(TFD_NONBLOCK) set non-blocking mode"); return 1; } close (fd); puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:29 -07:00
Ulrich Drepper	e7d476dfdf	flag parameters: NONBLOCK in eventfd This patch adds support for the EFD_NONBLOCK flag to eventfd2. The additional changes needed are minimal. The following test must be adjusted for architectures other than x86 and x86-64 and in case the syscall numbers changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <stdio.h> #include <unistd.h> #include <sys/syscall.h> #ifndef __NR_eventfd2 # ifdef __x86_64__ # define __NR_eventfd2 290 # elif defined __i386__ # define __NR_eventfd2 328 # else # error "need __NR_eventfd2" # endif #endif #define EFD_NONBLOCK O_NONBLOCK int main (void) { int fd = syscall (__NR_eventfd2, 1, 0); if (fd == -1) { puts ("eventfd2(0) failed"); return 1; } int fl = fcntl (fd, F_GETFL); if (fl == -1) { puts ("fcntl failed"); return 1; } if (fl & O_NONBLOCK) { puts ("eventfd2(0) sets non-blocking mode"); return 1; } close (fd); fd = syscall (__NR_eventfd2, 1, EFD_NONBLOCK); if (fd == -1) { puts ("eventfd2(EFD_NONBLOCK) failed"); return 1; } fl = fcntl (fd, F_GETFL); if (fl == -1) { puts ("fcntl failed"); return 1; } if ((fl & O_NONBLOCK) == 0) { puts ("eventfd2(EFD_NONBLOCK) does not set non-blocking mode"); return 1; } close (fd); puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:29 -07:00
Ulrich Drepper	5fb5e04926	flag parameters: NONBLOCK in signalfd This patch adds support for the SFD_NONBLOCK flag to signalfd4. The additional changes needed are minimal. The following test must be adjusted for architectures other than x86 and x86-64 and in case the syscall numbers changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <signal.h> #include <stdio.h> #include <unistd.h> #include <sys/syscall.h> #ifndef __NR_signalfd4 # ifdef __x86_64__ # define __NR_signalfd4 289 # elif defined __i386__ # define __NR_signalfd4 327 # else # error "need __NR_signalfd4" # endif #endif #define SFD_NONBLOCK O_NONBLOCK int main (void) { sigset_t ss; sigemptyset (&ss); sigaddset (&ss, SIGUSR1); int fd = syscall (__NR_signalfd4, -1, &ss, 8, 0); if (fd == -1) { puts ("signalfd4(0) failed"); return 1; } int fl = fcntl (fd, F_GETFL); if (fl == -1) { puts ("fcntl failed"); return 1; } if (fl & O_NONBLOCK) { puts ("signalfd4(0) set non-blocking mode"); return 1; } close (fd); fd = syscall (__NR_signalfd4, -1, &ss, 8, SFD_NONBLOCK); if (fd == -1) { puts ("signalfd4(SFD_NONBLOCK) failed"); return 1; } fl = fcntl (fd, F_GETFL); if (fl == -1) { puts ("fcntl failed"); return 1; } if ((fl & O_NONBLOCK) == 0) { puts ("signalfd4(SFD_NONBLOCK) does not set non-blocking mode"); return 1; } close (fd); puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:29 -07:00
Ulrich Drepper	99829b8329	flag parameters: NONBLOCK in anon_inode_getfd Building on the previous change to anon_inode_getfd, this patch introduces support for handling of O_NONBLOCK in addition to the already supported O_CLOEXEC. Following patches will take advantage of this support. As can be seen, the additional support for supporting this functionality is minimal. Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:28 -07:00
Ulrich Drepper	4006553b06	flag parameters: inotify_init This patch introduces the new syscall inotify_init1 (note: the 1 stands for the one parameter the syscall takes, as opposed to no parameter before). The values accepted for this parameter are function-specific and defined in the inotify.h header. Here the values must match the O_* flags, though. In this patch CLOEXEC support is introduced. The following test must be adjusted for architectures other than x86 and x86-64 and in case the syscall numbers changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <stdio.h> #include <unistd.h> #include <sys/syscall.h> #ifndef __NR_inotify_init1 # ifdef __x86_64__ # define __NR_inotify_init1 294 # elif defined __i386__ # define __NR_inotify_init1 332 # else # error "need __NR_inotify_init1" # endif #endif #define IN_CLOEXEC O_CLOEXEC int main (void) { int fd; fd = syscall (__NR_inotify_init1, 0); if (fd == -1) { puts ("inotify_init1(0) failed"); return 1; } int coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if (coe & FD_CLOEXEC) { puts ("inotify_init1(0) set close-on-exit"); return 1; } close (fd); fd = syscall (__NR_inotify_init1, IN_CLOEXEC); if (fd == -1) { puts ("inotify_init1(IN_CLOEXEC) failed"); return 1; } coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if ((coe & FD_CLOEXEC) == 0) { puts ("inotify_init1(O_CLOEXEC) does not set close-on-exit"); return 1; } close (fd); puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [akpm@linux-foundation.org: add sys_ni stub] Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:28 -07:00
Ulrich Drepper	ed8cae8ba0	flag parameters: pipe This patch introduces the new syscall pipe2 which is like pipe but it also takes an additional parameter which takes a flag value. This patch implements the handling of O_CLOEXEC for the flag. I did not add support for the new syscall for the architectures which have a special sys_pipe implementation. I think the maintainers of those archs have the chance to go with the unified implementation but that's up to them. The implementation introduces do_pipe_flags. I did that instead of changing all callers of do_pipe because some of the callers are written in assembler. I would probably screw up changing the assembly code. To avoid breaking code do_pipe is now a small wrapper around do_pipe_flags. Once all callers are changed over to do_pipe_flags the old do_pipe function can be removed. The following test must be adjusted for architectures other than x86 and x86-64 and in case the syscall numbers changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <stdio.h> #include <unistd.h> #include <sys/syscall.h> #ifndef __NR_pipe2 # ifdef __x86_64__ # define __NR_pipe2 293 # elif defined __i386__ # define __NR_pipe2 331 # else # error "need __NR_pipe2" # endif #endif int main (void) { int fd[2]; if (syscall (__NR_pipe2, fd, 0) != 0) { puts ("pipe2(0) failed"); return 1; } for (int i = 0; i < 2; ++i) { int coe = fcntl (fd[i], F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if (coe & FD_CLOEXEC) { printf ("pipe2(0) set close-on-exit for fd[%d]\n", i); return 1; } } close (fd[0]); close (fd[1]); if (syscall (__NR_pipe2, fd, O_CLOEXEC) != 0) { puts ("pipe2(O_CLOEXEC) failed"); return 1; } for (int i = 0; i < 2; ++i) { int coe = fcntl (fd[i], F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if ((coe & FD_CLOEXEC) == 0) { printf ("pipe2(O_CLOEXEC) does not set close-on-exit for fd[%d]\n", i); return 1; } } close (fd[0]); close (fd[1]); puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:28 -07:00
Ulrich Drepper	336dd1f70f	flag parameters: dup2 This patch adds the new dup3 syscall. It extends the old dup2 syscall by one parameter which is meant to hold a flag value. Support for the O_CLOEXEC flag is added in this patch. The following test must be adjusted for architectures other than x86 and x86-64 and in case the syscall numbers changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <stdio.h> #include <time.h> #include <unistd.h> #include <sys/syscall.h> #ifndef __NR_dup3 # ifdef __x86_64__ # define __NR_dup3 292 # elif defined __i386__ # define __NR_dup3 330 # else # error "need __NR_dup3" # endif #endif int main (void) { int fd = syscall (__NR_dup3, 1, 4, 0); if (fd == -1) { puts ("dup3(0) failed"); return 1; } int coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if (coe & FD_CLOEXEC) { puts ("dup3(0) set close-on-exec flag"); return 1; } close (fd); fd = syscall (__NR_dup3, 1, 4, O_CLOEXEC); if (fd == -1) { puts ("dup3(O_CLOEXEC) failed"); return 1; } coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if ((coe & FD_CLOEXEC) == 0) { puts ("dup3(O_CLOEXEC) set close-on-exec flag"); return 1; } close (fd); puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:28 -07:00
Ulrich Drepper	a0998b50c3	flag parameters: epoll_create This patch adds the new epoll_create2 syscall. It extends the old epoll_create syscall by one parameter which is meant to hold a flag value. In this patch the only flag support is EPOLL_CLOEXEC which causes the close-on-exec flag for the returned file descriptor to be set. A new name EPOLL_CLOEXEC is introduced which in this implementation must have the same value as O_CLOEXEC. The following test must be adjusted for architectures other than x86 and x86-64 and in case the syscall numbers changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <stdio.h> #include <time.h> #include <unistd.h> #include <sys/syscall.h> #ifndef __NR_epoll_create2 # ifdef __x86_64__ # define __NR_epoll_create2 291 # elif defined __i386__ # define __NR_epoll_create2 329 # else # error "need __NR_epoll_create2" # endif #endif #define EPOLL_CLOEXEC O_CLOEXEC int main (void) { int fd = syscall (__NR_epoll_create2, 1, 0); if (fd == -1) { puts ("epoll_create2(0) failed"); return 1; } int coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if (coe & FD_CLOEXEC) { puts ("epoll_create2(0) set close-on-exec flag"); return 1; } close (fd); fd = syscall (__NR_epoll_create2, 1, EPOLL_CLOEXEC); if (fd == -1) { puts ("epoll_create2(EPOLL_CLOEXEC) failed"); return 1; } coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if ((coe & FD_CLOEXEC) == 0) { puts ("epoll_create2(EPOLL_CLOEXEC) set close-on-exec flag"); return 1; } close (fd); puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:28 -07:00
Ulrich Drepper	11fcb6c146	flag parameters: timerfd_create The timerfd_create syscall already has a flags parameter. It just is unused so far. This patch changes this by introducing the TFD_CLOEXEC flag to set the close-on-exec flag for the returned file descriptor. A new name TFD_CLOEXEC is introduced which in this implementation must have the same value as O_CLOEXEC. The following test must be adjusted for architectures other than x86 and x86-64 and in case the syscall numbers changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <stdio.h> #include <time.h> #include <unistd.h> #include <sys/syscall.h> #ifndef __NR_timerfd_create # ifdef __x86_64__ # define __NR_timerfd_create 283 # elif defined __i386__ # define __NR_timerfd_create 322 # else # error "need __NR_timerfd_create" # endif #endif #define TFD_CLOEXEC O_CLOEXEC int main (void) { int fd = syscall (__NR_timerfd_create, CLOCK_REALTIME, 0); if (fd == -1) { puts ("timerfd_create(0) failed"); return 1; } int coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if (coe & FD_CLOEXEC) { puts ("timerfd_create(0) set close-on-exec flag"); return 1; } close (fd); fd = syscall (__NR_timerfd_create, CLOCK_REALTIME, TFD_CLOEXEC); if (fd == -1) { puts ("timerfd_create(TFD_CLOEXEC) failed"); return 1; } coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if ((coe & FD_CLOEXEC) == 0) { puts ("timerfd_create(TFD_CLOEXEC) set close-on-exec flag"); return 1; } close (fd); puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:27 -07:00
Ulrich Drepper	b087498eb5	flag parameters: eventfd This patch adds the new eventfd2 syscall. It extends the old eventfd syscall by one parameter which is meant to hold a flag value. In this patch the only flag support is EFD_CLOEXEC which causes the close-on-exec flag for the returned file descriptor to be set. A new name EFD_CLOEXEC is introduced which in this implementation must have the same value as O_CLOEXEC. The following test must be adjusted for architectures other than x86 and x86-64 and in case the syscall numbers changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <stdio.h> #include <unistd.h> #include <sys/syscall.h> #ifndef __NR_eventfd2 # ifdef __x86_64__ # define __NR_eventfd2 290 # elif defined __i386__ # define __NR_eventfd2 328 # else # error "need __NR_eventfd2" # endif #endif #define EFD_CLOEXEC O_CLOEXEC int main (void) { int fd = syscall (__NR_eventfd2, 1, 0); if (fd == -1) { puts ("eventfd2(0) failed"); return 1; } int coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if (coe & FD_CLOEXEC) { puts ("eventfd2(0) sets close-on-exec flag"); return 1; } close (fd); fd = syscall (__NR_eventfd2, 1, EFD_CLOEXEC); if (fd == -1) { puts ("eventfd2(EFD_CLOEXEC) failed"); return 1; } coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if ((coe & FD_CLOEXEC) == 0) { puts ("eventfd2(EFD_CLOEXEC) does not set close-on-exec flag"); return 1; } close (fd); puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [akpm@linux-foundation.org: add sys_ni stub] Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:27 -07:00
Ulrich Drepper	9deb27baed	flag parameters: signalfd This patch adds the new signalfd4 syscall. It extends the old signalfd syscall by one parameter which is meant to hold a flag value. In this patch the only flag support is SFD_CLOEXEC which causes the close-on-exec flag for the returned file descriptor to be set. A new name SFD_CLOEXEC is introduced which in this implementation must have the same value as O_CLOEXEC. The following test must be adjusted for architectures other than x86 and x86-64 and in case the syscall numbers changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <signal.h> #include <stdio.h> #include <unistd.h> #include <sys/syscall.h> #ifndef __NR_signalfd4 # ifdef __x86_64__ # define __NR_signalfd4 289 # elif defined __i386__ # define __NR_signalfd4 327 # else # error "need __NR_signalfd4" # endif #endif #define SFD_CLOEXEC O_CLOEXEC int main (void) { sigset_t ss; sigemptyset (&ss); sigaddset (&ss, SIGUSR1); int fd = syscall (__NR_signalfd4, -1, &ss, 8, 0); if (fd == -1) { puts ("signalfd4(0) failed"); return 1; } int coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if (coe & FD_CLOEXEC) { puts ("signalfd4(0) set close-on-exec flag"); return 1; } close (fd); fd = syscall (__NR_signalfd4, -1, &ss, 8, SFD_CLOEXEC); if (fd == -1) { puts ("signalfd4(SFD_CLOEXEC) failed"); return 1; } coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if ((coe & FD_CLOEXEC) == 0) { puts ("signalfd4(SFD_CLOEXEC) does not set close-on-exec flag"); return 1; } close (fd); puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [akpm@linux-foundation.org: add sys_ni stub] Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:27 -07:00
Ulrich Drepper	7d9dbca342	flag parameters: anon_inode_getfd extension This patch just extends the anon_inode_getfd interface to take an additional parameter with a flag value. The flag value is passed on to get_unused_fd_flags in anticipation for a use with the O_CLOEXEC flag. No actual semantic changes here, the changed callers all pass 0 for now. [akpm@linux-foundation.org: KVM fix] Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:27 -07:00
Akinobu Mita	6e2c10a12a	binfmt_misc: use simple_read_from_buffer() Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:27 -07:00
Jon Tollefson	f4a67cceee	fs: check for statfs overflow Adds a check for an overflow in the filesystem size so if someone is checking with statfs() on a 16G blocksize hugetlbfs in a 32bit binary that it will report back EOVERFLOW instead of a size of 0. Acked-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:19 -07:00
Andi Kleen	a137e1cc6d	hugetlbfs: per mount huge page sizes Add the ability to configure the hugetlb hstate used on a per mount basis. - Add a new pagesize= option to the hugetlbfs mount that allows setting the page size - This option causes the mount code to find the hstate corresponding to the specified size, and sets up a pointer to the hstate in the mount's superblock. - Change the hstate accessors to use this information rather than the global_hstate they were using (requires a slight change in mm/memory.c so we don't NULL deref in the error-unmap path -- see comments). [np: take hstate out of hugetlbfs inode and vma->vm_private_data] Acked-by: Adam Litke <agl@us.ibm.com> Acked-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:17 -07:00
Andi Kleen	a551643895	hugetlb: modular state for hugetlb page size The goal of this patchset is to support multiple hugetlb page sizes. This is achieved by introducing a new struct hstate structure, which encapsulates the important hugetlb state and constants (eg. huge page size, number of huge pages currently allocated, etc). The hstate structure is then passed around the code which requires these fields, they will do the right thing regardless of the exact hstate they are operating on. This patch adds the hstate structure, with a single global instance of it (default_hstate), and does the basic work of converting hugetlb to use the hstate. Future patches will add more hstate structures to allow for different hugetlbfs mounts to have different page sizes. [akpm@linux-foundation.org: coding-style fixes] Acked-by: Adam Litke <agl@us.ibm.com> Acked-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:17 -07:00
Eric Dumazet	a47a126ad5	vmallocinfo: add NUMA information Christoph recently added /proc/vmallocinfo file to get information about vmalloc allocations. This patch adds NUMA specific information, giving number of pages allocated on each memory node. This should help to check that vmalloc() is able to respect NUMA policies. Example of output on a four nodes machine (one cpu per node) 1) network hash tables are evenly spreaded on four nodes (OK) (Same point for inodes and dentries hash tables) 2) iptables tables (x_tables) are correctly allocated on each cpu node (OK). 3) sys_swapon() allocates its memory from one node only. 4) each loaded module is using memory on one node. Sysadmins could tune their setup to change points 3) and 4) if necessary. grep "pages=" /proc/vmallocinfo 0xffffc20000000000-0xffffc20000201000 2101248 alloc_large_system_hash+0x204/0x2c0 pages=512 vmalloc N0=128 N1=128 N2=128 N3=128 0xffffc20000201000-0xffffc20000302000 1052672 alloc_large_system_hash+0x204/0x2c0 pages=256 vmalloc N0=64 N1=64 N2=64 N3=64 0xffffc2000031a000-0xffffc2000031d000 12288 alloc_large_system_hash+0x204/0x2c0 pages=2 vmalloc N1=1 N2=1 0xffffc2000031f000-0xffffc2000032b000 49152 cramfs_uncompress_init+0x2e/0x80 pages=11 vmalloc N0=3 N1=3 N2=2 N3=3 0xffffc2000033e000-0xffffc20000341000 12288 sys_swapon+0x640/0xac0 pages=2 vmalloc N0=2 0xffffc20000341000-0xffffc20000344000 12288 xt_alloc_table_info+0xfe/0x130 [x_tables] pages=2 vmalloc N0=2 0xffffc20000344000-0xffffc20000347000 12288 xt_alloc_table_info+0xfe/0x130 [x_tables] pages=2 vmalloc N1=2 0xffffc20000347000-0xffffc2000034a000 12288 xt_alloc_table_info+0xfe/0x130 [x_tables] pages=2 vmalloc N2=2 0xffffc2000034a000-0xffffc2000034d000 12288 xt_alloc_table_info+0xfe/0x130 [x_tables] pages=2 vmalloc N3=2 0xffffc20004381000-0xffffc20004402000 528384 alloc_large_system_hash+0x204/0x2c0 pages=128 vmalloc N0=32 N1=32 N2=32 N3=32 0xffffc20004402000-0xffffc20004803000 4198400 alloc_large_system_hash+0x204/0x2c0 pages=1024 vmalloc vpages N0=256 N1=256 N2=256 N3=256 0xffffc20004803000-0xffffc20004904000 1052672 alloc_large_system_hash+0x204/0x2c0 pages=256 vmalloc N0=64 N1=64 N2=64 N3=64 0xffffc20004904000-0xffffc20004bec000 `3047424` sys_swapon+0x640/0xac0 pages=743 vmalloc vpages N0=743 0xffffffffa0000000-0xffffffffa000f000 61440 sys_init_module+0xc27/0x1d00 pages=14 vmalloc N1=14 0xffffffffa000f000-0xffffffffa0014000 20480 sys_init_module+0xc27/0x1d00 pages=4 vmalloc N0=4 0xffffffffa0014000-0xffffffffa0017000 12288 sys_init_module+0xc27/0x1d00 pages=2 vmalloc N0=2 0xffffffffa0017000-0xffffffffa0022000 45056 sys_init_module+0xc27/0x1d00 pages=10 vmalloc N1=10 0xffffffffa0022000-0xffffffffa0028000 24576 sys_init_module+0xc27/0x1d00 pages=5 vmalloc N3=5 0xffffffffa0028000-0xffffffffa0050000 163840 sys_init_module+0xc27/0x1d00 pages=39 vmalloc N1=39 0xffffffffa0050000-0xffffffffa0052000 8192 sys_init_module+0xc27/0x1d00 pages=1 vmalloc N1=1 0xffffffffa0052000-0xffffffffa0056000 16384 sys_init_module+0xc27/0x1d00 pages=3 vmalloc N1=3 0xffffffffa0056000-0xffffffffa0081000 176128 sys_init_module+0xc27/0x1d00 pages=42 vmalloc N3=42 0xffffffffa0081000-0xffffffffa00ae000 184320 sys_init_module+0xc27/0x1d00 pages=44 vmalloc N3=44 0xffffffffa00ae000-0xffffffffa00b1000 12288 sys_init_module+0xc27/0x1d00 pages=2 vmalloc N3=2 0xffffffffa00b1000-0xffffffffa00b9000 32768 sys_init_module+0xc27/0x1d00 pages=7 vmalloc N0=7 0xffffffffa00b9000-0xffffffffa00c4000 45056 sys_init_module+0xc27/0x1d00 pages=10 vmalloc N3=10 0xffffffffa00c6000-0xffffffffa00e0000 106496 sys_init_module+0xc27/0x1d00 pages=25 vmalloc N2=25 0xffffffffa00e0000-0xffffffffa00f1000 69632 sys_init_module+0xc27/0x1d00 pages=16 vmalloc N2=16 0xffffffffa00f1000-0xffffffffa00f4000 12288 sys_init_module+0xc27/0x1d00 pages=2 vmalloc N3=2 0xffffffffa00f4000-0xffffffffa00f7000 12288 sys_init_module+0xc27/0x1d00 pages=2 vmalloc N3=2 [akpm@linux-foundation.org: fix comment] Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:17 -07:00
Pavel Machek	cce7708158	SYNC_FILE_RANGE_WRITE may and will block. Document that. [akpm@linux-foundation.org: fix comment text] Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:17 -07:00
Mel Gorman	04f2cbe356	hugetlb: guarantee that COW faults for a process that called mmap(MAP_PRIVATE) on hugetlbfs will succeed After patch 2 in this series, a process that successfully calls mmap() for a MAP_PRIVATE mapping will be guaranteed to successfully fault until a process calls fork(). At that point, the next write fault from the parent could fail due to COW if the child still has a reference. We only reserve pages for the parent but a copy must be made to avoid leaking data from the parent to the child after fork(). Reserves could be taken for both parent and child at fork time to guarantee faults but if the mapping is large it is highly likely we will not have sufficient pages for the reservation, and it is common to fork only to exec() immediatly after. A failure here would be very undesirable. Note that the current behaviour of mainline with MAP_PRIVATE pages is pretty bad. The following situation is allowed to occur today. 1. Process calls mmap(MAP_PRIVATE) 2. Process calls mlock() to fault all pages and makes sure it succeeds 3. Process forks() 4. Process writes to MAP_PRIVATE mapping while child still exists 5. If the COW fails at this point, the process gets SIGKILLed even though it had taken care to ensure the pages existed This patch improves the situation by guaranteeing the reliability of the process that successfully calls mmap(). When the parent performs COW, it will try to satisfy the allocation without using reserves. If that fails the parent will steal the page leaving any children without a page. Faults from the child after that point will result in failure. If the child COW happens first, an attempt will be made to allocate the page without reserves and the child will get SIGKILLed on failure. To summarise the new behaviour: 1. If the original mapper performs COW on a private mapping with multiple references, it will attempt to allocate a hugepage from the pool or the buddy allocator without using the existing reserves. On fail, VMAs mapping the same area are traversed and the page being COW'd is unmapped where found. It will then steal the original page as the last mapper in the normal way. 2. The VMAs the pages were unmapped from are flagged to note that pages with data no longer exist. Future no-page faults on those VMAs will terminate the process as otherwise it would appear that data was corrupted. A warning is printed to the console that this situation occured. 2. If the child performs COW first, it will attempt to satisfy the COW from the pool if there are enough pages or via the buddy allocator if overcommit is allowed and the buddy allocator can satisfy the request. If it fails, the child will be killed. If the pool is large enough, existing applications will not notice that the reserves were a factor. Existing applications depending on the no-reserves been set are unlikely to exist as for much of the history of hugetlbfs, pages were prefaulted at mmap(), allocating the pages at that point or failing the mmap(). [npiggin@suse.de: fix CONFIG_HUGETLB=n build] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:16 -07:00
Mel Gorman	a1e78772d7	hugetlb: reserve huge pages for reliable MAP_PRIVATE hugetlbfs mappings until fork() This patch reserves huge pages at mmap() time for MAP_PRIVATE mappings in a similar manner to the reservations taken for MAP_SHARED mappings. The reserve count is accounted both globally and on a per-VMA basis for private mappings. This guarantees that a process that successfully calls mmap() will successfully fault all pages in the future unless fork() is called. The characteristics of private mappings of hugetlbfs files behaviour after this patch are; 1. The process calling mmap() is guaranteed to succeed all future faults until it forks(). 2. On fork(), the parent may die due to SIGKILL on writes to the private mapping if enough pages are not available for the COW. For reasonably reliable behaviour in the face of a small huge page pool, children of hugepage-aware processes should not reference the mappings; such as might occur when fork()ing to exec(). 3. On fork(), the child VMAs inherit no reserves. Reads on pages already faulted by the parent will succeed. Successful writes will depend on enough huge pages being free in the pool. 4. Quotas of the hugetlbfs mount are checked at reserve time for the mapper and at fault time otherwise. Before this patch, all reads or writes in the child potentially needs page allocations that can later lead to the death of the parent. This applies to reads and writes of uninstantiated pages as well as COW. After the patch it is only a write to an instantiated page that causes problems. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:16 -07:00
Kentaro Makita	da3bbdd463	fix soft lock up at NFS mount via per-SB LRU-list of unused dentries [Summary] Split LRU-list of unused dentries to one per superblock to avoid soft lock up during NFS mounts and remounting of any filesystem. Previously I posted here: http://lkml.org/lkml/2008/3/5/590 [Descriptions] - background dentry_unused is a list of dentries which are not referenced. dentry_unused grows up when references on directories or files are released. This list can be very long if there is huge free memory. - the problem When shrink_dcache_sb() is called, it scans all dentry_unused linearly under spin_lock(), and if dentry->d_sb is differnt from given superblock, scan next dentry. This scan costs very much if there are many entries, and very ineffective if there are many superblocks. IOW, When we need to shrink unused dentries on one dentry, but scans unused dentries on all superblocks in the system. For example, we scan 500 dentries to unmount a filesystem, but scans 1,000,000 or more unused dentries on other superblocks. In our case , At mounting NFS, shrink_dcache_sb() is called to shrink unused dentries on NFS, but scans 100,000,000 unused dentries on superblocks in the system such as local ext3 filesystems. I hear NFS mounting took 1 min on some system in use. : NFS uses virtual filesystem in rpc layer, so NFS is affected by this problem. 100,000,000 is possible number on large systems. Per-superblock LRU of unused dentried can reduce the cost in reasonable manner. - How to fix I found this problem is solved by David Chinner's "Per-superblock unused dentry LRU lists V3"(1), so I rebase it and add some fix to reclaim with fairness, which is in Andrew Morton's comments(2). 1) http://lkml.org/lkml/2006/5/25/318 2) http://lkml.org/lkml/2006/5/25/320 Split LRU-list of unused dentries to each superblocks. Then, NFS mounting will check dentries under a superblock instead of all. But this spliting will break LRU of dentry-unused. So, I've attempted to make reclaim unused dentrins with fairness by calculate number of dentries to scan on this sb based on following way number of dentries to scan on this sb = count * (number of dentries on this sb / number of dentries in the machine) - ToDo - I have to measuring performance number and do stress tests. - When unmount occurs during prune_dcache(), scanning on same superblock, It is unable to reach next superblock because it is gone away. We restart scannig superblock from first one, it causes unfairness of reclaim unused dentries on first superblock. But I think this happens very rarely. - Test Results Result on 6GB boxes with excessive unused dentries. Without patch: $ cat /proc/sys/fs/dentry-state 10181835 10180203 45 0 0 0 # mount -t nfs 10.124.60.70:/work/kernel-src nfs real 0m1.830s user 0m0.001s sys 0m1.653s With this patch: $ cat /proc/sys/fs/dentry-state 10236610 10234751 45 0 0 0 # mount -t nfs 10.124.60.70:/work/kernel-src nfs real 0m0.106s user 0m0.002s sys 0m0.032s [akpm@linux-foundation.org: fix comments] Signed-off-by: Kentaro Makita <k-makita@np.css.fujitsu.com> Cc: Neil Brown <neilb@suse.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: David Chinner <dgc@sgi.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:15 -07:00
Jan Beulich	42b7772812	mm: remove double indirection on tlb parameter to free_pgd_range() & Co The double indirection here is not needed anywhere and hence (at least) confusing. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Acked-by: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:15 -07:00
Adrian Bunk	c748e1340e	mm/vmstat.c: proper externs This patch adds proper extern declarations for five variables in include/linux/vmstat.h Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:14 -07:00
Li Zefan	ec05e868ac	ext4: improve ext4_fill_flex_info() a bit - use kzalloc() instead of kmalloc() + memset() - improve a printk info Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2008-07-24 12:49:59 -04:00
Shirish Pargaonkar	ef571cadd5	[CIFS] Fix warnings from checkpatch Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-07-24 15:56:05 +00:00
Shirish Pargaonkar	b1910ad622	[CIFS] Fix improper endian conversion of ACL subauth field In mode_to_acl when converting a Unix mode to a Windows ACL the subauth fields of the SID in the ACL were translated incorrectly on bigendian architectures Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-07-24 14:53:20 +00:00
Shirish Pargaonkar	76c510ad2e	[CIFS] Fix possible double free if search immediately after search rewind fails Signed-off-by: Shirish Pargaonkar <shirishp@us.ibm.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-07-24 14:48:33 +00:00
Steve French	99b1f5b2f6	[CIFS] remove checkpatch warning Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-07-24 02:37:45 +00:00
Alexey Dobriyan	f984c7b982	Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Steven French <sfrench@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-07-24 02:34:24 +00:00
Harvey Harrison	5ca33c6ac3	cifs: assorted endian annotations fs/cifs/cifssmb.c:3917:13: warning: incorrect type in assignment (different base types) fs/cifs/cifssmb.c:3917:13: expected bool [unsigned] [usertype] is_unicode fs/cifs/cifssmb.c:3917:13: got restricted __le16 The comment explains why __force is used here. fs/cifs/connect.c:458:16: warning: cast to restricted __be32 fs/cifs/connect.c:458:16: warning: cast to restricted __be32 fs/cifs/connect.c:458:16: warning: cast to restricted __be32 fs/cifs/connect.c:458:16: warning: cast to restricted __be32 fs/cifs/connect.c:458:16: warning: cast to restricted __be32 fs/cifs/connect.c:458:16: warning: cast to restricted __be32 Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-07-24 01:14:41 +00:00
Jeff Layton	8efdbde647	[CIFS] break ATTR_SIZE changes out into their own function Move the code that handles ATTR_SIZE changes to its own function. This makes for a smaller function and reduces the level of indentation. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-07-23 21:28:12 +00:00
Jeff Layton	09e50d55a9	lockdep: annotate cifs in-kernel sockets Put CIFS sockets in their own class to avoid some lockdep warnings. CIFS sockets are not exposed to user-space, and so are not subject to the same deadlock scenarios. A similar change was made a couple of years ago for RPC sockets in commit `ed07536ed6`. This patch should prevent lockdep false-positives like this one: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.18-98.el5.jtltest.38.bz456320.1debug #1 ------------------------------------------------------- test5/2483 is trying to acquire lock: (sk_lock-AF_INET){--..}, at: [<ffffffff800270d2>] tcp_sendmsg+0x1c/0xb2f but task is already holding lock: (&inode->i_alloc_sem){--..}, at: [<ffffffff8002e454>] notify_change+0xf5/0x2e0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (&inode->i_alloc_sem){--..}: [<ffffffff800a817c>] __lock_acquire+0x9a9/0xadf [<ffffffff800a8a72>] lock_acquire+0x55/0x70 [<ffffffff8002e454>] notify_change+0xf5/0x2e0 [<ffffffff800a4e36>] down_write+0x3c/0x68 [<ffffffff8002e454>] notify_change+0xf5/0x2e0 [<ffffffff800e358d>] do_truncate+0x50/0x6b [<ffffffff8005197c>] get_write_access+0x40/0x46 [<ffffffff80012cf1>] may_open+0x1d3/0x22e [<ffffffff8001bc81>] open_namei+0x2c6/0x6dd [<ffffffff800289c6>] do_filp_open+0x1c/0x38 [<ffffffff800683ef>] _spin_unlock+0x17/0x20 [<ffffffff800167a7>] get_unused_fd+0xf9/0x107 [<ffffffff8001a704>] do_sys_open+0x44/0xbe [<ffffffff80060116>] system_call+0x7e/0x83 [<ffffffffffffffff>] 0xffffffffffffffff -> #2 (&sysfs_inode_imutex_key){--..}: [<ffffffff800a817c>] __lock_acquire+0x9a9/0xadf [<ffffffff8010f6df>] create_dir+0x26/0x1d7 [<ffffffff800a8a72>] lock_acquire+0x55/0x70 [<ffffffff8010f6df>] create_dir+0x26/0x1d7 [<ffffffff800671c0>] mutex_lock_nested+0x104/0x29c [<ffffffff800a819d>] __lock_acquire+0x9ca/0xadf [<ffffffff8010f6df>] create_dir+0x26/0x1d7 [<ffffffff8010fc67>] sysfs_create_dir+0x58/0x76 [<ffffffff8015144c>] kobject_add+0xdb/0x198 [<ffffffff801be765>] class_device_add+0xb2/0x465 [<ffffffff8005a6ff>] kobject_get+0x12/0x17 [<ffffffff80225265>] register_netdevice+0x270/0x33e [<ffffffff8022538c>] register_netdev+0x59/0x67 [<ffffffff80464d40>] net_olddevs_init+0xb/0xac [<ffffffff80448a79>] init+0x1f9/0x2fc [<ffffffff80068885>] _spin_unlock_irq+0x24/0x27 [<ffffffff80067f86>] trace_hardirqs_on_thunk+0x35/0x37 [<ffffffff80061079>] child_rip+0xa/0x11 [<ffffffff80068885>] _spin_unlock_irq+0x24/0x27 [<ffffffff800606a8>] restore_args+0x0/0x30 [<ffffffff80179a59>] acpi_ds_init_one_object+0x0/0x80 [<ffffffff80448880>] init+0x0/0x2fc [<ffffffff8006106f>] child_rip+0x0/0x11 [<ffffffffffffffff>] 0xffffffffffffffff -> #1 (rtnl_mutex){--..}: [<ffffffff800a817c>] __lock_acquire+0x9a9/0xadf [<ffffffff8025acf8>] ip_mc_leave_group+0x23/0xb7 [<ffffffff800a8a72>] lock_acquire+0x55/0x70 [<ffffffff8025acf8>] ip_mc_leave_group+0x23/0xb7 [<ffffffff800671c0>] mutex_lock_nested+0x104/0x29c [<ffffffff8025acf8>] ip_mc_leave_group+0x23/0xb7 [<ffffffff802451b0>] do_ip_setsockopt+0x6d1/0x9bf [<ffffffff800a575e>] lock_release_holdtime+0x27/0x48 [<ffffffff800a575e>] lock_release_holdtime+0x27/0x48 [<ffffffff8006a85e>] do_page_fault+0x503/0x835 [<ffffffff8012cbf6>] socket_has_perm+0x5b/0x68 [<ffffffff80245556>] ip_setsockopt+0x22/0x78 [<ffffffff8021c973>] sys_setsockopt+0x91/0xb7 [<ffffffff800602a6>] tracesys+0xd5/0xdf [<ffffffffffffffff>] 0xffffffffffffffff -> #0 (sk_lock-AF_INET){--..}: [<ffffffff800a5037>] print_stack_trace+0x59/0x68 [<ffffffff800a8092>] __lock_acquire+0x8bf/0xadf [<ffffffff800a8a72>] lock_acquire+0x55/0x70 [<ffffffff800270d2>] tcp_sendmsg+0x1c/0xb2f [<ffffffff80035466>] lock_sock+0xd4/0xe4 [<ffffffff80096e91>] _local_bh_enable+0xcb/0xe0 [<ffffffff800606a8>] restore_args+0x0/0x30 [<ffffffff800270d2>] tcp_sendmsg+0x1c/0xb2f [<ffffffff80057540>] sock_sendmsg+0xf3/0x110 [<ffffffff800a2bb6>] autoremove_wake_function+0x0/0x2e [<ffffffff800a10e4>] kernel_text_address+0x1a/0x26 [<ffffffff8006f4e2>] dump_trace+0x211/0x23a [<ffffffff800a6d3d>] find_usage_backwards+0x5f/0x88 [<ffffffff8840221a>] MD5Final+0xaf/0xc2 [cifs] [<ffffffff884032ec>] cifs_calculate_signature+0x55/0x69 [cifs] [<ffffffff8021d891>] kernel_sendmsg+0x35/0x47 [<ffffffff883ff38e>] smb_send+0xa3/0x151 [cifs] [<ffffffff883ff5de>] SendReceive+0x1a2/0x448 [cifs] [<ffffffff800a812f>] __lock_acquire+0x95c/0xadf [<ffffffff883e758a>] CIFSSMBSetEOF+0x20d/0x25b [cifs] [<ffffffff883fa430>] cifs_set_file_size+0x110/0x3b7 [cifs] [<ffffffff883faa89>] cifs_setattr+0x3b2/0x6f6 [cifs] [<ffffffff8002e454>] notify_change+0xf5/0x2e0 [<ffffffff8002e4a4>] notify_change+0x145/0x2e0 [<ffffffff800e358d>] do_truncate+0x50/0x6b [<ffffffff8005197c>] get_write_access+0x40/0x46 [<ffffffff80012cf1>] may_open+0x1d3/0x22e [<ffffffff8001bc81>] open_namei+0x2c6/0x6dd [<ffffffff800289c6>] do_filp_open+0x1c/0x38 [<ffffffff800683ef>] _spin_unlock+0x17/0x20 [<ffffffff800167a7>] get_unused_fd+0xf9/0x107 [<ffffffff8001a704>] do_sys_open+0x44/0xbe [<ffffffff800602a6>] tracesys+0xd5/0xdf [<ffffffffffffffff>] 0xffffffffffffffff other info that might help us debug this: 2 locks held by test5/2483: #0: (&inode->i_mutex){--..}, at: [<ffffffff800e3582>] do_truncate+0x45/0x6b #1: (&inode->i_alloc_sem){--..}, at: [<ffffffff8002e454>] notify_change+0xf5/0x2e0 stack backtrace: Call Trace: [<ffffffff800a6a7b>] print_circular_bug_tail+0x65/0x6e [<ffffffff800a5037>] print_stack_trace+0x59/0x68 [<ffffffff800a8092>] __lock_acquire+0x8bf/0xadf [<ffffffff800a8a72>] lock_acquire+0x55/0x70 [<ffffffff800270d2>] tcp_sendmsg+0x1c/0xb2f [<ffffffff80035466>] lock_sock+0xd4/0xe4 [<ffffffff80096e91>] _local_bh_enable+0xcb/0xe0 [<ffffffff800606a8>] restore_args+0x0/0x30 [<ffffffff800270d2>] tcp_sendmsg+0x1c/0xb2f [<ffffffff80057540>] sock_sendmsg+0xf3/0x110 [<ffffffff800a2bb6>] autoremove_wake_function+0x0/0x2e [<ffffffff800a10e4>] kernel_text_address+0x1a/0x26 [<ffffffff8006f4e2>] dump_trace+0x211/0x23a [<ffffffff800a6d3d>] find_usage_backwards+0x5f/0x88 [<ffffffff8840221a>] :cifs:MD5Final+0xaf/0xc2 [<ffffffff884032ec>] :cifs:cifs_calculate_signature+0x55/0x69 [<ffffffff8021d891>] kernel_sendmsg+0x35/0x47 [<ffffffff883ff38e>] :cifs:smb_send+0xa3/0x151 [<ffffffff883ff5de>] :cifs:SendReceive+0x1a2/0x448 [<ffffffff800a812f>] __lock_acquire+0x95c/0xadf [<ffffffff883e758a>] :cifs:CIFSSMBSetEOF+0x20d/0x25b [<ffffffff883fa430>] :cifs:cifs_set_file_size+0x110/0x3b7 [<ffffffff883faa89>] :cifs:cifs_setattr+0x3b2/0x6f6 [<ffffffff8002e454>] notify_change+0xf5/0x2e0 [<ffffffff8002e4a4>] notify_change+0x145/0x2e0 [<ffffffff800e358d>] do_truncate+0x50/0x6b [<ffffffff8005197c>] get_write_access+0x40/0x46 [<ffffffff80012cf1>] may_open+0x1d3/0x22e [<ffffffff8001bc81>] open_namei+0x2c6/0x6dd [<ffffffff800289c6>] do_filp_open+0x1c/0x38 [<ffffffff800683ef>] _spin_unlock+0x17/0x20 [<ffffffff800167a7>] get_unused_fd+0xf9/0x107 [<ffffffff8001a704>] do_sys_open+0x44/0xbe [<ffffffff800602a6>] tracesys+0xd5/0xdf Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-07-23 18:25:38 +00:00
Linus Torvalds	c010b2f76c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (82 commits) ipw2200: Call netif_*_queue() interfaces properly. netxen: Needs to include linux/vmalloc.h [netdrvr] atl1d: fix !CONFIG_PM build r6040: rework init_one error handling r6040: bump release number to 0.18 r6040: handle RX fifo full and no descriptor interrupts r6040: change the default waiting time r6040: use definitions for magic values in descriptor status r6040: completely rework the RX path r6040: call napi_disable when puting down the interface and set lp->dev accordingly. mv643xx_eth: fix NETPOLL build r6040: rework the RX buffers allocation routine r6040: fix scheduling while atomic in r6040_tx_timeout r6040: fix null pointer access and tx timeouts r6040: prefix all functions with r6040 rndis_host: support WM6 devices as modems at91_ether: use netstats in net_device structure sfc: Create one RX queue and interrupt per CPU package by default sfc: Use a separate workqueue for resets sfc: I2C adapter initialisation fixes ...	2008-07-22 19:09:51 -07:00
Adrian Bunk	8086cd451f	netns: make get_proc_net() static get_proc_net() can now become static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-22 14:19:19 -07:00
Linus Torvalds	53baaaa968	Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (79 commits) arm: bus_id -> dev_name() and dev_set_name() conversions sparc64: fix up bus_id changes in sparc core code 3c59x: handle pci_name() being const MTD: handle pci_name() being const HP iLO driver sysdev: Convert the x86 mce tolerant sysdev attribute to generic attribute sysdev: Add utility functions for simple int/ulong variable sysdev attributes sysdev: Pass the attribute to the low level sysdev show/store function driver core: Suppress sysfs warnings for device_rename(). kobject: Transmit return value of call_usermodehelper() to caller sysfs-rules.txt: reword API stability statement debugfs: Implement debugfs_remove_recursive() HOWTO: change email addresses of James in HOWTO always enable FW_LOADER unless EMBEDDED=y uio-howto.tmpl: use unique output names uio-howto.tmpl: use standard copyright/legal markings sysfs: don't call notify_change sysdev: fix debugging statements in registration code. kobject: should use kobject_put() in kset-example kobject: reorder kobject to save space on 64 bit builds ...	2008-07-22 13:13:47 -07:00
Alexey Dobriyan	ee1e6ab605	proc: fix /proc/*/pagemap some more struct pagemap_walk was placed on stack, some hooks are initialized, the rest (->pgd_entry, ->pud_entry, ->pte_entry) are valid but junk. Reported-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Vegard Nossum" <vegard.nossum@gmail.com> Cc: <stable@kernel.org> [2.6.25.x, 2.6.26.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-22 09:59:41 -07:00
John Reiser	6519108746	execve filename: document and export via auxiliary vector The Linux kernel puts the filename argument of execve() into the new address space. Many developers are surprised to learn this. Those who know and could use it, object "But it's not documented." Those who want to use it dislike the expression (char *)(1+ strlen(env[-1+ n_env]) + env[-1+ n_env]) because it requires locating the last original environment variable, and assumes that the filename follows the characters. This patch documents the insertion of the filename, and makes it easier to find by adding a new tag AT_EXECFN in the ElfXX_auxv_t; see <elf.h>. In many cases readlink("/proc/self/exe",) gives the same answer. But if all the original pages get unmapped, then the kernel erases the symlink for /proc/self/exe. This can happen when a program decompressor does a good job of cleaning up after uncompressing directly to memory, so that the address space of the target program looks the same as if compression had never happened. One example is http://upx.sourceforge.net . One notable use of the underlying concept (what path containED the executable) is glibc expanding $ORIGIN in DT_RUNPATH. In practice for the near term, it may be a good idea for user-mode code to use both /proc/self/exe and AT_EXECFN as fall-back methods for each other. /proc/self/exe can fail due to unmapping, AT_EXECFN can fail because it won't be present on non-new systems. The auxvec or {AT_EXECFN}.d_val also can get overwritten, although in nearly all cases this would be the result of a bug. The runtime cost is one NEW_AUX_ENT using two words of stack space. The underlying value is maintained already as bprm->exec; setup_arg_pages() in fs/exec.c slides it for stack_shift, etc. Signed-off-by: John Reiser <jreiser@BitWagon.com> Cc: Roland McGrath <roland@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Ulrich Drepper <drepper@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-22 09:59:40 -07:00
Jan Beulich	04e1e0ccca	[CIFS] Fix compiler warning on 64-bit Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-07-22 13:04:18 +00:00
Cornelia Huck	36ce6dad6e	driver core: Suppress sysfs warnings for device_rename(). driver core: Suppress sysfs warnings for device_rename(). Renaming network devices to an already existing name is not something we want sysfs to print a scary warning for, since the callers can deal with this correctly. So let's introduce sysfs_create_link_nowarn() which gets rid of the common warning. Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-07-21 21:55:01 -07:00
Haavard Skinnemoen	9505e63756	debugfs: Implement debugfs_remove_recursive() debugfs_remove_recursive() will remove a dentry and all its children. Drivers can use this to zap their whole debugfs tree so that they don't need to keep track of every single debugfs dentry they created. It may fail to remove the whole tree in certain cases: sh-3.2# rmmod atmel-mci < /sys/kernel/debug/mmc0/ios/clock mmc0: card b368 removed atmel_mci atmel_mci.0: Lost dma0chan1, falling back to PIO sh-3.2# ls /sys/kernel/debug/mmc0/ ios But I'm not sure if that case can be handled in any sane manner. Signed-off-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com> Cc: Pierre Ossman <drzeus-list@drzeus.cx> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-07-21 21:54:59 -07:00
Miklos Szeredi	93265d13ea	sysfs: don't call notify_change sysfs_chmod_file() calls notify_change() to change the permission bits on a sysfs file. Replace with explicit call to sysfs_setattr() and fsnotify_change(). This is equivalent, except that security_inode_setattr() is not called. This function is called by drivers, so the security checks do not make any sense. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-07-21 21:54:57 -07:00
Kay Sievers	aab0de2451	driver core: remove KOBJ_NAME_LEN define Kobjects do not have a limit in name size since a while, so stop pretending that they do. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-07-21 21:54:52 -07:00
Greg Kroah-Hartman	6143b59970	device create: coda: convert device_create to device_create_drvdata device_create() is race-prone, so use the race-free device_create_drvdata() instead as device_create() is going away. Cc: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-07-21 21:54:41 -07:00
Linus Torvalds	14b395e35d	Merge branch 'for-2.6.27' of git://linux-nfs.org/~bfields/linux * 'for-2.6.27' of git://linux-nfs.org/~bfields/linux: (51 commits) nfsd: nfs4xdr.c do-while is not a compound statement nfsd: Use C99 initializers in fs/nfsd/nfs4xdr.c lockd: Pass "struct sockaddr *" to new failover-by-IP function lockd: get host reference in nlmsvc_create_block() instead of callers lockd: minor svclock.c style fixes lockd: eliminate duplicate nlmsvc_lookup_host call from nlmsvc_lock lockd: eliminate duplicate nlmsvc_lookup_host call from nlmsvc_testlock lockd: nlm_release_host() checks for NULL, caller needn't file lock: reorder struct file_lock to save space on 64 bit builds nfsd: take file and mnt write in nfs4_upgrade_open nfsd: document open share bit tracking nfsd: tabulate nfs4 xdr encoding functions nfsd: dprint operation names svcrdma: Change WR context get/put to use the kmem cache svcrdma: Create a kmem cache for the WR contexts svcrdma: Add flush_scheduled_work to module exit function svcrdma: Limit ORD based on client's advertised IRD svcrdma: Remove unused wait q from svcrdma_xprt structure svcrdma: Remove unneeded spin locks from __svc_rdma_free svcrdma: Add dma map count and WARN_ON ...	2008-07-20 21:21:46 -07:00
Linus Torvalds	db6d8c7a40	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (1232 commits) iucv: Fix bad merging. net_sched: Add size table for qdiscs net_sched: Add accessor function for packet length for qdiscs net_sched: Add qdisc_enqueue wrapper highmem: Export totalhigh_pages. ipv6 mcast: Omit redundant address family checks in ip6_mc_source(). net: Use standard structures for generic socket address structures. ipv6 netns: Make several "global" sysctl variables namespace aware. netns: Use net_eq() to compare net-namespaces for optimization. ipv6: remove unused macros from net/ipv6.h ipv6: remove unused parameter from ip6_ra_control tcp: fix kernel panic with listening_get_next tcp: Remove redundant checks when setting eff_sacks tcp: options clean up tcp: Fix MD5 signatures for non-linear skbs sctp: Update sctp global memory limit allocations. sctp: remove unnecessary byteshifting, calculate directly in big-endian sctp: Allow only 1 listening socket with SO_REUSEADDR sctp: Do not leak memory on multiple listen() calls sctp: Support ipv6only AF_INET6 sockets. ...	2008-07-20 17:43:29 -07:00
Linus Torvalds	f7df406dce	Merge branch 'configfs-fixup-ptr-error' of git://oss.oracle.com/git/jlbec/linux-2.6 * 'configfs-fixup-ptr-error' of git://oss.oracle.com/git/jlbec/linux-2.6: configfs: Allow ->make_item() and ->make_group() to return detailed errors. Revert "configfs: Allow ->make_item() and ->make_group() to return detailed errors."	2008-07-20 17:17:52 -07:00
Alan Cox	a352def21a	tty: Ldisc revamp Move the line disciplines towards a conventional ->ops arrangement. For the moment the actual 'tty_ldisc' struct in the tty is kept as part of the tty struct but this can then be changed if it turns out that when it all settles down we want to refcount ldiscs separately to the tty. Pull the ldisc code out of /proc and put it with our ldisc code. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-20 17:12:34 -07:00
David S. Miller	407d819cf0	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-2.6	2008-07-19 00:30:39 -07:00
Harvey Harrison	5108b27651	nfsd: nfs4xdr.c do-while is not a compound statement The WRITEMEM macro produces sparse warnings of the form: fs/nfsd/nfs4xdr.c:2668:2: warning: do-while statement is not a compound statement Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Cc: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-07-18 15:18:35 -04:00
J. Bruce Fields	ad1060c89c	nfsd: Use C99 initializers in fs/nfsd/nfs4xdr.c Thanks to problem report and original patch from Harvey Harrison. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Harvey Harrison <harvey.harrison@gmail.com> Cc: Benny Halevy <bhalevy@panasas.com>	2008-07-18 15:04:58 -04:00
Pavel Emelyanov	b6fcbdb4f2	proc: consolidate per-net single-release callers They are symmetrical to single_open ones :) Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 04:07:44 -07:00
Pavel Emelyanov	de05c557b2	proc: consolidate per-net single_open callers There are already 7 of them - time to kill some duplicate code. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 04:07:21 -07:00
David S. Miller	49997d7515	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: Documentation/powerpc/booting-without-of.txt drivers/atm/Makefile drivers/net/fs_enet/fs_enet-main.c drivers/pci/pci-acpi.c net/8021q/vlan.c net/iucv/iucv.c	2008-07-18 02:39:39 -07:00
Joel Becker	a6795e9ebb	configfs: Allow ->make_item() and ->make_group() to return detailed errors. The configfs operations ->make_item() and ->make_group() currently return a new item/group. A return of NULL signifies an error. Because of this, -ENOMEM is the only return code bubbled up the stack. Multiple folks have requested the ability to return specific error codes when these operations fail. This patch adds that ability by changing the ->make_item/group() ops to return ERR_PTR() values. These errors are bubbled up appropriately. NULL returns are changed to -ENOMEM for compatibility. Also updated are the in-kernel users of configfs. This is a rework of reverted commit `11c3b79218`. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2008-07-17 15:21:29 -07:00
Joel Becker	f89ab8619e	Revert "configfs: Allow ->make_item() and ->make_group() to return detailed errors." This reverts commit `11c3b79218`. The code will move to PTR_ERR(). Signed-off-by: Joel Becker <joel.becker@oracle.com>	2008-07-17 14:53:48 -07:00
Aneesh Kumar K.V	12219aea6b	ext4: Cleanup the block reservation code path The truncate patch should not use the i_allocated_meta_blocks value. So add seperate functions to be used in the truncate and alloc path. We also need to release the meta-data block that we reserved for the blocks that we are truncating. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2008-07-17 16:12:08 -04:00
Theodore Ts'o	34071da71a	ext4: don't assume extents can't cross block groups when truncating With the FLEX_BG layout, there is no reason why extents can't cross block groups, so make the truncate code reserve enough credits so we don't BUG if we come across such an extent. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-08-01 21:59:19 -04:00
Theodore Ts'o	bc965ab3f2	ext4: Fix lack of credits BUG() when deleting a badly fragmented inode The extents codepath for ext4_truncate() requests journal transaction credits in very small chunks, requesting only what is needed. This means there may not be enough credits left on the transaction handle after ext4_truncate() returns and then when ext4_delete_inode() tries finish up its work, it may not have enough transaction credits, causing a BUG() oops in the jbd2 core. Also, reserve an extra 2 blocks when starting an ext4_delete_inode() since we need to update the inode bitmap, as well as update the orphaned inode linked list. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-08-02 21:10:38 -04:00

... 2 3 4 5 6 ...

9770 Commits