Commit Graph

2480 Commits

Author SHA1 Message Date
Roman Zippel 05cfb614dd [PATCH] hrtimers: remove data field
The nanosleep cleanup allows to remove the data field of hrtimer.  The
callback function can use container_of() to get it's own data.  Since the
hrtimer structure is anyway embedded in other structures, this adds no
overhead.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:03 -08:00
Roman Zippel 4dee26b7e2 [PATCH] hrtimers: remove it_real_value calculation from proc/*/stat
Remove the it_real_value from /proc/*/stat, during 1.2.x was the last time it
returned useful data (as it was directly maintained by the scheduler), now
it's only a waste of time to calculate it.  Return 0 instead.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:02 -08:00
Badari Pulavarty a0e9285233 [PATCH] ext3: "nobh" writeback support for filesystems blocksize < pagesize
There is no valid reason why we can't support "nobh" option for filesystems
with blocksize != PAGESIZE.

This patch lets them use "nobh" option for writeback mode for blocksize <
pagesize.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:02 -08:00
Badari Pulavarty f91a2ad2ed [PATCH] ext3: multi-block get_block()
Mingming Cao recently added multi-block allocation support for ext3,
currently used only by DIO.  I added support to map multiple blocks for
mpage_readpages().  This patch add support for ext3_get_block() to deal
with multi-block mapping.  Basically it renames ext3_direct_io_get_blocks()
as ext3_get_block().

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:02 -08:00
Andrew Morton d6859bfca8 [PATCH] ext3: cleanups and WARN_ON()
- Clean up a few little layout things and comments.

- Add a WARN_ON to a case which I was wondering about.

- Tune up some inlines.

Cc: Mingming Cao <cmm@us.ibm.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:02 -08:00
Badari Pulavarty 1d8fa7a2b9 [PATCH] remove ->get_blocks() support
Now that get_block() can handle mapping multiple disk blocks, no need to have
->get_blocks().  This patch removes fs specific ->get_blocks() added for DIO
and makes it users use get_block() instead.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:01 -08:00
Badari Pulavarty fa30bd058b [PATCH] map multiple blocks for mpage_readpages()
This patch changes mpage_readpages() and get_block() to get the disk mapping
information for multiple blocks at the same time.

b_size represents the amount of disk mapping that needs to mapped.  On the
successful get_block() b_size indicates the amount of disk mapping thats
actually mapped.  Only the filesystems who care to use this information and
provide multiple disk blocks at a time can choose to do so.

No changes are needed for the filesystems who wants to ignore this.

[akpm@osdl.org: cleanups]
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:01 -08:00
Badari Pulavarty b0cf2321c6 [PATCH] pass b_size to ->get_block()
Pass amount of disk needs to be mapped to get_block().  This way one can
modify the fs ->get_block() functions to map multiple blocks at the same time.

[akpm@osdl.org: performance tweak]
[akpm@osdl.org: remove unneeded assignments]
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:01 -08:00
Badari Pulavarty 205f87f6b3 [PATCH] change buffer_head.b_size to size_t
Increase the size of the buffer_head b_size field (only) for 64 bit platforms.
Update some old and moldy comments in and around the structure as well.

The b_size increase allows us to perform larger mappings and allocations for
large I/O requests from userspace, which tie in with other changes allowing
the get_block_t() interface to map multiple blocks at once.

Signed-off-by: Nathan Scott <nathans@sgi.com>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:01 -08:00
Mingming Cao d48589bfad [PATCH] ext3_get_blocks: Adjust reservation window size for mblocks
Optimize the block reservation and the multiple block allocation: with the
knowledge of the total number of blocks ahead, set or adjust the reservation
window size properly (based on the number of blocks needed) before block
allocation happens: if there isn't any reservation yet, make sure the
reservation window equals to or greater than the number of blocks needed,
before create an reservation window; if a reservation window is already
exists, try to extends the window size to match the number of blocks to
allocate.  This could increase the possibility of completing multiple blocks
allocation in a single request, as blocks are only allocated in the range of
the inode's reservation window.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:01 -08:00
Mingming Cao faa569763a [PATCH] ext3_get_blocks: Adjust accounting info in ext3_new_blocks()
Update accounting information (quota, boundary checks, free blocks number etc)
in ext3_new_blocks().

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:01 -08:00
Mingming Cao b54e41ec17 [PATCH] ext3_get_blocks: support multiple blocks allocation in ext3_new_block()
Change ext3_try_to_allocate() (called via ext3_new_blocks()) to try to
allocate the requested number of blocks on a best effort basis: After
allocated the first block, it will always attempt to allocate the next few(up
to the requested size and not beyond the reservation window) adjacent blocks
at the same time.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:01 -08:00
Mingming Cao b47b24781c [PATCH] ext3_get_blocks: multiple block allocation
Add support for multiple block allocation in ext3-get-blocks().

Look up the disk block mapping and count the total number of blocks to
allocate, then pass it to ext3_new_block(), where the real block allocation is
performed.  Once multiple blocks are allocated, prepare the branch with those
just allocated blocks info and finally splice the whole branch into the block
mapping tree.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:01 -08:00
Mingming Cao 89747d369d [PATCH] ext3_get_blocks: Mapping multiple blocks at a once
Currently ext3_get_block() only maps or allocates one block at a time.  This
is quite inefficient for sequential IO workload.

I have posted a early implements a simply multiple block map and allocation
with current ext3.  The basic idea is allocating the 1st block in the existing
way, and attempting to allocate the next adjacent blocks on a best effort
basis.  More description about the implementation could be found here:
http://marc.theaimsgroup.com/?l=ext2-devel&m=112162230003522&w=2

The following the latest version of the patch: break the original patch into 5
patches, re-worked some logicals, and fixed some bugs.  The break ups are:

 [patch 1] Adding map multiple blocks at a time in ext3_get_blocks()
 [patch 2] Extend ext3_get_blocks() to support multiple block allocation
 [patch 3] Implement multiple block allocation in ext3-try-to-allocate
 (called via ext3_new_block()).
 [patch 4] Proper accounting updates in ext3_new_blocks()
 [patch 5] Adjust reservation window size properly (by the given number
 of blocks to allocate) before block allocation to increase the
 possibility of allocating multiple blocks in a single call.

Tests done so far includes fsx,tiobench and dbench.  The following numbers
collected from Direct IO tests (1G file creation/read) shows the system time
have been greatly reduced (more than 50% on my 8 cpu system) with the patches.

 1G file DIO write:
 	2.6.15		2.6.15+patches
 real    0m31.275s	0m31.161s
 user    0m0.000s	0m0.000s
 sys     0m3.384s	0m0.564s

 1G file DIO read:
 	2.6.15		2.6.15+patches
 real    0m30.733s	0m30.624s
 user    0m0.000s	0m0.004s
 sys     0m0.748s	0m0.380s

Some previous test we did on buffered IO with using multiple blocks allocation
and delayed allocation shows noticeable improvement on throughput and system
time.

This patch:

Add support of mapping multiple blocks in one call.

This is useful for DIO reads and re-writes (where blocks are already
allocated), also is in line with Christoph's proposal of using getblocks() in
mpage_readpage() or mpage_readpages().

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:00 -08:00
Andrew Morton 5515eff811 [PATCH] 2tb-files-add-blkcnt_t-fixes
Cc: Takashi Sato <sho@tnes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:00 -08:00
Matthew Dobson 93d2341c75 [PATCH] mempool: use mempool_create_slab_pool()
Modify well over a dozen mempool users to call mempool_create_slab_pool()
rather than calling mempool_create() with extra arguments, saving about 30
lines of code and increasing readability.

Signed-off-by: Matthew Dobson <colpatch@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:00 -08:00
Matthew Dobson 0eaae62aba [PATCH] mempool: use common mempool kmalloc allocator
This patch changes several mempool users, all of which are basically just
wrappers around kmalloc(), to use the common mempool_kmalloc/kfree, rather
than their own wrapper function, removing a bunch of duplicated code.

Signed-off-by: Matthew Dobson <colpatch@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:59 -08:00
Andy Adamson eb76b3fda1 [PATCH] NFSD4: return conflict lock without races
Update the NFSv4 server to use the new posix_lock_file_conf() interface.
Remove unnecessary (and race-prone) posix_test_file() calls.

Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:56 -08:00
Andy Adamson 5842add2f3 [PATCH] VFS,fs/locks.c,NFSD4: add race_free posix_lock_file_conf() interface
Lockd and the NFSv4 server both exercise a race condition where
posix_test_lock() is called either before or after posix_lock_file() to
deal with a denied lock request due to a conflicting lock.

Remove the race condition for the NFSv4 server by adding a new conflicting
lock parameter to __posix_lock_file() , changing the name to
__posix_lock_file_conf().

Keep posix_lock_file() interface, add posix_lock_conf() interface, both
call __posix_lock_file_conf().

[akpm@osdl.org: Put the EXPORT_SYMBOL() where it belongs]
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:56 -08:00
J. Bruce Fields 6dc0fe8f8b [PATCH] VFS,fs/locks.c: cleanup locks_insert_block
BUG instead of handling a case that should never happen.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:56 -08:00
Eric Dumazet fa3536cc14 [PATCH] Use __read_mostly on some hot fs variables
I discovered on oprofile hunting on a SMP platform that dentry lookups were
slowed down because d_hash_mask, d_hash_shift and dentry_hashtable were in
a cache line that contained inodes_stat.  So each time inodes_stats is
changed by a cpu, other cpus have to refill their cache line.

This patch moves some variables to the __read_mostly section, in order to
avoid false sharing.  RCU dentry lookups can go full speed.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:56 -08:00
NeilBrown 2ff28e22bd [PATCH] Make address_space_operations->invalidatepage return void
The return value of this function is never used, so let's be honest and
declare it as void.

Some places where invalidatepage returned 0, I have inserted comments
suggesting a BUG_ON.

[akpm@osdl.org: JBD BUG fix]
[akpm@osdl.org: rework for git-nfs]
[akpm@osdl.org: don't go BUG in block_invalidate_page()]
Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:55 -08:00
NeilBrown 3978d7179d [PATCH] Make address_space_operations->sync_page return void
The only user ignores the return value, and the only instanace
(block_sync_page) always returns 0...

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:55 -08:00
Ingo Molnar 353ab6e97b [PATCH] sem2mutex: fs/
Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Eric Van Hensbergen <ericvh@ericvh.myip.org>
Cc: Robert Love <rml@tech9.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:55 -08:00
Steven Rostedt 64a07bd82e [PATCH] protect remove_proc_entry
It has been discovered that the remove_proc_entry has a race in the removing
of entries in the proc file system that are siblings.  There's no protection
around the traversing and removing of elements that belong in the same
subdirectory.

This subdirectory list is protected in other areas by the BKL.  So the BKL was
at first used to protect this area too, but unfortunately, remove_proc_entry
may be called with spinlocks held.  The BKL may schedule, so this was not a
solution.

The final solution was to add a new global spin lock to protect this list,
called proc_subdir_lock.  This lock now protects the list in
remove_proc_entry, and I also went around looking for other areas that this
list is modified and added this protection there too.  Care must be taken
since these locations call several functions that may also schedule.

Since I don't see any location that these functions that modify the
subdirectory list are called by interrupts, the irqsave/restore versions of
the spin lock was _not_ used.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:53 -08:00
Eric Sesterhenn 309be53da6 BUG_ON() Conversion in fs/ext2/
this changes if() BUG(); constructs to BUG_ON() which is
cleaner, contains unlikely() and can better optimized away.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-03-26 18:27:41 +02:00
Eric Sesterhenn 4d4ef9abe3 BUG_ON() Conversion in fs/hfs/
this changes if() BUG(); constructs to BUG_ON() which is
cleaner, contains unlikely() and can better optimized away.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-03-26 18:26:51 +02:00
Eric Sesterhenn 28133c7b2b BUG_ON() Conversion in fs/dcache.c
this changes if() BUG(); constructs to BUG_ON() which is
cleaner and can better optimized away

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-03-26 18:25:39 +02:00
Eric Sesterhenn e827f92355 BUG_ON() Conversion in fs/buffer.c
this changes if() BUG(); constructs to BUG_ON() which is
cleaner and can better optimized away

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-03-26 18:24:46 +02:00
Linus Torvalds 1b9a391736 Merge branch 'audit.b3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current
* 'audit.b3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: (22 commits)
  [PATCH] fix audit_init failure path
  [PATCH] EXPORT_SYMBOL patch for audit_log, audit_log_start, audit_log_end and audit_format
  [PATCH] sem2mutex: audit_netlink_sem
  [PATCH] simplify audit_free() locking
  [PATCH] Fix audit operators
  [PATCH] promiscuous mode
  [PATCH] Add tty to syscall audit records
  [PATCH] add/remove rule update
  [PATCH] audit string fields interface + consumer
  [PATCH] SE Linux audit events
  [PATCH] Minor cosmetic cleanups to the code moved into auditfilter.c
  [PATCH] Fix audit record filtering with !CONFIG_AUDITSYSCALL
  [PATCH] Fix IA64 success/failure indication in syscall auditing.
  [PATCH] Miscellaneous bug and warning fixes
  [PATCH] Capture selinux subject/object context information.
  [PATCH] Exclude messages by message type
  [PATCH] Collect more inode information during syscall processing.
  [PATCH] Pass dentry, not just name, in fsnotify creation hooks.
  [PATCH] Define new range of userspace messages.
  [PATCH] Filter rule comparators
  ...

Fixed trivial conflict in security/selinux/hooks.c
2006-03-25 09:24:53 -08:00
Linus Torvalds 53846a21c1 Merge git://git.linux-nfs.org/pub/linux/nfs-2.6
* git://git.linux-nfs.org/pub/linux/nfs-2.6: (103 commits)
  SUNRPC,RPCSEC_GSS: spkm3--fix config dependencies
  SUNRPC,RPCSEC_GSS: spkm3: import contexts using NID_cast5_cbc
  LOCKD: Make nlmsvc_traverse_shares return void
  LOCKD: nlmsvc_traverse_blocks return is unused
  SUNRPC,RPCSEC_GSS: fix krb5 sequence numbers.
  NFSv4: Dont list system.nfs4_acl for filesystems that don't support it.
  SUNRPC,RPCSEC_GSS: remove unnecessary kmalloc of a checksum
  SUNRPC: Ensure rpc_call_async() always calls tk_ops->rpc_release()
  SUNRPC: Fix memory barriers for req->rq_received
  NFS: Fix a race in nfs_sync_inode()
  NFS: Clean up nfs_flush_list()
  NFS: Fix a race with PG_private and nfs_release_page()
  NFSv4: Ensure the callback daemon flushes signals
  SUNRPC: Fix a 'Busy inodes' error in rpc_pipefs
  NFS, NLM: Allow blocking locks to respect signals
  NFS: Make nfs_fhget() return appropriate error values
  NFSv4: Fix an oops in nfs4_fill_super
  lockd: blocks should hold a reference to the nlm_file
  NFSv4: SETCLIENTID_CONFIRM should handle NFS4ERR_DELAY/NFS4ERR_RESOURCE
  NFSv4: Send the delegation stateid for SETATTR calls
  ...
2006-03-25 09:18:27 -08:00
Andi Kleen 913bd90601 [PATCH] x86_64: Increase the variability of the process stack on 64bit architectures
8MB is not really very random, use 1GB (or more with larger page sizes)
instead.

Also use the low bits of the random generator output now instead of
throwing them away.

Only enabled on x86-64 right now. Other architectures need to add
a suitable STACK_RND_MASK

Cc: mingo@elte.hu
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 09:10:52 -08:00
Linus Torvalds 0c50527379 Merge branch 'upstream-linus' of git://oss.oracle.com/home/sourcebo/git/ocfs2
* 'upstream-linus' of git://oss.oracle.com/home/sourcebo/git/ocfs2:
  ocfs2: finally remove MLF* macros
  ocfs2: don't use MLF* in the file system
  ocfs2: don't use MLF* in dlm/ files
  ocfs2: don't use MLF* in cluster/ files
  [PATCH] ocfs2: dlm recovery fixes
  [PATCH] ocfs2: fix hang in dlm lock resource mastery
  ocfs2: use __attribute__ format
2006-03-25 08:50:56 -08:00
Linus Torvalds 1e8c573933 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (21 commits)
  BUG_ON() Conversion in drivers/video/
  BUG_ON() Conversion in drivers/parisc/
  BUG_ON() Conversion in drivers/block/
  BUG_ON() Conversion in sound/sparc/cs4231.c
  BUG_ON() Conversion in drivers/s390/block/dasd.c
  BUG_ON() Conversion in lib/swiotlb.c
  BUG_ON() Conversion in kernel/cpu.c
  BUG_ON() Conversion in ipc/msg.c
  BUG_ON() Conversion in block/elevator.c
  BUG_ON() Conversion in fs/coda/
  BUG_ON() Conversion in fs/binfmt_elf_fdpic.c
  BUG_ON() Conversion in input/serio/hil_mlc.c
  BUG_ON() Conversion in md/dm-hw-handler.c
  BUG_ON() Conversion in md/bitmap.c
  The comment describing how MS_ASYNC works in msync.c is confusing
  rcu: undeclared variable used in documentation
  fix typos "wich" -> "which"
  typo patch for fs/ufs/super.c
  Fix simple typos
  tabify drivers/char/Makefile
  ...
2006-03-25 08:41:09 -08:00
Luke Yang 1ad3dcc09c [PATCH] flat binary loader doesn't check fd table full
In binfmt_flat.c, the flat binary loader should check file descriptor table
and install the fd on the file.

Convert the function to single-exit and fix this bug.

Signed-off-by: "Luke Yang" <luke.adi@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:23:01 -08:00
Carsten Otte 6cc6b1226b [PATCH] remove needless check in fs/read_write.c
nr_segs is unsigned long and thus cannot be negative.  We checked against 0
few lines before.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:23:01 -08:00
Carsten Otte 5514854812 [PATCH] remove needless check in binfmt_elf.c
Local variable i is unsigned int and thus cannot be negative.

(akpm: unsigneds shouldn't be called `i'.  This value cannot possibly be
negative anyway).

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:23:01 -08:00
Chen, Kenneth W 174e27c607 [PATCH] direct-io: bug fix in dio handling write error
There is a bug in direct-io on propagating write error up to the higher I/O
layer.  When performing an async ODIRECT write to a block device, if a
device error occurred (like media error or disk is pulled), the error code
is only propagated from device driver to the DIO layer.  The error code
stops at finished_one_bio().  The aysnc write, however, is supposedly have
a corresponding AIO event with appropriate return code (in this case -EIO).
 Application which waits on the async write event, will hang forever since
such AIO event is lost forever (if such app did not use the timeout option
in io_getevents call.  Regardless, an AIO event is lost).

The discovery of above bug leads to another discovery of potential race
window with dio->result.  The fundamental problem is that dio->result is
overloaded with dual use: an indicator of fall back path for partial dio
write, and an error indicator used in the I/O completion path.  In the
event of device error, the setting of -EIO to dio->result clashes with
value used to track partial write that activates the fall back path.

It was also pointed out that it is impossible to use dio->result to track
partial write and at the same time to track error returned from device
driver.  Because direct_io_work can only determines whether it is a partial
write at the end of io submission and in mid stream of those io submission,
a return code could be coming back from the driver.  Thus messing up all
the subsequent logic.

Proposed fix is to separating out error code returned by the IO completion
path from partial IO submit tracking.  A new variable is added to dio
structure specifically to track io error returned in the completion path.

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Acked-by: Zach Brown <zach.brown@oracle.com>
Acked-by: Suparna Bhattacharya <suparna@in.ibm.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:23:00 -08:00
Phillip Susi 0e6b3e5e97 [PATCH] udf: fix uid/gid options and add uid/gid=ignore and forget options
As Pekka Enberg pointed out, with the if still following the else, you can
still get a null uid written to the disk if you specify a default uid= without
uid=forget.  In other words, if the desktop user is uid=1000 and the mount
option uid=1000 is given ( which is done on ubuntu automatically and probably
other distributions that use hal ), then if any other user besides uid 1000
owns a file then a 0 will be written to the media as the owning uid instead.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:23:00 -08:00
Oliver Neukum 11b0b5abb2 [PATCH] use kzalloc and kcalloc in core fs code
Signed-off-by: Oliver Neukum <oliver@neukum.name>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:23:00 -08:00
Peter Staubach 57070d012c [PATCH] compat_sys_nfsservctl(): handle errors correctly
Correct some error handling on the compat version of the nfsservctl()
system.  It was detecting errors while copying in the arguments from user
space, but then attempting to use the arguments anyway.  This didn't seem
so good.

Signed-off-by: Peter Staubach <staubach@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:59 -08:00
Rob Landley 76c67de460 [PATCH] Ext2 flags shouldn't report "nogrpid"
If I mount ext2 "rw", I want it to say "rw", not "rw,nogrpid".

I caught this writing an automated regression test script for the busybox
mount command.  The symptom is
  /dev/loop0 on /images/ext2.dir type ext2 (rw,nogrpid)
instead of:
  /dev/loop0 on /images/ext2.dir type ext2 (rw)

The behavior was introduced by git commit
8fc2751beb.

Signed-off-by: Rob Landley <rob@landley.net>
Cc: Mark Bellon <mbellon@mvista.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:58 -08:00
NeilBrown 7e53cac41d [PATCH] Honour AOP_TRUNCATE_PAGE returns in page_symlink
As prepare_write, commit_write and readpage are allowed to return
AOP_TRUNCATE_PAGE, page_symlink should respond to them.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:58 -08:00
Benoit Boissinot d5ee4ea833 [PATCH] indirect_print_item() warning fix
fs/reiserfs/item_ops.c: In function 'indirect_print_item':
fs/reiserfs/item_ops.c:278: warning: 'num' may be used uninitialized in this function

(akpm: this is probably just gcc being dumb)

Signed-off-by: Benoit Boissinot <benoit.boissinot@ens-lyon.fr>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:57 -08:00
Adrian Bunk 3cdc409c16 [PATCH] reiserfs/xattr_acl.c:reiserfs_get_acl(): make size an int
The Coverity checker wasn't happy seeing a size_t compared with -ENODATA
and -ENOSYS.

Since the only place where size is set is through the result of
reiserfs_xattr_get() which is an int, we could simply make size an int.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <mason@suse.com>
Cc: Hans Reiser <reiser@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:57 -08:00
Kirk True c1f5a19446 [PATCH] ext3: Fix debug logging-only compilation error
When EXT3FS_DEBUG is #define-d, the compile breaks due to #include file
issues.

Signed-off-by: Kirk True <kernel@kirkandsheila.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:56 -08:00
Kirill Korotaev 2ab1346085 [PATCH] Reduce sched latency in shrink_dcache_sb()
This patch reduces scheduling latency in shrink_dcache_sb() noticed during
remounting of big partitions with many cached dentries.  The same latency
fix was applied to select_parent() long ago.

Signed-off-by: Denis Lunev <den@sw.ru>
Signed-off-by: Pavel Emelianov <xemul@sw.ru>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:56 -08:00
OGAWA Hirofumi 4ffc844425 [PATCH] Move cond_resched() after iput() in sync_sb_inodes()
In here, I think the following order is more cache-friendly.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:56 -08:00
OGAWA Hirofumi d25b9a1ff0 [PATCH] freeze_bdev() cleanup
freeze_bdev() uses a fsync_super() without sync_blockdev().  This patch
makes __fsync_super() and shares it.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:56 -08:00
Denis Vlasenko 11b8448751 [PATCH] fix messages in fs/minix
Believe it or not, but in fs/minix/*, the oldest filesystem in the kernel,
something still can be fixed:

	printk("new_inode: bit already set");

"\n" is missing!

While at it, I also removed periods from the end of error messages and made
capitalization uniform.  Also s/i-node/inode/, s/printk (/printk(/

Signed-ff-by: Denis Vlasenko <vda@ilport.com.ua>

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:56 -08:00
Davide Libenzi f348d70a32 [PATCH] POLLRDHUP/EPOLLRDHUP handling for half-closed devices notifications
Implement the half-closed devices notifiation, by adding a new POLLRDHUP
(and its alias EPOLLRDHUP) bit to the existing poll/select sets.  Since the
existing POLLHUP handling, that does not report correctly half-closed
devices, was feared to be changed, this implementation leaves the current
POLLHUP reporting unchanged and simply add a new bit that is set in the few
places where it makes sense.  The same thing was discussed and conceptually
agreed quite some time ago:

http://lkml.org/lkml/2003/7/12/116

Since this new event bit is added to the existing Linux poll infrastruture,
even the existing poll/select system calls will be able to use it.  As far
as the existing POLLHUP handling, the patch leaves it as is.  The
pollrdhup-2.6.16.rc5-0.10.diff defines the POLLRDHUP for all the existing
archs and sets the bit in the six relevant files.  The other attached diff
is the simple change required to sys/epoll.h to add the EPOLLRDHUP
definition.

There is "a stupid program" to test POLLRDHUP delivery here:

 http://www.xmailserver.org/pollrdhup-test.c

It tests poll(2), but since the delivery is same epoll(2) will work equally.

Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:56 -08:00
Kirk True 58bf6a2db2 [PATCH] smbfs: Fix debug logging-only compilation error
When SMBFS_DEBUG_VERBOSE is #define-d, the compile breaks:

fs/smbfs/inode.c:217: error: aggregate value used where an integer was expected

This is a simple matter of using the .tv_sec attribute of struct time_spec.

Signed-off-by: Kirk True <kernel@kirkandsheila.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:55 -08:00
Eric Van Hensbergen 67543e508d [PATCH] 9p: fix name consistency problems
There were a number of conflicting naming schemes used in the v9fs project.
The directory was fs/9p, but MAINTAINERS and Documentation referred to
v9fs.  The module name itself was 9p2000, and the file system type was 9P.
This patch attempts to clean that up, changing all references to 9p in
order to match the directory name.  We'll also start using 9p instead of
v9fs as our patch prefix.

There is also a minor consistency cleanup in the options changing the name
option to uname in order to more closely match the Plan 9 options.

Signed-off-by: Eric Van Hensbergevan <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:54 -08:00
Eric Van Hensbergen 42e8c509cf [PATCH] v9fs: update license boilerplate
Update license boilerplate to specify GPLv2 and remove the (at your option
clause).  This change was agreed to by all the copyright holders (approvals
can be found on v9fs-developer mailing list).

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:54 -08:00
Eugene Teo c0291a05f8 [PATCH] v9fs: fix vfs_inode dereference before NULL check
__getname, which in turn will call kmem_cache_alloc, may return NULL.

Coverity bug #977

Signed-off-by: Eugene Teo <eugene.teo@eugeneteo.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:54 -08:00
Latchesar Ionkov 16cce6d27e [PATCH] v9fs: add extension field to Tcreate
Implement a new way of creating special files.  Instead of Tcreate+Twstat,
add one more field to Tcreate that contains special file description.

Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:54 -08:00
Latchesar Ionkov 5174fdab9f [PATCH] v9fs: print 9p messages
Print 9p messages.

Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Cc: Eric Van Hensbergen <ericvh@ericvh.myip.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:54 -08:00
Russ Cox 4a26c2429b [PATCH] v9fs: rename tids to tags to be consistent with Plan 9 documentation
The code talks about these things called tids, which I eventually figured
out are tags.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:54 -08:00
Russ Cox 27979bb2ff [PATCH] v9fs: consolidate trans_sock into trans_fd
Here is a new trans_fd.c that replaces the current trans_fd.c and
trans_sock.c.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:54 -08:00
Alexander Zarochentsev 5930860296 [PATCH] reiserfs: use balance_dirty_pages_ratelimited_nr in reiserfs_file_write()
Use the new balance_dirty_pages_ratelimited_nr in reiserfs "largeio" file
write.

Signed-off-by: Hans Reiser <reiser@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:53 -08:00
Vladimir V. Saveliev cd02b966bf [PATCH] reiserfs: cleanups
Clean up several places where gcc issues warnings when -W is specified.
Thanks to Neil for finding that.

Signed-off-by: Vladimir V. Saveliev <vs@namesys.com>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Hans Reiser <reiser@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:53 -08:00
Nick Piggin c32ccd87bf [PATCH] inotify: lock avoidance with parent watch status in dentry
Previous inotify work avoidance is good when inotify is completely unused,
but it breaks down if even a single watch is in place anywhere in the
system.  Robin Holt notices that udev is one such culprit - it slows down a
512-thread application on a 512 CPU system from 6 seconds to 22 minutes.

Solve this by adding a flag in the dentry that tells inotify whether or not
its parent inode has a watch on it.  Event queueing to parent will skip
taking locks if this flag is cleared.  Setting and clearing of this flag on
all child dentries versus event delivery: this is no in terms of race
cases, and that was shown to be equivalent to always performing the check.

The essential behaviour is that activity occuring _after_ a watch has been
added and _before_ it has been removed, will generate events.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Robert Love <rml@novell.com>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:53 -08:00
Oleg Drokin 9a56c21392 [PATCH] Add lookup_instantiate_filp usage warning
I think it would be nice to put an usage warning in header of
lookup_instantiate_filp() to indicate it is unsafe to use it on anything
but regular files (even that is potentially unsafe, but there your ->open()
is usually in your hands anyway), so that others won't fall into the same
trap I did.

Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:51 -08:00
Oleg Drokin b500531e6f [PATCH] Introduce FMODE_EXEC file flag
Introduce FMODE_EXEC file flag, to indicate that file is being opened for
execution.  This is useful for distributed filesystems to maintain
consistent behavior for returning ETXTBUSY when opening for write and
execution happens on different nodes.

akpm:

  Needed by Lustre at present.  I assume their objective to to work towards
  being able to install Lustre on an unmodified distro kernel, which seems
  sane.  It should have zero runtime cost.

  Trond and Chuck indicate that NFS4 can probably use this too, for the same
  thing.

  Steven says it's also on the GFS todo list.

Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Chuck Lever <cel@citi.umich.edu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:51 -08:00
Jeff Mahoney 619d5d8a2b [PATCH] reiserfs: reiserfs_file_write() will lose error code when a 0-length write occurs w/ O_SYNC
When an error occurs in reiserfs_file_write before any data is written, and
O_SYNC is set, the return code of generic_osync_write will overwrite the
error code, losing it.

This patch ensures that generic_osync_inode() doesn't run under an error
condition, losing the error.  This duplicates the logic from
generic_file_buffered_write().

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <mason@suse.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Alexander Zarochentsev <zam@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:51 -08:00
Alexander Zarochentsev a44c94a7b8 [PATCH] reiserfs: handle trans_id overflow
Reiserfs does not handle transaction ID overflow correctly.  Transaction ID
== 0 causes reiserfs to crash.  The patch fixes all places where the
transaction ID is incremented.

Signed-off-by: Alexander Zarochentsev <zam@namesys.com>
Signed-off-by: Hans Reiser <reiser@namesys.com>
Cc: Chris Mason <mason@suse.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:51 -08:00
Alexander Zarochentzev 23f9e0f891 [PATCH] reiserfs: fix transaction overflowing
This patch fixes a bug in reiserfs truncate.  A transaction might overflow
when truncating long highly fragmented file.  The fix is to split
truncation into several transactions to avoid overflowing.

Signed-off-by: Vladimir V. Saveliev <vs@namesys.com>
Cc; Charles McColgan <cm@chuck.net>
Cc: Alexander Zarochentsev <zam@namesys.com>
Cc: Hans Reiser <reiser@namesys.com>
Cc: Chris Mason <mason@suse.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:51 -08:00
Adrian Bunk bdfc326614 [PATCH] fs/inode.c: make iprune_mutex static
There's no reason for iprune_mutex being global.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:51 -08:00
Oleg Drokin 4af4c52f34 [PATCH] Missed error checking for intent's filp in open_namei().
It seems there is error check missing in open_namei for errors returned
through intent.open.file (from lookup_instantiate_filp).

If there is plain open performed, then such a check done inside
__path_lookup_intent_open called from path_lookup_open(), but when the open
is performed with O_CREAT flag set, then __path_lookup_intent_open is only
called with LOOKUP_PARENT set where no file opening can occur yet.

Later on lookup_hash is called where exact opening might take place and
intent.open.file may be filled.  If it is filled with error value of some
sort, then we get kernel attempting to dereference this error value as
address (and corresponding oops) in nameidata_to_filp() called from
filp_open().

While this is relatively simple to workaround in ->lookup() method by just
checking lookup_instantiate_filp() return value and returning error as
needed, this is not so easy in ->d_revalidate(), where we can only return
"yes, dentry is valid" or "no, dentry is invalid, perform full lookup
again", and just returning 0 on error would cause extra lookup (with
potential extra costly RPCs).

So in short, I believe that there should be no difference in error handling
for opening a file and creating a file in open_namei() and propose this
simple patch as a solution.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:51 -08:00
Andrew Morton 8d8c85117f [PATCH] jbd: convert kjournald to kthread API
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:50 -08:00
Andrew Morton e3df18983e [PATCH] jbd: embed j_commit_timer in journal struct
The kjournald timer is currently on the kernel thread's stack and the journal
structure points at it.  Save a pointer hop by moving the timer into the
journal structure.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:50 -08:00
Al Viro 871751e25d [PATCH] slab: implement /proc/slab_allocators
Implement /proc/slab_allocators.   It produces output like:

idr_layer_cache: 80 idr_pre_get+0x33/0x4e
buffer_head: 2555 alloc_buffer_head+0x20/0x75
mm_struct: 9 mm_alloc+0x1e/0x42
mm_struct: 20 dup_mm+0x36/0x370
vm_area_struct: 384 dup_mm+0x18f/0x370
vm_area_struct: 151 do_mmap_pgoff+0x2e0/0x7c3
vm_area_struct: 1 split_vma+0x5a/0x10e
vm_area_struct: 11 do_brk+0x206/0x2e2
vm_area_struct: 2 copy_vma+0xda/0x142
vm_area_struct: 9 setup_arg_pages+0x99/0x214
fs_cache: 8 copy_fs_struct+0x21/0x133
fs_cache: 29 copy_process+0xf38/0x10e3
files_cache: 30 alloc_files+0x1b/0xcf
signal_cache: 81 copy_process+0xbaa/0x10e3
sighand_cache: 77 copy_process+0xe65/0x10e3
sighand_cache: 1 de_thread+0x4d/0x5f8
anon_vma: 241 anon_vma_prepare+0xd9/0xf3
size-2048: 1 add_sect_attrs+0x5f/0x145
size-2048: 2 journal_init_revoke+0x99/0x302
size-2048: 2 journal_init_revoke+0x137/0x302
size-2048: 2 journal_init_inode+0xf9/0x1c4

Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Alexander Nyberg <alexn@telia.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
DESC
slab-leaks3-locking-fix
EDESC
From: Andrew Morton <akpm@osdl.org>

Update for slab-remove-cachep-spinlock.patch

Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Alexander Nyberg <alexn@telia.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:49 -08:00
David Howells 214fda1f6e [PATCH] Optimise d_find_alias()
The attached patch optimises d_find_alias() to only take the spinlock if
there's anything in the the inode's alias list.  If there isn't, it returns
NULL immediately.

With respect to the superblock sharing patch, this should reduce by one the
number of times the dcache_lock is taken by nfs_lookup() for ordinary
directory lookups.

Only in the case where there's already a dentry for particular directory inode
(such as might happen when another mountpoint is rooted at that dentry) will
the lock then be taken the extra time.

Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:49 -08:00
Mark Fasheh ea8aa68d36 ocfs2: finally remove MLF* macros
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-03-24 14:58:29 -08:00
Mark Fasheh b0697053f9 ocfs2: don't use MLF* in the file system
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-03-24 14:58:28 -08:00
Kurt Hackel 29004858a7 ocfs2: don't use MLF* in dlm/ files
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-03-24 14:58:27 -08:00
Mark Fasheh 70bacbdbfa ocfs2: don't use MLF* in cluster/ files
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-03-24 14:58:26 -08:00
Kurt Hackel c03872f5f5 [PATCH] ocfs2: dlm recovery fixes
when starting lock mastery (excepting the recovery lock) wait on any nodes
needing recovery. fix one instance where lock resources were left attached
to the recovery list after recovery completed.  ensure that the node_down
code is run uniformly regardless of which node found the dead node first.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-03-24 14:58:25 -08:00
Kurt Hackel 9c6510a5bf [PATCH] ocfs2: fix hang in dlm lock resource mastery
fixes hangs in lock mastery related to refcounting on the mle structure

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-03-24 14:58:25 -08:00
Mark Fasheh a74e1f0e8a ocfs2: use __attribute__ format
Use the "format" attribute on ocfs2_error() and ocfs2_abort() so that the
compiler will warn when we get calls to those functions wrong.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-03-24 14:58:24 -08:00
Eric Sesterhenn c5d3237c24 BUG_ON() Conversion in fs/coda/
this changes if() BUG(); constructs to BUG_ON() which is
cleaner, contains unlikely() and can better optimized away.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-03-24 18:42:13 +01:00
Eric Sesterhenn 88bcd51262 BUG_ON() Conversion in fs/binfmt_elf_fdpic.c
this changes if() BUG(); constructs to BUG_ON() which is
cleaner and can better optimized away

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-03-24 18:38:48 +01:00
Uwe Zeisberger c30fe7f731 fix typos "wich" -> "which"
Signed-off-by: Uwe Zeisberger <zeisberg@informatik.uni-freiburg.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-03-24 18:23:14 +01:00
Michael Owen c690a72253 typo patch for fs/ufs/super.c
Quick and simple typo fix. neTXstep -> neXTstep

Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-03-24 18:21:44 +01:00
Glauber de Oliveira Costa b5a7c4f583 [PATCH] ext3: Properly report backup block present in a group
In filesystems with the meta block group flag on, ext3_bg_num_gdb() fails
to report the correct number of blocks used to store the group descriptor
backups in a given group.  It happens because meta_bg follows a different
logic from the original ext3 backup placement in groups multiples of 3, 5
and 7.

Signed-off-by: Glauber de Oliveira Costa <glommer@br.ibm.com>
Cc: Andreas Dilger <adilger@clusterfs.com>
Cc: "Stephen C. Tweedie" <sct@redhat.com>
Cc: Alex Tomas <alex@clusterfs.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-24 07:33:30 -08:00
Kyle McMartin 82d821ddca [PATCH] Conditionalize compat_sys_newfstatat
If we don't want sys_newfstatat because __ARCH_WANT_STAT64 is defined, then
we certainly don't want compat_sys_newfstatat either.

Signed-off-by: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-24 07:33:27 -08:00
Andrew Morton 18e79b40ed [PATCH] fsync: extract internal code
Pull the guts out of do_fsync() - we can use it elsewhere.

Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-24 07:33:27 -08:00
Andrew Morton 4741c9fd36 [PATCH] set_page_dirty() return value fixes
We need set_page_dirty() to return true if it actually transitioned the page
from a clean to dirty state.  This wasn't right in a couple of places.  Do a
kernel-wide audit, fix things up.

This leaves open the possibility of returning a negative errno from
set_page_dirty() sometime in the future.  But we don't do that at present.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-24 07:33:26 -08:00
Eric Dumazet 8a14342683 [PATCH] HOTPLUG_CPU: avoid hitting too many cachelines in recalc_bh_state()
Instead of using for_each_cpu(i), we can use for_each_online_cpu(i).

When a CPU goes offline (ie removed from online map), it might have a non
null bh_accounting.nr, so this patch adds a transfer of this counter to an
online CPU counter.

We already have a hotcpu_notifier, (function buffer_cpu_notify()), where we
can do this bh_accounting.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-24 07:33:26 -08:00
Coywolf Qi Hunt 38885bd4c2 [PATCH] sb_set_blocksize cleanup
sb_set_blocksize() cleanup: make sb_set_blocksize() use blksize_bits().

Signed-off-by: Coywolf Qi Hunt <qiyong@fc-cn.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-24 07:33:25 -08:00
Alex Tomas 09fe316a7b [PATCH] fast ext3_statfs
Under I/O load it may take up to a dozen seconds to read all group
descriptors.  This is what ext3_statfs() does.  At the same time, we already
maintain global numbers of free inodes/blocks.  Why don't we use them instead
of group reading and summing?

Cc: Ravikiran G Thirumalai <kiran@scalex86.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-24 07:33:25 -08:00
Pekka Enberg 5b3cf3e0f0 [PATCH] isofs: remove unused debugging macros
Remove unused debugging macros from isofs.  The referred debug functions do
not exist in the kernel.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-24 07:33:24 -08:00
Alexey Dobriyan 53b3531bbb [PATCH] s/;;/;/g
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-24 07:33:24 -08:00
Paul Jackson b0196009d8 [PATCH] cpuset memory spread slab cache hooks
Change the kmem_cache_create calls for certain slab caches to support cpuset
memory spreading.

See the previous patches, cpuset_mem_spread, for an explanation of cpuset
memory spreading, and cpuset_mem_spread_slab_cache for the slab cache support
for memory spreading.

The slab caches marked for now are: dentry_cache, inode_cache, some xfs slab
caches, and buffer_head.  This list may change over time.  In particular,
other file system types that are used extensively on large NUMA systems may
want to allow for spreading their directory and inode slab cache entries.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-24 07:33:23 -08:00
Paul Jackson fffb60f93c [PATCH] cpuset memory spread: slab cache format
Rewrap the overly long source code lines resulting from the previous
patch's addition of the slab cache flag SLAB_MEM_SPREAD.  This patch
contains only formatting changes, and no function change.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-24 07:33:23 -08:00
Paul Jackson 4b6a9316fa [PATCH] cpuset memory spread: slab cache filesystems
Mark file system inode and similar slab caches subject to SLAB_MEM_SPREAD
memory spreading.

If a slab cache is marked SLAB_MEM_SPREAD, then anytime that a task that's
in a cpuset with the 'memory_spread_slab' option enabled goes to allocate
from such a slab cache, the allocations are spread evenly over all the
memory nodes (task->mems_allowed) allowed to that task, instead of favoring
allocation on the node local to the current cpu.

The following inode and similar caches are marked SLAB_MEM_SPREAD:

    file                               cache
    ====                               =====
    fs/adfs/super.c                    adfs_inode_cache
    fs/affs/super.c                    affs_inode_cache
    fs/befs/linuxvfs.c                 befs_inode_cache
    fs/bfs/inode.c                     bfs_inode_cache
    fs/block_dev.c                     bdev_cache
    fs/cifs/cifsfs.c                   cifs_inode_cache
    fs/coda/inode.c                    coda_inode_cache
    fs/dquot.c                         dquot
    fs/efs/super.c                     efs_inode_cache
    fs/ext2/super.c                    ext2_inode_cache
    fs/ext2/xattr.c (fs/mbcache.c)     ext2_xattr
    fs/ext3/super.c                    ext3_inode_cache
    fs/ext3/xattr.c (fs/mbcache.c)     ext3_xattr
    fs/fat/cache.c                     fat_cache
    fs/fat/inode.c                     fat_inode_cache
    fs/freevxfs/vxfs_super.c           vxfs_inode
    fs/hpfs/super.c                    hpfs_inode_cache
    fs/isofs/inode.c                   isofs_inode_cache
    fs/jffs/inode-v23.c                jffs_fm
    fs/jffs2/super.c                   jffs2_i
    fs/jfs/super.c                     jfs_ip
    fs/minix/inode.c                   minix_inode_cache
    fs/ncpfs/inode.c                   ncp_inode_cache
    fs/nfs/direct.c                    nfs_direct_cache
    fs/nfs/inode.c                     nfs_inode_cache
    fs/ntfs/super.c                    ntfs_big_inode_cache_name
    fs/ntfs/super.c                    ntfs_inode_cache
    fs/ocfs2/dlm/dlmfs.c               dlmfs_inode_cache
    fs/ocfs2/super.c                   ocfs2_inode_cache
    fs/proc/inode.c                    proc_inode_cache
    fs/qnx4/inode.c                    qnx4_inode_cache
    fs/reiserfs/super.c                reiser_inode_cache
    fs/romfs/inode.c                   romfs_inode_cache
    fs/smbfs/inode.c                   smb_inode_cache
    fs/sysv/inode.c                    sysv_inode_cache
    fs/udf/super.c                     udf_inode_cache
    fs/ufs/super.c                     ufs_inode_cache
    net/socket.c                       sock_inode_cache
    net/sunrpc/rpc_pipe.c              rpc_inode_cache

The choice of which slab caches to so mark was quite simple.  I marked
those already marked SLAB_RECLAIM_ACCOUNT, except for fs/xfs, dentry_cache,
inode_cache, and buffer_head, which were marked in a previous patch.  Even
though SLAB_RECLAIM_ACCOUNT is for a different purpose, it marks the same
potentially large file system i/o related slab caches as we need for memory
spreading.

Given that the rule now becomes "wherever you would have used a
SLAB_RECLAIM_ACCOUNT slab cache flag before (usually the inode cache), use
the SLAB_MEM_SPREAD flag too", this should be easy enough to maintain.
Future file system writers will just copy one of the existing file system
slab cache setups and tend to get it right without thinking.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-24 07:33:23 -08:00
Adrian Bunk c98d8cfbc6 [PATCH] fs/coda/: proper prototypes
Introduce a file fs/coda/coda_int.h with proper prototypes for some code.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Jan Harkes <jaharkes@cs.cmu.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-24 07:33:21 -08:00
Adrian Bunk 2c2212901f [PATCH] fs/ext2/: proper ext2_get_parent() prototype
Add a proper prototype for ext2_get_parent().

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-24 07:33:21 -08:00
Adrian Bunk 29c6e48601 [PATCH] fs/9p/: possible cleanups
- mux.c: v9fs_poll_mux() was inline but not static resuling in needless
         object size bloat
- mux.c: remove all "inline"s: gcc should know best what to inline
- #if 0 the following unused global functions:
  - 9p.c: v9fs_v9fs_t_flush()
  - conv.c: v9fs_create_tauth()
  - mux.c: v9fs_mux_rpcnb()

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Eric Van Hensbergen <ericvh@ericvh.myip.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-24 07:33:21 -08:00
Tobias Klauser e8c96f8c29 [PATCH] fs: Use ARRAY_SIZE macro
Use ARRAY_SIZE macro instead of sizeof(x)/sizeof(x[0]) and remove a
duplicate of ARRAY_SIZE.  Some trailing whitespaces are also deleted.

Signed-off-by: Tobias Klauser <tklauser@nuerscht.ch>
Cc: David Howells <dhowells@redhat.com>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Cc: Chris Mason <mason@suse.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Nathan Scott <nathans@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-24 07:33:19 -08:00
Horst Hummel ef1597d527 [PATCH] s390: Remove old history/whitespave from partition code
Remove obsolete history and trailing whitespace.

Signed-off-by: Horst Hummel <horst.hummel@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-24 07:33:17 -08:00
Theodore Ts'o 9b04c997b1 [PATCH] vfs: MS_VERBOSE should be MS_SILENT
The meaning of MS_VERBOSE is backwards; if the bit is set, it really means,
"don't be verbose".  This is confusing and counter-intuitive.

In addition, there is also no way to set the MS_VERBOSE flag in the
mount(8) program in util-linux, but interesting, it does define options
which would do the right thing if MS_SILENT were defined, which
unfortunately we do not:

#ifdef MS_SILENT
  { "quiet",    0, 0, MS_SILENT    },   /* be quiet  */
  { "loud",     0, 1, MS_SILENT    },   /* print out messages. */
#endif

So the obvious fix is to deprecate the use of MS_VERBOSE and replace it
with MS_SILENT.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-24 07:33:15 -08:00
Trond Myklebust 1ebbe2b200 Merge branch 'linus' 2006-03-23 23:44:19 -05:00
Linus Torvalds a1a051b187 Merge git://git.kernel.org/pub/scm/linux/kernel/git/aia21/ntfs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/aia21/ntfs-2.6:
  NTFS: 2.1.27 - Various bug fixes and cleanups.
  NTFS: Semaphore to mutex conversion.
  NTFS: Handle the recently introduced -ENAMETOOLONG return value from
  NTFS: Add a missing call to flush_dcache_mft_record_page() in
  NTFS: Fix a bug in fs/ntfs/inode.c::ntfs_read_locked_index_inode() where we
  NTFS: Improve comments on file attribute flags in fs/ntfs/layout.h.
  NTFS: Limit name length in fs/ntfs/unistr.c::ntfs_nlstoucs() to maximum
  NTFS: Remove all the make_bad_inode() calls.  This should only be called
  NTFS: Add support for sparse files which have a compression unit of 0.
  NTFS: Fix comparison of $MFT and $MFTMirr to not bail out when there are
  NTFS: Use buffer_migrate_page() for the ->migratepage function of all ntfs
  NTFS: Fix a buggette in an "should be impossible" case handling where we
  NTFS: Fix an (innocent) off-by-one error in the runlist code.
  NTFS: Fix two compiler warnings on Alpha.  Thanks to Andrew Morton for
2006-03-23 16:26:56 -08:00
Linus Torvalds cec6062037 Merge branch 'blktrace' of git://brick.kernel.dk/data/git/linux-2.6-block
* 'blktrace' of git://brick.kernel.dk/data/git/linux-2.6-block:
  [PATCH] Block queue IO tracing support (blktrace) as of 2006-03-23
  [PATCH] relay: consolidate sendfile() and read() code
  [PATCH] relay: add sendfile() support
  [PATCH] relay: migrate from relayfs to a generic relay API
2006-03-23 16:24:24 -08:00
Linus Torvalds debf798b1e Merge git://oss.sgi.com:8090/oss/git/xfs-2.6
* git://oss.sgi.com:8090/oss/git/xfs-2.6: (71 commits)
  [XFS] Sync up one/two other minor changes missed in previous merges.
  [XFS] Reenable the noikeep (delete inode cluster space) option by default.
  [XFS] Check that a page has dirty buffers before finding it acceptable for
  [XFS] Fixup naming inconsistencies found by Pekka Enberg and one from Jan
  [XFS] Explain the race closed by the addition of vn_iowait() to the start
  [XFS] Fixing the error caused by the conflict between DIO Write's
  [XFS] Fixing KDB's xrwtrc command, also added the current process id into
  [XFS] Fix compiler warning from xfs_file_compat_invis_ioctl prototype. 
  [XFS] remove bogus INT_GET for u8 variables in xfs_dir_leaf.c 
  [XFS] endianess annotations for xfs_da_node_hdr_t 
  [XFS] endianess annotations for xfs_da_node_entry_t 
  [XFS] store xfs_attr_inactive_list_t in native endian 
  [XFS] store xfs_attr_sf_sort in native endian 
  [XFS] endianess annotations for xfs_attr_shortform_t 
  [XFS] endianess annotations for xfs_attr_leaf_name_remote_t 
  [XFS] endianess annotations for xfs_attr_leaf_name_local_t 
  [XFS] endianess annotations for xfs_attr_leaf_entry_t 
  [XFS] endianess annotations for xfs_attr_leaf_hdr_t 
  [XFS] remove bogus INT_GET on u8 variables in xfs_dir2_block.c 
  [XFS] endianess annotations for xfs_da_blkinfo_t 
  ...
2006-03-23 15:28:51 -08:00
Jens Axboe 2056a782f8 [PATCH] Block queue IO tracing support (blktrace) as of 2006-03-23
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-23 20:00:26 +01:00
Jens Axboe b86ff981a8 [PATCH] relay: migrate from relayfs to a generic relay API
Original patch from Paul Mundt, sysfs parts removed by me since they
were broken.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-23 19:56:55 +01:00
Anton Altaparmakov 92fe7b9ea8 Merge branch 'master' of /usr/src/ntfs-2.6/ 2006-03-23 17:06:08 +00:00
Anton Altaparmakov e750d1c7cc NTFS: 2.1.27 - Various bug fixes and cleanups.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2006-03-23 17:04:12 +00:00
Ingo Molnar 4e5e529ad6 NTFS: Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2006-03-23 16:57:48 +00:00
Anton Altaparmakov 834ba600ce NTFS: Handle the recently introduced -ENAMETOOLONG return value from
fs/ntfs/unistr.c::ntfs_nlstoucs() in fs/ntfs/namei.c::ntfs_lookup().

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2006-03-23 16:25:23 +00:00
Anton Altaparmakov 20fdcf1d54 NTFS: Add a missing call to flush_dcache_mft_record_page() in
fs/ntfs/inode.c::ntfs_write_inode().

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2006-03-23 16:21:02 +00:00
Anton Altaparmakov a778f21732 NTFS: Fix a bug in fs/ntfs/inode.c::ntfs_read_locked_index_inode() where we
forgot to update a temporary variable so loading index inodes which
      have an index allocation attribute failed.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2006-03-23 16:18:23 +00:00
Anton Altaparmakov 2c2c8c1c21 NTFS: Improve comments on file attribute flags in fs/ntfs/layout.h.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2006-03-23 16:09:40 +00:00
Anton Altaparmakov d4faf636d6 NTFS: Limit name length in fs/ntfs/unistr.c::ntfs_nlstoucs() to maximum
allowed by NTFS, i.e. 255 Unicode characters, not including the
      terminating NULL (which is not stored on disk).

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2006-03-23 16:05:11 +00:00
Anton Altaparmakov f95c4018fd NTFS: Remove all the make_bad_inode() calls. This should only be called
from read inode and new inode code paths.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2006-03-23 15:59:32 +00:00
Anton Altaparmakov a0646a1f04 NTFS: Add support for sparse files which have a compression unit of 0.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2006-03-23 15:53:03 +00:00
Benjamin LaHaise b0e6e96299 [PATCH] reduce size of bio mempools
The biovec default mempool limit of 256 entries results in over 3MB of RAM
being permanently pinned, even on systems with only 128MB of RAM.  Since
mempool tries to allocate from the system pool first, it makes sense to
reduce the size of the mempool fallbacks to a more reasonable limit of 1-5
entries -- enough for the system to be able to make progress even under
load.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Acked-by: Jens Axboe <axboe@suse.de>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:18 -08:00
Andrew Morton 394e3902c5 [PATCH] more for_each_cpu() conversions
When we stop allocating percpu memory for not-possible CPUs we must not touch
the percpu data for not-possible CPUs at all.  The correct way of doing this
is to test cpu_possible() or to use for_each_cpu().

This patch is a kernel-wide sweep of all instances of NR_CPUS.  I found very
few instances of this bug, if any.  But the patch converts lots of open-coded
test to use the preferred helper macros.

Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Acked-by: Kyle McMartin <kyle@parisc-linux.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Christian Zankel <chris@zankel.net>
Cc: Philippe Elie <phil.el@wanadoo.fr>
Cc: Nathan Scott <nathans@sgi.com>
Cc: Jens Axboe <axboe@suse.de>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:17 -08:00
Benjamin LaHaise 5a6b7951bf [PATCH] get_empty_filp tweaks, inline epoll_init_file()
Eliminate a handful of cache references by keeping current in a register
instead of reloading (helps x86) and avoiding the overhead of a function
call.  Inlining eventpoll_init_file() saves 24 bytes.  Also reorder file
initialization to make writes occur more sequentially.

Signed-off-by: Benjamin LaHaise <bcrl@linux.intel.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:17 -08:00
Alexey Dobriyan 713729e8b9 [PATCH] fs/*/file.c: drop insane header dependencies
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:17 -08:00
Domen Puncer 7a673c6b8f [PATCH] devpts: use lib/parser.c for parsing mount options
Item from "2.6 should fix" list.

Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:17 -08:00
Alexey Dobriyan 3257545e40 [PATCH] ufs: switch to inode_inc_count, inode_dec_count
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:16 -08:00
Alexey Dobriyan a513b035ea [PATCH] ext2: switch to inode_inc_count, inode_dec_count
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:16 -08:00
Alexey Dobriyan 4e907c3d45 [PATCH] sysv: switch to inode_inc_count, inode_dec_count
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:15 -08:00
Alexey Dobriyan 78ec7b6917 [PATCH] minix: switch to inode_inc_link_count, inode_dec_link_count
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:15 -08:00
Alexey Dobriyan a7ccf00718 [PATCH] fs/ufs/file.c: drop insane header dependencies
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:15 -08:00
Arjan van de Ven 6b9438e132 [PATCH] fat_lock is used as a mutex, convert it to using the new mutex primitive
The fat code uses the fat_lock always in a mutex way (taking and releasing
the lock in the same function), the patch below converts it into the new
mutex primitive.  Please consider this patch for the code.

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:15 -08:00
Ingo Molnar 1e7933defd [PATCH] sem2mutex: UDF
Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:14 -08:00
Ingo Molnar 8e3f90459b [PATCH] sem2mutex: NCPFS
Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:14 -08:00
Arjan van de Ven 9746151861 [PATCH] convert ext3's truncate_sem to a mutex
ext3's truncate_sem is always released in the same function it's taken
and it otherwise is a mutex as well..

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:14 -08:00
Ingo Molnar 7bf6d78dd9 [PATCH] sem2mutex: HPFS
Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:13 -08:00
Ingo Molnar 1d5599e397 [PATCH] sem2mutex: autofs4 wq_sem
Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:13 -08:00
Ingo Molnar 1eb0d67007 [PATCH] sem2mutex: JFFS
Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:13 -08:00
Ingo Molnar 0ac1759abc [PATCH] sem2mutex: fs/seq_file.c
Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:12 -08:00
Ingo Molnar 7cf34c761d [PATCH] sem2mutex: fs/libfs.c
Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:12 -08:00
Arjan van de Ven 2c68ee754c [PATCH] sem2mutex: jbd, j_checkpoint_mutex
Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:12 -08:00
Ingo Molnar f24075bd0c [PATCH] sem2mutex: iprune
Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:12 -08:00
Arjan van de Ven a11f3a0574 [PATCH] sem2mutex: vfs_rename_mutex
Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:12 -08:00
Arjan van de Ven 144efe3e3e [PATCH] sem2mutex: eventpoll
Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:12 -08:00
Ingo Molnar d4f9af9dac [PATCH] sem2mutex: inotify
Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Acked-by: Robert Love <rml@novell.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:11 -08:00
Ingo Molnar d3be915fc5 [PATCH] sem2mutex: quota
Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:11 -08:00
Arjan van de Ven c039e3134a [PATCH] sem2mutex: blockdev #2
Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:11 -08:00
Ingo Molnar 4f7a07b887 [PATCH] convert fs/9p/ to mutexes, fix locking bugs
Convert fs/9p/mux.c from semaphore to mutex.

NOTE: fixed locking bugs in the process - the code was using semaphores
the other way around.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Eric Van Hensbergen <ericvh@ericvh.myip.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:10 -08:00
Jan Kara 6362e4d4ed [PATCH] Fix oops in invalidate_dquots()
When quota is being turned off we assumed that all the references to dquots
were already dropped.  That need not be true as inodes being deleted are
not on superblock's inodes list and hence we need not reach it when
removing quota references from inodes.  So invalidate_dquots() has to wait
for all the users of dquots (as quota is already marked as turned off, no
new references can be acquired and so this is bound to happen rather
early).  When we do this, we can also remove the iprune_sem locking as it
was protecting us against exactly the same problem when freeing inodes
icache memory.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:10 -08:00
Eric Dumazet 0c9e63fd38 [PATCH] Shrinks sizeof(files_struct) and better layout
1) Reduce the size of (struct fdtable) to exactly 64 bytes on 32bits
   platforms, lowering kmalloc() allocated space by 50%.

2) Reduce the size of (files_struct), using a special 32 bits (or
   64bits) embedded_fd_set, instead of a 1024 bits fd_set for the
   close_on_exec_init and open_fds_init fields.  This save some ram (248
   bytes per task) as most tasks dont open more than 32 files.  D-Cache
   footprint for such tasks is also reduced to the minimum.

3) Reduce size of allocated fdset.  Currently two full pages are
   allocated, that is 32768 bits on x86 for example, and way too much.  The
   minimum is now L1_CACHE_BYTES.

UP and SMP should benefit from this patch, because most tasks will touch
only one cache line when open()/close() stdin/stdout/stderr (0/1/2),
(next_fd, close_on_exec_init, open_fds_init, fd_array[0 ..  2] being in the
same cache line)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:09 -08:00
Andrew Morton d8733c2956 [PATCH] ext3_readdir: use generic readahead
Linus points out that ext3_readdir's readahead only cuts in when
ext3_readdir() is operating at the very start of the directory.  So for large
directories we end up performing no readahead at all and we suck.

So take it all out and use the core VM's page_cache_readahead().  This means
that ext3 directory reads will use all of readahead's dynamic sizing goop.

Note that we're using the directory's filp->f_ra to hold the readahead state,
but readahead is actually being performed against the underlying blockdev's
address_space.  Fortunately the readahead code is all set up to handle this.

Tested with printk.  It works.  I was struggling to find a real workload which
actually cared.

(The patch also exports page_cache_readahead() to GPL modules)

Cc: "Stephen C. Tweedie" <sct@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:09 -08:00
Neil Horman 5be0e95119 [PATCH] proc: fix duplicate line in /proc/devices
Fix a duplicate block device line printed after the "Block device" header
in /proc/devices.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:02 -08:00
Anton Altaparmakov 949763b2b8 NTFS: Fix comparison of $MFT and $MFTMirr to not bail out when there are
unused, invalid mft records which are the same in both $MFT and
      $MFTMirr.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2006-03-23 15:34:13 +00:00
Anton Altaparmakov 78264bd9c2 NTFS: Use buffer_migrate_page() for the ->migratepage function of all ntfs
address space operations.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2006-03-23 15:06:18 +00:00
Anton Altaparmakov 3ccc7384db NTFS: Fix a buggette in an "should be impossible" case handling where we
continued the attribute lookup loop instead of aborting it.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2006-03-23 15:03:11 +00:00
Anton Altaparmakov 67b1dfe77a NTFS: Fix an (innocent) off-by-one error in the runlist code.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2006-03-23 14:57:43 +00:00
Anton Altaparmakov b4d8d1a93c Merge branch 'master' of /usr/src/ntfs-2.6/ 2006-03-23 14:50:51 +00:00
Linus Torvalds 8b4b6707ee Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
  fixed path to moved file in include/linux/device.h
  Fix spelling in E1000_DISABLE_PACKET_SPLIT Kconfig description
  Documentation/dvb/get_dvb_firmware: fix firmware URL
  Documentation: Update to BUG-HUNTING
  Remove superfluous NOTIFY_COOKIE_LEN define
  add "tags" to .gitignore
  Fix "frist", "fisrt", typos
  fix rwlock usage example
  It's UTF-8
2006-03-22 10:58:05 -08:00
Christoph Lameter b20a35035f [PATCH] page migration reorg
Centralize the page migration functions in anticipation of additional
tinkering.  Creates a new file mm/migrate.c

1. Extract buffer_migrate_page() from fs/buffer.c

2. Extract central migration code from vmscan.c

3. Extract some components from mempolicy.c

4. Export pageout() and remove_from_swap() from vmscan.c

5. Make it possible to configure NUMA systems without page migration
   and non-NUMA systems with page migration.

I had to so some #ifdeffing in mempolicy.c that may need a cleanup.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:54:06 -08:00
Chen, Kenneth W bba1e9b211 [PATCH] convert hugetlbfs_counter to atomic
Implementation of hugetlbfs_counter() is functionally equivalent to
atomic_inc_return().  Use the simpler atomic form.

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:54:04 -08:00
David Gibson b45b5bd65f [PATCH] hugepage: Strict page reservation for hugepage inodes
These days, hugepages are demand-allocated at first fault time.  There's a
somewhat dubious (and racy) heuristic when making a new mmap() to check if
there are enough available hugepages to fully satisfy that mapping.

A particularly obvious case where the heuristic breaks down is where a
process maps its hugepages not as a single chunk, but as a bunch of
individually mmap()ed (or shmat()ed) blocks without touching and
instantiating the pages in between allocations.  In this case the size of
each block is compared against the total number of available hugepages.
It's thus easy for the process to become overcommitted, because each block
mapping will succeed, although the total number of hugepages required by
all blocks exceeds the number available.  In particular, this defeats such
a program which will detect a mapping failure and adjust its hugepage usage
downward accordingly.

The patch below addresses this problem, by strictly reserving a number of
physical hugepages for hugepage inodes which have been mapped, but not
instatiated.  MAP_SHARED mappings are thus "safe" - they will fail on
mmap(), not later with an OOM SIGKILL.  MAP_PRIVATE mappings can still
trigger an OOM.  (Actually SHARED mappings can technically still OOM, but
only if the sysadmin explicitly reduces the hugepage pool between mapping
and instantiation)

This patch appears to address the problem at hand - it allows DB2 to start
correctly, for instance, which previously suffered the failure described
above.

This patch causes no regressions on the libhugetblfs testsuite, and makes a
test (designed to catch this problem) pass which previously failed (ppc64,
POWER5).

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:54:03 -08:00
Nick Piggin 84097518d1 [PATCH] mm: nommu use compound pages
Now that compound page handling is properly fixed in the VM, move nommu
over to using compound pages rather than rolling their own refcounting.

nommu vm page refcounting is broken anyway, but there is no need to have
divergent code in the core VM now, nor when it gets fixed.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: David Howells <dhowells@redhat.com>

(Needs testing, please).
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:54:01 -08:00
Christoph Lameter ac2b898ca6 [PATCH] slab: Remove SLAB_NO_REAP option
SLAB_NO_REAP is documented as an option that will cause this slab not to be
reaped under memory pressure.  However, that is not what happens.  The only
thing that SLAB_NO_REAP controls at the moment is the reclaim of the unused
slab elements that were allocated in batch in cache_reap().  Cache_reap()
is run every few seconds independently of memory pressure.

Could we remove the whole thing?  Its only used by three slabs anyways and
I cannot find a reason for having this option.

There is an additional problem with SLAB_NO_REAP.  If set then the recovery
of objects from alien caches is switched off.  Objects not freed on the
same node where they were initially allocated will only be reused if a
certain amount of objects accumulates from one alien node (not very likely)
or if the cache is explicitly shrunk.  (Strangely __cache_shrink does not
check for SLAB_NO_REAP)

Getting rid of SLAB_NO_REAP fixes the problems with alien cache freeing.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:59 -08:00
Latchesar Ionkov 5e7a99ac45 [PATCH] v9fs: assign dentry ops to negative dentries
If a file is not found in v9fs_vfs_lookup, the function creates negative
dentry, but doesn't assign any dentry ops.  This leaves the negative entry
in the cache (there is no d_delete to mark it for removal).  If the file is
created outside of the mounted v9fs filesystem, the file shows up in the
directory with weird permissions.

This patch assigns the default v9fs dentry ops to the negative dentry.

Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:55 -08:00
Nathan Scott 4d74f423c7 Merge HEAD from ../linux-2.6 2006-03-22 15:31:14 +11:00
Nathan Scott bb19fba193 [XFS] Sync up one/two other minor changes missed in previous merges.
Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-03-22 14:12:12 +11:00
Nathan Scott e15f195cfb [XFS] Reenable the noikeep (delete inode cluster space) option by default.
SGI-PV: 951200
SGI-Modid: xfs-linux-melb:xfs-kern:25535a

Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-03-22 12:47:52 +11:00
David Chinner 2ddee844ee [XFS] Check that a page has dirty buffers before finding it acceptable for
rewrite clustering. This prevents writing excessive amounts of clean data
when doing random rewrites of a cached file.

SGI-PV: 951193
SGI-Modid: xfs-linux-melb:xfs-kern:25531a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-03-22 12:47:40 +11:00
Nathan Scott 3758dee9f6 [XFS] Fixup naming inconsistencies found by Pekka Enberg and one from Jan
Engelhardt.

SGI-PV: 947038
SGI-Modid: xfs-linux-melb:xfs-kern:25529a

Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-03-22 12:47:28 +11:00
David Chinner 38e2299a64 [XFS] Explain the race closed by the addition of vn_iowait() to the start
of xfs_itruncate_start().

SGI-PV: 947420
SGI-Modid: xfs-linux-melb:xfs-kern:25527a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-03-22 12:47:15 +11:00
Yingping Lu 9fa8046f50 [XFS] Fixing the error caused by the conflict between DIO Write's
conversion and concurrent truncate operations. Use vn_iowait to wait for
the completion of any pending DIOs. Since the truncate requires exclusive
IOLOCK, so this blocks any further DIO operations since DIO write also
needs exclusive IOBLOCK. This serves as a barrier and prevent any
potential starvation.

SGI-PV: 947420
SGI-Modid: xfs-linux-melb:xfs-kern:208088a

Signed-off-by: Yingping Lu <yingping@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-03-22 12:44:35 +11:00
Yingping Lu f1fdc848aa [XFS] Fixing KDB's xrwtrc command, also added the current process id into
the trace.

SGI-PV: 948300
SGI-Modid: xfs-linux-melb:xfs-kern:208069a

Signed-off-by: Yingping Lu <yingping@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-03-22 12:44:15 +11:00
Alexey Dobriyan 4de151d8cd It's UTF-8
Fix some comments to "UTF-8".

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-03-22 00:13:35 +01:00
Trond Myklebust ac58c9059d Merge branch 'linus' 2006-03-21 12:08:21 -05:00
J. Bruce Fields df6db302cb SUNRPC,RPCSEC_GSS: spkm3--fix config dependencies
Add default selection of CRYPTO_CAST5 when selecting RPCSEC_GSS_SPKM3.

Signed-off-by: Kevin Coffman <kwc@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 23:25:10 -05:00
J. Bruce Fields 5f12191bc0 LOCKD: Make nlmsvc_traverse_shares return void
The nlmsvc_traverse_shares return value is always zero, hence useless.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 23:24:25 -05:00
J. Bruce Fields f3ee439f43 LOCKD: nlmsvc_traverse_blocks return is unused
Note that we never return non-zero.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 23:24:13 -05:00
J. Bruce Fields 096455a22a NFSv4: Dont list system.nfs4_acl for filesystems that don't support it.
Thanks to Frank Filz for pointing out that we list system.nfs4_acl extended
attribute even on filesystems where we don't actually support nfs4_acl.
This is inconsistent with the e.g. ext3 POSIX ACL behaviour, and seems to
annoy cp.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 23:23:42 -05:00
Trond Myklebust 7a1218a277 SUNRPC: Ensure rpc_call_async() always calls tk_ops->rpc_release()
Currently this will not happen if we exit before rpc_new_task() was called.
Also fix up rpc_run_task() to do the same (for consistency).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 18:11:10 -05:00
Greg Kroah-Hartman b3229087c5 [PATCH] sysfs: fix a kobject leak in sysfs_add_link on the error path
As pointed out by Oliver Neukum.

Cc: Maneesh Soni <maneesh@in.ibm.com>
Cc: Oliver Neukum <oliver@neukum.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-03-20 13:42:59 -08:00
Greg Kroah-Hartman 832c57e9af [PATCH] sysfs: don't export dir symbols
These functions should only be used by the kobject core, and if any
driver tries to use them, bad things happen.  Unexport them to try to
prevent this from happening.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-03-20 13:42:59 -08:00
Michael Ellerman dd308bc355 [PATCH] debugfs: Add debugfs_create_blob() helper for exporting binary data
I wanted to export a binary blob via debugfs, and although it was pretty easy
it seems like it'd be easier if there was a helper for it. It's a pity we need
the wrapper struct but I can't see a cleaner way to do it.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-03-20 13:42:59 -08:00
Maneesh Soni c516865cfb [PATCH] sysfs: fix problem with duplicate sysfs directories and files
The following patch checks for existing sysfs_dirent before
preparing new one while creating sysfs directories and files.

Signed-off-by: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-03-20 13:42:59 -08:00
Eric Sesterhenn 58d49283b8 [PATCH] sysfs: kzalloc conversion
this converts fs/sysfs to kzalloc() usage.
compile tested with make allyesconfig

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-03-20 13:42:58 -08:00
Jes Sorensen 58383af629 [PATCH] kobj_map semaphore to mutex conversion
Convert the kobj_map code to use a mutex instead of a semaphore.  It
converts the single two users as well, genhd.c and char_dev.c.

Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-03-20 13:42:58 -08:00
Greg Kroah-Hartman 641e6f30a0 [PATCH] sysfs: sysfs_remove_dir() needs to invalidate the dentry
When calling sysfs_remove_dir() don't allow any further sysfs functions
to work for this kobject anymore.  This fixes a nasty USB cdc-acm oops
on disconnect.

Many thanks to Bob Copeland and Paul Fulghum for taking the time to
track this down.

Cc: Bob Copeland <email@bobcopeland.com>
Cc: Paul Fulghum <paulkf@microgate.com>
Cc: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-03-20 13:42:57 -08:00
Amy Griffis 73241ccca0 [PATCH] Collect more inode information during syscall processing.
This patch augments the collection of inode info during syscall
processing. It represents part of the functionality that was provided
by the auditfs patch included in RHEL4.

Specifically, it:

- Collects information for target inodes created or removed during
  syscalls.  Previous code only collects information for the target
  inode's parent.

- Adds the audit_inode() hook to syscalls that operate on a file
  descriptor (e.g. fchown), enabling audit to do inode filtering for
  these calls.

- Modifies filtering code to check audit context for either an inode #
  or a parent inode # matching a given rule.

- Modifies logging to provide inode # for both parent and child.

- Protect debug info from NULL audit_names.name.

[AV: folded a later typo fix from the same author]

Signed-off-by: Amy Griffis <amy.griffis@hp.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-03-20 14:08:53 -05:00
Amy Griffis f38aa94224 [PATCH] Pass dentry, not just name, in fsnotify creation hooks.
The audit hooks (to be added shortly) will want to see dentry->d_inode
too, not just the name.

Signed-off-by: Amy Griffis <amy.griffis@hp.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-03-20 14:08:53 -05:00
Trond Myklebust c42de9dd67 NFS: Fix a race in nfs_sync_inode()
Kudos to Neil Brown for spotting the problem:

"in nfs_sync_inode, there is effectively the sequence:

   nfs_wait_on_requests
   nfs_flush_inode
   nfs_commit_inode

 This seems a bit racy to me as if the only requests are on the
 ->commit list, and nfs_commit_inode is called separately after
 nfs_wait_on_requests completes, and before nfs_commit_inode start
 (say: by nfs_write_inode) then none of these function will return
 >0, yet there will be some pending request that aren't waited for."

The solution is to search for requests to wait upon, search for dirty
requests, and search for uncommitted requests while holding the
nfsi->req_lock

The patch also cleans up nfs_sync_inode(), getting rid of the redundant
FLUSH_WAIT flag. It turns out that we were always setting it.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:51 -05:00
Trond Myklebust 7d46a49f51 NFS: Clean up nfs_flush_list()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:50 -05:00
Trond Myklebust deb7d63826 NFS: Fix a race with PG_private and nfs_release_page()
We don't need to set PG_private for readahead pages, since they never get
unlocked while I/O is in progress. However there is a small race in
nfs_readpage_release() whereby the page may be unlocked, and have
PG_private set.

Fix is to have PG_private set only for the case of writes...

Also fix a bug in nfs_clear_page_writeback(): Don't attempt to clear the
radix_tree tag if we've already deleted the radix tree entry.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:50 -05:00
Trond Myklebust 1dd761e907 NFSv4: Ensure the callback daemon flushes signals
If the callback daemon is signalled, but is unable to exit because it still
has users, then we need to flush signals. If not, then svc_recv() can
never sleep, and so we hang.
If we flush signals, then we also have to be prepared to resend them when
we want the thread to exit.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:49 -05:00
Trond Myklebust a9a801787a NFS, NLM: Allow blocking locks to respect signals
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:48 -05:00
Trond Myklebust 03f28e3a20 NFS: Make nfs_fhget() return appropriate error values
Currently it returns NULL, which usually gets interpreted as ENOMEM. In
fact it can mean a host of issues.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:48 -05:00
Trond Myklebust 01d0ae8bea NFSv4: Fix an oops in nfs4_fill_super
The mount statistics patches introduced a call to nfs_free_iostats that is
not only redundant, but actually causes an oops.

Also fix a memory leak due to the lack of a call to nfs_free_iostats on
unmount.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:48 -05:00
Trond Myklebust d9f6eb75d4 lockd: blocks should hold a reference to the nlm_file
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:47 -05:00
Trond Myklebust 51581f3bf9 NFSv4: SETCLIENTID_CONFIRM should handle NFS4ERR_DELAY/NFS4ERR_RESOURCE
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:47 -05:00
Trond Myklebust 3e4f6290ca NFSv4: Send the delegation stateid for SETATTR calls
In the case where we hold a delegation stateid, use that in for inside
SETATTR calls.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:46 -05:00
Trond Myklebust f25bc34967 NFSv4: Ensure nfs_callback_down() calls svc_destroy()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:46 -05:00
Trond Myklebust 6041b79192 lockd: Fix a typo in nlmsvc_grant_release()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:45 -05:00
Trond Myklebust d471662448 lockd: Add helper for *_RES callbacks
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:45 -05:00
Trond Myklebust 92737230dd NLM: Add nlmclnt_release_call
Add a helper function to simplify the freeing of NLM client requests.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:45 -05:00
Trond Myklebust e4cd038a45 NLM: Fix nlmclnt_test to not copy private part of locks
The struct file_lock does not carry a properly initialised lock,
so don't copy it as if it were.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:44 -05:00
Trond Myklebust 3a649b8846 NLM: Simplify client locks
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:44 -05:00
Trond Myklebust d72b7a6b26 NFS: O_DIRECT needs to use a completion
Now that we have aio writes, it is possible for dreq->outstanding to be
zero, but for the I/O not to have completed. Convert struct nfs_direct_req
to use a completion to signal when the I/O is done.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:43 -05:00
Trond Myklebust 6b45d858ed NFS: Clean up nfs_get_user_pages
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:43 -05:00
Chuck Lever 606bbba06b NFS: fix compiler warnings on 64-bit platforms
Introduced by NFS aio+dio patches.

Test plan:
Compile kernel with CONFIG_NFS enabled on 64-bit hardware.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:42 -05:00
Trond Myklebust 35576cba57 NLM: nlmclnt_cancel_callback should accept NLM_LCK_DENIED errors
NLM_LCK_DENIED is a valid error return for an NLM_CANCEL call by the
client.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:41 -05:00
Trond Myklebust 4c060b5310 lockd: Fix Oopses due to list manipulation errors.
The patch "stop abusing file_lock_list introduces a couple of bugs since
the locks may be copied and need to be removed from the lists when they are
destroyed.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:41 -05:00
Christoph Hellwig 26bcbf965f lockd: stop abusing file_lock_list
Currently lockd directly access the file_lock_list from fs/locks.c.
It does so to mark locks granted or reclaimable.  This is very
suboptimal, because a) lockd needs to poke into locks.c internals, and
b) it needs to iterate over all locks in the system for marking locks
granted or reclaimable.

This patch adds lists for granted and reclaimable locks to the nlm_host
structure instead, and adds locks to those.

nlmclnt_lock:
	now adds the lock to h_granted instead of setting the
	NFS_LCK_GRANTED, still O(1)

nlmclnt_mark_reclaim:
	goes away completely, replaced by a list_splice_init.
	Complexity reduced from O(locks in the system) to O(1)

reclaimer:
	iterates over h_reclaim now, complexity reduced from
	O(locks in the system) to O(locks per nlm_host)

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:40 -05:00
Trond Myklebust 04266473ec lockd: Make lockd use rpc_new_client() instead of rpc_create_client
When doing NLM_GRANTED requests, lockd may end up blocking if we use
rpc_create_client() due to the synchronous call to rpc_ping(). Instead, use
rpc_new_client().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:40 -05:00
Trond Myklebust 686517f1ad lockd: Make nlmsvc_create_block() use nlmsvc_lookup_host()
Currently it uses nlmclnt_lookup_host(), which puts the resulting host
structure on a different list.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:39 -05:00
Trond Myklebust 5e1abf8cb7 lockd: Clean up of the server-side GRANTED code
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:39 -05:00
Trond Myklebust 6849c0cab6 lockd: Add refcounting to struct nlm_block
Otherwise, the block may disappear from underneath us when in
nlmsvc_retry_blocked.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:39 -05:00
Trond Myklebust 09c7938c56 lockd: Fix server-side lock blocking code
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:38 -05:00
Trond Myklebust 0996905f93 lockd: posix_test_lock() should not call locks_copy_lock()
The caller of posix_test_lock() should never need to look at the lock
private data, so do not copy that information. This also means that there
is no need to call the fl_release_private methods.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:38 -05:00
Trond Myklebust 3feb2d4939 NFS: Uninline nfs_writedata_(alloc|free) and nfs_readdata_(alloc|free)
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:37 -05:00
Trond Myklebust 5db3a7b2ca NFS: Debugging code for nfs_direct_(read|write)_schedule()
Make sure that we're doing our list accounting correctly.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:37 -05:00
Trond Myklebust a8881f5a5c NFS: O_DIRECT async IO may lose context
The struct nfs_direct_req currently keeps a pointer to the file descriptor
without referencing it. This may cause problems if the parent process is
killed.

The nfs_open_context should normally have all the information that we're
currently using the filp for, and unlike fput(), is safe to release from
an rpciod process context.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:36 -05:00
Trond Myklebust fad6149041 nfs: Use UNSTABLE + COMMIT for NFS O_DIRECT writes
Currently NFS O_DIRECT writes use FILE_SYNC so that a COMMIT is not
necessary.  This simplifies the internal logic, but this could be a
difficult workload for some servers.

Instead, let's send UNSTABLE writes, and after they all complete, send a
COMMIT for the dirty range.  After the COMMIT returns successfully, then do
the wake_up or fire off aio_complete().

Test plan:
Async direct I/O tests against Solaris (or any server that requires
committed unstable writes).  Reboot server during test.

Based on an earlier patch by Chuck Lever <cel@netapp.com>

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:36 -05:00
Trond Myklebust e17b1fc4b3 NFS: Make nfs_commit_alloc() extern
We need to use nfs_commit_alloc() in fs/nfs/direct.c.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:35 -05:00
Chuck Lever a37ec012d7 NFS: fix data_update accounting in NFS direct I/O path
^C against "iozone -I" is hitting the assertion in nfs_clear_inode().

Test plan:
"iozone -i0 -I -a -c" against a slow server, then control C.  This should
not cause an oops.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:35 -05:00
Chuck Lever 15ce4a0c1c NFS: Replace atomic_t variables in nfs_direct_req with a single spin lock
Three atomic_t variables cause a lot of bus locking.  Because they are all
used in the same places in the code, just use a single spin lock.

Now that the atomic_t variables are gone, we can remove the request size
limitation since the code no longer depends on the limited width of atomic_t
on some platforms.

Test plan:
Compile with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled.  Millions of fsx
operations, iozone, OraSim.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:34 -05:00
Chuck Lever 88467055f7 NFS: clean up comments and tab damage in direct.c
Clean up tab damage and comments.  Replace "file_offset" with more commonly
used "pos".

Test plan:
Compile with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:34 -05:00
Chuck Lever 9eafa8cc52 NFS: support EIOCBQUEUED return in direct write path
For async iocb's, the NFS direct write path now returns EIOCBQUEUED,
and calls aio_complete when all the requested writes are finished.  The
synchronous part of the NFS direct write path behaves exactly as it
was before.

Shared mapped NFS files will have some coherency difficulties when
accessed concurrently with aio+dio.  Will need to explore how this
is handled in the local file system case.

Test plan:
aio-stress with "-O". OraSim.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:33 -05:00
Chuck Lever c89f2ee5f9 NFS: make iocb available everywhere in direct write path
Pass the iocb argument all the way down to the direct write request
scheduler, and make it available in nfs_direct_write_result.

Test plan:
Compile the kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled.
Millions of fsx-odirect ops.  OraSim.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:33 -05:00
Chuck Lever 47989d7454 NFS: remove support for multi-segment iovs in the direct write path
Eliminate the persistent use of automatic storage in all parts of the
NFS client's direct write path to pave the way for introducing support
for aio against files opened with the O_DIRECT flag.

Test plan:
Compile the kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled.
Millions of fsx-odirect ops.  OraSim.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:32 -05:00
Chuck Lever 462d5b3296 NFS: make direct write path generate write requests concurrently
Duplicate infrastructure from direct read path that will allow write
path to generate multiple write requests concurrently.  This will
enable us to add support for aio in this path.

Temporarily we will lose the ability to do UNSTABLE writes followed by
a COMMIT in the direct write path.  However, all applications I am
aware of that use NFS O_DIRECT currently write in relatively small
chunks, so this should not be inconvenient in any way.

Test plan:
Millions of fsx-odirect ops. OraSim.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:32 -05:00
Chuck Lever 63ab46abc7 NFS: create common routine for handling direct I/O completion
Factor out the common piece of completing an NFS direct I/O request.

Test plan:
Compile kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:31 -05:00
Chuck Lever 93619e5989 NFS: create common routine for allocating nfs_direct_req
Factor out a small common piece of the path that allocate nfs_direct_req
structures.

Test plan:
Compile kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:31 -05:00
Chuck Lever bc0fb201b3 NFS: create common routine for waiting for direct I/O to complete
We're about to add asynchrony to the NFS direct write path.  Begin by
abstracting out the common pieces in the read path.

The first piece is nfs_direct_read_wait, which works the same whether the
process is waiting for a read or a write.

Test plan:
Compile kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:31 -05:00
Chuck Lever 487b83723e NFS: support EIOCBQUEUED return in direct read path
For async iocb's, the NFS direct read path should return EIOCBQUEUED and
call aio_complete when all the requested reads are finished.  The
synchronous part of the NFS direct read path behaves exactly as it was
before.

Test plan:
aio-stress with "-O".  OraSim.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:30 -05:00
Chuck Lever 99514f8fdd NFS: make iocb available everywhere in direct read path
Pass the iocb argument all the way down to the direct read request
scheduler, and make it available in nfs_direct_read_result.

Test plan:
Compile the kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled.
Millions of fsx-odirect ops.  OraSim.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:30 -05:00
Chuck Lever 0cdd80d07f NFS: remove support for multi-segment iovs in the direct read path
Eliminate the persistent use of automatic storage in all parts of the NFS
client's direct read path to pave the way for introducing support for aio
against files opened with the O_DIRECT flag.

Test plan:
Compile the kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled.
Millions of fsx-odirect ops.  OraSim.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:29 -05:00
Chuck Lever 5dd602f206 NFS: use size_t type for holding rsize bytes in NFS O_DIRECT read path
size_t is used for holding byte counts, so use it for variables storing rsize.
Note that the write path will be updated as we add support for async
O_DIRECT writes.

Test plan:
Need to verify that existing comparisons against new size_t variables behave
correctly.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:29 -05:00
Chuck Lever d4cc948ba9 NFS: update comments and function definitions in fs/nfs/direct.c
Update to latest coding style standards.  Remove block comments on
statically defined functions, and place function definitions all on
one line.

Test plan:
Compile kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:28 -05:00
Chuck Lever b8a32e2b8b NFS: clean up NFS client's a_ops->direct_IO method
The NFS client's a_ops->direct_IO method, nfs_direct_IO, is required to
be present to allow NFS files to be opened with O_DIRECT, but is never
called because the NFS client shunts reads and writes to files opened
with O_DIRECT directly to its own routines.

Gut the nfs_direct_IO function.  This eliminates the only part of the
NFS client's direct I/O path that requires support for multi-segment
iovs, allowing further simplification in subsequent patches.

Test plan:
Compile the kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled.  Millions
of fsx-odirect ops.  OraSim.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:28 -05:00
Trond Myklebust ec06c096ed NFS: Cleanup of NFS read code
Same callback hierarchy inversion as for the NFS write calls. This patch is
not strictly speaking needed by the O_DIRECT code, but avoids confusing
differences between the asynchronous read and write code.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:27 -05:00
Trond Myklebust 788e7a89a0 NFS: Cleanup of NFS write code in preparation for asynchronous o_direct
This patch inverts the callback hierarchy for NFS write calls.

Instead of having the NFSv2/v3/v4-specific code set up the RPC callback
ops, we allow the original caller to do so. This allows for more
flexibility w.r.t. how to set up and tear down the nfs_write_data
structure while still allowing the NFSv3/v4 code to perform error
handling.

The greater flexibility is needed by the asynchronous O_DIRECT code, which
wants to be able to hold on to the original nfs_write_data structures after
the WRITE RPC call has completed in order to be able to replay them if the
COMMIT call determines that the server has rebooted.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:27 -05:00
J. Bruce Fields 7117bf3dfb lockd: Remove FL_LOCKD flag
Currently lockd identifies its own locks using the FL_LOCKD flag.  This
doesn't scale well to multiple lock managers--if we did this in nfsv4 too,
for example, we'd be left with only one free flag bit.

Instead, we just check whether the file manager ops (fl_lmops) set on this
lock are our own.

The only use for this is in nlm_traverse_locks, which uses it to find locks
that need cleaning up when freeing a host or a file.

In the long run it might be nice to do reference counting instead of
traversing all the locks like this....

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:26 -05:00
Andy Adamson 8dc7c3115b locks,lockd: fix race in nlmsvc_testlock
posix_test_lock() returns a pointer to a struct file_lock which is unprotected
and can be removed while in use by the caller.  Move the conflicting lock from
the return to a parameter, and copy the conflicting lock.

In most cases the caller ends up putting the copy of the conflicting lock on
the stack.  On i386, sizeof(struct file_lock) appears to be about 100 bytes.
We're assuming that's reasonable.

Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:26 -05:00
Andy Adamson 2e0af86f61 locks: remove unused posix_block_lock
posix_lock_file() is used to add a blocked lock to Lockd's block, so
posix_block_lock() is no longer needed.

Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:26 -05:00
Andy Adamson a85f193e2f lockd: make nlmsvc_lock use only posix_lock_file
Reorganize nlmsvc_lock() to make full use of posix_lock_file(), which does
eveything nlmsvc_lock() needs - no need to call posix_test_lock(),
posix_locks_deadlock(), or posix_block_lock() separately.

Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:25 -05:00
Andy Adamson 5de0e5024a lockd: simplify nlmsvc_grant_blocked
Reorganize nlmsvc_grant_blocked() to make full use of posix_lock_file().  Note
that there's no need for separate calls to posix_test_lock(),
posix_locks_deadlock(), or posix_block_lock().

Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:25 -05:00
Andy Adamson 15dadef946 lockd: clean up nlmsvc_lock
Slightly more consistent dprintk error reporting, consolidate some up()'s.

Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:24 -05:00
Chuck Lever 1e7cb3dc12 NFS: directory trace messages
Reuse NFSDBG_DIRCACHE and NFSDBG_LOOKUPCACHE to provide additional
diagnostic messages that trace the operation of the NFS client's
directory behavior.  A few new messages are now generated when NFSDBG_VFS
is active, as well, to trace normal VFS activity.  This compromise
provides better trace debugging for those who use pre-built kernels,
without adding a lot of extra noise to the standard debug settings.

Test-plan:
Enable NFS trace debugging with flags 1, 2, or 4.  You should be able to
see different types of trace messages with each flag setting.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:24 -05:00
Chuck Lever dead28da8e SUNRPC: eliminate rpc_call()
Clean-up: replace rpc_call() helper with direct call to rpc_call_sync.

This makes NFSv2 and NFSv3 synchronous calls more computationally
efficient, and reduces stack consumption in functions that used to
invoke rpc_call more than once.

Test plan:
Compile kernel with CONFIG_NFS enabled.  Connectathon on NFS version 2,
version 3, and version 4 mount points.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:23 -05:00
Chuck Lever cc0175c1dc SUNRPC: display human-readable procedure name in rpc_iostats output
Add fields to the rpc_procinfo struct that allow the display of a
human-readable name for each procedure in the rpc_iostats output.

Also fix it so that the NFSv4 stats are broken up correctly by
sub-procedure number.  NFSv4 uses only two real RPC procedures:
NULL, and COMPOUND.

Test plan:
Mount with NFSv2, NFSv3, and NFSv4, and do "cat /proc/self/mountstats".

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:22 -05:00
Chuck Lever 4ece3a2d18 NFS: add RPC I/O statistics to /proc/self/mountstats
NFS client now shows various RPC I/O metrics in /proc/self/mountstats.

Test plan:
Mount/umount while doing "cat /proc/self/mountstats", multiple iterations
of connectathon locking suite.  Test with NFS version 2, 3, and 4.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:22 -05:00
Chuck Lever 67ec9f46b8 NFS: report how long an NFS file system has been mounted
Add a field in nfs_server to record a timestamp when a mount succeeds.
Report the number of seconds the file system has been mounted via
nfs_show_stats().

Test plan:
Mount an NFS file system, watch the mountstats reports and compare with
clock time.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:15 -05:00
Chuck Lever 006ea73e5f NFS: add hooks to account for NFSERR_JUKEBOX errors
Make an inode or an nfs_server struct available in the logic that handles
JUKEBOX/DELAY type errors so the NFS client can account for them.

This patch is split out from the main nfs iostat patch to highlight minor
architectural changes required to support this statistic.

Test plan:
None.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:14 -05:00
Chuck Lever 91d5b47023 NFS: add I/O performance counters
Invoke the byte and event counter macros where we want to count bytes and
events.

Clean-up: fix a possible NULL dereference in nfs_lock, and simplify
nfs_file_open.

Test-plan:
fsx and iozone on UP and SMP systems, with and without pre-emption.  Watch
for memory overwrite bugs, and performance loss (significantly more CPU
required per op).

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:14 -05:00
Chuck Lever d9ef5a8c26 NFS: introduce mechanism for tracking NFS client metrics
Add a per-superblock performance counter facility to the NFS client.  This
facility mimics the counters available for block devices and for
networking.  Expose these new counters via the new /proc/self/mountstats
interface.

Thanks to Andrew Morton and Trond Myklebust for their review and comments.

Test plan:
fsx and iozone on UP and SMP systems, with and without pre-emption.  Watch
for memory overwrite bugs, and performance loss (significantly more CPU
required per op).

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:13 -05:00