Commit graph

19950 commits

Author SHA1 Message Date
Arnd Bergmann
763641d812 lockd: push lock_flocks down
lockd should use lock_flocks() instead of lock_kernel()
to lock against posix locks accessing the i_flock list.

This is a prerequisite to turning lock_flocks into a
spinlock.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: J. Bruce Fields <bfields@redhat.com>
2010-10-27 21:39:39 +02:00
Linus Torvalds
f9ba5375a8 Merge branch 'ima-memory-use-fixes'
* ima-memory-use-fixes:
  IMA: fix the ToMToU logic
  IMA: explicit IMA i_flag to remove global lock on inode_delete
  IMA: drop refcnt from ima_iint_cache since it isn't needed
  IMA: only allocate iint when needed
  IMA: move read counter into struct inode
  IMA: use i_writecount rather than a private counter
  IMA: use inode->i_lock to protect read and write counters
  IMA: convert internal flags from long to char
  IMA: use unsigned int instead of long for counters
  IMA: drop the inode opencount since it isn't needed for operation
  IMA: use rbtree instead of radix tree for inode information cache
2010-10-26 11:37:48 -07:00
Eric Paris
a178d2027d IMA: move read counter into struct inode
IMA currently allocated an inode integrity structure for every inode in
core.  This stucture is about 120 bytes long.  Most files however
(especially on a system which doesn't make use of IMA) will never need
any of this space.  The problem is that if IMA is enabled we need to
know information about the number of readers and the number of writers
for every inode on the box.  At the moment we collect that information
in the per inode iint structure and waste the rest of the space.  This
patch moves those counters into the struct inode so we can eventually
stop allocating an IMA integrity structure except when absolutely
needed.

This patch does the minimum needed to move the location of the data.
Further cleanups, especially the location of counter updates, may still
be possible.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 11:37:18 -07:00
Linus Torvalds
f1ebdd60cc Merge branch 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6
* 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (22 commits)
  Add _addr_lsb field to ia64 siginfo
  Fix migration.c compilation on s390
  HWPOISON: Remove retry loop for try_to_unmap
  HWPOISON: Turn addr_valid from bitfield into char
  HWPOISON: Disable DEBUG by default
  HWPOISON: Convert pr_debugs to pr_info
  HWPOISON: Improve comments in memory-failure.c
  x86: HWPOISON: Report correct address granuality for huge hwpoison faults
  Encode huge page size for VM_FAULT_HWPOISON errors
  Fix build error with !CONFIG_MIGRATION
  hugepage: move is_hugepage_on_freelist inside ifdef to avoid warning
  Clean up __page_set_anon_rmap
  HWPOISON, hugetlb: fix unpoison for hugepage
  HWPOISON, hugetlb: soft offlining for hugepage
  HWPOSION, hugetlb: recover from free hugepage error when !MF_COUNT_INCREASED
  hugetlb: move refcounting in hugepage allocation inside hugetlb_lock
  HWPOISON, hugetlb: add free check to dequeue_hwpoison_huge_page()
  hugetlb: hugepage migration core
  hugetlb: redefine hugepage copy functions
  hugetlb: add allocate function for hugepage migration
  ...
2010-10-26 10:13:10 -07:00
Linus Torvalds
4390110fef Merge branch 'for-2.6.37' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.37' of git://linux-nfs.org/~bfields/linux: (99 commits)
  svcrpc: svc_tcp_sendto XPT_DEAD check is redundant
  svcrpc: no need for XPT_DEAD check in svc_xprt_enqueue
  svcrpc: assume svc_delete_xprt() called only once
  svcrpc: never clear XPT_BUSY on dead xprt
  nfsd4: fix connection allocation in sequence()
  nfsd4: only require krb5 principal for NFSv4.0 callbacks
  nfsd4: move minorversion to client
  nfsd4: delay session removal till free_client
  nfsd4: separate callback change and callback probe
  nfsd4: callback program number is per-session
  nfsd4: track backchannel connections
  nfsd4: confirm only on succesful create_session
  nfsd4: make backchannel sequence number per-session
  nfsd4: use client pointer to backchannel session
  nfsd4: move callback setup into session init code
  nfsd4: don't cache seq_misordered replies
  SUNRPC: Properly initialize sock_xprt.srcaddr in all cases
  SUNRPC: Use conventional switch statement when reclassifying sockets
  sunrpc/xprtrdma: clean up workqueue usage
  sunrpc: Turn list_for_each-s into the ..._entry-s
  ...

Fix up trivial conflicts (two different deprecation notices added in
separate branches) in Documentation/feature-removal-schedule.txt
2010-10-26 09:55:25 -07:00
Linus Torvalds
a4dd8dce14 Merge branch 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  net/sunrpc: Use static const char arrays
  nfs4: fix channel attribute sanity-checks
  NFSv4.1: Use more sensible names for 'initialize_mountpoint'
  NFSv4.1: pnfs: filelayout: add driver's LAYOUTGET and GETDEVICEINFO infrastructure
  NFSv4.1: pnfs: add LAYOUTGET and GETDEVICEINFO infrastructure
  NFS: client needs to maintain list of inodes with active layouts
  NFS: create and destroy inode's layout cache
  NFSv4.1: pnfs: filelayout: introduce minimal file layout driver
  NFSv4.1: pnfs: full mount/umount infrastructure
  NFS: set layout driver
  NFS: ask for layouttypes during v4 fsinfo call
  NFS: change stateid to be a union
  NFSv4.1: pnfsd, pnfs: protocol level pnfs constants
  SUNRPC: define xdr_decode_opaque_fixed
  NFSD: remove duplicate NFS4_STATEID_SIZE
2010-10-26 09:52:09 -07:00
J. Bruce Fields
43c2e885be nfs4: fix channel attribute sanity-checks
The sanity checks here are incorrect; in the worst case they allow
values that crash the client.

They're also over-reliant on the preprocessor.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-25 22:19:41 -04:00
Linus Torvalds
74eb94b218 Merge branch 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (67 commits)
  SUNRPC: Cleanup duplicate assignment in rpcauth_refreshcred
  nfs: fix unchecked value
  Ask for time_delta during fsinfo probe
  Revalidate caches on lock
  SUNRPC: After calling xprt_release(), we must restart from call_reserve
  NFSv4: Fix up the 'dircount' hint in encode_readdir
  NFSv4: Clean up nfs4_decode_dirent
  NFSv4: nfs4_decode_dirent must clear entry->fattr->valid
  NFSv4: Fix a regression in decode_getfattr
  NFSv4: Fix up decode_attr_filehandle() to handle the case of empty fh pointer
  NFS: Ensure we check all allocation return values in new readdir code
  NFS: Readdir plus in v4
  NFS: introduce generic decode_getattr function
  NFS: check xdr_decode for errors
  NFS: nfs_readdir_filler catch all errors
  NFS: readdir with vmapped pages
  NFS: remove page size checking code
  NFS: decode_dirent should use an xdr_stream
  SUNRPC: Add a helper function xdr_inline_peek
  NFS: remove readdir plus limit
  ...
2010-10-25 13:48:29 -07:00
Linus Torvalds
57c155d51e Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd
* 'for-linus' of git://git.open-osd.org/linux-open-osd:
  exofs: Remove inode->i_count manipulation in exofs_new_inode
  fs/exofs: typo fix of faild to failed
  exofs: Set i_mapping->backing_dev_info anyway
  exofs: Cleaup read path in regard with read_for_write
2010-10-25 10:08:21 -07:00
Boaz Harrosh
fe2fd9ed5b exofs: Remove inode->i_count manipulation in exofs_new_inode
exofs_new_inode() was incrementing the inode->i_count and
decrementing it in create_done(), in a bad attempt to make sure
the inode will still be there when the asynchronous create_done()
finally arrives. This was very stupid because iput() was not called,
and if it was actually needed, it would leak the inode.

However all this is not needed, because at exofs_evict_inode()
we already wait for create_done() by waiting for the
object_created event. Therefore remove the superfluous ref counting
and just Thicken the comment at exofs_evict_inode() a bit.

While at it change places that open coded wait_obj_created()
to call the already available wrapper.

CC: Dave Chinner <dchinner@redhat.com>
CC: Christoph Hellwig <hch@lst.de>
CC: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2010-10-25 18:03:07 +02:00
Joe Perches
571f7f46bf fs/exofs: typo fix of faild to failed
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2010-10-25 18:02:49 +02:00
Yoshihisa Abe
da47c19e5c Coda: replace BKL with mutex
Replace the BKL with a mutex to protect the venus_comm structure which
binds the mountpoint with the character device and holds the upcall
queues.

Signed-off-by: Yoshihisa Abe <yoshiabe@cs.cmu.edu>
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-25 08:02:40 -07:00
Yoshihisa Abe
f7cc02b871 Coda: push BKL regions into coda_upcall()
Now that shared inode state is locked using the cii->c_lock, the BKL is
only used to protect the upcall queues used to communicate with the
userspace cache manager. The remaining state is all local and we can
push the lock further down into coda_upcall().

Signed-off-by: Yoshihisa Abe <yoshiabe@cs.cmu.edu>
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-25 08:02:40 -07:00
Yoshihisa Abe
b5ce1d83a6 Coda: add spin lock to protect accesses to struct coda_inode_info.
We mostly need it to protect cached user permissions. The c_flags field
is advisory, reading the wrong value is harmless and in the worst case
we hit a slow path where we have to make an extra upcall to the
userspace cache manager when revalidating a dentry or inode.

Signed-off-by: Yoshihisa Abe <yoshiabe@cs.cmu.edu>
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-25 08:02:40 -07:00
Linus Torvalds
8e775167d5 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  Revert "block: fix accounting bug on cross partition merges"
2010-10-25 07:45:10 -07:00
J. Bruce Fields
a663bdd8c5 nfsd4: fix connection allocation in sequence()
We're doing an allocation under a spinlock, and ignoring the
possibility of allocation failure.

A better fix wouldn't require an unnecessary allocation in the common
case, but we'll leave that for later.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-24 21:07:07 -04:00
Trond Myklebust
1c787096fc NFSv4.1: Use more sensible names for 'initialize_mountpoint'
The initialize_mountpoint/uninitialise_mountpoint functions are really about
setting or clearing the layout driver to be used on this filesystem. Change
the names to the more descriptive 'set_layoutdriver/clear_layoutdriver'.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-24 18:07:11 -04:00
Andy Adamson
16b374ca43 NFSv4.1: pnfs: filelayout: add driver's LAYOUTGET and GETDEVICEINFO infrastructure
Implement the driver's io_ops->alloc_lseg and free_lseg functions,
which integrate into the deviceid cache and calls out to
nfs4_proc_getdeviceinfo when necessary.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Dean Hildebrand <dhildebz@umich.edu>
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Mike Sager <sager@netapp.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>
Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-24 18:07:11 -04:00
Andy Adamson
b1f69b754e NFSv4.1: pnfs: add LAYOUTGET and GETDEVICEINFO infrastructure
Add the ability to actually send LAYOUTGET and GETDEVICEINFO.  This also adds
in the machinery to handle layout state and the deviceid cache.  Note that
GETDEVICEINFO is not called directly by the generic layer.  Instead it
is called by the drivers while parsing the LAYOUTGET opaque data in response
to an unknown device id embedded therein.  RFC 5661 only encodes
device ids within the driver-specific opaque data.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Dean Hildebrand <dhildebz@umich.edu>
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Mike Sager <sager@netapp.com>
Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>
Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-24 18:07:10 -04:00
Andy Adamson
974cec8ca0 NFS: client needs to maintain list of inodes with active layouts
In particular, server reboot will invalidate all layouts.

Note that in order to have an active layout, we must get a successful response
from the server.  To avoid adding that machinery, this patch just includes a
stub that fakes up a successful return.  Since the layout is never referenced
for io, this is not a problem.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Dean Hildebrand <dhildebz@umich.edu>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-24 18:07:10 -04:00
Benny Halevy
e5e940170b NFS: create and destroy inode's layout cache
At the start of the io paths, try to grab the relevant layout
information.  This will initiate the inode's layout cache, but
stubs ensure the cache stays empty.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Dean Hildebrand <dhildebz@umich.edu>
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn>
Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-24 18:07:10 -04:00
Dean Hildebrand
7ab672ce31 NFSv4.1: pnfs: filelayout: introduce minimal file layout driver
This driver just registers itself and supplies trivial mount/umount functions.

Signed-off-by: Dean Hildebrand <dhildebz@umich.edu>
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-24 18:07:10 -04:00
Fred Isaman
02c35fca7c NFSv4.1: pnfs: full mount/umount infrastructure
Allow a module implementing a layout type to register, and
have its mount/umount routines called for filesystems that
the server declares support it.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Andy Adamson<andros@netapp.com>
Signed-off-by: Bian Naimeng <biannm@cn.fujitsu.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-24 18:07:10 -04:00
Ricardo Labiaga
85e174ba6b NFS: set layout driver
Put in the infrastructure that uses information returned from the
server at mount to select a layout driver module.

In this patch, a stub is used that always returns "no driver found".

Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Dean Hildebrand <dhildebz@umich.edu>
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-24 18:07:10 -04:00
Andy Adamson
504913fbc8 NFS: ask for layouttypes during v4 fsinfo call
This information will be used to determine which layout driver,
if any, to use for subsequent IO on this filesystem.  Each driver
is assigned an integer id, with 0 reserved to indicate no driver.

The server can in theory return multiple ids.  However, our current
client implementation only notes the first entry and ignores the
rest.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-24 18:07:09 -04:00
Alexandros Batsakis
9449925273 NFS: change stateid to be a union
In NFSv4.1 the stateid consists of the other and seqid fields. For layout
processing we need to numerically compare the seqid value of layout stateids.
To do so, introduce a union to nfs4_stateid to switch between opaque(16 bytes)
and opaque(12 bytes) / __be32

Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-24 18:02:53 -04:00
Andy Adamson
3c9101a057 NFSD: remove duplicate NFS4_STATEID_SIZE
Already accepted by Bruce

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-24 18:02:53 -04:00
Roman Borisov
3388bff5cf nfs: fix unchecked value
Return value of "decode_attr_bitmap()" was not checked;

Signed-off-by: Roman Borisov <ext-roman.borisov@nokia.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-24 18:00:12 -04:00
Ricardo Labiaga
55b6e7742d Ask for time_delta during fsinfo probe
Used by the client to determine if the server has a granular enough
time stamp.

Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-24 18:00:04 -04:00
Ricardo Labiaga
6b96724e50 Revalidate caches on lock
Instead of blindly zapping the caches, attempt to revalidate them if
the server has indicated that it uses high resolution timestamps.

NFSv4 should be able to always revalidate the cache since the
protocol requires the update of the change attribute on modification of
the data.  In reality, there are servers (the Linux NFS server
for example) that do not obey this requirement and use ctime as the
basis for change attribute.  Long term, the server needs to be fixed.
At this time, and to be on the safe side, continue zapping caches if
the server indicates that it does not have a high resolution timestamp.

Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-24 17:59:56 -04:00
Linus Torvalds
229aebb873 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
  Update broken web addresses in arch directory.
  Update broken web addresses in the kernel.
  Revert "drivers/usb: Remove unnecessary return's from void functions" for musb gadget
  Revert "Fix typo: configuation => configuration" partially
  ida: document IDA_BITMAP_LONGS calculation
  ext2: fix a typo on comment in ext2/inode.c
  drivers/scsi: Remove unnecessary casts of private_data
  drivers/s390: Remove unnecessary casts of private_data
  net/sunrpc/rpc_pipe.c: Remove unnecessary casts of private_data
  drivers/infiniband: Remove unnecessary casts of private_data
  drivers/gpu/drm: Remove unnecessary casts of private_data
  kernel/pm_qos_params.c: Remove unnecessary casts of private_data
  fs/ecryptfs: Remove unnecessary casts of private_data
  fs/seq_file.c: Remove unnecessary casts of private_data
  arm: uengine.c: remove C99 comments
  arm: scoop.c: remove C99 comments
  Fix typo configue => configure in comments
  Fix typo: configuation => configuration
  Fix typo interrest[ing|ed] => interest[ing|ed]
  Fix various typos of valid in comments
  ...

Fix up trivial conflicts in:
	drivers/char/ipmi/ipmi_si_intf.c
	drivers/usb/gadget/rndis.c
	net/irda/irnet/irnet_ppp.c
2010-10-24 13:41:39 -07:00
Jens Axboe
f253b86b4a Revert "block: fix accounting bug on cross partition merges"
This reverts commit 7681bfeecc.

Conflicts:

	include/linux/genhd.h

It has numerous issues with the cleanup path and non-elevator
devices. Revert it for now so we can come up with a clean
version without rushing things.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-10-24 22:06:02 +02:00
Trond Myklebust
6f7a35bd23 NFSv4: Fix up the 'dircount' hint in encode_readdir
Also ensure we only ask for either fileid or mounted_on_fileid in the
readdirplus case too...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-24 13:20:25 -04:00
Trond Myklebust
9af8c222ca NFSv4: Clean up nfs4_decode_dirent
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-24 13:18:04 -04:00
Trond Myklebust
4f082222fa NFSv4: nfs4_decode_dirent must clear entry->fattr->valid
Otherwise, we may end up reading uninitialised data from the resulting
struct nfs_fattr.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-24 13:14:02 -04:00
Trond Myklebust
3201f3dd73 NFSv4: Fix a regression in decode_getfattr
We don't want to have the mounted_on_fileid overwrite the true fileid. We
only return the former if the server didn't supply the true fileid.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-23 15:43:10 -04:00
Trond Myklebust
7ad0735300 NFSv4: Fix up decode_attr_filehandle() to handle the case of empty fh pointer
decode_attr_filehandle still needs to skip the XDR-encoded filehandle if
someone passes a null pointer argument.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-23 15:34:20 -04:00
Trond Myklebust
4a201d6e3f NFS: Ensure we check all allocation return values in new readdir code
Also some clean ups.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-23 15:27:38 -04:00
Bryan Schumaker
82f2e5472e NFS: Readdir plus in v4
By requsting more attributes during a readdir, we can mimic the readdir plus
operation that was in NFSv3.

To test, I ran the command `ls -lU --color=none` on directories with various
numbers of files.  Without readdir plus, I see this:

n files |    100    |   1,000   |  10,000   |  100,000  | 1,000,000
--------+-----------+-----------+-----------+-----------+----------
real    | 0m00.153s | 0m00.589s | 0m05.601s | 0m56.691s | 9m59.128s
user    | 0m00.007s | 0m00.007s | 0m00.077s | 0m00.703s | 0m06.800s
sys     | 0m00.010s | 0m00.070s | 0m00.633s | 0m06.423s | 1m10.005s
access  | 3         | 1         | 1         | 4         | 31
getattr | 2         | 1         | 1         | 1         | 1
lookup  | 104       | 1,003     | 10,003    | 100,003   | 1,000,003
readdir | 2         | 16        | 158       | 1,575     | 15,749
total   | 111       | 1,021     | 10,163    | 101,583   | 1,015,784

With readdir plus enabled, I see this:

n files |    100    |   1,000   |  10,000   |  100,000  | 1,000,000
--------+-----------+-----------+-----------+-----------+----------
real    | 0m00.115s | 0m00.206s | 0m01.079s | 0m12.521s | 2m07.528s
user    | 0m00.003s | 0m00.003s | 0m00.040s | 0m00.290s | 0m03.296s
sys     | 0m00.007s | 0m00.020s | 0m00.120s | 0m01.357s | 0m17.556s
access  | 3         | 1         | 1         | 1         | 7
getattr | 2         | 1         | 1         | 1         | 1
lookup  | 4         | 3         | 3         | 3         | 3
readdir | 6         | 62        | 630       | 6,300     | 62,993
total   | 15        | 67        | 635       | 6,305     | 63,004

Readdir plus disabled has about a 16x increase in the number of rpc calls and
is 4 - 5 times slower on large directories.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-23 15:27:37 -04:00
Bryan Schumaker
ae42c70a60 NFS: introduce generic decode_getattr function
Getattr should be able to decode errors and the readdir file handle.
decode_getfattr_attrs does the actual attribute decoding, while
decode_getfattr_generic will check the opcode before decoding.  This will
let other functions call decode_getfattr_attrs to decode their attributes.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-23 15:27:37 -04:00
Bryan Schumaker
9942438089 NFS: check xdr_decode for errors
Check if the decoded entry has the eof bit set when returning from xdr_decode
with an error.  If it does, we should set the eof bits in the array before
returning.  This should keep us from looping when we expect more data but the
server doesn't give us anything new.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-23 15:27:36 -04:00
Bryan Schumaker
3c8a1aeed8 NFS: nfs_readdir_filler catch all errors
Check for all errors, not a specific one.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-23 15:27:35 -04:00
Bryan Schumaker
56e4ebf877 NFS: readdir with vmapped pages
We can use vmapped pages to read more information from the network at once.
This will reduce the number of calls needed to complete a readdir.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
[trondmy: Added #include for linux/vmalloc.h> in fs/nfs/dir.c]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-23 15:27:35 -04:00
Bryan Schumaker
afa8ccc978 NFS: remove page size checking code
Remove the page size checking code for a readdir decode.  This is now done
by decode_dirent with xdr_streams.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-23 15:27:34 -04:00
Bryan Schumaker
babddc72a9 NFS: decode_dirent should use an xdr_stream
Convert nfs*xdr.c to use an xdr stream in decode_dirent.  This will prevent a
kernel oops that has been occuring when reading a vmapped page.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-23 15:27:33 -04:00
Bryan Schumaker
0715dc632a NFS: remove readdir plus limit
We will now use readdir plus even on directories that are very large.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-23 15:27:32 -04:00
Bryan Schumaker
d39ab9de3b NFS: re-add readdir plus
This patch adds readdir plus support to the cache array.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-23 15:27:31 -04:00
Trond Myklebust
baf57a09e9 NFS: Optimise the readdir searches
If we're going through the loop in nfs_readdir() more than once, we usually
do not want to restart searching from the beginning of the pages cache.

We only want to do that if the previous search failed...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-23 15:27:30 -04:00
Bryan Schumaker
d1bacf9eb2 NFS: add readdir cache array
This patch adds the readdir cache array and functions to retreive the array
stored on a cache page, clear the array by freeing allocated memory, add an
entry to the array, and search the array for a given cookie.

It then modifies readdir to make use of the new cache array.
With the new cache array method, we no longer need some of this code.

Finally, nfs_llseek_dir() will set file->f_pos to a value greater than 0 and
desc->dir_cookie to zero.  When we see this, readdir needs to find the file
at position file->f_pos from the start of the directory.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-23 15:27:30 -04:00
Randy Dunlap
8c7597f6ce nfs: include ratelimit.h, fix nfs4state build error
nfs4state.c uses interfaces from ratelimit.h.  It needs to include
that header file to fix build errors:

fs/nfs/nfs4state.c:1195: warning: type defaults to 'int' in declaration of 'DEFINE_RATELIMIT_STATE'
fs/nfs/nfs4state.c:1195: warning: parameter names (without types) in function declaration
fs/nfs/nfs4state.c:1195: error: invalid storage class for function 'DEFINE_RATELIMIT_STATE'
fs/nfs/nfs4state.c:1195: error: implicit declaration of function '__ratelimit'
fs/nfs/nfs4state.c:1195: error: '_rs' undeclared (first use in this function)

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc:	Trond Myklebust <Trond.Myklebust@netapp.com>
Cc:	linux-nfs@vger.kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-23 15:27:29 -04:00