Commit Graph

448 Commits

Author SHA1 Message Date
J. R. Okajima 1406b916f4 nfsd: fix lost nfserrno() call in nfsd_setattr()
There is a regression in
	208d0ac 2014-01-07 nfsd4: break only delegations when appropriate
which deletes an nfserrno() call in nfsd_setattr() (by accident,
probably), and NFSD becomes ignoring an error from VFS.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-02-18 12:25:57 -05:00
Linus Torvalds d9894c228b Merge branch 'for-3.14' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
 - Handle some loose ends from the vfs read delegation support.
   (For example nfsd can stop breaking leases on its own in a
    fewer places where it can now depend on the vfs to.)
 - Make life a little easier for NFSv4-only configurations
   (thanks to Kinglong Mee).
 - Fix some gss-proxy problems (thanks Jeff Layton).
 - miscellaneous bug fixes and cleanup

* 'for-3.14' of git://linux-nfs.org/~bfields/linux: (38 commits)
  nfsd: consider CLAIM_FH when handing out delegation
  nfsd4: fix delegation-unlink/rename race
  nfsd4: delay setting current_fh in open
  nfsd4: minor nfs4_setlease cleanup
  gss_krb5: use lcm from kernel lib
  nfsd4: decrease nfsd4_encode_fattr stack usage
  nfsd: fix encode_entryplus_baggage stack usage
  nfsd4: simplify xdr encoding of nfsv4 names
  nfsd4: encode_rdattr_error cleanup
  nfsd4: nfsd4_encode_fattr cleanup
  minor svcauth_gss.c cleanup
  nfsd4: better VERIFY comment
  nfsd4: break only delegations when appropriate
  NFSD: Fix a memory leak in nfsd4_create_session
  sunrpc: get rid of use_gssp_lock
  sunrpc: fix potential race between setting use_gss_proxy and the upcall rpc_clnt
  sunrpc: don't wait for write before allowing reads from use-gss-proxy file
  nfsd: get rid of unused function definition
  Define op_iattr for nfsd4_open instead using macro
  NFSD: fix compile warning without CONFIG_NFSD_V3
  ...
2014-01-30 10:18:43 -08:00
J. Bruce Fields 4335723e8e nfsd4: fix delegation-unlink/rename race
If a file is unlinked or renamed between the time when we do the local
open and the time when we get the delegation, then we will return to the
client indicating that it holds a delegation even though the file no
longer exists under the name it was open under.

But a client performing an open-by-name, when it is returned a
delegation, must be able to assume that the file is still linked at the
name it was opened under.

So, hold the parent i_mutex for longer to prevent concurrent renames or
unlinks.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-01-27 13:59:16 -05:00
Christoph Hellwig 4ac7249ea5 nfsd: use get_acl and ->set_acl
Remove the boilerplate code to marshall and unmarhall ACL objects into
xattrs and operate on the posix_acl objects directly.  Also move all
the ACL handling code into nfs?acl.c where it belongs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-01-26 08:26:41 -05:00
J. Bruce Fields 208d0acc49 nfsd4: break only delegations when appropriate
As a temporary fix, nfsd was breaking all leases on unlink, link,
rename, and setattr.

Now that we can distinguish between leases and delegations, we can be
nicer and break only the delegations, and not bother lease-holders with
operations they don't care about.

And we get to delete some code while we're at it.

Note that in the presence of delegations the vfs calls here all return
-EWOULDBLOCK instead of blocking, so nfsd threads will not get stuck
waiting for delegation returns.

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-01-07 16:01:15 -05:00
Stanislav Kholmanskikh c4fa6d7c59 nfsd: revoking of suid/sgid bits after chown() in a consistent way
There is an inconsistency in the handling of SUID/SGID file
bits after chown() between NFS and other local file systems.

Local file systems (for example, ext3, ext4, xfs, btrfs) revoke
SUID/SGID bits after chown() on a regular file even if
the owner/group of the file has not been changed:

~# touch file; chmod ug+s file; chmod u+x file
~# ls -l file
-rwsr-Sr-- 1 root root 0 Dec  6 04:49 file
~# chown root file; ls -l file
-rwxr-Sr-- 1 root root 0 Dec  6 04:49 file

but NFS doesn't do that:

~# touch file; chmod ug+s file; chmod u+x file
~# ls -l file
-rwsr-Sr-- 1 root root 0 Dec  6 04:49 file
~# chown root file; ls -l file
-rwsr-Sr-- 1 root root 0 Dec  6 04:49 file

NFS does that only if the owner/group has been changed:

~# touch file; chmod ug+s file; chmod u+x file
~# ls -l file
-rwsr-Sr-- 1 root root 0 Dec  6 05:02 file
~# chown bin file; ls -l file
-rwxr-Sr-- 1 bin root 0 Dec  6 05:02 file

See: http://pubs.opengroup.org/onlinepubs/9699919799/functions/chown.html

 "If the specified file is a regular file, one or more of
 the S_IXUSR, S_IXGRP, or S_IXOTH bits of the file mode are set,
 and the process has appropriate privileges, it is
 implementation-defined whether the set-user-ID and set-group-ID
 bits are altered."

So both variants are acceptable by POSIX.

This patch makes NFS to behave like local file systems.

Signed-off-by: Stanislav Kholmanskikh <stanislav.kholmanskikh@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-12-12 11:37:18 -05:00
Christoph Hellwig 987da47910 nfsd: make sure to balance get/put_write_access
Use a straight goto error label style in nfsd_setattr to make sure
we always do the put_write_access call after we got it earlier.

Note that the we have been failing to do that in the case
nfsd_break_lease() returns an error, a bug introduced into 2.6.38 with
6a76bebefe "nfsd4: break lease on nfsd
setattr".

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-11-18 12:06:48 -05:00
Christoph Hellwig 818e5a22e9 nfsd: split up nfsd_setattr
Split out two helpers to make the code more readable and easier to verify
for correctness.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-11-18 12:06:47 -05:00
J. Bruce Fields 27ac0ffeac locks: break delegations on any attribute modification
NFSv4 uses leases to guarantee that clients can cache metadata as well
as data.

Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Cc: David Howells <dhowells@redhat.com>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:44 -05:00
J. Bruce Fields 146a8595c6 locks: break delegations on link
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:43 -05:00
J. Bruce Fields 8e6d782cab locks: break delegations on rename
Cc: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:43 -05:00
J. Bruce Fields b21996e36c locks: break delegations on unlink
We need to break delegations on any operation that changes the set of
links pointing to an inode.  Start with unlink.

Such operations also hold the i_mutex on a parent directory.  Breaking a
delegation may require waiting for a timeout (by default 90 seconds) in
the case of a unresponsive NFS client.  To avoid blocking all directory
operations, we therefore drop locks before waiting for the delegation.
The logic then looks like:

	acquire locks
	...
	test for delegation; if found:
		take reference on inode
		release locks
		wait for delegation break
		drop reference on inode
		retry

It is possible this could never terminate.  (Even if we take precautions
to prevent another delegation being acquired on the same inode, we could
get a different inode on each retry.)  But this seems very unlikely.

The initial test for a delegation happens after the lock on the target
inode is acquired, but the directory inode may have been acquired
further up the call stack.  We therefore add a "struct inode **"
argument to any intervening functions, which we use to pass the inode
back up to the caller in the case it needs a delegation synchronously
broken.

Cc: David Howells <dhowells@redhat.com>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:42 -05:00
Al Viro a6a9f18f0a nfsd: switch to %p[dD]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:34:51 -04:00
Harshula Jayasuriya e4daf1ffbe nfsd: nfsd_open: when dentry_open returns an error do not propagate as struct file
The following call chain:
------------------------------------------------------------
nfs4_get_vfs_file
- nfsd_open
  - dentry_open
    - do_dentry_open
      - __get_file_write_access
        - get_write_access
          - return atomic_inc_unless_negative(&inode->i_writecount) ? 0 : -ETXTBSY;
------------------------------------------------------------

can result in the following state:
------------------------------------------------------------
struct nfs4_file {
...
  fi_fds = {0xffff880c1fa65c80, 0xffffffffffffffe6, 0x0},
  fi_access = {{
      counter = 0x1
    }, {
      counter = 0x0
    }},
...
------------------------------------------------------------

1) First time around, in nfs4_get_vfs_file() fp->fi_fds[O_WRONLY] is
NULL, hence nfsd_open() is called where we get status set to an error
and fp->fi_fds[O_WRONLY] to -ETXTBSY. Thus we do not reach
nfs4_file_get_access() and fi_access[O_WRONLY] is not incremented.

2) Second time around, in nfs4_get_vfs_file() fp->fi_fds[O_WRONLY] is
NOT NULL (-ETXTBSY), so nfsd_open() is NOT called, but
nfs4_file_get_access() IS called and fi_access[O_WRONLY] is incremented.
Thus we leave a landmine in the form of the nfs4_file data structure in
an incorrect state.

3) Eventually, when __nfs4_file_put_access() is called it finds
fi_access[O_WRONLY] being non-zero, it decrements it and calls
nfs4_file_put_fd() which tries to fput -ETXTBSY.
------------------------------------------------------------
...
     [exception RIP: fput+0x9]
     RIP: ffffffff81177fa9  RSP: ffff88062e365c90  RFLAGS: 00010282
     RAX: ffff880c2b3d99cc  RBX: ffff880c2b3d9978  RCX: 0000000000000002
     RDX: dead000000100101  RSI: 0000000000000001  RDI: ffffffffffffffe6
     RBP: ffff88062e365c90   R8: ffff88041fe797d8   R9: ffff88062e365d58
     R10: 0000000000000008  R11: 0000000000000000  R12: 0000000000000001
     R13: 0000000000000007  R14: 0000000000000000  R15: 0000000000000000
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
  #9 [ffff88062e365c98] __nfs4_file_put_access at ffffffffa0562334 [nfsd]
 #10 [ffff88062e365cc8] nfs4_file_put_access at ffffffffa05623ab [nfsd]
 #11 [ffff88062e365ce8] free_generic_stateid at ffffffffa056634d [nfsd]
 #12 [ffff88062e365d18] release_open_stateid at ffffffffa0566e4b [nfsd]
 #13 [ffff88062e365d38] nfsd4_close at ffffffffa0567401 [nfsd]
 #14 [ffff88062e365d88] nfsd4_proc_compound at ffffffffa0557f28 [nfsd]
 #15 [ffff88062e365dd8] nfsd_dispatch at ffffffffa054543e [nfsd]
 #16 [ffff88062e365e18] svc_process_common at ffffffffa04ba5a4 [sunrpc]
 #17 [ffff88062e365e98] svc_process at ffffffffa04babe0 [sunrpc]
 #18 [ffff88062e365eb8] nfsd at ffffffffa0545b62 [nfsd]
 #19 [ffff88062e365ee8] kthread at ffffffff81090886
 #20 [ffff88062e365f48] kernel_thread at ffffffff8100c14a
------------------------------------------------------------

Cc: stable@vger.kernel.org
Signed-off-by: Harshula Jayasuriya <harshula@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-07-23 12:15:32 -04:00
Linus Torvalds 0ff08ba5d0 Merge branch 'for-3.11' of git://linux-nfs.org/~bfields/linux
Pull nfsd changes from Bruce Fields:
 "Changes this time include:

   - 4.1 enabled on the server by default: the last 4.1-specific issues
     I know of are fixed, so we're not going to find the rest of the
     bugs without more exposure.
   - Experimental support for NFSv4.2 MAC Labeling (to allow running
     selinux over NFS), from Dave Quigley.
   - Fixes for some delicate cache/upcall races that could cause rare
     server hangs; thanks to Neil Brown and Bodo Stroesser for extreme
     debugging persistence.
   - Fixes for some bugs found at the recent NFS bakeathon, mostly v4
     and v4.1-specific, but also a generic bug handling fragmented rpc
     calls"

* 'for-3.11' of git://linux-nfs.org/~bfields/linux: (31 commits)
  nfsd4: support minorversion 1 by default
  nfsd4: allow destroy_session over destroyed session
  svcrpc: fix failures to handle -1 uid's
  sunrpc: Don't schedule an upcall on a replaced cache entry.
  net/sunrpc: xpt_auth_cache should be ignored when expired.
  sunrpc/cache: ensure items removed from cache do not have pending upcalls.
  sunrpc/cache: use cache_fresh_unlocked consistently and correctly.
  sunrpc/cache: remove races with queuing an upcall.
  nfsd4: return delegation immediately if lease fails
  nfsd4: do not throw away 4.1 lock state on last unlock
  nfsd4: delegation-based open reclaims should bypass permissions
  svcrpc: don't error out on small tcp fragment
  svcrpc: fix handling of too-short rpc's
  nfsd4: minor read_buf cleanup
  nfsd4: fix decoding of compounds across page boundaries
  nfsd4: clean up nfs4_open_delegation
  NFSD: Don't give out read delegations on creates
  nfsd4: allow client to send no cb_sec flavors
  nfsd4: fail attempts to request gss on the backchannel
  nfsd4: implement minimal SP4_MACH_CRED
  ...
2013-07-11 10:17:13 -07:00
Al Viro ac6614b764 [readdir] constify ->actor
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:57:05 +04:00
Al Viro 5c0ba4e076 [readdir] introduce iterate_dir() and dir_context
iterate_dir(): new helper, replacing vfs_readdir().

struct dir_context: contains the readdir callback (and will get more stuff
in it), embedded into whatever data that callback wants to deal with;
eventually, we'll be passing it to ->readdir() replacement instead of
(data,filldir) pair.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:46:46 +04:00
David Quigley 18032ca062 NFSD: Server implementation of MAC Labeling
Implement labeled NFS on the server: encoding and decoding, and writing
and reading, of file labels.

Enabled with CONFIG_NFSD_V4_SECURITY_LABEL.

Signed-off-by: Matthew N. Dodd <Matthew.Dodd@sparta.com>
Signed-off-by: Miguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg>
Signed-off-by: Phua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg>
Signed-off-by: Khin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-05-15 09:27:02 -04:00
J. Bruce Fields aa387d6ce1 nfsd: fix EXDEV checking in rename
We again check for the EXDEV a little later on, so the first check is
redundant.  This check is also slightly racier, since a badly timed
eviction from the export cache could leave us with the two fh_export
pointers pointing to two different cache entries which each refer to the
same underlying export.

It's better to compare vfsmounts as the later check does, but that
leaves a minor security hole in the case where the two exports refer to
two different directories especially if (for example) they have
different root-squashing options.

So, compare ex_path.dentry too.

Reported-by: Joe Habermann <joe.habermann@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-26 16:18:15 -04:00
Kent Overstreet e49dbbf3e7 nfsd: fix bad offset use
vfs_writev() updates the offset argument - but the code then passes the
offset to vfs_fsync_range(). Since offset now points to the offset after
what was just written, this is probably not what was intended

Introduced by face15025f "nfsd: use
vfs_fsync_range(), not O_SYNC, for stable writes".

Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: stable@vger.kernel.org
Reviewed-by: Zach Brown <zab@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-03-22 16:55:15 -04:00
Linus Torvalds d895cb1af1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile (part one) from Al Viro:
 "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
  locking violations, etc.

  The most visible changes here are death of FS_REVAL_DOT (replaced with
  "has ->d_weak_revalidate()") and a new helper getting from struct file
  to inode.  Some bits of preparation to xattr method interface changes.

  Misc patches by various people sent this cycle *and* ocfs2 fixes from
  several cycles ago that should've been upstream right then.

  PS: the next vfs pile will be xattr stuff."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
  saner proc_get_inode() calling conventions
  proc: avoid extra pde_put() in proc_fill_super()
  fs: change return values from -EACCES to -EPERM
  fs/exec.c: make bprm_mm_init() static
  ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
  ocfs2: fix possible use-after-free with AIO
  ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
  get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
  target: writev() on single-element vector is pointless
  export kernel_write(), convert open-coded instances
  fs: encode_fh: return FILEID_INVALID if invalid fid_type
  kill f_vfsmnt
  vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
  nfsd: handle vfs_getattr errors in acl protocol
  switch vfs_getattr() to struct path
  default SET_PERSONALITY() in linux/elf.h
  ceph: prepopulate inodes only when request is aborted
  d_hash_and_lookup(): export, switch open-coded instances
  9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
  9p: split dropping the acls from v9fs_set_create_acl()
  ...
2013-02-26 20:16:07 -08:00
Al Viro 496ad9aa8e new helper: file_inode(file)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22 23:31:31 -05:00
Eric W. Biederman 6fab877900 nfsd: Properly compare and initialize kuids and kgids
Use uid_eq(uid, GLOBAL_ROOT_UID) instead of !uid.
Use gid_eq(gid, GLOBAL_ROOT_GID) instead of !gid.
Use uid_eq(uid, INVALID_UID) instead of uid == -1
Use gid_eq(uid, INVALID_GID) instead of gid == -1
Use uid = GLOBAL_ROOT_UID instead of uid = 0;
Use gid = GLOBAL_ROOT_GID instead of gid = 0;
Use !uid_eq(uid1, uid2) instead of uid1 != uid2.
Use !gid_eq(gid1, gid2) instead of gid1 != gid2.
Use uid_eq(uid1, uid2) instead of uid1 == uid2.

Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13 06:16:09 -08:00
J. Bruce Fields 10532b560b Revert "nfsd: warn on odd reply state in nfsd_vfs_read"
This reverts commit 79f77bf9a4.

This is obviously wrong, and I have no idea how I missed seeing the
warning in testing: I must just not have looked at the right logs.  The
caller bumps rq_resused/rq_next_page, so it will always be hit on a
large enough read.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-21 17:07:45 -08:00
J. Bruce Fields afc59400d6 nfsd4: cleanup: replace rq_resused count by rq_next_page pointer
It may be a matter of personal taste, but I find this makes the code
clearer.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-17 22:00:16 -05:00
J. Bruce Fields 79f77bf9a4 nfsd: warn on odd reply state in nfsd_vfs_read
As far as I can tell this shouldn't currently happen--or if it does,
something is wrong and data is going to be corrupted.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-17 21:55:46 -05:00
Neil Brown 7007c90fb9 nfsd: avoid permission checks on EXCLUSIVE_CREATE replay
With NFSv4, if we create a file then open it we explicit avoid checking
the permissions on the file during the open because the fact that we
created it ensures we should be allow to open it (the create and the
open should appear to be a single operation).

However if the reply to an EXCLUSIVE create gets lots and the client
resends the create, the current code will perform the permission check -
because it doesn't realise that it did the open already..

This patch should fix this.

Note that I haven't actually seen this cause a problem.  I was just
looking at the code trying to figure out a different EXCLUSIVE open
related issue, and this looked wrong.

(Fix confirmed with pynfs 4.0 test OPEN4--bfields)

Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
[bfields: use OWNER_OVERRIDE and update for 4.1]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:31 -05:00
J. Bruce Fields face15025f nfsd: use vfs_fsync_range(), not O_SYNC, for stable writes
NFSv4 shares the same struct file across multiple writes.  (And we'd
like NFSv2 and NFSv3 to do that as well some day.)

So setting O_SYNC on the struct file as a way to request a synchronous
write doesn't work.

Instead, do a vfs_fsync_range() in that case.

Reported-by: Peter Staubach <pstaubach@exagrid.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-07 19:31:34 -05:00
J. Bruce Fields fae5096ad2 nfsd: assume writeable exportabled filesystems have f_sync
I don't really see how you could claim to support nfsd and not support
fsync somehow.

And in practice a quick look through the exportable filesystems suggests
the only ones without an ->fsync are read-only (efs, isofs, squashfs) or
in-memory (shmem).

Also, performing a write and then returning an error if the sync fails
(as we would do here in the wgather case) seems unhelpful to clients.

Also remove an incorrect comment.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-07 19:31:33 -05:00
J. Bruce Fields f474af7051 UAPI Disintegration 2012-10-09
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIVAwUAUHPmWxOxKuMESys7AQKN4w//XDwALfbf0MXIw+gwyRiUtJe9mGexvI6X
 1R4FWU9a3ImzEZP4cWnmPGT2wmC/x007DcIvx8cyvbdlSuqtR2i/DC+HbWabiLRn
 nJS7Eer1BJvLv5dn6NmXMEz7yB4Z46+frcmBs3WQeR0sqBMDm+rjQzCqECznO8Jc
 VtCbox+VR2DuWcM++YECTblYEH3Z+doDXUN2eBaD8L9x3klPbPXD7OcRyOnry8w+
 ynmUTKKyH4+hpxDakYrObPIg+vFCxb4QRck1mlgA4wbvb3eqjhM0oOCYJ8GvmILA
 vdFYztWCjkiuOl5djtXBlsClX8SAMOBYlRed+R1GvjNCSR+WCWrFJJ2F8qoQ1w87
 9ts2/8qrozS8luTB475SkT2uLdJkIUKX89Oh+dWeE8YkbPnRPj5lNAdtNY5QSyDq
 VaRpIo+YfmZygyvHJQlAXBuZ0mvzcPzArfcPgSVTD3B7xTEGVu/45V7SnQX5os/V
 v39ySPXMdGOIdvK51gw7OtZl64uqrEKu39PyYDX/GUADflp/CHD0J7PJrQePbsH9
 AQolVZDIxTfKqYQnUdL8+C8Zc24RowEzz3c2+aO89MSzwGqev3q8sXRVbW/Iqryg
 p+V3nHe+ipKcga5tOBlPr9KDtDd7j3xN2yaIwf5/QyO1OHBpjAZP1gjSVDcUcwpi
 svYy4kPn3PA=
 =etoL
 -----END PGP SIGNATURE-----

nfs: disintegrate UAPI for nfs

This is to complete part of the Userspace API (UAPI) disintegration for which
the preparatory patches were pulled recently.  After these patches, userspace
headers will be segregated into:

        include/uapi/linux/.../foo.h

for the userspace interface stuff, and:

        include/linux/.../foo.h

for the strictly kernel internal stuff.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-10-09 18:35:22 -04:00
Eric W. Biederman 5f3a4a28ec userns: Pass a userns parameter into posix_acl_to_xattr and posix_acl_from_xattr
- Pass the user namespace the uid and gid values in the xattr are stored
   in into posix_acl_from_xattr.

 - Pass the user namespace kuid and kgid values should be converted into
   when storing uid and gid values in an xattr in posix_acl_to_xattr.

- Modify all callers of posix_acl_from_xattr and posix_acl_to_xattr to
  pass in &init_user_ns.

In the short term this change is not strictly needed but it makes the
code clearer.  In the longer term this change is necessary to be able to
mount filesystems outside of the initial user namespace that natively
store posix acls in the linux xattr format.

Cc: Theodore Tso <tytso@mit.edu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-09-18 01:01:35 -07:00
J. Bruce Fields fac7a17b5f nfsd4: cast readlink() bug argument
As we already do in readv, writev.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-09-10 17:46:19 -04:00
Linus Torvalds a0e881b7c1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull second vfs pile from Al Viro:
 "The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the
  deadlock reproduced by xfstests 068), symlink and hardlink restriction
  patches, plus assorted cleanups and fixes.

  Note that another fsfreeze deadlock (emergency thaw one) is *not*
  dealt with - the series by Fernando conflicts a lot with Jan's, breaks
  userland ABI (FIFREEZE semantics gets changed) and trades the deadlock
  for massive vfsmount leak; this is going to be handled next cycle.
  There probably will be another pull request, but that stuff won't be
  in it."

Fix up trivial conflicts due to unrelated changes next to each other in
drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c}

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
  delousing target_core_file a bit
  Documentation: Correct s_umount state for freeze_fs/unfreeze_fs
  fs: Remove old freezing mechanism
  ext2: Implement freezing
  btrfs: Convert to new freezing mechanism
  nilfs2: Convert to new freezing mechanism
  ntfs: Convert to new freezing mechanism
  fuse: Convert to new freezing mechanism
  gfs2: Convert to new freezing mechanism
  ocfs2: Convert to new freezing mechanism
  xfs: Convert to new freezing code
  ext4: Convert to new freezing mechanism
  fs: Protect write paths by sb_start_write - sb_end_write
  fs: Skip atime update on frozen filesystem
  fs: Add freezing handling to mnt_want_write() / mnt_drop_write()
  fs: Improve filesystem freezing handling
  switch the protection of percpu_counter list to spinlock
  nfsd: Push mnt_want_write() outside of i_mutex
  btrfs: Push mnt_want_write() outside of i_mutex
  fat: Push mnt_want_write() outside of i_mutex
  ...
2012-08-01 10:26:23 -07:00
Linus Torvalds 08843b79fb Merge branch 'nfsd-next' of git://linux-nfs.org/~bfields/linux
Pull nfsd changes from J. Bruce Fields:
 "This has been an unusually quiet cycle--mostly bugfixes and cleanup.
  The one large piece is Stanislav's work to containerize the server's
  grace period--but that in itself is just one more step in a
  not-yet-complete project to allow fully containerized nfs service.

  There are a number of outstanding delegation, container, v4 state, and
  gss patches that aren't quite ready yet; 3.7 may be wilder."

* 'nfsd-next' of git://linux-nfs.org/~bfields/linux: (35 commits)
  NFSd: make boot_time variable per network namespace
  NFSd: make grace end flag per network namespace
  Lockd: move grace period management from lockd() to per-net functions
  LockD: pass actual network namespace to grace period management functions
  LockD: manage grace list per network namespace
  SUNRPC: service request network namespace helper introduced
  NFSd: make nfsd4_manager allocated per network namespace context.
  LockD: make lockd manager allocated per network namespace
  LockD: manage grace period per network namespace
  Lockd: add more debug to host shutdown functions
  Lockd: host complaining function introduced
  LockD: manage used host count per networks namespace
  LockD: manage garbage collection timeout per networks namespace
  LockD: make garbage collector network namespace aware.
  LockD: mark host per network namespace on garbage collect
  nfsd4: fix missing fault_inject.h include
  locks: move lease-specific code out of locks_delete_lock
  locks: prevent side-effects of locks_release_private before file_lock is initialized
  NFSd: set nfsd_serv to NULL after service destruction
  NFSd: introduce nfsd_destroy() helper
  ...
2012-07-31 14:42:28 -07:00
Jan Kara 4a55c1017b nfsd: Push mnt_want_write() outside of i_mutex
When mnt_want_write() starts to handle freezing it will get a full lock
semantics requiring proper lock ordering. So push mnt_want_write() call
consistently outside of i_mutex.

CC: linux-nfs@vger.kernel.org
CC: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-31 01:02:51 +04:00
Al Viro 765927b2d5 switch dentry_open() to struct path, make it grab references itself
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-23 00:01:29 +04:00
Al Viro 312b63fba9 don't pass nameidata * to vfs_create()
all we want is a boolean flag, same as the method gets now

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:50 +04:00
J. Bruce Fields d91d0b5690 nfsd: allow owner_override only for regular files
We normally allow the owner of a file to override permissions checks on
IO operations, since:
	- the client will take responsibility for doing an access check
	  on open;
	- the permission checks offer no protection against malicious
	  clients--if they can authenticate as the file's owner then
	  they can always just change its permissions;
	- checking permission on each IO operation breaks the usual
	  posix rule that permission is checked only on open.

However, we've never allowed the owner to override permissions on
readdir operations, even though the above logic would also apply to
directories.  I've never heard of this causing a problem, probably
because a) simultaneously opening and creating a directory (with
restricted mode) isn't possible, and b) opening a directory, then
chmod'ing it, is rare.

Our disallowal of owner-override on directories appears to be an
accident, though--the readdir itself succeeds, and then we fail just
because lookup_one_len() calls in our filldir methods fail.

I'm not sure what the easiest fix for that would be.  For now, just make
this behavior obvious by denying the override right at the start.

This also fixes some odd v4 behavior: with the rdattr_error attribute
requested, it would perform the readdir but return an ACCES error with
each entry.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-10 16:41:35 -04:00
Jeff Layton b108fe6b08 nfsd: trivial: use SEEK_SET instead of 0 in vfs_llseek
They're equivalent, but SEEK_SET is more informative...

Cc: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-04-25 15:38:23 -04:00
J. Bruce Fields 9dc4e6c4d1 nfsd: don't fail unchecked creates of non-special files
Allow a v3 unchecked open of a non-regular file succeed as if it were a
lookup; typically a client in such a case will want to fall back on a
local open, so succeeding and giving it the filehandle is more useful
than failing with nfserr_exist, which makes it appear that nothing at
all exists by that name.

Similarly for v4, on an open-create, return the same errors we would on
an attempt to open a non-regular file, instead of returning
nfserr_exist.

This fixes a problem found doing a v4 open of a symlink with
O_RDONLY|O_CREAT, which resulted in the current client returning EEXIST.

Thanks also to Trond for analysis.

Cc: stable@kernel.org
Reported-by: Orion Poplawski <orion@cora.nwra.com>
Tested-by: Orion Poplawski <orion@cora.nwra.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-04-11 17:49:52 -04:00
Linus Torvalds 71db34fc43 Merge branch 'for-3.4' of git://linux-nfs.org/~bfields/linux
Pull nfsd changes from Bruce Fields:

Highlights:
 - Benny Halevy and Tigran Mkrtchyan implemented some more 4.1 features,
   moving us closer to a complete 4.1 implementation.
 - Bernd Schubert fixed a long-standing problem with readdir cookies on
   ext2/3/4.
 - Jeff Layton performed a long-overdue overhaul of the server reboot
   recovery code which will allow us to deprecate the current code (a
   rather unusual user of the vfs), and give us some needed flexibility
   for further improvements.
 - Like the client, we now support numeric uid's and gid's in the
   auth_sys case, allowing easier upgrades from NFSv2/v3 to v4.x.

Plus miscellaneous bugfixes and cleanup.

Thanks to everyone!

There are also some delegation fixes waiting on vfs review that I
suppose will have to wait for 3.5.  With that done I think we'll finally
turn off the "EXPERIMENTAL" dependency for v4 (though that's mostly
symbolic as it's been on by default in distro's for a while).

And the list of 4.1 todo's should be achievable for 3.5 as well:

   http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues

though we may still want a bit more experience with it before turning it
on by default.

* 'for-3.4' of git://linux-nfs.org/~bfields/linux: (55 commits)
  nfsd: only register cld pipe notifier when CONFIG_NFSD_V4 is enabled
  nfsd4: use auth_unix unconditionally on backchannel
  nfsd: fix NULL pointer dereference in cld_pipe_downcall
  nfsd4: memory corruption in numeric_name_to_id()
  sunrpc: skip portmap calls on sessions backchannel
  nfsd4: allow numeric idmapping
  nfsd: don't allow legacy client tracker init for anything but init_net
  nfsd: add notifier to handle mount/unmount of rpc_pipefs sb
  nfsd: add the infrastructure to handle the cld upcall
  nfsd: add a header describing upcall to nfsdcld
  nfsd: add a per-net-namespace struct for nfsd
  sunrpc: create nfsd dir in rpc_pipefs
  nfsd: add nfsd4_client_tracking_ops struct and a way to set it
  nfsd: convert nfs4_client->cl_cb_flags to a generic flags field
  NFSD: Fix nfs4_verifier memory alignment
  NFSD: Fix warnings when NFSD_DEBUG is not defined
  nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes)
  nfsd: rename 'int access' to 'int may_flags' in nfsd_open()
  ext4: return 32/64-bit dir name hash according to usage type
  fs: add new FMODE flags: FMODE_32bithash and FMODE_64bithash
  ...
2012-03-29 14:53:25 -07:00
Al Viro 68ac1234fb switch touch_atime to struct path
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-20 21:29:41 -04:00
Bernd Schubert 06effdbb49 nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes)
Use 32-bit or 64-bit llseek() hashes for directory offsets depending on
the NFS version. NFSv2 gets 32-bit hashes only.

NOTE: This patch got rather complex as Christoph asked to set the
filp->f_mode flag in the open call or immediatly after dentry_open()
in nfsd_open() to avoid races.
Personally I still do not see a reason for that and in my opinion
FMODE_32BITHASH/FMODE_64BITHASH flags could be set nfsd_readdir(), as it
follows directly after nfsd_open() without a chance of races.

Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: J. Bruce Fields<bfields@redhat.com>
2012-03-18 22:44:50 -04:00
Bernd Schubert 999448a8c0 nfsd: rename 'int access' to 'int may_flags' in nfsd_open()
Just rename this variable, as the next patch will add a flag and
'access' as variable name would not be correct any more.

Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: J. Bruce Fields<bfields@redhat.com>
2012-03-18 22:44:49 -04:00
Linus Torvalds 0b48d42235 Merge branch 'for-3.3' of git://linux-nfs.org/~bfields/linux
* 'for-3.3' of git://linux-nfs.org/~bfields/linux: (31 commits)
  nfsd4: nfsd4_create_clid_dir return value is unused
  NFSD: Change name of extended attribute containing junction
  svcrpc: don't revert to SVC_POOL_DEFAULT on nfsd shutdown
  svcrpc: fix double-free on shutdown of nfsd after changing pool mode
  nfsd4: be forgiving in the absence of the recovery directory
  nfsd4: fix spurious 4.1 post-reboot failures
  NFSD: forget_delegations should use list_for_each_entry_safe
  NFSD: Only reinitilize the recall_lru list under the recall lock
  nfsd4: initialize special stateid's at compile time
  NFSd: use network-namespace-aware cache registering routines
  SUNRPC: create svc_xprt in proper network namespace
  svcrpc: update outdated BKL comment
  nfsd41: allow non-reclaim open-by-fh's in 4.1
  svcrpc: avoid memory-corruption on pool shutdown
  svcrpc: destroy server sockets all at once
  svcrpc: make svc_delete_xprt static
  nfsd: Fix oops when parsing a 0 length export
  nfsd4: Use kmemdup rather than duplicating its implementation
  nfsd4: add a separate (lockowner, inode) lookup
  nfsd4: fix CONFIG_NFSD_FAULT_INJECTION compile error
  ...
2012-01-14 12:26:41 -08:00
Chuck Lever 9b4146e855 NFSD: Change name of extended attribute containing junction
As of fedfs-utils-0.8.0, user space stores all NFS junction
information in a single extended attribute: "trusted.junction.nfs".

Both FedFS and NFS basic junctions are stored in this one attribute,
and the intention is that all future forms of NFS junction metadata
will be stored in this attribute.  Other protocols may use a different
extended attribute.

Thus NFSD needs to look only for that one extended attribute.  The
"trusted.junction.type" xattr is deprecated.  fedfs-utils-0.8.0 will
continue to attach a "trusted.junction.type" xattr to junctions, but
future fedfs-utils releases may no longer do that.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-01-05 15:35:57 -05:00
Al Viro 175a4eb7ea fs: propagate umode_t, misc bits
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:10 -05:00
Al Viro bad0dcffc2 new helpers: fh_{want,drop}_write()
A bunch of places in nfsd does mnt_{want,drop}_write on vfsmount of
export of given fhandle.  Switched to obvious inlined helpers...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:52:35 -05:00
J. Bruce Fields 856121b2e8 nfsd4: warn on open failure after create
If we create the object and then return failure to the client, we're
left with an unexpected file in the filesystem.

I'm trying to eliminate such cases but not 100% sure I have so an
assertion might be helpful for now.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-10-17 17:50:08 -04:00
Trond Myklebust 11fcee0293 NFSD: Add a cache for fs_locations information
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[ cel: since this is server-side, use nfsd4_ prefix instead of nfs4_ prefix. ]
[ cel: implement S_ISVTX filter in bfields-normal form ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-09-13 22:44:17 -04:00
J. Bruce Fields a043226bc1 nfsd4: permit read opens of executable-only files
A client that wants to execute a file must be able to read it.  Read
opens over nfs are therefore implicitly allowed for executable files
even when those files are not readable.

NFSv2/v3 get this right by using a passed-in NFSD_MAY_OWNER_OVERRIDE on
read requests, but NFSv4 has gotten this wrong ever since
dc730e1737 "nfsd4: fix owner-override on
open", when we realized that the file owner shouldn't override
permissions on non-reclaim NFSv4 opens.

So we can't use NFSD_MAY_OWNER_OVERRIDE to tell nfsd_permission to allow
reads of executable files.

So, do the same thing we do whenever we encounter another weird NFS
permission nit: define yet another NFSD_MAY_* flag.

The industry's future standardization on 128-bit processors will be
motivated primarily by the need for integers with enough bits for all
the NFSD_MAY_* flags.

Reported-by: Leonardo Borda <leonardoborda@gmail.com>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27 14:20:20 -04:00
J. Bruce Fields e281d81009 nfsd4: fix incorrect comment in nfsd4_set_nfs4_acl
Zero means "I don't care what kind of file this is".  And that's
probably what we want--acls are also settable at least on directories,
and if the filesystem doesn't want them on other objects, leave it to it
to complain.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-26 18:22:49 -04:00
J. Bruce Fields 7d818a7b8f nfsd: open-code special directory-hardlink check
We allow the fh_verify caller to specify that any object *except* those
of a given type is allowed, by passing a negative type.  But only one
caller actually uses it.  Open-code that check in the one caller.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-26 18:22:47 -04:00
J. Bruce Fields 105f462210 nfsd4: fix break_lease flags on nfsd open
Thanks to Casey Bodley for pointing out that on a read open we pass 0,
instead of O_RDONLY, to break_lease, with the result that a read open is
treated like a write open for the purposes of lease breaking!

Reported-by: Casey Bodley <cbodley@citi.umich.edu>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-06-20 10:38:01 -04:00
Casey Bodley 7d751f6f8c nfsd: link returns nfserr_delay when breaking lease
fix for commit 4795bb37ef, nfsd: break
lease on unlink, link, and rename

if the LINK operation breaks a delegation, it returns NFS4ERR_NOENT
(which is not a valid error in rfc 5661) instead of NFS4ERR_DELAY.
the return value of nfsd_break_lease() in nfsd_link() must be
converted from host_err to err

Signed-off-by: Casey Bodley <cbodley@citi.umich.edu>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-06-06 18:46:56 -04:00
Linus Torvalds a74d70b63f Merge branch 'for-2.6.40' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.40' of git://linux-nfs.org/~bfields/linux: (22 commits)
  nfsd: make local functions static
  NFSD: Remove unused variable from nfsd4_decode_bind_conn_to_session()
  NFSD: Check status from nfsd4_map_bcts_dir()
  NFSD: Remove setting unused variable in nfsd_vfs_read()
  nfsd41: error out on repeated RECLAIM_COMPLETE
  nfsd41: compare request's opcnt with session's maxops at nfsd4_sequence
  nfsd v4.1 lOCKT clientid field must be ignored
  nfsd41: add flag checking for create_session
  nfsd41: make sure nfs server process OPEN with EXCLUSIVE4_1 correctly
  nfsd4: fix wrongsec handling for PUTFH + op cases
  nfsd4: make fh_verify responsibility of nfsd_lookup_dentry caller
  nfsd4: introduce OPDESC helper
  nfsd4: allow fh_verify caller to skip pseudoflavor checks
  nfsd: distinguish functions of NFSD_MAY_* flags
  svcrpc: complete svsk processing on cb receive failure
  svcrpc: take advantage of tcp autotuning
  SUNRPC: Don't wait for full record to receive tcp data
  svcrpc: copy cb reply instead of pages
  svcrpc: close connection if client sends short packet
  svcrpc: note network-order types in svc_process_calldir
  ...
2011-05-29 11:21:12 -07:00
Bryan Schumaker fccb13c947 NFSD: Remove setting unused variable in nfsd_vfs_read()
Compiling gave me this warning:
fs/nfsd/vfs.c: In function ‘nfsd_vfs_read’:
fs/nfsd/vfs.c:880:16: warning: variable ‘inode’ set but not used
[-Wunused-but-set-variable]

I discovered that a local variable "inode" was being set towards the
beginning of nfsd_vfs_read() and then ignored for the rest of the
function.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-04-29 20:47:57 -04:00
Mi Jinlong ac6721a13e nfsd41: make sure nfs server process OPEN with EXCLUSIVE4_1 correctly
The NFS server uses nfsd_create_v3 to handle EXCLUSIVE4_1 opens, but
that function is not prepared to handle them.

Rename nfsd_create_v3() to do_nfsd_create(), and add handling of
EXCLUSIVE4_1.

Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-04-29 20:47:52 -04:00
Sachin Prabhu 1574dff899 Open with O_CREAT flag set fails to open existing files on non writable directories
An open on a NFS4 share using the O_CREAT flag on an existing file for
which we have permissions to open but contained in a directory with no
write permissions will fail with EACCES.

A tcpdump shows that the client had set the open mode to UNCHECKED which
indicates that the file should be created if it doesn't exist and
encountering an existing flag is not an error. Since in this case the
file exists and can be opened by the user, the NFS server is wrong in
attempting to check create permissions on the parent directory.

The patch adds a conditional statement to check for create permissions
only if the file doesn't exist.

Signed-off-by: Sachin S. Prabhu <sprabhu@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-04-20 11:03:01 -04:00
J. Bruce Fields 29a78a3ed7 nfsd4: make fh_verify responsibility of nfsd_lookup_dentry caller
The secinfo caller actually won't want this.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-04-11 08:42:22 -04:00
J. Bruce Fields aea93397db nfsd: distinguish functions of NFSD_MAY_* flags
Most of the NFSD_MAY_* flags actually request permissions, but over the
years we've accreted a few that modify the behavior of the permission or
open code in other ways.

Distinguish the two cases a little more.  In particular, allow the
shortcut at the start of nfsd_permission to ignore the
non-permission-requesting bits.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-04-11 08:42:03 -04:00
Linus Torvalds dc87c55120 Merge branch 'for-2.6.39' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.39' of git://linux-nfs.org/~bfields/linux:
  SUNRPC: Remove resource leak in svc_rdma_send_error()
  nfsd: wrong index used in inner loop
  nfsd4: fix comment and remove unused nfsd4_file fields
  nfs41: make sure nfs server return right ca_maxresponsesize_cached
  nfsd: fix compile error
  svcrpc: fix bad argument in unix_domain_find
  nfsd4: fix struct file leak
  nfsd4: minor nfs4state.c reshuffling
  svcrpc: fix rare race on unix_domain creation
  nfsd41: modify the members value of nfsd4_op_flags
  nfsd: add proc file listing kernel's gss_krb5 enctypes
  gss:krb5 only include enctype numbers in gm_upcall_enctypes
  NFSD, VFS: Remove dead code in nfsd_rename()
  nfsd: kill unused macro definition
  locks: use assign_type()
2011-03-24 08:20:39 -07:00
Al Viro 7cc90cc3ff don't pass 'mounting_here' flag to follow_down()
it's always false now

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-18 09:04:20 -04:00
Jesper Juhl 46d4cef9cf NFSD, VFS: Remove dead code in nfsd_rename()
Currently we have the following code in fs/nfsd/vfs.c::nfsd_rename() :

	...
	host_err = nfsd_break_lease(odentry->d_inode);
	if (host_err)
		goto out_drop_write;
	if (ndentry->d_inode) {
		host_err = nfsd_break_lease(ndentry->d_inode);
		if (host_err)
			goto out_drop_write;
	}
	if (host_err)
		goto out_drop_write;
	...

'host_err' is guaranteed to be 0 by the time we test 'ndentry->d_inode'.
If 'host_err' becomes != 0 inside the 'if' statement, then we goto
'out_drop_write'. So, after the 'if' statement there is no way that
'host_err' can be anything but 0, so the test afterwards is just dead
code.
This patch removes the dead code.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-03-07 12:05:14 -05:00
J. Bruce Fields 83f6b0c182 nfsd: break lease on unlink due to rename
4795bb37ef "nfsd: break lease on unlink,
link, and rename", only broke the lease on the file that was being
renamed, and didn't handle the case where the target path refers to an
already-existing file that will be unlinked by a rename--in that case
the target file should have any leases broken as well.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-02-14 10:35:19 -05:00
Konstantin Khorenko 3aa6e0aa8a NFSD: memory corruption due to writing beyond the stat array
If nfsd fails to find an exported via NFS file in the readahead cache, it
should increment corresponding nfsdstats counter (ra_depth[10]), but due to a
bug it may instead write to ra_depth[11], corrupting the following field.

In a kernel with NFSDv4 compiled in the corruption takes the form of an
increment of a counter of the number of NFSv4 operation 0's received; since
there is no operation 0, this is harmless.

In a kernel with NFSDv4 disabled it corrupts whatever happens to be in the
memory beyond nfsdstats.

Signed-off-by: Konstantin Khorenko <khorenko@openvz.org>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-02-14 10:35:18 -05:00
J. Bruce Fields 541ce98c10 nfsd: don't leak dentry count on mnt_want_write failure
The exit cleanup isn't quite right here.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-02-14 10:31:08 -05:00
Linus Torvalds f8206b925f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (23 commits)
  sanitize vfsmount refcounting changes
  fix old umount_tree() breakage
  autofs4: Merge the remaining dentry ops tables
  Unexport do_add_mount() and add in follow_automount(), not ->d_automount()
  Allow d_manage() to be used in RCU-walk mode
  Remove a further kludge from __do_follow_link()
  autofs4: Bump version
  autofs4: Add v4 pseudo direct mount support
  autofs4: Fix wait validation
  autofs4: Clean up autofs4_free_ino()
  autofs4: Clean up dentry operations
  autofs4: Clean up inode operations
  autofs4: Remove unused code
  autofs4: Add d_manage() dentry operation
  autofs4: Add d_automount() dentry operation
  Remove the automount through follow_link() kludge code from pathwalk
  CIFS: Use d_automount() rather than abusing follow_link()
  NFS: Use d_automount() rather than abusing follow_link()
  AFS: Use d_automount() rather than abusing follow_link()
  Add an AT_NO_AUTOMOUNT flag to suppress terminal automount
  ...
2011-01-16 11:31:50 -08:00
David Howells cc53ce53c8 Add a dentry op to allow processes to be held during pathwalk transit
Add a dentry op (d_manage) to permit a filesystem to hold a process and make it
sleep when it tries to transit away from one of that filesystem's directories
during a pathwalk.  The operation is keyed off a new dentry flag
(DCACHE_MANAGE_TRANSIT).

The filesystem is allowed to be selective about which processes it holds and
which it permits to continue on or prohibits from transiting from each flagged
directory.  This will allow autofs to hold up client processes whilst letting
its userspace daemon through to maintain the directory or the stuff behind it
or mounted upon it.

The ->d_manage() dentry operation:

	int (*d_manage)(struct path *path, bool mounting_here);

takes a pointer to the directory about to be transited away from and a flag
indicating whether the transit is undertaken by do_add_mount() or
do_move_mount() skipping through a pile of filesystems mounted on a mountpoint.

It should return 0 if successful and to let the process continue on its way;
-EISDIR to prohibit the caller from skipping to overmounted filesystems or
automounting, and to use this directory; or some other error code to return to
the user.

->d_manage() is called with namespace_sem writelocked if mounting_here is true
and no other locks held, so it may sleep.  However, if mounting_here is true,
it may not initiate or wait for a mount or unmount upon the parameter
directory, even if the act is actually performed by userspace.

Within fs/namei.c, follow_managed() is extended to check with d_manage() first
on each managed directory, before transiting away from it or attempting to
automount upon it.

follow_down() is renamed follow_down_one() and should only be used where the
filesystem deliberately intends to avoid management steps (e.g. autofs).

A new follow_down() is added that incorporates the loop done by all other
callers of follow_down() (do_add/move_mount(), autofs and NFSD; whilst AFS, NFS
and CIFS do use it, their use is removed by converting them to use
d_automount()).  The new follow_down() calls d_manage() as appropriate.  It
also takes an extra parameter to indicate if it is being called from mount code
(with namespace_sem writelocked) which it passes to d_manage().  follow_down()
ignores automount points so that it can be used to mount on them.

__follow_mount_rcu() is made to abort rcu-walk mode if it hits a directory with
DCACHE_MANAGE_TRANSIT set on the basis that we're probably going to have to
sleep.  It would be possible to enter d_manage() in rcu-walk mode too, and have
that determine whether to abort or not itself.  That would allow the autofs
daemon to continue on in rcu-walk mode.

Note that DCACHE_MANAGE_TRANSIT on a directory should be cleared when it isn't
required as every tranist from that directory will cause d_manage() to be
invoked.  It can always be set again when necessary.

==========================
WHAT THIS MEANS FOR AUTOFS
==========================

Autofs currently uses the lookup() inode op and the d_revalidate() dentry op to
trigger the automounting of indirect mounts, and both of these can be called
with i_mutex held.

autofs knows that the i_mutex will be held by the caller in lookup(), and so
can drop it before invoking the daemon - but this isn't so for d_revalidate(),
since the lock is only held on _some_ of the code paths that call it.  This
means that autofs can't risk dropping i_mutex from its d_revalidate() function
before it calls the daemon.

The bug could manifest itself as, for example, a process that's trying to
validate an automount dentry that gets made to wait because that dentry is
expired and needs cleaning up:

	mkdir         S ffffffff8014e05a     0 32580  24956
	Call Trace:
	 [<ffffffff885371fd>] :autofs4:autofs4_wait+0x674/0x897
	 [<ffffffff80127f7d>] avc_has_perm+0x46/0x58
	 [<ffffffff8009fdcf>] autoremove_wake_function+0x0/0x2e
	 [<ffffffff88537be6>] :autofs4:autofs4_expire_wait+0x41/0x6b
	 [<ffffffff88535cfc>] :autofs4:autofs4_revalidate+0x91/0x149
	 [<ffffffff80036d96>] __lookup_hash+0xa0/0x12f
	 [<ffffffff80057a2f>] lookup_create+0x46/0x80
	 [<ffffffff800e6e31>] sys_mkdirat+0x56/0xe4

versus the automount daemon which wants to remove that dentry, but can't
because the normal process is holding the i_mutex lock:

	automount     D ffffffff8014e05a     0 32581      1              32561
	Call Trace:
	 [<ffffffff80063c3f>] __mutex_lock_slowpath+0x60/0x9b
	 [<ffffffff8000ccf1>] do_path_lookup+0x2ca/0x2f1
	 [<ffffffff80063c89>] .text.lock.mutex+0xf/0x14
	 [<ffffffff800e6d55>] do_rmdir+0x77/0xde
	 [<ffffffff8005d229>] tracesys+0x71/0xe0
	 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

which means that the system is deadlocked.

This patch allows autofs to hold up normal processes whilst the daemon goes
ahead and does things to the dentry tree behind the automouter point without
risking a deadlock as almost no locks are held in d_manage() and none in
d_automount().

Signed-off-by: David Howells <dhowells@redhat.com>
Was-Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-15 20:07:31 -05:00
Linus Torvalds 18bce371ae Merge branch 'for-2.6.38' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.38' of git://linux-nfs.org/~bfields/linux: (62 commits)
  nfsd4: fix callback restarting
  nfsd: break lease on unlink, link, and rename
  nfsd4: break lease on nfsd setattr
  nfsd: don't support msnfs export option
  nfsd4: initialize cb_per_client
  nfsd4: allow restarting callbacks
  nfsd4: simplify nfsd4_cb_prepare
  nfsd4: give out delegations more quickly in 4.1 case
  nfsd4: add helper function to run callbacks
  nfsd4: make sure sequence flags are set after destroy_session
  nfsd4: re-probe callback on connection loss
  nfsd4: set sequence flag when backchannel is down
  nfsd4: keep finer-grained callback status
  rpc: allow xprt_class->setup to return a preexisting xprt
  rpc: keep backchannel xprt as long as server connection
  rpc: move sk_bc_xprt to svc_xprt
  nfsd4: allow backchannel recovery
  nfsd4: support BIND_CONN_TO_SESSION
  nfsd4: modify session list under cl_lock
  Documentation: fl_mylease no longer exists
  ...

Fix up conflicts in fs/nfsd/vfs.c with the vfs-scale work.  The
vfs-scale work touched some msnfs cases, and this merge removes support
for that entirely, so the conflict was trivial to resolve.
2011-01-14 13:17:26 -08:00
J. Bruce Fields 4795bb37ef nfsd: break lease on unlink, link, and rename
Any change to any of the links pointing to an entry should also break
delegations.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-13 21:04:09 -05:00
J. Bruce Fields 6a76bebefe nfsd4: break lease on nfsd setattr
Leases (delegations) should really be broken on any metadata change, not
just on size change.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-13 21:04:08 -05:00
J. Bruce Fields 9ce137eee4 nfsd: don't support msnfs export option
We've long had these pointless #ifdef MSNFS's sprinkled throughout the
code--pointless because MSNFS is always defined (and we give no config
option to make that easy to change).  So we could just remove the
ifdef's and compile the resulting code unconditionally.

But as long as we're there: why not just rip out this code entirely?
The only purpose is to implement the "msnfs" export option which turns
on Windows-like behavior in some cases, and:

	- the export option isn't documented anywhere;
	- the userland utilities (which would need to be able to parse
	  "msnfs" in an export file) don't support it;
	- I don't know how to maintain this, as I don't know what the
	  proper behavior is; and
	- google shows no evidence that anyone has ever used this.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-13 21:04:07 -05:00
Linus Torvalds 275220f0fc Merge branch 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits)
  block: ensure that completion error gets properly traced
  blktrace: add missing probe argument to block_bio_complete
  block cfq: don't use atomic_t for cfq_group
  block cfq: don't use atomic_t for cfq_queue
  block: trace event block fix unassigned field
  block: add internal hd part table references
  block: fix accounting bug on cross partition merges
  kref: add kref_test_and_get
  bio-integrity: mark kintegrityd_wq highpri and CPU intensive
  block: make kblockd_workqueue smarter
  Revert "sd: implement sd_check_events()"
  block: Clean up exit_io_context() source code.
  Fix compile warnings due to missing removal of a 'ret' variable
  fs/block: type signature of major_to_index(int) to major_to_index(unsigned)
  block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p)
  cfq-iosched: don't check cfqg in choose_service_tree()
  fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors
  cdrom: export cdrom_check_events()
  sd: implement sd_check_events()
  sr: implement sr_check_events()
  ...
2011-01-13 10:45:01 -08:00
Nick Piggin b7ab39f631 fs: dcache scale dentry refcount
Make d_count non-atomic and protect it with d_lock. This allows us to ensure a
0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when
we start protecting many other dentry members with d_lock.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:21 +11:00
J. Bruce Fields 2ca72e17e5 nfsd4: move idmap and acl header files into fs/nfsd
These are internal nfsd interfaces.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-04 18:22:09 -05:00
J. Bruce Fields da165dd60e nfsd: remove some unnecessary dropit handling
We no longer need a few of these special cases.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-04 16:49:23 -05:00
Jens Axboe 3603b8eacc Fix compile warnings due to missing removal of a 'ret' variable
Commit a8adbe3 forgot to remove the return variable, kill it.

drivers/block/loop.c: In function 'lo_splice_actor':
drivers/block/loop.c:398: warning: unused variable 'ret'
[...]
fs/nfsd/vfs.c: In function 'nfsd_splice_actor':
fs/nfsd/vfs.c:848: warning: unused variable 'ret'

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-12-20 09:15:19 +01:00
Michał Mirosław a8adbe378b fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors
This patch pulls calls to buf->ops->confirm() from all actors passed
(also indirectly) to splice_from_pipe_feed().

Is avoiding the call to buf->ops->confirm() while splice()ing to
/dev/null is an intentional optimization? No other user does that
and this will remove this special case.

Against current linux.git 6313e3c217.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-12-17 08:56:44 +01:00
Christoph Hellwig c37650161a fs: add sync_inode_metadata
Add a new helper to write out the inode using the writeback code,
that is including the correct dirty bit and list manipulation.  A few
of filesystems already opencode this, and a lot of others should be
using it instead of using write_inode_now which also writes out the
data.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25 21:18:19 -04:00
Takashi Iwai f6360efb83 nfsd: fix NULL dereference in nfsd_statfs()
The commit ebabe9a900
    pass a struct path to vfs_statfs
introduced the struct path initialization, and this seems to trigger
an Oops on my machine.

fh_dentry field may be NULL and set later in fh_verify(), thus the
initialization of path must be after fh_verify().

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-08-26 13:23:16 -04:00
Linus Torvalds 8c8946f509 Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify
* 'for-linus' of git://git.infradead.org/users/eparis/notify: (132 commits)
  fanotify: use both marks when possible
  fsnotify: pass both the vfsmount mark and inode mark
  fsnotify: walk the inode and vfsmount lists simultaneously
  fsnotify: rework ignored mark flushing
  fsnotify: remove global fsnotify groups lists
  fsnotify: remove group->mask
  fsnotify: remove the global masks
  fsnotify: cleanup should_send_event
  fanotify: use the mark in handler functions
  audit: use the mark in handler functions
  dnotify: use the mark in handler functions
  inotify: use the mark in handler functions
  fsnotify: send fsnotify_mark to groups in event handling functions
  fsnotify: Exchange list heads instead of moving elements
  fsnotify: srcu to protect read side of inode and vfsmount locks
  fsnotify: use an explicit flag to indicate fsnotify_destroy_mark has been called
  fsnotify: use _rcu functions for mark list traversal
  fsnotify: place marks on object in order of group memory address
  vfs/fsnotify: fsnotify_close can delay the final work in fput
  fsnotify: store struct file not struct path
  ...

Fix up trivial delete/modify conflict in fs/notify/inotify/inotify.c.
2010-08-10 11:39:13 -07:00
Linus Torvalds 5f248c9c25 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (96 commits)
  no need for list_for_each_entry_safe()/resetting with superblock list
  Fix sget() race with failing mount
  vfs: don't hold s_umount over close_bdev_exclusive() call
  sysv: do not mark superblock dirty on remount
  sysv: do not mark superblock dirty on mount
  btrfs: remove junk sb_dirt change
  BFS: clean up the superblock usage
  AFFS: wait for sb synchronization when needed
  AFFS: clean up dirty flag usage
  cifs: truncate fallout
  mbcache: fix shrinker function return value
  mbcache: Remove unused features
  add f_flags to struct statfs(64)
  pass a struct path to vfs_statfs
  update VFS documentation for method changes.
  All filesystems that need invalidate_inode_buffers() are doing that explicitly
  convert remaining ->clear_inode() to ->evict_inode()
  Make ->drop_inode() just return whether inode needs to be dropped
  fs/inode.c:clear_inode() is gone
  fs/inode.c:evict() doesn't care about delete vs. non-delete paths now
  ...

Fix up trivial conflicts in fs/nilfs2/super.c
2010-08-10 11:26:52 -07:00
Christoph Hellwig ebabe9a900 pass a struct path to vfs_statfs
We'll need the path to implement the flags field for statvfs support.
We do have it available in all callers except:

 - ecryptfs_statfs.  This one doesn't actually need vfs_statfs but just
   needs to do a caller to the lower filesystem statfs method.
 - sys_ustat.  Add a non-exported statfs_by_dentry helper for it which
   doesn't won't be able to fill out the flags field later on.

In addition rename the helpers for statfs vs fstatfs to do_*statfs instead
of the misleading vfs prefix.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:48:42 -04:00
Linus Torvalds 0d9f9e122c Merge branch 'for-2.6.36' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.36' of git://linux-nfs.org/~bfields/linux: (34 commits)
  nfsd4: fix file open accounting for RDWR opens
  nfsd: don't allow setting maxblksize after svc created
  nfsd: initialize nfsd versions before creating svc
  net: sunrpc: removed duplicated #include
  nfsd41: Fix a crash when a callback is retried
  nfsd: fix startup/shutdown order bug
  nfsd: minor nfsd read api cleanup
  gcc-4.6: nfsd: fix initialized but not read warnings
  nfsd4: share file descriptors between stateid's
  nfsd4: fix openmode checking on IO using lock stateid
  nfsd4: miscellaneous process_open2 cleanup
  nfsd4: don't pretend to support write delegations
  nfsd: bypass readahead cache when have struct file
  nfsd: minor nfsd_svc() cleanup
  nfsd: move more into nfsd_startup()
  nfsd: just keep single lockd reference for nfsd
  nfsd: clean up nfsd_create_serv error handling
  nfsd: fix error handling in __write_ports_addxprt
  nfsd: fix error handling when starting nfsd with rpcbind down
  nfsd4: fix v4 state shutdown error paths
  ...
2010-08-07 14:24:41 -07:00
J. Bruce Fields 039a87ca53 nfsd: minor nfsd read api cleanup
Christoph points that the NFSv2/v3 callers know which case they want
here, so we may as well just call the file=NULL case directly instead of
making this conditional.

Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-07-30 12:54:54 -04:00
Andi Kleen 6904996101 gcc-4.6: nfsd: fix initialized but not read warnings
Fixes at least one real minor bug: the nfs4 recovery dir sysctl
would not return its status properly.

Also I finished Al's 1e41568d73 ("Take ima_path_check() in nfsd
past dentry_open() in nfsd_open()") commit, it moved the IMA
code, but left the old path initializer in there.

The rest is just dead code removed I think, although I was not
fully sure about the "is_borc" stuff. Some more review
would be still good.

Found by gcc 4.6's new warnings.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-07-29 19:32:17 -04:00
Eric Paris 2a12a9d781 fsnotify: pass a file instead of an inode to open, read, and write
fanotify, the upcoming notification system actually needs a struct path so it can
do opens in the context of listeners, and it needs a file so it can get f_flags
from the original process.  Close was the only operation that already was passing
a struct file to the notification hook.  This patch passes a file for access,
modify, and open as well as they are easily available to these hooks.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:32 -04:00
J. Bruce Fields fa0a21269f nfsd: bypass readahead cache when have struct file
The readahead cache compensates for the fact that the NFS server
currently does an open and close on every IO operation in the NFSv2 and
NFSv3 case.

In the NFSv4 case we have long-lived struct files associated with client
opens, so there's no need for this.  In fact, concurrent IO's using
trying to modify the same file->f_ra may cause problems.

So, don't bother with the readahead cache in that case.

Note eventually we'll likely do this in the v2/v3 case as well by
keeping a cache of struct files instead of struct file_ra_state's.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-07-27 18:15:54 -04:00
J. Bruce Fields 55b13354d7 nfsd: remove unused assignment from nfsd_link
Trivial cleanup, since "dest" is never used.

Reported-by: Anshul Madan <Anshul.Madan@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-07-23 08:50:39 -04:00
J. Bruce Fields 6a85d6c769 nfsd4: comment nitpick
Reported-by: "Madan, Anshul" <Anshul.Madan@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-07-06 12:40:22 -04:00
Linus Torvalds b95a568093 Merge branch 'for-2.6.35' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.35' of git://linux-nfs.org/~bfields/linux:
  nfsd4: shut down callback queue outside state lock
  nfsd: nfsd_setattr needs to call commit_metadata
2010-06-09 12:43:04 -07:00
J. Bruce Fields 44b56603c4 Merge branch 'for-2.6.34-incoming' into for-2.6.35-incoming 2010-06-08 20:05:18 -04:00
Christoph Hellwig b160fdabe9 nfsd: nfsd_setattr needs to call commit_metadata
The conversion of write_inode_now calls to commit_metadata in commit
f501912a35 missed out the call in nfsd_setattr.

But without this conversion we can't guarantee that a SETATTR request
has actually been commited to disk with XFS, which causes a regression
from 2.6.32 (only for NFSv2, but anyway).

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-06-01 19:17:50 -04:00
Christoph Hellwig 8018ab0574 sanitize vfs_fsync calling conventions
Now that the last user passing a NULL file pointer is gone we can remove
the redundant dentry argument and associated hacks inside vfs_fsynmc_range.

The next step will be removig the dentry argument from ->fsync, but given
the luck with the last round of method prototype changes I'd rather
defer this until after the main merge window.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-21 18:31:21 -04:00
J. Bruce Fields 5306293c9c Merge commit 'v2.6.34-rc6'
Conflicts:
	fs/nfsd/nfs4callback.c
2010-05-04 11:29:05 -04:00
Tejun Heo 5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
Jeff Layton 91885258e8 nfsd: don't break lease while servicing a COMMIT
This is the second attempt to fix the problem whereby a COMMIT call
causes a lease break and triggers a possible deadlock.

The problem is that nfsd attempts to break a lease on a COMMIT call.
This triggers a delegation recall if the lease is held for a delegation.
If the client is the one holding the delegation and it's the same one on
which it's issuing the COMMIT, then it can't return that delegation
until the COMMIT is complete. But, nfsd won't complete the COMMIT until
the delegation is returned. The client and server are essentially
deadlocked until the state is marked bad (due to the client not
responding on the callback channel).

The first patch attempted to deal with this by eliminating the open of
the file altogether and simply had nfsd_commit pass a NULL file pointer
to the vfs_fsync_range. That would conflict with some work in progress
by Christoph Hellwig to clean up the fsync interface, so this patch
takes a different approach.

This declares a new NFSD_MAY_NOT_BREAK_LEASE access flag that indicates
to nfsd_open that it should not break any leases when opening the file,
and has nfsd_commit set that flag on the nfsd_open call.

For now, this patch leaves nfsd_commit opening the file with write
access since I'm not clear on what sort of access would be more
appropriate.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-03-22 15:37:53 -04:00
Linus Torvalds 05c5cb31ec Merge branch 'for-2.6.34' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.34' of git://linux-nfs.org/~bfields/linux: (22 commits)
  nfsd4: fix minor memory leak
  svcrpc: treat uid's as unsigned
  nfsd: ensure sockets are closed on error
  Revert "sunrpc: move the close processing after do recvfrom method"
  Revert "sunrpc: fix peername failed on closed listener"
  sunrpc: remove unnecessary svc_xprt_put
  NFSD: NFSv4 callback client should use RPC_TASK_SOFTCONN
  xfs_export_operations.commit_metadata
  commit_metadata export operation replacing nfsd_sync_dir
  lockd: don't clear sm_monitored on nsm_reboot_lookup
  lockd: release reference to nsm_handle in nlm_host_rebooted
  nfsd: Use vfs_fsync_range() in nfsd_commit
  NFSD: Create PF_INET6 listener in write_ports
  SUNRPC: NFS kernel APIs shouldn't return ENOENT for "transport not found"
  SUNRPC: Bury "#ifdef IPV6" in svc_create_xprt()
  NFSD: Support AF_INET6 in svc_addsock() function
  SUNRPC: Use rpc_pton() in ip_map_parse()
  nfsd: 4.1 has an rfc number
  nfsd41: Create the recovery entry for the NFSv4.1 client
  nfsd: use vfs_fsync for non-directories
  ...
2010-03-06 11:31:38 -08:00
Linus Torvalds e213e26ab3 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (33 commits)
  quota: stop using QUOTA_OK / NO_QUOTA
  dquot: cleanup dquot initialize routine
  dquot: move dquot initialization responsibility into the filesystem
  dquot: cleanup dquot drop routine
  dquot: move dquot drop responsibility into the filesystem
  dquot: cleanup dquot transfer routine
  dquot: move dquot transfer responsibility into the filesystem
  dquot: cleanup inode allocation / freeing routines
  dquot: cleanup space allocation / freeing routines
  ext3: add writepage sanity checks
  ext3: Truncate allocated blocks if direct IO write fails to update i_size
  quota: Properly invalidate caches even for filesystems with blocksize < pagesize
  quota: generalize quota transfer interface
  quota: sb_quota state flags cleanup
  jbd: Delay discarding buffers in journal_unmap_buffer
  ext3: quota_write cross block boundary behaviour
  quota: drop permission checks from xfs_fs_set_xstate/xfs_fs_set_xquota
  quota: split out compat_sys_quotactl support from quota.c
  quota: split out netlink notification support from quota.c
  quota: remove invalid optimization from quota_sync_all
  ...

Fixed trivial conflicts in fs/namei.c and fs/ufs/inode.c
2010-03-05 13:20:53 -08:00
Christoph Hellwig 907f4554e2 dquot: move dquot initialization responsibility into the filesystem
Currently various places in the VFS call vfs_dq_init directly.  This means
we tie the quota code into the VFS.  Get rid of that and make the
filesystem responsible for the initialization.   For most metadata operations
this is a straight forward move into the methods, but for truncate and
open it's a bit more complicated.

For truncate we currently only call vfs_dq_init for the sys_truncate case
because open already takes care of it for ftruncate and open(O_TRUNC) - the
new code causes an additional vfs_dq_init for those which is harmless.

For open the initialization is moved from do_filp_open into the open method,
which means it happens slightly earlier now, and only for regular files.
The latter is fine because we don't need to initialize it for operations
on special files, and we already do it as part of the namespace operations
for directories.

Add a dquot_file_open helper that filesystems that support generic quotas
can use to fill in ->open.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-03-05 00:20:30 +01:00
J. Bruce Fields 4ea41e2de5 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs into for-2.6.34-incoming
Resolve merge conflict in fs/xfs/linux-2.6/xfs_export.c.
2010-03-04 12:04:51 -05:00
Al Viro 8737c9305b Switch may_open() and break_lease() to passing O_...
... instead of mixing FMODE_ and O_

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-03 13:00:21 -05:00
Ben Myers f501912a35 commit_metadata export operation replacing nfsd_sync_dir
- Add commit_metadata export_operation to allow the underlying filesystem to
decide how to commit an inode most efficiently.

- Usage of nfsd_sync_dir and write_inode_now has been replaced with the
commit_metadata function that takes a svc_fh.

- The commit_metadata function calls the commit_metadata export_op if it's
there, or else falls back to sync_inode instead of fsync and write_inode_now
because only metadata need be synced here.

- nfsd4_sync_rec_dir now uses vfs_fsync so that commit_metadata can be static

Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-02-20 13:13:44 -08:00
Chuck Ebbert aeaa5ccd64 vfs: don't call ima_file_check() unconditionally in nfsd_open()
commit 1e41568d73 ("Take ima_path_check()
in nfsd past dentry_open() in nfsd_open()") moved this code back to its
original location but missed the "else".

Signed-off-by: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-02-20 00:47:31 -05:00
Mimi Zohar 9bbb6cad01 ima: rename ima_path_check to ima_file_check
ima_path_check actually deals with files!  call it ima_file_check instead.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-02-07 03:06:22 -05:00
Mimi Zohar 8eb988c70e fix ima breakage
The "Untangling ima mess, part 2 with counters" patch messed
up the counters.  Based on conversations with Al Viro, this patch
streamlines ima_path_check() by removing the counter maintaince.
The counters are now updated independently, from measuring the file,
in __dentry_open() and alloc_file() by calling ima_counts_get().
ima_path_check() is called from nfsd and do_filp_open().
It also did not measure all files that should have been measured.
Reason: ima_path_check() got bogus value passed as mask.
[AV: mea culpa]
[AV: add missing nfsd bits]

Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-02-07 03:06:22 -05:00
Al Viro 1e41568d73 Take ima_path_check() in nfsd past dentry_open() in nfsd_open()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-02-07 03:06:22 -05:00
Trond Myklebust aa696a6f34 nfsd: Use vfs_fsync_range() in nfsd_commit
The NFS COMMIT operation allows the client to specify the exact byte range
that it wishes to sync to disk in order to optimise server performance.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-01-29 18:53:11 -05:00
Christoph Hellwig 6a68f89ee1 nfsd: use vfs_fsync for non-directories
Instead of opencoding the fsync calling sequence use vfs_fsync.  This also
gets rid of the useless i_mutex over the data writeout.

Consolidate the remaining special code for syncing directories and document
it's quirks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-01-13 09:42:26 -05:00
Linus Torvalds 93939f4e5d Merge branch 'for-2.6.33' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.33' of git://linux-nfs.org/~bfields/linux:
  sunrpc: fix peername failed on closed listener
  nfsd: make sure data is on disk before calling ->fsync
  nfsd: fix "insecure" export option
2010-01-06 18:10:15 -08:00
Christoph Hellwig 7211a4e859 nfsd: make sure data is on disk before calling ->fsync
nfsd is not using vfs_fsync, so I missed it when changing the calling
convention during the 2.6.32 window.  This patch fixes it to not only
start the data writeout, but also wait for it to complete before calling
into ->fsync.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-01-06 17:37:26 -05:00
Linus Torvalds bac5e54c29 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (38 commits)
  direct I/O fallback sync simplification
  ocfs: stop using do_sync_mapping_range
  cleanup blockdev_direct_IO locking
  make generic_acl slightly more generic
  sanitize xattr handler prototypes
  libfs: move EXPORT_SYMBOL for d_alloc_name
  vfs: force reval of target when following LAST_BIND symlinks (try #7)
  ima: limit imbalance msg
  Untangling ima mess, part 3: kill dead code in ima
  Untangling ima mess, part 2: deal with counters
  Untangling ima mess, part 1: alloc_file()
  O_TRUNC open shouldn't fail after file truncation
  ima: call ima_inode_free ima_inode_free
  IMA: clean up the IMA counts updating code
  ima: only insert at inode creation time
  ima: valid return code from ima_inode_alloc
  fs: move get_empty_filp() deffinition to internal.h
  Sanitize exec_permission_lite()
  Kill cached_lookup() and real_lookup()
  Kill path_lookup_open()
  ...

Trivial conflicts in fs/direct-io.c
2009-12-16 12:04:02 -08:00
Al Viro 1429b3eca2 Untangling ima mess, part 3: kill dead code in ima
Kill the 'update' argument of ima_path_check(), kill
dead code in ima.

Current rules: ima counters are bumped at the same time
when the file switches from put_filp() fodder to fput()
one.  Which happens exactly in two places - alloc_file()
and __dentry_open().  Nothing else needs to do that at
all.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-16 12:16:47 -05:00
Al Viro b65a9cfc2c Untangling ima mess, part 2: deal with counters
* do ima_get_count() in __dentry_open()
* stop doing that in followups
* move ima_path_check() to right after nameidata_to_filp()
* don't bump counters on it

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-16 12:16:47 -05:00
J. Bruce Fields 7663dacd92 nfsd: remove pointless paths in file headers
The new .h files have paths at the top that are now out of date.  While
we're here, just remove all of those from fs/nfsd; they never served any
purpose.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-15 15:01:47 -05:00
Steve Dickson 03a816b46d nfsd: restrict filehandles accepted in V4ROOT case
On V4ROOT exports, only accept filehandles that are the *root* of some
export.  This allows mountd to allow or deny access to individual
directories and symlinks on the pseudofilesystem.

Note that the checks in readdir and lookup are not enough, since a
malicious host with access to the network could guess filehandles that
they weren't able to obtain through lookup or readdir.

Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-15 14:07:24 -05:00
J. Bruce Fields 3227fa41ab nfsd: filter readdir results in V4ROOT case
As with lookup, we treat every boject as a mountpoint and pretend it
doesn't exist if it isn't exported.

The preexisting code here is confusing, but I haven't yet figured out
how to make it clearer.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-15 14:07:24 -05:00
J. Bruce Fields 82ead7fe41 nfsd: filter lookup results in V4ROOT case
We treat every object as a mountpoint and pretend it doesn't exist if
it isn't exported.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-15 14:07:23 -05:00
J. Bruce Fields 3b6cee7bc4 nfsd4: don't continue "under" mounts in V4ROOT case
If /A/mount/point/ has filesystem "B" mounted on top of it, and if "A"
is exported, but not "B", then the nfs server has always returned to the
client a filehandle for the mountpoint, instead of for the root of "B",
allowing the client to see the subtree of "A" that would otherwise be
hidden by B.

Disable this behavior in the case of V4ROOT exports; we implement the
path restrictions of V4ROOT exports by treating *every* directory as if
it were a mountpoint, and allowing traversal *only* if the new directory
is exported.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-15 14:07:23 -05:00
Boaz Harrosh 9a74af2133 nfsd: Move private headers to source directory
Lots of include/linux/nfsd/* headers are only used by
nfsd module. Move them to the source directory

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-14 18:12:12 -05:00
Boaz Harrosh 341eb18446 nfsd: Source files #include cleanups
Now that the headers are fixed and carry their own wait, all fs/nfsd/
source files can include a minimal set of headers. and still compile just
fine.

This patch should improve the compilation speed of the nfsd module.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-14 18:12:09 -05:00
J. Bruce Fields 0a3adadee4 nfsd: make fs/nfsd/vfs.h for common includes
None of this stuff is used outside nfsd, so move it out of the common
linux include directory.

Actually, probably none of the stuff in include/linux/nfsd/nfsd.h really
belongs there, so later we may remove that file entirely.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-11-13 13:23:02 -05:00
J. Bruce Fields 289ede453e nfsd: minor nfsd_lookup cleanup
Break out some of nfsd_lookup_dentry into helper functions.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-09-28 12:07:53 -04:00
J. Bruce Fields fed8381126 nfsd4: cross mountpoints when looking up parents
3c394ddaa7 "nfsd4: nfsv4 clients should
cross mountpoints" forgot to handle lookups of parents directories.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-09-28 12:07:52 -04:00
Linus Torvalds a87e84b5cd Merge branch 'for-2.6.32' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.32' of git://linux-nfs.org/~bfields/linux: (68 commits)
  nfsd4: nfsv4 clients should cross mountpoints
  nfsd: revise 4.1 status documentation
  sunrpc/cache: avoid variable over-loading in cache_defer_req
  sunrpc/cache: use list_del_init for the list_head entries in cache_deferred_req
  nfsd: return success for non-NFS4 nfs4_state_start
  nfsd41: Refactor create_client()
  nfsd41: modify nfsd4.1 backchannel to use new xprt class
  nfsd41: Backchannel: Implement cb_recall over NFSv4.1
  nfsd41: Backchannel: cb_sequence callback
  nfsd41: Backchannel: Setup sequence information
  nfsd41: Backchannel: Server backchannel RPC wait queue
  nfsd41: Backchannel: Add sequence arguments to callback RPC arguments
  nfsd41: Backchannel: callback infrastructure
  nfsd4: use common rpc_cred for all callbacks
  nfsd4: allow nfs4 state startup to fail
  SUNRPC: Defer the auth_gss upcall when the RPC call is asynchronous
  nfsd4: fix null dereference creating nfsv4 callback client
  nfsd4: fix whitespace in NFSPROC4_CLNT_CB_NULL definition
  nfsd41: sunrpc: add new xprt class for nfsv4.1 backchannel
  sunrpc/cache: simplify cache_fresh_locked and cache_fresh_unlocked.
  ...
2009-09-22 07:54:33 -07:00
Steve Dickson 3c394ddaa7 nfsd4: nfsv4 clients should cross mountpoints
Allow NFS v4 clients to seamlessly cross mount point without
have to set either the 'crossmnt' or the 'nohide' export
options.

Signed-Off-By: Steve Dickson <steved@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-09-21 16:02:25 -04:00
David Howells e0e817392b CRED: Add some configurable debugging [try #6]
Add a config option (CONFIG_DEBUG_CREDENTIALS) to turn on some debug checking
for credential management.  The additional code keeps track of the number of
pointers from task_structs to any given cred struct, and checks to see that
this number never exceeds the usage count of the cred struct (which includes
all references, not just those from task_structs).

Furthermore, if SELinux is enabled, the code also checks that the security
pointer in the cred struct is never seen to be invalid.

This attempts to catch the bug whereby inode_has_perm() faults in an nfsd
kernel thread on seeing cred->security be a NULL pointer (it appears that the
credential struct has been previously released):

	http://www.kerneloops.org/oops.php?number=252883

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
2009-09-02 21:29:01 +10:00
David Howells 033a666ccb NFSD: Don't hold unrefcounted creds over call to nfsd_setuser()
nfsd_open() gets an unrefcounted pointer to the current process's effective
credentials at the top of the function, then calls nfsd_setuser() via
fh_verify() - which may replace and destroy the current process's effective
credentials - and then passes the unrefcounted pointer to dentry_open() - but
the credentials may have been destroyed by this point.

Instead, the value from current_cred() should be passed directly to
dentry_open() as one of its arguments, rather than being cached in a variable.

Possibly fh_verify() should return the creds to use.

This is a regression introduced by
745ca2475a "CRED: Pass credentials through
dentry_open()".

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-and-Verified-By: Steve Dickson <steved@redhat.com>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-07-03 10:21:10 -04:00
Linus Torvalds 7e0338c0de Merge branch 'for-2.6.31' of git://fieldses.org/git/linux-nfsd
* 'for-2.6.31' of git://fieldses.org/git/linux-nfsd: (60 commits)
  SUNRPC: Fix the TCP server's send buffer accounting
  nfsd41: Backchannel: minorversion support for the back channel
  nfsd41: Backchannel: cleanup nfs4.0 callback encode routines
  nfsd41: Remove ip address collision detection case
  nfsd: optimise the starting of zero threads when none are running.
  nfsd: don't take nfsd_mutex twice when setting number of threads.
  nfsd41: sanity check client drc maxreqs
  nfsd41: move channel attributes from nfsd4_session to a nfsd4_channel_attr struct
  NFS: kill off complicated macro 'PROC'
  sunrpc: potential memory leak in function rdma_read_xdr
  nfsd: minor nfsd_vfs_write cleanup
  nfsd: Pull write-gathering code out of nfsd_vfs_write
  nfsd: track last inode only in use_wgather case
  sunrpc: align cache_clean work's timer
  nfsd: Use write gathering only with NFSv2
  NFSv4: kill off complicated macro 'PROC'
  NFSv4: do exact check about attribute specified
  knfsd: remove unreported filehandle stats counters
  knfsd: fix reply cache memory corruption
  knfsd: reply cache cleanups
  ...
2009-06-22 12:55:50 -07:00
J. Bruce Fields e4636d535e nfsd: minor nfsd_vfs_write cleanup
There's no need to check host_err >= 0 every time here when we could
check host_err < 0 once, following the usual kernel style.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-06-15 19:18:34 -07:00
J. Bruce Fields d911df7b8d nfsd: Pull write-gathering code out of nfsd_vfs_write
This is a relatively self-contained piece of code that handles a special
case--move it to its own function.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-06-15 18:54:05 -07:00
J. Bruce Fields 9d2a3f31d6 nfsd: track last inode only in use_wgather case
Updating last_ino and last_dev probably isn't useful in the !use_wgather
case.

Also remove some pointless ifdef'd-out code.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-06-15 18:52:47 -07:00
Trond Myklebust 48e03bc515 nfsd: Use write gathering only with NFSv2
NFSv3 and above can use unstable writes whenever they are sending more
than one write, rather than relying on the flaky write gathering
heuristics. More often than not, write gathering is currently getting it
wrong when the NFSv3 clients are sending a single write with FILE_SYNC
for efficiency reasons.

This patch turns off write gathering for NFSv3/v4, and ensures that
it only applies to the one case that can actually benefit: namely NFSv2.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-06-15 18:14:57 -07:00
Al Viro 9393bd07cf switch follow_down()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-11 21:36:01 -04:00
Al Viro bab77ebf51 switch follow_up() to struct path
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-11 21:36:00 -04:00
Al Viro e64c390ca0 switch rqst_exp_parent()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-11 21:36:00 -04:00
Al Viro 91c9fa8f75 switch rqst_exp_get_by_name()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-11 21:36:00 -04:00
James Morris 0b4ec6e4e0 Merge branch 'master' into next 2009-06-09 09:27:53 +10:00
Mimi Zohar 14dba5331b integrity: nfsd imbalance bug fix
An nfsd exported file is opened/closed by the kernel causing the
integrity imbalance message.

Before a file is opened, there normally is permission checking, which
is done in inode_permission().  However, as integrity checking requires
a dentry and mount point, which is not available in inode_permission(),
the integrity (permission) checking must be called separately.

In order to detect any missing integrity checking calls, we keep track
of file open/closes.  ima_path_check() increments these counts and
does the integrity (permission) checking. As a result, the number of
calls to ima_path_check()/ima_file_free() should be balanced.  An extra
call to fput(), indicates the file could have been accessed without first
calling ima_path_check().

In nfsv3 permission checking is done once, followed by multiple reads,
which do an open/close for each read.  The integrity (permission) checking
call should be in nfsd_permission() after the inode_permission() call, but
as there is no correlation between the number of permission checking and
open calls, the integrity checking call should not increment the counters,
but defer it to when the file is actually opened.

This patch adds:
- integrity (permission) checking for nfsd exported files in nfsd_permission().
- a call to increment counts for files opened by nfsd.

This patch has been updated to return the nfs error types.

Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
2009-05-28 09:32:43 +10:00
Wei Yongjun a0d24b295a nfsd: fix hung up of nfs client while sync write data to nfs server
Commit 'Short write in nfsd becomes a full write to the client'
(31dec2538e) broken the sync write.
With the following commands to reproduce:

  $ mount -t nfs -o sync 192.168.0.21:/nfsroot /mnt
  $ cd /mnt
  $ echo aaaa > temp.txt

Then nfs client is hung up.

In SYNC mode the server alaways return the write count 0 to the
client. This is because the value of host_err in nfsd_vfs_write()
will be overwrite in SYNC mode by 'host_err=nfsd_sync(file);',
and then we return host_err(which is now 0) as write count.

This patch fixed the problem.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-05-27 17:40:06 -04:00
David Woodhouse 2f9092e102 Fix i_mutex vs. readdir handling in nfsd
Commit 14f7dd63 ("Copy XFS readdir hack into nfsd code") introduced a
bug to generic code which had been extant for a long time in the XFS
version -- it started to call through into lookup_one_len() and hence
into the file systems' ->lookup() methods without i_mutex held on the
directory.

This patch fixes it by locking the directory's i_mutex again before
calling the filldir functions. The original deadlocks which commit
14f7dd63 was designed to avoid are still avoided, because they were due
to fs-internal locking, not i_mutex.

While we're at it, fix the return type of nfsd_buffered_readdir() which
should be a __be32 not an int -- it's an NFS errno, not a Linux errno.
And return nfserrno(-ENOMEM) when allocation fails, not just -ENOMEM.
Sparse would have caught that, if it wasn't so busy bitching about
__cold__.

Commit 05f4f678 ("nfsd4: don't do lookup within readdir in recovery
code") introduced a similar problem with calling lookup_one_len()
without i_mutex, which this patch also addresses. To fix that, it was
necessary to fix the called functions so that they expect i_mutex to be
held; that part was done by J. Bruce Fields.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Umm-I-can-live-with-that-by: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: J. R. Okajima <hooanon05@yahoo.co.jp>
Tested-by: J. Bruce Fields <bfields@citi.umich.edu>
LKML-Reference: <8036.1237474444@jrobl>
Cc: stable@kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-04-20 23:01:16 -04:00
Al Viro 1644ccc8a9 Safer nfsd_cross_mnt()
AFAICS, we have a subtle bug there: if we have crossed mountpoint
*and* it got mount --move'd away, we'll be holding only one
reference to fs containing dentry - exp->ex_path.mnt.  IOW, we
ought to dput() before exp_put().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-04-20 23:01:15 -04:00
Linus Torvalds a63856252d Merge branch 'for-2.6.30' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.30' of git://linux-nfs.org/~bfields/linux: (81 commits)
  nfsd41: define nfsd4_set_statp as noop for !CONFIG_NFSD_V4
  nfsd41: define NFSD_DRC_SIZE_SHIFT in set_max_drc
  nfsd41: Documentation/filesystems/nfs41-server.txt
  nfsd41: CREATE_EXCLUSIVE4_1
  nfsd41: SUPPATTR_EXCLCREAT attribute
  nfsd41: support for 3-word long attribute bitmask
  nfsd: dynamically skip encoded fattr bitmap in _nfsd4_verify
  nfsd41: pass writable attrs mask to nfsd4_decode_fattr
  nfsd41: provide support for minor version 1 at rpc level
  nfsd41: control nfsv4.1 svc via /proc/fs/nfsd/versions
  nfsd41: add OPEN4_SHARE_ACCESS_WANT nfs4_stateid bmap
  nfsd41: access_valid
  nfsd41: clientid handling
  nfsd41: check encode size for sessions maxresponse cached
  nfsd41: stateid handling
  nfsd: pass nfsd4_compound_state* to nfs4_preprocess_{state,seq}id_op
  nfsd41: destroy_session operation
  nfsd41: non-page DRC for solo sequence responses
  nfsd41: Add a create session replay cache
  nfsd41: create_session operation
  ...
2009-04-06 13:25:56 -07:00
Linus Torvalds 2c9e15a011 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-quota-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-quota-2.6: (27 commits)
  ext2: Zero our b_size in ext2_quota_read()
  trivial: fix typos/grammar errors in fs/Kconfig
  quota: Coding style fixes
  quota: Remove superfluous inlines
  quota: Remove uppercase aliases for quota functions.
  nfsd: Use lowercase names of quota functions
  jfs: Use lowercase names of quota functions
  udf: Use lowercase names of quota functions
  ufs: Use lowercase names of quota functions
  reiserfs: Use lowercase names of quota functions
  ext4: Use lowercase names of quota functions
  ext3: Use lowercase names of quota functions
  ext2: Use lowercase names of quota functions
  ramfs: Remove quota call
  vfs: Use lowercase names of quota functions
  quota: Remove dqbuf_t and other cleanups
  quota: Remove NODQUOT macro
  quota: Make global quota locks cacheline aligned
  quota: Move quota files into separate directory
  ext4: quota reservation for delayed allocation
  ...
2009-03-27 14:48:34 -07:00
Jan Kara 90c0af05a5 nfsd: Use lowercase names of quota functions
Use lowercase names of quota functions instead of old uppercase ones.

CC: bfields@fieldses.org
CC: neilb@suse.de
Signed-off-by: Jan Kara <jack@suse.cz>
2009-03-26 02:18:37 +01:00
Sachin S. Prabhu 0953e620de Inconsistent setattr behaviour
There is an inconsistency seen in the behaviour of nfs compared to other local
filesystems on linux when changing owner or group of a directory. If the
directory has SUID/SGID flags set, on changing owner or group on the directory,
the flags are stripped off on nfs. These flags are maintained on other
filesystems such as ext3.

To reproduce on a nfs share or local filesystem, run the following commands
mkdir test; chmod +s+g test; chown user1 test; ls -ld test

On the nfs share, the flags are stripped and the output seen is
drwxr-xr-x 2 user1 root 4096 Feb 23  2009 test

On other local filesystems(ex: ext3), the flags are not stripped and the output
seen is
drwsr-sr-x 2 user1 root 4096 Feb 23 13:57 test

chown_common() called from sys_chown() will only strip the flags if the inode is
not a directory.
static int chown_common(struct dentry * dentry, uid_t user, gid_t group)
{
..
        if (!S_ISDIR(inode->i_mode))
                newattrs.ia_valid |=
                        ATTR_KILL_SUID | ATTR_KILL_SGID | ATTR_KILL_PRIV;
..
}

See: http://www.opengroup.org/onlinepubs/7990989775/xsh/chown.html

"If the path argument refers to a regular file, the set-user-ID (S_ISUID) and
set-group-ID (S_ISGID) bits of the file mode are cleared upon successful return
from chown(), unless the call is made by a process with appropriate privileges,
in which case it is implementation-dependent whether these bits are altered. If
chown() is successfully invoked on a file that is not a regular file, these
bits may be cleared. These bits are defined in <sys/stat.h>."

The behaviour as it stands does not appear to violate POSIX.  However the
actions performed are inconsistent when comparing ext3 and nfs.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-03-18 17:59:37 -04:00
David Shaw 31dec2538e Short write in nfsd becomes a full write to the client
If a filesystem being written to via NFS returns a short write count
(as opposed to an error) to nfsd, nfsd treats that as a success for
the entire write, rather than the short count that actually succeeded.

For example, given a 8192 byte write, if the underlying filesystem
only writes 4096 bytes, nfsd will ack back to the nfs client that all
8192 bytes were written.  The nfs client does have retry logic for
short writes, but this is never called as the client is told the
complete write succeeded.

There are probably other ways it could happen, but in my case it
happened with a fuse (filesystem in userspace) filesystem which can
rather easily have a partial write.

Here is a patch to properly return the short write count to the
client.

Signed-off-by: David Shaw <dshaw@jabberwocky.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-03-18 17:38:40 -04:00
wengang wang 4ac35c2f79 nfsd(v2/v3): fix the failure of creation from HPUX client
sometimes HPUX nfs client sends a create request to linux nfs server(v2/v3).
the dump of the request is like:
    obj_attributes
        mode: value follows
            set_it: value follows (1)
            mode: 00
        uid: no value
            set_it: no value (0)
        gid: value follows
            set_it: value follows (1)
            gid: 8030
        size: value follows
            set_it: value follows (1)
            size: 0
        atime: don't change
            set_it: don't change (0)
        mtime: don't change
            set_it: don't change (0)

note that mode is 00(havs no rwx privilege even for the owner) and it requires
to set size to 0.

as current nfsd(v2/v3) implementation, the server does mainly 2 steps:
1) creates the file in mode specified by calling vfs_create().
2) sets attributes for the file by calling nfsd_setattr().

at step 2), it finally calls file system specific setattr() function which may
fail when checking permission because changing size needs WRITE privilege but
it has none since mode is 000.

for this case, a new file created, we may simply ignore the request of
setting size to 0, so that WRITE privilege is not needed and the open
succeeds.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
--
 vfs.c |   19 +++++++++++++++++++
 1 file changed, 19 insertions(+)
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-03-18 17:30:50 -04:00
Jonathan Corbet db1dd4d376 Use f_lock to protect f_flags
Traditionally, changes to struct file->f_flags have been done under BKL
protection, or with no protection at all.  This patch causes all f_flags
changes after file open/creation time to be done under protection of
f_lock.  This allows the removal of some BKL usage and fixes a number of
longstanding (if microscopic) races.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2009-03-16 08:32:27 -06:00
J. Bruce Fields 9a8d248e2d nfsd: fix double-locks of directory mutex
A number of nfsd operations depend on the i_mutex to cover more code
than just the fsync, so the approach of 4c728ef583 "add a vfs_fsync
helper" doesn't work for nfsd.  Revert the parts of those patches that
touch nfsd.

Note: we can't, however, remove the logic from vfs_fsync that was needed
only for the special case of nfsd, because a vfs_fsync(NULL,...) call
can still result indirectly from a stackable filesystem that was called
by nfsd.  (Thanks to Christoph Hellwig for pointing this out.)

Reported-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-07 15:40:45 -05:00
Christoph Hellwig 4c728ef583 add a vfs_fsync helper
Fsync currently has a fdatawrite/fdatawait pair around the method call,
and a mutex_lock/unlock of the inode mutex.  All callers of fsync have
to duplicate this, but we have a few and most of them don't quite get
it right.  This patch adds a new vfs_fsync that takes care of this.
It's a little more complicated as usual as ->fsync might get a NULL file
pointer and just a dentry from nfsd, but otherwise gets afile and we
want to take the mapping and file operations from it when it is there.

Notes on the fsync callers:

 - ecryptfs wasn't calling filemap_fdatawrite / filemap_fdatawait on the
   	lower file
 - coda wasn't calling filemap_fdatawrite / filemap_fdatawait on the host
	file, and returning 0 when ->fsync was missing
 - shm wasn't calling either filemap_fdatawrite / filemap_fdatawait nor
   taking i_mutex.  Now given that shared memory doesn't have disk
   backing not doing anything in fsync seems fine and I left it out of
   the vfs_fsync conversion for now, but in that case we might just
   not pass it through to the lower file at all but just call the no-op
   simple_sync_file directly.

[and now actually export vfs_fsync]

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-05 11:54:28 -05:00
Al Viro acfa4380ef inode->i_op is never NULL
We used to have rather schizophrenic set of checks for NULL ->i_op even
though it had been eliminated years ago.  You'd need to go out of your
way to set it to NULL explicitly _and_ a bunch of code would die on
such inodes anyway.  After killing two remaining places that still
did that bogosity, all that crap can go away.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-05 11:54:28 -05:00
James Morris 2b82892565 Merge branch 'master' into next
Conflicts:
	security/keys/internal.h
	security/keys/process_keys.c
	security/keys/request_key.c

Fixed conflicts above by using the non 'tsk' versions.

Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 11:29:12 +11:00
David Howells 745ca2475a CRED: Pass credentials through dentry_open()
Pass credentials through dentry_open() so that the COW creds patch can have
SELinux's flush_unauthorized_files() pass the appropriate creds back to itself
when it opens its null chardev.

The security_dentry_open() call also now takes a creds pointer, as does the
dentry_open hook in struct security_operations.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:39:22 +11:00
David Howells 5cc0a84076 CRED: Wrap task credential accesses in the NFS daemon
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: linux-nfs@vger.kernel.org
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:38:58 +11:00
Doug Nazar b726e923ea Fix nfsd truncation of readdir results
Commit 8d7c4203 "nfsd: fix failure to set eof in readdir in some
situations" introduced a bug: on a directory in an exported ext3
filesystem with dir_index unset, a READDIR will only return about 250
entries, even if the directory was larger.

Bisected it back to this commit; reverting it fixes the problem.

It turns out that in this case ext3 reads a block at a time, then
returns from readdir, which means we can end up with buf.full==0 but
with more entries in the directory still to be read.  Before 8d7c4203
(but after c002a6c797 "Optimise NFS readdir hack slightly"), this would
cause us to return the READDIR result immediately, but with the eof bit
unset.  That could cause a performance regression (because the client
would need more roundtrips to the server to read the whole directory),
but no loss in correctness, since the cleared eof bit caused the client
to send another readdir.  After 8d7c4203, the setting of the eof bit
made this a correctness problem.

So, move nfserr_eof into the loop and remove the buf.full check so that
we loop until buf.used==0.  The following seems to do the right thing
and reduces the network traffic since we don't return a READDIR result
until the buffer is full.

Tested on an empty directory & large directory; eof is properly sent and
there are no more short buffers.

Signed-off-by: Doug Nazar <nazard@dragoninc.ca>
Cc: David Woodhouse <David.Woodhouse@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-11-09 15:15:50 -05:00
J. Bruce Fields 8d7c4203c6 nfsd: fix failure to set eof in readdir in some situations
Before 14f7dd6320 "[PATCH] Copy XFS
readdir hack into nfsd code", readdir_cd->err was reset to eof before
each call to vfs_readdir; afterwards, it is set only once.  Similarly,
c002a6c797 "[PATCH] Optimise NFS readdir
hack slightly", can cause us to exit without nfserr_eof set.  Fix this.

This ensures the "eof" bit is set when needed in readdir replies.  (The
particular case I saw was an nfsv4 readdir of an empty directory, which
returned with no entries (the protocol requires "." and ".." to be
filtered out), but with eof unset.)

Cc: David Woodhouse <David.Woodhouse@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-10-30 17:16:49 -04:00
Linus Torvalds 5ed487bc2c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (46 commits)
  [PATCH] fs: add a sanity check in d_free
  [PATCH] i_version: remount support
  [patch] vfs: make security_inode_setattr() calling consistent
  [patch 1/3] FS_MBCACHE: don't needlessly make it built-in
  [PATCH] move executable checking into ->permission()
  [PATCH] fs/dcache.c: update comment of d_validate()
  [RFC PATCH] touch_mnt_namespace when the mount flags change
  [PATCH] reiserfs: add missing llseek method
  [PATCH] fix ->llseek for more directories
  [PATCH vfs-2.6 6/6] vfs: add LOOKUP_RENAME_TARGET intent
  [PATCH vfs-2.6 5/6] vfs: remove LOOKUP_PARENT from non LOOKUP_PARENT lookup
  [PATCH vfs-2.6 4/6] vfs: remove unnecessary fsnotify_d_instantiate()
  [PATCH vfs-2.6 3/6] vfs: add __d_instantiate() helper
  [PATCH vfs-2.6 2/6] vfs: add d_ancestor()
  [PATCH vfs-2.6 1/6] vfs: replace parent == dentry->d_parent by IS_ROOT()
  [PATCH] get rid of on-stack dentry in udf
  [PATCH 2/2] anondev: switch to IDA
  [PATCH 1/2] anondev: init IDR statically
  [JFFS2] Use d_splice_alias() not d_add() in jffs2_lookup()
  [PATCH] Optimise NFS readdir hack slightly.
  ...
2008-10-23 10:22:40 -07:00
David Woodhouse c002a6c797 [PATCH] Optimise NFS readdir hack slightly.
Avoid calling the underlying ->readdir() again when we reached the end
already; keep going round the loop only if we stopped due to our own
buffer being full.

[AV: tidy the things up a bit, while we are there]

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-10-23 05:13:11 -04:00
Al Viro 53c9c5c0e3 [PATCH] prepare vfs_readdir() callers to returning filldir result
It's not the final state, but it allows moving ->readdir() instances
to passing filldir return value to caller of vfs_readdir().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-10-23 05:13:10 -04:00
David Woodhouse 14f7dd6320 [PATCH] Copy XFS readdir hack into nfsd code.
Some file systems with their own internal locking have problems with the
way that nfsd calls the ->lookup() method from within a filldir function
called from their ->readdir() method. The recursion back into the file
system code can cause deadlock.

XFS has a fairly hackish solution to this which involves doing the
readdir() into a locally-allocated buffer, then going back through it
calling the filldir function afterwards. It's not ideal, but it works.

It's particularly suboptimal because XFS does this for local file
systems too, where it's completely unnecessary.

Copy this hack into the NFS code where it can be used only for NFS
export. In response to feedback, use it unconditionally rather than only
for the affected file systems.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-10-23 05:13:05 -04:00
David Woodhouse 2628b76636 [PATCH] Factor out nfsd_do_readdir() into its own function
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-10-23 05:13:04 -04:00
Krishna Kumar 6c6a426fdc nfsd: Fix memory leak in nfsd_getxattr
Fix a memory leak in nfsd_getxattr. nfsd_getxattr should free up memory
	that it allocated if vfs_getxattr fails.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-10-22 14:00:45 -04:00
Jeff Layton 54a66e5480 knfsd: allocate readahead cache in individual chunks
I had a report from someone building a large NFS server that they were
unable to start more than 585 nfsd threads. It was reported against an
older kernel using the slab allocator, and I tracked it down to the
large allocation in nfsd_racache_init failing.

It appears that the slub allocator handles large allocations better,
but large contiguous allocations can often be problematic. There
doesn't seem to be any reason that the racache has to be allocated as a
single large chunk. This patch breaks this up so that the racache is
built up from separate allocations.

(Thanks also to Takashi Iwai for a bugfix.)

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Takashi Iwai <tiwai@suse.de>
2008-09-29 17:56:59 -04:00
J. Bruce Fields 04716e6621 nfsd: permit unauthenticated stat of export root
RFC 2623 section 2.3.2 permits the server to bypass gss authentication
checks for certain operations that a client may perform when mounting.
In the case of a client that doesn't have some form of credentials
available to it on boot, this allows it to perform the mount unattended.
(Presumably real file access won't be needed until a user with
credentials logs in.)

Being slightly more lenient allows lots of old clients to access
krb5-only exports, with the only loss being a small amount of
information leaked about the root directory of the export.

This affects only v2 and v3; v4 still requires authentication for all
access.

Thanks to Peter Staubach testing against a Solaris client, which
suggesting addition of v3 getattr, to the list, and to Trond for noting
that doing so exposes no additional information.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Peter Staubach <staubach@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
2008-09-29 17:56:56 -04:00
Al Viro f419a2e3b6 [PATCH] kill nameidata passing to permission(), rename to inode_permission()
Incidentally, the name that gives hundreds of false positives on grep
is not a good idea...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-07-26 20:53:31 -04:00
Miklos Szeredi db2e747b14 [patch 5/5] vfs: remove mode parameter from vfs_symlink()
Remove the unused mode parameter from vfs_symlink and callers.

Thanks to Tetsuo Handa for noticing.

CC: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2008-07-26 20:53:18 -04:00
Miklos Szeredi 07cad1d2a4 nfsd: clean up mnt_want_write calls
Multiple mnt_want_write() calls in the switch statement looks really
ugly.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-07-01 15:22:03 -04:00
Miklos Szeredi 8837abcab3 nfsd: rename MAY_ flags
Rename nfsd_permission() specific MAY_* flags to NFSD_MAY_* to make it
clear, that these are not used outside nfsd, and to avoid name and
number space conflicts with the VFS.

[comment from hch: rename MAY_READ, MAY_WRITE and MAY_EXEC as well]

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-06-23 13:02:50 -04:00
Jeff Layton ca456252db knfsd: clear both setuid and setgid whenever a chown is done
Currently, knfsd only clears the setuid bit if the owner of a file is
changed on a SETATTR call, and only clears the setgid bit if the group
is changed. POSIX says this in the spec for chown():

    "If the specified file is a regular file, one or more of the
     S_IXUSR, S_IXGRP, or S_IXOTH bits of the file mode are set, and the
     process does not have appropriate privileges, the set-user-ID
     (S_ISUID) and set-group-ID (S_ISGID) bits of the file mode shall
     be cleared upon successful return from chown()."

If I'm reading this correctly, then knfsd is doing this wrong. It should
be clearing both the setuid and setgid bit on any SETATTR that changes
the uid or gid. This wasn't really as noticable before, but now that the
ATTR_KILL_S*ID bits are a no-op for the NFS client, it's more evident.

This patch corrects the nfsd_setattr logic so that this occurs. It also
does a bit of cleanup to the function.

There is also one small behavioral change. If a SETATTR call comes in
that changes the uid/gid and the mode, then we now only clear the setgid
bit if the group execute bit isn't set. The setgid bit without a group
execute bit signifies mandatory locking and we likely don't want to
clear the bit in that case. Since there is no call in POSIX that should
generate a SETATTR call like this, then this should rarely happen, but
it's worth noting.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-04-23 16:13:43 -04:00
Jeff Layton dee3209d99 knfsd: get rid of imode variable in nfsd_setattr
...it's not really needed.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-04-23 16:13:43 -04:00
Harvey Harrison 3ba1514815 nfsd: fix sparse warning in vfs.c
fs/nfsd/vfs.c:991:27: warning: Using plain integer as NULL pointer

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-04-23 16:13:39 -04:00
Adrian Bunk f2b0dee2ec make nfsd_create_setattr() static
This patch makes the needlessly global nfsd_create_setattr() static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-04-23 16:13:38 -04:00
Dave Hansen 2c463e9548 [PATCH] r/o bind mounts: check mnt instead of superblock directly
If we depend on the inodes for writeability, we will not catch the r/o mounts
when implemented.

This patches uses __mnt_want_write().  It does not guarantee that the mount
will stay writeable after the check.  But, this is OK for one of the checks
because it is just for a printk().

The other two are probably unnecessary and duplicate existing checks in the
VFS.  This won't make them better checks than before, but it will make them
detect r/o mounts.

Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19 00:29:27 -04:00
Dave Hansen 18f335aff8 [PATCH] r/o bind mounts: elevate write count for xattr_permission() callers
This basically audits the callers of xattr_permission(), which calls
permission() and can perform writes to the filesystem.

[AV: add missing parts - removexattr() and nfsd posix acls, plug for a leak
spotted by Miklos]

Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19 00:29:15 -04:00
Dave Hansen 9079b1eb17 [PATCH] r/o bind mounts: get write access for vfs_rename() callers
This also uses the little helper in the NFS code to make an if() a little bit
less ugly.  We introduced the helper at the beginning of the series.

Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19 00:25:34 -04:00
Dave Hansen 75c3f29de7 [PATCH] r/o bind mounts: write counts for link/symlink
[AV: add missing nfsd pieces]

Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19 00:25:34 -04:00
Dave Hansen 463c319726 [PATCH] r/o bind mounts: get callers of vfs_mknod/create/mkdir()
This takes care of all of the direct callers of vfs_mknod().
Since a few of these cases also handle normal file creation
as well, this also covers some calls to vfs_create().

So that we don't have to make three mnt_want/drop_write()
calls inside of the switch statement, we move some of its
logic outside of the switch and into a helper function
suggested by Christoph.

This also encapsulates a fix for mknod(S_IFREG) that Miklos
found.

[AV: merged mkdir handling, added missing nfsd pieces]

Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19 00:25:34 -04:00
Dave Hansen 0622753b80 [PATCH] r/o bind mounts: elevate write count for rmdir and unlink.
Elevate the write count during the vfs_rmdir() and vfs_unlink().

[AV: merged rmdir and unlink parts, added missing pieces in nfsd]

Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19 00:25:33 -04:00
Jan Blunck 5477549161 Use struct path in struct svc_export
I'm embedding struct path into struct svc_export.

[akpm@linux-foundation.org: coding-style fixes]
[ezk@cs.sunysb.edu: NFSD: fix wrong mnt_writer count in rename]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Acked-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14 21:17:08 -08:00
J. Bruce Fields 5c002b3bb2 nfsd: allow root to set uid and gid on create
The server silently ignores attempts to set the uid and gid on create.
Based on the comment, this appears to have been done to prevent some
overly-clever IRIX client from causing itself problems.

Perhaps we should remove that hack completely.  For now, at least, it
makes sense to allow root (when no_root_squash is set) to set uid and
gid.

While we're there, since nfsd_create and nfsd_create_v3 share the same
logic, pull that out into a separate function.  And spell out the
individual modifications of ia_valid instead of doing them both at once
inside a conditional.

Thanks to Roger Willcocks <roger@filmlight.ltd.uk> for the bug report
and original patch on which this is based.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:07 -05:00
Chuck Lever 5a022fc870 NFSD: Adjust filename length argument of nfsd_lookup
Clean up: adjust the sign of the length argument of nfsd_lookup and
nfsd_lookup_dentry, for consistency with recent changes.  NFSD version
4 callers already pass an unsigned file name length.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-By: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:03 -05:00
Pavel Emelyanov ba25f9dcc4 Use helpers to obtain task pid in printks
The task_struct->pid member is going to be deprecated, so start
using the helpers (task_pid_nr/task_pid_vnr/task_pid_nr_ns) in
the kernel.

The first thing to start with is the pid, printed to dmesg - in
this case we may safely use task_pid_nr(). Besides, printks produce
more (much more) than a half of all the explicit pid usage.

[akpm@linux-foundation.org: git-drm went and changed lots of stuff]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Dave Airlie <airlied@linux.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:43 -07:00
Jeff Layton 8a0ce7d99a knfsd: only set ATTR_KILL_S*ID if ATTR_MODE isn't being explicitly set
It's theoretically possible for a single SETATTR call to come in that sets the
mode and the uid/gid.  In that case, don't set the ATTR_KILL_S*ID bits since
that would trip the BUG() in notify_change.  Just fix up the mode to have the
same effect.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Neil Brown <neilb@suse.de>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:22 -07:00
Serge E. Hallyn b53767719b Implement file posix capabilities
Implement file posix capabilities.  This allows programs to be given a
subset of root's powers regardless of who runs them, without having to use
setuid and giving the binary all of root's powers.

This version works with Kaigai Kohei's userspace tools, found at
http://www.kaigai.gr.jp/index.php.  For more information on how to use this
patch, Chris Friedhoff has posted a nice page at
http://www.friedhoff.org/fscaps.html.

Changelog:
	Nov 27:
	Incorporate fixes from Andrew Morton
	(security-introduce-file-caps-tweaks and
	security-introduce-file-caps-warning-fix)
	Fix Kconfig dependency.
	Fix change signaling behavior when file caps are not compiled in.

	Nov 13:
	Integrate comments from Alexey: Remove CONFIG_ ifdef from
	capability.h, and use %zd for printing a size_t.

	Nov 13:
	Fix endianness warnings by sparse as suggested by Alexey
	Dobriyan.

	Nov 09:
	Address warnings of unused variables at cap_bprm_set_security
	when file capabilities are disabled, and simultaneously clean
	up the code a little, by pulling the new code into a helper
	function.

	Nov 08:
	For pointers to required userspace tools and how to use
	them, see http://www.friedhoff.org/fscaps.html.

	Nov 07:
	Fix the calculation of the highest bit checked in
	check_cap_sanity().

	Nov 07:
	Allow file caps to be enabled without CONFIG_SECURITY, since
	capabilities are the default.
	Hook cap_task_setscheduler when !CONFIG_SECURITY.
	Move capable(TASK_KILL) to end of cap_task_kill to reduce
	audit messages.

	Nov 05:
	Add secondary calls in selinux/hooks.c to task_setioprio and
	task_setscheduler so that selinux and capabilities with file
	cap support can be stacked.

	Sep 05:
	As Seth Arnold points out, uid checks are out of place
	for capability code.

	Sep 01:
	Define task_setscheduler, task_setioprio, cap_task_kill, and
	task_setnice to make sure a user cannot affect a process in which
	they called a program with some fscaps.

	One remaining question is the note under task_setscheduler: are we
	ok with CAP_SYS_NICE being sufficient to confine a process to a
	cpuset?

	It is a semantic change, as without fsccaps, attach_task doesn't
	allow CAP_SYS_NICE to override the uid equivalence check.  But since
	it uses security_task_setscheduler, which elsewhere is used where
	CAP_SYS_NICE can be used to override the uid equivalence check,
	fixing it might be tough.

	     task_setscheduler
		 note: this also controls cpuset:attach_task.  Are we ok with
		     CAP_SYS_NICE being used to confine to a cpuset?
	     task_setioprio
	     task_setnice
		 sys_setpriority uses this (through set_one_prio) for another
		 process.  Need same checks as setrlimit

	Aug 21:
	Updated secureexec implementation to reflect the fact that
	euid and uid might be the same and nonzero, but the process
	might still have elevated caps.

	Aug 15:
	Handle endianness of xattrs.
	Enforce capability version match between kernel and disk.
	Enforce that no bits beyond the known max capability are
	set, else return -EPERM.
	With this extra processing, it may be worth reconsidering
	doing all the work at bprm_set_security rather than
	d_instantiate.

	Aug 10:
	Always call getxattr at bprm_set_security, rather than
	caching it at d_instantiate.

[morgan@kernel.org: file-caps clean up for linux/capability.h]
[bunk@kernel.org: unexport cap_inode_killpriv]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Andrew Morgan <morgan@kernel.org>
Signed-off-by: Andrew Morgan <morgan@kernel.org>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:43:07 -07:00
Dave Hansen a8754beedb r/o bind mounts: create cleanup helper svc_msnfs()
I'm going to be modifying nfsd_rename() shortly to support read-only bind
mounts.  This #ifdef is around the area I'm patching, and it starts to get
really ugly if I just try to add my new code by itself.  Using this little
helper makes things a lot cleaner to use.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:43:05 -07:00
Linus Torvalds 541010e4b8 Merge branch 'locks' of git://linux-nfs.org/~bfields/linux
* 'locks' of git://linux-nfs.org/~bfields/linux:
  nfsd: remove IS_ISMNDLCK macro
  Rework /proc/locks via seq_files and seq_list helpers
  fs/locks.c: use list_for_each_entry() instead of list_for_each()
  NFS: clean up explicit check for mandatory locks
  AFS: clean up explicit check for mandatory locks
  9PFS: clean up explicit check for mandatory locks
  GFS2: clean up explicit check for mandatory locks
  Cleanup macros for distinguishing mandatory locks
  Documentation: move locks.txt in filesystems/
  locks: add warning about mandatory locking races
  Documentation: move mandatory locking documentation to filesystems/
  locks: Fix potential OOPS in generic_setlease()
  Use list_first_entry in locks_wake_up_blocks
  locks: fix flock_lock_file() comment
  Memory shortage can result in inconsistent flocks state
  locks: kill redundant local variable
  locks: reverse order of posix_locks_conflict() arguments
2007-10-15 16:07:40 -07:00
J. Bruce Fields 5e7fc43642 nfsd: remove IS_ISMNDLCK macro
This macro is only used in one place; in this place it seems simpler to
put open-code it and move the comment to where it's used.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2007-10-09 18:32:46 -04:00
Pavel Emelyanov a16877ca9c Cleanup macros for distinguishing mandatory locks
The combination of S_ISGID bit set and S_IXGRP bit unset is used to mark the
inode as "mandatory lockable" and there's a macro for this check called
MANDATORY_LOCK(inode).  However, fs/locks.c and some filesystems still perform
the explicit i_mode checking.  Besides, Andrew pointed out, that this macro is
buggy itself, as it dereferences the inode arg twice.

Convert this macro into static inline function and switch its users to it,
making the code shorter and more readable.

The __mandatory_lock() helper is to be used in places where the IS_MANDLOCK()
for superblock is already known to be true.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-10-09 18:32:46 -04:00
Christoph Hellwig 9c85fca56b nfsd: fix horrible indentation in nfsd_setattr
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by:  Neil Brown <neilb@suse.de>
2007-10-09 18:31:56 -04:00
Neil Brown a1033be72c knfsd: Fixed problem with NFS exporting directories which are mounted on.
Recent changes in NFSd cause a directory which is mounted-on
to not appear properly when the filesystem containing it is exported.

*exp_get* now returns -ENOENT rather than NULL and when
  commit 5d3dbbeaf5
removed the NULL checks, it didn't add a check for -ENOENT.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-10 18:57:47 -07:00
Jeff Layton 749997e512 knfsd: set the response bitmask for NFS4_CREATE_EXCLUSIVE
RFC 3530 says:

 If the server uses an attribute to store the exclusive create verifier, it
 will signify which attribute by setting the appropriate bit in the attribute
 mask that is returned in the results.

Linux uses the atime and mtime to store the verifier, but sends a zeroed out
bitmask back to the client.  This patch makes sure that we set the correct
bits in the bitmask in this situation.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-31 15:39:38 -07:00
J. Bruce Fields c7d51402d2 knfsd: clean up EX_RDONLY
Share a little common code, reverse the arguments for consistency, drop the
unnecessary "inline", and lowercase the name.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:52 -07:00
J. Bruce Fields e22841c637 knfsd: move EX_RDONLY out of header
EX_RDONLY is only called in one place; just put it there.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:52 -07:00
J. Bruce Fields 5d3dbbeaf5 nfsd: remove unnecessary NULL checks from nfsd_cross_mnt
We can now assume that rqst_exp_get_by_name() does not return NULL; so clean
up some unnecessary checks.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:52 -07:00
J. Bruce Fields a280df32db nfsd: fix possible read-ahead cache and export table corruption
The value of nperbucket calculated here is too small--we should be rounding up
instead of down--with the result that the index j in the following loop can
overflow the raparm_hash array.  At least in my case, the next thing in memory
turns out to be export_table, so the symptoms I see are crashes caused by the
appearance of four zeroed-out export entries in the first bucket of the hash
table of exports (which were actually entries in the readahead cache, a
pointer to which had been written to the export table in this initialization
code).

It looks like the bug was probably introduced with commit
fce1456a19 ("knfsd: make the readahead params
cache SMP-friendly").

Cc: <stable@kernel.org>
Cc: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:52 -07:00
J. Bruce Fields 0ec757df97 knfsd: nfsd4: make readonly access depend on pseudoflavor
Allow readonly access to vary depending on the pseudoflavor, using the flag
passed with each pseudoflavor in the export downcall.  The rest of the flags
are ignored for now, though some day we might also allow id squashing to vary
based on the flavor.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:08 -07:00
Andy Adamson 32c1eb0cd7 knfsd: nfsd4: return nfserr_wrongsec
Make the first actual use of the secinfo information by using it to return
nfserr_wrongsec when an export is found that doesn't allow the flavor used on
this request.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:08 -07:00
J. Bruce Fields 6c0a654dce knfsd: nfsd: factor nfsd_lookup into 2 pieces
Factor nfsd_lookup into nfsd_lookup_dentry, which finds the right dentry and
export, and a second part which composes the filehandle (and which will later
check the security flavor on the new export).

No change in behavior.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:08 -07:00
J. Bruce Fields 0989a78896 knfsd: nfsd: provide export lookup wrappers which take a svc_rqst
Split the callers of exp_get_by_name(), exp_find(), and exp_parent() into
those that are processing requests and those that are doing other stuff (like
looking up filehandles for mountd).

No change in behavior, just a (fairly pointless, on its own) cleanup.

(Note this has the effect of making nfsd_cross_mnt() pass rqstp->rq_client
instead of exp->ex_client into exp_find_by_name().  However, the two should
have the same value at this point.)

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:07 -07:00
J. Bruce Fields 87548c37c8 knfsd: nfsd: remove superfluous assignment from nfsd_lookup
The "err" variable will only be used in the final return, which always happens
after either the preceding

	err = fh_compose(...);

or after the following

	err = nfserrno(host_err);

So the earlier assignment to err is ignored.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:07 -07:00
J. Bruce Fields 2d3bb25209 knfsd: nfsd: make all exp_finding functions return -errno's on err
Currently exp_find(), exp_get_by_name(), and friends, return an export on
success, and on failure return:

	errors -EAGAIN (drop this request pending an upcall) or
		-ETIMEDOUT (an upcall has timed out), or
	return NULL, which can mean either that there was a memory allocation
		failure, or that an export was not found, or that a passed-in
		export lacks an auth_domain.

Many callers seem to assume that NULL means that an export was not found,
which may lead to bugs in the case of a memory allocation failure.

Modify these functions to distinguish between the two NULL cases by returning
either -ENOENT or -ENOMEM.  They now never return NULL.  We get to simplify
some code in the process.

We return -ENOENT in the case of a missing auth_domain.  This case should
probably be removed (or converted to a bug) after confirming that it can never
happen.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:07 -07:00
J. Bruce Fields 4b2ca38ad6 knfsd: nfsd4: fix handling of acl errrors
nfs4_acl_nfsv4_to_posix() returns an error and returns any posix acls
calculated in two caller-provided pointers.  It was setting these pointers to
-errno in some error cases, resulting in nfsd4_set_nfs4_acl() calling
posix_acl_release() with a -errno as an argument.

Fix both the caller and the callee, by modifying nfsd4_set_nfs4_acl() to
stop relying on the passed-in-pointers being left as NULL in the error
case, and by modifying nfs4_acl_nfsv4_to_posix() to stop returning
garbage in those pointers.

Thanks to Alex Soule for reporting the bug.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Cc: Alexander Soule <soule@umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:07 -07:00
Jens Axboe 4fbef206da nfsd: fix nfsd_vfs_read() splice actor setup
When nfsd was transitioned to use splice instead of sendfile() for data
transfers, a line setting the page index was lost. Restore it, so that
nfsd is functional when that path is used.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-13 16:45:43 -07:00
Jens Axboe cac36bb06e pipe: change the ->pin() operation to ->confirm()
The name 'pin' was badly chosen, it doesn't pin a pipe buffer
in the most commonly used sense in the kernel. So change the
name to 'confirm', after debating this issue with Hugh
Dickins a bit.

A good return from ->confirm() means that the buffer is really
there, and that the contents are good.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-10 08:04:15 +02:00
Jens Axboe d6b29d7cee splice: divorce the splice structure/function definitions from the pipe header
We need to move even more stuff into the header so that folks can use
the splice_to_pipe() implementation instead of open-coding a lot of
pipe knowledge (see relay implementation), so move to our own header
file finally.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-10 08:04:14 +02:00
Jens Axboe cf8208d0ea sendfile: convert nfsd to splice_direct_to_actor()
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-10 08:04:14 +02:00
J. Bruce Fields f34f924274 [PATCH] knfsd: nfsd4: fix error return on unsupported acl
We should be returning ATTRNOTSUPP, not NOTSUPP, when acls are unsupported.

Also fix a comment.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-16 08:14:01 -08:00
NeilBrown 34e9a63b4f [PATCH] knfsd: ratelimit some nfsd messages that are triggered by external events
Also remove {NFSD,RPC}_PARANOIA as having the defines doesn't really add
anything.

The printks covered by RPC_PARANOIA were triggered by badly formatted
packets and so should be ratelimited.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-30 08:26:45 -08:00
NeilBrown a0ad13ef64 [PATCH] knfsd: Fix type mismatch with filldir_t used by nfsd
nfsd defines a type 'encode_dent_fn' which is much like 'filldir_t' except
that the first pointer is 'struct readdir_cd *' rather than 'void *'.  It
then casts encode_dent_fn points to 'filldir_t' as needed.  This hides any
other type mismatches between the two such as the fact that the 'ino' arg
recently changed from ino_t to u64.

So: get rid of 'encode_dent_fn', get rid of the cast of the function type,
change the first arg of various functions from 'struct readdir_cd *' to
'void *', and live with the fact that we have a little less type checking
on the calling of these functions now.  Less internal (to nfsd) checking
offset by more external checking, which is more important.

Thanks to Gabriel Paubert <paubert@iram.es> for discovering this and
providing an initial patch.

Signed-off-by: Gabriel Paubert <paubert@iram.es>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26 13:51:00 -08:00
Peter Staubach c397852c3d [PATCH] knfsd: Don't mess with the 'mode' when storing a exclusive-create cookie
NFS V3 (and V4) support exclusive create by passing a 'cookie' which can get
stored with the file.  If the file exists but has exactly the right cookie
stored, then we assume this is a retransmit and the exclusive create was
successful.

The cookie is 64bits and is traditionally stored in the mtime and atime
fields.  This causes a problem with Solaris7 as negative mtime or atime
confuse it.  So we moved two bits into the mode word instead.

But inherited ACLs sometimes overwrite the mode word on create, so this is a
problem.

So we give up and just store 62 of the 64 bits and assume that is close
enough.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26 13:50:59 -08:00
NeilBrown 250f391518 [PATCH] knfsd: fix an NFSD bug with full sized, non-page-aligned reads
NFSd assumes that largest number of pages that will be needed for a
request+response is 2+N where N pages is the size of the largest permitted
read/write request.  The '2' are 1 for the non-data part of the request, and 1
for the non-data part of the reply.

However, when a read request is not page-aligned, and we choose to use
->sendfile to send it directly from the page cache, we may need N+1 pages to
hold the whole reply.  This can overflow and array and cause an Oops.

This patch increases size of the array for holding pages by one and makes sure
that entry is NULL when it is not in use.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26 13:50:59 -08:00
J.Bruce Fields e0bb89ef03 [PATCH] knfsd: nfsd: don't drop silently on upcall deferral
To avoid tying up server threads when nfsd makes an upcall (to mountd, to get
export options, to idmapd, for nfsv4 name<->id mapping, etc.), we temporarily
"drop" the request and save enough information so that we can revisit it
later.

Certain failures during the deferral process can cause us to really drop the
request and never revisit it.

This is often less than ideal, and is unacceptable in the NFSv4 case--rfc 3530
forbids the server from dropping a request without also closing the
connection.

As a first step, we modify the deferral code to return -ETIMEDOUT (which is
translated to nfserr_jukebox in the v3 and v4 cases, and remains a drop in the
v2 case).

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:54 -08:00
Yan Burman 4b3bb06bea [PATCH] nfsd: replace kmalloc+memset with kcalloc + simplify NULL check
Replace kmalloc+memset with kcalloc and simplify

Signed-off-by: Yan Burman <burman.yan@gmail.com>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:29:02 -08:00
Josef "Jeff" Sipek 7eaa36e2d4 [PATCH] nfsd: change uses of f_{dentry, vfsmnt} to use f_path
Change all the uses of f_{dentry,vfsmnt} to f_path.{dentry,mnt} in the nfs
server code.

Signed-off-by: Josef "Jeff" Sipek <jsipek@cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:42 -08:00
J. Bruce Fields 088406bcf6 [PATCH] nfsd: fix spurious error return from nfsd_create in async case
Commit 6264d69d7d modified the nfsd_create()
error handling in such a way that nfsd_create will usually return
nfserr_perm even when succesful, if the export has the async export option.

This introduced a regression that could cause mkdir() to always return a
permissions error, even though the directory in question was actually
succesfully created.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-08 18:29:25 -08:00
J. Bruce Fields 81ac95c556 [PATCH] nfsd4: fix open-create permissions
In the case where an open creates the file, we shouldn't be rechecking
permissions to open the file; the open succeeds regardless of what the new
file's mode bits say.

This patch fixes the problem, but only by introducing yet another parameter
to nfsd_create_v3.  This is ugly.  This will be fixed by later patches.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-08 18:29:23 -08:00
Al Viro 6264d69d7d [PATCH] nfsd: vfs.c endianness annotations
don't use the same variable to store NFS and host error values

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-20 10:26:41 -07:00
J.Bruce Fields b66285cee3 [PATCH] knfsd: nfsd4: acls: fix handling of zero-length acls
It is legal to have zero-length NFSv4 acls; they just deny everything.

Also, nfs4_acl_nfsv4_to_posix will always return with pacl and dpacl set on
success, so the caller doesn't need to check this.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:20 -07:00
Greg Banks fce1456a19 [PATCH] knfsd: make nfsd readahead params cache SMP-friendly
Make the nfsd read-ahead params cache more SMP-friendly by changing the single
global list and lock into a fixed 16-bucket hashtable with per-bucket locks.
This reduces spinlock contention in nfsd_read() on read-heavy workloads on
multiprocessor servers.

Testing was on a 4 CPU 4 NIC Altix using 4 IRIX clients each doing 1K
streaming reads at full line rate.  The server had 128 nfsd threads, which
sizes the RA cache at 256 entries, of which only a handful were used.  Flat
profiling shows nfsd_read(), including the inlined nfsd_get_raparms(), taking
10.4% of each CPU.  This patch drops the contribution from nfsd() to 1.71% for
each CPU.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:16 -07:00
NeilBrown 4452435948 [PATCH] knfsd: Replace two page lists in struct svc_rqst with one
We are planning to increase RPCSVC_MAXPAGES from about 8 to about 256.  This
means we need to be a bit careful about arrays of size RPCSVC_MAXPAGES.

struct svc_rqst contains two such arrays.  However the there are never more
that RPCSVC_MAXPAGES pages in the two arrays together, so only one array is
needed.

The two arrays are for the pages holding the request, and the pages holding
the reply.  Instead of two arrays, we can simply keep an index into where the
first reply page is.

This patch also removes a number of small inline functions that probably
server to obscure what is going on rather than clarify it, and opencode the
needed functionality.

Also remove the 'rq_restailpage' variable as it is *always* 0.  i.e.  if the
response 'xdr' structure has a non-empty tail it is always in the same pages
as the head.

 check counters are initilised and incr properly
 check for consistant usage of ++ etc
 maybe extra some inlines for common approach
 general review

Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Magnus Maatta <novell@kiruna.se>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:15 -07:00
Peter Zijlstra 12fd352038 [PATCH] nfsd: lockdep annotation
while doing a kernel make modules_install install over an NFS mount.

  =============================================
  [ INFO: possible recursive locking detected ]
  ---------------------------------------------
  nfsd/9550 is trying to acquire lock:
   (&inode->i_mutex){--..}, at: [<c034c845>] mutex_lock+0x1c/0x1f

  but task is already holding lock:
   (&inode->i_mutex){--..}, at: [<c034c845>] mutex_lock+0x1c/0x1f

  other info that might help us debug this:
  2 locks held by nfsd/9550:
   #0:  (hash_sem){..--}, at: [<cc895223>] exp_readlock+0xd/0xf [nfsd]
   #1:  (&inode->i_mutex){--..}, at: [<c034c845>] mutex_lock+0x1c/0x1f

  stack backtrace:
   [<c0103508>] show_trace_log_lvl+0x58/0x152
   [<c0103b8b>] show_trace+0xd/0x10
   [<c0103c2f>] dump_stack+0x19/0x1b
   [<c012aa57>] __lock_acquire+0x77a/0x9a3
   [<c012af4a>] lock_acquire+0x60/0x80
   [<c034c6c2>] __mutex_lock_slowpath+0xa7/0x20e
   [<c034c845>] mutex_lock+0x1c/0x1f
   [<c0162edc>] vfs_unlink+0x34/0x8a
   [<cc891d98>] nfsd_unlink+0x18f/0x1e2 [nfsd]
   [<cc89884f>] nfsd3_proc_remove+0x95/0xa2 [nfsd]
   [<cc88f0d4>] nfsd_dispatch+0xc0/0x178 [nfsd]
   [<c033e84d>] svc_process+0x3a5/0x5ed
   [<cc88f5ba>] nfsd+0x1a7/0x305 [nfsd]
   [<c0101005>] kernel_thread_helper+0x5/0xb
  DWARF2 unwinder stuck at kernel_thread_helper+0x5/0xb
  Leftover inexact backtrace:
   [<c0103b8b>] show_trace+0xd/0x10
   [<c0103c2f>] dump_stack+0x19/0x1b
   [<c012aa57>] __lock_acquire+0x77a/0x9a3
   [<c012af4a>] lock_acquire+0x60/0x80
   [<c034c6c2>] __mutex_lock_slowpath+0xa7/0x20e
   [<c034c845>] mutex_lock+0x1c/0x1f
   [<c0162edc>] vfs_unlink+0x34/0x8a
   [<cc891d98>] nfsd_unlink+0x18f/0x1e2 [nfsd]
   [<cc89884f>] nfsd3_proc_remove+0x95/0xa2 [nfsd]
   [<cc88f0d4>] nfsd_dispatch+0xc0/0x178 [nfsd]
   [<c033e84d>] svc_process+0x3a5/0x5ed
   [<cc88f5ba>] nfsd+0x1a7/0x305 [nfsd]
   [<c0101005>] kernel_thread_helper+0x5/0xb

  =============================================
  [ INFO: possible recursive locking detected ]
  ---------------------------------------------
  nfsd/9580 is trying to acquire lock:
   (&inode->i_mutex){--..}, at: [<c034cc1d>] mutex_lock+0x1c/0x1f

  but task is already holding lock:
   (&inode->i_mutex){--..}, at: [<c034cc1d>] mutex_lock+0x1c/0x1f

  other info that might help us debug this:
  2 locks held by nfsd/9580:
   #0:  (hash_sem){..--}, at: [<cc89522b>] exp_readlock+0xd/0xf [nfsd]
   #1:  (&inode->i_mutex){--..}, at: [<c034cc1d>] mutex_lock+0x1c/0x1f

  stack backtrace:
   [<c0103508>] show_trace_log_lvl+0x58/0x152
   [<c0103b8b>] show_trace+0xd/0x10
   [<c0103c2f>] dump_stack+0x19/0x1b
   [<c012aa63>] __lock_acquire+0x77a/0x9a3
   [<c012af56>] lock_acquire+0x60/0x80
   [<c034ca9a>] __mutex_lock_slowpath+0xa7/0x20e
   [<c034cc1d>] mutex_lock+0x1c/0x1f
   [<cc892ad1>] nfsd_setattr+0x2c8/0x499 [nfsd]
   [<cc893ede>] nfsd_create_v3+0x31b/0x4ac [nfsd]
   [<cc8984a1>] nfsd3_proc_create+0x128/0x138 [nfsd]
   [<cc88f0d4>] nfsd_dispatch+0xc0/0x178 [nfsd]
   [<c033ec1d>] svc_process+0x3a5/0x5ed
   [<cc88f5ba>] nfsd+0x1a7/0x305 [nfsd]
   [<c0101005>] kernel_thread_helper+0x5/0xb
  DWARF2 unwinder stuck at kernel_thread_helper+0x5/0xb
  Leftover inexact backtrace:
   [<c0103b8b>] show_trace+0xd/0x10
   [<c0103c2f>] dump_stack+0x19/0x1b
   [<c012aa63>] __lock_acquire+0x77a/0x9a3
   [<c012af56>] lock_acquire+0x60/0x80
   [<c034ca9a>] __mutex_lock_slowpath+0xa7/0x20e
   [<c034cc1d>] mutex_lock+0x1c/0x1f
   [<cc892ad1>] nfsd_setattr+0x2c8/0x499 [nfsd]
   [<cc893ede>] nfsd_create_v3+0x31b/0x4ac [nfsd]
   [<cc8984a1>] nfsd3_proc_create+0x128/0x138 [nfsd]
   [<cc88f0d4>] nfsd_dispatch+0xc0/0x178 [nfsd]
   [<c033ec1d>] svc_process+0x3a5/0x5ed
   [<cc88f5ba>] nfsd+0x1a7/0x305 [nfsd]
   [<c0101005>] kernel_thread_helper+0x5/0xb

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Neil Brown <neilb@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:20 -07:00
Linus Torvalds 22a3e233ca Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
  Remove obsolete #include <linux/config.h>
  remove obsolete swsusp_encrypt
  arch/arm26/Kconfig typos
  Documentation/IPMI typos
  Kconfig: Typos in net/sched/Kconfig
  v9fs: do not include linux/version.h
  Documentation/DocBook/mtdnand.tmpl: typo fixes
  typo fixes: specfic -> specific
  typo fixes in Documentation/networking/pktgen.txt
  typo fixes: occuring -> occurring
  typo fixes: infomation -> information
  typo fixes: disadvantadge -> disadvantage
  typo fixes: aquire -> acquire
  typo fixes: mecanism -> mechanism
  typo fixes: bandwith -> bandwidth
  fix a typo in the RTC_CLASS help text
  smb is no longer maintained

Manually merged trivial conflict in arch/um/kernel/vmlinux.lds.S
2006-06-30 15:39:30 -07:00
J. Bruce Fields 5c04c46aec [PATCH] knfsd: nfsd: mark rqstp to prevent use of sendfile in privacy case
Add a rq_sendfile_ok flag to svc_rqst which will be cleared in the privacy
case so that the wrapping code will get copies of the read data instead of
real page cache pages.  This makes life simpler when we encrypt the response.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 11:25:41 -07:00
J. Bruce Fields 9ecb6a08d8 [PATCH] knfsd: nfsd4: fix open flag passing
Since nfsv4 actually keeps around the file descriptors it gets from open
(instead of just using them for a single read or write operation), we need to
make sure that we can do RDWR opens and not just RDONLY/WRONLY.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 11:25:40 -07:00
David M. Richter 270d56e536 [PATCH] knfsd: nfsd: fix misplaced fh_unlock() in nfsd_link()
In the event that lookup_one_len() fails in nfsd_link(), fh_unlock() is
skipped and locks are held overlong.

Patch was tested on 2.6.17-rc2 by causing lookup_one_len() to fail and
verifying that fh_unlock() gets called appropriately.

Signed-off-by: David M. Richter <richterd@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 11:25:40 -07:00
NeilBrown a56f39375a [PATCH] knfsd: improve the test for cross-device-rename in nfsd
Just testing the i_sb isn't really enough, at least the vfsmnt must be the
same.  Thanks Al.

Cc: Al Viro <viro@ftp.linux.org.uk>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 11:25:39 -07:00
Jörn Engel 6ab3d5624e Remove obsolete #include <linux/config.h>
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-30 19:25:36 +02:00
David Howells 726c334223 [PATCH] VFS: Permit filesystem to perform statfs with a known root dentry
Give the statfs superblock operation a dentry pointer rather than a superblock
pointer.

This complements the get_sb() patch.  That reduced the significance of
sb->s_root, allowing NFS to place a fake root there.  However, NFS does
require a dentry to use as a target for the statfs operation.  This permits
the root in the vfsmount to be used instead.

linux/mount.h has been added where necessary to make allyesconfig build
successfully.

Interest has also been expressed for use with the FUSE and XFS filesystems.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nathan Scott <nathans@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:45 -07:00
Florin Malita 9ccfc29c67 [PATCH] nfsd: sign conversion obscuring errors in nfsd_set_posix_acl()
Assigning the result of posix_acl_to_xattr() to an unsigned data type
(size/size_t) obscures possible errors.

Coverity CID: 1206.

Signed-off-by: Florin Malita <fmalita@gmail.com>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-21 12:59:17 -07:00
NeilBrown b5872b0dcc [PATCH] knfsd: nfsd4: fix acl xattr length return
We should be using the length from the second vfs_getxattr, in case it
changed.  (Note: there's still a small race here; we could end up returning
-ENOMEM if the length increased between the first and second call.  I don't
know whether it's worth spending a lot of effort to fix that.)

This makes XFS ACLs usable on NFS exports, which they currently aren't, since
XFS appears to be returning a too-large value for vfs_getxattr() when it's
passed a NULL buffer.  So there's probably an XFS bug here too, though since
getxattr with a NULL buffer is usually used to decide how much memory to
allocate, it may be a fairly harmless bug in most cases.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:51 -07:00
Arjan van de Ven 99ac48f54a [PATCH] mark f_ops const in the inode
Mark the f_ops members of inodes as const, as well as fix the
ripple-through this causes by places that copy this f_ops and then "do
stuff" with it.

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 09:16:05 -08:00
NeilBrown 45bd3b3dff [PATCH] knfsd: Fix some more errno/nfserr confusion in vfs.c
nfsd_sync* return an errno, which usually needs to be converted to an errno,
sometimes immediately, sometimes a little later.

Also, nfsd_setattr returns an nfserr which SHOULDN'T be converted from
an errno (because it isn't one).

Also some tidyups of the form:
  err = XX
  err = nfserrno(err)
and
  err = XX
  if (err)
      err = nfserrno(err)
become
  err = nfserrno(XX)

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-18 19:20:28 -08:00
Al Viro d75f2b9f5d [PATCH] nfsd/vfs.c: endianness fixes
Several failure exits return -E<something> instead of nfserr_<something> and
vice versa.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-18 19:20:27 -08:00
NeilBrown 7e8f05934d [PATCH] nfsd: remove inline from a couple of large NFS functions
These are both called from two places close together.  I could rearrange that
code so there is only one call site, but just removing the 'inline' is
probably best.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-18 19:20:24 -08:00
YAMAMOTO Takashi f193fbab2e [PATCH] nfsd: check error status from nfsd_sync_dir
Change nfsd_sync_dir to return an error if ->sync fails, and pass that error
up through the stack.  This involves a number of rearrangements of error
paths, and care to distinguish between Linux -errno numbers and NFSERR
numbers.

In the 'create' routines, we continue with the 'setattr' even if a previous
sync_dir failed.

This patch is quite different from Takashi's in a few ways, but there is still
a strong lineage.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-18 19:20:24 -08:00
Christoph Hellwig 5be196e5f9 [PATCH] add vfs_* helpers for xattr operations
Add vfs_getxattr, vfs_setxattr and vfs_removexattr helpers for common checks
around invocation of the xattr methods.  NFSD already was missing some of the
checks and there will be more soon.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: James Morris <jmorris@namei.org>

(James, I haven't touched selinux yet because it's doing various odd things
and I'm not sure how it would interact with the security attribute fallbacks
you added.  Could you investigate whether it could use vfs_getxattr or if not
add a __vfs_getxattr helper to share the bits it is fine with?)

For NFSv4: instead of just converting it add an nfsd_getxattr helper for the
code shared by NFSv2/3 and NFSv4 ACLs.  In fact that code isn't even
NFS-specific, but I'll wait for more users to pop up first before moving it to
common code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-10 08:01:29 -08:00
Jes Sorensen 1b1dcc1b57 [PATCH] mutex subsystem, semaphore to mutex: VFS, ->i_sem
This patch converts the inode semaphore to a mutex. I have tested it on
XFS and compiled as much as one can consider on an ia64. Anyway your
luck with it might be different.

Modified-by: Ingo Molnar <mingo@elte.hu>

(finished the conversion)

Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2006-01-09 15:59:24 -08:00
Neil Brown 9f708e40fe [PATCH] knfsd: reduce stack consumption
A typical nfsd call trace is
 nfsd -> svc_process -> nfsd_dispatch -> nfsd3_proc_write ->
   nfsd_write ->nfsd_vfs_write -> vfs_writev

These add up to over 300 bytes on the stack.
Looking at each of these, I see that nfsd_write (which includes
 nfsd_vfs_write) contributes 0x8c to stack usage itself!!

It turns out this is because it puts a 'struct iattr' on the stack so
it can kill suid if needed.  The following patch saves about 50 bytes
off the stack in this call path.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:59 -08:00
David Shaw a334de2866 [PATCH] knfsd: check error status from vfs_getattr and i_op->fsync
Both vfs_getattr and i_op->fsync return error statuses which nfsd was
largely ignoring.  This as noticed when exporting directories using fuse.

This patch cleans up most of the offences, which involves moving the call
to vfs_getattr out of the xdr encoding routines (where it is too late to
report an error) into the main NFS procedure handling routines.

There is still a called to vfs_gettattr (related to the ACL code) where the
status is ignored, and called to nfsd_sync_dir don't check return status
either.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:59 -08:00
NeilBrown 15b7a1b86d [PATCH] knfsd: fix setattr-on-symlink error return
This is a somewhat cosmetic fix to keep the SpecFS validation test from
complaining.

SpecFS want's to try chmod on symlinks, and ext3 and reiser (at least) return
ENOTSUPP.

Probably both sides are being silly, but it is easiest to simply make it a
non-issue and filter out chmod requests on symlinks at the nfsd level.

Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-07 07:53:47 -08:00
Robert Love 0eeca28300 [PATCH] inotify
inotify is intended to correct the deficiencies of dnotify, particularly
its inability to scale and its terrible user interface:

        * dnotify requires the opening of one fd per each directory
          that you intend to watch. This quickly results in too many
          open files and pins removable media, preventing unmount.
        * dnotify is directory-based. You only learn about changes to
          directories. Sure, a change to a file in a directory affects
          the directory, but you are then forced to keep a cache of
          stat structures.
        * dnotify's interface to user-space is awful.  Signals?

inotify provides a more usable, simple, powerful solution to file change
notification:

        * inotify's interface is a system call that returns a fd, not SIGIO.
	  You get a single fd, which is select()-able.
        * inotify has an event that says "the filesystem that the item
          you were watching is on was unmounted."
        * inotify can watch directories or files.

Inotify is currently used by Beagle (a desktop search infrastructure),
Gamin (a FAM replacement), and other projects.

See Documentation/filesystems/inotify.txt.

Signed-off-by: Robert Love <rml@novell.com>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-12 20:38:38 -07:00
NeilBrown a6ccbbb886 [PATCH] nfsd4: fix sync'ing of recovery directory
We need to fsync the recovery directory after writing to it, but we weren't
doing this correctly.  (For example, we weren't taking the i_sem when calling
->fsync().)

Just reuse the existing nfsd fsync code instead.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-07 18:24:07 -07:00
Christoph Hellwig 334a13ec3d [PATCH] really remove xattr_acl.h
Looks like it sneaked back with the NFS ACL merge..

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-28 21:20:31 -07:00
Christoph Hellwig 9a59f452ab [PATCH] remove <linux/xattr_acl.h>
This file duplicates <linux/posix_acl_xattr.h>, using slightly different
names.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:33 -07:00
Andreas Gruenbacher a257cdd0e2 [PATCH] NFSD: Add server support for NFSv3 ACLs.
This adds functions for encoding and decoding POSIX ACLs for the NFSACL
 protocol extension, and the GETACL and SETACL RPCs.  The implementation is
 compatible with NFSACL in Solaris.

 Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
 Acked-by: Olaf Kirch <okir@suse.de>
 Signed-off-by: Andrew Morton <akpm@osdl.org>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22 16:07:23 -04:00
Linus Torvalds 1da177e4c3 Linux-2.6.12-rc2
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!
2005-04-16 15:20:36 -07:00