Commit graph

363 commits

Author SHA1 Message Date
Tom Tucker
490231558e svc: Add a max payload value to the transport
The svc_max_payload function currently looks at the socket type
to determine the max payload. Add a max payload value to svc_xprt_class
so it can be returned directly.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:08 -05:00
Tom Tucker
9f29868b49 svc: Change the svc_sock in the rqstp structure to a transport
The rqstp structure contains a pointer to the transport for the
RPC request. This functionaly trivial patch adds an unamed union
with pointers to both svc_sock and svc_xprt. Ultimately the
union will be removed and only the rq_xprt field will remain. This
allows incrementally extracting transport independent interfaces without
one gigundo patch.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:08 -05:00
Tom Tucker
360d873864 svc: Make svc_sock the tcp/udp transport
Make TCP and UDP svc_sock transports, and register them
with the svc transport core.

A transport type (svc_sock) has an svc_xprt as its first member,
and calls svc_xprt_init to initialize this field.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:07 -05:00
Tom Tucker
1d8206b97a svc: Add an svc transport class
The transport class (svc_xprt_class) represents a type of transport, e.g.
udp, tcp, rdma.  A transport class has a unique name and a set of transport
operations kept in the svc_xprt_ops structure.

A transport class can be dynamically registered and unregisterd. The
svc_xprt_class represents the module that implements the transport
type and keeps reference counts on the module to avoid unloading while
there are active users.

The endpoint (svc_xprt) is a generic, transport independent endpoint that can
be used to send and receive data for an RPC service. It inherits it's
operations from the transport class.

A transport driver module registers and unregisters itself with svc sunrpc
by calling svc_reg_xprt_class, and svc_unreg_xprt_class respectively.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:07 -05:00
J. Bruce Fields
dbf847ecb6 knfsd: allow cache_register to return error on failure
Newer server features such as nfsv4 and gss depend on proc to work, so a
failure to initialize the proc files they need should be treated as
fatal.

Thanks to Andrew Morton for style fix and compile fix in case where
CONFIG_NFSD_V4 is undefined.

Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:05 -05:00
J. Bruce Fields
df95a9d4fb knfsd: cache unregistration needn't return error
There's really nothing much the caller can do if cache unregistration
fails.  And indeed, all any caller does in this case is print an error
and continue.  So just return void and move the printk's inside
cache_unregister.

Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:04 -05:00
Chuck Lever
e5cff482c7 SUNRPC: Use unsigned string lengths in xdr_decode_string_inplace
XDR strings, opaques, and net objects should all use unsigned lengths.
To wit, RFC 4506 says:

4.2.  Unsigned Integer

   An XDR unsigned integer is a 32-bit datum that encodes a non-negative
   integer in the range [0,4294967295].

 ...

4.11.  String

   The standard defines a string of n (numbered 0 through n-1) ASCII
   bytes to be the number n encoded as an unsigned integer (as described
   above), and followed by the n bytes of the string.

After this patch, xdr_decode_string_inplace now matches the other XDR
string and array helpers that take a string length argument.  See:

xdr_encode_opaque_fixed, xdr_encode_opaque, xdr_encode_array

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-By: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:02 -05:00
Linus Torvalds
75659ca0c1 Merge branch 'task_killable' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc
* 'task_killable' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc: (22 commits)
  Remove commented-out code copied from NFS
  NFS: Switch from intr mount option to TASK_KILLABLE
  Add wait_for_completion_killable
  Add wait_event_killable
  Add schedule_timeout_killable
  Use mutex_lock_killable in vfs_readdir
  Add mutex_lock_killable
  Use lock_page_killable
  Add lock_page_killable
  Add fatal_signal_pending
  Add TASK_WAKEKILL
  exit: Use task_is_*
  signal: Use task_is_*
  sched: Use task_contributes_to_load, TASK_ALL and TASK_NORMAL
  ptrace: Use task_is_*
  power: Use task_is_*
  wait: Use TASK_NORMAL
  proc/base.c: Use task_is_*
  proc/array.c: Use TASK_REPORT
  perfmon: Use task_is_*
  ...

Fixed up conflicts in NFS/sunrpc manually..
2008-02-01 11:45:47 +11:00
Chuck Lever
f1ec08cb94 SUNRPC: Use appropriate argument types in rpcb client
Clean up: Follow recommendations of Chapter 5 of Documentation/CodingStyle
and use "u32" instead of "__u32" for types in definitions that are not
shared with user space.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:09 -05:00
Chuck Lever
b454ae9060 SUNRPC: fewer conditionals in the format_ip_address routines
Clean up: have the set up routines explicitly pass the strings to be used
for the transport name and NETID.  This removes a number of conditionals
and dependencies on rpc_xprt.prot, which is overloaded.

Tighten up type checking on the address_strings array while we're at it.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:04 -05:00
Trond Myklebust
ba7392bb37 SUNRPC: Add support for per-client timeout values
In order to be able to support setting the timeo and retrans parameters on
a per-mountpoint basis, we move the rpc_timeout structure into the
rpc_clnt.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:59 -05:00
Trond Myklebust
2881ae74e6 SUNRPC: Clean up the transport timeout initialisation
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:58 -05:00
Chuck Lever
0fb2b7e945 SUNRPC: Move universal address definitions to global header
Universal addresses are defined in RFC 1833 and clarified in RFC 3530.  We
need to use them in several places in the NFS and RPC clients, so move the
relevant definition and block comment to an appropriate global include
file.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:50 -05:00
Trond Myklebust
c087567d3f SUNRPC: Remove the obsolete RPC_WAITQ macro
Now that we've killed off all the users.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:41 -05:00
Trond Myklebust
47fe064831 SUNRPC: Unexport rpc_init_task() and rpc_execute()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:40 -05:00
Trond Myklebust
e8f5d77c80 SUNRPC: allow the caller of rpc_run_task to preallocate the struct rpc_task
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:38 -05:00
Trond Myklebust
b5627943ab SUNRPC: Remove the now unused function rpc_call_setup()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:35 -05:00
Trond Myklebust
77de2c590e SUNRPC: Add a helper rpc_call_start() that initialises task->tk_action
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:31 -05:00
Trond Myklebust
3ff7576dda SUNRPC: Clean up the initialisation of priority queue scheduling info.
We want the default scheduling priority (priority == 0) to remain
RPC_PRIORITY_NORMAL.

Also ensure that the priority wait queue scheduling is per process id
instead of sometimes being per thread, and sometimes being per inode.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:30 -05:00
Trond Myklebust
c970aa85e7 SUNRPC: Clean up rpc_run_task
Make it use the new task initialiser structure instead of acting as a
wrapper.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:30 -05:00
Trond Myklebust
84115e1cd4 SUNRPC: Cleanup of rpc_task initialisation
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:30 -05:00
Trond Myklebust
62da3b2488 SUNRPC: Rename xprt_disconnect()
xprt_disconnect() should really only be called when the transport shutdown
is completed, and it is time to wake up any pending tasks. Rename it to
xprt_disconnect_done() in order to reflect the semantical change.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:27 -05:00
Trond Myklebust
3b948ae5be SUNRPC: Allow the client to detect if the TCP connection is closed
Add an xprt->state bit to enable the TCP ->state_change() method to signal
whether or not the TCP connection is in the process of closing down.
This will to be used by the reconnection logic in a separate patch.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:25 -05:00
Trond Myklebust
66af1e5585 SUNRPC: Fix a race in xs_tcp_state_change()
When scheduling the autoclose RPC call, we want to ensure that we don't
race against the test_bit() call in xprt_clear_locked().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:24 -05:00
Matthew Wilcox
150030b78a NFS: Switch from intr mount option to TASK_KILLABLE
By using the TASK_KILLABLE infrastructure, we can get rid of the 'intr'
mount option.  We have to use _killable everywhere instead of _interruptible
as we get rid of rpc_clnt_sigmask/sigunmask.

Signed-off-by: Liam R. Howlett <howlett@gmail.com>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
2007-12-06 17:40:25 -05:00
Adrian Bunk
483066d62e SUNRPC: make sunrpc/xprtsock.c:xs_setup_{udp,tcp}() static
xs_setup_{udp,tcp}() can now become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26 16:24:50 -05:00
James Lentini
cfcb43ff7c SUNRPC: remove NFS/RDMA client's binary sysctls
Support for binary sysctls is being deprecated in 2.6.24. Since there
are no applications using the NFS/RDMA client's binary sysctls, it
makes sense to remove them. The patch below does this while leaving
the /proc/sys interface unchanged.

Please consider this for 2.6.24.

Signed-off-by: James Lentini <jlentini@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26 16:21:19 -05:00
Al Viro
2d8a972661 SUNRPC endianness annotations
rpcrdma stuff lacks endianness annotations for on-the-wire data.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-29 07:41:32 -07:00
Linus Torvalds
f4921aff5b Merge git://git.linux-nfs.org/pub/linux/nfs-2.6
* git://git.linux-nfs.org/pub/linux/nfs-2.6: (131 commits)
  NFSv4: Fix a typo in nfs_inode_reclaim_delegation
  NFS: Add a boot parameter to disable 64 bit inode numbers
  NFS: nfs_refresh_inode should clear cache_validity flags on success
  NFS: Fix a connectathon regression in NFSv3 and NFSv4
  NFS: Use nfs_refresh_inode() in ops that aren't expected to change the inode
  SUNRPC: Don't call xprt_release in call refresh
  SUNRPC: Don't call xprt_release() if call_allocate fails
  SUNRPC: Fix buggy UDP transmission
  [23/37] Clean up duplicate includes in
  [2.6 patch] net/sunrpc/rpcb_clnt.c: make struct rpcb_program static
  SUNRPC: Use correct type in buffer length calculations
  SUNRPC: Fix default hostname created in rpc_create()
  nfs: add server port to rpc_pipe info file
  NFS: Get rid of some obsolete macros
  NFS: Simplify filehandle revalidation
  NFS: Ensure that nfs_link() returns a hashed dentry
  NFS: Be strict about dentry revalidation when doing exclusive create
  NFS: Don't zap the readdir caches upon error
  NFS: Remove the redundant nfs_reval_fsid()
  NFSv3: Always use directory post-op attributes in nfs3_proc_lookup
  ...

Fix up trivial conflict due to sock_owned_by_user() cleanup manually in
net/sunrpc/xprtsock.c
2007-10-15 10:47:35 -07:00
J. Bruce Fields
dca1dd30ce nfsd: remove unused cache_for_each macro
This macro is unused.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by:  Neil Brown <neilb@suse.de>
2007-10-09 18:31:55 -04:00
\"Talpey, Thomas\
f58851e6b0 RPCRDMA: rpc rdma transport switch
This implements the configuration and building of the core transport
switch implementation of the rpcrdma transport. Stubs are provided for
the rpcrdma protocol handling, and the infiniband/iwarp verbs interface.
These are provided in following patches.

Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:18:03 -04:00
\"Talpey, Thomas\
c3a57ed747 RPCRDMA: Kconfig and header file with rpcrdma protocol definitions
This file implements the configuration target, protocol template and
constants for the rpcrdma transport framing, for use by the xprtrdma
rpc transport implementation.

Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:17:57 -04:00
\"Talpey, Thomas\
4fa016eb24 NFS/SUNRPC: support transport protocol naming
To prepare for including non-sockets-based RPC transports, select
RPC transports by an identifier (to be used in following patches).

Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:17:50 -04:00
\"Talpey, Thomas\
49c36fcc44 SUNRPC: rearrange RPC sockets definitions
To prepare for including non-sockets-based RPC transports, move the
sockets-dependent definitions into their own file.

Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:17:48 -04:00
\"Talpey, Thomas\
3c341b0b92 SUNRPC: rename the rpc_xprtsock_create structure
To prepare for including non-sockets-based RPC transports, change the
overly suggestive name of the transport creation arguments struct.

Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:17:45 -04:00
\"Talpey, Thomas\
81c098af3d SUNRPC: Provide a new API for registering transport implementations
To allow transport capabilities to be loaded dynamically, provide an API
for registering and unregistering the transports with the RPC client.
Eventually xprt_create_transport() will be changed to search the list of
registered transports when initializing a fresh transport.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:17:40 -04:00
\"Talpey, Thomas\
4f22ccc346 SUNRPC: mark bulk read/write data in xdrbuf
Adds a flag word to the xdrbuf struct which indicates any bulk
disposition of the data. This enables RPC transport providers to
marshal it efficiently/appropriately, and may enable other
optimizations.

Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:17:34 -04:00
\"Talpey, Thomas\
4417c8c41a SUNRPC: export per-transport rpcbind netid's
The rpcbind (v3+) netid is provided by each RPC client transport. This fixes
an omission in IPv6 rpcbind client support, and enables future extension.

Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:17:20 -04:00
\"Talpey, Thomas\
4f40ee4a02 SUNRPC: move per-transport rpcbind netid's
Move the TCP/UDP rpcbind netid's from the rpcbind client to a global header.

Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:17:18 -04:00
Chuck Lever
89eb21c35b SUNRPC: fix a signed v. unsigned comparison nit in rpc_bind_new_program
/home/cel/linux/net/sunrpc/clnt.c: In function ‘rpc_bind_new_program’:
/home/cel/linux/net/sunrpc/clnt.c:445: warning:
	comparison between signed and unsigned

RPC version numbers are u32, not int.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:16:37 -04:00
Chuck Lever
756805e7a7 SUNRPC: Add support for formatted universal addresses
"Universal addresses" are a string representation of an IP address and
port.  They are described fully in RFC 3530, section 2.2.  Add support
for generating them in the RPC client's socket transport module.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2007-10-09 17:16:29 -04:00
Chuck Lever
fbfe3cc677 SUNRPC: Add hex-formatted address support to rpc_peeraddr2str()
Add support for the NFS client's need to export volume information
with IP addresses formatted in hex instead of decimal.

This isn't used yet, but subsequent patches (not in this series) will
change the NFS client to use this functionality.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:15:52 -04:00
J. Bruce Fields
be879c4e24 SUNRPC: move bkl locking and xdr proc invocation into a common helper
Since every invocation of xdr encode or decode functions takes the BKL now,
there's a lot of redundant lock_kernel/unlock_kernel pairs that we can pull
out into a common function.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-19 15:21:39 -04:00
J. Bruce Fields
4796f45740 knfsd: nfsd4: secinfo handling without secinfo= option
We could return some sort of error in the case where someone asks for secinfo
on an export without the secinfo= option set--that'd be no worse than what
we've been doing.  But it's not really correct.  So, hack up an approximate
secinfo response in that case--it may not be complete, but it'll tell the
client at least one acceptable security flavor.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:08 -07:00
J. Bruce Fields
3ab4d8b121 knfsd: nfsd: set rq_client to ip-address-determined-domain
We want it to be possible for users to restrict exports both by IP address and
by pseudoflavor.  The pseudoflavor information has previously been passed
using special auth_domains stored in the rq_client field.  After the preceding
patch that stored the pseudoflavor in rq_pflavor, that's now superfluous; so
now we use rq_client for the ip information, as auth_null and auth_unix do.

However, we keep around the special auth_domain in the rq_gssclient field for
backwards compatibility purposes, so we can still do upcalls using the old
"gss/pseudoflavor" auth_domain if upcalls using the unix domain to give us an
appropriate export.  This allows us to continue supporting old mountd.

In fact, for this first patch, we always use the "gss/pseudoflavor"
auth_domain (and only it) if it is available; thus rq_client is ignored in the
auth_gss case, and this patch on its own makes no change in behavior; that
will be left to later patches.

Note on idmap: I'm almost tempted to just replace the auth_domain in the idmap
upcall by a dummy value--no version of idmapd has ever used it, and it's
unlikely anyone really wants to perform idmapping differently depending on the
where the client is (they may want to perform *credential* mapping
differently, but that's a different matter--the idmapper just handles id's
used in getattr and setattr).  But I'm updating the idmapd code anyway, just
out of general backwards-compatibility paranoia.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:07 -07:00
Andy Adamson
c4170583f6 knfsd: nfsd4: store pseudoflavor in request
Add a new field to the svc_rqst structure to record the pseudoflavor that the
request was made with.  For now we record the pseudoflavor but don't use it
for anything.

Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:07 -07:00
Frank van Maarseveen
d3bc9a1deb SUNRPC client: add interface for binding to a local address
In addition to binding to a local privileged port the NFS client should
allow binding to a specific local address. This is used by the server
for callbacks. The patch adds the necessary interface.

Signed-off-by: Frank van Maarseveen <frankvm@frankvm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:49 -04:00
Frank van Maarseveen
a97476926e SUNRPC server: record the destination address of a request
Save the destination address of an incoming request over TCP like is
done already for UDP. It is necessary later for callbacks by the server.

Signed-off-by: Frank van Maarseveen <frankvm@frankvm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:49 -04:00
Frank van Maarseveen
96802a0951 SUNRPC: cleanup transport creation argument passing
Cleanup argument passing to functions for creating an RPC transport.

Signed-off-by: Frank van Maarseveen <frankvm@frankvm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:49 -04:00
Chuck Lever
45160d6275 SUNRPC: Rename rpcb_getport to be consistent with new rpcb_getport_sync name
Clean up, for consistency.  Rename rpcb_getport as rpcb_getport_async, to
match the naming scheme of rpcb_getport_sync.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:46 -04:00
Chuck Lever
cce63cd637 SUNRPC: Rename rpcb_getport_external routine
In preparation for handling NFS mount option parsing in the kernel,
rename rpcb_getport_external as rpcb_get_port_sync, and make it available
always (instead of only when CONFIG_ROOT_NFS is enabled).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:46 -04:00
Trond Myklebust
1be27f3660 SUNRPC: Remove the tk_auth macro...
We should almost always be deferencing the rpc_auth struct by means of the
credential's cr_auth field instead of the rpc_clnt->cl_auth anyway. Fix up
that historical mistake, and remove the macro that propagated it.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:37 -04:00
Trond Myklebust
5d28dc8207 SUNRPC: Convert gss_ctx_lock to an RCU lock
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:36 -04:00
Trond Myklebust
f5c2187cfe SUNRPC: Convert the credential garbage collector into a shrinker callback
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:36 -04:00
Trond Myklebust
9499b4341b SUNRPC: Give credential cache a local spinlock
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:36 -04:00
Trond Myklebust
31be5bf15f SUNRPC: Convert the credcache lookup code to use RCU
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:36 -04:00
Trond Myklebust
e092bdcd93 SUNRPC: cleanup rpc credential cache garbage collection
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:35 -04:00
Trond Myklebust
fc432dd907 SUNRPC: Enforce atomic updates of rpc_cred->cr_flags
Convert to the use of atomic bitops...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:35 -04:00
Trond Myklebust
5fe4755e25 SUNRPC: Clean up rpc credential initialisation
Add a helper rpc_cred_init()

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:35 -04:00
Trond Myklebust
f1c0a86150 SUNRPC: Mark auth and cred operation tables as constant.
Also do the same for gss_api operation tables.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:34 -04:00
Trond Myklebust
de7a8ce38a SUNRPC: Rename rpcauth_destroy() to rpcauth_release()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:34 -04:00
Trond Myklebust
5e1550d6a2 SUNRPC: Add the helper function 'rpc_call_null()'
Does a NULL RPC call and returns a pointer to the resulting rpc_task. The
call may be either synchronous or asynchronous.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:34 -04:00
Trond Myklebust
64c91a1f1c SUNRPC: Make rpc_ping() static
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:34 -04:00
Trond Myklebust
3ab9bb7243 SUNRPC: Fix a memory leak in the auth credcache code
The leak only affects the RPCSEC_GSS caches, since they are the only ones
that are dynamically allocated...
Rename the existing rpcauth_free_credcache() to rpcauth_clear_credcache()
in order to better describe its role, then add a new function
rpcauth_destroy_credcache() that actually frees the cache in addition to
clearing it out.

Also move the call to destroy the credcache in gss_destroy() to come before
the rpc upcall pipe is unlinked.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:33 -04:00
Trond Myklebust
03a1256f06 SUNRPC: Add a field to track the number of kernel users of an rpc_pipe
This allows us to correctly deduce when we need to remove the pipe.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:33 -04:00
Trond Myklebust
6e84c7b66a SUNRPC: Add a downcall queue to struct rpc_inode
Currently, the downcall queue is tied to the struct gss_auth, which means
that different RPCSEC_GSS pseudoflavours must use different upcall pipes.
Add a list to struct rpc_inode that can be used instead.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:32 -04:00
Trond Myklebust
4a8c1344dc SUNRPC: Add a backpointer from the struct rpc_cred to the rpc_auth
Cleans up an issue whereby rpcsec_gss uses the rpc_clnt->cl_auth. If we want
to be able to add several rpc_auths to a single rpc_clnt, then this abuse
must go.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:31 -04:00
Trond Myklebust
188fef11db SUNRPC: Move rpc_register_client and friends into net/sunrpc/clnt.c
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:30 -04:00
Trond Myklebust
4c402b4097 SUNRPC: Remove rpc_clnt->cl_count
The kref now does most of what cl_count + cl_user used to do. The only
remaining role for cl_count is to tell us if we are in a 'shutdown'
phase. We can provide that information using a single bit field instead
of a full atomic counter.

Also rename rpc_destroy_client() to rpc_close_client(), which reflects
better what its role is these days.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:29 -04:00
Trond Myklebust
90c5755ff5 SUNRPC: Kill rpc_clnt->cl_oneshot
Replace it with explicit calls to rpc_shutdown_client() or
rpc_destroy_client() (for the case of asynchronous calls).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:29 -04:00
Trond Myklebust
848f1fe6be SUNRPC: Kill rpc_clnt->cl_dead
Its use is at best racy, and there is only one user (lockd), which has
additional locking that makes the whole thing redundant.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:29 -04:00
Trond Myklebust
34f52e3591 SUNRPC: Convert rpc_clnt->cl_users to a kref
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:28 -04:00
Trond Myklebust
4bef61ff75 SUNRPC: Add a per-rpc_clnt spinlock
Use that to protect the rpc_clnt->cl_tasks list instead of using a global
lock.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:28 -04:00
Trond Myklebust
6529eba08f SUNRPC: Move rpc_task->tk_task list into struct rpc_clnt
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:28 -04:00
Jens Axboe
cf8208d0ea sendfile: convert nfsd to splice_direct_to_actor()
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-10 08:04:14 +02:00
Trond Myklebust
7531d692d4 SUNRPC: Fix sparse warnings
- net/sunrpc/xprtsock.c:1635:5: warning: symbol 'init_socket_xprt' was not
   declared. Should it be static?
 - net/sunrpc/xprtsock.c:1649:6: warning: symbol 'cleanup_socket_xprt' was
   not declared. Should it be static?

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-05-14 19:33:47 -04:00
Jeff Layton
cd123012d9 RPC: add wrapper for svc_reserve to account for checksum
When the kernel calls svc_reserve to downsize the expected size of an RPC
reply, it fails to account for the possibility of a checksum at the end of
the packet.  If a client mounts a NFSv2/3 with sec=krb5i/p, and does I/O
then you'll generally see messages similar to this in the server's ring
buffer:

RPC request reserved 164 but used 208

While I was never able to verify it, I suspect that this problem is also
the root cause of some oopses I've seen under these conditions:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=227726

This is probably also a problem for other sec= types and for NFSv4.  The
large reserved size for NFSv4 compound packets seems to generally paper
over the problem, however.

This patch adds a wrapper for svc_reserve that accounts for the possibility
of a checksum.  It also fixes up the appropriate callers of svc_reserve to
call the wrapper.  For now, it just uses a hardcoded value that I
determined via testing.  That value may need to be revised upward as things
change, or we may want to eventually add a new auth_op that attempts to
calculate this somehow.

Unfortunately, there doesn't seem to be a good way to reliably determine
the expected checksum length prior to actually calculating it, particularly
with schemes like spkm3.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:54 -07:00
NeilBrown
7ac1bea550 knfsd: rename sk_defer_lock to sk_lock
Now that sk_defer_lock protects two different things, make the name more
generic.

Also don't bother with disabling _bh as the lock is only ever taken from
process context.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:54 -07:00
Chuck Lever
4c2eaf073f SUNRPC: remove old portmapper
net/sunrpc/pmap_clnt.c has been replaced by net/sunrpc/rpcb_clnt.c.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:15 -07:00
Chuck Lever
a509050bd3 SUNRPC: introduce rpcbind: replacement for in-kernel portmapper
Introduce a replacement for the in-kernel portmapper client that supports
all 3 versions of the rpcbind protocol.  This code is not used yet.

Original code by Groupe Bull updated for the latest kernel, with multiple
bug fixes.

Note that rpcb_clnt.c does not yet support registering via versions 3 and
4 of the rpcbind protocol.  That is planned for a later patch.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:12 -07:00
Chuck Lever
c5a4dd8b7c SUNRPC: Eliminate side effects from rpc_malloc
Currently rpc_malloc sets req->rq_buffer internally.  Make this a more
generic interface:  return a pointer to the new buffer (or NULL) and
make the caller set req->rq_buffer and req->rq_bufsize.  This looks much
more like kmalloc and eliminates the side effects.

To fix a potential deadlock, this patch also replaces GFP_NOFS with
GFP_NOWAIT in rpc_malloc.  This prevents async RPCs from sleeping outside
the RPC's task scheduler while allocating their buffer.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:11 -07:00
Chuck Lever
2bea90d43a SUNRPC: RPC buffer size estimates are too large
The RPC buffer size estimation logic in net/sunrpc/clnt.c always
significantly overestimates the requirements for the buffer size.
A little instrumentation demonstrated that in fact rpc_malloc was never
allocating the buffer from the mempool, but almost always called kmalloc.

To compute the size of the RPC buffer more precisely, split p_bufsiz into
two fields; one for the argument size, and one for the result size.

Then, compute the sum of the exact call and reply header sizes, and split
the RPC buffer precisely between the two.  That should keep almost all RPC
buffers within the 2KiB buffer mempool limit.

And, we can finally be rid of RPC_SLACK_SPACE!

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:10 -07:00
NeilBrown
cda1fd4abd [PATCH] knfsd: fix recently introduced problem with shutting down a busy NFS server
When the last thread of nfsd exits, it shuts down all related sockets.  It
currently uses svc_close_socket to do this, but that only is immediately
effective if the socket is not SK_BUSY.

If the socket is busy - i.e.  if a request has arrived that has not yet been
processes - svc_close_socket is not effective and the shutdown process spins.

So create a new svc_force_close_socket which removes the SK_BUSY flag is set
and then calls svc_close_socket.

Also change some open-codes loops in svc_destroy to use
list_for_each_entry_safe.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-06 09:30:26 -08:00
NeilBrown
5a05ed73e1 [PATCH] knfsd: remove CONFIG_IPV6 ifdefs from sunrpc server code
They don't really save that much, and aren't worth the hassle.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-06 09:30:26 -08:00
Ralf Baechle
7b8f850beb [PATCH] Fix build errors if bitop functions are do {} while macros
If one of clear_bit, change_bit or set_bit is defined as a do { } while (0)
function usage of these functions in parenthesis like

  (foo_bit(23, &var))

while be expaned to something like

  (do { ... } while (0)}).

resulting in a build error.  This patch removes the useless parenthesis.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-20 17:10:12 -08:00
Eric W. Biederman
50d851f722 [PATCH] sysctl: move CTL_SUNRPC to sysctl.h where it belongs
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14 08:09:55 -08:00
Trond Myklebust
d9bc125caf Merge branch 'master' of /home/trondmy/kernel/linux-2.6/
Conflicts:

	net/sunrpc/auth_gss/gss_krb5_crypto.c
	net/sunrpc/auth_gss/gss_spkm3_token.c
	net/sunrpc/clnt.c

Merge with mainline and fix conflicts.
2007-02-12 22:43:25 -08:00
Chuck Lever
43d78ef2ba NFS: disconnect before retrying NFSv4 requests over TCP
RFC3530 section 3.1.1 states an NFSv4 client MUST NOT send a request
twice on the same connection unless it is the NULL procedure.  Section
3.1.1 suggests that the client should disconnect and reconnect if it
wants to retry a request.

Implement this by adding an rpc_clnt flag that an ULP can use to
specify that the underlying transport should be disconnected on a
major timeout.  The NFSv4 client asserts this new flag, and requests
no retries after a minor retransmit timeout.

Note that disconnecting on a retransmit is in general not safe to do
if the RPC client does not reuse the TCP port number when reconnecting.

See http://bugzilla.linux-nfs.org/show_bug.cgi?id=6

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-02-12 22:40:45 -08:00
Chuck Lever
95756482c9 [PATCH] knfsd: SUNRPC: support IPv6 addresses in RPC server's UDP receive path
Add support for IPv6 addresses in the RPC server's UDP receive path.

[akpm@linux-foundation.org: cleanups]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:36 -08:00
Chuck Lever
73df0dbaff [PATCH] knfsd: SUNRPC: Make rq_daddr field address-version independent
The rq_daddr field must support larger addresses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:36 -08:00
Chuck Lever
27459f0940 [PATCH] knfsd: SUNRPC: Provide room in svc_rqst for larger addresses
Expand the rq_addr field to allow it to contain larger addresses.

Specifically, we replace a 'sockaddr_in' with a 'sockaddr_storage', then
everywhere the 'sockaddr_in' was referenced, we use instead an accessor
function (svc_addr_in) which safely casts the _storage to _in.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:36 -08:00
Chuck Lever
2442222283 [PATCH] knfsd: SUNRPC: Use sockaddr_storage to store address in svc_deferred_req
Sockaddr_storage will allow us to store arbitrary socket addresses in the
svc_deferred_req struct.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:35 -08:00
Chuck Lever
ad06e4bd62 [PATCH] knfsd: SUNRPC: Add a function to format the address in an svc_rqst for printing
There are loads of places where the RPC server assumes that the rq_addr fields
contains an IPv4 address.  Top among these are error and debugging messages
that display the server's IP address.

Let's refactor the address printing into a separate function that's smart
enough to figure out the difference between IPv4 and IPv6 addresses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:35 -08:00
Chuck Lever
067d781731 [PATCH] knfsd: SUNRPC: Cache remote peer's address in svc_sock
The remote peer's address won't change after the socket has been accepted.  We
don't need to call ->getname on every incoming request.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:35 -08:00
Chuck Lever
482fb94e1b [PATCH] knfsd: SUNRPC: allow creating an RPC service without registering with portmapper
Sometimes we need to create an RPC service but not register it with the local
portmapper.  NFSv4 delegation callback, for example.

Change the svc_makesock() API to allow optionally creating temporary or
permanent sockets, optionally registering with the local portmapper, and make
it return the ephemeral port of the new socket.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:35 -08:00
Chuck Lever
6b174337e5 [PATCH] knfsd: SUNRPC: update internal API: separate pmap register and temp sockets
Currently in the RPC server, registering with the local portmapper and
creating "permanent" sockets are tied together.  Expand the internal APIs to
allow these two socket characteristics to be separately specified.

This will be externalized in the next patch.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:35 -08:00
NeilBrown
aaf68cfbf2 [PATCH] knfsd: fix a race in closing NFSd connections
If you lose this race, it can iput a socket inode twice and you get a BUG
in fs/inode.c

When I added the option for user-space to close a socket, I added some
cruft to svc_delete_socket so that I could call that function when closing
a socket per user-space request.

This was the wrong thing to do.  I should have just set SK_CLOSE and let
normal mechanisms do the work.

Not only wrong, but buggy.  The locking is all wrong and it openned up a
race where-by a socket could be closed twice.

So this patch:
  Introduces svc_close_socket which sets SK_CLOSE then either leave
  the close up to a thread, or calls svc_delete_socket if it can
  get SK_BUSY.

  Adds a bias to sk_busy which is removed when SK_DEAD is set,
  This avoid races around shutting down the socket.

  Changes several 'spin_lock' to 'spin_lock_bh' where the _bh
  was missing.

Bugzilla-url: http://bugzilla.kernel.org/show_bug.cgi?id=7916

Signed-off-by: Neil Brown <neilb@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-09 09:25:47 -08:00
Trond Myklebust
2efef837fb RPC: Clean up rpc_execute...
The error values are already propagated through task->tk_status, and
none of the callers check one without checking the other, so we can
drop the return value.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-02-03 15:35:03 -08:00
NeilBrown
250f391518 [PATCH] knfsd: fix an NFSD bug with full sized, non-page-aligned reads
NFSd assumes that largest number of pages that will be needed for a
request+response is 2+N where N pages is the size of the largest permitted
read/write request.  The '2' are 1 for the non-data part of the request, and 1
for the non-data part of the reply.

However, when a read request is not page-aligned, and we choose to use
->sendfile to send it directly from the page cache, we may need N+1 pages to
hold the whole reply.  This can overflow and array and cause an Oops.

This patch increases size of the array for holding pages by one and makes sure
that entry is NULL when it is not in use.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26 13:50:59 -08:00
Trond Myklebust
bde8f00ce6 [PATCH] NFS: Fix Oops in rpc_call_sync()
Fix the Oops in http://bugzilla.linux-nfs.org/show_bug.cgi?id=138
We shouldn't be calling rpc_release_task() for tasks that are not active.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-24 12:31:06 -08:00
Trond Myklebust
21b4e73692 Merge branch 'master' of /home/trondmy/kernel/linux-2.6/ into merge_linus 2006-12-07 16:35:17 -05:00
Trond Myklebust
34161db6b1 Merge branch 'master' of /home/trondmy/kernel/linux-2.6/ into merge_linus
Conflicts:

	include/linux/sunrpc/xprt.h
	net/sunrpc/xprtsock.c
Fix up conflicts with the workqueue changes.
2006-12-07 15:48:15 -05:00
Peter Zijlstra
6cfd76a26d [PATCH] lockdep: name some old style locks
Name some of the remaning 'old_style_spin_init' locks

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:36 -08:00
Chuck Lever
5847e1f4d0 SUNRPC: Remove pprintk() from net/sunrpc/xprt.c
These appear to be deprecated.  Removing them also gets rid of some sparse
noise.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:55 -05:00
Chuck Lever
dd4564715e SUNRPC: Rename skb_reader_t and friends
Clean-up:  hch suggested that the RPC client shouldn't pollute the name
space used by the generic skb manipulation routines in net/core/skbuff.c.

Rename a couple of types in xdr.h to adhere to this convention.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:52 -05:00
Chuck Lever
9d29231690 SUNRPC: skb_read_bits is the same as xs_tcp_copy_data
Clean-up: eliminate xs_tcp_copy_data -- it's exactly the same logic as the
common routine skb_read_bits.  The UDP and TCP socket read code now share
the same routine for copying data into an xdr_buf.

Now that skb_read_bits() is exported, rename it to avoid confusing it with
a generic skb_* function.  As these functions are XDR-specific, they should
not have names that suggest they are of generic use.  Also rename
skb_read_and_csum_bits() to be consistent.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:52 -05:00
Chuck Lever
7559c7a28f SUNRPC: Make address format buffers more generic
For now we will assume that all transports will use the address format
buffers in the rpc_xprt struct to store their addresses.  Change
rpc_peer2str() to be a generic routine to handle this, and get rid of the
print_address() op in the rpc_xprt_ops vector.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:51 -05:00
Chuck Lever
314dfd7987 SUNRPC: move saved socket callback functions to a private data structure
Move the three fields for saving socket callback functions out of the
rpc_xprt structure and into a private data structure maintained in
net/sunrpc/xprtsock.c.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:51 -05:00
Chuck Lever
7c6e066ec2 SUNRPC: Move the UDP socket bufsize parameters to a private data structure
Move the socket-specific buffer size parameters for UDP sockets to a
private data structure maintained in net/sunrpc/xprtsock.c.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:50 -05:00
Chuck Lever
c847546182 SUNRPC: Move rpc_xprt socket connect fields into private data structure
Move the socket-specific connection management fields out of the generic
rpc_xprt structure into a private data structure maintained in
net/sunrpc/xprtsock.c.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:50 -05:00
Chuck Lever
e136d0926e SUNRPC: Move TCP state flags into xprtsock.c
Move "XPRT_LAST_FRAG" and friends from xprt.h into xprtsock.c, and rename
them to use the naming scheme in use in xprtsock.c.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:50 -05:00
Chuck Lever
51971139b2 SUNRPC: Move TCP receive state variables into private data structure
Move the TCP receive state variables from the generic rpc_xprt structure to
a private structure maintained inside net/sunrpc/xprtsock.c.

Also rename a function/variable pair to refer to RPC fragment headers
instead of record markers, to be consistent with types defined in
sunrpc/*.h.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:49 -05:00
Chuck Lever
ee0ac0c227 SUNRPC: Remove sock and inet fields from rpc_xprt
The "sock" and "inet" fields are socket-specific.  Move them to a private
data structure maintained entirely within net/sunrpc/xprtsock.c

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:49 -05:00
J. Bruce Fields
717757ad10 rpcgss: krb5: ignore seed
We're currently not actually using seed or seed_init.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:47 -05:00
J. Bruce Fields
d922a84a8b rpcgss: krb5: sanity check sealalg value in the downcall
The sealalg is checked in several places, giving the impression it could be
either SEAL_ALG_NONE or SEAL_ALG_DES.  But in fact SEAL_ALG_NONE seems to
be sufficient only for making mic's, and all the contexts we get must be
capable of wrapping as well.  So the sealalg must be SEAL_ALG_DES.  As
with signalg, just check for the right value on the downcall and ignore it
otherwise.  Similarly, tighten expectations for the sealalg on incoming
tokens, in case we do support other values eventually.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:47 -05:00
J. Bruce Fields
ca54f89645 rpcgss: simplify make_checksum
We're doing some pointless translation between krb5 constants and kernel
crypto string names.

Also clean up some related spkm3 code as necessary.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:46 -05:00
J. Bruce Fields
e678e06bf8 gss: krb5: remove signalg and sealalg
We designed the krb5 context import without completely understanding the
context.  Now it's clear that there are a number of fields that we ignore,
or that we depend on having one single value.

In particular, we only support one value of signalg currently; so let's
check the signalg field in the downcall (in case we decide there's
something else we could support here eventually), but ignore it otherwise.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:44 -05:00
Olga Kornievskaia
adeb8133dd rpc: spkm3 update
This updates the spkm3 code to bring it up to date with our current
understanding of the spkm3 spec.

In doing so, we're changing the downcall format used by gssd in the spkm3 case,
which will cause an incompatilibity with old userland spkm3 support.  Since the
old code a) didn't implement the protocol correctly, and b) was never
distributed except in the form of some experimental patches from the citi web
site, we're assuming this is OK.

We do detect the old downcall format and print warning (and fail).  We also
include a version number in the new downcall format, to be used in the
future in case any further change is required.

In some more detail:

	- fix integrity support
	- removed dependency on NIDs. instead OIDs are used
	- known OID values for algorithms added.
	- fixed some context fields and types

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:44 -05:00
Olga Kornievskaia
37a4e6cb03 rpc: move process_xdr_buf
Since process_xdr_buf() is useful outside of the kerberos-specific code, we
move it to net/sunrpc/xdr.c, export it, and rename it in keeping with xdr_*
naming convention of xdr.c.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:44 -05:00
J. Bruce Fields
8fc7500bb8 rpc: gss: eliminate print_hexl()'s
Dumping all this data to the logs is wasteful (even when debugging is turned
off), and creates too much output to be useful when it's turned on.

Fix a minor style bug or two while we're at it.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:43 -05:00
Chuck Lever
c8541ecdd5 SUNRPC: Make the transport-specific setup routine allocate rpc_xprt
Change the location where the rpc_xprt structure is allocated so each
transport implementation can allocate a private area from the same
chunk of memory.

Note also that xprt->ops->destroy, rather than xprt_destroy, is now
responsible for freeing rpc_xprt when the transport is destroyed.

Test plan:
Connectathon.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:34 -05:00
Chuck Lever
e744cf2e3a SUNRPC: minor optimization of "xid" field in rpc_xprt
Move the xid field in the rpc_xprt structure to be in the same cache line
as the reserve_lock, since these are used at the same time.

Test plan:
None.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:34 -05:00
Trond Myklebust
1e78957e0a SUNRPC: Clean up argument types in xdr.c
Converts various integer buffer offsets and sizes to unsigned integer.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:32 -05:00
Trond Myklebust
bbd5a1f9fc SUNRPC: Fix up missing BKL in asynchronous RPC callback functions
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:29 -05:00
Trond Myklebust
3e32a5d99a SUNRPC: Give cloned RPC clients their own rpc_pipefs directory
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:29 -05:00
Trond Myklebust
8aca67f0ae SUNRPC: Fix a potential race in rpc_wake_up_task()
Use RCU to ensure that we can safely call rpc_finish_wakeup after we've
called __rpc_do_wake_up_task. If not, there is a theoretical race, in which
the rpc_task finishes executing, and gets freed first.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:26 -05:00
Trond Myklebust
e6b3c4db6f Fix a second potential rpc_wakeup race...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:25 -05:00
David Howells
4c1ac1b491 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	drivers/infiniband/core/iwcm.c
	drivers/net/chelsio/cxgb2.c
	drivers/net/wireless/bcm43xx/bcm43xx_main.c
	drivers/net/wireless/prism54/islpci_eth.c
	drivers/usb/core/hub.h
	drivers/usb/input/hid-core.c
	net/core/netpoll.c

Fix up merge failures with Linus's head and fix new compilation failures.

Signed-Off-By: David Howells <dhowells@redhat.com>
2006-12-05 14:37:56 +00:00
Al Viro
44bb93633f [NET]: Annotate csum_partial() callers in net/*
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:23:32 -08:00
David Howells
52bad64d95 WorkStruct: Separate delayable and non-delayable events.
Separate delayable work items from non-delayable work items be splitting them
into a separate structure (delayed_work), which incorporates a work_struct and
the timer_list removed from work_struct.

The work_struct struct is huge, and this limits it's usefulness.  On a 64-bit
architecture it's nearly 100 bytes in size.  This reduces that by half for the
non-delayable type of event.

Signed-Off-By: David Howells <dhowells@redhat.com>
2006-11-22 14:54:01 +00:00
Al Viro
7111c66e4e [PATCH] fix svc_procfunc declaration
svc_procfunc instances return __be32, not int

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-20 10:26:40 -07:00
NeilBrown
d343fce148 [PATCH] knfsd: Allow lockd to drop replies as appropriate
It is possible for the ->fopen callback from lockd into nfsd to find that an
answer cannot be given straight away (an upcall is needed) and so the request
has to be 'dropped', to be retried later.  That error status is not currently
propagated back.

So:
  Change nlm_fopen to return nlm error codes (rather than a private
  protocol) and define a new nlm_drop_reply code.
  Cause nlm_drop_reply to cause the rpc request to get rpc_drop_reply
  when this error comes back.
  Cause svc_process to drop a request which returns a status of
  rpc_drop_reply.

[akpm@osdl.org: fix warning storm]
Cc: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-17 08:18:46 -07:00
NeilBrown
c6b0a9f87b [PATCH] knfsd: tidy up up meaning of 'buffer size' in nfsd/sunrpc
There is some confusion about the meaning of 'bufsz' for a sunrpc server.
In some cases it is the largest message that can be sent or received.  In
other cases it is the largest 'payload' that can be included in a NFS
message.

In either case, it is not possible for both the request and the reply to be
this large.  One of the request or reply may only be one page long, which
fits nicely with NFS.

So we remove 'bufsz' and replace it with two numbers: 'max_payload' and
'max_mesg'.  Max_payload is the size that the server requests.  It is used
by the server to check the max size allowed on a particular connection:
depending on the protocol a lower limit might be used.

max_mesg is the largest single message that can be sent or received.  It is
calculated as the max_payload, rounded up to a multiple of PAGE_SIZE, and
with PAGE_SIZE added to overhead.  Only one of the request and reply may be
this size.  The other must be at most one page.

Cc: Greg Banks <gnb@sgi.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-06 08:53:41 -07:00
Olaf Kirch
bc5fea4299 [PATCH] knfsd: register all RPC programs with portmapper by default
The NFSACL patches introduced support for multiple RPC services listening on
the same transport.  However, only the first of these services was registered
with portmapper.  This was perfectly fine for nfsacl, as you traditionally do
not want these to show up in a portmapper listing.

The patch below changes the default behavior to always register all services
listening on a given transport, but retains the old behavior for nfsacl
services.

Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:19 -07:00
Greg Banks
7b2b1fee30 [PATCH] knfsd: knfsd: cache ipmap per TCP socket
Speed up high call-rate workloads by caching the struct ip_map for the peer on
the connected struct svc_sock instead of looking it up in the ip_map cache
hashtable on every call.  This helps workloads using AUTH_SYS authentication
over TCP.

Testing was on a 4 CPU 4 NIC Altix using 4 IRIX clients, each with 16
synthetic client threads simulating an rsync (i.e.  recursive directory
listing) workload reading from an i386 RH9 install image (161480 regular files
in 10841 directories) on the server.  That tree is small enough to fill in the
server's RAM so no disk traffic was involved.  This setup gives a sustained
call rate in excess of 60000 calls/sec before being CPU-bound on the server.

Profiling showed strcmp(), called from ip_map_match(), was taking 4.8% of each
CPU, and ip_map_lookup() was taking 2.9%.  This patch drops both contribution
into the profile noise.

Note that the above result overstates this value of this patch for most
workloads.  The synthetic clients are all using separate IP addresses, so
there are 64 entries in the ip_map cache hash.  Because the kernel measured
contained the bug fixed in commit

commit 1f1e030bf7

and was running on 64bit little-endian machine, probably all of those 64
entries were on a single chain, thus increasing the cost of ip_map_lookup().

With a modern kernel you would need more clients to see the same amount of
performance improvement.  This patch has helped to scale knfsd to handle a
deployment with 2000 NFS clients.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:16 -07:00
Greg Banks
7adae489fe [PATCH] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP
The limit over UDP remains at 32K.  Also, make some of the apparently
arbitrary sizing constants clearer.

The biggest change here involves replacing NFSSVC_MAXBLKSIZE by a function of
the rqstp.  This allows it to be different for different protocols (udp/tcp)
and also allows it to depend on the servers declared sv_bufsiz.

Note that we don't actually increase sv_bufsz for nfs yet.  That comes next.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:16 -07:00
NeilBrown
3cc03b164c [PATCH] knfsd: Avoid excess stack usage in svc_tcp_recvfrom
..  by allocating the array of 'kvec' in 'struct svc_rqst'.

As we plan to increase RPCSVC_MAXPAGES from 8 upto 256, we can no longer
allocate an array of this size on the stack.  So we allocate it in 'struct
svc_rqst'.

However svc_rqst contains (indirectly) an array of the same type and size
(actually several, but they are in a union).  So rather than waste space, we
move those arrays out of the separately allocated union and into svc_rqst to
share with the kvec moved out of svc_tcp_recvfrom (various arrays are used at
different times, so there is no conflict).

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:15 -07:00
NeilBrown
4452435948 [PATCH] knfsd: Replace two page lists in struct svc_rqst with one
We are planning to increase RPCSVC_MAXPAGES from about 8 to about 256.  This
means we need to be a bit careful about arrays of size RPCSVC_MAXPAGES.

struct svc_rqst contains two such arrays.  However the there are never more
that RPCSVC_MAXPAGES pages in the two arrays together, so only one array is
needed.

The two arrays are for the pages holding the request, and the pages holding
the reply.  Instead of two arrays, we can simply keep an index into where the
first reply page is.

This patch also removes a number of small inline functions that probably
server to obscure what is going on rather than clarify it, and opencode the
needed functionality.

Also remove the 'rq_restailpage' variable as it is *always* 0.  i.e.  if the
response 'xdr' structure has a non-empty tail it is always in the same pages
as the head.

 check counters are initilised and incr properly
 check for consistant usage of ++ etc
 maybe extra some inlines for common approach
 general review

Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Magnus Maatta <novell@kiruna.se>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:15 -07:00
Uwe Zeisberger
f30c226954 fix file specification in comments
Many files include the filename at the beginning, serveral used a wrong one.

Signed-off-by: Uwe Zeisberger <Uwe_Zeisberger@digi.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-10-03 23:01:26 +02:00
Greg Banks
bfd241600a [PATCH] knfsd: make rpc threads pools numa aware
Actually implement multiple pools.  On NUMA machines, allocate a svc_pool per
NUMA node; on SMP a svc_pool per CPU; otherwise a single global pool.  Enqueue
sockets on the svc_pool corresponding to the CPU on which the socket bh is run
(i.e.  the NIC interrupt CPU).  Threads have their cpu mask set to limit them
to the CPUs in the svc_pool that owns them.

This is the patch that allows an Altix to scale NFS traffic linearly
beyond 4 CPUs and 4 NICs.

Incorporates changes and feedback from Neil Brown, Trond Myklebust, and
Christoph Hellwig.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:20 -07:00
Greg Banks
a74554429e [PATCH] knfsd: add svc_set_num_threads
Currently knfsd keeps its own list of all nfsd threads in nfssvc.c; add a new
way of managing the list of all threads in a svc_serv.  Add
svc_create_pooled() to allow creation of a svc_serv whose threads are managed
by the sunrpc code.  Add svc_set_num_threads() to manage the number of threads
in a service, either per-pool or globally across the service.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:19 -07:00
Greg Banks
9a24ab5749 [PATCH] knfsd: add svc_get
add svc_get() for those occasions when we need to temporarily bump up
svc_serv->sv_nrthreads as a pseudo refcount.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:19 -07:00
Greg Banks
3262c816a3 [PATCH] knfsd: split svc_serv into pools
Split out the list of idle threads and pending sockets from svc_serv into a
new svc_pool structure, and allocate a fixed number (in this patch, 1) of
pools per svc_serv.  The new structure contains a lock which takes over
several of the duties of svc_serv->sv_lock, which is now relegated to
protecting only sv_tempsocks, sv_permsocks, and sv_tmpcnt in svc_serv.

The point is to move the hottest fields out of svc_serv and into svc_pool,
allowing a following patch to arrange for a svc_pool per NUMA node or per CPU.
 This is a major step towards making the NFS server NUMA-friendly.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:19 -07:00
Greg Banks
5685f0fa1c [PATCH] knfsd: convert sk_reserved to atomic_t
Convert the svc_sock->sk_reserved variable from an int protected by
svc_serv->sv_lock, to an atomic.  This reduces (by 1) the number of places we
need to take the (effectively global) svc_serv->sv_lock.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:19 -07:00
Greg Banks
1a68d952af [PATCH] knfsd: use new lock for svc_sock deferred list
Protect the svc_sock->sk_deferred list with a new lock svc_sock->sk_defer_lock
instead of svc_serv->sv_lock.  Using the more fine-grained lock reduces the
number of places we need to take the svc_serv lock.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:19 -07:00
Greg Banks
c45c357d7d [PATCH] knfsd: convert sk_inuse to atomic_t
Convert the svc_sock->sk_inuse counter from an int protected by
svc_serv->sv_lock, to an atomic.  This reduces the number of places we need to
take the (effectively global) svc_serv->sv_lock.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:19 -07:00
Greg Banks
36bdfc8bae [PATCH] knfsd: move tempsock aging to a timer
Following are 11 patches from Greg Banks which combine to make knfsd more
Numa-aware.  They reduce hitting on 'global' data structures, and create some
data-structures that can be node-local.

knfsd threads are bound to a particular node, and the thread to handle a new
request is chosen from the threads that are attach to the node that received
the interrupt.

The distribution of threads across nodes can be controlled by a new file in
the 'nfsd' filesystem, though the default approach of an even spread is
probably fine for most sites.

Some (old) numbers that show the efficacy of these patches: N == number of
NICs == number of CPUs == nmber of clients.  Number of NUMA nodes == N/2

N	Throughput, MiB/s	CPU usage, % (max=N*100)
	Before	After		Before	After
	---	------	----		-----	-----
	4	312	435		350	228
	6	500	656		501	418
	8	562	804		690	589

This patch:

Move the aging of RPC/TCP connection sockets from the main svc_recv() loop to
a timer which uses a mark-and-sweep algorithm every 6 minutes.  This reduces
the amount of work that needs to be done in the main RPC loop and the length
of time we need to hold the (effectively global) svc_serv->sv_lock.

[akpm@osdl.org: cleanup]
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:19 -07:00
NeilBrown
6fb2b47fa1 [PATCH] knfsd: Drop 'serv' option to svc_recv and svc_process
It isn't needed as it is available in rqstp->rq_server, and dropping it allows
some local vars to be dropped.

[akpm@osdl.org: build fix]
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:18 -07:00
NeilBrown
b41b66d63c [PATCH] knfsd: allow sockets to be passed to nfsd via 'portlist'
Userspace should create and bind a socket (but not connectted) and write the
'fd' to portlist.  This will cause the nfs server to listen on that socket.

To close a socket, the name of the socket - as read from 'portlist' can be
written to 'portlist' with a preceding '-'.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:18 -07:00
NeilBrown
80212d59e3 [PATCH] knfsd: define new nfsdfs file: portlist - contains list of ports
This file will list all ports that nfsd has open.
Default when TCP enabled will be
   ipv4 udp 0.0.0.0 2049
   ipv4 tcp 0.0.0.0 2049

Later, the list of ports will be settable.

'portlist' chosen rather than 'ports', to avoid unnecessary confusion with
non-mainline patches which created 'ports' with different semantics.

[akpm@osdl.org: cleanups, build fix]
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:18 -07:00