Commit graph

5376 commits

Author SHA1 Message Date
Jens Axboe
5c61ee2cd5 Linux 5.1-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAly8rGYeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGmZMH/1IRB0E1Qmzz8yzw
 wj79UuRGYPqxDDSWW+wNc8sU4Ic7iYirn9APHAztCdQqsjmzU/OVLfSa3JhdBe5w
 THo7pbGKBqEDcWnKfNk/21jXFNLZ1vr9BoQv2DGU2MMhHAyo/NZbalo2YVtpQPmM
 OCRth5n+LzvH7rGrX7RYgWu24G9l3NMfgtaDAXBNXesCGFAjVRrdkU5CBAaabvtU
 4GWh/nnutndOOLdByL3x+VZ3H3fIBnbNjcIGCglvvqzk7h3hrfGEl4UCULldTxcM
 IFsfMUhSw1ENy7F6DHGbKIG90cdCJcrQ8J/ziEzjj/KLGALluutfFhVvr6YCM2J6
 2RgU8CY=
 =CfY1
 -----END PGP SIGNATURE-----

Merge tag 'v5.1-rc6' into for-5.2/block

Pull in v5.1-rc6 to resolve two conflicts. One is in BFQ, in just a
comment, and is trivial. The other one is a conflict due to a later fix
in the bio multi-page work, and needs a bit more care.

* tag 'v5.1-rc6': (770 commits)
  Linux 5.1-rc6
  block: make sure that bvec length can't be overflow
  block: kill all_q_node in request_queue
  x86/cpu/intel: Lower the "ENERGY_PERF_BIAS: Set to normal" message's log priority
  coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping
  mm/kmemleak.c: fix unused-function warning
  init: initialize jump labels before command line option parsing
  kernel/watchdog_hld.c: hard lockup message should end with a newline
  kcov: improve CONFIG_ARCH_HAS_KCOV help text
  mm: fix inactive list balancing between NUMA nodes and cgroups
  mm/hotplug: treat CMA pages as unmovable
  proc: fixup proc-pid-vm test
  proc: fix map_files test on F29
  mm/vmstat.c: fix /proc/vmstat format for CONFIG_DEBUG_TLBFLUSH=y CONFIG_SMP=n
  mm/memory_hotplug: do not unlock after failing to take the device_hotplug_lock
  mm: swapoff: shmem_unuse() stop eviction without igrab()
  mm: swapoff: take notice of completion sooner
  mm: swapoff: remove too limiting SWAP_UNUSE_MAX_TRIES
  mm: swapoff: shmem_find_swap_entries() filter out other types
  slab: store tagged freelist for off-slab slabmgmt
  ...

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-22 09:47:36 -06:00
Olga Kornievskaia
0769663b4f NFSv4.1 fix incorrect return value in copy_file_range
According to the NFSv4.2 spec if the input and output file is the
same file, operation should fail with EINVAL. However, linux
copy_file_range() system call has no such restrictions. Therefore,
in such case let's return EOPNOTSUPP and allow VFS to fallback
to doing do_splice_direct(). Also when copy_file_range is called
on an NFSv4.0 or 4.1 mount (ie., a server that doesn't support
COPY functionality), we also need to return EOPNOTSUPP and
fallback to a regular copy.

Fixes xfstest generic/075, generic/091, generic/112, generic/263
for all NFSv4.x versions.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-04-11 15:23:48 -04:00
Chuck Lever
29e7ca715f NFS: Fix handling of reply page vector
NFSv4 GETACL and FS_LOCATIONS requests stopped working in v5.1-rc.

These two need the extra padding to be added directly to the reply
length.

Reported-by: Olga Kornievskaia <aglo@umich.edu>
Fixes: 02ef04e432 ("NFS: Account for XDR pad of buf->pages")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-04-11 15:23:48 -04:00
Tetsuo Handa
7c2bd9a398 NFS: Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family.
syzbot is reporting uninitialized value at rpc_sockaddr2uaddr() [1]. This
is because syzbot is setting AF_INET6 to "struct sockaddr_in"->sin_family
(which is embedded into user-visible "struct nfs_mount_data" structure)
despite nfs23_validate_mount_data() cannot pass sizeof(struct sockaddr_in6)
bytes of AF_INET6 address to rpc_sockaddr2uaddr().

Since "struct nfs_mount_data" structure is user-visible, we can't change
"struct nfs_mount_data" to use "struct sockaddr_storage". Therefore,
assuming that everybody is using AF_INET family when passing address via
"struct nfs_mount_data"->addr, reject if its sin_family is not AF_INET.

[1] https://syzkaller.appspot.com/bug?id=599993614e7cbbf66bc2656a919ab2a95fb5d75c

Reported-by: syzbot <syzbot+047a11c361b872896a4f@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-04-11 15:23:48 -04:00
Christoph Hellwig
72deb455b5 block: remove CONFIG_LBDAF
Currently support for 64-bit sector_t and blkcnt_t is optional on 32-bit
architectures.  These types are required to support block device and/or
file sizes larger than 2 TiB, and have generally defaulted to on for
a long time.  Enabling the option only increases the i386 tinyconfig
size by 145 bytes, and many data structures already always use
64-bit values for their in-core and on-disk data structures anyway,
so there should not be a large change in dynamic memory usage either.

Dropping this option removes a somewhat weird non-default config that
has cause various bugs or compiler warnings when actually used.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-06 10:48:35 -06:00
Trond Myklebust
166bd5b889 pNFS/flexfiles: Fix layoutstats handling during read failovers
During a read failover, we may end up changing the value of
the pgio_mirror_idx, so make sure that we record the layout
stats before that update.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-23 12:03:58 -04:00
Trond Myklebust
5a69824393 NFS: Fix a typo in nfs_init_timeout_values()
Specifying a retrans=0 mount parameter to a NFS/TCP mount, is
inadvertently causing the NFS client to rewrite any specified
timeout parameter to the default of 60 seconds.

Fixes: a956beda19 ("NFS: Allow the mount option retrans=0")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-23 12:03:58 -04:00
Olga Kornievskaia
0cb98abb5b NFSv4.1 don't free interrupted slot on open
Allow the async rpc task for finish and update the open state if needed,
then free the slot. Otherwise, the async rpc unable to decode the reply.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Fixes: ae55e59da0 ("pnfs: Don't release the sequence slot...")
Cc: stable@vger.kernel.org # v4.18+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-19 13:17:49 -04:00
Catalin Marinas
3028efe03b NFS: Fix nfs4_lock_state refcounting in nfs4_alloc_{lock,unlock}data()
Commit 7b587e1a5a ("NFS: use locks_copy_lock() to copy locks.")
changed the lock copying from memcpy() to the dedicated
locks_copy_lock() function. The latter correctly increments the
nfs4_lock_state.ls_count via nfs4_fl_copy_lock(), however, this refcount
has already been incremented in the nfs4_alloc_{lock,unlock}data().
Kmemleak subsequently reports an unreferenced nfs4_lock_state object as
below (arm64 platform):

unreferenced object 0xffff8000fce0b000 (size 256):
  comm "systemd-sysuser", pid 1608, jiffies 4294892825 (age 32.348s)
  hex dump (first 32 bytes):
    20 57 4c fb 00 80 ff ff 20 57 4c fb 00 80 ff ff   WL..... WL.....
    00 57 4c fb 00 80 ff ff 01 00 00 00 00 00 00 00  .WL.............
  backtrace:
    [<000000000d15010d>] kmem_cache_alloc+0x178/0x208
    [<00000000d7c1d264>] nfs4_set_lock_state+0x124/0x1f0
    [<000000009c867628>] nfs4_proc_lock+0x90/0x478
    [<000000001686bd74>] do_setlk+0x64/0xe8
    [<00000000e01500d4>] nfs_lock+0xe8/0x1f0
    [<000000004f387d8d>] vfs_lock_file+0x18/0x40
    [<00000000656ab79b>] do_lock_file_wait+0x68/0xf8
    [<00000000f17c4a4b>] fcntl_setlk+0x224/0x280
    [<0000000052a242c6>] do_fcntl+0x418/0x730
    [<000000004f47291a>] __arm64_sys_fcntl+0x84/0xd0
    [<00000000d6856e01>] el0_svc_common+0x80/0xf0
    [<000000009c4bd1df>] el0_svc_handler+0x2c/0x80
    [<00000000b1a0d479>] el0_svc+0x8/0xc
    [<0000000056c62a0f>] 0xffffffffffffffff

This patch removes the original refcount_inc(&lsp->ls_count) that was
paired with the memcpy() lock copying.

Fixes: 7b587e1a5a ("NFS: use locks_copy_lock() to copy locks.")
Cc: <stable@vger.kernel.org> # 5.0.x-
Cc: NeilBrown <neilb@suse.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-18 13:14:24 -04:00
Trond Myklebust
400417b05f pNFS: Fix a typo in pnfs_update_layout
We're supposed to wait for the outstanding layout count to go to zero,
but that got lost somehow.

Fixes: d03360aaf5 ("pNFS: Ensure we return the error if someone...")
Reported-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-12 16:04:51 -04:00
Trond Myklebust
067c469671 NFSv4.1: Bump the default callback session slot count to 16
Users can still control this value explicitly using the
max_session_cb_slots module parameter, but let's bump the default
up to 16 for now.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-02 16:25:26 -05:00
Trond Myklebust
cefa587a40 NFS/flexfiles: Clean up mirror DS initialisation
Get rid of the redundant parameter and rename the function
ff_layout_mirror_valid() to ff_layout_init_mirror_ds() for clarity.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-01 22:37:39 -05:00
Trond Myklebust
29a23909e4 NFS/flexfiles: Remove dead code in ff_layout_mirror_valid()
nfs4_ff_alloc_deviceid_node() guarantees that if mirror->mirror_ds is
a valid pointer, then so is mirror->mirror_ds->ds.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-01 22:37:38 -05:00
Trond Myklebust
4cbc8a571c NFS/flexfile: Simplify nfs4_ff_layout_select_ds_stateid()
Pass in a pointer to the mirror rather than forcing another
array access.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-01 22:37:38 -05:00
Trond Myklebust
626d48b12c NFS/flexfile: Simplify nfs4_ff_layout_ds_version()
Pass in a pointer to the mirror rather than forcing another
array access.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-01 22:37:38 -05:00
Trond Myklebust
312cd4cb12 NFS/flexfiles: Simplify ff_layout_get_ds_cred()
Pass in a pointer to the mirror rather than forcing another
array access.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-01 22:37:38 -05:00
Trond Myklebust
561d6f8aaf NFS/flexfiles: Simplify nfs4_ff_find_or_create_ds_client()
Pass in a pointer to the mirror rather than forcing another
array access.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-01 22:37:38 -05:00
Trond Myklebust
749da527b3 NFS/flexfiles: Simplify nfs4_ff_layout_select_ds_fh()
Pass in a pointer to the mirror rather than having to retrieve it from
the array and then verify the resulting pointer.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-01 22:37:38 -05:00
Trond Myklebust
76c6690522 NFS/flexfiles: Speed up read failover when DSes are down
If we notice that a DS may be down, we should attempt to read from the
other mirrors first before we go back to retry the dead DS.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-01 22:37:38 -05:00
Trond Myklebust
17aaec8167 NFS/flexfiles: Don't invalidate DS deviceids for being unresponsive
If the DS is unresponsive, we want to just mark it as such, while
reporting the errors. If the server later returns the same deviceid
in a new layout, then we don't want to have to look it up again.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-01 22:37:38 -05:00
Trond Myklebust
d082d4b5a0 NFS/flexfiles: Remove bogus checks for invalid deviceids
We already check the deviceids before we start the RPC call.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-01 22:37:38 -05:00
Trond Myklebust
0a156dd582 NFS/flexfiles: Avoid unnecessary layout invalidations
In ff_layout_mirror_valid() we may not want to invalidate the layout
segment despite the call to GETDEVICEINFO failing. The reason is that
a read may still be able to make progress on another mirror.

So instead we let the caller (in this case nfs4_ff_layout_prepare_ds())
decide whether or not it needs to invalidate.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-01 22:37:37 -05:00
Trond Myklebust
2444ff277a NFS/flexfiles: refactor calls to fs4_ff_layout_prepare_ds()
While we may want to skip attempting to connect to a downed mirror
when we're deciding which mirror to select for a read, we do not
want to do so once we've committed to attempting the I/O in
ff_layout_read/write_pagelist(), or ff_layout_initiate_commit()

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-01 22:37:37 -05:00
Trond Myklebust
18c0778a65 NFSv4: Handle early exit in layoutget by returning an error
If the LAYOUTGET rpc call exits early without an error, convert it to
EAGAIN.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-01 22:37:37 -05:00
Trond Myklebust
f0922a6c0c NFS/flexfiles: Send LAYOUTERROR when failing over mirrored reads
When a read to the preferred mirror returns an error, the flexfiles
driver records the error in the inode list and currently marks the
layout for return before failing over the attempted read to the next
mirror.
What we actually want to do is fire off a LAYOUTERROR to notify the
MDS that there is an issue with the preferred mirror, then we fail
over. Only once we've failed to read from all mirrors should we
return the layout.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-01 22:37:37 -05:00
Trond Myklebust
3eb86093ea NFSv4.2: Add client support for the generic 'layouterror' RPC call
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-01 16:20:16 -05:00
Trond Myklebust
a79f194aa4 NFSv4/flexfiles: Abort I/O early if the layout segment was invalidated
If a layout segment gets invalidated while a pNFS I/O operation
is queued for transmission, then we ideally want to abort
immediately. This is particularly the case when there is a large
number of I/O related RPCs queued in the RPC layer, and the layout
segment gets invalidated due to an ENOSPC error, or an EACCES (because
the client was fenced). We may end up forced to spam the MDS with a
lot of otherwise unnecessary LAYOUTERRORs after that I/O fails.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-01 16:20:16 -05:00
Trond Myklebust
39a5201a2b NFSv4/pnfs: Fix barriers in nfs4_mark_deviceid_unavailable()
Fix the memory barriers in nfs4_mark_deviceid_unavailable() and
nfs4_test_deviceid_unavailable().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-01 16:20:16 -05:00
Trond Myklebust
762bb7e973 NFS/flexfiles: Fix up sparse RCU annotations
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-01 16:20:16 -05:00
Trond Myklebust
108bb4afd3 NFSv4/flexfiles: Fix invalid deref in FF_LAYOUT_DEVID_NODE()
If the attempt to instantiate the mirror's layout DS pointer failed,
then that pointer may hold a value of type ERR_PTR(), so we need
to check that before we dereference it.

Fixes: 65990d1afb ("pNFS/flexfiles: Fix a deadlock on LAYOUTGET")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-01 16:20:16 -05:00
Anna Schumaker
1a3466aed3 NFS: Add missing encode / decode sequence_maxsz to v4.2 operations
These really should have been there from the beginning, but we never
noticed because there was enough slack in the RPC request for the extra
bytes. Chuck's recent patch to use au_cslack and au_rslack to compute
buffer size shrunk the buffer enough that this was now a problem for
SEEK operations on my test client.

Fixes: f4ac1674f5 ("nfs: Add ALLOCATE support")
Fixes: 2e72448b07 ("NFS: Add COPY nfs operation")
Fixes: cb95deea0b ("NFS OFFLOAD_CANCEL xdr")
Fixes: 624bd5b7b6 ("nfs: Add DEALLOCATE support")
Fixes: 1c6dcbe5ce ("NFS: Implement SEEK")
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-01 16:19:46 -05:00
Trond Myklebust
c71c46f015 NFSv4.1: Don't process the sequence op more than once.
Ensure that if we call nfs41_sequence_process() a second time for the
same rpc_task, then we only process the results once.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-01 12:16:28 -05:00
Trond Myklebust
c1dffe0bf7 NFSv4.1: Reinitialise sequence results before retransmitting a request
If we have to retransmit a request, we should ensure that we reinitialise
the sequence results structure, since in the event of a signal
we need to treat the request as if it had not been sent.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org
2019-03-01 12:13:34 -05:00
Trond Myklebust
06b5fc3ad9 NFSoRDMA client updates for 5.1
New features:
 - Convert rpc auth layer to use xdr_streams
 - Config option to disable insecure enctypes
 - Reduce size of RPC receive buffers
 
 Bugfixes and cleanups:
 - Fix sparse warnings
 - Check inline size before providing a write chunk
 - Reduce the receive doorbell rate
 - Various tracepoint improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAlxwX1IACgkQ18tUv7Cl
 QOv0tBAA3VXVuKdAtUH4b70q4ufBLkwz40puenDzlQEZXa4XjsGif+Iq62qHmAWW
 oYQfdaof3P+p1G/k9wEmFd6g+vk75a+2QYmmnlzVcSoOHc1teg8we39AbQt6Nz4X
 CZnb1VAuVYctprMXatZugyKsHi+EGWX4raUDtVlx8Zbte6BOSlzn/Cbnvvozeyi4
 bMDQ5mi6vof/20o1qf9FhIjrx3UTYvqF6XOPDdMsQZs8pxDF8Z21LiRgKpPTRNrb
 ci1oIaqraai5SV2riDtMpVnGxR+GDQXaYnyozPnF7kFOwG5nIFyQ56m5aTd2ntd2
 q09lRBHnmiy2sWaocoziXqUonnNi1sZI+fbdCzSTRD45tM0B34DkrvOKsDJuzuba
 m5xZqpoI8hL874EO0AFSEkPmv55BF+K7IMotPmzGo7i4ic+IlyLACDUXh5OkPx6D
 2VSPvXOoAY1U4iJGg6LS9aLWNX99ShVJAuhD5InUW12FLC4GuRwVTIWY3v1s5TIJ
 boUe2EFVoKIxwVkNvf5tKAR1LTNsqtFBPTs1ENtXIdFo1+9ucZX7REhp3bxTlODM
 HheDAqUjlVV5CboB+c1Pggekyv3ON8ihyV3P+dlZ6MFwHnN9s8YOPcReQ91quBZY
 0RNIMaNo2lBgLrkvCMlbDC05AZG6P8LuKhPTcAQ4+7/vfL4PpWI=
 =EiRa
 -----END PGP SIGNATURE-----

Merge tag 'nfs-rdma-for-5.1-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

NFSoRDMA client updates for 5.1

New features:
- Convert rpc auth layer to use xdr_streams
- Config option to disable insecure enctypes
- Reduce size of RPC receive buffers

Bugfixes and cleanups:
- Fix sparse warnings
- Check inline size before providing a write chunk
- Reduce the receive doorbell rate
- Various tracepoint improvements

[Trond: Fix up merge conflicts]
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-25 09:35:49 -05:00
Trond Myklebust
5085607d20 NFS/pnfs: Bulk destroy of layouts needs to be safe w.r.t. umount
If a bulk layout recall or a metadata server reboot coincides with a
umount, then holding a reference to an inode is unsafe unless we
also hold a reference to the super block.

Fixes: fd9a8d7160 ("NFSv4.1: Fix bulk recall and destroy of layouts")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-23 13:59:29 -05:00
Trond Myklebust
6f9449be53 NFS: Fix a soft lockup in the delegation recovery code
Fix a soft lockup when NFS client delegation recovery is attempted
but the inode is in the process of being freed. When the
igrab(inode) call fails, and we have to restart the recovery process,
we need to ensure that we won't attempt to recover the same delegation
again.

Fixes: 45870d6909 ("NFSv4.1: Test delegation stateids when server...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-21 14:51:25 -05:00
Trond Myklebust
3453d5708b NFSv4.1: Avoid false retries when RPC calls are interrupted
A 'false retry' in NFSv4.1 occurs when the client attempts to transmit a
new RPC call using a slot+sequence number combination that references an
already cached one. Currently, the Linux NFS client will do this if a
user process interrupts an RPC call that is in progress.
The problem with doing so is that we defeat the main mechanism used by
the server to differentiate between a new call and a replayed one. Even
if the server is able to perfectly cache the arguments of the old call,
it cannot know if the client intended to replay or send a new call.

The obvious fix is to bump the sequence number pre-emptively if an
RPC call is interrupted, but in order to deal with the corner cases
where the interrupted call is not actually received and processed by
the server, we need to interpret the error NFS4ERR_SEQ_MISORDERED
as a sign that we need to either wait or locate a correct sequence
number that lies between the value we sent, and the last value that
was acked by a SEQUENCE call on that slot.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Tested-by: Jason Tibbitts <tibbs@math.uh.edu>
2019-02-21 13:22:43 -05:00
ZhangXiaoxu
ded52fbe70 nfs: fix xfstest generic/099 failed on nfsv3
After setxattr, the nfsv3 cached the acl which set by user.

But at the backend, the shared file system (eg. ext4) will check
the acl, if it can merged with mode, it won't add acl to the file.
So, the nfsv3 cached acl is redundant.

Don't 'set_cached_acl' when setxattr.

Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20 17:33:55 -05:00
Kazuo Ito
2cde04e90d pNFS: Avoid read/modify/write when it is not necessary
As the block and SCSI layouts can only read/write fixed-length
blocks, we must perform read-modify-write when data to be written is
not aligned to a block boundary or smaller than the block size.
(612aa983a0 pnfs: add flag to force read-modify-write in ->write_begin)

The current code tries to see if we have to do read-modify-write
on block-oriented pNFS layouts by just checking !PageUptodate(page),
but the same condition also applies for overwriting of any uncached
potions of existing files, making such operations excessively slow
even it is block-aligned.

The change does not affect the optimization for modify-write-read
cases (38c73044f5 NFS: read-modify-write page updating),
because partial update of !PageUptodate() pages can only happen
in layouts that can do arbitrary length read/write and never
in block-based ones.

Testing results:

We ran fio on one of the pNFS clients running 4.20 kernel
(vanilla and patched) in this configuration to read/write/overwrite
files on the storage array, exported as pnfs share by the server.

 pNFS clients ---1G Ethernet--- pNFS server
 (HP DL360 G8)                  (HP DL360 G8)
       |                              |
       |                              |
       +------8G Fiber Channel--------+
                     |
               Storage Array
                 (HP P6350)

Throughput of overwrite (both buffered and O_SYNC) is noticeably
improved.

Ops.     |block size|   Throughput   |
         |  (KiB)   |    (MiB/s)     |
         |          |  4.20 | patched|
---------+----------+----------------+
buffered |         4|  21.3 |  232   |
overwrite|        32|  22.2 |  256   |
         |       512|  22.4 |  260   |
---------+----------+----------------+
O_SYNC   |         4|   3.84|    4.77|
overwrite|        32|  12.2 |   32.0 |
         |       512|  18.5 |  152   |
---------+----------+----------------+

Read and write (buffered and O_SYNC) by the same client remain unchanged
by the patch either negatively or positively, as they should do.

Ops.     |block size|   Throughput   |
         |  (KiB)   |    (MiB/s)     |
         |          |  4.20 | patched|
---------+----------+----------------+
read     |         4| 548   |  550   |
         |        32| 547   |  551   |
         |       512| 548   |  551   |
---------+----------+----------------+
buffered |         4| 237   |  244   |
write    |        32| 261   |  268   |
         |       512| 265   |  272   |
---------+----------+----------------+
O_SYNC   |         4|   0.46|    0.46|
write    |        32|   3.60|    3.57|
         |       512| 105   |  106   |
---------+----------+----------------+

Signed-off-by: Kazuo Ito <ito_kazuo_g3@lab.ntt.co.jp>
Tested-by: Hiroyuki Watanabe <watanabe.hiroyuki@lab.ntt.co.jp>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20 17:33:55 -05:00
Kazuo Ito
97ae91bbf3 pNFS: Fix potential corruption of page being written
nfs_want_read_modify_write() didn't check for !PagePrivate when pNFS
block or SCSI layout was in use, therefore we could lose data forever
if the page being written was filled by a read before completion.

Signed-off-by: Kazuo Ito <ito_kazuo_g3@lab.ntt.co.jp>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20 17:33:55 -05:00
zhangliguang
bf211ca1a8 NFS: Fix typo in comments of nfs_readdir_alloc_pages()
This fixes the typo in comments of nfs_readdir_alloc_pages().
Because nfs_readdir_large_page and nfs_readdir_free_pagearray had been
renamed.

Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20 17:33:55 -05:00
zhangliguang
42f72cf368 NFS: Remove redundant semicolon
This removes redundant semicolon for ending code.

Fixes: c7944ebb9c ("NFSv4: Fix lookup revalidate of regular files")
Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20 17:33:55 -05:00
luanshi
be4c2d4723 NFS: readdirplus optimization by cache mechanism
When listing very large directories via NFS, clients may take a long
time to complete. There are about three factors involved:

First of all, ls and practically every other method of listing a
directory including python os.listdir and find rely on libc readdir().
However readdir() only reads 32K of directory entries at a time, which
means that if you have a lot of files in the same directory, it is going
to take an insanely long time to read all the directory entries.

Secondly, libc readdir() reads 32K of directory entries at a time, in
kernel space 32K buffer split into 8 pages. One NFS readdirplus rpc will
be called for one page, which introduces many readdirplus rpc calls.

Lastly, one NFS readdirplus rpc asks for 32K data (filled by nfs_dentry)
to fill one page (filled by dentry), we found that nearly one third of
data was wasted.

To solve above problems, pagecache mechanism was introduced. One NFS
readdirplus rpc will ask for a large data (more than 32k), the data can
fill more than one page, the cached pages can be used for next readdir
call. This can reduce many readdirplus rpc calls and improve readdirplus
performance.

TESTING:
When listing very large directories(include 300 thousand files) via NFS

time ls -l /nfs_mount | wc -l

without the patch:
300001
real    1m53.524s
user    0m2.314s
sys     0m2.599s

with the patch:
300001
real    0m23.487s
user    0m2.305s
sys     0m2.558s

Improved performance: 79.6%
readdirplus rpc calls decrease: 85%

Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20 17:33:55 -05:00
Eric W. Biederman
40cc394be1 fs/nfs: Fix nfs_parse_devname to not modify it's argument
In the rare and unsupported case of a hostname list nfs_parse_devname
will modify dev_name.  There is no need to modify dev_name as the all
that is being computed is the length of the hostname, so the computed
length can just be shorted.

Fixes: dc04589827 ("NFS: Use common device name parsing logic for NFSv4 and NFSv2/v3")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20 17:33:55 -05:00
Julia Lawall
45bb8d8027 NFS: drop useless LIST_HEAD
Drop LIST_HEAD where the variable it declares has never
been used.

The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
identifier x;
@@
- LIST_HEAD(x);
  ... when != x
// </smpl>

Fixes: 0e20162ed1 ("NFSv4.1 Use MDS auth flavor for data server connection")
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20 17:33:55 -05:00
Trond Myklebust
e9acf2105f NFS: Fix sparse annotations for nfs_set_open_stateid_locked()
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20 15:14:21 -05:00
Trond Myklebust
302fad7bd5 NFS: Fix up documentation warnings
Fix up some compiler warnings about function parameters, etc not being
correctly described or formatted.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20 15:14:21 -05:00
Trond Myklebust
2dc23afffb NFS: ENOMEM should also be a fatal error.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20 15:14:21 -05:00
Trond Myklebust
7dc58ca5d8 NFS: EINTR is also a fatal error.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20 15:14:21 -05:00
Trond Myklebust
875bc3fbf2 NFS: Ensure NFS writeback allocations don't recurse back into NFS.
All the allocations that we can hit in the NFS layer and sunrpc layers
themselves are already marked as GFP_NOFS, but we need to ensure that
any calls to generic kernel functionality do the right thing as well.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20 15:14:20 -05:00
Trond Myklebust
df3accb849 NFS: Pass error information to the pgio error cleanup routine
Allow the caller to pass error information when cleaning up a failed
I/O request so that we can conditionally take action to cancel the
request altogether if the error turned out to be fatal.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20 15:14:20 -05:00
Trond Myklebust
078b5fd92c NFS: Clean up list moves of struct nfs_page
In several places we're just moving the struct nfs_page from one list to
another by first removing from the existing list, then adding to the new
one.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20 15:14:20 -05:00
Trond Myklebust
8127d82705 NFS: Don't recoalesce on error in nfs_pageio_complete_mirror()
If the I/O completion failed with a fatal error, then we should just
exit nfs_pageio_complete_mirror() rather than try to recoalesce.

Fixes: a7d42ddb30 ("nfs: add mirroring support to pgio layer")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v4.0+
2019-02-20 15:14:20 -05:00
Trond Myklebust
4d91969ed4 NFS: Fix an I/O request leakage in nfs_do_recoalesce
Whether we need to exit early, or just reprocess the list, we
must not lost track of the request which failed to get recoalesced.

Fixes: 03d5eb65b5 ("NFS: Fix a memory leak in nfs_do_recoalesce")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v4.0+
2019-02-20 15:14:20 -05:00
Trond Myklebust
f57dcf4c72 NFS: Fix I/O request leakages
When we fail to add the request to the I/O queue, we currently leave it
to the caller to free the failed request. However since some of the
requests that fail are actually created by nfs_pageio_add_request()
itself, and are not passed back the caller, this leads to a leakage
issue, which can again cause page locks to leak.

This commit addresses the leakage by freeing the created requests on
error, using desc->pg_completion_ops->error_cleanup()

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Fixes: a7d42ddb30 ("nfs: add mirroring support to pgio layer")
Cc: stable@vger.kernel.org # v4.0: c18b96a1b8: nfs: clean up rest of reqs
Cc: stable@vger.kernel.org # v4.0: d600ad1f2b: NFS41: pop some layoutget
Cc: stable@vger.kernel.org # v4.0+
2019-02-20 15:14:20 -05:00
Linus Torvalds
1f5a018c5b Merge branch 'fixes-v5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull keys fixes from James Morris:

 - Handle quotas better, allowing full quota to be reached.

 - Fix the creation of shortcuts in the assoc_array internal
   representation when the index key needs to be an exact multiple of
   the machine word size.

 - Fix a dependency loop between the request_key contruction record and
   the request_key authentication key. The construction record isn't
   really necessary and can be dispensed with.

 - Set the timestamp on a new key rather than leaving it as 0. This
   would ordinarily be fine - provided the system clock is never set to
   a time before 1970

* 'fixes-v5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  keys: Timestamp new keys
  keys: Fix dependency loop between construction record and auth key
  assoc_array: Fix shortcut creation
  KEYS: allow reaching the keys quotas exactly
2019-02-20 09:09:33 -08:00
David Howells
822ad64d7e keys: Fix dependency loop between construction record and auth key
In the request_key() upcall mechanism there's a dependency loop by which if
a key type driver overrides the ->request_key hook and the userspace side
manages to lose the authorisation key, the auth key and the internal
construction record (struct key_construction) can keep each other pinned.

Fix this by the following changes:

 (1) Killing off the construction record and using the auth key instead.

 (2) Including the operation name in the auth key payload and making the
     payload available outside of security/keys/.

 (3) The ->request_key hook is given the authkey instead of the cons
     record and operation name.

Changes (2) and (3) allow the auth key to naturally be cleaned up if the
keyring it is in is destroyed or cleared or the auth key is unlinked.

Fixes: 7ee02a316600 ("keys: Fix dependency loop between construction record and auth key")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
2019-02-15 14:12:09 -08:00
Chuck Lever
02ef04e432 NFS: Account for XDR pad of buf->pages
Certain NFS results (eg. READLINK) might expect a data payload that
is not an exact multiple of 4 bytes. In this case, XDR encoding
is required to pad that payload so its length on the wire is a
multiple of 4 bytes. The constants that define the maximum size of
each NFS result do not appear to account for this extra word.

In each case where the data payload is to be received into pages:

- 1 word is added to the size of the receive buffer allocated by
  call_allocate

- rpc_inline_rcv_pages subtracts 1 word from @hdrsize so that the
  extra buffer space falls into the rcv_buf's tail iovec

- If buf->pagelen is word-aligned, an XDR pad is not needed and
  is thus removed from the tail

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-02-14 10:13:49 -05:00
Chuck Lever
cf500bac8f SUNRPC: Introduce rpc_prepare_reply_pages()
prepare_reply_buffer() and its NFSv4 equivalents expose the details
of the RPC header and the auth slack values to upper layer
consumers, creating a layering violation, and duplicating code.

Remedy these issues by adding a new RPC client API that hides those
details from upper layers in a common helper function.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-02-14 10:04:37 -05:00
Chuck Lever
f23f658404 NFS: Add trace events to report non-zero NFS status codes
These can help field troubleshooting without needing the overhead
of a full network capture (ie, tcpdump).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-02-13 12:03:21 -05:00
Chuck Lever
eb72f484a5 NFS: Remove print_overflow_msg()
This issue is now captured by a trace point in the RPC client.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-02-13 11:53:45 -05:00
Chuck Lever
0ccc61b1c7 SUNRPC: Add xdr_stream::rqst field
Having access to the controlling rpc_rqst means a trace point in the
XDR code can report:

 - the XID
 - the task ID and client ID
 - the p_name of RPC being processed

Subsequent patches will introduce such trace points.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-02-13 11:05:50 -05:00
Benjamin Coddington
d2ceb7e570 NFS: Don't use page_file_mapping after removing the page
If nfs_page_async_flush() removes the page from the mapping, then we can't
use page_file_mapping() on it as nfs_updatepate() is wont to do when
receiving an error.  Instead, push the mapping to the stack before the page
is possibly truncated.

Fixes: 8fc75bed96 ("NFS: Fix up return value on fatal errors in nfs_page_async_flush()")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-02-12 15:56:28 -05:00
Trond Myklebust
8fc75bed96 NFS: Fix up return value on fatal errors in nfs_page_async_flush()
Ensure that we return the fatal error value that caused us to exit
nfs_page_async_flush().

Fixes: c373fff7bd ("NFSv4: Don't special case "launder"")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v4.12+
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-29 16:33:24 -05:00
Yao Liu
80ff001724 nfs: Fix NULL pointer dereference of dev_name
There is a NULL pointer dereference of dev_name in nfs_parse_devname()

The oops looks something like:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
  ...
  RIP: 0010:nfs_fs_mount+0x3b6/0xc20 [nfs]
  ...
  Call Trace:
   ? ida_alloc_range+0x34b/0x3d0
   ? nfs_clone_super+0x80/0x80 [nfs]
   ? nfs_free_parsed_mount_data+0x60/0x60 [nfs]
   mount_fs+0x52/0x170
   ? __init_waitqueue_head+0x3b/0x50
   vfs_kern_mount+0x6b/0x170
   do_mount+0x216/0xdc0
   ksys_mount+0x83/0xd0
   __x64_sys_mount+0x25/0x30
   do_syscall_64+0x65/0x220
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fix this by adding a NULL check on dev_name

Signed-off-by: Yao Liu <yotta.liu@ucloud.cn>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-28 12:08:30 -05:00
Olga Kornievskaia
45ac486ecf NFSv4.2 fix unnecessary retry in nfs4_copy_file_range
Currently nfs42_proc_copy_file_range() can not return EAGAIN.

Fixes: e4648aa4f9 ("NFS recover from destination server reboot for copies")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-15 11:24:49 -05:00
Linus Torvalds
505b050fdf Merge branch 'mount.part1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs mount API prep from Al Viro:
 "Mount API prereqs.

  Mostly that's LSM mount options cleanups. There are several minor
  fixes in there, but nothing earth-shattering (leaks on failure exits,
  mostly)"

* 'mount.part1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (27 commits)
  mount_fs: suppress MAC on MS_SUBMOUNT as well as MS_KERNMOUNT
  smack: rewrite smack_sb_eat_lsm_opts()
  smack: get rid of match_token()
  smack: take the guts of smack_parse_opts_str() into a new helper
  LSM: new method: ->sb_add_mnt_opt()
  selinux: rewrite selinux_sb_eat_lsm_opts()
  selinux: regularize Opt_... names a bit
  selinux: switch away from match_token()
  selinux: new helper - selinux_add_opt()
  LSM: bury struct security_mnt_opts
  smack: switch to private smack_mnt_opts
  selinux: switch to private struct selinux_mnt_opts
  LSM: hide struct security_mnt_opts from any generic code
  selinux: kill selinux_sb_get_mnt_opts()
  LSM: turn sb_eat_lsm_opts() into a method
  nfs_remount(): don't leak, don't ignore LSM options quietly
  btrfs: sanitize security_mnt_opts use
  selinux; don't open-code a loop in sb_finish_set_opts()
  LSM: split ->sb_set_mnt_opts() out of ->sb_kern_mount()
  new helper: security_sb_eat_lsm_opts()
  ...
2019-01-05 13:25:58 -08:00
Linus Torvalds
e6b9257280 NFS client updates for Linux 4.21
Note that there is a conflict with the rdma tree in this pull request, since
 we delete a file that has been changed in the rdma tree.  Hopefully that's
 easy enough to resolve!
 
 We also were unable to track down a maintainer for Neil Brown's changes to
 the generic cred code that are prerequisites to his RPC cred cleanup patches.
 We've been asking around for several months without any response, so
 hopefully it's okay to include those patches in this pull request.
 
 Stable bugfixes:
 - xprtrdma: Yet another double DMA-unmap # v4.20
 
 Features:
 - Allow some /proc/sys/sunrpc entries without CONFIG_SUNRPC_DEBUG
 - Per-xprt rdma receive workqueues
 - Drop support for FMR memory registration
 - Make port= mount option optional for RDMA mounts
 
 Other bugfixes and cleanups:
 - Remove unused nfs4_xdev_fs_type declaration
 - Fix comments for behavior that has changed
 - Remove generic RPC credentials by switching to 'struct cred'
 - Fix crossing mountpoints with different auth flavors
 - Various xprtrdma fixes from testing and auditing the close code
 - Fixes for disconnect issues when using xprtrdma with krb5
 - Clean up and improve xprtrdma trace points
 - Fix NFS v4.2 async copy reboot recovery
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAlwtO50ACgkQ18tUv7Cl
 QOtZWQ//e5Hhp2TnQZ6U+99YKedjwBHP6psH3GKSEdeHSNdlSpZ5ckgHxvMb9TBa
 6t4ecgv5P/uYLIePQ0u2ubUFc9+TlyGi7Iacx13/YhK7kihGHDPnZhfl0QbYixV7
 rwa9bFcKmOrXs8ld+Hw3P2UL22G1gMf/LHDhPNshbW7LFZmcshKz+mKTk70kwkq9
 v7tFC59p6GwV8Sr2YI2NXn2fOWsUS00sQfgj2jceJYJ8PsNa+wHYF4wPj2IY5NsE
 D5Oq2kLPbytBhCllOHgopNZaf4qb5BfqhVETyc1O+kDF3BZKUhQ1PoDi2FPinaHM
 5/d8hS+5fr3eMBsQrPWQLXYjWQFUXnkQQJvU3Bo52AIgomsk/8uBq3FvH7XmFcBd
 C8sgnuUAkAS8feMes8GCS50BTxclnGuYGdyFJyCRXoG9Kn9rMrw9EKitky6EVq0v
 NmXhW79jK84a3yDXVlAIpZ8Y9BU/HQ3GviGX8lQEdZU9YiYRzDIHvpMFwzMgqaBi
 XvLbr8PlLOm8GZokThS8QYT/G2Wu6IwfUq/AufVjVD4+HiL3duKKfWSGAvcm6aAa
 GoRF6UG+OmjWlzKojtRc1dI+sy22Fzh+DW+Mx6tuf/b/66wkmYnW7eKcV4rt6Tm5
 /JEhvTMo9q7elL/4FgCoMCcdoc5eXqQyXRXrQiOU7YHLzn2aWU0=
 =DvVW
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.21-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
 "Stable bugfixes:
   - xprtrdma: Yet another double DMA-unmap # v4.20

  Features:
   - Allow some /proc/sys/sunrpc entries without CONFIG_SUNRPC_DEBUG
   - Per-xprt rdma receive workqueues
   - Drop support for FMR memory registration
   - Make port= mount option optional for RDMA mounts

  Other bugfixes and cleanups:
   - Remove unused nfs4_xdev_fs_type declaration
   - Fix comments for behavior that has changed
   - Remove generic RPC credentials by switching to 'struct cred'
   - Fix crossing mountpoints with different auth flavors
   - Various xprtrdma fixes from testing and auditing the close code
   - Fixes for disconnect issues when using xprtrdma with krb5
   - Clean up and improve xprtrdma trace points
   - Fix NFS v4.2 async copy reboot recovery"

* tag 'nfs-for-4.21-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (63 commits)
  sunrpc: convert to DEFINE_SHOW_ATTRIBUTE
  sunrpc: Add xprt after nfs4_test_session_trunk()
  sunrpc: convert unnecessary GFP_ATOMIC to GFP_NOFS
  sunrpc: handle ENOMEM in rpcb_getport_async
  NFS: remove unnecessary test for IS_ERR(cred)
  xprtrdma: Prevent leak of rpcrdma_rep objects
  NFSv4.2 fix async copy reboot recovery
  xprtrdma: Don't leak freed MRs
  xprtrdma: Add documenting comment for rpcrdma_buffer_destroy
  xprtrdma: Replace outdated comment for rpcrdma_ep_post
  xprtrdma: Update comments in frwr_op_send
  SUNRPC: Fix some kernel doc complaints
  SUNRPC: Simplify defining common RPC trace events
  NFS: Fix NFSv4 symbolic trace point output
  xprtrdma: Trace mapping, alloc, and dereg failures
  xprtrdma: Add trace points for calls to transport switch methods
  xprtrdma: Relocate the xprtrdma_mr_map trace points
  xprtrdma: Clean up of xprtrdma chunk trace points
  xprtrdma: Remove unused fields from rpcrdma_ia
  xprtrdma: Cull dprintk() call sites
  ...
2019-01-02 16:35:23 -08:00
Linus Torvalds
e45428a436 Thanks to Vasily Averin for fixing a use-after-free in the containerized
NFSv4.2 client, and cleaning up some convoluted backchannel server code
 in the process.  Otherwise, miscellaneous smaller bugfixes and cleanup.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJcLR3nAAoJECebzXlCjuG+oyAQALrPSTH9Qg2AwP2eGm+AevUj
 u/VFmimImIO9dYuT02t4w42w4qMIQ0/Y7R0UjT3DxG5Oixy/zA+ZaNXCCEKwSMIX
 abGF4YalUISbDc6n0Z8J14/T33wDGslhy3IQ9Jz5aBCDCocbWlzXvFlmrowbb3ak
 vtB0Fc3Xo6Z/Pu2GzNzlqR+f69IAmwQGJrRrAEp3JUWSIBKiSWBXTujDuVBqJNYj
 ySLzbzyAc7qJfI76K635XziULR2ueM3y5JbPX7kTZ0l3OJ6Yc0PtOj16sIv5o0XK
 DBYPrtvw3ZbxQE/bXqtJV9Zn6MG5ODGxKszG1zT1J3dzotc9l/LgmcAY8xVSaO+H
 QNMdU9QuwmyUG20A9rMoo/XfUb5KZBHzH7HIYOmkfBidcaygwIInIKoIDtzimm4X
 OlYq3TL/3QDY6rgTCZv6n2KEnwiIDpc5+TvFhXRWclMOJMcMSHJfKFvqERSv9V3o
 90qrCebPA0K8Dnc0HMxcBXZ+0TqZ2QeXp/wfIjibCXqMwlg+BZhmbeA0ngZ7x7qf
 2F33E9bfVJjL+VI5FcVYQf43bOTWZgD6ZKGk4T7keYl0CPH+9P70bfhl4KKy9dqc
 GwYooy/y5FPb2CvJn/EETeILRJ9OyIHUrw7HBkpz9N8n9z+V6Qbp9yW7LKgaMphW
 1T+GpHZhQjwuBPuJhDK0
 =dRLp
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-4.21' of git://linux-nfs.org/~bfields/linux

Pull nfsd updates from Bruce Fields:
 "Thanks to Vasily Averin for fixing a use-after-free in the
  containerized NFSv4.2 client, and cleaning up some convoluted
  backchannel server code in the process.

  Otherwise, miscellaneous smaller bugfixes and cleanup"

* tag 'nfsd-4.21' of git://linux-nfs.org/~bfields/linux: (25 commits)
  nfs: fixed broken compilation in nfs_callback_up_net()
  nfs: minor typo in nfs4_callback_up_net()
  sunrpc: fix debug message in svc_create_xprt()
  sunrpc: make visible processing error in bc_svc_process()
  sunrpc: remove unused xpo_prep_reply_hdr callback
  sunrpc: remove svc_rdma_bc_class
  sunrpc: remove svc_tcp_bc_class
  sunrpc: remove unused bc_up operation from rpc_xprt_ops
  sunrpc: replace svc_serv->sv_bc_xprt by boolean flag
  sunrpc: use-after-free in svc_process_common()
  sunrpc: use SVC_NET() in svcauth_gss_* functions
  nfsd: drop useless LIST_HEAD
  lockd: Show pid of lockd for remote locks
  NFSD remove OP_CACHEME from 4.2 op_flags
  nfsd: Return EPERM, not EACCES, in some SETATTR cases
  sunrpc: fix cache_head leak due to queued request
  nfsd: clean up indentation, increase indentation in switch statement
  svcrdma: Optimize the logic that selects the R_key to invalidate
  nfsd: fix a warning in __cld_pipe_upcall()
  nfsd4: fix crash on writing v4_end_grace before nfsd startup
  ...
2019-01-02 16:21:50 -08:00
Santosh kumar pradhan
10e037d1e0 sunrpc: Add xprt after nfs4_test_session_trunk()
Multipathing: In case of NFSv3, rpc_clnt_test_and_add_xprt() adds
the xprt to xprt switch (i.e. xps) if rpc_call_null_helper() returns
success. But in case of NFSv4.1, it needs to do EXCHANGEID to verify
the path along with check for session trunking.

Add the xprt in nfs4_test_session_trunk() only when
nfs4_detect_session_trunking() returns success. Also release refcount
hold by rpc_clnt_setup_test_and_add_xprt().

Signed-off-by: Santosh kumar pradhan <santoshkumar.pradhan@wdc.com>
Tested-by: Suresh Jayaraman <suresh.jayaraman@wdc.com>
Reported-by: Aditya Agnihotri <aditya.agnihotri@wdc.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02 12:05:19 -05:00
NeilBrown
c2c7d84fd1 NFS: remove unnecessary test for IS_ERR(cred)
As gte_current_cred() cannot return an error,
this test is not necessary.
It hasn't been necessary for years, but it wasn't so obvious
before.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02 12:05:19 -05:00
Olga Kornievskaia
9aeaf8cfcb NFSv4.2 fix async copy reboot recovery
Original commit (e4648aa4f9 "NFS recover from destination server
reboot for copies") used memcmp() and then it was changed to use
nfs4_stateid_match_other() but that function returns opposite of
memcmp. As the result, recovery can't find the copy leading
to copy hanging.

Fixes: 80f4236886 ("NFSv4: Split out NFS v4.2 copy completion functions")
Fixes: cb7a8384dc ("NFS: Split out the body of nfs4_reclaim_open_state")
Signed-of-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02 12:05:19 -05:00
Chuck Lever
5b2095d0ce NFS: Fix NFSv4 symbolic trace point output
These symbolic values were not being displayed in string form.
TRACE_DEFINE_ENUM was missing in many cases. It also turns out that
__print_symbolic wants an unsigned long in the first field...

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02 12:05:18 -05:00
Chuck Lever
0dfbb5f05e NFS: Make "port=" mount option optional for RDMA mounts
Having to specify "proto=rdma,port=20049" is cumbersome.

RFC 8267 Section 6.3 requires NFSv4 clients to use "the alternative
well-known port number", which is 20049. Make the use of the well-
known port number automatic, just as it is for NFS/TCP and port
2049.

For NFSv2/3, Section 4.2 allows clients to simply choose 20049 as
the default or use rpcbind. I don't know of an NFS/RDMA server
implementation that registers it's NFS/RDMA service with rpcbind,
so automatically choosing 20049 seems like the better choice. The
other widely-deployed NFS/RDMA client, Solaris, also uses 20049
as the default port.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02 12:05:17 -05:00
Vasily Averin
0ad30ff67b nfs: fixed broken compilation in nfs_callback_up_net()
Patch fixes compilation error in nfs_callback_up_net()
serv->sv_bc_enabled is defined under enabled CONFIG_SUNRPC_BACKCHANNEL,
however nfs_callback_up_net() can access it even if this config option
was not set.

Fixes: a289ce5311 (sunrpc: replace svc_serv->sv_bc_xprt by boolean flag)
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-12-31 11:25:16 -05:00
Arun KS
ca79b0c211 mm: convert totalram_pages and totalhigh_pages variables to atomic
totalram_pages and totalhigh_pages are made static inline function.

Main motivation was that managed_page_count_lock handling was complicating
things.  It was discussed in length here,
https://lore.kernel.org/patchwork/patch/995739/#1181785 So it seemes
better to remove the lock and convert variables to atomic, with preventing
poteintial store-to-read tearing as a bonus.

[akpm@linux-foundation.org: coding style fixes]
Link: http://lkml.kernel.org/r/1542090790-21750-4-git-send-email-arunks@codeaurora.org
Signed-off-by: Arun KS <arunks@codeaurora.org>
Suggested-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-28 12:11:47 -08:00
Vasily Averin
91bd2ffa90 nfs: minor typo in nfs4_callback_up_net()
Closing ")" was lost in debug message.

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-12-27 21:01:41 -05:00
Vasily Averin
a289ce5311 sunrpc: replace svc_serv->sv_bc_xprt by boolean flag
svc_serv-> sv_bc_xprt is netns-unsafe and cannot be used as pointer.
To prevent its misuse in future it is replaced by new boolean flag.

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-12-27 21:01:41 -05:00
Linus Torvalds
00c569b567 File locking changes for v4.21
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJcILryAAoJEAAOaEEZVoIVByMP/RYty0dsV9ALt0PKqxKyDVT7
 9KOGA73JIahpeO2dnlIiZAXdeC3Y350UH2axrX6Xy/X6ktTCca3X/xqZMn/wK5kt
 zeMeqcjsyTz4wJoOCWUm7nsteMALfusDAIMd8axEnBFgWk9nsQwgh+jS1gYVj2D1
 GUSdCceKcQ0ZOSsZDzDFgd/R34dNMobvYd1aOE2bgHL19BAXj1aKIv81yAjjY41A
 D+VVvEyzIHQUtxmWk1X3X8kYfhHMu4X5AQhhqPxw8jw8CC0w0lfQTOZe/ApeMfxZ
 BFhusMdwf8QhkGWMOfhOTldTm3GobBmdsNX5HmukvcAEjDVPiOLCiKXaLWEBJ68r
 HbmB3YyxzT2re0PfSa72WIu6W8aKZHUny+BLgTHiuW3KIV1khJK4AflWKMLfe8yi
 0xUWm0SeiwMCdBLtkD8IykC19LBCAKM15JBxpUHadBvBsO9G+54DfzRImjnxSjAH
 tX0RnFWmA7hXt02fZjcywT+/+n84kt0lbjdJAKXK6tl0DbGxlEj8YoJVBNG/p94q
 ARHgBKMFTrsDPJiMY5WSobHapjmnSNDBk6uSf6TPe2E5wmnhLcwfTqwCdnpocW6c
 A7CX+q/Jfyk1oNEGajXXyIsjaDc2Ma0Fs/mfb3o5Lhkjq68e+LT0+eIjn4veuY5H
 KoZ/KVBGs4Uxy81qMPkl
 =sUSp
 -----END PGP SIGNATURE-----

Merge tag 'locks-v4.21-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux

Pull file locking updates from Jeff Layton:
 "The main change in this set is Neil Brown's work to reduce the
  thundering herd problem when a heavily-contended file lock is
  released.

  Previously we'd always wake up all waiters when this occurred. With
  this set, we'll now we only wake up waiters that were blocked on the
  range being released"

* tag 'locks-v4.21-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
  locks: Use inode_is_open_for_write
  fs/locks: remove unnecessary white space.
  fs/locks: merge posix_unblock_lock() and locks_delete_block()
  fs/locks: create a tree of dependent requests.
  fs/locks: change all *_conflict() functions to return bool.
  fs/locks: always delete_block after waiting.
  fs/locks: allow a lock request to block other requests.
  fs/locks: use properly initialized file_lock when unlocking.
  ocfs2: properly initial file_lock used for unlock.
  gfs2: properly initial file_lock used for unlock.
  NFS: use locks_copy_lock() to copy locks.
  fs/locks: split out __locks_wake_up_blocks().
  fs/locks: rename some lists and pointers.
2018-12-27 17:12:30 -08:00
Chris Perl
594d1644cd NFS: nfs_compare_mount_options always compare auth flavors.
This patch removes the check from nfs_compare_mount_options to see if a
`sec' option was passed for the current mount before comparing auth
flavors and instead just always compares auth flavors.

Consider the following scenario:

You have a server with the address 192.168.1.1 and two exports /export/a
and /export/b.  The first export supports `sys' and `krb5' security, the
second just `sys'.

Assume you start with no mounts from the server.

The following results in EIOs being returned as the kernel nfs client
incorrectly thinks it can share the underlying `struct nfs_server's:

$ mkdir /tmp/{a,b}
$ sudo mount -t nfs -o vers=3,sec=krb5 192.168.1.1:/export/a /tmp/a
$ sudo mount -t nfs -o vers=3          192.168.1.1:/export/b /tmp/b
$ df >/dev/null
df: ‘/tmp/b’: Input/output error

Signed-off-by: Chris Perl <cperl@janestreet.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-21 13:38:45 -05:00
Al Viro
757cbe597f LSM: new method: ->sb_add_mnt_opt()
Adding options to growing mnt_opts.  NFS kludge with passing
context= down into non-text-options mount switched to it, and
with that the last use of ->sb_parse_opts_str() is gone.

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-21 11:50:02 -05:00
Al Viro
204cc0ccf1 LSM: hide struct security_mnt_opts from any generic code
Keep void * instead, allocate on demand (in parse_str_opts, at the
moment).  Eventually both selinux and smack will be better off
with private structures with several strings in those, rather than
this "counter and two pointers to dynamically allocated arrays"
ugliness.  This commit allows to do that at leisure, without
disrupting anything outside of given module.

Changes:
	* instead of struct security_mnt_opt use an opaque pointer
initialized to NULL.
	* security_sb_eat_lsm_opts(), security_sb_parse_opts_str() and
security_free_mnt_opts() take it as var argument (i.e. as void **);
call sites are unchanged.
	* security_sb_set_mnt_opts() and security_sb_remount() take
it by value (i.e. as void *).
	* new method: ->sb_free_mnt_opts().  Takes void *, does
whatever freeing that needs to be done.
	* ->sb_set_mnt_opts() and ->sb_remount() might get NULL as
mnt_opts argument, meaning "empty".

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-21 11:48:34 -05:00
Al Viro
6a0440e5b7 nfs_remount(): don't leak, don't ignore LSM options quietly
* if mount(2) passes something like "context=foo" with MS_REMOUNT
in flags (/sbin/mount.nfs will _not_ do that - you need to issue
the syscall manually), you'll get leaked copies for LSM options.
The reason is that instead of nfs_{alloc,free}_parsed_mount_data()
nfs_remount() uses kzalloc/kfree, which lacks the needed cleanup.

* selinux options are not changed on remount (as for any other
fs), but in case of NFS the failure is quiet - they are not compared
to what we used to have, with complaint in case of attempted changes.
Trivially fixed by converting to use of security_sb_remount().

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-21 11:47:19 -05:00
Al Viro
f5c0c26d90 new helper: security_sb_eat_lsm_opts()
combination of alloc_secdata(), security_sb_copy_data(),
security_sb_parse_opt_str() and free_secdata().

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-21 11:46:00 -05:00
NeilBrown
a52458b48a NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'.
SUNRPC has two sorts of credentials, both of which appear as
"struct rpc_cred".
There are "generic credentials" which are supplied by clients
such as NFS and passed in 'struct rpc_message' to indicate
which user should be used to authorize the request, and there
are low-level credentials such as AUTH_NULL, AUTH_UNIX, AUTH_GSS
which describe the credential to be sent over the wires.

This patch replaces all the generic credentials by 'struct cred'
pointers - the credential structure used throughout Linux.

For machine credentials, there is a special 'struct cred *' pointer
which is statically allocated and recognized where needed as
having a special meaning.  A look-up of a low-level cred will
map this to a machine credential.

Signed-off-by: NeilBrown <neilb@suse.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19 13:52:46 -05:00
NeilBrown
684f39b4cf NFS: struct nfs_open_dir_context: convert rpc_cred pointer to cred.
Use the common 'struct cred' to pass credentials for readdir.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19 13:52:46 -05:00
NeilBrown
b68572e07c NFS: change access cache to use 'struct cred'.
Rather than keying the access cache with 'struct rpc_cred',
use 'struct cred'.  Then use cred_fscmp() to compare
credentials rather than comparing the raw pointer.

A benefit of this approach is that in the common case we avoid the
rpc_lookup_cred_nonblock() call which can be slow when the cred cache is large.
This also keeps many fewer items pinned in the rpc cred cache, so the
cred cache is less likely to get large.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19 13:52:45 -05:00
NeilBrown
ddf529eeed NFS: move credential expiry tracking out of SUNRPC into NFS.
NFS needs to know when a credential is about to expire so that
it can modify write-back behaviour to finish the write inside the
expiry time.
It currently uses functions in SUNRPC code which make use of a
fairly complex callback scheme and flags in the generic credientials.

As I am working to discard the generic credentials, this has to change.

This patch moves the logic into NFS, in part by finding and caching
the low-level credential in the open_context.  We then make direct
cred-api calls on that.

This makes the code much simpler and removes a dependency on generic
rpc credentials.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19 13:52:45 -05:00
NeilBrown
5e16923b43 NFS/SUNRPC: don't lookup machine credential until rpcauth_bindcred().
When NFS creates a machine credential, it is a "generic" credential,
not tied to any auth protocol, and is really just a container for
the princpal name.
This doesn't get linked to a genuine credential until rpcauth_bindcred()
is called.
The lookup always succeeds, so various places that test if the machine
credential is NULL, are pointless.

As a step towards getting rid of generic credentials, this patch gets
rid of generic machine credentials.  The nfs_client and rpc_client
just hold a pointer to a constant principal name.
When a machine credential is wanted, a special static 'struct rpc_cred'
pointer is used. rpcauth_bindcred() recognizes this, finds the
principal from the client, and binds the correct credential.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19 13:52:45 -05:00
NeilBrown
f15e1e8bc6 NFSv4: don't require lock for get_renew_cred or get_machine_cred
This lock is no longer necessary.

If nfs4_get_renew_cred() needs to hunt through the open-state
creds for a user cred, it still takes the lock to stablize
the rbtree, but otherwise there are no races.

Note that this completely removes the lock from nfs4_renew_state().
It appears that the original need for the locking here was removed
long ago, and there is no longer anything to protect.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19 13:52:45 -05:00
NeilBrown
a534ecb013 NFSv4: add cl_root_cred for use when machine cred is not available.
NFSv4 state management tries a root credential when no machine
credential is available, as can happen with kerberos.
It does this by replacing the cl_machine_cred with a root credential.
This means that any user of the machine credential needs to take
a lock while getting a reference to the machine credential, which is
a little cumbersome.

So introduce an explicit cl_root_cred, and never free either
credential until client shutdown.  This means that no locking
is needed to reference these credentials.  Future patches
will make use of this.

This is only a temporary addition.  both cl_machine_cred and
cl_root_cred will disappear later in the series.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19 13:52:45 -05:00
NeilBrown
8276c902bb SUNRPC: remove uid and gid from struct auth_cred
Use cred->fsuid and cred->fsgid instead.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19 13:52:45 -05:00
NeilBrown
fc0664fd9b SUNRPC: remove groupinfo from struct auth_cred.
We can use cred->groupinfo (from the 'struct cred') instead.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19 13:52:45 -05:00
NeilBrown
97f68c6b02 SUNRPC: add 'struct cred *' to auth_cred and rpc_cred
The SUNRPC credential framework was put together before
Linux has 'struct cred'.  Now that we have it, it makes sense to
use it.
This first step just includes a suitable 'struct cred *' pointer
in every 'struct auth_cred' and almost every 'struct rpc_cred'.

The rpc_cred used for auth_null has a NULL 'struct cred *' as nothing
else really makes sense.

For rpc_cred, the pointer is reference counted.
For auth_cred it isn't.  struct auth_cred are either allocated on
the stack, in which case the thread owns a reference to the auth,
or are part of 'struct generic_cred' in which case gc_base owns the
reference, and "acred" shares it.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19 13:52:44 -05:00
Pavel Tikhomirov
ac0aa5e843 nfs: fix comment to nfs_generic_pg_test which does the opposite
Please see comment to filelayout_pg_test for reference.

To: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: linux-nfs@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19 13:52:44 -05:00
Olga Kornievskaia
069d5bf5ec NFSv4: cleanup remove unused nfs4_xdev_fs_type
commit e8f25e6d6d "NFS: Remove the NFS v4 xdev mount function"
removed the last use of this.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19 13:52:44 -05:00
Dave Kleikamp
ad3cba223a nfs: don't dirty kernel pages read by direct-io
When we use direct_IO with an NFS backing store, we can trigger a
WARNING in __set_page_dirty(), as below, since we're dirtying the page
unnecessarily in nfs_direct_read_completion().

To fix, replicate the logic in commit 53cbf3b157 ("fs: direct-io:
don't dirtying pages for ITER_BVEC/ITER_KVEC direct read").

Other filesystems that implement direct_IO handle this; most use
blockdev_direct_IO(). ceph and cifs have similar logic.

mount 127.0.0.1:/export /nfs
dd if=/dev/zero of=/nfs/image bs=1M count=200
losetup --direct-io=on -f /nfs/image
mkfs.btrfs /dev/loop0
mount -t btrfs /dev/loop0 /mnt/

kernel: WARNING: CPU: 0 PID: 8067 at fs/buffer.c:580 __set_page_dirty+0xaf/0xd0
kernel: Modules linked in: loop(E) nfsv3(E) rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) nfs(E) fscache(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) fuse(E) tun(E) ip6t_rpfilter(E) ipt_REJECT(E) nf_
kernel:  snd_seq(E) snd_seq_device(E) snd_pcm(E) video(E) snd_timer(E) snd(E) soundcore(E) ip_tables(E) xfs(E) libcrc32c(E) sd_mod(E) sr_mod(E) cdrom(E) ata_generic(E) pata_acpi(E) crc32c_intel(E) ahci(E) li
kernel: CPU: 0 PID: 8067 Comm: kworker/0:2 Tainted: G            E     4.20.0-rc1.master.20181111.ol7.x86_64 #1
kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
kernel: Workqueue: nfsiod rpc_async_release [sunrpc]
kernel: RIP: 0010:__set_page_dirty+0xaf/0xd0
kernel: Code: c3 48 8b 02 f6 c4 04 74 d4 48 89 df e8 ba 05 f7 ff 48 89 c6 eb cb 48 8b 43 08 a8 01 75 1f 48 89 d8 48 8b 00 a8 04 74 02 eb 87 <0f> 0b eb 83 48 83 e8 01 eb 9f 48 83 ea 01 0f 1f 00 eb 8b 48 83 e8
kernel: RSP: 0000:ffffc1c8825b7d78 EFLAGS: 00013046
kernel: RAX: 000fffffc0020089 RBX: fffff2b603308b80 RCX: 0000000000000001
kernel: RDX: 0000000000000001 RSI: ffff9d11478115c8 RDI: ffff9d11478115d0
kernel: RBP: ffffc1c8825b7da0 R08: 0000646f6973666e R09: 8080808080808080
kernel: R10: 0000000000000001 R11: 0000000000000000 R12: ffff9d11478115d0
kernel: R13: ffff9d11478115c8 R14: 0000000000003246 R15: 0000000000000001
kernel: FS:  0000000000000000(0000) GS:ffff9d115ba00000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007f408686f640 CR3: 0000000104d8e004 CR4: 00000000000606f0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kernel: Call Trace:
kernel:  __set_page_dirty_buffers+0xb6/0x110
kernel:  set_page_dirty+0x52/0xb0
kernel:  nfs_direct_read_completion+0xc4/0x120 [nfs]
kernel:  nfs_pgio_release+0x10/0x20 [nfs]
kernel:  rpc_free_task+0x30/0x70 [sunrpc]
kernel:  rpc_async_release+0x12/0x20 [sunrpc]
kernel:  process_one_work+0x174/0x390
kernel:  worker_thread+0x4f/0x3e0
kernel:  kthread+0x102/0x140
kernel:  ? drain_workqueue+0x130/0x130
kernel:  ? kthread_stop+0x110/0x110
kernel:  ret_from_fork+0x35/0x40
kernel: ---[ end trace 01341980905412c9 ]---

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

[forward-ported to v4.20]
Signed-off-by: Calum Mackay <calum.mackay@oracle.com>
Reviewed-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-12-02 09:43:56 -05:00
Tigran Mkrtchyan
320f35b7bf flexfiles: enforce per-mirror stateid only for v4 DSes
Since commit bb21ce0ad2 we always enforce per-mirror stateid.
However, this makes sense only for v4+ servers.

Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-12-02 09:43:56 -05:00
NeilBrown
7b587e1a5a NFS: use locks_copy_lock() to copy locks.
Using memcpy() to copy lock requests leaves the various
list_head in an inconsistent state.
As we will soon attach a list of conflicting request to
another pending request, we need these lists to be consistent.
So change NFS to use locks_init_lock/locks_copy_lock instead
of memcpy.

Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
2018-11-30 11:26:12 -05:00
Tigran Mkrtchyan
bb21ce0ad2 flexfiles: use per-mirror specified stateid for IO
rfc8435 says:

  For tight coupling, ffds_stateid provides the stateid to be used by
  the client to access the file.

However current implementation replaces per-mirror provided stateid with
by open or lock stateid.

Ensure that per-mirror stateid is used by ff_layout_write_prepare_v4 and
nfs4_ff_layout_prepare_ds.

Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-11-22 14:04:55 -05:00