Go to file
David Howells ec0fa0b659 afs: Fix deadlock between writeback and truncate
The afs filesystem has a lock[*] that it uses to serialise I/O operations
going to the server (vnode->io_lock), as the server will only perform one
modification operation at a time on any given file or directory.  This
prevents the the filesystem from filling up all the call slots to a server
with calls that aren't going to be executed in parallel anyway, thereby
allowing operations on other files to obtain slots.

  [*] Note that is probably redundant for directories at least since
      i_rwsem is used to serialise directory modifications and
      lookup/reading vs modification.  The server does allow parallel
      non-modification ops, however.

When a file truncation op completes, we truncate the in-memory copy of the
file to match - but we do it whilst still holding the io_lock, the idea
being to prevent races with other operations.

However, if writeback starts in a worker thread simultaneously with
truncation (whilst notify_change() is called with i_rwsem locked, writeback
pays it no heed), it may manage to set PG_writeback bits on the pages that
will get truncated before afs_setattr_success() manages to call
truncate_pagecache().  Truncate will then wait for those pages - whilst
still inside io_lock:

    # cat /proc/8837/stack
    [<0>] wait_on_page_bit_common+0x184/0x1e7
    [<0>] truncate_inode_pages_range+0x37f/0x3eb
    [<0>] truncate_pagecache+0x3c/0x53
    [<0>] afs_setattr_success+0x4d/0x6e
    [<0>] afs_wait_for_operation+0xd8/0x169
    [<0>] afs_do_sync_operation+0x16/0x1f
    [<0>] afs_setattr+0x1fb/0x25d
    [<0>] notify_change+0x2cf/0x3c4
    [<0>] do_truncate+0x7f/0xb2
    [<0>] do_sys_ftruncate+0xd1/0x104
    [<0>] do_syscall_64+0x2d/0x3a
    [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

The writeback operation, however, stalls indefinitely because it needs to
get the io_lock to proceed:

    # cat /proc/5940/stack
    [<0>] afs_get_io_locks+0x58/0x1ae
    [<0>] afs_begin_vnode_operation+0xc7/0xd1
    [<0>] afs_store_data+0x1b2/0x2a3
    [<0>] afs_write_back_from_locked_page+0x418/0x57c
    [<0>] afs_writepages_region+0x196/0x224
    [<0>] afs_writepages+0x74/0x156
    [<0>] do_writepages+0x2d/0x56
    [<0>] __writeback_single_inode+0x84/0x207
    [<0>] writeback_sb_inodes+0x238/0x3cf
    [<0>] __writeback_inodes_wb+0x68/0x9f
    [<0>] wb_writeback+0x145/0x26c
    [<0>] wb_do_writeback+0x16a/0x194
    [<0>] wb_workfn+0x74/0x177
    [<0>] process_one_work+0x174/0x264
    [<0>] worker_thread+0x117/0x1b9
    [<0>] kthread+0xec/0xf1
    [<0>] ret_from_fork+0x1f/0x30

and thus deadlock has occurred.

Note that whilst afs_setattr() calls filemap_write_and_wait(), the fact
that the caller is holding i_rwsem doesn't preclude more pages being
dirtied through an mmap'd region.

Fix this by:

 (1) Use the vnode validate_lock to mediate access between afs_setattr()
     and afs_writepages():

     (a) Exclusively lock validate_lock in afs_setattr() around the whole
     	 RPC operation.

     (b) If WB_SYNC_ALL isn't set on entry to afs_writepages(), trying to
     	 shared-lock validate_lock and returning immediately if we couldn't
     	 get it.

     (c) If WB_SYNC_ALL is set, wait for the lock.

     The validate_lock is also used to validate a file and to zap its cache
     if the file was altered by a third party, so it's probably a good fit
     for this.

 (2) Move the truncation outside of the io_lock in setattr, using the same
     hook as is used for local directory editing.

     This requires the old i_size to be retained in the operation record as
     we commit the revised status to the inode members inside the io_lock
     still, but we still need to know if we reduced the file size.

Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-08 10:50:55 -07:00
Documentation Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-10-05 11:27:14 -07:00
LICENSES LICENSES: Rename other to deprecated 2019-05-03 06:34:32 -06:00
arch Fix a kernel panic in the AES crypto code caused by a BR tail call not 2020-10-06 12:09:29 -07:00
block block-5.9-2020-10-02 2020-10-02 14:34:52 -07:00
certs .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2020-08-30 15:53:44 -07:00
drivers platform-drivers-x86 for v5.9-3 2020-10-06 12:00:52 -07:00
fs afs: Fix deadlock between writeback and truncate 2020-10-08 10:50:55 -07:00
include drm fixes for 5.9 final 2020-10-06 11:05:44 -07:00
init bootconfig: init: make xbc_namebuf static 2020-09-18 22:17:05 -04:00
ipc ipc: adjust proc_ipc_sem_dointvec definition to match prototype 2020-09-05 12:14:29 -07:00
kernel usermodehelper: reset umask to default before executing user process 2020-10-06 10:31:52 -07:00
lib drm-misc-fixes for v5.9: 2020-10-06 12:38:28 +10:00
mm mm: avoid early COW write protect games during fork() 2020-10-08 10:11:32 -07:00
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-10-05 11:27:14 -07:00
samples treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
scripts scripts/spelling.txt: fix malformed entry 2020-10-03 11:28:12 -07:00
security device_cgroup RCU warning fix from Amol Grover <frextrite@gmail.com> 2020-09-15 16:26:57 -07:00
sound sound fixes for 5.9-rc7 2020-09-24 09:00:05 -07:00
tools Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-10-05 11:27:14 -07:00
usr Merge branch 'work.fdpic' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-08-07 13:29:39 -07:00
virt KVM: fix memory leak in kvm_io_bus_unregister_dev() 2020-09-11 13:15:11 -04:00
.clang-format clang-format: Update with the latest for_each macro list 2020-09-01 12:53:42 +02:00
.cocciconfig
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore .gitignore: Add ZSTD-compressed files 2020-07-31 11:50:49 +02:00
.mailmap mailmap: add older email addresses for Kees Cook 2020-09-19 13:13:38 -07:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS CREDITS: Replace HTTP links with HTTPS ones 2020-07-23 14:53:58 -06:00
Kbuild kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS drm fixes for 5.9 final 2020-10-06 11:05:44 -07:00
Makefile Linux 5.9-rc8 2020-10-04 16:04:34 -07:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.