linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-09-04 01:49:36 +00:00

Author	SHA1	Message	Date
Peter Xu	14819305e0	userfaultfd: wp: declare _UFFDIO_WRITEPROTECT conditionally Only declare _UFFDIO_WRITEPROTECT if the user specified UFFDIO_REGISTER_MODE_WP and if all the checks passed. Then when the user registers regions with shmem/hugetlbfs we won't expose the new ioctl to them. Even with complete anonymous memory range, we'll only expose the new WP ioctl bit if the register mode has MODE_WP. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: Brian Geffon <bgeffon@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Rik van Riel <riel@redhat.com> Cc: Shaohua Li <shli@fb.com> Link: http://lkml.kernel.org/r/20200220163112.11409-18-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-04-07 10:43:40 -07:00
Peter Xu	23080e2783	userfaultfd: wp: don't wake up when doing write protect It does not make sense to try to wake up any waiting thread when we're write-protecting a memory region. Only wake up when resolving a write protected page fault. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: Brian Geffon <bgeffon@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Rik van Riel <riel@redhat.com> Cc: Shaohua Li <shli@fb.com> Link: http://lkml.kernel.org/r/20200220163112.11409-16-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-04-07 10:43:39 -07:00
Andrea Arcangeli	63b2d4174c	userfaultfd: wp: add the writeprotect API to userfaultfd ioctl Introduce the new uffd-wp APIs for userspace. Firstly, UFFDIO_REGISTER can now enable write-protection tracking through the new UFFDIO_REGISTER_MODE_WP flag. Note that this flag can co-exist with the existing UFFDIO_REGISTER_MODE_MISSING, in which case the userspace program can not only resolve missing page faults but also track page data changes along the way. Secondly, the new UFFDIO_WRITEPROTECT API performs page-level write-protection tracking. Note that the memory region must be registered with UFFDIO_REGISTER_MODE_WP before that. [peterx@redhat.com: write up the commit message] [peterx@redhat.com: remove useless block, write commit message, check against VM_MAYWRITE rather than VM_WRITE when register] Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jerome Glisse <jglisse@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: Brian Geffon <bgeffon@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Rik van Riel <riel@redhat.com> Cc: Shaohua Li <shli@fb.com> Link: http://lkml.kernel.org/r/20200220163112.11409-14-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-04-07 10:43:39 -07:00
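As a rough userspace illustration of the API described in the commit above, the following sketch fills in the two request structures and issues the write-protect ioctl. It assumes kernel UAPI headers recent enough to define UFFDIO_REGISTER_MODE_WP and UFFDIO_WRITEPROTECT; the helper names (make_wp_register, make_wp_request, wp_ioctl_offered, wp_range) are made up for this example and are not part of any kernel or library interface.

```c
#include <assert.h>
#include <linux/userfaultfd.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical helper: build a UFFDIO_REGISTER request that asks for
 * write-protect tracking, optionally combined with missing-page tracking. */
struct uffdio_register make_wp_register(uint64_t start, uint64_t len,
                                        bool also_missing)
{
    struct uffdio_register reg;

    memset(&reg, 0, sizeof(reg));
    reg.range.start = start;
    reg.range.len = len;
    reg.mode = UFFDIO_REGISTER_MODE_WP;
    if (also_missing)
        reg.mode |= UFFDIO_REGISTER_MODE_MISSING;
    return reg;
}

/* Hypothetical helper: after a successful UFFDIO_REGISTER the kernel fills
 * reg->ioctls; test whether the write-protect ioctl was offered. */
bool wp_ioctl_offered(const struct uffdio_register *reg)
{
    return (reg->ioctls & ((uint64_t)1 << _UFFDIO_WRITEPROTECT)) != 0;
}

/* Hypothetical helper: build a UFFDIO_WRITEPROTECT request.
 * protect=true sets protection, protect=false clears it. */
struct uffdio_writeprotect make_wp_request(uint64_t start, uint64_t len,
                                           bool protect)
{
    struct uffdio_writeprotect wp;

    assert(len != 0);
    memset(&wp, 0, sizeof(wp));
    wp.range.start = start;
    wp.range.len = len;
    wp.mode = protect ? UFFDIO_WRITEPROTECT_MODE_WP : 0;
    return wp;
}

/* Issue the ioctl on an already-registered userfaultfd descriptor.
 * Returns the ioctl result: 0 on success, -1 with errno set on failure. */
int wp_range(int uffd, uint64_t start, uint64_t len, bool protect)
{
    struct uffdio_writeprotect wp = make_wp_request(start, len, protect);

    return ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);
}
```

A caller would pass the result of make_wp_register() to ioctl(uffd, UFFDIO_REGISTER, ...) first, check wp_ioctl_offered(), and only then call wp_range() on sub-ranges of the registered region.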
Andrea Arcangeli	72981e0e7b	userfaultfd: wp: add UFFDIO_COPY_MODE_WP This allows UFFDIO_COPY to map pages write-protected. [peterx@redhat.com: switch to VM_WARN_ON_ONCE in mfill_atomic_pte; add brackets around "dst_vma->vm_flags & VM_WRITE"; fix wordings in comments and commit messages] Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jerome Glisse <jglisse@redhat.com> Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: Brian Geffon <bgeffon@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Rik van Riel <riel@redhat.com> Cc: Shaohua Li <shli@fb.com> Link: http://lkml.kernel.org/r/20200220163112.11409-6-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-04-07 10:43:39 -07:00
Anshuman Khandual	03911132aa	mm/vma: replace all remaining open encodings with is_vm_hugetlb_page() This replaces all remaining open encodings with is_vm_hugetlb_page(). Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Will Deacon <will@kernel.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Nick Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Paul Burton <paulburton@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/1582520593-30704-4-git-send-email-anshuman.khandual@arm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-04-07 10:43:37 -07:00
Long Li	044b541c11	cifs: smbd: Do not schedule work to send immediate packet on every receive An immediate packet should be sent to the peer only when new receive credits become available. New credits show up when a receive buffer is freed, not when data is received. Fix this by avoiding unnecessary work scheduling. Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2020-04-07 12:41:16 -05:00
Long Li	f1b7b862bf	cifs: smbd: Properly process errors on ib_post_send When processing errors from ib_post_send(), the transport state needs to be rolled back to the condition before the error. Refactor the old code to make it easy to roll back on IB errors, and fix this. Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2020-04-07 12:41:16 -05:00
Long Li	eda1c54f14	cifs: Allocate crypto structures on the fly for calculating signatures of incoming packets CIFS uses pre-allocated crypto structures to calculate signatures for both incoming and outgoing packets. In this way it doesn't need to allocate crypto structures for every packet, but it requires a lock to prevent concurrent access to crypto structures. Remove the lock by allocating crypto structures on the fly for incoming packets. At the same time, we can still use pre-allocated crypto structures for outgoing packets, as they are already protected by transport lock srv_mutex. Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2020-04-07 12:41:16 -05:00
Long Li	d4e5160d1a	cifs: smbd: Update receive credits before sending and deal with credits roll back on failure before sending Receive credits should be updated before sending the packet, not before the work is scheduled. Also, the value needs to be rolled back if something fails and the packet cannot be sent. Signed-off-by: Long Li <longli@microsoft.com> Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2020-04-07 12:41:16 -05:00
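The update-then-roll-back pattern this commit describes can be sketched in a few lines. This is a simplified model under stated assumptions, not the smbdirect code: struct credit_state, send_granting_credits() and the post_ok/post_fail stand-ins are all hypothetical names invented for the example.

```c
#include <assert.h>

/* Hypothetical, simplified bookkeeping (not the kernel's structures). */
struct credit_state {
    int new_credits; /* freed receive buffers not yet advertised to the peer */
    int granted;     /* credits the peer has already been told about */
};

/* Stand-in for the transport's post routine: returns 0 on success. */
typedef int (*post_fn)(void);

int post_ok(void)   { return 0; }
int post_fail(void) { return -1; }

/* Update the credit counters immediately before the send, and restore
 * them if the send cannot be posted. */
int send_granting_credits(struct credit_state *s, post_fn post)
{
    int granting = s->new_credits;
    int rc;

    assert(granting >= 0);
    s->new_credits = 0;
    s->granted += granting;

    rc = post();
    if (rc != 0) {
        /* Roll back: the peer never saw these credits. */
        s->granted -= granting;
        s->new_credits += granting;
    }
    return rc;
}
```

After a failed post the state is identical to the state before the call, so a later retry advertises the same credits exactly once.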
Long Li	3ffbe78aff	cifs: smbd: Check send queue size before posting a send Sometimes the remote peer may return more send credits than the send queue depth. If all the send credits are used to post sends, we may overflow the send queue. Fix this by checking the send queue size before posting a send. Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2020-04-07 12:41:16 -05:00
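A minimal sketch of the check described above, assuming a sender that tracks both peer-granted credits and the number of sends still in flight. The struct and function names are hypothetical, and where this model simply refuses to post, a real sender would more likely wait for a completion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sender state for illustration only. */
struct send_state {
    int send_credits;     /* credits granted by the remote peer */
    int send_pending;     /* sends posted but not yet completed */
    int send_queue_depth; /* capacity of the local send queue */
};

/* Post a send only if a credit is available AND the local queue has room.
 * Checking credits alone is not enough when the peer grants more credits
 * than the queue can hold. */
bool try_post_send(struct send_state *s)
{
    if (s->send_credits <= 0)
        return false;
    if (s->send_pending >= s->send_queue_depth)
        return false;
    s->send_credits--;
    s->send_pending++;
    return true;
}

/* A completion frees one slot in the send queue. */
void send_completed(struct send_state *s)
{
    assert(s->send_pending > 0);
    s->send_pending--;
}
```

With five credits and a queue depth of two, the third post is refused until one of the first two sends completes.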
Long Li	072a14ec63	cifs: smbd: Merge code to track pending packets As an optimization, SMBD tries to track two types of packets: packets with payload and without payload. There is no obvious benefit or performance gain to separately track two types of packets. Just treat them as pending packets and merge the tracking code. Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2020-04-07 12:41:16 -05:00
Aurelien Aptel	e79b0332ae	cifs: ignore cached share root handle closing errors Fix tcon use-after-free and NULL ptr deref. Customer system crashes with the following kernel log:
[462233.169868] CIFS VFS: Cancelling wait for mid 4894753 cmd: 14 => a QUERY DIR
[462233.228045] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4
[462233.305922] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4
[462233.306205] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4
[462233.347060] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4
[462233.347107] CIFS VFS: Close unmatched open
[462233.347113] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
...
[exception RIP: cifs_put_tcon+0xa0] (this is doing tcon->ses->server)
#6 [...] smb2_cancelled_close_fid at ... [cifs]
#7 [...] process_one_work at ...
#8 [...] worker_thread at ...
#9 [...] kthread at ...
The most likely explanation we have is:
* When we put the last reference of a tcon (refcount=0), we close the cached share root handle.
* If closing a handle is interrupted, SMB2_close() will queue a SMB2_close() in a work thread.
* The queued object keeps a tcon ref so we bump the tcon refcount, jumping from 0 to 1.
* We reach the end of cifs_put_tcon(), we free the tcon object despite it now having a refcount of 1.
* The queued work now runs, but the tcon, ses & server was freed in the meantime resulting in a crash.
THREAD 1
========
cifs_put_tcon => tcon refcount reach 0
SMB2_tdis
close_shroot_lease
close_shroot_lease_locked => if cached root has lease && refcount = 0
smb2_close_cached_fid => if cached root valid
SMB2_close => retry close in a thread if interrupted
smb2_handle_cancelled_close
__smb2_handle_cancelled_close => !! tcon refcount bump 0 => 1 !!
INIT_WORK(&cancelled->work, smb2_cancelled_close_fid);
queue_work(cifsiod_wq, &cancelled->work) => queue work
tconInfoFree(tcon); ==> freed!
cifs_put_smb_ses(ses); ==> freed!
THREAD 2 (workqueue)
========
smb2_cancelled_close_fid
SMB2_close(0, cancelled->tcon, ...); => use-after-free of tcon
cifs_put_tcon(cancelled->tcon); => tcon refcount reach 0 second time
CRASH
Fixes: `d919131935` ("CIFS: Close cached root handle only if it has a lease") Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>	2020-04-07 12:40:40 -05:00
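The 0 => 1 refcount bump in the scenario above can be reproduced in a small single-threaded model. The sketch below is illustrative only: struct tcon_model and its functions are hypothetical, and the guarded variant shows one common defensive pattern (refusing to take a reference once the count has reached zero), which is not necessarily what this patch does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted object such as a tcon. */
struct tcon_model {
    int refcount;
    bool freed;
    bool work_queued; /* deferred close that still points at the object */
};

/* Deferred close that takes a reference unconditionally. If this runs
 * during the final put, the count goes 0 -> 1 on a dying object. */
bool queue_close_unguarded(struct tcon_model *t)
{
    t->refcount++;
    t->work_queued = true;
    return true;
}

/* Guarded variant: refuse to queue work once the count has hit zero. */
bool queue_close_guarded(struct tcon_model *t)
{
    if (t->refcount <= 0)
        return false;
    t->refcount++;
    t->work_queued = true;
    return true;
}

/* Final put: teardown may trigger an interrupted close, and then frees
 * the object regardless of what the count is by that point. */
void put_tcon(struct tcon_model *t, bool (*queue_close)(struct tcon_model *))
{
    if (--t->refcount > 0)
        return;
    queue_close(t);  /* models the interrupted close during teardown */
    t->freed = true; /* models tconInfoFree() */
}

/* True when queued work still references an object that was freed. */
bool use_after_free_pending(const struct tcon_model *t)
{
    return t->freed && t->work_queued;
}
```

In the unguarded case the object ends up freed with a refcount of 1 and work still queued against it, which is the state THREAD 2 then trips over.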
Xiaoguang Wang	f7fe934686	io_uring: initialize fixed_file_data lock syzbot reports below warning: INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. CPU: 1 PID: 7099 Comm: syz-executor897 Not tainted 5.6.0-next-20200406-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x188/0x20d lib/dump_stack.c:118 assign_lock_key kernel/locking/lockdep.c:913 [inline] register_lock_class+0x1664/0x1760 kernel/locking/lockdep.c:1225 __lock_acquire+0x104/0x4e00 kernel/locking/lockdep.c:4223 lock_acquire+0x1f2/0x8f0 kernel/locking/lockdep.c:4923 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0x8c/0xbf kernel/locking/spinlock.c:159 io_sqe_files_register fs/io_uring.c:6599 [inline] __io_uring_register+0x1fe8/0x2f00 fs/io_uring.c:8001 __do_sys_io_uring_register fs/io_uring.c:8081 [inline] __se_sys_io_uring_register fs/io_uring.c:8063 [inline] __x64_sys_io_uring_register+0x192/0x560 fs/io_uring.c:8063 do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295 entry_SYSCALL_64_after_hwframe+0x49/0xb3 RIP: 0033:0x440289 Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ffff1bbf558 EFLAGS: 00000246 ORIG_RAX: 00000000000001ab RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440289 RDX: 0000000020000280 RSI: 0000000000000002 RDI: 0000000000000003 RBP: 00000000006ca018 R08: 0000000000000000 R09: 00000000004002c8 R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000401b10 R13: 0000000000401ba0 R14: 0000000000000000 R15: 0000000000000000 Initialize struct fixed_file_data's lock to fix this issue. 
Reported-by: syzbot+e6eeca4a035da76b3065@syzkaller.appspotmail.com Fixes: `0558955373` ("io_uring: refactor file register/unregister/update handling") Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-04-07 09:45:51 -06:00
Colin Ian King	211fea18a7	io_uring: remove redundant variable pointer nxt and io_wq_assign_next call An earlier commit "io_uring: remove @nxt from handlers" removed the setting of pointer nxt and now it is always null, hence the non-null check and call to io_wq_assign_next is redundant and can be removed. Addresses-Coverity: ("'Constant' variable guard") Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-04-07 09:45:33 -06:00
Trond Myklebust	93ce4af774	NFS: Clean up process of marking inode stale. Instead of the various open coded calls to set the NFS_INO_STALE bit and call nfs_zap_caches(), consolidate them into a single function nfs_set_inode_stale(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-04-06 13:56:33 -04:00
Linus Torvalds	b6ff10700d	-----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAl6LFJ0ACgkQnJ2qBz9k QNkSUQgAzwaescnHeVTF7/Zg9Uj2xrfrTJZ1E+Mn9qnd/0/z/asVV+RKfY7Gnu7h g19inDI4ZESFz2gWz4jwJD1c2/yMZb8vnae4ye3dtCv2yjG/0JxCeue6vjwsWqmO 4jbSgk8YNQqzwEFVMzNp43ZJr3CFooLCIsJcL8q4yYk8Kt4pDUPmQ1vBvAc6k9vK BKMBvp926tbomP27nq0n0CjvHy7ipDGMl4H6i4vBxHRfbDPih2x9VEklK3JatC1n 4AKS6IYJrkZVdOjli+DrResbcWxyT4db5tPio5MU0RDnVhNZT2cHyNVXf5EpRJqP 72pa7gfPu1Rx1+tU8bDR/daSveou2A== =fkCV -----END PGP SIGNATURE----- Merge tag 'fsnotify_for_v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: "This implements the fanotify FAN_DIR_MODIFY event. This event reports the name in a directory under which a change happened and together with the directory filehandle and fstatat() allows reliable and efficient implementation of directory synchronization" * tag 'fsnotify_for_v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fanotify: Fix the checks in fanotify_fsid_equal fanotify: report name info for FAN_DIR_MODIFY event fanotify: record name info for FAN_DIR_MODIFY event fanotify: Drop fanotify_event_has_fid() fanotify: prepare to report both parent and child fid's fanotify: send FAN_DIR_MODIFY event flavor with dir inode and name fanotify: divorce fanotify_path_event and fanotify_fid_event fanotify: Store fanotify handles differently fanotify: Simplify create_fd() fanotify: fix merging marks masks with FAN_ONDIR fanotify: merge duplicate events on parent and child fsnotify: replace inode pointer with an object id fsnotify: simplify arguments passing to fsnotify_parent() fsnotify: use helpers to access data by data_type fsnotify: funnel all dirent events through fsnotify_name() fsnotify: factor helpers fsnotify_dentry() and fsnotify_file() fsnotify: tidy up FS_ and FAN_ constants	2020-04-06 08:58:42 -07:00
Linus Torvalds	74e934ba0d	-----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAl6LE+QACgkQnJ2qBz9k QNn0HAf/fz+W/WqZFMEMLoAOJB5ALQa4v1DQGqsSIyKb9GIF8OBhzqNCP2wryx/r Qa37jTOLEQxsdMIQ3LaDH8RVZOj6GUAhk5RZIgTX2XYcWRdqqhws0a27BAJSPnt9 +uujPccf7rBQJ1aN8+kt8QMDkxo4AX8keHtfT4nbl2c1JNsd2nbjsEsWUBx7R1kb aoblIklUMfMr5NvKcWakhJoMtST0fyNX4vbV3tePJTTOS1uzatQLJ5ox2WmY6M+q tTEaKpR/zHMGXegVAlBMFBTJ3NVANcS4MoIYcHlkg5DMOO00XwHH/ZtsJlVsAOuj FURu2KJ3d4RYMxaKlX+wBscliL2GqQ== =weEs -----END PGP SIGNATURE----- Merge tag 'for_v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext2/udf updates from Jan Kara: "Cleanups and fixes for ext2 and one cleanup for udf" * tag 'for_v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext2: fix empty body warnings when -Wextra is used ext2: fix debug reference to ext2_xattr_cache udf: udf_sb.h: Replace zero-length array with flexible-array member ext2: xattr.h: Replace zero-length array with flexible-array member ext2: Silence lockdep warning about reclaim under xattr_sem	2020-04-06 08:55:04 -07:00
Linus Torvalds	e14679b62d	9p pull request for inclusion in 5.7 - Fix read with O_NONBLOCK to allow incomplete read and return immediately - Rest is just cleanup (indent, unused field in struct, extra semicolon) -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE/IPbcYBuWt0zoYhOq06b7GqY5nAFAl6LCyEACgkQq06b7GqY 5nAAQg//ThE/rojrB74S8yaD3sJlfsT+BoMzGDeos9FGdsHpO8t1ISUiheAcUn/Y a6igfJxTefxEO5SWqvsGEousRl0sRTyLo8RuQ2CC7+zKN8Ou+eG+4s2QXfDxBD+p xJGJrsWD2SkQ98/hkxDY6d31BM9iQGvpvVp/c9qLwtvTlnvEMXll0JiACcnmiPRt z92bx8QkaPlt399qv7LltfHftxCHOMcieQlYAF3wDjLauiEdDWysiMn0NIC2izwY hPlsVCiV504yhvQFcXVnmbWVJe88THlBVVjuVAi9IwpU8D2+1UK1kM/yiyVkYml+ U4JC7ZBQoqw2obSLPk5mZ6E7nGWfWnIBpt/B9dO7akP8pS9bv+uCjwg65tFEop5/ z9nuyjoQT/dEoThveeoEoxpIbyd690n1vbhcTdGDFDX5nfglrxj0S6bNVW0k6e41 9aPwWzAfU9Ip9GHwE6Ewf/y88mMI7rCP+o/pC8XkUUGiV+foFt3oO5wCcvGbxwgJ nnZ5oY+Ren/QXOmq/cN3RUWJvTRbc8TDBm6Fa0iGkk7d1OUBuRTWuBUXvxHMpO3A xHapDRddPcrZ9QQSJZGvMSYwG0ZUZ6MRL2qTU4X/3sIi0giApFkRMpmmo0HjvMAf PIFl2MG2Ok4A8yWQN98AMXy8vgs6ql6+Wb3W0b5dNFsAGEk7f9U= =wAe5 -----END PGP SIGNATURE----- Merge tag '9p-for-5.7' of git://github.com/martinetd/linux Pull 9p updates from Dominique Martinet: "Not much new, but a few patches for this cycle: - Fix read with O_NONBLOCK to allow incomplete read and return immediately - Rest is just cleanup (indent, unused field in struct, extra semicolon)" * tag '9p-for-5.7' of git://github.com/martinetd/linux: net/9p: remove unused p9_req_t aux field 9p: read only once on O_NONBLOCK 9pnet: allow making incomplete read requests 9p: Remove unneeded semicolon 9p: Fix Kconfig indentation	2020-04-06 08:46:59 -07:00
Christoph Hellwig	5833112df7	xfs: reflink should force the log out if mounted with wsync Reflink should force the log out to disk if the filesystem was mounted with wsync, the same as most other operations in xfs. [Note: XFS_MOUNT_WSYNC is set when the admin mounts the filesystem with either the 'wsync' or 'sync' mount options, which effectively means that we're classifying reflink/dedupe as IO operations and making them synchronous when required.] Fixes: `3fc9f5e409` ("xfs: remove xfs_reflink_remap_range") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> [darrick: add more to the changelog] Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2020-04-06 08:44:39 -07:00
Christoph Hellwig	54fbdd1035	xfs: factor out a new xfs_log_force_inode helper Create a new helper to force the log up to the last LSN touching an inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2020-04-06 08:44:35 -07:00
Linus Torvalds	77a73eecd4	Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pathwalk fix from Al Viro: "Dumb braino in legitimize_path()..." * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fix a braino in legitimize_path()	2020-04-06 08:38:52 -07:00
Al Viro	5bd73286d5	fix a braino in legitimize_path() brown paperbag time... wrong order of arguments ended up confusing the values to check dentry and mount_lock seqcounts against. Reported-by: kernel test robot <rong.a.chen@intel.com> Fixes: `2aa3847085` ("non-RCU analogue of the previous commit") Tested-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-04-06 10:38:59 -04:00
Pavel Begunkov	48bdd849e9	io_uring: fix ctx refcounting in io_submit_sqes() If io_get_req() fails, it drops a ref. Then, while keeping @submitted unmodified, io_submit_sqes() breaks the loop and puts @nr - @submitted refs. For each submitted req a ref is dropped in io_put_req() and friends. So, for @nr taken refs there will be (@nr - @submitted + @submitted + 1) dropped, one more than were taken. Remove ctx refcounting from io_get_req(), which also makes the code clearer. Fixes: `2b85edfc0c` ("io_uring: batch getting pcpu references") Cc: stable@vger.kernel.org # v5.6 Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-04-05 16:23:48 -06:00
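The reference arithmetic in this commit message can be checked with a small counting model. The sketch below only tallies references; struct ref_tally and submit_model() are hypothetical names, and the get_req_drops_ref flag switches between the behaviour described as buggy and the behaviour after the fix.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tally of references taken and dropped in one submit call. */
struct ref_tally {
    unsigned taken;
    unsigned dropped;
};

/* Model a batch of nr requests where request allocation fails at index
 * fail_at (pass fail_at >= nr for "no failure"). */
struct ref_tally submit_model(unsigned nr, unsigned fail_at,
                              bool get_req_drops_ref)
{
    struct ref_tally t = { .taken = nr, .dropped = 0 };
    unsigned submitted = 0;

    for (unsigned i = 0; i < nr; i++) {
        if (i == fail_at) {
            if (get_req_drops_ref)
                t.dropped++; /* the extra drop on the failure path */
            break;
        }
        submitted++;
    }

    t.dropped += nr - submitted; /* unused refs returned after the loop */
    t.dropped += submitted;      /* one drop per submitted request */
    return t;
}
```

For nr = 8 with a failure at index 3, the old behaviour drops 1 + 5 + 3 = 9 references against 8 taken; without the drop in the allocation path the totals match.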
Linus Torvalds	70fbdfef4b	sysfs: remove redundant __compat_only_sysfs_link_entry_to_kobj fn Commit `9255782f70` ("sysfs: Wrap __compat_only_sysfs_link_entry_to_kobj function to change the symlink name") made this function a wrapper around a new non-underscored function, which is a bit odd. The normal naming convention is the other way around: the underscored function is the wrappee, and the non-underscored function is the wrapper. There's only one single user (well, two call-sites in that user) of the more limited double underscore version of this function, so just remove the oddly named wrapper entirely and just add the extra NULL argument to the user. I considered just doing that in the merge, but that tends to make history really hard to read. Link: https://lore.kernel.org/lkml/CAHk-=wgkkmNV5tMzQDmPAQuNJBuMcry--Jb+h8H1o4RA3kF7QQ@mail.gmail.com/ Cc: Sourabh Jain <sourabhjain@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-04-05 11:34:35 -07:00
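The naming convention this commit refers to (underscored function as the wrappee, plain-named function as the wrapper) looks roughly like the sketch below. The function names are invented for illustration and are not the sysfs functions; note also that double-underscore identifiers are a kernel idiom and are formally reserved in ordinary userspace C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Wrappee: takes the full argument list. symlink_name may be NULL,
 * in which case the link is named after the target.
 * (Hypothetical function, kernel-style naming.) */
const char *__pick_link_name(const char *target_name,
                             const char *symlink_name)
{
    return symlink_name != NULL ? symlink_name : target_name;
}

/* Wrapper: covers the common case by supplying the default argument. */
const char *pick_link_name(const char *target_name)
{
    return __pick_link_name(target_name, NULL);
}
```

When only one caller ever uses the shorter form, dropping the wrapper and passing the extra NULL at the call site, as the commit does, leaves a single function to maintain.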
Linus Torvalds	d38c07afc3	powerpc updates for 5.7 - A large series from Nick for 64-bit to further rework our exception vectors, and rewrite portions of the syscall entry/exit and interrupt return in C. The result is much easier to follow code that is also faster in general. - Cleanup of our ptrace code to split various parts out that had become badly intertwined with #ifdefs over the years. - Changes to our NUMA setup under the PowerVM hypervisor which should hopefully avoid non-sensical topologies which can lead to warnings from the workqueue code and other problems. - MAINTAINERS updates to remove some of our old orphan entries and update the status of others. - Quite a few other small changes and fixes all over the map. Thanks to: Abdul Haleem, afzal mohammed, Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Balamuruhan S, Cédric Le Goater, Chen Zhou, Christophe JAILLET, Christophe Leroy, Christoph Hellwig, Clement Courbet, Daniel Axtens, David Gibson, Douglas Miller, Fabiano Rosas, Fangrui Song, Ganesh Goudar, Gautham R. Shenoy, Greg Kroah-Hartman, Greg Kurz, Gustavo Luiz Duarte, Hari Bathini, Ilie Halip, Jan Kara, Joe Lawrence, Joe Perches, Kajol Jain, Larry Finger, Laurentiu Tudor, Leonardo Bras, Libor Pechacek, Madhavan Srinivasan, Mahesh Salgaonkar, Masahiro Yamada, Masami Hiramatsu, Mauricio Faria de Oliveira, Michael Neuling, Michal Suchanek, Mike Rapoport, Nageswara R Sastry, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Nick Desaulniers, Oliver O'Halloran, Po-Hsu Lin, Pratik Rajesh Sampat, Rasmus Villemoes, Ravi Bangoria, Roman Bolshakov, Sam Bobroff, Sandipan Das, Santosh S, Sedat Dilek, Segher Boessenkool, Shilpasri G Bhat, Sourabh Jain, Srikar Dronamraju, Stephen Rothwell, Tyrel Datwyler, Vaibhav Jain, YueHaibing. 
-----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl6JypATHG1wZUBlbGxl cm1hbi5pZC5hdQAKCRBR6+o8yOGlgOTyD/0U90tXb3VXlQcc4OFIb8vWIj76k4Zn ZSZ7RyOuvb5pCISBZjSK79XkR9eMHT77qagX4V41q64k4yQl8nbgLeVnwL76hLLc IJCs23f4nsO0uqX/MhSCc5dfOOOS2i8V+OQYtsYWsH5QaG95v0cHIqVaHHMlfQxu 507GO/W5W6KTd4x008b5unQOuE51zMKlKvqEJXkT59obQFpaa2S5Wn7OzhsnarCH YSRNxaC7vtgBKLA9wUnFh8UUbh0FbOwXBCaq4OhHMhgRihdteVBCzlcR/6c+IRbt EoZxKzfQ0hI1z5f++kJNaRXMtUbSpM8D1HdKKHgiWjpdBSD0eu2X106KQT2R2ZOF qhX8xPLWNzdBglA6L43AaZUu+4ayd3QrrJIkjDv/K1rCHZjfGOzSQfoZgTEBNLFA tC0crhEfw8m98e4EwhCtekGQxdczRdLS9YvtC/h6mU2xkpA35yNSwB1/iuVQdkYD XyrEqImAQ1PJla7NL0hxSy5ZxrBtMeKT4WZZ0BNgKXryemldg8Tuv3AEyach3BHz eU0pIwpbnPm1JAPyrpDQ1yEf7QsD77gTPfEvilEci60R9DhvIMGAY+pt0qfME3yX wOLp2yVBEXlRmvHk/y/+r+m4aCsmwSrikbWwmLLwAAA6JehtzFOWxTEfNpACP23V mZyyZznsHIIE3Q== =ARdm -----END PGP SIGNATURE----- Merge tag 'powerpc-5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Slightly late as I had to rebase mid-week to insert a bug fix: - A large series from Nick for 64-bit to further rework our exception vectors, and rewrite portions of the syscall entry/exit and interrupt return in C. The result is much easier to follow code that is also faster in general. - Cleanup of our ptrace code to split various parts out that had become badly intertwined with #ifdefs over the years. - Changes to our NUMA setup under the PowerVM hypervisor which should hopefully avoid non-sensical topologies which can lead to warnings from the workqueue code and other problems. - MAINTAINERS updates to remove some of our old orphan entries and update the status of others. - Quite a few other small changes and fixes all over the map. 
Thanks to: Abdul Haleem, afzal mohammed, Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Balamuruhan S, Cédric Le Goater, Chen Zhou, Christophe JAILLET, Christophe Leroy, Christoph Hellwig, Clement Courbet, Daniel Axtens, David Gibson, Douglas Miller, Fabiano Rosas, Fangrui Song, Ganesh Goudar, Gautham R. Shenoy, Greg Kroah-Hartman, Greg Kurz, Gustavo Luiz Duarte, Hari Bathini, Ilie Halip, Jan Kara, Joe Lawrence, Joe Perches, Kajol Jain, Larry Finger, Laurentiu Tudor, Leonardo Bras, Libor Pechacek, Madhavan Srinivasan, Mahesh Salgaonkar, Masahiro Yamada, Masami Hiramatsu, Mauricio Faria de Oliveira, Michael Neuling, Michal Suchanek, Mike Rapoport, Nageswara R Sastry, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Nick Desaulniers, Oliver O'Halloran, Po-Hsu Lin, Pratik Rajesh Sampat, Rasmus Villemoes, Ravi Bangoria, Roman Bolshakov, Sam Bobroff, Sandipan Das, Santosh S, Sedat Dilek, Segher Boessenkool, Shilpasri G Bhat, Sourabh Jain, Srikar Dronamraju, Stephen Rothwell, Tyrel Datwyler, Vaibhav Jain, YueHaibing" * tag 'powerpc-5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (158 commits) powerpc: Make setjmp/longjmp signature standard powerpc/cputable: Remove unnecessary copy of cpu_spec->oprofile_type powerpc: Suppress .eh_frame generation powerpc: Drop -fno-dwarf2-cfi-asm powerpc/32: drop unused ISA_DMA_THRESHOLD powerpc/powernv: Add documentation for the opal sensor_groups sysfs interfaces selftests/powerpc: Fix try-run when source tree is not writable powerpc/vmlinux.lds: Explicitly retain .gnu.hash powerpc/ptrace: move ptrace_triggered() into hw_breakpoint.c powerpc/ptrace: create ppc_gethwdinfo() powerpc/ptrace: create ptrace_get_debugreg() powerpc/ptrace: split out ADV_DEBUG_REGS related functions. powerpc/ptrace: move register viewing functions out of ptrace.c powerpc/ptrace: split out TRANSACTIONAL_MEM related functions. powerpc/ptrace: split out SPE related functions. 
powerpc/ptrace: split out ALTIVEC related functions. powerpc/ptrace: split out VSX related functions. powerpc/ptrace: drop PARAMETER_SAVE_AREA_OFFSET powerpc/ptrace: drop unnecessary #ifdefs CONFIG_PPC64 powerpc/ptrace: remove unused header includes ...	2020-04-05 11:12:59 -07:00
Linus Torvalds	9c94b39560	1) Replace ext4's bmap and iopoll implementations to use iomap. 2) Clean up extent tree handling. 3) Other cleanups and miscellaneous bug fixes -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAl6JWT0ACgkQ8vlZVpUN gaMTOQgAo7bkkyc5+FCbpIUejle1DvM9Hg32fG1p/I5XH771RJRlGURVcx4FgsAZ TOiP4R4csADWcfUWnky/MK1pubVPLq2x7c0OyYnyOENM5PsLufeIa1EGjKBrGYeE 6et4zyqlB7bIOUkVarluSp0j3YklxGrg2k45cnVFp572W5FaWiwpIHL/qNcwvbv0 NIDyB+tW16tlmKINT/YLnDMRAOagoJoQwgNYwQPk+qVAvKWOUT3YQH0KXhXIRaMp M2HU9lUqSXT5F2uNNypeRaqwoFayYV7tVnaSoht30fFk04Eg+RLTyhCUWSVR2EHW zBjKNy9l3KQFTI8eriguGcLNy8cxig== =nLff -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: - Replace ext4's bmap and iopoll implementations to use iomap. - Clean up extent tree handling. - Other cleanups and miscellaneous bug fixes * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (31 commits) ext4: save all error info in save_error_info() and drop ext4_set_errno() ext4: fix incorrect group count in ext4_fill_super error message ext4: fix incorrect inodes per group in error message ext4: don't set dioread_nolock by default for blocksize < pagesize ext4: disable dioread_nolock whenever delayed allocation is disabled ext4: do not commit super on read-only bdev ext4: avoid ENOSPC when avoiding to reuse recently deleted inodes ext4: unregister sysfs path before destroying jbd2 journal ext4: check for non-zero journal inum in ext4_calculate_overhead ext4: remove map_from_cluster from ext4_ext_map_blocks ext4: clean up ext4_ext_insert_extent() call in ext4_ext_map_blocks() ext4: mark block bitmap corrupted when found instead of BUGON ext4: use flexible-array member for xattr structs ext4: use flexible-array member in struct fname Documentation: correct the description of FIEMAP_EXTENT_LAST ext4: move ext4_fiemap to use iomap framework ext4: make ext4_ind_map_blocks 
work with fiemap ext4: move ext4 bmap to use iomap infrastructure ext4: optimize ext4_ext_precache for 0 depth ext4: add IOMAP_F_MERGED for non-extent based mapping ...	2020-04-05 10:54:03 -07:00
Linus Torvalds	83eb69f3b8	Merge branch 'work.exfat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull exfat filesystem from Al Viro: "Shiny new fs/exfat replacement for drivers/staging/exfat" * 'work.exfat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: exfat: update file system parameter handling staging: exfat: make staging/exfat and fs/exfat mutually exclusive MAINTAINERS: add exfat filesystem exfat: add Kconfig and Makefile exfat: add nls operations exfat: add misc operations exfat: add exfat cache exfat: add bitmap operations exfat: add fat entry operations exfat: add file operations exfat: add directory operations exfat: add inode operations exfat: add super block operations exfat: add in-memory and on-disk structures and headers	2020-04-04 11:46:09 -07:00
Linus Torvalds	b3d8e42282	Highlights: - Fix EXCHANGE_ID response when NFSD runs in a container - A battery of new static trace points - Socket transports now use bio_vec to send Replies - NFS/RDMA now supports filesystems with no .splice_read method - Favor memcpy() over DMA mapping for small RPC/RDMA Replies - Add pre-requisites for supporting multiple Write chunks - Numerous minor fixes and clean-ups -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJegj9pAAoJEDNqszNvZn+XNGgP/RsRul/UGe70YoPS6AwxI+c1 2JVni5LV83aVGSN1df/xRNdugWh4j8e8stBIJPCnWFzUERFvrzVeVyW0/dlIy37l SRL1L62EzFUejAL45O+CkF5+KI2kAWMgDCv+rPnFnIuXVa/sThx63F1AJikVMPjB 7We3vd5Kh/CrMeMflebJYuY12xE6di2b3ifkZRO0/yuMaAuqJrDreYf4L6xpA4rC QnKQcNl7LGlOwGSI2WvDrCLE056PJFhTuzTawI80NKnkXMMFNc6/7NoXJqasVlHG fiki2mHbJrbYd8isIm3Vl/QkFsM8QjijtpVxC9gd151w0P7DfpMYmSzlZL7nvq/R Nt6IIqbaxWSS1VULsuS7rDtBwwZpW/LRWaUhEvMwimR2jeOxcwtlDVTX/dRH2mxq Ume64Hn8xMEhhx9tHCPQ+Rgjqv5m+ZEAvmV6B7RM9nT2z2MSzQQESeMB14VZZmF/ 2oH1dDCVdCmb4ZOcD5yxL6Y1hijn45s+YHdts9uIsCudKYPI906vPhogFC+PVJv+ MrOiUf8d40H0ra8VAUFCjAceOulkv90aLhBjoHbPsP4SQOTsRuUXnsKESZpSHY72 nT/uPM23ULv4kQ6tHB8yQ3ordjCBRgb4zIKtotc3Wpi7dhO8u6ptPj4soiflRShO 8/3N5dYfqdt9FRyr7Z8/ =o5G0 -----END PGP SIGNATURE----- Merge tag 'nfsd-5.7' of git://git.linux-nfs.org/projects/cel/cel-2.6 Pull nfsd updates from Chuck Lever: - Fix EXCHANGE_ID response when NFSD runs in a container - A battery of new static trace points - Socket transports now use bio_vec to send Replies - NFS/RDMA now supports filesystems with no .splice_read method - Favor memcpy() over DMA mapping for small RPC/RDMA Replies - Add pre-requisites for supporting multiple Write chunks - Numerous minor fixes and clean-ups [ Chuck is filling in for Bruce this time while he and his family settle into a new house ] * tag 'nfsd-5.7' of git://git.linux-nfs.org/projects/cel/cel-2.6: (39 commits) svcrdma: Fix leak of transport addresses SUNRPC: Fix a potential buffer overflow in 'svc_print_xprts()' 
SUNRPC/cache: don't allow invalid entries to be flushed nfsd: fsnotify on rmdir under nfsd/clients/ nfsd4: kill warnings on testing stateids with mismatched clientids nfsd: remove read permission bit for ctl sysctl NFSD: Fix NFS server build errors sunrpc: Add tracing for cache events SUNRPC/cache: Allow garbage collection of invalid cache entries nfsd: export upcalls must not return ESTALE when mountd is down nfsd: Add tracepoints for update of the expkey and export cache entries nfsd: Add tracepoints for exp_find_key() and exp_get_by_name() nfsd: Add tracing to nfsd_set_fh_dentry() nfsd: Don't add locks to closed or closing open stateids SUNRPC: Teach server to use xprt_sock_sendmsg for socket sends SUNRPC: Refactor xs_sendpages() svcrdma: Avoid DMA mapping small RPC Replies svcrdma: Fix double sync of transport header buffer svcrdma: Refactor chunk list encoders SUNRPC: Add encoders for list item discriminators ...	2020-04-04 11:13:51 -07:00
Trond Myklebust	44ea8dfce0	NFS/pnfs: Reference the layout cred in pnfs_prepare_layoutreturn() When we're sending a layoutreturn, ensure that we reference the layout cred atomically with the copy of the stateid. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-04-03 18:29:10 -04:00
Trond Myklebust	97a728f5e2	NFS/pnfs: Fix dereference of layout cred in pnfs_layoutcommit_inode() Ensure that the dereference of the layout cred is atomic with the stateid. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-04-03 18:29:10 -04:00
Trond Myklebust	fc51b1cf39	NFS: Beware when dereferencing the delegation cred When we look up the delegation cred, we are usually doing so in conjunction with a read of the stateid, and we want to ensure that the look up is atomic with that read. Fixes: `57f188e047` ("NFSv4: nfs_update_inplace_delegation() should update delegation cred") [sfr@canb.auug.org.au: Fixed up borken Fixes: line from Trond :-)] Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-04-03 18:26:02 -04:00
Bijan Mottahedeh	581f981034	io_uring: process requests completed with -EAGAIN on poll list A request that completes with an -EAGAIN result after it has been added to the poll list, will not be removed from that list in io_do_iopoll() because the f_op->iopoll() will not succeed for that request. Maintain a retryable local list similar to the done list, and explicitly reissue requests with an -EAGAIN result. Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-04-03 14:55:39 -06:00
Linus Torvalds	ff2ae607c6	SPDX patches for 5.7-rc1. Here are 3 SPDX patches for 5.7-rc1. One fixes up the SPDX tag for a single driver, while the other two go through the tree and add SPDX tags for all of the .gitignore files as needed. Nothing too complex, but you will get a merge conflict with your current tree, that should be trivial to handle (one file modified by two things, one file deleted.) All 3 of these have been in linux-next for a while, with no reported issues other than the merge conflict. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXodg5A8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ykySQCgy9YDrkz7nWq6v3Gohl6+lW/L+rMAnRM4uTZm m5AuCzO3Azt9KBi7NL+L =2Lm5 -----END PGP SIGNATURE----- Merge tag 'spdx-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx Pull SPDX updates from Greg KH: "Here are three SPDX patches for 5.7-rc1. One fixes up the SPDX tag for a single driver, while the other two go through the tree and add SPDX tags for all of the .gitignore files as needed. Nothing too complex, but you will get a merge conflict with your current tree, that should be trivial to handle (one file modified by two things, one file deleted.) All three of these have been in linux-next for a while, with no reported issues other than the merge conflict" * tag 'spdx-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx: ASoC: MT6660: make spdxcheck.py happy .gitignore: add SPDX License Identifier .gitignore: remove too obvious comments	2020-04-03 13:12:26 -07:00
Jens Axboe	c336e992cb	io_uring: remove bogus RLIMIT_NOFILE check in file registration We already checked this limit when the file was opened, and we keep it open in the file table. Hence when we added unix_inflight to the count we want to register, we're doubly accounting these files. This results in -EMFILE for file registration, if we're at half the limit. Cc: stable@vger.kernel.org # v5.1+ Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-04-03 13:56:44 -06:00
Linus Torvalds	d883600523	Merge branch 'for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: - Christian extended clone3 so that processes can be spawned into cgroups directly. This is not only neat in terms of semantics but also avoids grabbing the global cgroup_threadgroup_rwsem for migration. - Daniel added !root xattr support to cgroupfs. Userland already uses xattrs on cgroupfs for bookkeeping. This will allow delegated cgroups to support such usages. - Prateek tried to make cpuset hotplug handling synchronous but that led to possible deadlock scenarios. Reverted. - Other minor changes including release_agent_path handling cleanup. * 'for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: docs: cgroup-v1: Document the cpuset_v2_mode mount option Revert "cpuset: Make cpuset hotplug synchronous" cgroupfs: Support user xattrs kernfs: Add option to enable user xattrs kernfs: Add removed_size out param for simple_xattr_set kernfs: kvmalloc xattr value instead of kmalloc cgroup: Restructure release_agent_path handling selftests/cgroup: add tests for cloning into cgroups clone3: allow spawning processes into cgroups cgroup: add cgroup_may_write() helper cgroup: refactor fork helpers cgroup: add cgroup_get_from_file() helper cgroup: unify attach permission checking cpuset: Make cpuset hotplug synchronous cgroup.c: Use built-in RCU list checking kselftest/cgroup: add cgroup destruction test cgroup: Clean up css_set task traversal	2020-04-03 11:30:20 -07:00
Jens Axboe	aa96bf8a9e	io_uring: use io-wq manager as backup task if task is exiting If the original task is (or has) exited, then the task work will not get queued properly. Allow for using the io-wq manager task to queue this work for execution, and ensure that the io-wq manager notices and runs this work if woken up (or exiting). Reported-by: Dan Melnic <dmm@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-04-03 11:35:57 -06:00
Jens Axboe	3537b6a7c6	io_uring: grab task reference for poll requests We can have a task exit if it's not the owner of the ring. Be safe and grab an actual reference to it, to avoid a potential use-after-free. Reported-by: Dan Melnic <dmm@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-04-03 11:35:57 -06:00
Jens Axboe	a6ba632d2c	io_uring: retry poll if we got woken with non-matching mask If we get woken and the poll doesn't match our mask, re-add the task to the poll waitqueue and try again instead of completing the request with a mask of 0. Reported-by: Dan Melnic <dmm@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-04-03 11:35:48 -06:00
Chao Yu	531dfae52e	f2fs: keep inline_data when compression conversion We can keep compressed inode's data inline before inline conversion. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-04-03 10:21:32 -07:00
Chao Yu	aa576970fb	f2fs: fix to disable compression on directory It needs to call f2fs_disable_compressed_file() to disable compression on directory. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-04-03 10:21:32 -07:00
Chao Yu	9b6ed143c1	f2fs: add missing CONFIG_F2FS_FS_COMPRESSION Compression sysfs node should not be shown if f2fs module disables compression feature. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-04-03 10:21:31 -07:00
Chao Yu	6ce48b0c6e	f2fs: switch discard_policy.timeout to bool type While checking discard timeout, we use specified type UMOUNT_DISCARD_TIMEOUT, so just replace dpolicy.timeout with it, and switch dpolicy.timeout to bool type. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-04-03 10:21:31 -07:00
Chao Yu	8908e75310	f2fs: fix to verify tpage before releasing in f2fs_free_dic() In below error path, tpages[i] could be NULL, fix to check it before releasing it. - f2fs_read_multi_pages - f2fs_alloc_dic - f2fs_free_dic Fixes: `61fbae2b2b` ("f2fs: fix to avoid NULL pointer dereference") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-04-03 10:21:31 -07:00
Chao Yu	fd26725f6e	f2fs: show compression in statx fstest reports below message when compression is on: generic/424 1s ... - output mismatch --- tests/generic/424.out +++ results/generic/424.out.bad @@ -1,2 +1,26 @@ QA output created by 424 +[!] Attribute compressed should be set +Failed +stat_test failed +[!] Attribute compressed should be set +Failed +stat_test failed We missed to set STATX_ATTR_COMPRESSED on compressed inode in getattr(), fix it. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-04-03 10:21:31 -07:00
Chao Yu	80d0d45ab5	f2fs: clean up dic->tpages assignment Just cleanup, no logic change. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-04-03 10:21:31 -07:00
Chao Yu	50cfa66f0d	f2fs: compress: support zstd compress algorithm Add zstd compress algorithm support, use "compress_algorithm=zstd" mount option to enable it. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-04-03 10:21:10 -07:00
Vivek Goyal	4f3b4f161d	dax,iomap: Add helper dax_iomap_zero() to zero a range Add a helper dax_iomap_zero() to zero a range. This patch basically merges __dax_zero_page_range() and iomap_dax_zero(). Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20200228163456.1587-7-vgoyal@redhat.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2020-04-02 19:15:03 -07:00
Vivek Goyal	0a23f9ffa5	dax: Use new dax zero page method for zeroing a page Use new dax native zero page method for zeroing page if I/O is page aligned. Otherwise fall back to direct_access() + memcpy(). This gets rid of one of the dependencies on block device in dax path. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Link: https://lore.kernel.org/r/20200228163456.1587-6-vgoyal@redhat.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2020-04-02 19:15:03 -07:00
Trond Myklebust	f30a6ea0f3	NFS: Add a module parameter to set nfs_mountpoint_expiry_timeout Setting nfs_mountpoint_expiry_timeout to a negative value stops mountpoint expiration, while setting it to a positive value restarts the scheduler. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-04-02 18:53:59 -04:00
Trond Myklebust	75da98586a	NFS: finish_automount() requires us to hold 2 refs to the mount record We must not return from nfs_d_automount() without holding 2 references to the mount record. Doing so will trigger the BUG() in finish_automount(). Also ensure that we don't try to reschedule the automount timer with a negative or zero timeout value. Fixes: `22a1ae9a93` ("NFS: If nfs_mountpoint_expiry_timeout < 0, do not expire submounts") Cc: stable@vger.kernel.org # v5.5+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-04-02 18:51:12 -04:00
Scott Mayhew	529af90576	NFS: Fix a few constant_table array definitions nfs_vers_tokens, nfs_xprt_protocol_tokens, and nfs_secflavor_tokens were all missing an empty item at the end of the array, allowing lookup_constant() to potentially walk off the end and trigger an oops. Reported-by: Olga Kornievskaia <aglo@umich.edu> Signed-off-by: Scott Mayhew <smayhew@redhat.com> Fixes: `e38bb238ed` ("NFS: Convert mount option parsing to use functionality from fs_parser.h") Cc: stable@vger.kernel.org # v5.6 Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-04-02 18:37:13 -04:00
Linus Torvalds	6cad420cc6	Merge branch 'akpm' (patches from Andrew) Merge updates from Andrew Morton: "A large amount of MM, plenty more to come. Subsystems affected by this patch series: - tools - kthread - kbuild - scripts - ocfs2 - vfs - mm: slub, kmemleak, pagecache, gup, swap, memcg, pagemap, mremap, sparsemem, kasan, pagealloc, vmscan, compaction, mempolicy, hugetlbfs, hugetlb" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (155 commits) include/linux/huge_mm.h: check PageTail in hpage_nr_pages even when !THP mm/hugetlb: fix build failure with HUGETLB_PAGE but not HUGEBTLBFS selftests/vm: fix map_hugetlb length used for testing read and write mm/hugetlb: remove unnecessary memory fetch in PageHeadHuge() mm/hugetlb.c: clean code by removing unnecessary initialization hugetlb_cgroup: add hugetlb_cgroup reservation docs hugetlb_cgroup: add hugetlb_cgroup reservation tests hugetlb: support file_region coalescing again hugetlb_cgroup: support noreserve mappings hugetlb_cgroup: add accounting for shared mappings hugetlb: disable region_add file_region coalescing hugetlb_cgroup: add reservation accounting for private mappings mm/hugetlb_cgroup: fix hugetlb_cgroup migration hugetlb_cgroup: add interface for charge/uncharge hugetlb reservations hugetlb_cgroup: add hugetlb_cgroup reservation counter hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization mm/memblock.c: remove redundant assignment to variable max_addr mm: mempolicy: require at least one nodeid for MPOL_PREFERRED mm: mempolicy: use VM_BUG_ON_VMA in queue_pages_test_walk() ...	2020-04-02 13:55:34 -07:00
Linus Torvalds	7be97138e7	New code for 5.7: - Fix a hard to trigger race between iclog error checking and log shutdown. - Strengthen the AGF verifier. - Ratelimit some of the more spammy error messages. - Remove the icdinode uid/gid members and just use the ones in the vfs inode. - Hold ILOCK across insert/collapse range. - Clean up the extended attribute interfaces. - Clean up the attr flags mess. - Restore PF_MEMALLOC after exiting xfsaild thread to avoid triggering warnings in the process accounting code. - Remove the flexibly-sized array from struct xfs_agfl to eliminate compiler warnings about unaligned pointers and packed structures. - Various macro and typedef removals. - Stale metadata buffers if we decide they're corrupt outside of a verifier. - Check directory data/block/free block owners. - Fix a UAF when aborting inactivation of a corrupt xattr fork. - Teach online scrub to report failed directory and attr name lookups as a metadata corruption instead of a runtime error. - Avoid potential buffer overflows in sysfs files by using scnprintf. - Fix a regression in getdents lookups due to a mistake in pointer arithmetic. - Refactor btree cursor private data structures to use anonymous unions. - Cleanups in the log unmounting code. - Fix a potential mishandling of ENOMEM errors on multi-block directory buffer lookups. - Fix an incorrect test in the block allocation code. - Cleanups and name prefix shortening in the scrub code. - Introduce btree bulk loading code for online repair and scrub. - Fix a quotaoff log item leak (and hang) when the fs goes down midway through a quotaoff operation. - Remove di_version from the incore inode. - Refactor some of the log shutdown checking code. - Record the forcing of the log unmount records in the log force counters. - Fix a longstanding bug where quotacheck would purge the administrator's default quota grace interval and warning limits. - Reduce memory usage when scrubbing directory and xattr trees. 
- Don't let fsfreeze race with GETFSMAP or online scrub. - Handle bio_add_page failures more gracefully in xlog_write_iclog. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl58z3AACgkQ+H93GTRK tOukrRAAhJmowV5+Req5YMYawRjafkIbCDH3WlFy9AdpFFA6pXSfX6YCtIKwKfq8 +yRj/BFRGoMc6SouXo+J0i3YMS6yQZTjcmVWrQPVnj/+DGVjh+Y70gKExtz2CyjO ItGGxpRwOhpw49zVYmcH6Mrw8sBztHR0VsM0cq6YfJrkNcm0BsnAC+W6zQNaDG24 UO1ivehBOooVh0C8pv0smVcPtBL2N+RRyS3XRT5hGFozUJgLLGDqnHAl1d+KOrWp hPQhUlDw9luiHPBxWkxUuFDr79gjUi7kyHILNt7TIkByyRcTUO9jhS2VpZd4oXlj /J3i1AS+9lhP1yGVxw2RHQhKMvdYBQiLADSCpzkA1dMma99cFGyzMMA6rG0WRMJ4 erXxhAEoM4um3gxDka6+HJxySLOT8E22FesJbn6YIv4QSAkXDBPWz/9hPbjJuJQm 6Y/YkFOZLp3c+xJM0tpCWxWaWW7A+t2OMRIFISSsXesrySpalpbkVXkHwz3NwO6L 3SeTnLWqnADbjl2qsuyF0uYHqURygVz7g+r4X7AO5D1IRyCCkmtDOuwumxERiQ3p 3vZMQrWh+y3SgRiF8brDG5KTshhxcinKdHEYXrwq3XgaHZg4mtLI4XjOyZlJruoX MGWhZjga6+RGysH0RKjZbHaMr/f4m3X00SHa/5Ibcp6Q21TIx6M= =8iJB -----END PGP SIGNATURE----- Merge tag 'xfs-5.7-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs updates from Darrick Wong: "There's a lot going on this cycle with cleanups in the log code, the btree code, and the xattr code. We're tightening of metadata validation and online fsck checking, and introducing a common btree rebuilding library so that we can refactor xfs_repair and introduce online repair in a future cycle. We also fixed a few visible bugs -- most notably there's one in getdents that we introduced in 5.6; and a fix for hangs when disabling quotas. This series has been running fstests & other QA in the background for over a week and looks good so far. I anticipate sending a second pull request next week. That batch will change how xfs interacts with memory reclaim; how the log batches and throttles log items; how hard writes near ENOSPC will try to squeeze more space out of the filesystem; and hopefully fix the last of the umount hangs after a catastrophic failure. 
That should ease a lot of problems when running at the limits, but for now I'm leaving that in for-next for another week to make sure we got all the subtleties right. Summary: - Fix a hard to trigger race between iclog error checking and log shutdown. - Strengthen the AGF verifier. - Ratelimit some of the more spammy error messages. - Remove the icdinode uid/gid members and just use the ones in the vfs inode. - Hold ILOCK across insert/collapse range. - Clean up the extended attribute interfaces. - Clean up the attr flags mess. - Restore PF_MEMALLOC after exiting xfsaild thread to avoid triggering warnings in the process accounting code. - Remove the flexibly-sized array from struct xfs_agfl to eliminate compiler warnings about unaligned pointers and packed structures. - Various macro and typedef removals. - Stale metadata buffers if we decide they're corrupt outside of a verifier. - Check directory data/block/free block owners. - Fix a UAF when aborting inactivation of a corrupt xattr fork. - Teach online scrub to report failed directory and attr name lookups as a metadata corruption instead of a runtime error. - Avoid potential buffer overflows in sysfs files by using scnprintf. - Fix a regression in getdents lookups due to a mistake in pointer arithmetic. - Refactor btree cursor private data structures to use anonymous unions. - Cleanups in the log unmounting code. - Fix a potential mishandling of ENOMEM errors on multi-block directory buffer lookups. - Fix an incorrect test in the block allocation code. - Cleanups and name prefix shortening in the scrub code. - Introduce btree bulk loading code for online repair and scrub. - Fix a quotaoff log item leak (and hang) when the fs goes down midway through a quotaoff operation. - Remove di_version from the incore inode. - Refactor some of the log shutdown checking code. - Record the forcing of the log unmount records in the log force counters. 
- Fix a longstanding bug where quotacheck would purge the administrator's default quota grace interval and warning limits. - Reduce memory usage when scrubbing directory and xattr trees. - Don't let fsfreeze race with GETFSMAP or online scrub. - Handle bio_add_page failures more gracefully in xlog_write_iclog" * tag 'xfs-5.7-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (108 commits) xfs: prohibit fs freezing when using empty transactions xfs: shutdown on failure to add page to log bio xfs: directory bestfree check should release buffers xfs: drop all altpath buffers at the end of the sibling check xfs: preserve default grace interval during quotacheck xfs: remove xlog_state_want_sync xfs: move the ioerror check out of xlog_state_clean_iclog xfs: refactor xlog_state_clean_iclog xfs: remove the aborted parameter to xlog_state_done_syncing xfs: simplify log shutdown checking in xfs_log_release_iclog xfs: simplify the xfs_log_release_iclog calling convention xfs: factor out a xlog_wait_on_iclog helper xfs: merge xlog_cil_push into xlog_cil_push_work xfs: remove the di_version field from struct icdinode xfs: simplify a check in xfs_ioctl_setattr_check_cowextsize xfs: simplify di_flags2 inheritance in xfs_ialloc xfs: only check the superblock version for dinode size calculation xfs: add a new xfs_sb_version_has_v3inode helper xfs: fix unmount hang and memory leak on shutdown during quotaoff xfs: factor out quotaoff intent AIL removal and memory free ...	2020-04-02 13:02:07 -07:00
Linus Torvalds	7db83c070b	New code for 5.7: - Fix a regression where we broke the userspace hibernation driver by disallowing writes to the swap device. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl542q0ACgkQ+H93GTRK tOun8A//QIAvIuMQl9k/S4lDqvVNAmSMJDdp0v3x+BOMBDmbqJeDO+D9u59nVWAP zun1Zp3weO7v8kMBPDyvTVhKP0Z9v8ogQj4yT22W0YiBsKgsaqM9tupc3NPm036V oPusLFC44RRXyLZSjBhNr3xYTBqeGmJMKBUGrnwYeQK2g87o3gi8s9KmVq3olp/L W/ZvFgmTl4FpbA1aNaMtZ1YBawu9wyQDvmtZtnD7xuXGKGsQjGUt20P7yuFu2Mb8 vmUHNcCBG29j8Fwd+6Gub2Jg25BhLGBSjftLHcGdG8aRN4Y5DQ3w+rBwUD7fHQmi u0DXMnPIP8twsQPKwWabfZ3PMqyfUiz5rSnJGGd+T7uPP5xYvpKhYGm8IBPQb699 2LY4NZKQqp9IWSbwmU7jSwCEl0x/GDMflF3frpfTmvCDvpW7TUQf8lJsVsF6OcNP uGJPz3AoE5ebt2XD+IWCurWrfn/nGnxp9ZEKjK69nm3BFXI0GRdqBq6lueBsh6re zKUoFp7IHIb4ET6V21JPq9iyKUlKLqgb+rpqcA4CwA4tvJZkZXVlYOdwi1CWse3o 8o9xaDmW1murvc0XrnQu8Z8way1nkUYBBkhRJsHCdy8Qn2xA3fyVumZGichNNccO Mzu8+IKttCTBK8ZY6iAXsjbL61eR1vrr3GGbz4kh6dZy5fBX99c= =y7GN -----END PGP SIGNATURE----- Merge tag 'vfs-5.7-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull hibernation fix from Darrick Wong: "Fix a regression where we broke the userspace hibernation driver by disallowing writes to the swap device" * tag 'vfs-5.7-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: hibernate: Allow uswsusp to write to swap	2020-04-02 12:59:36 -07:00
Linus Torvalds	35a9fafe23	New iomap code for 5.7: - Fix a broken tracepoint - Fix a broken comment -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl5yOLYACgkQ+H93GTRK tOueKA/+O83EV/a8oI5/EEM95uoe5JRC4VUaCj3iDyMVEefQC8Phl3pUraQoG/z3 d511bTvca70rBSYj9HaSoa3Wt3MTJTPy9RIxBIKaaumzTXMe4erfXZ/JRJIf5DSl JZs+Ujng5ythLR6ederr/Kn7LXg5MGax5I4j41JHCDiRp5xFaqRN+4Rib18jj4Lf WNlbwblixWzCeEeHhs+CNm9G7cy+wt84Qv0CTJjC/nThtDc2tOVFkI3EODX0oZH5 R+KEbEStHebI9AZ0NIfDagoHQ926ROjAEpJn4VS/tOYbN1favrSE5RTnaNZ0asln c6XnGIPvpHcjsFztNFEuPXS9rP6aGNLic+P+V5TADjzY3M0TCXnIsYXFhEV23HNe 2hprg6T7+KvcAaTdq3t5k2jjYC9AEiaxBLIJRGyTwYKmJPobvTXU9QTyJpA3KX3u mCAo8jJzl3XHcMrCcXA3xBVS8e23cAkQN5gwpvMIfOeGRr006iFAJS5MZ23Rqyk8 k3w0V4vVWW3QnLYwJI+cGKanuhFrZ7PNulnqC4AmUdYpeyhmWoH2e1mYgU3JKNm9 vF4zad/MYhp0xfEUopsLEE+F0sj4v1yUKEvPSoXxGYSycf9/KFa3/MQvF3akaOeX kQu29pzDeErFsdLzm+qQmt/xFHuSA5WP6ShO+0XC9uzhN/ezRNg= =2wnO -----END PGP SIGNATURE----- Merge tag 'iomap-5.7-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull iomap updates from Darrick Wong: "We're fixing tracepoints and comments in this cycle, so there shouldn't be any surprises here. I anticipate sending a second pull request next week with a single bug fix for readahead, but it's still undergoing QA. Summary: - Fix a broken tracepoint - Fix a broken comment" * tag 'iomap-5.7-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: iomap: fix comments in iomap_dio_rw iomap: Remove pgoff from tracepoints	2020-04-02 12:57:18 -07:00
Linus Torvalds	9c577491b9	Merge branch 'work.dotdot1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pathwalk sanitizing from Al Viro: "Massive pathwalk rewrite and cleanups. Several iterations have been posted; hopefully this thing is getting readable and understandable now. Pretty much all parts of pathname resolutions are affected... The branch is identical to what has sat in -next, except for commit message in "lift all calls of step_into() out of follow_dotdot/ follow_dotdot_rcu", crediting Qian Cai for reporting the bug; only commit message changed there." * 'work.dotdot1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (69 commits) lookup_open(): don't bother with fallbacks to lookup+create atomic_open(): no need to pass struct open_flags anymore open_last_lookups(): move complete_walk() into do_open() open_last_lookups(): lift O_EXCL\|O_CREAT handling into do_open() open_last_lookups(): don't abuse complete_walk() when all we want is unlazy open_last_lookups(): consolidate fsnotify_create() calls take post-lookup part of do_last() out of loop link_path_walk(): sample parent's i_uid and i_mode for the last component __nd_alloc_stack(): make it return bool reserve_stack(): switch to __nd_alloc_stack() pick_link(): take reserving space on stack into a new helper pick_link(): more straightforward handling of allocation failures fold path_to_nameidata() into its only remaining caller pick_link(): pass it struct path already with normal refcounting rules fs/namei.c: kill follow_mount() non-RCU analogue of the previous commit helper for mount rootwards traversal follow_dotdot(): be lazy about changing nd->path follow_dotdot_rcu(): be lazy about changing nd->path follow_dotdot{,_rcu}(): massage loops ...	2020-04-02 12:30:08 -07:00
Linus Torvalds	d987ca1c6b	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull exec/proc updates from Eric Biederman: "This contains two significant pieces of work: the work to sort out proc_flush_task, and the work to solve a deadlock between strace and exec. Fixing proc_flush_task so that it no longer requires a persistent mount makes improvements to proc possible. The removal of the persistent mount solves an old regression that caused the hidepid mount option to only work on remount not on mount. The regression was found and reported by the Android folks. This further allows Alexey Gladkov's work making proc mount options specific to an individual mount of proc to move forward. The work on exec starts solving a long standing issue with exec that it takes mutexes of blocking userspace applications, which makes exec extremely deadlock prone. For the moment this adds a second mutex with a narrower scope that handles all of the easy cases. Which makes the tricky cases easy to spot.
With a little luck the code to solve those deadlocks will be ready by next merge window" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (25 commits) signal: Extend exec_id to 64bits pidfd: Use new infrastructure to fix deadlocks in execve perf: Use new infrastructure to fix deadlocks in execve proc: io_accounting: Use new infrastructure to fix deadlocks in execve proc: Use new infrastructure to fix deadlocks in execve kernel/kcmp.c: Use new infrastructure to fix deadlocks in execve kernel: doc: remove outdated comment cred.c mm: docs: Fix a comment in process_vm_rw_core selftests/ptrace: add test cases for dead-locks exec: Fix a deadlock in strace exec: Add exec_update_mutex to replace cred_guard_mutex exec: Move exec_mmap right after de_thread in flush_old_exec exec: Move cleanup of posix timers on exec out of de_thread exec: Factor unshare_sighand out of de_thread and call it separately exec: Only compute current once in flush_old_exec pid: Improve the comment about waiting in zap_pid_ns_processes proc: Remove the now unnecessary internal mount of proc uml: Create a private mount of proc for mconsole uml: Don't consult current to find the proc_mnt in mconsole_proc proc: Use a list of inodes to flush from proc ...	2020-04-02 11:22:17 -07:00
Mike Kravetz	87bf91d39b	hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race hugetlbfs page faults can race with truncate and hole punch operations. Current code in the page fault path attempts to handle this by 'backing out' operations if we encounter the race. One obvious omission in the current code is removing a page newly added to the page cache. This is pretty straight forward to address, but there is a more subtle and difficult issue of backing out hugetlb reservations. To handle this correctly, the 'reservation state' before page allocation needs to be noted so that it can be properly backed out. There are four distinct possibilities for reservation state: shared/reserved, shared/no-resv, private/reserved and private/no-resv. Backing out a reservation may require memory allocation which could fail so that needs to be taken into account as well. Instead of writing the required complicated code for this rare occurrence, just eliminate the race. i_mmap_rwsem is now held in read mode for the duration of page fault processing. Hold i_mmap_rwsem in write mode when modifying i_size. In this way, truncation can not proceed when page faults are being processed. In addition, i_size will not change during fault processing so a single check can be made to ensure faults are not beyond (proposed) end of file. Faults can still race with hole punch, but that race is handled by existing code and the use of hugetlb_fault_mutex. With this modification, checks for races with truncation in the page fault path can be simplified and removed. remove_inode_hugepages no longer needs to take hugetlb_fault_mutex in the case of truncation. Comments are expanded to explain reasoning behind locking. Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Hugh Dickins <hughd@google.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Prakash Sangappa <prakash.sangappa@oracle.com> Link: http://lkml.kernel.org/r/20200316205756.146666-3-mike.kravetz@oracle.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-04-02 09:35:32 -07:00
Mike Kravetz	c0d0381ade	hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization Patch series "hugetlbfs: use i_mmap_rwsem for more synchronization", v2. While discussing the issue with huge_pte_offset [1], I remembered that there were more outstanding hugetlb races. These issues are: 1) For shared pmds, huge PTE pointers returned by huge_pte_alloc can become invalid via a call to huge_pmd_unshare by another thread. 2) hugetlbfs page faults can race with truncation causing invalid global reserve counts and state. A previous attempt was made to use i_mmap_rwsem in this manner as described at [2]. However, those patches were reverted starting with [3] due to locking issues. To effectively use i_mmap_rwsem to address the above issues it needs to be held (in read mode) during page fault processing. However, during fault processing we need to lock the page we will be adding. Lock ordering requires we take page lock before i_mmap_rwsem. Waiting until after taking the page lock is too late in the fault process for the synchronization we want to do. To address this lock ordering issue, the following patches change the lock ordering for hugetlb pages. This is not too invasive as hugetlbfs processing is done separate from core mm in many places. However, I don't really like this idea. Much ugliness is contained in the new routine hugetlb_page_mapping_lock_write() of patch 1. The only other way I can think of to address these issues is by catching all the races. After catching a race, cleanup, backout, retry ... etc, as needed. This can get really ugly, especially for huge page reservations. At one time, I started writing some of the reservation backout code for page faults and it got so ugly and complicated I went down the path of adding synchronization to avoid the races. Any other suggestions would be welcome. 
[1] https://lore.kernel.org/linux-mm/1582342427-230392-1-git-send-email-longpeng2@huawei.com/ [2] https://lore.kernel.org/linux-mm/20181222223013.22193-1-mike.kravetz@oracle.com/ [3] https://lore.kernel.org/linux-mm/20190103235452.29335-1-mike.kravetz@oracle.com [4] https://lore.kernel.org/linux-mm/1584028670.7365.182.camel@lca.pw/ [5] https://lore.kernel.org/lkml/20200312183142.108df9ac@canb.auug.org.au/ This patch (of 2): While looking at BUGs associated with invalid huge page map counts, it was discovered and observed that a huge pte pointer could become 'invalid' and point to another task's page table. Consider the following: A task takes a page fault on a shared hugetlbfs file and calls huge_pte_alloc to get a ptep. Suppose the returned ptep points to a shared pmd. Now, another task truncates the hugetlbfs file. As part of truncation, it unmaps everyone who has the file mapped. If the range being truncated is covered by a shared pmd, huge_pmd_unshare will be called. For all but the last user of the shared pmd, huge_pmd_unshare will clear the pud pointing to the pmd. If the task in the middle of the page fault is not the last user, the ptep returned by huge_pte_alloc now points to another task's page table or worse. This leads to bad things such as incorrect page map/reference counts or invalid memory references. To fix, expand the use of i_mmap_rwsem as follows: - i_mmap_rwsem is held in read mode whenever huge_pmd_share is called. huge_pmd_share is only called via huge_pte_alloc, so callers of huge_pte_alloc take i_mmap_rwsem before calling. In addition, callers of huge_pte_alloc continue to hold the semaphore until finished with the ptep. - i_mmap_rwsem is held in write mode whenever huge_pmd_unshare is called. One problem with this scheme is that it requires taking i_mmap_rwsem before taking the page lock during page faults. This is not the order specified in the rest of mm code. Handling of hugetlbfs pages is mostly isolated today. 
Therefore, we use this alternative locking order for PageHuge() pages. mapping->i_mmap_rwsem hugetlb_fault_mutex (hugetlbfs specific page fault mutex) page->flags PG_locked (lock_page) To help with lock ordering issues, hugetlb_page_mapping_lock_write() is introduced to write lock the i_mmap_rwsem associated with a page. In most cases it is easy to get address_space via vma->vm_file->f_mapping. However, in the case of migration or memory errors for anon pages we do not have an associated vma. A new routine _get_hugetlb_page_mapping() will use anon_vma to get address_space in these cases. Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Prakash Sangappa <prakash.sangappa@oracle.com> Link: http://lkml.kernel.org/r/20200316205756.146666-2-mike.kravetz@oracle.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-04-02 09:35:32 -07:00
Peter Xu	3e69ad081c	mm/userfaultfd: honor FAULT_FLAG_KILLABLE in fault path Userfaultfd fault path was by default killable even if the caller does not have FAULT_FLAG_KILLABLE. That makes sense before in that when with gup we don't have FAULT_FLAG_KILLABLE properly set before. Now after previous patch we've got FAULT_FLAG_KILLABLE applied even for gup code so it should also make sense to let userfaultfd to honor the FAULT_FLAG_KILLABLE. Because we're unconditionally setting FAULT_FLAG_KILLABLE in gup code right now, this patch should have no functional change. It also cleaned the code a little bit by introducing some helpers. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220160300.9941-1-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-04-02 09:35:30 -07:00
Peter Xu	c270a7eedc	mm: introduce FAULT_FLAG_INTERRUPTIBLE handle_userfaultfd() is currently the only one place in the kernel page fault procedures that can respond to non-fatal userspace signals. It was trying to detect such an allowance by checking against USER & KILLABLE flags, which was "un-official". In this patch, we introduced a new flag (FAULT_FLAG_INTERRUPTIBLE) to show that the fault handler allows the fault procedure to respond even to non-fatal signals. Meanwhile, add this new flag to the default fault flags so that all the page fault handlers can benefit from the new flag. With that, replacing the userfault check to this one. Since the line is getting even longer, clean up the fault flags a bit too to ease TTY users. Although we've got a new flag and applied it, we shouldn't have any functional change with this patch so far. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220195348.16302-1-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-04-02 09:35:30 -07:00
Peter Xu	ef429ee740	userfaultfd: don't retake mmap_sem to emulate NOPAGE This patch removes the risk path in handle_userfault() then we will be sure that the callers of handle_mm_fault() will know that the VMAs might have changed. Meanwhile with previous patch we don't lose responsiveness as well since the core mm code now can handle the nonfatal userspace signals even if we return VM_FAULT_RETRY. Suggested-by: Andrea Arcangeli <aarcange@redhat.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Reviewed-by: Jerome Glisse <jglisse@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220160234.9646-1-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-04-02 09:35:29 -07:00
Roman Gushchin	f4b00eab50	mm: kmem: rename memcg_kmem_(un)charge() into memcg_kmem_(un)charge_page() Rename (__)memcg_kmem_(un)charge() into (__)memcg_kmem_(un)charge_page() to better reflect what they are actually doing: 1) call __memcg_kmem_(un)charge_memcg() to actually charge or uncharge the current memcg 2) set or clear the PageKmemcg flag Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Link: http://lkml.kernel.org/r/20200109202659.752357-4-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-04-02 09:35:28 -07:00
Kees Cook	c537338c05	fs_parse: remove pr_notice() about each validation This notice fills my boot logs with scary-looking asterisks but doesn't really tell me anything. Let's just remove it; validation errors are already reported separately, so this is just a redundant list of filesystems. $ dmesg | grep VALIDATE [ 0.306256] * VALIDATE tmpfs * [ 0.307422] * VALIDATE proc * [ 0.308355] * VALIDATE cgroup * [ 0.308741] * VALIDATE cgroup2 * [ 0.813256] * VALIDATE bpf * [ 0.815272] * VALIDATE ramfs * [ 0.815665] * VALIDATE hugetlbfs * [ 0.876970] * VALIDATE nfs * [ 0.877383] * VALIDATE nfs4 * Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Seth Arnold <seth.arnold@canonical.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Link: http://lkml.kernel.org/r/202003061617.A8835CAAF@keescook Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-04-02 09:35:26 -07:00
Matthew Wilcox (Oracle)	4ceb229f66	ocfs2: use memalloc_nofs_save instead of memalloc_noio_save OCFS2 doesn't mind if memory reclaim makes I/Os happen; it just cares that it won't be reentered, so it can use memalloc_nofs_save() instead of memalloc_noio_save(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Link: http://lkml.kernel.org/r/20200326200214.1102-1-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-04-02 09:35:26 -07:00
Takashi Iwai	d293d3af2d	ocfs2: use scnprintf() for avoiding potential buffer overflow Since snprintf() returns the would-be-output size instead of the actual output size, the succeeding calls may go beyond the given buffer limit. Fix it by replacing with scnprintf(). Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Link: http://lkml.kernel.org/r/20200311093516.25300-1-tiwai@suse.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-04-02 09:35:26 -07:00
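The snprintf()/scnprintf() difference described in the row above can be sketched in userspace. This is a minimal illustration, not the kernel implementation: `bounded_scnprintf()` and `append_twice()` are hypothetical helpers built on the C library's `vsnprintf()`, written only to mimic the "return what was actually stored" behaviour the message relies on.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical userspace stand-in for scnprintf(): returns the number of
 * characters actually stored in buf (excluding the trailing NUL), never
 * the would-be length that snprintf() reports on truncation.
 */
static int bounded_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	assert(buf != NULL);
	if (size == 0)
		return 0;

	va_start(args, fmt);
	n = vsnprintf(buf, size, fmt, args);
	va_end(args);

	if (n < 0)
		return 0;			/* formatting error: report nothing stored */
	if ((size_t)n >= size)
		return (int)(size - 1);		/* output was truncated to fit */
	return n;
}

/*
 * Append two strings back to back. Because each call returns the stored
 * length, the running offset can never move past the end of the buffer.
 */
static size_t append_twice(char *buf, size_t size, const char *a, const char *b)
{
	size_t off = 0;

	off += bounded_scnprintf(buf + off, size - off, "%s", a);
	off += bounded_scnprintf(buf + off, size - off, "%s", b);
	return off;
}
```

With plain snprintf() the same accumulation pattern would add the untruncated lengths, so `off` could exceed `size` and the next `size - off` would wrap around to a huge value.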
wangjian	0434c9f404	ocfs2: roll back the reference count modification of the parent directory if an error occurs Under some conditions, the directory cannot be deleted. The specific scenarios are as follows: (for example, /mnt/ocfs2 is the mount point) 1. Create the /mnt/ocfs2/p_dir directory. At this time, the i_nlink corresponding to the inode of the /mnt/ocfs2/p_dir directory is equal to 2. 2. During the process of creating the /mnt/ocfs2/p_dir/s_dir directory, if the call to the inc_nlink function in ocfs2_mknod succeeds, the functions such as ocfs2_init_acl, ocfs2_init_security_set, and ocfs2_dentry_attach_lock fail. At this time, the i_nlink corresponding to the inode of the /mnt/ocfs2/p_dir directory is equal to 3, but /mnt/ocfs2/p_dir/s_dir is not added to the /mnt/ocfs2/p_dir directory entry. 3. Delete the /mnt/ocfs2/p_dir directory (rm -rf /mnt/ocfs2/p_dir). At this time, it is found that the i_nlink corresponding to the inode corresponding to the /mnt/ocfs2/p_dir directory is equal to 3. Therefore, the /mnt/ocfs2/p_dir directory cannot be deleted. Signed-off-by: Jian wang <wangjian161@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jun Piao <piaojun@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Link: http://lkml.kernel.org/r/a44f6666-bbc4-405e-0e6c-0f4e922eeef6@huawei.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-04-02 09:35:26 -07:00
Gustavo A. R. Silva	95f3427c24	ocfs2: ocfs2_fs.h: replace zero-length array with flexible-array member The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://urldefense.com/v3/__https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html__;!!GqivPVa7Brio!OKPotRhYhHbCG2kibo8Q6_6CuKaa28d_74h1svxyR6rbshrK2L_BdrQpNbvJWBWb40QCkg$ [2] https://urldefense.com/v3/__https://github.com/KSPP/linux/issues/21__;!!GqivPVa7Brio!OKPotRhYhHbCG2kibo8Q6_6CuKaa28d_74h1svxyR6rbshrK2L_BdrQpNbvJWBUhNn9M6g$ [3] commit `7649773293` ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Link: http://lkml.kernel.org/r/20200309202155.GA8432@embeddedor Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-04-02 09:35:26 -07:00
Gustavo A. R. Silva	8cb92435e2	ocfs2: dlm: replace zero-length array with flexible-array member The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://urldefense.com/v3/__https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html__;!!GqivPVa7Brio!OVOYL_CouISa5L1Lw-20EEFQntw6cKMx-j8UdY4z78uYgzKBUFcfpn50GaurvbV5v7YiUA$ [2] https://urldefense.com/v3/__https://github.com/KSPP/linux/issues/21__;!!GqivPVa7Brio!OVOYL_CouISa5L1Lw-20EEFQntw6cKMx-j8UdY4z78uYgzKBUFcfpn50GaurvbXs8Eh8eg$ [3] commit `7649773293` ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Link: http://lkml.kernel.org/r/20200309202016.GA8210@embeddedor Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-04-02 09:35:25 -07:00
Gustavo A. R. Silva	fa803cf8f3	ocfs2: cluster: replace zero-length array with flexible-array member The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://urldefense.com/v3/__https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html__;!!GqivPVa7Brio!NzMr-YRl2zy-K3lwLVVatz7x0uD2z7-ykQag4GrGigxmfWU8TWzDy6xrkTiW3hYl00czlw$ [2] https://urldefense.com/v3/__https://github.com/KSPP/linux/issues/21__;!!GqivPVa7Brio!NzMr-YRl2zy-K3lwLVVatz7x0uD2z7-ykQag4GrGigxmfWU8TWzDy6xrkTiW3hYHG1nAnw$ [3] commit `7649773293` ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Link: http://lkml.kernel.org/r/20200309201907.GA8005@embeddedor Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-04-02 09:35:25 -07:00
Gustavo A. R. Silva	3c9210d45d	ocfs2: replace zero-length array with flexible-array member The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit `7649773293` ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Link: http://lkml.kernel.org/r/20200213160244.GA6088@embeddedor Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-04-02 09:35:25 -07:00
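The flexible-array pattern shared by the four ocfs2 rows above can be sketched as follows. The type `struct rec_list` and the helper `rec_list_alloc()` are hypothetical names chosen for illustration; they only mirror the shape of the `struct foo` example quoted in the messages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record list, shaped like the struct foo in the message. */
struct rec_list {
	int count;
	int recs[];	/* flexible array member: must be the last member */
};

/* Allocate a list with room for `count` trailing records. */
static struct rec_list *rec_list_alloc(int count)
{
	struct rec_list *rl;

	assert(count >= 0);
	/* sizeof(*rl) covers only the fixed header, not the trailing array. */
	rl = malloc(sizeof(*rl) + (size_t)count * sizeof(rl->recs[0]));
	if (!rl)
		return NULL;

	rl->count = count;
	for (int i = 0; i < count; i++)
		rl->recs[i] = i;
	return rl;
}
```

Moving `recs[]` above `count` should be rejected by the compiler, which is the safety net the messages refer to; a zero-length `recs[0]` in the same position would be accepted silently as a GNU extension.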
Jules Irenge	185a73216f	ocfs2: add missing annotations for ocfs2_refcount_cache_lock() and ocfs2_refcount_cache_unlock() Sparse reports warnings at ocfs2_refcount_cache_lock() and ocfs2_refcount_cache_unlock() warning: context imbalance in ocfs2_refcount_cache_lock() - wrong count at exit warning: context imbalance in ocfs2_refcount_cache_unlock() - unexpected unlock The root cause is the missing annotation at ocfs2_refcount_cache_lock() and at ocfs2_refcount_cache_unlock() Add the missing __acquires(&rf->rf_lock) annotation to ocfs2_refcount_cache_lock() Add the missing __releases(&rf->rf_lock) annotation to ocfs2_refcount_cache_unlock() Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Link: http://lkml.kernel.org/r/20200224204130.18178-1-jbi.octave@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-04-02 09:35:25 -07:00
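The annotation style described in the row above can be sketched outside the kernel. This is only an illustration under stated assumptions: `struct demo_cache` and its helpers are hypothetical, the lock is a C11 `atomic_flag` spin loop rather than a kernel spinlock, and the macro definitions follow the usual pattern of expanding to a sparse `context` attribute under `__CHECKER__` and to nothing otherwise.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Under sparse (__CHECKER__) these describe lock context; otherwise no-ops. */
#ifdef __CHECKER__
# define __acquires(x)	__attribute__((context(x, 0, 1)))
# define __releases(x)	__attribute__((context(x, 1, 0)))
#else
# define __acquires(x)
# define __releases(x)
#endif

/* Hypothetical cache protected by a tiny spin lock. */
struct demo_cache {
	atomic_flag lock;
	int entries;
};

/* Returns with the lock held, so the function is marked __acquires. */
static void demo_cache_lock(struct demo_cache *c)
	__acquires(&c->lock)
{
	assert(c != NULL);
	while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

/* Entered with the lock held and drops it, so it is marked __releases. */
static void demo_cache_unlock(struct demo_cache *c)
	__releases(&c->lock)
{
	atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/* Balanced caller: takes and drops the lock, so it needs no annotation. */
static void demo_cache_add(struct demo_cache *c)
{
	demo_cache_lock(c);
	c->entries++;
	demo_cache_unlock(c);
}
```

Without the annotations a checker sees a function that exits holding a lock it did not hold on entry, which is the kind of "context imbalance" warning quoted in the message.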
Alex Shi	1a5692e477	ocfs2: remove useless err We don't need 'err' in these 2 places, better to remove them. Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: ChenGang <cg.chen@huawei.com> Cc: Richard Fontana <rfontana@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1579577836-251879-1-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-04-02 09:35:25 -07:00
wangyan	41f4dc8331	ocfs2: correct annotation from "l_next_rec" to "l_next_free_rec" Correct annotation from "l_next_rec" to "l_next_free_rec" Signed-off-by: Yan Wang <wangyan122@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jun Piao <piaojun@huawei.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Link: http://lkml.kernel.org/r/5e76c953-3479-1280-023c-ad05e4c75608@huawei.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-04-02 09:35:25 -07:00
wangyan	cb5bc8557a	ocfs2: there is no need to log twice in several functions There is no need to log twice in several functions. Signed-off-by: Yan Wang <wangyan122@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jun Piao <piaojun@huawei.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Link: http://lkml.kernel.org/r/77eec86a-f634-5b98-4f7d-0cd15185a37b@huawei.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-04-02 09:35:25 -07:00
Alex Shi	e0369873e6	ocfs2: remove dlm_lock_is_remote This macro has been unused since it was introduced. Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Link: http://lkml.kernel.org/r/1579578203-254451-1-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-04-02 09:35:25 -07:00
Alex Shi	31cc0c8029	ocfs2: use OCFS2_SEC_BITS in macro This macro should be used. Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Link: http://lkml.kernel.org/r/1579577840-251956-1-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-04-02 09:35:25 -07:00
Alex Shi	8e6ef3731e	ocfs2: remove unused macros O2HB_DEFAULT_BLOCK_BITS/DLM_THREAD_MAX_ASTS/DLM_MIGRATION_RETRY_MS and OCFS2_MAX_RESV_WINDOW_BITS/OCFS2_MIN_RESV_WINDOW_BITS have been unused since commit `66effd3c68` ("ocfs2/dlm: Do not migrate resource to a node that is leaving the domain"). Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: ChenGang <cg.chen@huawei.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Richard Fontana <rfontana@redhat.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Link: http://lkml.kernel.org/r/1579577827-251796-1-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-04-02 09:35:25 -07:00
Alex Shi	ee9dc325ac	ocfs2: remove FS_OCFS2_NM This macro is unused since commit `ab09203e30` ("sysctl fs: Remove dead binary sysctl support"). Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Link: http://lkml.kernel.org/r/1579577812-251572-1-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-04-02 09:35:25 -07:00
Matthew Wilcox (Oracle)	457df33e03	iomap: Handle memory allocation failure in readahead bio_alloc() can fail when we use GFP_NORETRY. If it does, allocate a bio large enough for a single page like mpage_readpages() does. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2020-04-02 09:08:53 -07:00
Brian Foster	d9fdd0adf9	xfs: fix inode number overflow in ifree cluster helper Qian Cai reports seemingly random buffer read verifier errors during filesystem writeback. This was isolated to a recent patch that factored out some inode cluster freeing code and happened to cast an unsigned inode number type to a signed value. If the inode number value overflows, we can skip marking in-core inodes associated with the underlying buffer stale at the time the physical inodes are freed. If such an inode happens to be dirty, xfsaild will eventually attempt to write it back over non-inode blocks. The invalidation of the underlying inode buffer causes writeback to read the buffer from disk. This fails the read verifier (preventing eventual corruption) if the buffer no longer looks like an inode cluster. Analysis by Dave Chinner. Fix up the helper to use the proper type for inode number values. Fixes: `5806165a66` ("xfs: factor inode lookup from xfs_ifree_cluster") Reported-by: Qian Cai <cai@lca.pw> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2020-04-02 08:19:25 -07:00
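The class of bug described in the row above, an unsigned inode number squeezed through a signed, narrower type, can be sketched in isolation. The names here (`demo_ino_t`, `matches_narrow()`, `matches_wide()`) are hypothetical and do not reproduce the XFS helper; the sketch also assumes a platform where `int` is narrower than 64 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an unsigned 64-bit inode number type. */
typedef uint64_t demo_ino_t;

/* Buggy shape: the parameter type silently narrows the inode number. */
static bool matches_narrow(int inum, demo_ino_t wanted)
{
	return (demo_ino_t)inum == wanted;
}

/* Fixed shape: the inode number keeps its own type end to end. */
static bool matches_wide(demo_ino_t inum, demo_ino_t wanted)
{
	return inum == wanted;
}

/* Small values behave identically; large ones only survive the wide path. */
static void narrowing_demo(void)
{
	demo_ino_t big = ((demo_ino_t)1 << 32) + 1;

	assert(matches_narrow(42, 42));
	assert(matches_wide(big, big));
	assert(!matches_narrow((int)big, big));	/* lookup misses the inode */
}
```

A missed match of this kind is what lets an in-core inode escape being marked stale, as the message explains.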
Michal Suchanek	9e62ccec3b	powerpc: Add back __ARCH_WANT_SYS_LLSEEK macro This partially reverts commit `caf6f9c8a3` ("asm-generic: Remove unneeded __ARCH_WANT_SYS_LLSEEK macro") When CONFIG_COMPAT is disabled on ppc64 the kernel does not build. There is resistance to both removing the llseek syscall from the 64bit syscall tables and building the llseek interface unconditionally. Signed-off-by: Michal Suchanek <msuchanek@suse.de> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/lkml/20190828151552.GA16855@infradead.org/ Link: https://lore.kernel.org/lkml/20190829214319.498c7de2@naga/ Link: https://lore.kernel.org/r/dd4575c51e31766e87f7e7fa121d099ab78d3290.1584699455.git.msuchanek@suse.de	2020-04-03 00:09:59 +11:00
Al Viro	99a4a90c8e	lookup_open(): don't bother with fallbacks to lookup+create We fall back to lookup+create (instead of atomic_open) in several cases: 1) we don't have write access to filesystem and O_TRUNC is present in the flags. It's not something we want ->atomic_open() to see - it just might go ahead and truncate the file. However, we can pass it the flags sans O_TRUNC - eventually do_open() will call handle_truncate() anyway. 2) we have O_CREAT | O_EXCL and we can't write to parent. That's going to be an error, of course, but we want to know _which_ error should that be - might be EEXIST (if file exists), might be EACCES or EROFS. Simply stripping O_CREAT (and checking if we see ENOENT) would suffice, if not for O_EXCL. However, we used to have ->atomic_open() fully responsible for rejecting O_CREAT | O_EXCL on existing file and just stripping O_CREAT would've disarmed those checks. With nothing downstream to catch the problem - FMODE_OPENED used to be "don't bother with EEXIST checks, ->atomic_open() has done those". Now EEXIST checks downstream are skipped only if FMODE_CREATED is set - FMODE_OPENED alone is not enough. That has eliminated the need to fall back onto lookup+create path in this case. 3) O_WRONLY or O_RDWR when we have no write access to filesystem, with nothing else objectionable. Fallback is (and had always been) pointless. IOW, we don't really need that fallback; all we need in such cases is to trim O_TRUNC and O_CREAT properly. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-04-02 01:09:31 -04:00
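The userspace-visible rule behind case 2 above, that an exclusive create on a file that already exists fails with EEXIST, can be checked with a short sketch. The helper names and the `/tmp/excl-demo-XXXXXX` scratch path are assumptions made for illustration, and the sketch only covers the plain "file exists in a writable directory" case, not the read-only or no-write-access cases the message discusses.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Try an exclusive create on path. Returns the errno of the failed open(),
 * or 0 if the call unexpectedly succeeded.
 */
static int excl_create_errno(const char *path)
{
	int fd;

	assert(path != NULL);
	fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);
	if (fd >= 0) {
		close(fd);
		return 0;
	}
	return errno;
}

/* Create a scratch file, then try to create it exclusively a second time. */
static int excl_on_existing_file(void)
{
	char path[] = "/tmp/excl-demo-XXXXXX";
	int fd = mkstemp(path);
	int err;

	if (fd < 0)
		return -1;	/* could not set up the scratch file */
	close(fd);

	err = excl_create_errno(path);
	unlink(path);
	return err;
}
```

On a typical Linux system with a writable `/tmp`, `excl_on_existing_file()` is expected to return `EEXIST`.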
Al Viro	d489cf9a3e	atomic_open(): no need to pass struct open_flags anymore argument had been unused since `1643b43fbd` (lookup_open(): lift the "fallback to !O_CREAT" logics from atomic_open()) back in 2016 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-04-02 01:09:31 -04:00
Al Viro	ff326a3299	open_last_lookups(): move complete_walk() into do_open() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-04-02 01:09:30 -04:00
Al Viro	b94e0b32c8	open_last_lookups(): lift O_EXCL|O_CREAT handling into do_open() Currently path_openat() has "EEXIST on O_EXCL|O_CREAT" checks done on one of the ways out of open_last_lookups(). There are 4 cases: 1) the last component is . or ..; check is not done. 2) we had FMODE_OPENED or FMODE_CREATED set while in lookup_open(); check is not done. 3) symlink to be traversed is found; check is not done (nor should it be). 4) everything else: check done (before complete_walk(), even). In case (1) O_EXCL|O_CREAT ends up failing with -EISDIR - that's open("/tmp/.", O_CREAT|O_EXCL, 0600). Note that in the same conditions open("/tmp", O_CREAT|O_EXCL, 0600) would have yielded EEXIST. Either error is allowed, switching to -EEXIST in these cases would've been more consistent. Case (2) is more subtle; first of all, if we have FMODE_CREATED set, the object hadn't existed prior to the call. The check should not be done in such a case. The rest is problematic, though - we have: FMODE_OPENED set (i.e. it went through ->atomic_open() and got successfully opened there); FMODE_CREATED is NOT set; O_CREAT and O_EXCL are both set. Any such case is a bug - either we failed to set FMODE_CREATED when we had, in fact, created an object (no such instances in the tree) or we have opened a pre-existing file despite having had both O_CREAT and O_EXCL passed. One of those was, in fact, caught (and fixed) while sorting out this mess (gfs2 on cold dcache). And in such situations we should fail with EEXIST. Note that for (1) and (4) FMODE_CREATED is not set - for (1) there's nothing in handle_dots() to set it, for (4) we'd explicitly checked that. And (1), (2) and (4) are exactly the cases when we leave the loop in the caller, with do_open() called immediately after that loop. IOW, we can move the check over there, and make it: if we have O_CREAT|O_EXCL and after successful pathname resolution FMODE_CREATED is not set, we must have run into a preexisting file and should fail with EEXIST. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-04-02 01:09:30 -04:00
Al Viro	72287417ab	open_last_lookups(): don't abuse complete_walk() when all we want is unlazy Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-04-02 01:09:29 -04:00
Al Viro	f7bb959d96	open_last_lookups(): consolidate fsnotify_create() calls Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-04-02 01:09:28 -04:00
Al Viro	c5971b8c63	take post-lookup part of do_last() out of loop now we can have open_last_lookups() directly from the loop in path_openat() - the rest of do_last() never returns a symlink to follow, so we can bloody well leave the loop first. Rename the rest of that thing from do_last() to do_open() and make it return an int. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-04-02 01:09:28 -04:00
Al Viro	0f70595301	link_path_walk(): sample parent's i_uid and i_mode for the last component Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-04-02 01:09:27 -04:00
Al Viro	60ef60c7d7	__nd_alloc_stack(): make it return bool ... and adjust the caller (reserve_stack()). Rename to nd_alloc_stack(), while we are at it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-04-02 01:09:26 -04:00
Al Viro	4542576b79	reserve_stack(): switch to __nd_alloc_stack() expand the call of nd_alloc_stack() into it (and don't recheck the depth on the second call) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-04-02 01:09:26 -04:00
Al Viro	49055906af	pick_link(): take reserving space on stack into a new helper Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-04-02 01:09:25 -04:00
Al Viro	aef9404d8c	pick_link(): more straightforward handling of allocation failures pick_link() needs to push onto stack; we start with using two-element array embedded into struct nameidata and the first time we need more than that we switch to separately allocated array. Allocation can fail, of course, and handling of that would be simple enough - we need to drop 'link' and bugger off. However, the things get more complicated in RCU mode. There we must do GFP_ATOMIC allocation. If that fails, we try to switch to non-RCU mode and repeat the allocation. To switch to non-RCU mode we need to grab references to 'link' and to everything in nameidata. The latter done by unlazy_walk(); the former - legitimize_path(). 'link' must go first - after unlazy_walk() we are out of RCU-critical period and it's too late to call legitimize_path() since the references in link->mnt and link->dentry might be pointing to freed and reused memory. So we do legitimize_path(), then unlazy_walk(). And that's where it gets too subtle: what to do if the former fails? We MUST do path_put(link) to avoid leaks. And we can't do that under rcu_read_lock(). Solution in mainline was to empty the nameidata manually, drop out of RCU mode and then do path_put(). In effect, we open-code the things eventual terminate_walk() would've done on error in RCU mode. That looks badly out of place and confusing. We could add a comment along the lines of the explanation above, but... there's a simpler solution. Call unlazy_walk() even if legitimize_path() fails. It will take us out of RCU mode, so we'll be able to do path_put(link). Yes, it will do unnecessary work - attempt to grab references on the stuff in nameidata, only to have them dropped as soon as we return the error to upper layer and get terminate_walk() called there. So what? We are thoroughly off the fast path by that point - we had GFP_ATOMIC allocation fail, we had ->d_seq or mount_lock mismatch and we are about to try walking the same path from scratch in non-RCU mode. Which will need to do the same allocation, this time with GFP_KERNEL, so it will be able to apply memory pressure for blocking stuff. Compared to that the cost of several lockref_get_not_dead() is noise. And the logics become much easier to understand that way. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-04-02 01:09:25 -04:00
Al Viro	c99687a03a	fold path_to_nameidata() into its only remaining caller Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-04-02 01:09:24 -04:00
Al Viro	84f0cd9e83	pick_link(): pass it struct path already with normal refcounting rules step_into() tries to avoid grabbing and dropping mount references on the steps that do not involve crossing mountpoints (which is obviously the majority of cases). So it uses a local struct path with unusual refcounting rules - path.mnt is pinned if and only if it's not equal to nd->path.mnt. We used to have similar beasts all over the place and we had quite a few bugs crop up in their handling - it's easy to get confused when changing e.g. cleanup on failure exits (or adding a new check, etc.) Now that's mostly gone - the step_into() instance (which is what we need them for) is the only one left. It is exposed to mount traversal and it's (shortly) seen by pick_link(). Since pick_link() needs to store it in link stack, where the normal rules apply, it has to make sure that mount is pinned regardless of nd->path.mnt value. That's done on all calls of pick_link() and very early in those. Let's do that in the caller (step_into()) instead - that way fewer places need to be aware of such struct path instances. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-04-02 01:09:23 -04:00
Al Viro	19f6028a01	fs/namei.c: kill follow_mount() The only remaining caller (path_pts()) should be using follow_down() anyway. And clean path_pts() a bit. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-04-02 01:09:23 -04:00
Al Viro	2aa3847085	non-RCU analogue of the previous commit new helper: choose_mountpoint(). Wrapper around choose_mountpoint_rcu(), similar to lookup_mnt() vs. __lookup_mnt(). follow_dotdot() switched to it. Now we don't grab mount_lock exclusive anymore; note that the primitive used for non-RCU mount traversals in the other direction (lookup_mnt()) doesn't bother with that either - it uses mount_lock seqcount instead. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-04-02 01:09:22 -04:00
Al Viro	7ef482fa65	helper for mount rootwards traversal The loops in follow_dotdot{_rcu()} are doing the same thing: we have a mount and we want to find out how far up the chain of mounts do we need to go. We follow the chain of mounts until we find one that is not directly overmounting the root of another mount. If such a mount is found, we want the location it's mounted upon. If we run out of chain (i.e. get to a mount that is not mounted on anything else) or run into process' root, we report failure. On success, we want (in RCU case) d_seq of resulting location sampled or (in non-RCU case) references to that location acquired. This commit introduces such a primitive for RCU case and switches follow_dotdot_rcu() to it; non-RCU case will go in the next commit. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-04-02 01:09:21 -04:00
Al Viro	165200d6cb	follow_dotdot(): be lazy about changing nd->path Change nd->path only after the loop is done and only in case we hadn't ended up finding ourselves in root. Same for NO_XDEV check. That separates the "check how far back do we need to go through the mount stack" logics from the rest of .. traversal. NOTE: path_get/path_put introduced here are temporary. They will go away later in the series. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-04-02 01:09:21 -04:00

63909 commits