linux-stable/fs
Yang Shi 88aa7cc688 mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct
mmap_sem is on the hot path of kernel, and it very contended, but it is
abused too.  It is used to protect arg_start|end and evn_start|end when
reading /proc/$PID/cmdline and /proc/$PID/environ, but it doesn't make
sense since those proc files just expect to read 4 values atomically and
not related to VM, they could be set to arbitrary values by C/R.

And, the mmap_sem contention may cause unexpected issue like below:

INFO: task ps:14018 blocked for more than 120 seconds.
       Tainted: G            E 4.9.79-009.ali3000.alios7.x86_64 #1
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
 ps              D    0 14018      1 0x00000004
 Call Trace:
   schedule+0x36/0x80
   rwsem_down_read_failed+0xf0/0x150
   call_rwsem_down_read_failed+0x18/0x30
   down_read+0x20/0x40
   proc_pid_cmdline_read+0xd9/0x4e0
   __vfs_read+0x37/0x150
   vfs_read+0x96/0x130
   SyS_read+0x55/0xc0
   entry_SYSCALL_64_fastpath+0x1a/0xc5

Both Alexey Dobriyan and Michal Hocko suggested to use dedicated lock
for them to mitigate the abuse of mmap_sem.

So, introduce a new spinlock in mm_struct to protect the concurrent
access to arg_start|end, env_start|end and others, as well as replace
write map_sem to read to protect the race condition between prctl and
sys_brk which might break check_data_rlimit(), and makes prctl more
friendly to other VM operations.

This patch just eliminates the abuse of mmap_sem, but it can't resolve
the above hung task warning completely since the later
access_remote_vm() call needs acquire mmap_sem.  The mmap_sem
scalability issue will be solved in the future.

[yang.shi@linux.alibaba.com: add comment about mmap_sem and arg_lock]
  Link: http://lkml.kernel.org/r/1524077799-80690-1-git-send-email-yang.shi@linux.alibaba.com
Link: http://lkml.kernel.org/r/1523730291-109696-1-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:34 -07:00
..
9p fs/9p: detect invalid options as much as possible 2018-06-07 17:34:34 -07:00
adfs adfs_lookup: do not fail with ENOENT on negatives, use d_splice_alias() 2018-05-22 14:27:56 -04:00
affs affs: fix potential memory leak when parsing option 'prefix' 2018-05-28 12:36:41 +02:00
afs Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2018-06-06 18:39:49 -07:00
autofs4 autofs: mount point create should honour passed in mode 2018-04-20 17:18:35 -07:00
befs befs_lookup(): use d_splice_alias() 2018-05-21 14:30:07 -04:00
bfs bfs_add_entry: pass name/len as qstr pointer 2018-05-22 14:27:50 -04:00
btrfs for-4.18-tag 2018-06-04 14:29:13 -07:00
cachefiles Merge branch 'hch.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-06-04 10:00:01 -07:00
ceph ceph: fix iov_iter issues in ceph_direct_read_write() 2018-05-10 10:15:12 +02:00
cifs some smb3 fixes for stable, as well as addition of ftrace hooks for cifs.ko, and improvements in compounding and smbdirect (RDMA) 2018-06-04 14:42:46 -07:00
coda vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
configfs
cramfs cramfs_lookup(): use d_splice_alias() 2018-05-22 14:27:51 -04:00
crypto fscrypt: log the crypto algorithm implementations 2018-05-20 16:36:00 -04:00
debugfs debugfs: inode: debugfs_create_dir uses mode permission from parent 2018-05-14 16:48:18 +02:00
devpts devpts: comment devpts_mntget() 2018-03-14 13:31:23 +01:00
dlm dlm: remove O_NONBLOCK flag in sctp_connect_to_sock 2018-05-29 10:48:35 -05:00
ecryptfs Merge branch 'fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs into aio-base 2018-05-26 09:16:25 +02:00
efivarfs efivarfs: Limit the rate for non-root to read files 2018-02-22 10:21:02 -08:00
efs
exofs scsi/osd: remove the gfp argument to osd_start_request 2018-05-14 08:55:09 -06:00
exportfs ovl: do not try to reconnect a disconnected origin dentry 2018-04-12 12:04:49 +02:00
ext2 Changes for 4.18: 2018-06-05 13:24:20 -07:00
ext4 Add bunch of cleanups, and add support for the Speck128/256 2018-06-05 15:15:32 -07:00
f2fs Add bunch of cleanups, and add support for the Speck128/256 2018-06-05 15:15:32 -07:00
fat vfat: simplify checks in vfat_lookup() 2018-05-13 12:09:14 -04:00
freevxfs freevxfs_lookup(): use d_splice_alias() 2018-05-22 14:27:51 -04:00
fscache proc: introduce proc_create_single{,_data} 2018-05-16 07:23:35 +02:00
fuse fuse update for 4.18 2018-06-07 08:50:57 -07:00
gfs2 Changes for 4.18: 2018-06-05 13:24:20 -07:00
hfs hfs: don't allow mounting over .../rsrc 2018-05-22 14:28:00 -04:00
hfsplus Merge branch 'work.lookup' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-06-04 13:46:22 -07:00
hostfs hostfs: rename do_rmdir() to hostfs_do_rmdir() 2018-04-02 20:15:53 +02:00
hpfs
hugetlbfs hugetlbfs: fix bug in pgoff overflow checking 2018-04-05 21:36:21 -07:00
isofs isofs: fix potential memory leak in mount option parsing 2018-04-16 09:47:41 +02:00
jbd2 jbd2: remove NULL check before calling kmem_cache_destroy() 2018-05-20 22:38:26 -04:00
jffs2 do d_instantiate/unlock_new_inode combinations safely 2018-05-11 15:36:37 -04:00
jfs Merge branch 'hch.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-06-04 10:00:01 -07:00
kernfs Driver core changes for 4.18-rc1 2018-06-05 16:29:19 -07:00
lockd net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
minix minix_lookup: use d_splice_alias() 2018-05-22 14:27:52 -04:00
nfs proc: introduce proc_create_net{,_data} 2018-05-16 07:24:30 +02:00
nfs_common net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
nfsd for-4.18/block-20180603 2018-06-04 07:58:06 -07:00
nilfs2 do d_instantiate/unlock_new_inode combinations safely 2018-05-11 15:36:37 -04:00
nls
notify fsnotify: fix ignore mask logic in send_to_group() 2018-04-13 15:52:49 +02:00
ntfs ntfs: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) call 2018-03-28 01:39:02 -04:00
ocfs2 fs: ocfs2: use new return type vm_fault_t 2018-06-07 17:34:34 -07:00
omfs omfs_lookup(): report IO errors, use d_splice_alias() 2018-05-22 14:27:58 -04:00
openpromfs openpromfs: switch to d_splice_alias() 2018-05-22 14:27:57 -04:00
orangefs orangefs: fixes and cleanups 2018-06-07 09:23:12 -07:00
overlayfs ovl: use inode_insert5() to hash a newly created inode 2018-05-31 11:06:12 +02:00
proc mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct 2018-06-07 17:34:34 -07:00
pstore pstore: fix crypto dependencies without compression 2018-04-06 15:45:33 -07:00
qnx4 qnx4_lookup: use d_splice_alias() 2018-05-22 14:27:52 -04:00
qnx6 qnx6_lookup: switch to d_splice_alias() 2018-05-22 14:27:54 -04:00
quota fs: quota: Replace GFP_ATOMIC with GFP_KERNEL in dquot_init 2018-04-09 17:48:54 +02:00
ramfs
reiserfs Merge branch 'hch.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-06-04 10:00:01 -07:00
romfs romfs_lookup: switch to d_splice_alias() 2018-05-22 14:27:55 -04:00
squashfs
sysfs unfuck sysfs_mount() 2018-05-21 14:30:09 -04:00
sysv sysv_lookup: use d_splice_alias() 2018-05-22 14:27:53 -04:00
tracefs
ubifs Add bunch of cleanups, and add support for the Speck128/256 2018-06-05 15:15:32 -07:00
udf \n 2018-06-07 09:36:29 -07:00
ufs do d_instantiate/unlock_new_inode combinations safely 2018-05-11 15:36:37 -04:00
xfs Changes for 4.18: 2018-06-05 13:24:20 -07:00
Kconfig docs/admin-guide/mm: start moving here files from Documentation/vm 2018-04-27 17:02:48 -06:00
Kconfig.binfmt treewide: simplify Kconfig dependencies for removed archs 2018-03-26 15:55:57 +02:00
Makefile split d_path() and friends into a separate file 2018-03-29 15:07:46 -04:00
aio.c aio: sanitize the limit checking in io_submit(2) 2018-05-29 23:20:17 -04:00
anon_inodes.c
attr.c fs: Allow superblock owner to replace invalid owners of inodes 2018-05-24 11:57:18 -05:00
bad_inode.c
binfmt_aout.c exec: introduce finalize_exec() before start_thread() 2018-04-11 10:28:37 -07:00
binfmt_elf.c fs, elf: don't complain MAP_FIXED_NOREPLACE unless -EEXIST error 2018-04-20 17:18:36 -07:00
binfmt_elf_fdpic.c exec: introduce finalize_exec() before start_thread() 2018-04-11 10:28:37 -07:00
binfmt_em86.c
binfmt_flat.c exec: introduce finalize_exec() before start_thread() 2018-04-11 10:28:37 -07:00
binfmt_misc.c fs: add ksys_close() wrapper; remove in-kernel calls to sys_close() 2018-04-02 20:16:00 +02:00
binfmt_script.c
block_dev.c fs: convert block_dev.c to bioset_init() 2018-05-30 15:33:32 -06:00
buffer.c fs: move page_cache_seek_hole_data to iomap.c 2018-06-01 18:37:33 -07:00
char_dev.c block, char_dev: Use correct format specifier for unsigned ints 2018-03-15 17:59:24 +01:00
compat.c
compat_binfmt_elf.c
compat_ioctl.c
coredump.c
d_path.c split d_path() and friends into a separate file 2018-03-29 15:07:46 -04:00
dax.c fs/dax.c: use new return type vm_fault_t 2018-06-07 17:34:33 -07:00
dcache.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-06-04 10:14:28 -07:00
dcookies.c fs: add do_lookup_dcookie() helper; remove in-kernel call to syscall 2018-04-02 20:15:39 +02:00
direct-io.c block: consistently use GFP_NOIO instead of __GFP_NORECLAIM 2018-05-14 08:55:18 -06:00
drop_caches.c
eventfd.c eventfd: switch to ->poll_mask 2018-05-26 09:16:44 +02:00
eventpoll.c fs: add new vfs_poll and file_can_poll helpers 2018-05-26 09:16:44 +02:00
exec.c umh: introduce fork_usermode_blob() helper 2018-05-23 13:23:39 -04:00
fcntl.c fasync: Fix deadlock between task-context and interrupt-context kill_fasync() 2018-05-01 07:39:50 -04:00
fhandle.c vfs: Copy struct mount.mnt_id to userspace using put_user() 2018-01-15 12:07:51 -08:00
file.c fs: add ksys_close() wrapper; remove in-kernel calls to sys_close() 2018-04-02 20:16:00 +02:00
file_table.c
filesystems.c proc: introduce proc_create_single{,_data} 2018-05-16 07:23:35 +02:00
fs-writeback.c bdi: Fix oops in wb_workfn() 2018-05-03 16:11:37 -06:00
fs_pin.c
fs_struct.c
inode.c overlayfs fixes for 4.18 2018-06-07 08:53:50 -07:00
internal.h Revert "fs: fold open_check_o_direct into do_dentry_open" 2018-06-03 10:58:23 -07:00
ioctl.c fs: Allow CAP_SYS_ADMIN in s_user_ns to freeze and thaw filesystems 2018-05-24 12:04:28 -05:00
iomap.c fs: use ->is_partially_uptodate in page_cache_seek_hole_data 2018-06-01 18:37:33 -07:00
libfs.c fs, dax: prepare for dax-specific address_space_operations 2018-03-30 11:34:55 -07:00
locks.c proc: introduce proc_create_seq_private 2018-05-16 07:23:35 +02:00
mbcache.c
mount.h
mpage.c
namei.c Merge branch 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2018-06-04 15:21:19 -07:00
namespace.c fs: Allow superblock owner to access do_remount_sb() 2018-05-24 12:02:25 -05:00
no-block.c
nsfs.c net: Export open_related_ns() 2018-02-15 15:34:42 -05:00
open.c Revert "fs: fold open_check_o_direct into do_dentry_open" 2018-06-03 10:58:23 -07:00
pipe.c pipe: convert to ->poll_mask 2018-05-26 09:16:44 +02:00
pnode.c
pnode.h
posix_acl.c
proc_namespace.c vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
read_write.c fs: avoid fdput() after failed fdget() in vfs_dedupe_file_range() 2018-04-15 23:36:26 -04:00
readdir.c fs: add ksys_getdents64() helper; remove in-kernel calls to sys_getdents64() 2018-04-02 20:16:02 +02:00
select.c fs: introduce new ->get_poll_head and ->poll_mask methods 2018-05-26 09:16:44 +02:00
seq_file.c proc: fix smaps and meminfo alignment 2018-05-25 18:12:11 -07:00
signalfd.c signal: Extend siginfo_layout with SIL_FAULT_{MCEERR|BNDERR|PKUERR} 2018-04-26 19:51:14 -05:00
splice.c fs: add do_vmsplice() helper; remove in-kernel call to syscall 2018-04-02 20:15:40 +02:00
stack.c
stat.c fs: add do_readlinkat() helper; remove internal call to sys_readlinkat() 2018-04-02 20:15:34 +02:00
statfs.c
super.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-06-04 10:14:28 -07:00
sync.c Changes for this release: 2018-04-04 12:44:02 -07:00
timerfd.c timerfd: convert to ->poll_mask 2018-05-26 09:16:44 +02:00
userfaultfd.c vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
utimes.c fs: add do_compat_futimesat() helper; remove in-kernel call to compat syscall 2018-04-02 20:15:44 +02:00
xattr.c vfs: delete unnecessary assignment in vfs_listxattr 2018-05-29 13:22:41 -04:00