Commit graph

54457 commits

Author SHA1 Message Date
Linus Torvalds
ab0b2e5932 This pull request contains updates for both UBI and UBIFS:
- The UBI on-disk format header file is now dual licensed
 - New way to detect Fastmap problems during runtime
 - Bugfix for Fastmap
 - Minor updates for UBIFS (spelling, comments, vm_fault_t, ...)
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAlsdi6AWHHJpY2hhcmRA
 c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wVimEADam9dNKs7hVAlSR5G8fgoFMiPI
 PgP/cQ719PAYh8Zub0w8fIIvtSCaVZmPX/K40w4KAgP/jdQ/v4QD3ZYdHO3vn3/M
 6WI/cEB/Cn5lJp78tt14cGlqG3+xnlwyc9EEeiMCPDWmoSf66J7fwuL3xRq4dDet
 CBISYrF8ZMe6BJohT9kAYeqYoYLzALUmZoOx1DGhX7b2SEaLphCyW0A0rkxoHbYG
 sxAfUiR4/bdaFfqLAmEMgoovIHGYmOdnMJ3Hk+uUVrPEfBzxDQuGfpevDJ1u99hS
 D4qdvY3+VAcTpTK0XJUBgPaVpkYYMS5nfHtpObbyOT/kBmgCloCAmtvVcznxUwJ7
 R+8UBZH/SrxUw+R9JCP1hFgb/PlEAE0+Vsc2jgvqWOQ8xnOqojG5FdqAueGanX+l
 /on2/HNoZp5C8x15+cB9SiQwfdoZrH7vr9uzi1lqaWSV+ZMDpvG/QFMX1QX1gLl9
 TFeQJBojE41y+hjTnnUQZdy1oPaisffU44ymKlf1FR2SdzW6d7wIRBd0jnB+7Pjz
 KocSAA2kqDUcCjUtLFSSVfE7kUL9wwJQUfUxeQFViTGDvAnLcmy1a6mqQ1K5YWq9
 e5mpMQRWzoq0XrPyaVI5EEhvaqWalJwIa6GA7ZksOvS4W1pqFLLyFOKkHejUPou8
 lxcw0dDsyZdmnDRuMA==
 =b0lt
 -----END PGP SIGNATURE-----

Merge tag 'upstream-4.18-rc1' of git://git.infradead.org/linux-ubifs

Pull UBI and UBIFS updates from Richard Weinberger:

 - the UBI on-disk format header file is now dual licensed

 - new way to detect Fastmap problems during runtime

 - bugfix for Fastmap

 - minor updates for UBIFS (spelling, comments, vm_fault_t, ...)

* tag 'upstream-4.18-rc1' of git://git.infradead.org/linux-ubifs:
  mtd: ubi: Update ubi-media.h to dual license
  ubi: fastmap: Detect EBA mismatches on-the-fly
  ubi: fastmap: Check each mapping only once
  ubi: fastmap: Correctly handle interrupted erasures in EBA
  ubi: fastmap: Cancel work upon detach
  ubifs: lpt: Fix wrong pnode number range in comment
  ubifs: gc: Fix typo
  ubifs: log: Some spelling fixes
  ubifs: Spelling fix someting -> something
  ubifs: journal: Remove wrong comment
  ubifs: remove set but never used variable
  ubifs, xattr: remove misguided quota flags
  fs: ubifs: Adding new return type vm_fault_t
2018-06-10 15:52:09 -07:00
Linus Torvalds
0c14e43a42 one smb3 (ACL related) fix for stable, one SMB3 security enhancement (when mounting -t smb3 forbid less secure dialects), and some RDMA and compounding fixes
-----BEGIN PGP SIGNATURE-----
 
 iQHHBAABCgAxFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAlsdWXsTHHNtZnJlbmNo
 QGdtYWlsLmNvbQAKCRCKLL1wB3JPUXiyC/wJdnwM+R2JGtj1ygZqPfS6/xWbCI0F
 uXfn9a09nrMVbM2CQCDChiqoU2yvg9cKqc2d/UhssuwJYdfNNYSVe5MDkYYCADRn
 WSvCQ4+ZDQEl65pcjVKcOlhwl73cIVQmb6yaxMA/scZx+ikfe67EgJEEJZHJPhFO
 WJnMA5IUMAQFw5KxBdKtuVL0ixHBYvtrtQ9HAyPq0GTmCsTyb0GcbeMYaqRwpCMS
 R4NXyTr+s68UUi+iM9r+I6nYqml5L2EBFjrSwRbu7e6PIRF5fEMnKzj0WSmbgsSG
 KY2b/YWSskN5p5dkU+cLETXfjOjg8ugROJzz4LYkS/dUn+AJpg7IOwUNfr/a0yO5
 dHMwpa9SltPOslcHZgnBUsfqLHZSP0y/QODVhhc80uapxmllA4CHOc+lgG5i0VlB
 mTS4SQNRPw2NINoGCr+C65ghcJJyf6ivP6J3PoCjobo76yte8TiyauxjuDIJgY8T
 h2Wc/fWgN9lc9IUcpvedUjxoacG+rMSopQc=
 =yibA
 -----END PGP SIGNATURE-----

Merge tag '4.18-fixes-smb3' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:

 - one smb3 (ACL related) fix for stable

 - one SMB3 security enhancement (when mounting -t smb3 forbid less
   secure dialects)

 - some RDMA and compounding fixes

* tag '4.18-fixes-smb3' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix a buffer leak in smb2_query_symlink
  smb3: do not allow insecure cifs mounts when using smb3
  CIFS: Fix NULL ptr deref
  CIFS: fix encryption in SMB3.1.1
  CIFS: Pass page offset for encrypting
  CIFS: Pass page offset for calculating signature
  CIFS: SMBD: Support page offset in memory registration
  CIFS: SMBD: Support page offset in RDMA recv
  CIFS: SMBD: Support page offset in RDMA send
  CIFS: When sending data on socket, pass the correct page offset
  CIFS: Introduce helper function to get page offset and length in smb_rqst
  CIFS: Calculate the correct request length based on page offset and tail size
  cifs: For SMB2 security informaion query, check for minimum sized security descriptor instead of sizeof FileAllInformation class
  CIFS: Fix signing for SMB2/3
2018-06-10 10:53:31 -07:00
Linus Torvalds
d82991a868 Merge branch 'core-rseq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull restartable sequence support from Thomas Gleixner:
 "The restartable sequences syscall (finally):

  After a lot of back and forth discussion and massive delays caused by
  the speculative distraction of maintainers, the core set of
  restartable sequences has finally reached a consensus.

  It comes with the basic non disputed core implementation along with
  support for arm, powerpc and x86 and a full set of selftests

  It was exposed to linux-next earlier this week, so it does not fully
  comply with the merge window requirements, but there is really no
  point to drag it out for yet another cycle"

* 'core-rseq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rseq/selftests: Provide Makefile, scripts, gitignore
  rseq/selftests: Provide parametrized tests
  rseq/selftests: Provide basic percpu ops test
  rseq/selftests: Provide basic test
  rseq/selftests: Provide rseq library
  selftests/lib.mk: Introduce OVERRIDE_TARGETS
  powerpc: Wire up restartable sequences system call
  powerpc: Add syscall detection for restartable sequences
  powerpc: Add support for restartable sequences
  x86: Wire up restartable sequence system call
  x86: Add support for restartable sequences
  arm: Wire up restartable sequences system call
  arm: Add syscall detection for restartable sequences
  arm: Add restartable sequences support
  rseq: Introduce restartable sequences system call
  uapi/headers: Provide types_32_64.h
2018-06-10 10:17:09 -07:00
Trond Myklebust
f9312a5410 NFSv4.1: Fix the client behaviour on NFS4ERR_SEQ_FALSE_RETRY
If the server returns NFS4ERR_SEQ_FALSE_RETRY or NFS4ERR_RETRY_UNCACHED_REP,
then it thinks we're trying to replay an existing request. If so, then
let's just bump the sequence ID and retry the operation.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-09 19:10:31 -04:00
Linus Torvalds
3ca24ce9ff Merge branch 'proc-cmdline'
Merge proc_cmdline simplifications.

This re-writes the get_mm_cmdline() logic to be rather simpler than it
used to be, and makes the semantics for "cmdline goes past the end of
the original area" more natural.

You _can_ use prctl(PR_SET_MM) to just point your command line somewhere
else entirely, but the traditional model is to just edit things in place
and that still needs to continue to work.  At least this way the code
makes some sense.

* proc-cmdline:
  fs/proc: simplify and clarify get_mm_cmdline() function
  fs/proc: re-factor proc_pid_cmdline_read() a bit
2018-06-09 15:31:35 -07:00
Mikulas Patocka
f72328d27f hpfs: Use EUCLEAN for filesystem errors
Use the error code EUCLEAN for filesystem errors because other
filesystems use this code too.

[ And remove unused EMEMERROR  - Linus ]

Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-09 14:34:48 -07:00
Linus Torvalds
eafdca4d70 Staging/IIO patches for 4.18-rc1
Here is the big staging and IIO driver update for 4.18-rc1.
 
 It was delayed as I wanted to make sure the final driver deletions did
 not cause any major merge issues, and all now looks good.
 
 There are a lot of patches here, just over 1000.  The diffstat summary
 shows the major changes here:
 	1007 files changed, 16828 insertions(+), 227770 deletions(-)
 Because of this, we might be close to shrinking the overall kernel
 source code size for two releases in a row.
 
 There was loads of work in this release cycle, primarily:
 	- tons of ks7010 driver cleanups
 	- lots of mt7621 driver fixes and cleanups
 	- most driver cleanups
 	- wilc1000 fixes and cleanups
 	- lots and lots of IIO driver cleanups and new additions
 	- debugfs cleanups for all staging drivers
 	- lots of other staging driver cleanups and fixes, the shortlog
 	  has the full details.
 
 but the big user-visable things here are the removal of 3 chunks of
 code:
 	- ncpfs and ipx were removed on schedule, no one has cared about
 	  this code since it moved to staging last year, and if it needs
 	  to come back, it can be reverted.
 	- lustre file system is removed.  I've ranted at the lustre
 	  developers about once a year for the past 5 years, with no
 	  real forward progress at all to clean things up and get the
 	  code into the "real" part of the kernel.  Given that the
 	  lustre developers continue to work on an external tree and try
 	  to port those changes to the in-kernel tree every once in a
 	  while, this whole thing really really is not working out at
 	  all.  So I'm deleting it so that the developers can spend the
 	  time working in their out-of-tree location and get things
 	  cleaned up properly to get merged into the tree correctly at a
 	  later date.
 
 Because of these file removals, you will have merge issues on some of
 these files (2 in the ipx code, 1 in the ncpfs code, and 1 in the
 atomisp driver).  Just delete those files, it's a simple merge :)
 
 All of this has been in linux-next for a while with no reported
 problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWxvjGQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ymoEwCbBYnyUl3cwCszIJ3L3/zvUWpmqIgAn1DDsAim
 dM4lmKg6HX/JBSV4GAN0
 =zdta
 -----END PGP SIGNATURE-----

Merge tag 'staging-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging/IIO updates from Greg KH:
 "Here is the big staging and IIO driver update for 4.18-rc1.

  It was delayed as I wanted to make sure the final driver deletions did
  not cause any major merge issues, and all now looks good.

  There are a lot of patches here, just over 1000. The diffstat summary
  shows the major changes here:

	1007 files changed, 16828 insertions(+), 227770 deletions(-)

  Because of this, we might be close to shrinking the overall kernel
  source code size for two releases in a row.

  There was loads of work in this release cycle, primarily:

   - tons of ks7010 driver cleanups

   - lots of mt7621 driver fixes and cleanups

   - most driver cleanups

   - wilc1000 fixes and cleanups

   - lots and lots of IIO driver cleanups and new additions

   - debugfs cleanups for all staging drivers

   - lots of other staging driver cleanups and fixes, the shortlog has
     the full details.

  but the big user-visable things here are the removal of 3 chunks of
  code:

   - ncpfs and ipx were removed on schedule, no one has cared about this
     code since it moved to staging last year, and if it needs to come
     back, it can be reverted.

   - lustre file system is removed.

     I've ranted at the lustre developers about once a year for the past
     5 years, with no real forward progress at all to clean things up
     and get the code into the "real" part of the kernel.

     Given that the lustre developers continue to work on an external
     tree and try to port those changes to the in-kernel tree every once
     in a while, this whole thing really really is not working out at
     all. So I'm deleting it so that the developers can spend the time
     working in their out-of-tree location and get things cleaned up
     properly to get merged into the tree correctly at a later date.

  Because of these file removals, you will have merge issues on some of
  these files (2 in the ipx code, 1 in the ncpfs code, and 1 in the
  atomisp driver). Just delete those files, it's a simple merge :)

  All of this has been in linux-next for a while with no reported
  problems"

* tag 'staging-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (1011 commits)
  staging: ipx: delete it from the tree
  ncpfs: remove uapi .h files
  ncpfs: remove Documentation
  ncpfs: remove compat functionality
  staging: ncpfs: delete it
  staging: lustre: delete the filesystem from the tree.
  staging: vc04_services: no need to save the log debufs dentries
  staging: vc04_services: vchiq_debugfs_log_entry can be a void *
  staging: vc04_services: remove struct vchiq_debugfs_info
  staging: vc04_services: move client dbg directory into static variable
  staging: vc04_services: remove odd vchiq_debugfs_top() wrapper
  staging: vc04_services: no need to check debugfs return values
  staging: mt7621-gpio: reorder includes alphabetically
  staging: mt7621-gpio: change gc_map to don't use pointers
  staging: mt7621-gpio: use GPIOF_DIR_OUT and GPIOF_DIR_IN macros instead of custom values
  staging: mt7621-gpio: change 'to_mediatek_gpio' to make just a one line return
  staging: mt7621-gpio: dt-bindings: update documentation for #interrupt-cells property
  staging: mt7621-gpio: update #interrupt-cells for the gpio node
  staging: mt7621-gpio: dt-bindings: complete documentation for the gpio
  staging: mt7621-dts: add missing properties to gpio node
  ...
2018-06-09 10:32:39 -07:00
Trond Myklebust
995891006c NFSv4: Fix a typo in nfs41_sequence_process
We want to compare the slot_id to the highest slot number advertised by the
server.

Fixes: 3be0f80b5f ("NFSv4.1: Fix up replays of interrupted requests")
Cc: stable@vger.kernel.org # 4.15+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-09 12:59:55 -04:00
Trond Myklebust
fc40724fc6 NFSv4: Revert commit 5f83d86cf5 ("NFSv4.x: Fix wraparound issues..")
The correct behaviour for NFSv4 sequence IDs is to wrap around
to the value 0 after 0xffffffff.
See https://tools.ietf.org/html/rfc5661#section-2.10.6.1

Fixes: 5f83d86cf5 ("NFSv4.x: Fix wraparound issues when validing...")
Cc: stable@vger.kernel.org # 4.6+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-09 12:56:30 -04:00
Linus Torvalds
7d3bf613e9 libnvdimm for 4.18
* DAX broke a fundamental assumption of truncate of file mapped pages.
   The truncate path assumed that it is safe to disconnect a pinned page
   from a file and let the filesystem reclaim the physical block. With DAX
   the page is equivalent to the filesystem block. Introduce
   dax_layout_busy_page() to enable filesystems to wait for pinned DAX
   pages to be released. Without this wait a filesystem could allocate
   blocks under active device-DMA to a new file.
 
 * DAX arranges for the block layer to be bypassed and uses
   dax_direct_access() + copy_to_iter() to satisfy read(2) calls.
   However, the memcpy_mcsafe() facility is available through the pmem
   block driver. In order to safely handle media errors, via the DAX
   block-layer bypass, introduce copy_to_iter_mcsafe().
 
 * Fix cache management policy relative to the ACPI NFIT Platform
   Capabilities Structure to properly elide cache flushes when they are not
   necessary. The table indicates whether CPU caches are power-fail
   protected. Clarify that a deep flush is always performed on
   REQ_{FUA,PREFLUSH} requests.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJbGxI7AAoJEB7SkWpmfYgCDjsP/2Lcibu9Kf4tKIzuInsle6iE
 6qP29qlkpHVTpDKbhvIxTYTYL9sMU0DNUrpPCJR/EYdeyztLWDFC5EAT1wF240vf
 maV37s/uP331jSC/2VJnKWzBs2ztQxmKLEIQCxh6aT0qs9cbaOvJgB/WlVu+qtsl
 aGJFLmb6vdQacp31noU5plKrMgMA1pADyF5qx9I9K2HwowHE7T368ZEFS/3S//c3
 LXmpx/Nfq52sGu/qbRbu6B1CTJhIGhmarObyQnvBYoKntK1Ov4e8DS95wD3EhNDe
 FuRkOCUKhjl6cFy7QVWh1ct1bFm84ny+b4/AtbpOmv9l/+0mveJ7e+5mu8HQTifT
 wYiEe2xzXJ+OG/xntv8SvlZKMpjP3BqI0jYsTutsjT4oHrciiXdXM186cyS+BiGp
 KtFmWyncQJgfiTq6+Hj5XpP9BapNS+OYdYgUagw9ZwzdzptuGFYUMSVOBrYrn6c/
 fwqtxjubykJoW0P3pkIoT91arFSea7nxOKnGwft06imQ7TwR4ARsI308feQ9itJq
 2P2e7/20nYMsw2aRaUDDA70Yu+Lagn1m8WL87IybUGeUDLb1BAkjphAlWa6COJ+u
 PhvAD2tvyM9m0c7O5Mytvz7iWKG6SVgatoAyOPkaeplQK8khZ+wEpuK58sO6C1w8
 4GBvt9ri9i/Ww/A+ppWs
 =4bfw
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:
 "This adds a user for the new 'bytes-remaining' updates to
  memcpy_mcsafe() that you already received through Ingo via the
  x86-dax- for-linus pull.

  Not included here, but still targeting this cycle, is support for
  handling memory media errors (poison) consumed via userspace dax
  mappings.

  Summary:

   - DAX broke a fundamental assumption of truncate of file mapped
     pages. The truncate path assumed that it is safe to disconnect a
     pinned page from a file and let the filesystem reclaim the physical
     block. With DAX the page is equivalent to the filesystem block.
     Introduce dax_layout_busy_page() to enable filesystems to wait for
     pinned DAX pages to be released. Without this wait a filesystem
     could allocate blocks under active device-DMA to a new file.

   - DAX arranges for the block layer to be bypassed and uses
     dax_direct_access() + copy_to_iter() to satisfy read(2) calls.
     However, the memcpy_mcsafe() facility is available through the pmem
     block driver. In order to safely handle media errors, via the DAX
     block-layer bypass, introduce copy_to_iter_mcsafe().

   - Fix cache management policy relative to the ACPI NFIT Platform
     Capabilities Structure to properly elide cache flushes when they
     are not necessary. The table indicates whether CPU caches are
     power-fail protected. Clarify that a deep flush is always performed
     on REQ_{FUA,PREFLUSH} requests"

* tag 'libnvdimm-for-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (21 commits)
  dax: Use dax_write_cache* helpers
  libnvdimm, pmem: Do not flush power-fail protected CPU caches
  libnvdimm, pmem: Unconditionally deep flush on *sync
  libnvdimm, pmem: Complete REQ_FLUSH => REQ_PREFLUSH
  acpi, nfit: Remove ecc_unit_size
  dax: dax_insert_mapping_entry always succeeds
  libnvdimm, e820: Register all pmem resources
  libnvdimm: Debug probe times
  linvdimm, pmem: Preserve read-only setting for pmem devices
  x86, nfit_test: Add unit test for memcpy_mcsafe()
  pmem: Switch to copy_to_iter_mcsafe()
  dax: Report bytes remaining in dax_iomap_actor()
  dax: Introduce a ->copy_to_iter dax operation
  uio, lib: Fix CONFIG_ARCH_HAS_UACCESS_MCSAFE compilation
  xfs, dax: introduce xfs_break_dax_layouts()
  xfs: prepare xfs_break_layouts() for another layout type
  xfs: prepare xfs_break_layouts() to be called with XFS_MMAPLOCK_EXCL
  mm, fs, dax: handle layout changes to pinned dax mappings
  mm: fix __gup_device_huge vs unmap
  mm: introduce MEMORY_DEVICE_FS_DAX and CONFIG_DEV_PAGEMAP_OPS
  ...
2018-06-08 17:21:52 -07:00
Dan Williams
930218affe Merge branch 'for-4.18/mcsafe' into libnvdimm-for-next 2018-06-08 15:16:44 -07:00
Dan Williams
b56845794e Merge branch 'for-4.18/dax' into libnvdimm-for-next 2018-06-08 15:16:40 -07:00
Andrew Elble
692ad280bf nfsd: fix error handling in nfs4_set_delegation()
I noticed a memory corruption crash in nfsd in
4.17-rc1. This patch corrects the issue.

Fix to return error if the delegation couldn't be hashed or there was
a recall in progress. Use the existing error path instead of
destroy_delegation() for readability.

Signed-off-by: Andrew Elble <aweits@rit.edu>
Fixes: 353601e7d3 ("nfsd: create a separate lease for each delegation")
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-06-08 16:42:29 -04:00
Scott Mayhew
3171822fdc nfsd: fix potential use-after-free in nfsd4_decode_getdeviceinfo
When running a fuzz tester against a KASAN-enabled kernel, the following
splat periodically occurs.

The problem occurs when the test sends a GETDEVICEINFO request with a
malformed xdr array (size but no data) for gdia_notify_types and the
array size is > 0x3fffffff, which results in an overflow in the value of
nbytes which is passed to read_buf().

If the array size is 0x40000000, 0x80000000, or 0xc0000000, then after
the overflow occurs, the value of nbytes 0, and when that happens the
pointer returned by read_buf() points to the end of the xdr data (i.e.
argp->end) when really it should be returning NULL.

Fix this by returning NFS4ERR_BAD_XDR if the array size is > 1000 (this
value is arbitrary, but it's the same threshold used by
nfsd4_decode_bitmap()... in could really be any value >= 1 since it's
expected to get at most a single bitmap in gdia_notify_types).

[  119.256854] ==================================================================
[  119.257611] BUG: KASAN: use-after-free in nfsd4_decode_getdeviceinfo+0x5a4/0x5b0 [nfsd]
[  119.258422] Read of size 4 at addr ffff880113ada000 by task nfsd/538

[  119.259146] CPU: 0 PID: 538 Comm: nfsd Not tainted 4.17.0+ #1
[  119.259662] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014
[  119.261202] Call Trace:
[  119.262265]  dump_stack+0x71/0xab
[  119.263371]  print_address_description+0x6a/0x270
[  119.264609]  kasan_report+0x258/0x380
[  119.265854]  ? nfsd4_decode_getdeviceinfo+0x5a4/0x5b0 [nfsd]
[  119.267291]  nfsd4_decode_getdeviceinfo+0x5a4/0x5b0 [nfsd]
[  119.268549]  ? nfs4svc_decode_compoundargs+0xa5b/0x13c0 [nfsd]
[  119.269873]  ? nfsd4_decode_sequence+0x490/0x490 [nfsd]
[  119.271095]  nfs4svc_decode_compoundargs+0xa5b/0x13c0 [nfsd]
[  119.272393]  ? nfsd4_release_compoundargs+0x1b0/0x1b0 [nfsd]
[  119.273658]  nfsd_dispatch+0x183/0x850 [nfsd]
[  119.274918]  svc_process+0x161c/0x31a0 [sunrpc]
[  119.276172]  ? svc_printk+0x190/0x190 [sunrpc]
[  119.277386]  ? svc_xprt_release+0x451/0x680 [sunrpc]
[  119.278622]  nfsd+0x2b9/0x430 [nfsd]
[  119.279771]  ? nfsd_destroy+0x1c0/0x1c0 [nfsd]
[  119.281157]  kthread+0x2db/0x390
[  119.282347]  ? kthread_create_worker_on_cpu+0xc0/0xc0
[  119.283756]  ret_from_fork+0x35/0x40

[  119.286041] Allocated by task 436:
[  119.287525]  kasan_kmalloc+0xa0/0xd0
[  119.288685]  kmem_cache_alloc+0xe9/0x1f0
[  119.289900]  get_empty_filp+0x7b/0x410
[  119.291037]  path_openat+0xca/0x4220
[  119.292242]  do_filp_open+0x182/0x280
[  119.293411]  do_sys_open+0x216/0x360
[  119.294555]  do_syscall_64+0xa0/0x2f0
[  119.295721]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

[  119.298068] Freed by task 436:
[  119.299271]  __kasan_slab_free+0x130/0x180
[  119.300557]  kmem_cache_free+0x78/0x210
[  119.301823]  rcu_process_callbacks+0x35b/0xbd0
[  119.303162]  __do_softirq+0x192/0x5ea

[  119.305443] The buggy address belongs to the object at ffff880113ada000
                which belongs to the cache filp of size 256
[  119.308556] The buggy address is located 0 bytes inside of
                256-byte region [ffff880113ada000, ffff880113ada100)
[  119.311376] The buggy address belongs to the page:
[  119.312728] page:ffffea00044eb680 count:1 mapcount:0 mapping:0000000000000000 index:0xffff880113ada780
[  119.314428] flags: 0x17ffe000000100(slab)
[  119.315740] raw: 0017ffe000000100 0000000000000000 ffff880113ada780 00000001000c0001
[  119.317379] raw: ffffea0004553c60 ffffea00045c11e0 ffff88011b167e00 0000000000000000
[  119.319050] page dumped because: kasan: bad access detected

[  119.321652] Memory state around the buggy address:
[  119.322993]  ffff880113ad9f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  119.324515]  ffff880113ad9f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  119.326087] >ffff880113ada000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  119.327547]                    ^
[  119.328730]  ffff880113ada080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  119.330218]  ffff880113ada100: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
[  119.331740] ==================================================================

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-06-08 16:38:59 -04:00
Trond Myklebust
ce5624f7e6 NFSv4: Return NFS4ERR_DELAY when a layout recall fails due to igrab()
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-08 16:36:10 -04:00
Trond Myklebust
6c34265502 NFSv4: Return NFS4ERR_DELAY when a delegation recall fails due to igrab()
If the attempt to recall the delegation fails because the inode is
in the process of being evicted from cache, then use NFS4ERR_DELAY
to ask the server to retry later.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-08 16:36:10 -04:00
Dave Chinner
0703a8e1c1 xfs: replace do_mod with native operations
do_mod() is a hold-over from when we have different sizes for file
offsets and and other internal values for 40 bit XFS filesystems.
Hence depending on build flags variables passed to do_mod() could
change size. We no longer support those small format filesystems and
hence everything is of fixed size theses days, even on 32 bit
platforms.

As such, we can convert all the do_mod() callers to platform
optimised modulus operations as defined by linux/math64.h.
Individual conversions depend on the types of variables being used.

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-08 10:07:52 -07:00
Eric Sandeen
bb3d48dcf8 xfs: don't call xfs_da_shrink_inode with NULL bp
xfs_attr3_leaf_create may have errored out before instantiating a buffer,
for example if the blkno is out of range.  In that case there is no work
to do to remove it, and in fact xfs_da_shrink_inode will lead to an oops
if we try.

This also seems to fix a flaw where the original error from
xfs_attr3_leaf_create gets overwritten in the cleanup case, and it
removes a pointless assignment to bp which isn't used after this.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199969
Reported-by: Xu, Wen <wen.xu@gatech.edu>
Tested-by: Xu, Wen <wen.xu@gatech.edu>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-08 10:07:52 -07:00
Dave Chinner
9bb54cb56a xfs: clean up MIN/MAX
Get rid of the MIN/MAX macros and just use the native min/max macros
directly in the XFS code.

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-08 10:07:52 -07:00
Dave Chinner
86210fbeba xfs: move various type verifiers to common file
New verification functions like xfs_verify_fsbno() and
xfs_verify_agino() are spread across multiple files and different
header files. They really don't fit cleanly into the places they've
been put, and have wider scope than the current header includes.

Move the type verifiers to a new file in libxfs (xfs-types.c) and
the prototypes to xfs_types.h where they will be visible to all the
code that uses the types.

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-08 10:07:51 -07:00
Dave Chinner
4a2d01b076 xfs: xfs_reflink_convert_cow() memory allocation deadlock
xfs_reflink_convert_cow() manipulates the incore extent list
in GFP_KERNEL context in the IO submission path whilst holding
locked pages under writeback. This is a memory reclaim deadlock
vector. This code is not in a transaction, so any memory allocations
it makes aren't protected via the memalloc_nofs_save() context that
transactions carry.

Hence we need to run this call under memalloc_nofs_save() context to
prevent potential memory allocations from being run as GFP_KERNEL
and deadlocking.

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-08 10:07:51 -07:00
Dave Chinner
ef215e394e xfs: setup VFS i_rwsem lockdep state correctly
When lockdep is enabled, it changes the type of the inode i_rwsem
semaphore before unlocking a newly instantiated inode. THere is the
possibility that there is already a waiter on that inode lock by the
time we unlock the new inode, so having lockdep re-initialise the
lock is a vector for trouble.

Avoid this whole situation by setting up the i_rwsem lockdep class
at the same time we set up the XFS inode i_ilock classes and so the
VFS doesn't have to change the lock class itself when it is
potentially unsafe.

This change is necessary because the equivalent fixes to the VFS code
made in commit 1e2e547a93 ("do d_instantiate/unlock_new_inode
combinations safely") are not relevant to XFS as it has it's own
internal inode cache lookup and instantiation routines.

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-08 10:07:51 -07:00
Linus Torvalds
4a189982e2 Merge branch 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull aio iopriority support from Al Viro:
 "The rest of aio stuff for this cycle - Adam's aio ioprio series"

* 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs: aio ioprio use ioprio_check_cap ret val
  fs: aio ioprio add explicit block layer dependence
  fs: iomap dio set bio prio from kiocb prio
  fs: blkdev set bio prio from kiocb prio
  fs: Add aio iopriority support
  fs: Convert kiocb rw_hint from enum to u16
  block: add ioprio_check_cap function
2018-06-08 10:00:20 -07:00
Linus Torvalds
4189b863ba Merge branch 'work.lookup' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull proc_fill_cache regression fix from Al Viro:
 "Regression fix for proc_fill_cache() braino introduced when switching
  instantiate() callback to d_splice_alias()"

* 'work.lookup' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix proc_fill_cache() in case of d_alloc_parallel() failure
2018-06-08 09:56:38 -07:00
Al Viro
d85b399b64 fix proc_fill_cache() in case of d_alloc_parallel() failure
If d_alloc_parallel() returns ERR_PTR(...), we don't want to dput()
that.  Small reorganization allows to have all error-in-lookup
cases rejoin the main codepath after dput(child), avoiding the
entire problem.

Spotted-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Fixes: 0168b9e38c "procfs: switch instantiate_t to d_splice_alias()"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-06-08 01:17:11 -04:00
Ronnie Sahlberg
9d874c3655 cifs: fix a buffer leak in smb2_query_symlink
This leak was introduced in 91cb74f514 and caused us
to leak one small buffer for every symlink query.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-06-07 23:39:41 -05:00
Dan Carpenter
016e92da03 autofs: small cleanup in autofs_getpath()
We don't set "*name" so it's slightly nicer to just pass "name" instead
of "&name".

Link: http://lkml.kernel.org/r/20180531064736.lnisb55eajwjynvk@kili.mountain
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:40 -07:00
Ian Kent
6471e93863 autofs: clean up includes
Remove includes that aren't needed from autofs (and fs/compat_ioctl.c).

Link: http://lkml.kernel.org/r/152635085258.5968.9743527195522188148.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:40 -07:00
Ian Kent
8240b716e2 autofs: comment on selinux changes needed for module autoload
Due to the autofs4 module using a file system type name of autofs
different from the module containing directory name autoload did not
function properly.  To work around this kernel configurations have often
elected to build the module into the kernel.

This can result in selinux policies that prohibit autoloading of the
autofs module which need to be changed.

Add a comment about this to "possible changes" section of the autofs4
module help.

Link: http://lkml.kernel.org/r/152686474171.6155.1239659539983577463.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:39 -07:00
Ian Kent
2a3ae0a121 autofs: create autofs Kconfig and Makefile
Create Makefile and Kconfig for autofs module.

[raven@themaw.net: make autofs4 Kconfig depend on AUTOFS_FS]
  Link: http://lkml.kernel.org/r/152687649097.8263.7046086367407522029.stgit@pluto.themaw.net
Link: http://lkml.kernel.org/r/152626705591.28589.356365986974038383.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:39 -07:00
Ian Kent
8547190490 autofs: delete fs/autofs4 source files
Delete the now unused autofs4 module files.

Link: http://lkml.kernel.org/r/152626707391.28589.3553309771262313504.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:39 -07:00
Ian Kent
f7e095f5d1 autofs: update fs/autofs4/Makefile
Update Makefile to build from source in fs/autofs instead of fs/autofs4.

Link: http://lkml.kernel.org/r/152626706824.28589.1915028175544560855.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:39 -07:00
Ian Kent
6ed3874604 autofs: update fs/autofs4/Kconfig
Update Kconfig and add a depricated warning.

[raven@themaw.net: make autofs4 Kconfig depend on AUTOFS_FS]
  Link: http://lkml.kernel.org/r/152687649097.8263.7046086367407522029.stgit@pluto.themaw.net
Link: http://lkml.kernel.org/r/152626706133.28589.11994171621899212952.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:39 -07:00
Ian Kent
ebc921ca9b autofs: copy autofs4 to autofs
Copy source files from the autofs4 directory to the autofs directory.

Link: http://lkml.kernel.org/r/152626705013.28589.931913083997578251.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:39 -07:00
Ian Kent
47206e012a autofs4: use autofs instead of autofs4 everywhere
Update naming within autofs source to be consistent by changing
occurrences of autofs4 to autofs.

Link: http://lkml.kernel.org/r/152626703688.28589.8315406711135226803.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:39 -07:00
Ian Kent
ef8b42f78e autofs4: merge auto_fs.h and auto_fs4.h
The autofs module has long since been removed so there's no need to have
two separate include files for autofs.

Link: http://lkml.kernel.org/r/152626703024.28589.9571964661718767929.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:39 -07:00
Thadeu Lima de Souza Cascardo
5cc41e0995 fs/binfmt_misc.c: do not allow offset overflow
WHen registering a new binfmt_misc handler, it is possible to overflow
the offset to get a negative value, which might crash the system, or
possibly leak kernel data.

Here is a crash log when 2500000000 was used as an offset:

  BUG: unable to handle kernel paging request at ffff989cfd6edca0
  IP: load_misc_binary+0x22b/0x470 [binfmt_misc]
  PGD 1ef3e067 P4D 1ef3e067 PUD 0
  Oops: 0000 [#1] SMP NOPTI
  Modules linked in: binfmt_misc kvm_intel ppdev kvm irqbypass joydev input_leds serio_raw mac_hid parport_pc qemu_fw_cfg parpy
  CPU: 0 PID: 2499 Comm: bash Not tainted 4.15.0-22-generic #24-Ubuntu
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
  RIP: 0010:load_misc_binary+0x22b/0x470 [binfmt_misc]
  Call Trace:
    search_binary_handler+0x97/0x1d0
    do_execveat_common.isra.34+0x667/0x810
    SyS_execve+0x31/0x40
    do_syscall_64+0x73/0x130
    entry_SYSCALL_64_after_hwframe+0x3d/0xa2

Use kstrtoint instead of simple_strtoul.  It will work as the code
already set the delimiter byte to '\0' and we only do it when the field
is not empty.

Tested with offsets -1, 2500000000, UINT_MAX and INT_MAX.  Also tested
with examples documented at Documentation/admin-guide/binfmt-misc.rst
and other registrations from packages on Ubuntu.

Link: http://lkml.kernel.org/r/20180529135648.14254-1-cascardo@canonical.com
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:39 -07:00
Alexey Dobriyan
5d008fb414 proc: use "unsigned int" for /proc/*/stack
struct stack_trace::nr_entries is defined as "unsigned int" (YAY!) so
the iterator should be unsigned as well.

It saves 1 byte of code or something like that.

Link: http://lkml.kernel.org/r/20180423215248.GG9043@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:38 -07:00
Alexey Dobriyan
197850a1e0 proc: use "unsigned int" for sigqueue length
It's defined as atomic_t and really long signal queues are unheard of.

Link: http://lkml.kernel.org/r/20180423215119.GF9043@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:38 -07:00
Alexey Dobriyan
a4ef389565 proc: use "unsigned int" in proc_fill_cache()
All those lengths are unsigned as they should be.

Link: http://lkml.kernel.org/r/20180423213751.GC9043@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:38 -07:00
Alexey Dobriyan
941169298a proc: smaller RCU section in ->getattr()
struct kstat is thread local.

Link: http://lkml.kernel.org/r/20180423213626.GB9043@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:38 -07:00
Alexey Dobriyan
3cb4e162e4 proc: deduplicate /proc/*/cmdline implementation
Code can be sonsolidated if a dummy region of 0 length is used in normal
case of \0-separated command line:

1) [arg_start, arg_end) + [dummy len=0]
2) [arg_start, arg_end) + [env_start, env_end)

Link: http://lkml.kernel.org/r/20180221193335.GB28678@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:38 -07:00
Alexey Dobriyan
6a6cbe75db proc: simpler iterations for /proc/*/cmdline
"rv" variable is used both as a counter of bytes transferred and an
error value holder but it can be reduced solely to error values if
original start of userspace buffer is stashed and used at the very end.

[akpm@linux-foundation.org: simplify cleanup code]
Link: http://lkml.kernel.org/r/20180221193009.GA28678@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:38 -07:00
Alexey Dobriyan
6a6b9c4c11 proc: somewhat simpler code for /proc/*/cmdline
"final" variable is OK but we can get away with less lines.

Link: http://lkml.kernel.org/r/20180221192751.GC28548@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:38 -07:00
Alexey Dobriyan
b42262af5e proc: more "unsigned int" in /proc/*/cmdline
access_remote_vm() doesn't return negative errors, it returns number of
bytes read/written (0 if error occurs).  This allows to delete some
comparisons which never trigger.

Reuse "nr_read" variable while I'm at it.

Link: http://lkml.kernel.org/r/20180221192605.GB28548@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:38 -07:00
Mike Rapoport
df2cc96e77 userfaultfd: prevent non-cooperative events vs mcopy_atomic races
If a process monitored with userfaultfd changes it's memory mappings or
forks() at the same time as uffd monitor fills the process memory with
UFFDIO_COPY, the actual creation of page table entries and copying of
the data in mcopy_atomic may happen either before of after the memory
mapping modifications and there is no way for the uffd monitor to
maintain consistent view of the process memory layout.

For instance, let's consider fork() running in parallel with
userfaultfd_copy():

process        		         |	uffd monitor
---------------------------------+------------------------------
fork()        		         | userfaultfd_copy()
...        		         | ...
    dup_mmap()        	         |     down_read(mmap_sem)
    down_write(mmap_sem)         |     /* create PTEs, copy data */
        dup_uffd()               |     up_read(mmap_sem)
        copy_page_range()        |
        up_write(mmap_sem)       |
        dup_uffd_complete()      |
            /* notify monitor */ |

If the userfaultfd_copy() takes the mmap_sem first, the new page(s) will
be present by the time copy_page_range() is called and they will appear
in the child's memory mappings.  However, if the fork() is the first to
take the mmap_sem, the new pages won't be mapped in the child's address
space.

If the pages are not present and child tries to access them, the monitor
will get page fault notification and everything is fine.  However, if
the pages *are present*, the child can access them without uffd
noticing.  And if we copy them into child it'll see the wrong data.
Since we are talking about background copy, we'd need to decide whether
the pages should be copied or not regardless #PF notifications.

Since userfaultfd monitor has no way to determine what was the order,
let's disallow userfaultfd_copy in parallel with the non-cooperative
events.  In such case we return -EAGAIN and the uffd monitor can
understand that userfaultfd_copy() clashed with a non-cooperative event
and take an appropriate action.

Link: http://lkml.kernel.org/r/1527061324-19949-1-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:38 -07:00
Matthew Wilcox
1d40a5ea01 mm: mark pages in use for page tables
Define a new PageTable bit in the page_type and use it to mark pages in
use as page tables.  This can be helpful when debugging crashdumps or
analysing memory fragmentation.  Add a KPF flag to report these pages to
userspace and update page-types.c to interpret that flag.

Note that only pages currently accounted as NR_PAGETABLES are tracked as
PageTable; this does not include pgd/p4d/pud/pmd pages.  Those will be the
subject of a later patch.

Link: http://lkml.kernel.org/r/20180518194519.3820-4-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:37 -07:00
Huang Ying
ab6ecf247a mm: /proc/pid/pagemap: hide swap entries from unprivileged users
In commit ab676b7d6f ("pagemap: do not leak physical addresses to
non-privileged userspace"), the /proc/PID/pagemap is restricted to be
readable only by CAP_SYS_ADMIN to address some security issue.

In commit 1c90308e7a ("pagemap: hide physical addresses from
non-privileged users"), the restriction is relieved to make
/proc/PID/pagemap readable, but hide the physical addresses for
non-privileged users.

But the swap entries are readable for non-privileged users too.  This
has some security issues.  For example, for page under migrating, the
swap entry has physical address information.  So, in this patch, the
swap entries are hided for non-privileged users too.

Link: http://lkml.kernel.org/r/20180508012745.7238-1-ying.huang@intel.com
Fixes: 1c90308e7a ("pagemap: hide physical addresses from non-privileged users")
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Andrei Vagin <avagin@openvz.org>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Daniel Colascione <dancol@google.com>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:36 -07:00
Mike Kravetz
5d752600a8 mm: restructure memfd code
With the addition of memfd hugetlbfs support, we now have the situation
where memfd depends on TMPFS -or- HUGETLBFS.  Previously, memfd was only
supported on tmpfs, so it made sense that the code resided in shmem.c.
In the current code, memfd is only functional if TMPFS is defined.  If
HUGETLFS is defined and TMPFS is not defined, then memfd functionality
will not be available for hugetlbfs.  This does not cause BUGs, just a
lack of potentially desired functionality.

Code is restructured in the following way:
- include/linux/memfd.h is a new file containing memfd specific
  definitions previously contained in shmem_fs.h.
- mm/memfd.c is a new file containing memfd specific code previously
  contained in shmem.c.
- memfd specific code is removed from shmem_fs.h and shmem.c.
- A new config option MEMFD_CREATE is added that is defined if TMPFS
  or HUGETLBFS is defined.

No functional changes are made to the code: restructuring only.

Link: http://lkml.kernel.org/r/20180415182119.4517-4-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Herrmann <dh.herrmann@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Marc-Andr Lureau <marcandre.lureau@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:35 -07:00
Yang Shi
88aa7cc688 mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct
mmap_sem is on the hot path of kernel, and it very contended, but it is
abused too.  It is used to protect arg_start|end and evn_start|end when
reading /proc/$PID/cmdline and /proc/$PID/environ, but it doesn't make
sense since those proc files just expect to read 4 values atomically and
not related to VM, they could be set to arbitrary values by C/R.

And, the mmap_sem contention may cause unexpected issue like below:

INFO: task ps:14018 blocked for more than 120 seconds.
       Tainted: G            E 4.9.79-009.ali3000.alios7.x86_64 #1
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
 ps              D    0 14018      1 0x00000004
 Call Trace:
   schedule+0x36/0x80
   rwsem_down_read_failed+0xf0/0x150
   call_rwsem_down_read_failed+0x18/0x30
   down_read+0x20/0x40
   proc_pid_cmdline_read+0xd9/0x4e0
   __vfs_read+0x37/0x150
   vfs_read+0x96/0x130
   SyS_read+0x55/0xc0
   entry_SYSCALL_64_fastpath+0x1a/0xc5

Both Alexey Dobriyan and Michal Hocko suggested to use dedicated lock
for them to mitigate the abuse of mmap_sem.

So, introduce a new spinlock in mm_struct to protect the concurrent
access to arg_start|end, env_start|end and others, as well as replace
write map_sem to read to protect the race condition between prctl and
sys_brk which might break check_data_rlimit(), and makes prctl more
friendly to other VM operations.

This patch just eliminates the abuse of mmap_sem, but it can't resolve
the above hung task warning completely since the later
access_remote_vm() call needs acquire mmap_sem.  The mmap_sem
scalability issue will be solved in the future.

[yang.shi@linux.alibaba.com: add comment about mmap_sem and arg_lock]
  Link: http://lkml.kernel.org/r/1524077799-80690-1-git-send-email-yang.shi@linux.alibaba.com
Link: http://lkml.kernel.org/r/1523730291-109696-1-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:34 -07:00
Chengguang Xu
478ae0ca08 fs/9p: detect invalid options as much as possible
Currently when detecting invalid options in option parsing, some
options(e.g.  msize) just set errno and allow to continuously validate
other options so that it can detect invalid options as much as possible
and give proper error messages together.

This patch applies same rule to option 'cache' and 'access' when
detecting -EINVAL.

Link: http://lkml.kernel.org/r/1525340676-34072-2-git-send-email-cgxu519@gmx.com
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:34 -07:00
Souptick Joarder
c6137fe36d fs: ocfs2: use new return type vm_fault_t
Use new return type vm_fault_t for fault handler.  For now, this is just
documenting that the function returns a VM_FAULT value rather than an
errno.  Once all instances are converted, vm_fault_t will become a
distinct type.

Ref-> commit 1c8f422059 ("mm: change return type to vm_fault_t")

vmf_error() is the newly introduce inline function in 4.18.

Fix one checkpatch.pl warning by replacing BUG_ON() with WARN_ON()

[akpm@linux-foundation.org: undo BUG_ON->WARN_ON change]
Link: http://lkml.kernel.org/r/20180523153258.GA28451@jordon-HP-15-Notebook-PC
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:34 -07:00
Salvatore Mesoraca
64202a21a4 ocfs2: drop a VLA in ocfs2_orphan_del()
Avoid a VLA by using a real constant expression instead of a variable.
The compiler should be able to optimize the original code and avoid
using an actual VLA.  Anyway this change is useful because it will avoid
a false positive with -Wvla, it might also help the compiler generating
better code.

Link: http://lkml.kernel.org/r/1520970710-19732-1-git-send-email-s.mesoraca16@gmail.com
Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:34 -07:00
Guozhonghua
f3797d8ae5 ocfs2: correct the comments position of struct ocfs2_dir_block_trailer
Correct the comments position of the structure ocfs2_dir_block_trailer.

Link: http://lkml.kernel.org/r/71604351584F6A4EBAE558C676F37CA401071C5FDE@H3CMLB12-EX.srv.huawei-3com.com
Signed-off-by: guozhonghua <guozhonghua@h3c.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:34 -07:00
Zhen Lei
731a40fab1 ocfs2: eliminate a misreported warning
The warning is invalid because the parameter chunksize passed from
ocfs2_info_freefrag_scan_chain-->ocfs2_info_update_ffg is guaranteed to
be positive.  So __ilog2_u32 cannot return -1.

  fs/ocfs2/ioctl.c: In function 'ocfs2_info_update_ffg':
  fs/ocfs2/ioctl.c:411:17: warning: array subscript is below array bounds [-Warray-bounds]
    hist->fc_chunks[index]++;
                   ^
  fs/ocfs2/ioctl.c:411:17: warning: array subscript is below array bounds [-Warray-bounds]

Link: http://lkml.kernel.org/r/1524655799-12112-1-git-send-email-thunder.leizhen@huawei.com
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:33 -07:00
Larry Chen
133b81f28e ocfs2: ocfs2_inode_lock_tracker does not distinguish lock level
ocfs2_inode_lock_tracker as a variant of ocfs2_inode_lock, is used to
prevent deadlock due to recursive lock acquisition.

But this function does not distinguish whether the requested level is EX
or PR.

If a RP lock has been attained, this function will immediately return
success afterwards even an EX lock is requested.

But actually the return value does not mean that the process got a EX
lock, because ocfs2_inode_lock has not been called.

When taking lock levels into account, we face some different situations:

1. no lock is held
   In this case, just lock the inode and return 0

2. We are holding a lock
   For this situation, things diverges into several cases

   wanted     holding	     what to do
   ex		ex	    see 2.1 below
   ex		pr	    see 2.2 below
   pr		ex	    see 2.1 below
   pr		pr	    see 2.1 below

   2.1 lock level that is been held is compatible
   with the wanted level, so no lock action will be tacken.

   2.2 Otherwise, an upgrade is needed, but it is forbidden.

Reason why upgrade within a process is forbidden is that lock upgrade
may cause dead lock.  The following illustrate how it happens.

        process 1                             process 2
ocfs2_inode_lock_tracker(ex=0)
                               <======   ocfs2_inode_lock_tracker(ex=1)

ocfs2_inode_lock_tracker(ex=1)

For the status quo of ocfs2, without this patch, neither a bug nor
end-user impact will be caused because the wrong logic is avoided.

But I'm afraid this generic interface, may be called by other developers
in future and used in this situation.

  a process
ocfs2_inode_lock_tracker(ex=0)
ocfs2_inode_lock_tracker(ex=1)

Link: http://lkml.kernel.org/r/20180510053230.17217-1-lchen@suse.com
Signed-off-by: Larry Chen <lchen@suse.com>
Reviewed-by: Gang He <ghe@suse.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:33 -07:00
Jia Guo
5bc55d654b ocfs2: clean up redundant function declarations
ocfs2_extend_allocation() has been deleted, clean up its declaration.
Also change the static function name from __ocfs2_extend_allocation() to
ocfs2_extend_allocation() to be consistent with the corresponding trace
events as well as comments for ocfs2_lock_allocators().

Link: http://lkml.kernel.org/r/09cf7125-6f12-e53e-20f5-e606b2c16b48@huawei.com
Fixes: 964f14a0d3 ("ocfs2: clean up some dead code")
Signed-off-by: Jia Guo <guojia12@huawei.com>
Acked-by: Joseph Qi <jiangqi903@gmail.com>
Reviewed-by: Jun Piao <piaojun@huawei.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:33 -07:00
Souptick Joarder
ab77dab462 fs/dax.c: use new return type vm_fault_t
Use new return type vm_fault_t for fault handler.  For now, this is just
documenting that the function returns a VM_FAULT value rather than an
errno.  Once all instances are converted, vm_fault_t will become a
distinct type.

commit 1c8f422059 ("mm: change return type to vm_fault_t")

There was an existing bug inside dax_load_hole() if vm_insert_mixed had
failed to allocate a page table, we'd return VM_FAULT_NOPAGE instead of
VM_FAULT_OOM.  With new vmf_insert_mixed() this issue is addressed.

vm_insert_mixed_mkwrite has inefficiency when it returns an error value,
driver has to convert it to vm_fault_t type.  With new
vmf_insert_mixed_mkwrite() this limitation will be addressed.

Link: http://lkml.kernel.org/r/20180510181121.GA15239@jordon-HP-15-Notebook-PC
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:33 -07:00
Linus Torvalds
c90fca951e powerpc updates for 4.18
Notable changes:
 
  - Support for split PMD page table lock on 64-bit Book3S (Power8/9).
 
  - Add support for HAVE_RELIABLE_STACKTRACE, so we properly support live
    patching again.
 
  - Add support for patching barrier_nospec in copy_from_user() and syscall entry.
 
  - A couple of fixes for our data breakpoints on Book3S.
 
  - A series from Nick optimising TLB/mm handling with the Radix MMU.
 
  - Numerous small cleanups to squash sparse/gcc warnings from Mathieu Malaterre.
 
  - Several series optimising various parts of the 32-bit code from Christophe Leroy.
 
  - Removal of support for two old machines, "SBC834xE" and "C2K" ("GEFanuc,C2K"),
    which is why the diffstat has so many deletions.
 
 And many other small improvements & fixes.
 
 There's a few out-of-area changes. Some minor ftrace changes OK'ed by Steve, and
 a fix to our powernv cpuidle driver. Then there's a series touching mm, x86 and
 fs/proc/task_mmu.c, which cleans up some details around pkey support. It was
 ack'ed/reviewed by Ingo & Dave and has been in next for several weeks.
 
 Thanks to:
   Akshay Adiga, Alastair D'Silva, Alexey Kardashevskiy, Al Viro, Andrew
   Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Arnd Bergmann, Balbir Singh,
   Cédric Le Goater, Christophe Leroy, Christophe Lombard, Colin Ian King, Dave
   Hansen, Fabio Estevam, Finn Thain, Frederic Barrat, Gautham R. Shenoy, Haren
   Myneni, Hari Bathini, Ingo Molnar, Jonathan Neuschäfer, Josh Poimboeuf,
   Kamalesh Babulal, Madhavan Srinivasan, Mahesh Salgaonkar, Mark Greer, Mathieu
   Malaterre, Matthew Wilcox, Michael Neuling, Michal Suchanek, Naveen N. Rao,
   Nicholas Piggin, Nicolai Stange, Olof Johansson, Paul Gortmaker, Paul
   Mackerras, Peter Rosin, Pridhiviraj Paidipeddi, Ram Pai, Rashmica Gupta, Ravi
   Bangoria, Russell Currey, Sam Bobroff, Samuel Mendoza-Jonas, Segher
   Boessenkool, Shilpasri G Bhat, Simon Guo, Souptick Joarder, Stewart Smith,
   Thiago Jung Bauermann, Torsten Duwe, Vaibhav Jain, Wei Yongjun, Wolfram Sang,
   Yisheng Xie, YueHaibing.
 -----BEGIN PGP SIGNATURE-----
 
 iQIwBAABCAAaBQJbGQKBExxtcGVAZWxsZXJtYW4uaWQuYXUACgkQUevqPMjhpYBq
 TRAAioK7rz5xYMkxaM3Ng3ybobEeNAwQqOolz98xvmnB9SfDWNuc99vf8cGu0/fQ
 zc8AKZ5RcnwipOjyGlxW9oa1ZhVq0xtYnQPiYLEKMdLQmh5D+C7+KpvAd1UElweg
 ub40/xDySWfMujfuMSF9JDCWPIXyojt4Xg5nJKIVRrAm/3YMe/+i5Am7NWHuMCEb
 aQmZtlYW5Mz81XY0968hjpUO6eKFRmsaM7yFAhGTXx6+oLRpGj1PZB4AwdRIKS2L
 Ak7q/VgxtE4W+s3a0GK2s+eXIhGKeFuX9AVnx3nti+8/K1OqrqhDcLMUC/9JpCpv
 EvOtO7dxPnZujHjdu4Eai/xNoo4h6zRy7bWqve9LoBM40CP5jljKzu1lwqqb5yO0
 jC7/aXhgiSIxxcRJLjoI/TYpZPu40MifrkydmczykdPyPCnMIWEJDcj4KsRL/9Y8
 9SSbJzRNC/SgQNTbUYPZFFi6G0QaMmlcbCb628k8QT+Gn3Xkdf/ZtxzqEyoF4Irq
 46kFBsiSSK4Bu0rVlcUtJQLgdqytWULO6NKEYnD67laxYcgQd8pGFQ8SjZhRZLgU
 q5LA3HIWhoAI4M0wZhOnKXO6JfiQ1UbO8gUJLsWsfF0Fk5KAcdm+4kb4jbI1H4Qk
 Vol9WNRZwEllyaiqScZN9RuVVuH0GPOZeEH1dtWK+uWi0lM=
 =ZlBf
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
 "Notable changes:

   - Support for split PMD page table lock on 64-bit Book3S (Power8/9).

   - Add support for HAVE_RELIABLE_STACKTRACE, so we properly support
     live patching again.

   - Add support for patching barrier_nospec in copy_from_user() and
     syscall entry.

   - A couple of fixes for our data breakpoints on Book3S.

   - A series from Nick optimising TLB/mm handling with the Radix MMU.

   - Numerous small cleanups to squash sparse/gcc warnings from Mathieu
     Malaterre.

   - Several series optimising various parts of the 32-bit code from
     Christophe Leroy.

   - Removal of support for two old machines, "SBC834xE" and "C2K"
     ("GEFanuc,C2K"), which is why the diffstat has so many deletions.

  And many other small improvements & fixes.

  There's a few out-of-area changes. Some minor ftrace changes OK'ed by
  Steve, and a fix to our powernv cpuidle driver. Then there's a series
  touching mm, x86 and fs/proc/task_mmu.c, which cleans up some details
  around pkey support. It was ack'ed/reviewed by Ingo & Dave and has
  been in next for several weeks.

  Thanks to: Akshay Adiga, Alastair D'Silva, Alexey Kardashevskiy, Al
  Viro, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Arnd
  Bergmann, Balbir Singh, Cédric Le Goater, Christophe Leroy, Christophe
  Lombard, Colin Ian King, Dave Hansen, Fabio Estevam, Finn Thain,
  Frederic Barrat, Gautham R. Shenoy, Haren Myneni, Hari Bathini, Ingo
  Molnar, Jonathan Neuschäfer, Josh Poimboeuf, Kamalesh Babulal,
  Madhavan Srinivasan, Mahesh Salgaonkar, Mark Greer, Mathieu Malaterre,
  Matthew Wilcox, Michael Neuling, Michal Suchanek, Naveen N. Rao,
  Nicholas Piggin, Nicolai Stange, Olof Johansson, Paul Gortmaker, Paul
  Mackerras, Peter Rosin, Pridhiviraj Paidipeddi, Ram Pai, Rashmica
  Gupta, Ravi Bangoria, Russell Currey, Sam Bobroff, Samuel
  Mendoza-Jonas, Segher Boessenkool, Shilpasri G Bhat, Simon Guo,
  Souptick Joarder, Stewart Smith, Thiago Jung Bauermann, Torsten Duwe,
  Vaibhav Jain, Wei Yongjun, Wolfram Sang, Yisheng Xie, YueHaibing"

* tag 'powerpc-4.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (251 commits)
  powerpc/64s/radix: Fix missing ptesync in flush_cache_vmap
  cpuidle: powernv: Fix promotion from snooze if next state disabled
  powerpc: fix build failure by disabling attribute-alias warning in pci_32
  ocxl: Fix missing unlock on error in afu_ioctl_enable_p9_wait()
  powerpc-opal: fix spelling mistake "Uniterrupted" -> "Uninterrupted"
  powerpc: fix spelling mistake: "Usupported" -> "Unsupported"
  powerpc/pkeys: Detach execute_only key on !PROT_EXEC
  powerpc/powernv: copy/paste - Mask SO bit in CR
  powerpc: Remove core support for Marvell mv64x60 hostbridges
  powerpc/boot: Remove core support for Marvell mv64x60 hostbridges
  powerpc/boot: Remove support for Marvell mv64x60 i2c controller
  powerpc/boot: Remove support for Marvell MPSC serial controller
  powerpc/embedded6xx: Remove C2K board support
  powerpc/lib: optimise PPC32 memcmp
  powerpc/lib: optimise 32 bits __clear_user()
  powerpc/time: inline arch_vtime_task_switch()
  powerpc/Makefile: set -mcpu=860 flag for the 8xx
  powerpc: Implement csum_ipv6_magic in assembly
  powerpc/32: Optimise __csum_partial()
  powerpc/lib: Adjust .balign inside string functions for PPC32
  ...
2018-06-07 10:23:33 -07:00
Linus Torvalds
d987f62cce \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAlsZTnUACgkQnJ2qBz9k
 QNmSlwf6AwQFhSrrPTOwL9OIUzPnjvFC/Yk2CRmQzC+6jkFoS+cRPsCwDeGOP9KK
 9Col8Dr3M72PLKbyA3Z46YzXsRCB+pifgQKDjpMNKZJxuHSd8JmvAgEuLs8a2119
 4ZM0nNwhFJ2fnrpUwBDKoGB7m+Xb+xxnEY4hQ+jFPrDBuXmcjOz41al4EcZeb1YE
 Eu6zpmB4jIqkOY+LxMtHRSE7GlDAP6g9ERYodjsL/+Vg424wHLRQb/IVaXngalGS
 es1cMOoIqfT0rsODweXP4aGOY+z+Am+Htspqg7mrWk9zu/lP/57vS9Kwy/VsOQfp
 zRoDzuMnxumXVjTjHnd1Y9NIJw+gYg==
 =wZmz
 -----END PGP SIGNATURE-----

Merge tag 'udf_for_v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull udf updates from Jan Kara:
 "UDF support for UTF-16 characters in file names"

* tag 'udf_for_v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  udf: Add support for decoding UTF-16 characters
  udf: Add support for encoding UTF-16 characters
  udf: Push sb argument to udf_name_[to|from]_CS0()
  udf: Convert ident strings to proper charset
  udf: Use UTF-32 <-> UTF-8 conversion functions from NLS
  udf: Always require NLS support
2018-06-07 09:36:29 -07:00
Linus Torvalds
091a0f2785 orangefs: fixes and cleanups
+ fix some sparse warnings
  + cleanup some code formatting
  + fix up some attribute/meta-data related code
 -----BEGIN PGP SIGNATURE-----
 
 iQIbBAABAgAGBQJbGUIEAAoJEM9EDqnrzg2+3ygP9RPTSd2Mh/vyzokxmBacT/iW
 CrJXdrJkG+Yb/adGuT2UAPIMg73aGLqAPMN8dHhfhzotdd1BgrhMK+zZECUClwOf
 dxQCA+RFn1YofXMojqWYAFtJ6ptJgMDJxje62hS+QCWzDz0IM1PYr2qCQPs1IRmb
 CWVuP1igwt82VIVLey8YEuI5yuOnFOSO+7rKi7TPsY991oPxxz2brmrMuiW2Dg0W
 6MCP19XUBpxdDJWtL2GWXAwgkP9JYkhokwkAloqWXoOpDUf0EniomuRMBLOtISoO
 tJfgnYIN+7sRTSbqsNtKQrm7f8/B01t+tGuzJYDGlUls02V3DkS9EjypvP+VZ6e9
 bsN3oQHD6xFx/7egxUVBRFD6Hw83j58DH1N+79whVrU3ospkj3z0QX1g6zqgY2ew
 mgileXBAJQD+SUr0F42QqeFVexjky9dKtDZLQe42jxLZXpPly2t737h4ZatdW3ro
 yKJLUzuHbLGQzgGJOSQsSfmwUkSeTilgCV6WFyJWinlxUWQEkp0F+1u5B4RX87BQ
 SHUz5O3pWDpz+kLtauzsVlXDbBQfZHimIbnKWHZoQ5dER/LQfWBfvOz+CT/5uBBg
 qhdPZ4wKLD4qvzByuZJKyYmPXpUoOWh/i5QcvJ8B8rNxCx8XEP/VoikjUqkCUajb
 4HFQwnZtnQOLfnufTTw=
 =vwvd
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.18-ofs' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux

Pull orangefs updates from Mike Marshall:
 "Fixes and cleanups:

   - fix some sparse warnings

   - cleanup some code formatting

   - fix up some attribute/meta-data related code"

* tag 'for-linus-4.18-ofs' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
  orangefs: use sparse annotations for holding locks across function calls.
  orangefs: make debug_help_fops static
  orangefs: remove unused function orangefs_get_bufmap_init
  orangefs: specify user pointers when using dev_map_desc and bufmap
  orangefs: formatting cleanups
  orangefs: set i_size on new symlink
  orangefs: report attributes_mask and attributes for statx
  orangefs: make struct orangefs_file_vm_ops static
  orangefs: revamp block sizes
2018-06-07 09:23:12 -07:00
Linus Torvalds
70f2ae1f00 overlayfs fixes for 4.18
This contains a fix for the vfs_mkdir() issue discovered by Al, as well as
 other fixes and cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCWxatHQAKCRDh3BK/laaZ
 POg5AP95a/uUOrTJeTsENJwTmyAwHed9a6y4abKtvNErxUm4awD9FmhyYXodzJNq
 9/mheT4kV2XkR/KkxI5sizfT1uPuvgA=
 =+ljQ
 -----END PGP SIGNATURE-----

Merge tag 'ovl-fixes-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs

Pull overlayfs fixes from Miklos Szeredi:
 "This contains a fix for the vfs_mkdir() issue discovered by Al, as
  well as other fixes and cleanups"

* tag 'ovl-fixes-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: use inode_insert5() to hash a newly created inode
  ovl: Pass argument to ovl_get_inode() in a structure
  vfs: factor out inode_insert5()
  ovl: clean up copy-up error paths
  ovl: return EIO on internal error
  ovl: make ovl_create_real() cope with vfs_mkdir() safely
  ovl: create helper ovl_create_temp()
  ovl: return dentry from ovl_create_real()
  ovl: struct cattr cleanups
  ovl: strip debug argument from ovl_do_ helpers
  ovl: remove WARN_ON() real inode attributes mismatch
  ovl: Kconfig documentation fixes
  ovl: update documentation for unionmount-testsuite
2018-06-07 08:53:50 -07:00
Linus Torvalds
da315f6e03 fuse update for 4.18
The most interesting part of this update is user namespace support, mostly
 done by Eric Biederman.  This enables safe unprivileged fuse mounts within
 a user namespace.
 
 There are also a couple of fixes for bugs found by syzbot and miscellaneous
 fixes and cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCWxanJAAKCRDh3BK/laaZ
 PJjDAP4r6f4kL/5DZxK7JSnSue8BHESGD1LCMVgL57e9WmZukgD/cOtaO85ie3lh
 DWuhX5xGZVMMX4frIGLfBn8ogSS+egw=
 =3luD
 -----END PGP SIGNATURE-----

Merge tag 'fuse-update-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse updates from Miklos Szeredi:
 "The most interesting part of this update is user namespace support,
  mostly done by Eric Biederman. This enables safe unprivileged fuse
  mounts within a user namespace.

  There are also a couple of fixes for bugs found by syzbot and
  miscellaneous fixes and cleanups"

* tag 'fuse-update-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: don't keep dead fuse_conn at fuse_fill_super().
  fuse: fix control dir setup and teardown
  fuse: fix congested state leak on aborted connections
  fuse: Allow fully unprivileged mounts
  fuse: Ensure posix acls are translated outside of init_user_ns
  fuse: add writeback documentation
  fuse: honor AT_STATX_FORCE_SYNC
  fuse: honor AT_STATX_DONT_SYNC
  fuse: Restrict allow_other to the superblock's namespace or a descendant
  fuse: Support fuse filesystems outside of init_user_ns
  fuse: Fail all requests with invalid uids or gids
  fuse: Remove the buggy retranslation of pids in fuse_dev_do_read
  fuse: return -ECONNABORTED on /dev/fuse read after abort
  fuse: atomic_o_trunc should truncate pagecache
2018-06-07 08:50:57 -07:00
Souptick Joarder
a528a24150 btrfs: change return type of btrfs_page_mkwrite to vm_fault_t
Use the new return type vm_fault_t for fault handler. For now, this is
just documenting that the function returns a VM_FAULT value rather than
an errno. Once all instances are converted, vm_fault_t will become a
distinct type.

Reference commit 1c8f422059 ("mm: change return type to vm_fault_t")

vmf_error() is the newly introduced inline function in 4.17-rc6.

Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-06-07 17:27:45 +02:00
Sascha Hauer
e1db654d8e ubifs: lpt: Fix wrong pnode number range in comment
The comment above pnode_lookup claims the range for the pnode number is
from 0 to main_lebs - 1. This is wrong because every pnode has
informations about UBIFS_LPT_FANOUT LEBs, thus the corrent range is
0 to to (main_lebs - 1) / UBIFS_LPT_FANOUT.

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
2018-06-07 15:53:15 +02:00
Sascha Hauer
28e5dfd842 ubifs: gc: Fix typo
"point of view" makes more sense than "point of few". Fix this.

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
2018-06-07 15:53:15 +02:00
Sascha Hauer
71d561f026 ubifs: log: Some spelling fixes
- add missing article
- remove misplaced 'it'
- s/tress/trees

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
2018-06-07 15:53:15 +02:00
Sascha Hauer
c7e593b3bd ubifs: Spelling fix someting -> something
Replace "someting" with "something"

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
2018-06-07 15:53:14 +02:00
Sascha Hauer
671b9b75f6 ubifs: journal: Remove wrong comment
In the description of reserve_space() it is claimed that write_node()
and write_head() unlock the journal head. This is not true and has never
been true. All callers of write_node() and write_head() call
release_head() themselves. Remove the wrong comment.

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
2018-06-07 15:53:14 +02:00
Sascha Hauer
c971dad849 ubifs: remove set but never used variable
replay_sqnum is set but never used. Remove it.

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
2018-06-07 15:53:14 +02:00
Wang Shilong
422edacec0 ubifs, xattr: remove misguided quota flags
Originally, Yang Dongsheng added quota support
for ubifs, but it turned out upstream won't accept it.

Since ubifs don't touch any quota code, S_NOQUOTA flag
is misguided here, and currently it is mainly used to
avoid recursion for system quota files.

Let's make things clearly and remove unnecessary and
misguied quota flags here.

Reported-by: Rock Lee <rockdotlee@gmail.com>
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2018-06-07 15:53:14 +02:00
Souptick Joarder
31c49eac78 fs: ubifs: Adding new return type vm_fault_t
Use new return type vm_fault_t for page_mkwrite handler.

Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2018-06-07 15:53:13 +02:00
Steve French
c7c137b931 smb3: do not allow insecure cifs mounts when using smb3
if mounting as smb3 do not allow cifs (vers=1.0) or insecure vers=2.0
mounts.

For example:
root@smf-Thinkpad-P51:~/cifs-2.6# mount -t smb3 //127.0.0.1/scratch /mnt -o username=testuser,password=Testpass1
root@smf-Thinkpad-P51:~/cifs-2.6# umount /mnt
root@smf-Thinkpad-P51:~/cifs-2.6# mount -t smb3 //127.0.0.1/scratch /mnt -o username=testuser,password=Testpass1,vers=1.0
mount: /mnt: wrong fs type, bad option, bad superblock on //127.0.0.1/scratch ...
root@smf-Thinkpad-P51:~/cifs-2.6# dmesg | grep smb3
[ 4302.200122] CIFS VFS: vers=1.0 (cifs) not permitted when mounting with smb3
root@smf-Thinkpad-P51:~/cifs-2.6# mount -t smb3 //127.0.0.1/scratch /mnt -o username=testuser,password=Testpass1,vers=3.11

Signed-off-by: Steve French <stfrench@microsoft.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Sachin Prabhu <sprabhu@redhat.com>
2018-06-07 08:36:39 -05:00
Aurelien Aptel
8ddecf5fd7 CIFS: Fix NULL ptr deref
cifs->master_tlink is NULL against Win Server 2016 (which is
strange.. not sure why) and is dereferenced in cifs_sb_master_tcon().

move master_tlink getter to cifsglob.h so it can be used from
smb2misc.c

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-06-07 08:31:31 -05:00
Robbie Ko
9d311e11fc Btrfs: fiemap: pass correct bytenr when fm_extent_count is zero
[BUG]
fm_mapped_extents is not correct when fm_extent_count is 0
Like:
   # mount /dev/vdb5 /mnt/btrfs
   # dd if=/dev/zero bs=16K count=4 oflag=dsync of=/mnt/btrfs/file
   # xfs_io -c "fiemap -v" /mnt/btrfs/file
   /mnt/btrfs/file:
   EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
     0: [0..127]:        25088..25215       128   0x1

When user space wants to get the number of file extents,
set fm_extent_count to 0 to run fiemap and then read fm_mapped_extents.

In the above example, fiemap will return with fm_mapped_extents set to 4,
but it should be 1 since there's only one entry in the output.

[REASON]
The problem seems to be that disko is only set if
fieinfo->fi_extents_max is set. And this member is initialized, in the
generic ioctl_fiemap function, to the value of used-passed
fm_extent_count. So when the user passes 0 then fi_extent_max is also
set to zero and this causes btrfs to not initialize disko at all.
Eventually this leads emit_fiemap_extent being called with a bogus
'phys' argument preventing proper fiemap entries merging.

[FIX]
Move the disko initialization earlier in extent_fiemap making it
independent of user-passed arguments, allowing emit_fiemap_extent to
properly handle consecutive extent entries.

Signed-off-by: Robbie Ko <robbieko@synology.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-06-07 14:26:29 +02:00
Linus Torvalds
1c8c5a9d38 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Add Maglev hashing scheduler to IPVS, from Inju Song.

 2) Lots of new TC subsystem tests from Roman Mashak.

 3) Add TCP zero copy receive and fix delayed acks and autotuning with
    SO_RCVLOWAT, from Eric Dumazet.

 4) Add XDP_REDIRECT support to mlx5 driver, from Jesper Dangaard
    Brouer.

 5) Add ttl inherit support to vxlan, from Hangbin Liu.

 6) Properly separate ipv6 routes into their logically independant
    components. fib6_info for the routing table, and fib6_nh for sets of
    nexthops, which thus can be shared. From David Ahern.

 7) Add bpf_xdp_adjust_tail helper, which can be used to generate ICMP
    messages from XDP programs. From Nikita V. Shirokov.

 8) Lots of long overdue cleanups to the r8169 driver, from Heiner
    Kallweit.

 9) Add BTF ("BPF Type Format"), from Martin KaFai Lau.

10) Add traffic condition monitoring to iwlwifi, from Luca Coelho.

11) Plumb extack down into fib_rules, from Roopa Prabhu.

12) Add Flower classifier offload support to igb, from Vinicius Costa
    Gomes.

13) Add UDP GSO support, from Willem de Bruijn.

14) Add documentation for eBPF helpers, from Quentin Monnet.

15) Add TLS tx offload to mlx5, from Ilya Lesokhin.

16) Allow applications to be given the number of bytes available to read
    on a socket via a control message returned from recvmsg(), from
    Soheil Hassas Yeganeh.

17) Add x86_32 eBPF JIT compiler, from Wang YanQing.

18) Add AF_XDP sockets, with zerocopy support infrastructure as well.
    From Björn Töpel.

19) Remove indirect load support from all of the BPF JITs and handle
    these operations in the verifier by translating them into native BPF
    instead. From Daniel Borkmann.

20) Add GRO support to ipv6 gre tunnels, from Eran Ben Elisha.

21) Allow XDP programs to do lookups in the main kernel routing tables
    for forwarding. From David Ahern.

22) Allow drivers to store hardware state into an ELF section of kernel
    dump vmcore files, and use it in cxgb4. From Rahul Lakkireddy.

23) Various RACK and loss detection improvements in TCP, from Yuchung
    Cheng.

24) Add TCP SACK compression, from Eric Dumazet.

25) Add User Mode Helper support and basic bpfilter infrastructure, from
    Alexei Starovoitov.

26) Support ports and protocol values in RTM_GETROUTE, from Roopa
    Prabhu.

27) Support bulking in ->ndo_xdp_xmit() API, from Jesper Dangaard
    Brouer.

28) Add lots of forwarding selftests, from Petr Machata.

29) Add generic network device failover driver, from Sridhar Samudrala.

* ra.kernel.org:/pub/scm/linux/kernel/git/davem/net-next: (1959 commits)
  strparser: Add __strp_unpause and use it in ktls.
  rxrpc: Fix terminal retransmission connection ID to include the channel
  net: hns3: Optimize PF CMDQ interrupt switching process
  net: hns3: Fix for VF mailbox receiving unknown message
  net: hns3: Fix for VF mailbox cannot receiving PF response
  bnx2x: use the right constant
  Revert "net: sched: cls: Fix offloading when ingress dev is vxlan"
  net: dsa: b53: Fix for brcm tag issue in Cygnus SoC
  enic: fix UDP rss bits
  netdev-FAQ: clarify DaveM's position for stable backports
  rtnetlink: validate attributes in do_setlink()
  mlxsw: Add extack messages for port_{un, }split failures
  netdevsim: Add extack error message for devlink reload
  devlink: Add extack to reload and port_{un, }split operations
  net: metrics: add proper netlink validation
  ipmr: fix error path when ipmr_new_table fails
  ip6mr: only set ip6mr_table from setsockopt when ip6mr_new_table succeeds
  net: hns3: remove unused hclgevf_cfg_func_mta_filter
  netfilter: provide udp*_lib_lookup for nf_tproxy
  qed*: Utilize FW 8.37.2.0
  ...
2018-06-06 18:39:49 -07:00
Linus Torvalds
2857676045 - Introduce arithmetic overflow test helper functions (Rasmus)
- Use overflow helpers in 2-factor allocators (Kees, Rasmus)
 - Introduce overflow test module (Rasmus, Kees)
 - Introduce saturating size helper functions (Matthew, Kees)
 - Treewide use of struct_size() for allocators (Kees)
 -----BEGIN PGP SIGNATURE-----
 Comment: Kees Cook <kees@outflux.net>
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAlsYJ1gWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJlCTEACwdEeriAd2VwxknnsstojGD/3g
 8TTFA19vSu4Gxa6WiDkjGoSmIlfhXTlZo1Nlmencv16ytSvIVDNLUIB3uDxUIv1J
 2+dyHML9JpXYHHR7zLXXnGFJL0wazqjbsD3NYQgXqmun7EVVYnOsAlBZ7h/Lwiej
 jzEJd8DaHT3TA586uD3uggiFvQU0yVyvkDCDONIytmQx+BdtGdg9TYCzkBJaXuDZ
 YIthyKDvxIw5nh/UaG3L+SKo73tUr371uAWgAfqoaGQQCWe+mxnWL4HkCKsjFzZL
 u9ouxxF/n6pij3E8n6rb0i2fCzlsTDdDF+aqV1rQ4I4hVXCFPpHUZgjDPvBWbj7A
 m6AfRHVNnOgI8HGKqBGOfViV+2kCHlYeQh3pPW33dWzy/4d/uq9NIHKxE63LH+S4
 bY3oO2ela8oxRyvEgXLjqmRYGW1LB/ZU7FS6Rkx2gRzo4k8Rv+8K/KzUHfFVRX61
 jEbiPLzko0xL9D53kcEn0c+BhofK5jgeSWxItdmfuKjLTW4jWhLRlU+bcUXb6kSS
 S3G6aF+L+foSUwoq63AS8QxCuabuhreJSB+BmcGUyjthCbK/0WjXYC6W/IJiRfBa
 3ZTxBC/2vP3uq/AGRNh5YZoxHL8mSxDfn62F+2cqlJTTKR/O+KyDb1cusyvk3H04
 KCDVLYPxwQQqK1Mqig==
 =/3L8
 -----END PGP SIGNATURE-----

Merge tag 'overflow-v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull overflow updates from Kees Cook:
 "This adds the new overflow checking helpers and adds them to the
  2-factor argument allocators. And this adds the saturating size
  helpers and does a treewide replacement for the struct_size() usage.
  Additionally this adds the overflow testing modules to make sure
  everything works.

  I'm still working on the treewide replacements for allocators with
  "simple" multiplied arguments:

     *alloc(a * b, ...) -> *alloc_array(a, b, ...)

  and

     *zalloc(a * b, ...) -> *calloc(a, b, ...)

  as well as the more complex cases, but that's separable from this
  portion of the series. I expect to have the rest sent before -rc1
  closes; there are a lot of messy cases to clean up.

  Summary:

   - Introduce arithmetic overflow test helper functions (Rasmus)

   - Use overflow helpers in 2-factor allocators (Kees, Rasmus)

   - Introduce overflow test module (Rasmus, Kees)

   - Introduce saturating size helper functions (Matthew, Kees)

   - Treewide use of struct_size() for allocators (Kees)"

* tag 'overflow-v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  treewide: Use struct_size() for devm_kmalloc() and friends
  treewide: Use struct_size() for vmalloc()-family
  treewide: Use struct_size() for kmalloc()-family
  device: Use overflow helpers for devm_kmalloc()
  mm: Use overflow helpers in kvmalloc()
  mm: Use overflow helpers in kmalloc_array*()
  test_overflow: Add memory allocation overflow tests
  overflow.h: Add allocation size calculation helpers
  test_overflow: Report test failures
  test_overflow: macrofy some more, do more tests for free
  lib: add runtime test of check_*_overflow functions
  compiler.h: enable builtin overflow checkers and add fallback code
2018-06-06 17:27:14 -07:00
Aurelien Aptel
83210ba6f8 CIFS: fix encryption in SMB3.1.1
The smb2 hdr is now in iov 1

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-06-06 16:50:31 -05:00
Arnd Bergmann
4bb8b65a04 xfs: fix string handling in label get/set functions
[sandeen: fix subject, avoid copy-out of uninit data in getlabel]

gcc-8 reports two warnings for the newly added getlabel/setlabel code:

fs/xfs/xfs_ioctl.c: In function 'xfs_ioc_getlabel':
fs/xfs/xfs_ioctl.c:1822:38: error: argument to 'sizeof' in 'strncpy' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
  strncpy(label, sbp->sb_fname, sizeof(sbp->sb_fname));
                                      ^
In function 'strncpy',
    inlined from 'xfs_ioc_setlabel' at /git/arm-soc/fs/xfs/xfs_ioctl.c:1863:2,
    inlined from 'xfs_file_ioctl' at /git/arm-soc/fs/xfs/xfs_ioctl.c:1918:10:
include/linux/string.h:254:9: error: '__builtin_strncpy' output may be truncated copying 12 bytes from a string of length 12 [-Werror=stringop-truncation]
  return __builtin_strncpy(p, q, size);

In both cases, part of the problem is that one of the strncpy()
arguments is a fixed-length character array with zero-padding rather
than a zero-terminated string. In the first one case, we also get an
odd warning about sizeof-pointer-memaccess, which doesn't seem right
(the sizeof is for an array that happens to be the same as the second
strncpy argument).

To work around the bogus warning, I use a plain 'XFSLABEL_MAX' for
the strncpy() length when copying the label in getlabel. For setlabel(),
using memcpy() with the correct length that is already known avoids
the second warning and is slightly simpler.

In a related issue, it appears that we accidentally skip the trailing
\0 when copying a 12-character label back to user space in getlabel().
Using the correct sizeof() argument here copies the extra character.

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85602
Fixes: f7664b3197 ("xfs: implement online get/set fs label")
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Martin Sebor <msebor@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-06 14:17:53 -07:00
Dave Chinner
0b61f8a407 xfs: convert to SPDX license tags
Remove the verbose license text from XFS files and replace them
with SPDX tags. This does not change the license of any of the code,
merely refers to the common, up-to-date license files in LICENSES/

This change was mostly scripted. fs/xfs/Makefile and
fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
and modified by the following command:

for f in `git grep -l "GNU General" fs/xfs/` ; do
	echo $f
	cat $f | awk -f hdr.awk > $f.new
	mv -f $f.new $f
done

And the hdr.awk script that did the modification (including
detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
is as follows:

$ cat hdr.awk
BEGIN {
	hdr = 1.0
	tag = "GPL-2.0"
	str = ""
}

/^ \* This program is free software/ {
	hdr = 2.0;
	next
}

/any later version./ {
	tag = "GPL-2.0+"
	next
}

/^ \*\// {
	if (hdr > 0.0) {
		print "// SPDX-License-Identifier: " tag
		print str
		print $0
		str=""
		hdr = 0.0
		next
	}
	print $0
	next
}

/^ \* / {
	if (hdr > 1.0)
		next
	if (hdr > 0.0) {
		if (str != "")
			str = str "\n"
		str = str $0
		next
	}
	print $0
	next
}

/^ \*/ {
	if (hdr > 0.0)
		next
	print $0
	next
}

// {
	if (hdr > 0.0) {
		if (str != "")
			str = str "\n"
		str = str $0
		next
	}
	print $0
}

END { }
$

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-06 14:17:53 -07:00
Kees Cook
acafe7e302 treewide: Use struct_size() for kmalloc()-family
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
    int stuff;
    void *entry[];
};

instance = kmalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL);

This patch makes the changes for kmalloc()-family (and kvmalloc()-family)
uses. It was done via automatic conversion with manual review for the
"CHECKME" non-standard cases noted below, using the following Coccinelle
script:

// pkey_cache = kmalloc(sizeof *pkey_cache + tprops->pkey_tbl_len *
//                      sizeof *pkey_cache->table, GFP_KERNEL);
@@
identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
expression GFP;
identifier VAR, ELEMENT;
expression COUNT;
@@

- alloc(sizeof(*VAR) + COUNT * sizeof(*VAR->ELEMENT), GFP)
+ alloc(struct_size(VAR, ELEMENT, COUNT), GFP)

// mr = kzalloc(sizeof(*mr) + m * sizeof(mr->map[0]), GFP_KERNEL);
@@
identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
expression GFP;
identifier VAR, ELEMENT;
expression COUNT;
@@

- alloc(sizeof(*VAR) + COUNT * sizeof(VAR->ELEMENT[0]), GFP)
+ alloc(struct_size(VAR, ELEMENT, COUNT), GFP)

// Same pattern, but can't trivially locate the trailing element name,
// or variable name.
@@
identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
expression GFP;
expression SOMETHING, COUNT, ELEMENT;
@@

- alloc(sizeof(SOMETHING) + COUNT * sizeof(ELEMENT), GFP)
+ alloc(CHECKME_struct_size(&SOMETHING, ELEMENT, COUNT), GFP)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-06 11:15:43 -07:00
Chuck Lever
025bb9f872 NFSv4.0: Remove transport protocol name from non-UCS client ID
Commit 69dd716c5f ("NFSv4: Add socket proto argument to
setclientid") (2007) added the transport protocol name to the client
ID string, but the patch description doesn't explain why this was
necessary.

At that time, the only transport protocol name that would have been
used is "tcp" (for both IPv4 and IPv6), resulting in no additional
distinctiveness of the client ID string.

Since there is one client instance, the server should recognize it's
state whether the client is connecting via TCP or RDMA. Same client,
same lease.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-06 11:45:52 -04:00
Chuck Lever
848a4eb2e3 NFSv4.0: Remove cl_ipaddr from non-UCS client ID
It is possible for two distinct clients to have the same cl_ipaddr:

 - if the client admin disables callback with clientaddr=0.0.0.0 on
   more than one client

 - if two clients behind separate NATs use the same private subnet
   number

 - if the client admin specifies the same address via clientaddr=
   mount option (pointing the server at the same NAT box, for
   example)

Because of the way the Linux NFSv4.0 client constructs its client
ID string by default, such clients could interfere with each others'
lease state when mounting the same server:

	scnprintf(str, len, "Linux NFSv4.0 %s/%s %s",
		clp->cl_ipaddr,
		rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_ADDR),
		rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_PROTO));

cl_ipaddr is set to the value of the clientaddr= mount option. Two
clients whose addresses are 192.168.3.77 that mount the same server
(whose public IP address is, say, 3.4.5.6) would both generate the
same client ID string when sending a SETCLIENTID:

  Linux NFSv4.0 192.168.3.77/3.4.5.6 tcp

and thus the server would not be able to distinguish the clients'
leases. If both clients are using AUTH_SYS when sending SETCLIENTID
then the server could possibly permit the two clients to interfere
with or purge each others' leases.

To better ensure that Linux's NFSv4.0 client ID strings are distinct
in these cases, remove cl_ipaddr from the client ID string and
replace it with something more likely to be unique. Note that the
replacement looks a lot like the uniform client ID string.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-06 11:45:44 -04:00
Dave Chinner
9e6c08d4a8 xfs: validate btree records on retrieval
So we don't check the validity of records as we walk the btree. When
there are corrupt records in the free space btree (e.g. zero
startblock/length or beyond EOAG) we just blindly use it and things
go bad from there. That leads to assert failures on debug kernels
like this:

XFS: Assertion failed: fs_is_ok, file: fs/xfs/libxfs/xfs_alloc.c, line: 450
....
Call Trace:
 xfs_alloc_fixup_trees+0x368/0x5c0
 xfs_alloc_ag_vextent_near+0x79a/0xe20
 xfs_alloc_ag_vextent+0x1d3/0x330
 xfs_alloc_vextent+0x5e9/0x870

Or crashes like this:

XFS (loop0): xfs_buf_find: daddr 0x7fb28 out of range, EOFS 0x8000
.....
BUG: unable to handle kernel NULL pointer dereference at 00000000000000c8
....
Call Trace:
 xfs_bmap_add_extent_hole_real+0x67d/0x930
 xfs_bmapi_write+0x934/0xc90
 xfs_da_grow_inode_int+0x27e/0x2f0
 xfs_dir2_grow_inode+0x55/0x130
 xfs_dir2_sf_to_block+0x94/0x5d0
 xfs_dir2_sf_addname+0xd0/0x590
 xfs_dir_createname+0x168/0x1a0
 xfs_rename+0x658/0x9b0

By checking that free space records pulled from the trees are
within the valid range, we catch many of these corruptions before
they can do damage.

This is a generic btree record checking deficiency. We need to
validate the records we fetch from all the different btrees before
we use them to catch corruptions like this.

This patch results in a corrupt record emitting an error message and
returning -EFSCORRUPTED, and the higher layers catch that and abort:

 XFS (loop0): Size Freespace BTree record corruption in AG 0 detected!
 XFS (loop0): start block 0x0 block count 0x0
 XFS (loop0): Internal error xfs_trans_cancel at line 1012 of file fs/xfs/xfs_trans.c.  Caller xfs_create+0x42a/0x670
 .....
 Call Trace:
  dump_stack+0x85/0xcb
  xfs_trans_cancel+0x19f/0x1c0
  xfs_create+0x42a/0x670
  xfs_generic_create+0x1f6/0x2c0
  vfs_create+0xf9/0x180
  do_mknodat+0x1f9/0x210
  do_syscall_64+0x5a/0x180
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
.....
 XFS (loop0): xfs_do_force_shutdown(0x8) called from line 1013 of file fs/xfs/xfs_trans.c.  Return address = ffffffff81500868
 XFS (loop0): Corruption of in-memory data detected.  Shutting down filesystem

Signed-off-by: Dave Chinner <dchinner@redhat.com>

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-06 08:12:00 -07:00
Dave Chinner
29cad0b3ed xfs: push corruption -> ESTALE conversion to xfs_nfs_get_inode()
In xfs_imap_to_bp(), we convert a -EFSCORRUPTED error to -EINVAL if
we are doing an untrusted lookup. This is done because we need
failed filehandle lookups to report -ESTALE to the caller, and it
does this by converting -EINVAL and -ENOENT errors to -ESTALE.

The squashing of EFSCORRUPTED in imap_to_bp makes it impossible for
for xfs_iget(UNTRUSTED) callers to determine the difference between
"inode does not exist" and "corruption detected during lookup". We
realy need that distinction in places calling xfS_iget(UNTRUSTED),
so move the filehandle error case handling all the way out to
xfs_nfs_get_inode() where it is needed.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-06 08:10:26 -07:00
Dave Chinner
541b5acc85 xfs: verify root inode more thoroughly
When looking up the root inode at mount time, we don't actually do
any verification to check that the inode is allocated and accounted
for correctly in the INOBT. Make the checks on the root inode more
robust by making it an untrusted lookup. This forces the inode
lookup to use the inode btree to verify the inode is allocated
and mapped correctly to disk. This will also have the effect of
catching a significant number of AGI/INOBT related corruptions in
AG 0 at mount time.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-06 08:10:26 -07:00
Dave Chinner
02a0fda875 xfs: verify COW extent size hint is valid in inode verifier
There are rules for vald extent size hints. We enforce them when
applications set them, but fuzzers violate those rules and that
screws us over. Validate COW extent size hint rules in the inode
verifier to catch this.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-06 08:10:26 -07:00
Dave Chinner
7d71a671a2 xfs: verify extent size hint is valid in inode verifier
There are rules for vald extent size hints. We enforce them when
applications set them, but fuzzers violate those rules and that
screws us over.

This results in alignment assertion failures when setting up
allocations such as this in direct IO:

XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432
....
Call Trace:
 xfs_bmap_btalloc+0x415/0x910
 xfs_bmapi_write+0x71c/0x12e0
 xfs_iomap_write_direct+0x2a9/0x420
 xfs_file_iomap_begin+0x4dc/0xa70
 iomap_apply+0x43/0x100
 iomap_file_buffered_write+0x62/0x90
 xfs_file_buffered_aio_write+0xba/0x300
 __vfs_write+0xd5/0x150
 vfs_write+0xb6/0x180
 ksys_write+0x45/0xa0
 do_syscall_64+0x5a/0x180
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

And from xfs_db:

core.extsize = 10380288

Which is not an integer multiple of the block size, and so violates
Rule #7 for setting extent size hints. Validate extent size hint
rules in the inode verifier to catch this.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-06 08:10:26 -07:00
Dave Chinner
fa4ca9c557 xfs: catch bad stripe alignment configurations
When stripe alignments are invalid, data alignment algorithms in the
allocator may not work correctly. Ensure we catch superblocks with
invalid stripe alignment setups at mount time. These data alignment
mismatches are now detected at mount time like this:

XFS (loop0): SB stripe unit sanity check failed
XFS (loop0): Metadata corruption detected at xfs_sb_read_verify+0xab/0x110, xfs_sb block 0xffffffffffffffff
XFS (loop0): Unmount and run xfs_repair
XFS (loop0): First 128 bytes of corrupted metadata buffer:
0000000091c2de02: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 10 00  XFSB............
0000000023bff869: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000000cdd8c893: 17 32 37 15 ff ca 46 3d 9a 17 d3 33 04 b5 f1 a2  .27...F=...3....
000000009fd2844f: 00 00 00 00 00 00 00 04 00 00 00 00 00 00 06 d0  ................
0000000088e9b0bb: 00 00 00 00 00 00 06 d1 00 00 00 00 00 00 06 d2  ................
00000000ff233a20: 00 00 00 01 00 00 10 00 00 00 00 01 00 00 00 00  ................
000000009db0ac8b: 00 00 03 60 e1 34 02 00 08 00 00 02 00 00 00 00  ...`.4..........
00000000f7022460: 00 00 00 00 00 00 00 00 0c 09 0b 01 0c 00 00 19  ................
XFS (loop0): SB validate failed with error -117.

And the mount fails.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-06 08:10:26 -07:00
Mathieu Desnoyers
d7822b1e24 rseq: Introduce restartable sequences system call
Expose a new system call allowing each thread to register one userspace
memory area to be used as an ABI between kernel and user-space for two
purposes: user-space restartable sequences and quick access to read the
current CPU number value from user-space.

* Restartable sequences (per-cpu atomics)

Restartables sequences allow user-space to perform update operations on
per-cpu data without requiring heavy-weight atomic operations.

The restartable critical sections (percpu atomics) work has been started
by Paul Turner and Andrew Hunter. It lets the kernel handle restart of
critical sections. [1] [2] The re-implementation proposed here brings a
few simplifications to the ABI which facilitates porting to other
architectures and speeds up the user-space fast path.

Here are benchmarks of various rseq use-cases.

Test hardware:

arm32: ARMv7 Processor rev 4 (v7l) "Cubietruck", 2-core
x86-64: Intel E5-2630 v3@2.40GHz, 16-core, hyperthreading

The following benchmarks were all performed on a single thread.

* Per-CPU statistic counter increment

                getcpu+atomic (ns/op)    rseq (ns/op)    speedup
arm32:                344.0                 31.4          11.0
x86-64:                15.3                  2.0           7.7

* LTTng-UST: write event 32-bit header, 32-bit payload into tracer
             per-cpu buffer

                getcpu+atomic (ns/op)    rseq (ns/op)    speedup
arm32:               2502.0                 2250.0         1.1
x86-64:               117.4                   98.0         1.2

* liburcu percpu: lock-unlock pair, dereference, read/compare word

                getcpu+atomic (ns/op)    rseq (ns/op)    speedup
arm32:                751.0                 128.5          5.8
x86-64:                53.4                  28.6          1.9

* jemalloc memory allocator adapted to use rseq

Using rseq with per-cpu memory pools in jemalloc at Facebook (based on
rseq 2016 implementation):

The production workload response-time has 1-2% gain avg. latency, and
the P99 overall latency drops by 2-3%.

* Reading the current CPU number

Speeding up reading the current CPU number on which the caller thread is
running is done by keeping the current CPU number up do date within the
cpu_id field of the memory area registered by the thread. This is done
by making scheduler preemption set the TIF_NOTIFY_RESUME flag on the
current thread. Upon return to user-space, a notify-resume handler
updates the current CPU value within the registered user-space memory
area. User-space can then read the current CPU number directly from
memory.

Keeping the current cpu id in a memory area shared between kernel and
user-space is an improvement over current mechanisms available to read
the current CPU number, which has the following benefits over
alternative approaches:

- 35x speedup on ARM vs system call through glibc
- 20x speedup on x86 compared to calling glibc, which calls vdso
  executing a "lsl" instruction,
- 14x speedup on x86 compared to inlined "lsl" instruction,
- Unlike vdso approaches, this cpu_id value can be read from an inline
  assembly, which makes it a useful building block for restartable
  sequences.
- The approach of reading the cpu id through memory mapping shared
  between kernel and user-space is portable (e.g. ARM), which is not the
  case for the lsl-based x86 vdso.

On x86, yet another possible approach would be to use the gs segment
selector to point to user-space per-cpu data. This approach performs
similarly to the cpu id cache, but it has two disadvantages: it is
not portable, and it is incompatible with existing applications already
using the gs segment selector for other purposes.

Benchmarking various approaches for reading the current CPU number:

ARMv7 Processor rev 4 (v7l)
Machine model: Cubietruck
- Baseline (empty loop):                                    8.4 ns
- Read CPU from rseq cpu_id:                               16.7 ns
- Read CPU from rseq cpu_id (lazy register):               19.8 ns
- glibc 2.19-0ubuntu6.6 getcpu:                           301.8 ns
- getcpu system call:                                     234.9 ns

x86-64 Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz:
- Baseline (empty loop):                                    0.8 ns
- Read CPU from rseq cpu_id:                                0.8 ns
- Read CPU from rseq cpu_id (lazy register):                0.8 ns
- Read using gs segment selector:                           0.8 ns
- "lsl" inline assembly:                                   13.0 ns
- glibc 2.19-0ubuntu6 getcpu:                              16.6 ns
- getcpu system call:                                      53.9 ns

- Speed (benchmark taken on v8 of patchset)

Running 10 runs of hackbench -l 100000 seems to indicate, contrary to
expectations, that enabling CONFIG_RSEQ slightly accelerates the
scheduler:

Configuration: 2 sockets * 8-core Intel(R) Xeon(R) CPU E5-2630 v3 @
2.40GHz (directly on hardware, hyperthreading disabled in BIOS, energy
saving disabled in BIOS, turboboost disabled in BIOS, cpuidle.off=1
kernel parameter), with a Linux v4.6 defconfig+localyesconfig,
restartable sequences series applied.

* CONFIG_RSEQ=n

avg.:      41.37 s
std.dev.:   0.36 s

* CONFIG_RSEQ=y

avg.:      40.46 s
std.dev.:   0.33 s

- Size

On x86-64, between CONFIG_RSEQ=n/y, the text size increase of vmlinux is
567 bytes, and the data size increase of vmlinux is 5696 bytes.

[1] https://lwn.net/Articles/650333/
[2] http://www.linuxplumbersconf.org/2013/ocw/system/presentations/1695/original/LPC%20-%20PerCpu%20Atomics.pdf

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Watson <davejwatson@fb.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Chris Lameter <cl@linux.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Andrew Hunter <ahh@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Maurer <bmaurer@fb.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-api@vger.kernel.org
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20151027235635.16059.11630.stgit@pjt-glaptop.roam.corp.google.com
Link: http://lkml.kernel.org/r/20150624222609.6116.86035.stgit@kitami.mtv.corp.google.com
Link: https://lkml.kernel.org/r/20180602124408.8430-3-mathieu.desnoyers@efficios.com
2018-06-06 11:58:31 +02:00
Linus Torvalds
af6c5d5e01 Merge branch 'for-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:

 - make kworkers report the workqueue it is executing or has executed
   most recently in /proc/PID/comm (so they show up in ps/top)

 - CONFIG_SMP shuffle to move stuff which isn't necessary for UP builds
   inside CONFIG_SMP.

* 'for-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: move function definitions within CONFIG_SMP block
  workqueue: Make sure struct worker is accessible for wq_worker_comm()
  workqueue: Show the latest workqueue name in /proc/PID/{comm,stat,status}
  proc: Consolidate task->comm formatting into proc_task_name()
  workqueue: Set worker->desc to workqueue name by default
  workqueue: Make worker_attach/detach_pool() update worker->pool
  workqueue: Replace pool->attach_mutex with global wq_pool_attach_mutex
2018-06-05 17:31:33 -07:00
Deepa Dinamani
95582b0083 vfs: change inode times to use struct timespec64
struct timespec is not y2038 safe. Transition vfs to use
y2038 safe struct timespec64 instead.

The change was made with the help of the following cocinelle
script. This catches about 80% of the changes.
All the header file and logic changes are included in the
first 5 rules. The rest are trivial substitutions.
I avoid changing any of the function signatures or any other
filesystem specific data structures to keep the patch simple
for review.

The script can be a little shorter by combining different cases.
But, this version was sufficient for my usecase.

virtual patch

@ depends on patch @
identifier now;
@@
- struct timespec
+ struct timespec64
  current_time ( ... )
  {
- struct timespec now = current_kernel_time();
+ struct timespec64 now = current_kernel_time64();
  ...
- return timespec_trunc(
+ return timespec64_trunc(
  ... );
  }

@ depends on patch @
identifier xtime;
@@
 struct \( iattr \| inode \| kstat \) {
 ...
-       struct timespec xtime;
+       struct timespec64 xtime;
 ...
 }

@ depends on patch @
identifier t;
@@
 struct inode_operations {
 ...
int (*update_time) (...,
-       struct timespec t,
+       struct timespec64 t,
...);
 ...
 }

@ depends on patch @
identifier t;
identifier fn_update_time =~ "update_time$";
@@
 fn_update_time (...,
- struct timespec *t,
+ struct timespec64 *t,
 ...) { ... }

@ depends on patch @
identifier t;
@@
lease_get_mtime( ... ,
- struct timespec *t
+ struct timespec64 *t
  ) { ... }

@te depends on patch forall@
identifier ts;
local idexpression struct inode *inode_node;
identifier i_xtime =~ "^i_[acm]time$";
identifier ia_xtime =~ "^ia_[acm]time$";
identifier fn_update_time =~ "update_time$";
identifier fn;
expression e, E3;
local idexpression struct inode *node1;
local idexpression struct inode *node2;
local idexpression struct iattr *attr1;
local idexpression struct iattr *attr2;
local idexpression struct iattr attr;
identifier i_xtime1 =~ "^i_[acm]time$";
identifier i_xtime2 =~ "^i_[acm]time$";
identifier ia_xtime1 =~ "^ia_[acm]time$";
identifier ia_xtime2 =~ "^ia_[acm]time$";
@@
(
(
- struct timespec ts;
+ struct timespec64 ts;
|
- struct timespec ts = current_time(inode_node);
+ struct timespec64 ts = current_time(inode_node);
)

<+... when != ts
(
- timespec_equal(&inode_node->i_xtime, &ts)
+ timespec64_equal(&inode_node->i_xtime, &ts)
|
- timespec_equal(&ts, &inode_node->i_xtime)
+ timespec64_equal(&ts, &inode_node->i_xtime)
|
- timespec_compare(&inode_node->i_xtime, &ts)
+ timespec64_compare(&inode_node->i_xtime, &ts)
|
- timespec_compare(&ts, &inode_node->i_xtime)
+ timespec64_compare(&ts, &inode_node->i_xtime)
|
ts = current_time(e)
|
fn_update_time(..., &ts,...)
|
inode_node->i_xtime = ts
|
node1->i_xtime = ts
|
ts = inode_node->i_xtime
|
<+... attr1->ia_xtime ...+> = ts
|
ts = attr1->ia_xtime
|
ts.tv_sec
|
ts.tv_nsec
|
btrfs_set_stack_timespec_sec(..., ts.tv_sec)
|
btrfs_set_stack_timespec_nsec(..., ts.tv_nsec)
|
- ts = timespec64_to_timespec(
+ ts =
...
-)
|
- ts = ktime_to_timespec(
+ ts = ktime_to_timespec64(
...)
|
- ts = E3
+ ts = timespec_to_timespec64(E3)
|
- ktime_get_real_ts(&ts)
+ ktime_get_real_ts64(&ts)
|
fn(...,
- ts
+ timespec64_to_timespec(ts)
,...)
)
...+>
(
<... when != ts
- return ts;
+ return timespec64_to_timespec(ts);
...>
)
|
- timespec_equal(&node1->i_xtime1, &node2->i_xtime2)
+ timespec64_equal(&node1->i_xtime2, &node2->i_xtime2)
|
- timespec_equal(&node1->i_xtime1, &attr2->ia_xtime2)
+ timespec64_equal(&node1->i_xtime2, &attr2->ia_xtime2)
|
- timespec_compare(&node1->i_xtime1, &node2->i_xtime2)
+ timespec64_compare(&node1->i_xtime1, &node2->i_xtime2)
|
node1->i_xtime1 =
- timespec_trunc(attr1->ia_xtime1,
+ timespec64_trunc(attr1->ia_xtime1,
...)
|
- attr1->ia_xtime1 = timespec_trunc(attr2->ia_xtime2,
+ attr1->ia_xtime1 =  timespec64_trunc(attr2->ia_xtime2,
...)
|
- ktime_get_real_ts(&attr1->ia_xtime1)
+ ktime_get_real_ts64(&attr1->ia_xtime1)
|
- ktime_get_real_ts(&attr.ia_xtime1)
+ ktime_get_real_ts64(&attr.ia_xtime1)
)

@ depends on patch @
struct inode *node;
struct iattr *attr;
identifier fn;
identifier i_xtime =~ "^i_[acm]time$";
identifier ia_xtime =~ "^ia_[acm]time$";
expression e;
@@
(
- fn(node->i_xtime);
+ fn(timespec64_to_timespec(node->i_xtime));
|
 fn(...,
- node->i_xtime);
+ timespec64_to_timespec(node->i_xtime));
|
- e = fn(attr->ia_xtime);
+ e = fn(timespec64_to_timespec(attr->ia_xtime));
)

@ depends on patch forall @
struct inode *node;
struct iattr *attr;
identifier i_xtime =~ "^i_[acm]time$";
identifier ia_xtime =~ "^ia_[acm]time$";
identifier fn;
@@
{
+ struct timespec ts;
<+...
(
+ ts = timespec64_to_timespec(node->i_xtime);
fn (...,
- &node->i_xtime,
+ &ts,
...);
|
+ ts = timespec64_to_timespec(attr->ia_xtime);
fn (...,
- &attr->ia_xtime,
+ &ts,
...);
)
...+>
}

@ depends on patch forall @
struct inode *node;
struct iattr *attr;
struct kstat *stat;
identifier ia_xtime =~ "^ia_[acm]time$";
identifier i_xtime =~ "^i_[acm]time$";
identifier xtime =~ "^[acm]time$";
identifier fn, ret;
@@
{
+ struct timespec ts;
<+...
(
+ ts = timespec64_to_timespec(node->i_xtime);
ret = fn (...,
- &node->i_xtime,
+ &ts,
...);
|
+ ts = timespec64_to_timespec(node->i_xtime);
ret = fn (...,
- &node->i_xtime);
+ &ts);
|
+ ts = timespec64_to_timespec(attr->ia_xtime);
ret = fn (...,
- &attr->ia_xtime,
+ &ts,
...);
|
+ ts = timespec64_to_timespec(attr->ia_xtime);
ret = fn (...,
- &attr->ia_xtime);
+ &ts);
|
+ ts = timespec64_to_timespec(stat->xtime);
ret = fn (...,
- &stat->xtime);
+ &ts);
)
...+>
}

@ depends on patch @
struct inode *node;
struct inode *node2;
identifier i_xtime1 =~ "^i_[acm]time$";
identifier i_xtime2 =~ "^i_[acm]time$";
identifier i_xtime3 =~ "^i_[acm]time$";
struct iattr *attrp;
struct iattr *attrp2;
struct iattr attr ;
identifier ia_xtime1 =~ "^ia_[acm]time$";
identifier ia_xtime2 =~ "^ia_[acm]time$";
struct kstat *stat;
struct kstat stat1;
struct timespec64 ts;
identifier xtime =~ "^[acmb]time$";
expression e;
@@
(
( node->i_xtime2 \| attrp->ia_xtime2 \| attr.ia_xtime2 \) = node->i_xtime1  ;
|
 node->i_xtime2 = \( node2->i_xtime1 \| timespec64_trunc(...) \);
|
 node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
|
 node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
|
 stat->xtime = node2->i_xtime1;
|
 stat1.xtime = node2->i_xtime1;
|
( node->i_xtime2 \| attrp->ia_xtime2 \) = attrp->ia_xtime1  ;
|
( attrp->ia_xtime1 \| attr.ia_xtime1 \) = attrp2->ia_xtime2;
|
- e = node->i_xtime1;
+ e = timespec64_to_timespec( node->i_xtime1 );
|
- e = attrp->ia_xtime1;
+ e = timespec64_to_timespec( attrp->ia_xtime1 );
|
node->i_xtime1 = current_time(...);
|
 node->i_xtime2 = node->i_xtime1 = node->i_xtime3 =
- e;
+ timespec_to_timespec64(e);
|
 node->i_xtime1 = node->i_xtime3 =
- e;
+ timespec_to_timespec64(e);
|
- node->i_xtime1 = e;
+ node->i_xtime1 = timespec_to_timespec64(e);
)

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: <anton@tuxera.com>
Cc: <balbi@kernel.org>
Cc: <bfields@fieldses.org>
Cc: <darrick.wong@oracle.com>
Cc: <dhowells@redhat.com>
Cc: <dsterba@suse.com>
Cc: <dwmw2@infradead.org>
Cc: <hch@lst.de>
Cc: <hirofumi@mail.parknet.co.jp>
Cc: <hubcap@omnibond.com>
Cc: <jack@suse.com>
Cc: <jaegeuk@kernel.org>
Cc: <jaharkes@cs.cmu.edu>
Cc: <jslaby@suse.com>
Cc: <keescook@chromium.org>
Cc: <mark@fasheh.com>
Cc: <miklos@szeredi.hu>
Cc: <nico@linaro.org>
Cc: <reiserfs-devel@vger.kernel.org>
Cc: <richard@nod.at>
Cc: <sage@redhat.com>
Cc: <sfrench@samba.org>
Cc: <swhiteho@redhat.com>
Cc: <tj@kernel.org>
Cc: <trond.myklebust@primarydata.com>
Cc: <tytso@mit.edu>
Cc: <viro@zeniv.linux.org.uk>
2018-06-05 16:57:31 -07:00
Kees Cook
7aaa822ed0 pstore: Convert internal records to timespec64
This prepares pstore for converting the VFS layer to timespec64.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
2018-06-05 16:57:31 -07:00
Linus Torvalds
ec064d3c6b Driver core changes for 4.18-rc1
Here is the driver core patchset for 4.18-rc1.
 
 The large chunk of these are firmware core documentation and api
 updates.  Nothing major there, just better descriptions for others to be
 able to understand the firmware code better.  There's also a user for a
 new firmware api call.
 
 Other than that, there are some minor updates for debugfs, kernfs, and
 the driver core itself.
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWxbZAg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykbiACgu/2qqou1iV4GxOkvrj5wfsHD+lYAoNPNNDHu
 Qf3CCEKbxogF6YowDiAH
 =Dsqq
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the driver core patchset for 4.18-rc1.

  The large chunk of these are firmware core documentation and api
  updates. Nothing major there, just better descriptions for others to
  be able to understand the firmware code better. There's also a user
  for a new firmware api call.

  Other than that, there are some minor updates for debugfs, kernfs, and
  the driver core itself.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'driver-core-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (23 commits)
  driver core: hold dev's parent lock when needed
  driver-core: return EINVAL error instead of BUG_ON()
  driver core: add __printf verification to device_create_groups_vargs
  mm: memory_hotplug: use put_device() if device_register fail
  base: core: fix typo 'can by' to 'can be'
  debugfs: inode: debugfs_create_dir uses mode permission from parent
  debugfs: Re-use kstrtobool_from_user()
  Documentation: clarify firmware_class provenance and why we can't rename the module
  Documentation: remove stale firmware API reference
  Documentation: fix few typos and clarifications for the firmware loader
  ath10k: re-enable the firmware fallback mechanism for testmode
  ath10k: use firmware_request_nowarn() to load firmware
  firmware: add firmware_request_nowarn() - load firmware without warnings
  firmware_loader: make firmware_fallback_sysfs() print more useful
  firmware_loader: move kconfig FW_LOADER entries to its own file
  firmware_loader: replace ---help--- with help
  firmware_loader: enhance Kconfig documentation over FW_LOADER
  firmware_loader: document firmware_sysfs_fallback()
  firmware: rename fw_sysfs_fallback to firmware_fallback_sysfs()
  firmware: use () to terminate kernel-doc function names
  ...
2018-06-05 16:29:19 -07:00
Steve French
d5f07fb3ef CIFS: Pass page offset for encrypting
Encryption function needs to read data starting page offset from input
buffer.

This doesn't affect decryption path since it allocates its own page
buffers.

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-06-05 17:46:24 -05:00
Long Li
4c0d2a5a64 CIFS: Pass page offset for calculating signature
When calculating signature for the packet, it needs to read into the
correct page offset for the data.

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-06-05 17:44:30 -05:00
Long Li
7cf20bce77 CIFS: SMBD: Support page offset in memory registration
Change code to pass the correct page offset during memory registration for
RDMA read/write.

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-06-05 17:43:59 -05:00
Long Li
6509f50cd1 CIFS: SMBD: Support page offset in RDMA recv
RDMA recv function needs to place data to the correct place starting at
page offset.

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-06-05 17:43:27 -05:00
Long Li
b6903bcf0a CIFS: SMBD: Support page offset in RDMA send
The RDMA send function needs to look at offset in the request pages, and
send data starting from there.

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-06-05 17:42:03 -05:00
Long Li
e8157b2729 CIFS: When sending data on socket, pass the correct page offset
It's possible that the offset is non-zero in the page to send, change the
function to pass this offset to socket.

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-06-05 17:41:27 -05:00