Commit Graph

751108 Commits

Author SHA1 Message Date
Linus Torvalds e241e3f2bf virtio: feature
This adds reporting hugepage stats to virtio-balloon.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJaziF/AAoJECgfDbjSjVRpVu8H/Aw8MRgCDNx85w6HdruPeJWx
 NzRGAlZLaCnTc23PJ+bcAeribyPSeuTIj3M7QOMaY1fVGV8MmpQfS5lzdvmL9vJ/
 Lug/7f+QNYLlao1QlszVg+4n79BRtXvH6qOdS+nj8zvTbm/pCr3ec/yrBv4Rfqy5
 TWrZcceQ7Jhw/7EF7AFUxkmw2/TpRV/4yF9wOgDabshAytdN3PAzs38IYtOa+BLp
 bUiJTXGPeYe0M4qkZ6zfwU2fLZqc2DCSFAagPb8jU46OfcViH8/fYfPbm5kQ7X81
 LlSOg/ui6+ZJPHWzDjDy8N/HWpi0Qqbbdne60pKJC7dPlyQMRb2m5w6TqivmPyg=
 =QwFg
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio update from Michael Tsirkin:
 "This adds reporting hugepage stats to virtio-balloon"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio_balloon: export hugetlb page allocation counts
2018-04-11 18:58:27 -07:00
Linus Torvalds e5c372280b IOMMU Updates for Linux v4.17
These updates come with:
 
 	- OF_IOMMU support for the Rockchip iommu driver so that it can
 	  use generic DT bindings
 
 	- Rework of locking in the AMD IOMMU interrupt remapping code to
 	  make it work better in RT kernels
 
 	- Support for improved iotlb flushing in the AMD IOMMU driver
 
 	- Support for 52-bit physical and virtual addressing in the
 	  ARM-SMMU
 
 	- Various other small fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJazi2BAAoJECvwRC2XARrjYVwP/AyXK7CjRvaiHFopIUO0WpwY
 V3GiKrODtSNHqPSKuFnqIssIhxZPw/SKFz6E/pe08pZ/pxHYTxeTL78Wz7D1+4Sp
 n0YokSM5qLb660OTQVnyKNCku8cEMCb9hkQ/75SFgwcILQYF93cZBDIdBn93OKVO
 6xAOE+tqd8Daulnk0YpdiCTFTJPzYHPl6B7scoUav26uaKxWeMJxeYe+EXC+4WQG
 U1u/jDiVXyllzGgRqqfrmO4L2acmsK8HL97hD4+m1URJKDlb8ho6xwaRThFZWqXS
 SbrYnvH0ruWGrLiQKmVUssw8FqbcXCzq3236g2O8jE4jqWSm70twg+q31iMjwD7v
 bwsJGMkk7aLrquv9Zpaylpf8tRECk5bjhTFC2zB0pdum5XLx47j0IHKWMLPYhkCz
 E0pBefvuhoSTbt/5X0urSRzH2Hk4ljEsM+QjlfH8SN3ALTljFjay607wbxC7t35M
 LEL5AuNsDDBddoJIi9D13CdJEZa4lps8dbpB8m40lQVvmiLPLcKraaG0RfKQ397T
 wsxhsDOQYM2FCwfUP3n8RTsMKRIp/UVkKY+2G7AsKofciSeulK6nDbrV7jFnitx4
 vTxbRgpNejJpqzKZG/W9lCGWk1BhmQK/Cbu6JW5IA4+ew9omWkFp61U6rtc645Te
 6cNEYBiMz/RZIiC2b18J
 =kte5
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU updates from Joerg Roedel:

 - OF_IOMMU support for the Rockchip iommu driver so that it can use
   generic DT bindings

 - rework of locking in the AMD IOMMU interrupt remapping code to make
   it work better in RT kernels

 - support for improved iotlb flushing in the AMD IOMMU driver

 - support for 52-bit physical and virtual addressing in the ARM-SMMU

 - various other small fixes and cleanups

* tag 'iommu-updates-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (53 commits)
  iommu/io-pgtable-arm: Avoid warning with 32-bit phys_addr_t
  iommu/rockchip: Support sharing IOMMU between masters
  iommu/rockchip: Add runtime PM support
  iommu/rockchip: Fix error handling in init
  iommu/rockchip: Use OF_IOMMU to attach devices automatically
  iommu/rockchip: Use IOMMU device for dma mapping operations
  dt-bindings: iommu/rockchip: Add clock property
  iommu/rockchip: Control clocks needed to access the IOMMU
  iommu/rockchip: Fix TLB flush of secondary IOMMUs
  iommu/rockchip: Use iopoll helpers to wait for hardware
  iommu/rockchip: Fix error handling in attach
  iommu/rockchip: Request irqs in rk_iommu_probe()
  iommu/rockchip: Fix error handling in probe
  iommu/rockchip: Prohibit unbind and remove
  iommu/amd: Return proper error code in irq_remapping_alloc()
  iommu/amd: Make amd_iommu_devtable_lock a spin_lock
  iommu/amd: Drop the lock while allocating new irq remap table
  iommu/amd: Factor out setting the remap table for a devid
  iommu/amd: Use `table' instead `irt' as variable name in amd_iommu_update_ga()
  iommu/amd: Remove the special case from alloc_irq_table()
  ...
2018-04-11 18:50:41 -07:00
Linus Torvalds 1fe43114ea More power management updates for 4.17-rc1
- Rework the idle loop in order to prevent CPUs from spending too
    much time in shallow idle states by making it stop the scheduler
    tick before putting the CPU into an idle state only if the idle
    duration predicted by the idle governor is long enough.  That
    required the code to be reordered to invoke the idle governor
    before stopping the tick, among other things (Rafael Wysocki,
    Frederic Weisbecker, Arnd Bergmann).
 
  - Add the missing description of the residency sysfs attribute to
    the cpuidle documentation (Prashanth Prakash).
 
  - Finalize the cpufreq cleanup moving frequency table validation
    from drivers to the core (Viresh Kumar).
 
  - Fix a clock leak regression in the armada-37xx cpufreq driver
    (Gregory Clement).
 
  - Fix the initialization of the CPU performance data structures
    for shared policies in the CPPC cpufreq driver (Shunyong Yang).
 
  - Clean up the ti-cpufreq, intel_pstate and CPPC cpufreq drivers
    a bit (Viresh Kumar, Rafael Wysocki).
 
  - Mark the expected switch fall-throughs in the PM QoS core (Gustavo
    Silva).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJazfv7AAoJEILEb/54YlRx/kYP+gPOX5O5cFF22Y2xvDHPMWjm
 D/3Nc2aRo+5DuHHECSIJ3ZVQzVoamN5zQ1KbsBRV0bJgwim4fw4M199Jr/0I2nES
 1pkByuxLrAtwb83uX3uBIQnwgKOAwRftOTeVaFaMoXgIbyUqK7ZFkGq0xQTnKqor
 6+J+78O7wMaIZ0YXQP98BC6g96vs/f+ICrh7qqY85r4NtO/thTA1IKevBmlFeIWR
 yVhEYgwSFBaWehKK8KgbshmBBEk3qzDOYfwZF/JprPhiN/6madgHgYjHC8Seok5c
 QUUTRlyO1ULTQe4JulyJUKobx7HE9u/FXC0RjbBiKPnYR4tb9Hd8OpajPRZo96AT
 8IQCdzL2Iw/ZyQsmQZsWeO1HwPTwVlF/TO2gf6VdQtH221izuHG025p8/RcZe6zb
 fTTFhh6/tmBvmOlbKMwxaLbGbwcj/5W5GvQXlXAtaElLobwwNEcEyVfF4jo4Zx/U
 DQc7agaAps67lcgFAqNDy0PoU6bxV7yoiAIlTJHO9uyPkDNyIfb0ZPlmdIi3xYZd
 tUD7C+VBezrNCkw7JWL1xXLFfJ5X7K6x5bi9I7TBj1l928Hak0dwzs7KlcNBtF1Y
 SwnJsNa3kxunGsPajya8dy5gdO0aFeB9Bse0G429+ugk2IJO/Q9M9nQUArJiC9Xl
 Gw1bw5Ynv6lx+r5EqxHa
 =Pnk4
 -----END PGP SIGNATURE-----

Merge tag 'pm-4.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more power management updates from Rafael Wysocki:
 "These include one big-ticket item which is the rework of the idle loop
  in order to prevent CPUs from spending too much time in shallow idle
  states. It reduces idle power on some systems by 10% or more and may
  improve performance of workloads in which the idle loop overhead
  matters. This has been in the works for several weeks and it has been
  tested and reviewed quite thoroughly.

  Also included are changes that finalize the cpufreq cleanup moving
  frequency table validation from drivers to the core, a few fixes and
  cleanups of cpufreq drivers, a cpuidle documentation update and a PM
  QoS core update to mark the expected switch fall-throughs in it.

  Specifics:

   - Rework the idle loop in order to prevent CPUs from spending too
     much time in shallow idle states by making it stop the scheduler
     tick before putting the CPU into an idle state only if the idle
     duration predicted by the idle governor is long enough.

     That required the code to be reordered to invoke the idle governor
     before stopping the tick, among other things (Rafael Wysocki,
     Frederic Weisbecker, Arnd Bergmann).

   - Add the missing description of the residency sysfs attribute to the
     cpuidle documentation (Prashanth Prakash).

   - Finalize the cpufreq cleanup moving frequency table validation from
     drivers to the core (Viresh Kumar).

   - Fix a clock leak regression in the armada-37xx cpufreq driver
     (Gregory Clement).

   - Fix the initialization of the CPU performance data structures for
     shared policies in the CPPC cpufreq driver (Shunyong Yang).

   - Clean up the ti-cpufreq, intel_pstate and CPPC cpufreq drivers a
     bit (Viresh Kumar, Rafael Wysocki).

   - Mark the expected switch fall-throughs in the PM QoS core (Gustavo
     Silva)"

* tag 'pm-4.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (23 commits)
  tick-sched: avoid a maybe-uninitialized warning
  cpufreq: Drop cpufreq_table_validate_and_show()
  cpufreq: SCMI: Don't validate the frequency table twice
  cpufreq: CPPC: Initialize shared perf capabilities of CPUs
  cpufreq: armada-37xx: Fix clock leak
  cpufreq: CPPC: Don't set transition_latency
  cpufreq: ti-cpufreq: Use builtin_platform_driver()
  cpufreq: intel_pstate: Do not include debugfs.h
  PM / QoS: mark expected switch fall-throughs
  cpuidle: Add definition of residency to sysfs documentation
  time: hrtimer: Use timerqueue_iterate_next() to get to the next timer
  nohz: Avoid duplication of code related to got_idle_tick
  nohz: Gather tick_sched booleans under a common flag field
  cpuidle: menu: Avoid selecting shallow states with stopped tick
  cpuidle: menu: Refine idle state selection for running tick
  sched: idle: Select idle state before stopping the tick
  time: hrtimer: Introduce hrtimer_next_event_without()
  time: tick-sched: Split tick_nohz_stop_sched_tick()
  cpuidle: Return nohz hint from cpuidle_select()
  jiffies: Introduce USER_TICK_USEC and redefine TICK_USEC
  ...
2018-04-11 17:03:20 -07:00
Linus Torvalds 9697376759 These commits have either been sitting in my INBOX or have been
in my local tree for some time. I need to push them upstream:
 
  - Separate out config-bisect.pl from ktest.pl.
    This allows users to do config bisects without full ktest setup.
 
  - Email on status change.
    Allow the user to be emailed on test start, finish, failure, etc.
 
  - Other small fixes and enhancements
 -----BEGIN PGP SIGNATURE-----
 
 iQHIBAABCgAyFiEEPm6V/WuN2kyArTUe1a05Y9njSUkFAlrOCmcUHHJvc3RlZHRA
 Z29vZG1pcy5vcmcACgkQ1a05Y9njSUnVhAv/Xa30lY98HFbssw2dUcGEtbv16em6
 iqcExca3tDBRN0JRx299WEozjOANI5OUWcNZP0PcVBBRKdn0RvyAhxj76P7Y+8MH
 tFbkiLhQqxPrGq+VQdWnmqC3V8yHTFk4yMlwowTvH+6F6ev8YtQbOU6aNcRFcke1
 lFzYxpU3KqlS1zm23zjzKazKJJTfP7DVtEDkoNEBK6xlRDz0PAVd8ectSbAShBEl
 9xODhPDeVI4fAxxt1uK4rhHU17+XFIHHuuftetT5NNuPTnhsarfVOse+fJxvi0Gn
 Ijfgzutad5HERsMZWOhhPy9IZItGg+tHceXAbPx98stZrCeCxWHRVZ9R2uDxa/2J
 4/9dCcXxDcjCMyqMsEtwyyuJrK7Nslsn0VqcbVRS1ModlSfyqvy81neOZ3g9B7Dd
 0nSBh+5rOirI/X82Ye8lQnZN5CjEZsUrYwtSK1iKzBeGiitdD+GbI4AaWrzvAlUc
 VzvUJ45tmhnodETJ2emddgpEFjHU0JGjSL70
 =19bA
 -----END PGP SIGNATURE-----

Merge tag 'ktest-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest

Pull ktest updates from Steven Rostedt:
 "These commits have either been sitting in my INBOX or have been in my
  local tree for some time. I need to push them upstream:

   - Separate out config-bisect.pl from ktest.pl.

     This allows users to do config bisects without full ktest setup.

   - Email on status change.

     Allow the user to be emailed on test start, finish, failure, etc.

   - Other small fixes and enhancements"

* tag 'ktest-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest: (24 commits)
  ktest: Take submenu into account for grub2 menus
  ktest.pl: Add MAIL_COMMAND option to define how to send email
  ktest.pl: Use run_command to execute sending mail
  ktest.pl: Allow dodie be recursive
  ktest.pl: Kill test if mailer is not supported
  ktest.pl: Add MAIL_PATH option to define where to find the mailer
  ktest.pl: No need to print no mailer is specified when mailto is not
  Ktest: add email options to sample.config
  Ktest: Use dodie for critical falures
  Ktest: Add SigInt handling
  Ktest: Add email support
  ktest.pl: Detect if a config-bisect was interrupted
  ktest.pl: Make finding config-bisect.pl dynamic
  ktest.pl: Have ktest.pl pass -r to config-bisect.pl to reset bisect
  ktest.pl: Use diffconfig if available for failed config bisects
  ktest.pl: Allow for the config-bisect.pl output to display to console
  ktest: Use config-bisect.pl in ktest.pl
  ktest: Add standalone config-bisect.pl program
  ktest: Set do_not_reboot=y for CONFIG_BISECT_TYPE=build
  ktest: Set buildonly=1 for CONFIG_BISECT_TYPE=build
  ...
2018-04-11 16:42:27 -07:00
Linus Torvalds 77cb51e65d This pull request contains updates for both UBI and UBIFS:
- Minor bug fixes and improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAlrNLAgWHHJpY2hhcmRA
 c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wekrD/9iAGatpW5+yMO293T+bYvGcMZP
 B+s1eFBq5/lBR/n8DpZQ9Mj/bE7hu6mLFf/QD8/18w8s7XQa7PA+VZnWmszitHVu
 mh/ciVYhxoJHXD3IxGTUkuP8CxmFEWh9VdebfWmEuKva7S2fYxTYEWuk5erjtjRm
 Wq+yzz0DkIHjm288DzVX1DloqdJHtyYkd6lDX8dS0hFHFDwee2QYIfB/4fmFsYjV
 H+lwwFo2L+8OY8qlu11Li7VGN38gaNS8YJQoGgpPSRPcEzzL6EBUMdoNBTEVQZgc
 Jm3VzCzkHxiN2cOJTC3auP2Lwj7NMcoJkB0s5ppFPZatla+m+r5TiiBAAxoZUDYe
 H1zg94M+X3n9yF8LBQcuu9vwYrcKsA+wHoO2AxHr/ERdY/K6NXQOJFeoKugFliwD
 3MlCz/WnQXsZI/6XgG4Lxi/WLReFXY/NPdkFAQWUagdLEKc08+mOnho7tEqVDdqM
 psVGB4twPkwSgNzjUt9JNu5O5DhVUr91E9zCaFG8GRwkQYDnMC24ehzcILDa2we3
 +/kU74F3ncd/Kzt+UTapPjbPpYNreeSmBWUtEmpCtxifCbN7P0YdL1Ew2UcN51/z
 4tID+uDybWkwSA4DW/CXHhkBXEpBEVJsAjN9VaF0oobztcdo05cSOYa0BxHVkAnq
 Pdp1eBwcGSFTLpJUBg==
 =d8gU
 -----END PGP SIGNATURE-----

Merge tag 'tags/upstream-4.17-rc1' of git://git.infradead.org/linux-ubifs

Pull UBI and UBIFS updates from Richard Weinberger:
 "Minor bug fixes and improvements"

* tag 'tags/upstream-4.17-rc1' of git://git.infradead.org/linux-ubifs:
  ubi: Reject MLC NAND
  ubifs: Remove useless parameter of lpt_heap_replace
  ubifs: Constify struct ubifs_lprops in scan_for_leb_for_idx
  ubifs: remove unnecessary assignment
  ubi: Fix error for write access
  ubi: fastmap: Don't flush fastmap work on detach
  ubifs: Check ubifs_wbuf_sync() return code
2018-04-11 16:39:34 -07:00
Linus Torvalds 375479c386 Merge git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:

 - a new and faster epoll based IRQ controller and NIC driver

 - misc fixes and janitorial updates

* git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  Fix vector raw inintialization logic
  Migrate vector timers to new timer API
  um: Compile with modern headers
  um: vector: Fix an error handling path in 'vector_parse()'
  um: vector: Fix a memory allocation check
  um: vector: fix missing unlock on error in vector_net_open()
  um: Add missing EXPORT for free_irq_by_fd()
  High Performance UML Vector Network Driver
  Epoll based IRQ controller
  um: Use POSIX ucontext_t instead of struct ucontext
  um: time: Use timespec64 for persistent clock
  um: Restore symbol versions for __memcpy and memcpy
2018-04-11 16:36:47 -07:00
Linus Torvalds 45df60cd2c ARM: SoC fixes for 4.17
Here is a very small set of fixes for inclusion in linux-4.17-rc1:
 Two changes for the maintainer file, and one more fix for the newly
 added npcm platform, to enable the level 2 cache controller.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJazib3AAoJEGCrR//JCVIndE8P/1vRE7HcR+DPVwjDWGdfDhQ+
 LuF86+eZRwxHewykA+RQpsdQ8c2N2g+5oL20EvtJpInS/2iE36evZicFIMHQKfnL
 cbij5fi2/hlCRGWfscr2g5Zq1KNPSLqyfRms5z27TRddYoZMYqMg67TwwkQ0tdmf
 WeQ+2Dg17SaZecGFLWmr8cJlo+bx6U32KYHOMo7X8mhW91GPEHLqh/u+LhVQPnnt
 LdZ4IcMH4lC/uXl39qojT8aWmzm2fcKkYBJAMbm3cL3tu76Jyy8rEy5AAdqjHsin
 Xb+wcnCfk6q+dO3OjFhA3bvgDw4LUIyntMiFQvANWWMiGrmIynw9VYIMiQ2Y0S15
 Kv9e2uwkUHiuWJZCDbFCa9pa0kDmxMpoC45wDBR2ktuAa4HB/BP6PKTFrLC0Fxag
 KJvRRDklxHDWbsA+mZTser1mEKG9kslnGNfF3LCALWnFkMgWKiV1NuLr8HucaO7d
 7b7ppofmjva+qcHjDd8I9r4qFnLcWDJHZg3cZus+4WjTHt5ws+ueb0pYPNIKvnE1
 dhvjY9dSmeEZrM8rqLcUD9Npnm4GA+ySFpFty1jPP4hb0Q+Ho1p8BssP1Ktxw8mX
 FQ59x/RYj/5+yPj7st0ERZsxir4D39H3PlYsTWeqjZCCG8VMgBX+UND0ozh7ytYM
 1+1r1us28shsL5Xp6HRo
 =kiZ6
 -----END PGP SIGNATURE-----

Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Arnd Bergmann:
 "Here is a very small set of fixes for inclusion in linux-4.17-rc1: Two
  changes for the maintainer file, and one more fix for the newly added
  npcm platform, to enable the level 2 cache controller"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  MAINTAINERS: Update ASPEED entry with details
  MAINTAINERS: Migrate oxnas list to groups.io
  arm: npcm: enable L2 cache in NPCM7xx architecture
2018-04-11 16:12:21 -07:00
Linus Torvalds b82b6813ff nios2 update for v4.17-rc1
nios2: Use read_persistent_clock64() instead of read_persistent_clock()
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJawyvYAAoJEFWoEK+e3syCrUoP/j/0Tdqhsh4WULaFh7InUJ/x
 Ww0C83PuOvzfmqgv8e9a7tciWjdd+lhs7jAROd2FLxa+WYwr1wdnvWZfie2XEscN
 p4clq28tTcNql3pRGFnjd//qDFziZf0XuKkdi2J6AnAu1nP3tY+y9KZD8mpgF0/9
 K4v7Xb51Grj4ltk10t1wZhsuTbkEw+fysZQFEiEwBrir11vVZi+gWpYdRetN8GrQ
 xw7MDxj1rK6tGDL+0vIsmmOKp0CLZKNrkUItBhb2aQUEadhJzZn7CMIhmkRGzLLF
 6eouYZ0rdS8rGWjn1otolXsnKaqPlHoLKR8oG/b4zpQzxNd55Aa11kEKzSNXySXH
 stFJj5lG4DGho2UJmEWk54NHRtbtNEWo0VXBMHgGtg9qoaNQBVVN+4wvV2EKfD3u
 jrGCH1V9ZEHsOsAGbip8mKSELZfxnWF9UBinxw252ZnZoWpLXISaxY/Z45ubK6jv
 t5hNWOwHQO2zvcSHEMn6Q+19CYumFpD7IAYN4sMuIYPrxOJcZfMhdkcJyQwXQ/sR
 OWjCbLneDuSpZ3G/MxG76ilS+K9JlRF0/l3fr0MJW186RloVwlD/AN1Zk4d+asVR
 +341gNiDsSxAXdEnpcBItXFHj7hLUE1mOUXCMdN+y1spPK8YGf68vCeinRMf8lXy
 DRVlYdx5Wdlrd7r6tKQD
 =l1lE
 -----END PGP SIGNATURE-----

Merge tag 'nios2-v4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2

Pull nios2 update from Ley Foon Tan:
 "Use read_persistent_clock64() instead of read_persistent_clock()"

* tag 'nios2-v4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2:
  nios2: Use read_persistent_clock64() instead of read_persistent_clock()
2018-04-11 16:02:18 -07:00
Linus Torvalds 8837c70d53 Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:

 - almost all of the rest of MM

 - kasan updates

 - lots of procfs work

 - misc things

 - lib/ updates

 - checkpatch

 - rapidio

 - ipc/shm updates

 - the start of willy's XArray conversion

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (140 commits)
  page cache: use xa_lock
  xarray: add the xa_lock to the radix_tree_root
  fscache: use appropriate radix tree accessors
  export __set_page_dirty
  unicore32: turn flush_dcache_mmap_lock into a no-op
  arm64: turn flush_dcache_mmap_lock into a no-op
  mac80211_hwsim: use DEFINE_IDA
  radix tree: use GFP_ZONEMASK bits of gfp_t for flags
  linux/const.h: refactor _BITUL and _BITULL a bit
  linux/const.h: move UL() macro to include/linux/const.h
  linux/const.h: prefix include guard of uapi/linux/const.h with _UAPI
  xen, mm: allow deferred page initialization for xen pv domains
  elf: enforce MAP_FIXED on overlaying elf segments
  fs, elf: drop MAP_FIXED usage from elf_map
  mm: introduce MAP_FIXED_NOREPLACE
  MAINTAINERS: update bouncing aacraid@adaptec.com addresses
  fs/dcache.c: add cond_resched() in shrink_dentry_list()
  include/linux/kfifo.h: fix comment
  ipc/shm.c: shm_split(): remove unneeded test for NULL shm_file_data.vm_ops
  kernel/sysctl.c: add kdoc comments to do_proc_do{u}intvec_minmax_conv_param
  ...
2018-04-11 10:51:26 -07:00
Matthew Wilcox b93b016313 page cache: use xa_lock
Remove the address_space ->tree_lock and use the xa_lock newly added to
the radix_tree_root.  Rename the address_space ->page_tree to ->i_pages,
since we don't really care that it's a tree.

[willy@infradead.org: fix nds32, fs/dax.c]
  Link: http://lkml.kernel.org/r/20180406145415.GB20605@bombadil.infradead.orgLink: http://lkml.kernel.org/r/20180313132639.17387-9-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:39 -07:00
Matthew Wilcox f6bb2a2c0b xarray: add the xa_lock to the radix_tree_root
This results in no change in structure size on 64-bit machines as it
fits in the padding between the gfp_t and the void *.  32-bit machines
will grow the structure from 8 to 12 bytes.  Almost all radix trees are
protected with (at least) a spinlock, so as they are converted from
radix trees to xarrays, the data structures will shrink again.

Initialising the spinlock requires a name for the benefit of lockdep, so
RADIX_TREE_INIT() now needs to know the name of the radix tree it's
initialising, and so do IDR_INIT() and IDA_INIT().

Also add the xa_lock() and xa_unlock() family of wrappers to make it
easier to use the lock.  If we could rely on -fplan9-extensions in the
compiler, we could avoid all of this syntactic sugar, but that wasn't
added until gcc 4.6.

Link: http://lkml.kernel.org/r/20180313132639.17387-8-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:39 -07:00
Matthew Wilcox e5a9554196 fscache: use appropriate radix tree accessors
Don't open-code accesses to data structure internals.

Link: http://lkml.kernel.org/r/20180313132639.17387-7-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:39 -07:00
Matthew Wilcox f82b376413 export __set_page_dirty
XFS currently contains a copy-and-paste of __set_page_dirty().  Export
it from buffer.c instead.

Link: http://lkml.kernel.org/r/20180313132639.17387-6-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:39 -07:00
Matthew Wilcox d339d705f7 unicore32: turn flush_dcache_mmap_lock into a no-op
Unicore doesn't walk the VMA tree in its flush_dcache_page()
implementation, so has no need to take the tree_lock.

Link: http://lkml.kernel.org/r/20180313132639.17387-5-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:39 -07:00
Matthew Wilcox 427c896f26 arm64: turn flush_dcache_mmap_lock into a no-op
ARM64 doesn't walk the VMA tree in its flush_dcache_page()
implementation, so has no need to take the tree_lock.

Link: http://lkml.kernel.org/r/20180313132639.17387-4-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:39 -07:00
Matthew Wilcox 60a052719a mac80211_hwsim: use DEFINE_IDA
This is preferred to opencoding an IDA_INIT.

Link: http://lkml.kernel.org/r/20180313132639.17387-2-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:39 -07:00
Matthew Wilcox fa290cda10 radix tree: use GFP_ZONEMASK bits of gfp_t for flags
Patch series "XArray", v9.  (First part thereof).

This patchset is, I believe, appropriate for merging for 4.17.  It
contains the XArray implementation, to eventually replace the radix
tree, and converts the page cache to use it.

This conversion keeps the radix tree and XArray data structures in sync
at all times.  That allows us to convert the page cache one function at
a time and should allow for easier bisection.  Other than renaming some
elements of the structures, the data structures are fundamentally
unchanged; a radix tree walk and an XArray walk will touch the same
number of cachelines.  I have changes planned to the XArray data
structure, but those will happen in future patches.

Improvements the XArray has over the radix tree:

 - The radix tree provides operations like other trees do; 'insert' and
   'delete'. But what most users really want is an automatically
   resizing array, and so it makes more sense to give users an API that
   is like an array -- 'load' and 'store'. We still have an 'insert'
   operation for users that really want that semantic.

 - The XArray considers locking as part of its API. This simplifies a
   lot of users who formerly had to manage their own locking just for
   the radix tree. It also improves code generation as we can now tell
   RCU that we're holding a lock and it doesn't need to generate as much
   fencing code. The other advantage is that tree nodes can be moved
   (not yet implemented).

 - GFP flags are now parameters to calls which may need to allocate
   memory. The radix tree forced users to decide what the allocation
   flags would be at creation time. It's much clearer to specify them at
   allocation time.

 - Memory is not preloaded; we don't tie up dozens of pages on the off
   chance that the slab allocator fails. Instead, we drop the lock,
   allocate a new node and retry the operation. We have to convert all
   the radix tree, IDA and IDR preload users before we can realise this
   benefit, but I have not yet found a user which cannot be converted.

 - The XArray provides a cmpxchg operation. The radix tree forces users
   to roll their own (and at least four have).

 - Iterators take a 'max' parameter. That simplifies many users and will
   reduce the amount of iteration done.

 - Iteration can proceed backwards. We only have one user for this, but
   since it's called as part of the pagefault readahead algorithm, that
   seemed worth mentioning.

 - RCU-protected pointers are not exposed as part of the API. There are
   some fun bugs where the page cache forgets to use rcu_dereference()
   in the current codebase.

 - Value entries gain an extra bit compared to radix tree exceptional
   entries. That gives us the extra bit we need to put huge page swap
   entries in the page cache.

 - Some iterators now take a 'filter' argument instead of having
   separate iterators for tagged/untagged iterations.

The page cache is improved by this:

 - Shorter, easier to read code

 - More efficient iterations

 - Reduction in size of struct address_space

 - Fewer walks from the top of the data structure; the XArray API
   encourages staying at the leaf node and conducting operations there.

This patch (of 8):

None of these bits may be used for slab allocations, so we can use them
as radix tree flags as long as we mask them off before passing them to
the slab allocator. Move the IDR flag from the high bits to the
GFP_ZONEMASK bits.

Link: http://lkml.kernel.org/r/20180313132639.17387-3-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:39 -07:00
Masahiro Yamada 21e7bc600e linux/const.h: refactor _BITUL and _BITULL a bit
Minor cleanups available by _UL and _ULL.

Link: http://lkml.kernel.org/r/1519301715-31798-5-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:38 -07:00
Masahiro Yamada 2dd8a62c64 linux/const.h: move UL() macro to include/linux/const.h
ARM, ARM64 and UniCore32 duplicate the definition of UL():

  #define UL(x) _AC(x, UL)

This is not actually arch-specific, so it will be useful to move it to a
common header.  Currently, we only have the uapi variant for
linux/const.h, so I am creating include/linux/const.h.

I also added _UL(), _ULL() and ULL() because _AC() is mostly used in
the form either _AC(..., UL) or _AC(..., ULL).  I expect they will be
replaced in follow-up cleanups.  The underscore-prefixed ones should
be used for exported headers.

Link: http://lkml.kernel.org/r/1519301715-31798-4-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:38 -07:00
Masahiro Yamada 2a6cc8a6c0 linux/const.h: prefix include guard of uapi/linux/const.h with _UAPI
Patch series "linux/const.h: cleanups of macros such as UL(), _BITUL(),
BIT() etc", v3.

ARM, ARM64, UniCore32 define UL() as a shorthand of _AC(..., UL).  More
architectures may introduce it in the future.

UL() is arch-agnostic, and useful. So let's move it to
include/linux/const.h

Currently, <asm/memory.h> must be included to use UL().  It pulls in more
bloats just for defining some bit macros.

I posted V2 one year ago.

The previous posts are:
https://patchwork.kernel.org/patch/9498273/
https://patchwork.kernel.org/patch/9498275/
https://patchwork.kernel.org/patch/9498269/
https://patchwork.kernel.org/patch/9498271/

At that time, what blocked this series was a comment from
David Howells:
  You need to be very careful doing this.  Some userspace stuff
  depends on the guard macro names on the kernel header files.

(https://patchwork.kernel.org/patch/9498275/)

Looking at the code closer, I noticed this is not a problem.

See the following line.
https://github.com/torvalds/linux/blob/v4.16-rc2/scripts/headers_install.sh#L40

scripts/headers_install.sh rips off _UAPI prefix from guard macro names.

I ran "make headers_install" and confirmed the result is what I expect.

So, we can prefix the include guard of include/uapi/linux/const.h,
and add a new include/linux/const.h.

This patch (of 4):

I am going to add include/linux/const.h for the kernel space.

Add _UAPI to the include guard of include/uapi/linux/const.h to
prepare for that.

Please notice the guard name of the exported one will be kept as-is.
So, this commit has no impact to the userspace even if some userspace
stuff depends on the guard macro names.

scripts/headers_install.sh processes exported headers by SED, and
rips off "_UAPI" from guard macro names.

  #ifndef _UAPI_LINUX_CONST_H
  #define _UAPI_LINUX_CONST_H

will be turned into

  #ifndef _LINUX_CONST_H
  #define _LINUX_CONST_H

Link: http://lkml.kernel.org/r/1519301715-31798-2-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:38 -07:00
Pavel Tatashin 6f84f8d158 xen, mm: allow deferred page initialization for xen pv domains
Juergen Gross noticed that commit f7f99100d8 ("mm: stop zeroing memory
during allocation in vmemmap") broke XEN PV domains when deferred struct
page initialization is enabled.

This is because the xen's PagePinned() flag is getting erased from
struct pages when they are initialized later in boot.

Juergen fixed this problem by disabling deferred pages on xen pv
domains.  It is desirable, however, to have this feature available as it
reduces boot time.  This fix re-enables the feature for pv-dmains, and
fixes the problem the following way:

The fix is to delay setting PagePinned flag until struct pages for all
allocated memory are initialized, i.e.  until after free_all_bootmem().

A new x86_init.hyper op init_after_bootmem() is called to let xen know
that boot allocator is done, and hence struct pages for all the
allocated memory are now initialized.  If deferred page initialization
is enabled, the rest of struct pages are going to be initialized later
in boot once page_alloc_init_late() is called.

xen_after_bootmem() walks page table's pages and marks them pinned.

Link: http://lkml.kernel.org/r/20180226160112.24724-2-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Tested-by: Juergen Gross <jgross@suse.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Jinbum Park <jinb.park7@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Jia Zhang <zhang.jia@linux.alibaba.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:38 -07:00
Michal Hocko ad55eac74f elf: enforce MAP_FIXED on overlaying elf segments
Anshuman has reported that with "fs, elf: drop MAP_FIXED usage from
elf_map" applied, some ELF binaries in his environment fail to start
with

 [   23.423642] 9148 (sed): Uhuuh, elf segment at 0000000010030000 requested but the memory is mapped already
 [   23.423706] requested [10030000, 10040000] mapped [10030000, 10040000] 100073 anon

The reason is that the above binary has overlapping elf segments:

  LOAD           0x0000000000000000 0x0000000010000000 0x0000000010000000
                 0x0000000000013a8c 0x0000000000013a8c  R E    10000
  LOAD           0x000000000001fd40 0x000000001002fd40 0x000000001002fd40
                 0x00000000000002c0 0x00000000000005e8  RW     10000
  LOAD           0x0000000000020328 0x0000000010030328 0x0000000010030328
                 0x0000000000000384 0x00000000000094a0  RW     10000

That binary has two RW LOAD segments, the first crosses a page border
into the second

  0x1002fd40 (LOAD2-vaddr) + 0x5e8 (LOAD2-memlen) == 0x10030328 (LOAD3-vaddr)

Handle this situation by enforcing MAP_FIXED when we establish a
temporary brk VMA to handle overlapping segments.  All other mappings
will still use MAP_FIXED_NOREPLACE.

Link: http://lkml.kernel.org/r/20180213100440.GM3443@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Andrei Vagin <avagin@openvz.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Kees Cook <keescook@chromium.org>
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Mark Brown <broonie@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:38 -07:00
Michal Hocko 4ed2863951 fs, elf: drop MAP_FIXED usage from elf_map
Both load_elf_interp and load_elf_binary rely on elf_map to map segments
on a controlled address and they use MAP_FIXED to enforce that.  This is
however dangerous thing prone to silent data corruption which can be
even exploitable.

Let's take CVE-2017-1000253 as an example.  At the time (before commit
eab09532d400: "binfmt_elf: use ELF_ET_DYN_BASE only for PIE")
ELF_ET_DYN_BASE was at TASK_SIZE / 3 * 2 which is not that far away from
the stack top on 32b (legacy) memory layout (only 1GB away).  Therefore
we could end up mapping over the existing stack with some luck.

The issue has been fixed since then (a87938b2e246: "fs/binfmt_elf.c: fix
bug in loading of PIE binaries"), ELF_ET_DYN_BASE moved moved much
further from the stack (eab09532d4 and later by c715b72c1ba4: "mm:
revert x86_64 and arm64 ELF_ET_DYN_BASE base changes") and excessive
stack consumption early during execve fully stopped by da029c11e6
("exec: Limit arg stack to at most 75% of _STK_LIM").  So we should be
safe and any attack should be impractical.  On the other hand this is
just too subtle assumption so it can break quite easily and hard to
spot.

I believe that the MAP_FIXED usage in load_elf_binary (et. al) is still
fundamentally dangerous.  Moreover it shouldn't be even needed.  We are
at the early process stage and so there shouldn't be unrelated mappings
(except for stack and loader) existing so mmap for a given address should
succeed even without MAP_FIXED.  Something is terribly wrong if this is
not the case and we should rather fail than silently corrupt the
underlying mapping.

Address this issue by changing MAP_FIXED to the newly added
MAP_FIXED_NOREPLACE.  This will mean that mmap will fail if there is an
existing mapping clashing with the requested one without clobbering it.

[mhocko@suse.com: fix build]
[akpm@linux-foundation.org: coding-style fixes]
[avagin@openvz.org: don't use the same value for MAP_FIXED_NOREPLACE and MAP_SYNC]
  Link: http://lkml.kernel.org/r/20171218184916.24445-1-avagin@openvz.org
Link: http://lkml.kernel.org/r/20171213092550.2774-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:38 -07:00
Michal Hocko a4ff8e8620 mm: introduce MAP_FIXED_NOREPLACE
Patch series "mm: introduce MAP_FIXED_NOREPLACE", v2.

This has started as a follow up discussion [3][4] resulting in the
runtime failure caused by hardening patch [5] which removes MAP_FIXED
from the elf loader because MAP_FIXED is inherently dangerous as it
might silently clobber an existing underlying mapping (e.g.  stack).
The reason for the failure is that some architectures enforce an
alignment for the given address hint without MAP_FIXED used (e.g.  for
shared or file backed mappings).

One way around this would be excluding those archs which do alignment
tricks from the hardening [6].  The patch is really trivial but it has
been objected, rightfully so, that this screams for a more generic
solution.  We basically want a non-destructive MAP_FIXED.

The first patch introduced MAP_FIXED_NOREPLACE which enforces the given
address but unlike MAP_FIXED it fails with EEXIST if the given range
conflicts with an existing one.  The flag is introduced as a completely
new one rather than a MAP_FIXED extension because of the backward
compatibility.  We really want a never-clobber semantic even on older
kernels which do not recognize the flag.  Unfortunately mmap sucks
wrt flags evaluation because we do not EINVAL on unknown flags.  On
those kernels we would simply use the traditional hint based semantic so
the caller can still get a different address (which sucks) but at least
not silently corrupt an existing mapping.  I do not see a good way
around that.  Except we won't export expose the new semantic to the
userspace at all.

It seems there are users who would like to have something like that.
Jemalloc has been mentioned by Michael Ellerman [7]

Florian Weimer has mentioned the following:
: glibc ld.so currently maps DSOs without hints.  This means that the kernel
: will map right next to each other, and the offsets between them a completely
: predictable.  We would like to change that and supply a random address in a
: window of the address space.  If there is a conflict, we do not want the
: kernel to pick a non-random address. Instead, we would try again with a
: random address.

John Hubbard has mentioned CUDA example
: a) Searches /proc/<pid>/maps for a "suitable" region of available
: VA space.  "Suitable" generally means it has to have a base address
: within a certain limited range (a particular device model might
: have odd limitations, for example), it has to be large enough, and
: alignment has to be large enough (again, various devices may have
: constraints that lead us to do this).
:
: This is of course subject to races with other threads in the process.
:
: Let's say it finds a region starting at va.
:
: b) Next it does:
:     p = mmap(va, ...)
:
: *without* setting MAP_FIXED, of course (so va is just a hint), to
: attempt to safely reserve that region. If p != va, then in most cases,
: this is a failure (almost certainly due to another thread getting a
: mapping from that region before we did), and so this layer now has to
: call munmap(), before returning a "failure: retry" to upper layers.
:
:     IMPROVEMENT: --> if instead, we could call this:
:
:             p = mmap(va, ... MAP_FIXED_NOREPLACE ...)
:
:         , then we could skip the munmap() call upon failure. This
:         is a small thing, but it is useful here. (Thanks to Piotr
:         Jaroszynski and Mark Hairgrove for helping me get that detail
:         exactly right, btw.)
:
: c) After that, CUDA suballocates from p, via:
:
:      q = mmap(sub_region_start, ... MAP_FIXED ...)
:
: Interestingly enough, "freeing" is also done via MAP_FIXED, and
: setting PROT_NONE to the subregion. Anyway, I just included (c) for
: general interest.

Atomic address range probing in the multithreaded programs in general
sounds like an interesting thing to me.

The second patch simply replaces MAP_FIXED use in elf loader by
MAP_FIXED_NOREPLACE.  I believe other places which rely on MAP_FIXED
should follow.  Actually real MAP_FIXED usages should be docummented
properly and they should be more of an exception.

[1] http://lkml.kernel.org/r/20171116101900.13621-1-mhocko@kernel.org
[2] http://lkml.kernel.org/r/20171129144219.22867-1-mhocko@kernel.org
[3] http://lkml.kernel.org/r/20171107162217.382cd754@canb.auug.org.au
[4] http://lkml.kernel.org/r/1510048229.12079.7.camel@abdul.in.ibm.com
[5] http://lkml.kernel.org/r/20171023082608.6167-1-mhocko@kernel.org
[6] http://lkml.kernel.org/r/20171113094203.aofz2e7kueitk55y@dhcp22.suse.cz
[7] http://lkml.kernel.org/r/87efp1w7vy.fsf@concordia.ellerman.id.au

This patch (of 2):

MAP_FIXED is used quite often to enforce mapping at the particular range.
The main problem of this flag is, however, that it is inherently dangerous
because it unmaps existing mappings covered by the requested range.  This
can cause silent memory corruptions.  Some of them even with serious
security implications.  While the current semantic might be really
desiderable in many cases there are others which would want to enforce the
given range but rather see a failure than a silent memory corruption on a
clashing range.  Please note that there is no guarantee that a given range
is obeyed by the mmap even when it is free - e.g.  arch specific code is
allowed to apply an alignment.

Introduce a new MAP_FIXED_NOREPLACE flag for mmap to achieve this
behavior.  It has the same semantic as MAP_FIXED wrt.  the given address
request with a single exception that it fails with EEXIST if the requested
address is already covered by an existing mapping.  We still do rely on
get_unmaped_area to handle all the arch specific MAP_FIXED treatment and
check for a conflicting vma after it returns.

The flag is introduced as a completely new one rather than a MAP_FIXED
extension because of the backward compatibility.  We really want a
never-clobber semantic even on older kernels which do not recognize the
flag.  Unfortunately mmap sucks wrt.  flags evaluation because we do not
EINVAL on unknown flags.  On those kernels we would simply use the
traditional hint based semantic so the caller can still get a different
address (which sucks) but at least not silently corrupt an existing
mapping.  I do not see a good way around that.

[mpe@ellerman.id.au: fix whitespace]
[fail on clashing range with EEXIST as per Florian Weimer]
[set MAP_FIXED before round_hint_to_min as per Khalid Aziz]
Link: http://lkml.kernel.org/r/20171213092550.2774-2-mhocko@kernel.org
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Russell King - ARM Linux <linux@armlinux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Jason Evans <jasone@google.com>
Cc: David Goldblatt <davidtgoldblatt@gmail.com>
Cc: Edward Tomasz Napierała <trasz@FreeBSD.org>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:38 -07:00
Joe Perches 721d8b41ab MAINTAINERS: update bouncing aacraid@adaptec.com addresses
Adaptec is now part of Microsemi.

Commit 2a81ffdd9d ("MAINTAINERS: Update email address for aacraid")
updated only one of the driver maintainer addresses.

Update the other two sections as the aacraid@adaptec.com address
bounces.

Link: http://lkml.kernel.org/r/1522103936.12357.27.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:38 -07:00
Nikolay Borisov 32785c0539 fs/dcache.c: add cond_resched() in shrink_dentry_list()
As previously reported (https://patchwork.kernel.org/patch/8642031/)
it's possible to call shrink_dentry_list with a large number of dentries
(> 10000).  This, in turn, could trigger the softlockup detector and
possibly trigger a panic.  In addition to the unmount path being
vulnerable to this scenario, at SuSE we've observed similar situation
happening during process exit on processes that touch a lot of dentries.
Here is an excerpt from a crash dump.  The number after the colon are
the number of dentries on the list passed to shrink_dentry_list:

PID 99760: 10722
PID 107530: 215
PID 108809: 24134
PID 108877: 21331
PID 141708: 16487

So we want to kill between 15k-25k dentries without yielding.

And one possible call stack looks like:

4 [ffff8839ece41db0] _raw_spin_lock at ffffffff8152a5f8
5 [ffff8839ece41db0] evict at ffffffff811c3026
6 [ffff8839ece41dd0] __dentry_kill at ffffffff811bf258
7 [ffff8839ece41df0] shrink_dentry_list at ffffffff811bf593
8 [ffff8839ece41e18] shrink_dcache_parent at ffffffff811bf830
9 [ffff8839ece41e50] proc_flush_task at ffffffff8120dd61
10 [ffff8839ece41ec0] release_task at ffffffff81059ebd
11 [ffff8839ece41f08] do_exit at ffffffff8105b8ce
12 [ffff8839ece41f78] sys_exit at ffffffff8105bd53
13 [ffff8839ece41f80] system_call_fastpath at ffffffff81532909

While some of the callers of shrink_dentry_list do use cond_resched,
this is not sufficient to prevent softlockups.  So just move
cond_resched into shrink_dentry_list from its callers.

David said: I've found hundreds of occurrences of warnings that we emit
when need_resched stays set for a prolonged period of time with the
stack trace that is included in the change log.

Link: http://lkml.kernel.org/r/1521718946-31521-1-git-send-email-nborisov@suse.com
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:38 -07:00
Valentin Vidic de99626c2e include/linux/kfifo.h: fix comment
Clean up unusual formatting in the note about locking.

Link: http://lkml.kernel.org/r/20180324002630.13046-1-Valentin.Vidic@CARNet.hr
Signed-off-by: Valentin Vidic <Valentin.Vidic@CARNet.hr>
Cc: Stefani Seibold <stefani@seibold.net>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Sean Young <sean@mess.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:38 -07:00
Andrew Morton a61fc2cbdf ipc/shm.c: shm_split(): remove unneeded test for NULL shm_file_data.vm_ops
This was added by the recent "ipc/shm.c: add split function to
shm_vm_ops", but it is not necessary.

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:38 -07:00
Waiman Long 24704f3619 kernel/sysctl.c: add kdoc comments to do_proc_do{u}intvec_minmax_conv_param
Kdoc comments are added to the do_proc_dointvec_minmax_conv_param and
do_proc_douintvec_minmax_conv_param structures thare are used internally
for range checking.

The error codes returned by proc_dointvec_minmax() and
proc_douintvec_minmax() are also documented.

Link: http://lkml.kernel.org/r/1519926220-7453-3-git-send-email-longman@redhat.com
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:38 -07:00
Waiman Long 64a11f3dc2 fs/proc/proc_sysctl.c: fix typo in sysctl_check_table_array()
Patch series "ipc: Clamp *mni to the real IPCMNI limit", v3.

The sysctl parameters msgmni, shmmni and semmni have an inherent limit
of IPC_MNI (32k).  However, users may not be aware of that because they
can write a value much higher than that without getting any error or
notification.  Reading the parameters back will show the newly written
values which are not real.

Enforcing the limit by failing sysctl parameter write, however, can
break existing user applications.  To address this delemma, a new flags
field is introduced into the ctl_table.  The value CTL_FLAGS_CLAMP_RANGE
can be added to any ctl_table entries to enable a looser range clamping
without returning any error.  For example,

  .flags = CTL_FLAGS_CLAMP_RANGE,

This flags value are now used for the range checking of shmmni, msgmni
and semmni without breaking existing applications.  If any out of range
value is written to those sysctl parameters, the following warning will
be printed instead.

  Kernel parameter "shmmni" was set out of range [0, 32768], clamped to 32768.

Reading the values back will show 32768 instead of some fake values.

This patch (of 6):

Fix a typo.

Link: http://lkml.kernel.org/r/1519926220-7453-2-git-send-email-longman@redhat.com
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:38 -07:00
Davidlohr Bueso 23c8cec8cf ipc/msg: introduce msgctl(MSG_STAT_ANY)
There is a permission discrepancy when consulting msq ipc object
metadata between /proc/sysvipc/msg (0444) and the MSG_STAT shmctl
command.  The later does permission checks for the object vs S_IRUGO.
As such there can be cases where EACCESS is returned via syscall but the
info is displayed anyways in the procfs files.

While this might have security implications via info leaking (albeit no
writing to the msq metadata), this behavior goes way back and showing
all the objects regardless of the permissions was most likely an
overlook - so we are stuck with it.  Furthermore, modifying either the
syscall or the procfs file can cause userspace programs to break (ie
ipcs).  Some applications require getting the procfs info (without root
privileges) and can be rather slow in comparison with a syscall -- up to
500x in some reported cases for shm.

This patch introduces a new MSG_STAT_ANY command such that the msq ipc
object permissions are ignored, and only audited instead.  In addition,
I've left the lsm security hook checks in place, as if some policy can
block the call, then the user has no other choice than just parsing the
procfs file.

Link: http://lkml.kernel.org/r/20180215162458.10059-4-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reported-by: Robert Kettler <robert.kettler@outlook.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:37 -07:00
Davidlohr Bueso a280d6dc77 ipc/sem: introduce semctl(SEM_STAT_ANY)
There is a permission discrepancy when consulting shm ipc object
metadata between /proc/sysvipc/sem (0444) and the SEM_STAT semctl
command.  The later does permission checks for the object vs S_IRUGO.
As such there can be cases where EACCESS is returned via syscall but the
info is displayed anyways in the procfs files.

While this might have security implications via info leaking (albeit no
writing to the sma metadata), this behavior goes way back and showing
all the objects regardless of the permissions was most likely an
overlook - so we are stuck with it.  Furthermore, modifying either the
syscall or the procfs file can cause userspace programs to break (ie
ipcs).  Some applications require getting the procfs info (without root
privileges) and can be rather slow in comparison with a syscall -- up to
500x in some reported cases for shm.

This patch introduces a new SEM_STAT_ANY command such that the sem ipc
object permissions are ignored, and only audited instead.  In addition,
I've left the lsm security hook checks in place, as if some policy can
block the call, then the user has no other choice than just parsing the
procfs file.

Link: http://lkml.kernel.org/r/20180215162458.10059-3-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reported-by: Robert Kettler <robert.kettler@outlook.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:37 -07:00
Davidlohr Bueso c21a6970ae ipc/shm: introduce shmctl(SHM_STAT_ANY)
Patch series "sysvipc: introduce STAT_ANY commands", v2.

The following patches adds the discussed (see [1]) new command for shm
as well as for sems and msq as they are subject to the same
discrepancies for ipc object permission checks between the syscall and
via procfs.  These new commands are justified in that (1) we are stuck
with this semantics as changing syscall and procfs can break userland;
and (2) some users can benefit from performance (for large amounts of
shm segments, for example) from not having to parse the procfs
interface.

Once merged, I will submit the necesary manpage updates.  But I'm thinking
something like:

: diff --git a/man2/shmctl.2 b/man2/shmctl.2
: index 7bb503999941..bb00bbe21a57 100644
: --- a/man2/shmctl.2
: +++ b/man2/shmctl.2
: @@ -41,6 +41,7 @@
:  .\" 2005-04-25, mtk -- noted aberrant Linux behavior w.r.t. new
:  .\"	attaches to a segment that has already been marked for deletion.
:  .\" 2005-08-02, mtk: Added IPC_INFO, SHM_INFO, SHM_STAT descriptions.
: +.\" 2018-02-13, dbueso: Added SHM_STAT_ANY description.
:  .\"
:  .TH SHMCTL 2 2017-09-15 "Linux" "Linux Programmer's Manual"
:  .SH NAME
: @@ -242,6 +243,18 @@ However, the
:  argument is not a segment identifier, but instead an index into
:  the kernel's internal array that maintains information about
:  all shared memory segments on the system.
: +.TP
: +.BR SHM_STAT_ANY " (Linux-specific)"
: +Return a
: +.I shmid_ds
: +structure as for
: +.BR SHM_STAT .
: +However, the
: +.I shm_perm.mode
: +is not checked for read access for
: +.IR shmid ,
: +resembing the behaviour of
: +/proc/sysvipc/shm.
:  .PP
:  The caller can prevent or allow swapping of a shared
:  memory segment with the following \fIcmd\fP values:
: @@ -287,7 +300,7 @@ operation returns the index of the highest used entry in the
:  kernel's internal array recording information about all
:  shared memory segments.
:  (This information can be used with repeated
: -.B SHM_STAT
: +.B SHM_STAT/SHM_STAT_ANY
:  operations to obtain information about all shared memory segments
:  on the system.)
:  A successful
: @@ -328,7 +341,7 @@ isn't accessible.
:  \fIshmid\fP is not a valid identifier, or \fIcmd\fP
:  is not a valid command.
:  Or: for a
: -.B SHM_STAT
: +.B SHM_STAT/SHM_STAT_ANY
:  operation, the index value specified in
:  .I shmid
:  referred to an array slot that is currently unused.

This patch (of 3):

There is a permission discrepancy when consulting shm ipc object metadata
between /proc/sysvipc/shm (0444) and the SHM_STAT shmctl command.  The
later does permission checks for the object vs S_IRUGO.  As such there can
be cases where EACCESS is returned via syscall but the info is displayed
anyways in the procfs files.

While this might have security implications via info leaking (albeit no
writing to the shm metadata), this behavior goes way back and showing all
the objects regardless of the permissions was most likely an overlook - so
we are stuck with it.  Furthermore, modifying either the syscall or the
procfs file can cause userspace programs to break (ie ipcs).  Some
applications require getting the procfs info (without root privileges) and
can be rather slow in comparison with a syscall -- up to 500x in some
reported cases.

This patch introduces a new SHM_STAT_ANY command such that the shm ipc
object permissions are ignored, and only audited instead.  In addition,
I've left the lsm security hook checks in place, as if some policy can
block the call, then the user has no other choice than just parsing the
procfs file.

[1] https://lkml.org/lkml/2017/12/19/220

Link: http://lkml.kernel.org/r/20180215162458.10059-2-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Robert Kettler <robert.kettler@outlook.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:37 -07:00
Chris Wilson edc41b3c54 kernel/params.c: downgrade warning for unsafe parameters
As using an unsafe module parameter is, by its very definition, an
expected user action, emitting a warning is overkill.  Nothing has yet
gone wrong, and we add a taint flag for any future oops should something
actually go wrong.  So instead of having a user controllable pr_warn,
downgrade it to a pr_notice for "a normal, but significant condition".

We make use of unsafe kernel parameters in igt
(https://cgit.freedesktop.org/drm/igt-gpu-tools/) (we have not yet
succeeded in removing all such debugging options), which generates a
warning and taints the kernel.  The warning is unhelpful as we then need
to filter it out again as we check that every test themselves do not
provoke any kernel warnings.

Link: http://lkml.kernel.org/r/20180226151919.9674-1-chris@chris-wilson.co.uk
Fixes: 91f9d330cc ("module: make it possible to have unsafe, tainting module params")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Petri Latvala <petri.latvala@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:37 -07:00
Randy Dunlap 2d87b309a5 kernel/sysctl.c: fix sizeof argument to match variable name
Fix sizeof argument to be the same as the data variable name.  Probably
a copy/paste error.

Mostly harmless since both variables are unsigned int.

Fixes kernel bugzilla #197371:
  Possible access to unintended variable in "kernel/sysctl.c" line 1339
https://bugzilla.kernel.org/show_bug.cgi?id=197371

Link: http://lkml.kernel.org/r/e0d0531f-361e-ef5f-8499-32743ba907e1@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Petru Mihancea <petrum@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:37 -07:00
Ioan Nicu bbd876adb8 rapidio: use a reference count for struct mport_dma_req
Once the dma request is passed to the DMA engine, the DMA subsystem
would hold a pointer to this structure and could call the completion
callback after do_dma_request() has timed out.

The current code deals with this by putting timed out SYNC requests to a
pending list and freeing them later, when the mport cdev device is
released.  This still does not guarantee that the DMA subsystem is
really done with those transfers, so in theory
dma_xfer_callback/dma_req_free could be called after
mport_cdev_release_dma and could potentially access already freed
memory.

This patch simplifies the current handling by using a kref in the mport
dma request structure, so that it gets freed only when nobody uses it
anymore.

This also simplifies the code a bit, as FAF transfers are now handled in
the same way as SYNC and ASYNC transfers.  There is no need anymore for
the pending list and for the dma workqueue which was used in case of FAF
transfers, so we remove them both.

Link: http://lkml.kernel.org/r/20180405203342.GA16191@nokia.com
Signed-off-by: Ioan Nicu <ioan.nicu.ext@nokia.com>
Acked-by: Alexandre Bounine <alex.bou9@gmail.com>
Cc: Barry Wood <barry.wood@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Frank Kunz <frank.kunz@nokia.com>
Cc: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:37 -07:00
Vasyl Gomonovych b94bb1f610 drivers/rapidio/rio-scan.c: fix typo in comment
Fix typo in the words 'receiver', 'specified', 'during'

Link: http://lkml.kernel.org/r/20180321211035.8904-1-gomonovych@gmail.com
Signed-off-by: Vasyl Gomonovych <gomonovych@gmail.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Alexandre Bounine <alexandre.bounine@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:37 -07:00
Kees Cook c31dbb146d exec: pin stack limit during exec
Since the stack rlimit is used in multiple places during exec and it can
be changed via other threads (via setrlimit()) or processes (via
prlimit()), the assumption that the value doesn't change cannot be made.
This leads to races with mm layout selection and argument size
calculations.  This changes the exec path to use the rlimit stored in
bprm instead of in current.  Before starting the thread, the bprm stack
rlimit is stored back to current.

Link: http://lkml.kernel.org/r/1518638796-20819-4-git-send-email-keescook@chromium.org
Fixes: 64701dee41 ("exec: Use sane stack rlimit under secureexec")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Reported-by: Andy Lutomirski <luto@kernel.org>
Reported-by: Brad Spengler <spender@grsecurity.net>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Greg KH <greg@kroah.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:37 -07:00
Kees Cook b838383133 exec: introduce finalize_exec() before start_thread()
Provide a final callback into fs/exec.c before start_thread() takes
over, to handle any last-minute changes, like the coming restoration of
the stack limit.

Link: http://lkml.kernel.org/r/1518638796-20819-3-git-send-email-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Greg KH <greg@kroah.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:37 -07:00
Kees Cook 8f2af155b5 exec: pass stack rlimit into mm layout functions
Patch series "exec: Pin stack limit during exec".

Attempts to solve problems with the stack limit changing during exec
continue to be frustrated[1][2].  In addition to the specific issues
around the Stack Clash family of flaws, Andy Lutomirski pointed out[3]
other places during exec where the stack limit is used and is assumed to
be unchanging.  Given the many places it gets used and the fact that it
can be manipulated/raced via setrlimit() and prlimit(), I think the only
way to handle this is to move away from the "current" view of the stack
limit and instead attach it to the bprm, and plumb this down into the
functions that need to know the stack limits.  This series implements
the approach.

[1] 04e35f4495 ("exec: avoid RLIMIT_STACK races with prlimit()")
[2] 779f4e1c6c ("Revert "exec: avoid RLIMIT_STACK races with prlimit()"")
[3] to security@kernel.org, "Subject: existing rlimit races?"

This patch (of 3):

Since it is possible that the stack rlimit can change externally during
exec (either via another thread calling setrlimit() or another process
calling prlimit()), provide a way to pass the rlimit down into the
per-architecture mm layout functions so that the rlimit can stay in the
bprm structure instead of sitting in the signal structure until exec is
finalized.

Link: http://lkml.kernel.org/r/1518638796-20819-2-git-send-email-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Hugh Dickins <hughd@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Greg KH <greg@kroah.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Cc: Brad Spengler <spender@grsecurity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:37 -07:00
Alexey Dobriyan d64d01a155 seq_file: account everything to kmemcg
All it takes to open a file and read 1 byte from it.

seq_file will be allocated along with any private allocations, and more
importantly seq file buffer which is 1 page by default.

Link: http://lkml.kernel.org/r/20180310085252.GB17121@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:36 -07:00
Alexey Dobriyan 0965232035 seq_file: allocate seq_file from kmem_cache
For fine-grained debugging and usercopy protection.

Link: http://lkml.kernel.org/r/20180310085027.GA17121@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:36 -07:00
Andrew Morton 9ad553abe6 fs/reiserfs/journal.c: add missing resierfs_warning() arg
One use of the reiserfs_warning() macro in journal_init_dev() is missing
a parameter, causing the following warning:

  REISERFS warning (device loop0): journal_init_dev: Cannot open '%s': %i journal_init_dev:

This also causes a WARN_ONCE() warning in the vsprintf code, and then a
panic if panic_on_warn is set.

  Please remove unsupported %/ in format string
  WARNING: CPU: 1 PID: 4480 at lib/vsprintf.c:2138 format_decode+0x77f/0x830 lib/vsprintf.c:2138
  Kernel panic - not syncing: panic_on_warn set ...

Just add another string argument to the macro invocation.

Addresses https://syzkaller.appspot.com/bug?id=0627d4551fdc39bf1ef5d82cd9eef587047f7718

Link: http://lkml.kernel.org/r/d678ebe1-6f54-8090-df4c-b9affad62293@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: <syzbot+6bd77b88c1977c03f584@syzkaller.appspotmail.com>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Jeff Mahoney <jeffm@suse.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jan Kara <jack@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:36 -07:00
Matthew Wilcox ad12c3a6ef autofs4: use wait_event_killable
This playing with signals to allow only fatal signals appears to predate
the introduction of wait_event_killable(), and I'm fairly sure that
wait_event_killable is what was meant to happen here.

[avagin@openvz.org: use wake_up() instead of wake_up_interruptible]
  Link: http://lkml.kernel.org/r/20180331022839.21277-1-avagin@openvz.org
Link: http://lkml.kernel.org/r/20180319191609.23880-1-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Ian Kent <raven@themaw.net>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:36 -07:00
Aaro Koskinen 1a6a05a4fa init/ramdisk: use pr_cont() at the end of ramdisk loading
Use pr_cont() at the end of ramdisk loading.  This will avoid the
rotator and an extra newline appearing in the dmesg.

Before:
  RAMDISK: Loading 2436KiB [1 disk] into ram disk... |
  done.

After:
  RAMDISK: Loading 2436KiB [1 disk] into ram disk... done.

Link: http://lkml.kernel.org/r/20180302205552.16031-1-aaro.koskinen@iki.fi
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:36 -07:00
Joe Perches 5d43090261 checkpatch: whinge about bool bitfields
Using bool in a bitfield isn't a good idea as the alignment behavior is
arch implementation defined.

Suggest using unsigned int or u<8|16|32> instead.

Link: http://lkml.kernel.org/r/e22fb871b1b7f2fda4b22f3a24e0d7f092eb612c.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:36 -07:00
Heinrich Schuchardt 38dca988bb checkpatch: allow space between colon and bracket
Allow a space between a colon and subsequent opening bracket.  This
sequence may occur in inline assembler statements like

	asm(
		"ldr %[out], [%[in]]\n\t"
		: [out] "=r" (ret)
		: [in] "r" (addr)
	);

Link: http://lkml.kernel.org/r/20180403191655.23700-1-xypron.glpk@gmx.de
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Acked-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:36 -07:00
Joe Perches 6a487211ec checkpatch: add test for assignment at start of line
Kernel style seems to prefer line wrapping an assignment with the
assignment operator on the previous line like:

	<leading tabs>	identifier =
				expression;
over
	<leading tabs>	identifier
				= expression;

somewhere around a 50:1 ratio

$ git grep -P "[^=]=\s*$" -- "*.[ch]" | wc -l
52008
$ git grep -P "^\s+[\*\/\+\|\%\-]?=[^=>]" | wc -l
1161

So add a --strict test for that condition.

Link: http://lkml.kernel.org/r/1522275726.2210.12.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:36 -07:00
Joe Perches bc22d9a7d3 checkpatch: test SYMBOLIC_PERMS multiple times per line
There are occasions where symbolic perms are used in a ternary like

		return (channel == 0) ? S_IRUGO | S_IWUSR : S_IRUGO;

The current test will find the first use "S_IRUGO | S_IWUSR" but not the
second use "S_IRUGO" on the same line.

Improve the test to look for all instances on a line.

Link: http://lkml.kernel.org/r/1522127944.12357.49.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:36 -07:00
Claudio Fontana 8d2e11b22d checkpatch: two spelling fixes
completly -> completely
wacking -> whacking

Link: http://lkml.kernel.org/r/1520405394-5586-1-git-send-email-claudio.fontana@gliwa.com
Signed-off-by: Claudio Fontana <claudio.fontana@gliwa.com>
Acked-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:36 -07:00