Commit graph

34703 commits

Author SHA1 Message Date
Linus Torvalds
91f28da8c9 random32: make prandom_u32() less predictable
This is the cleanup of the latest series of prandom_u32 experimentations
 consisting in using SipHash instead of Tausworthe to produce the randoms
 used by the network stack. The changes to the files were kept minimal,
 and the controversial commit that used to take noise from the fast_pool
 (f227e3ec3b) was reverted. Instead, a dedicated "net_rand_noise" per_cpu
 variable is fed from various sources of activities (networking, scheduling)
 to perturb the SipHash state using fast, non-trivially predictable data,
 instead of keeping it fully deterministic. The goal is essentially to make
 any occasional memory leakage or brute-force attempt useless.
 
 The resulting code was verified to be very slightly faster on x86_64 than
 what is was with the controversial commit above, though this remains barely
 above measurement noise. It was also tested on i386 and arm, and build-
 tested only on arm64.
 
 The whole discussion around this is archived here:
   https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJflHDtAAoJEE44bZycYXAvxXUP/A9aKCaSnJUgYi/0Fa507bi9
 yJkNs6cotQfzmG8GznQWdG2mdsviqI/d08UdIHD9JHKoin3wcT4+0So546pCw9BR
 tlrHnFcRnqEqXFLggnqkKnnIgGNNTppMDSi96BplNoXYXAMRjjvKW9Vl94lU2Abk
 a+aOOJRu4Vzj/5tRT3fkg/ldN6YGZ6nbqVkzd0WAwuYYIj1VSYUSkLgKc65n+4Xp
 ISnVWObeHJ+tnBRDVudTaUNYi7T3QvCF9glZZlFUVBDQbTRSqjMBSUmnBnOg1mhO
 Q5nY6xrfTi/0i39+/wOUUOTaxe7YggymibQfN+Y26w1rPO45RkvORiPKvXlfZWaI
 a3wh1TMoDAptrW9VXiO9pJv6a4xC16c7FyyTnGkP4jP+HTFdgmgixoNmPKrBxuEZ
 X8O+HOSf155j057aECMivk8bIj4FfLmYt2ciWqRZTVCwu9uK29AJSMx0SphTmYQ0
 p0HaJ8mHKnFVViX9n+YOPRfZDIRGH1zTOxzPhEkzuX8vx/4uiXsbp/ILxg0uZ913
 DgMk1rzzBlZsL7QqJQnf9JM810pFcU/PI7Y7PKaGKz3ntkJT2WV7gMeg+Wwv9254
 pPccvffYzdbtJAHgj+If8lHwixE33u5RscXqjpxIWLPcKTOLQNIf+6bRQ86sA+Kq
 Vbza8sDu6IWhvApCGLmB
 =KHhI
 -----END PGP SIGNATURE-----

Merge tag '20201024-v4-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/wtarreau/prandom

Pull random32 updates from Willy Tarreau:
 "Make prandom_u32() less predictable.

  This is the cleanup of the latest series of prandom_u32
  experimentations consisting in using SipHash instead of Tausworthe to
  produce the randoms used by the network stack.

  The changes to the files were kept minimal, and the controversial
  commit that used to take noise from the fast_pool (f227e3ec3b) was
  reverted. Instead, a dedicated "net_rand_noise" per_cpu variable is
  fed from various sources of activities (networking, scheduling) to
  perturb the SipHash state using fast, non-trivially predictable data,
  instead of keeping it fully deterministic. The goal is essentially to
  make any occasional memory leakage or brute-force attempt useless.

  The resulting code was verified to be very slightly faster on x86_64
  than what is was with the controversial commit above, though this
  remains barely above measurement noise. It was also tested on i386 and
  arm, and build- tested only on arm64"

Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/

* tag '20201024-v4-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/wtarreau/prandom:
  random32: add a selftest for the prandom32 code
  random32: add noise from network and scheduling activity
  random32: make prandom_u32() output unpredictable
2020-10-25 10:40:08 -07:00
Linus Torvalds
1b307ac870 dma-mapping fixes for 5.10:
- document the new document dma_{alloc,free}_pages API
  - two fixups for the dma-mapping.h split
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAl+UN1ELHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYPhkg//QLMwY4Lib1IPdeYR4k/UE2E4YtPjQghucNHXDkrd
 jjV5tvGPfNDOEKeCOamivTcOmc3E+jFhQFjpHplqd4OjsfZT3EyXaopdX/qN1Bbo
 JIib0WAAxO1N2MRhEPJzGAsCkMPT32Q52/ka1QUOmr1E8VPrKhU4T9+FnTbj1rgF
 HbMk+PdV4+HP53CvK+aaOfNHqJqQoTBeCx9xebybAjxIBCI+LedRwC7haV4Zz6tg
 xSp9cW0Ztdp9U7u1dOO4gEqnL/fNk3+RWF5iwtyCi96uYmguV+/vAqpWMyej97q5
 2Dx0jTQvj0FhnPug9asydadjtUqkzfRCSDGv4TybeHT/OZJEGAwkdJG7V/5PwGOg
 VCMpqi/WRIDPnUtN3OY4IZFigbyb4wJ6MOO/hvXagC7Lc2+z9ZhuUKUjSsV90LoT
 2a4xwm9M1JAglYbhGvLl5cjzmDSdCFXuGYlJ18lRZx7d4cGi34hAqq3WfqqteHm+
 IRfeAaWN7N+W8PgzGaDqfUVDrGNVZ7eo02kVicaJFCdJE5ecS3rUbyU8uVjhX7Sl
 h8zwBs8/5hFIKLCWUBiT+UBmvWXbG/a0plRh/vIvJ8lk4m4+kwdTRwgngpSkb3G/
 ytAJPZTeI7r75zkwxTHPE01Khf8/qWJ3cdv97PpQH+7mlo4J0XUr5ssmiQ7DAHuu
 jjo=
 =0N7Y
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-5.10-1' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fixes from Christoph Hellwig:

 - document the new dma_{alloc,free}_pages() API

 - two fixups for the dma-mapping.h split

* tag 'dma-mapping-5.10-1' of git://git.infradead.org/users/hch/dma-mapping:
  dma-mapping: document dma_{alloc,free}_pages
  dma-mapping: move more functions to dma-map-ops.h
  ARM/sa1111: add a missing include of dma-map-ops.h
2020-10-24 12:17:05 -07:00
Willy Tarreau
3744741ada random32: add noise from network and scheduling activity
With the removal of the interrupt perturbations in previous random32
change (random32: make prandom_u32() output unpredictable), the PRNG
has become 100% deterministic again. While SipHash is expected to be
way more robust against brute force than the previous Tausworthe LFSR,
there's still the risk that whoever has even one temporary access to
the PRNG's internal state is able to predict all subsequent draws till
the next reseed (roughly every minute). This may happen through a side
channel attack or any data leak.

This patch restores the spirit of commit f227e3ec3b ("random32: update
the net random state on interrupt and activity") in that it will perturb
the internal PRNG's statee using externally collected noise, except that
it will not pick that noise from the random pool's bits nor upon
interrupt, but will rather combine a few elements along the Tx path
that are collectively hard to predict, such as dev, skb and txq
pointers, packet length and jiffies values. These ones are combined
using a single round of SipHash into a single long variable that is
mixed with the net_rand_state upon each invocation.

The operation was inlined because it produces very small and efficient
code, typically 3 xor, 2 add and 2 rol. The performance was measured
to be the same (even very slightly better) than before the switch to
SipHash; on a 6-core 12-thread Core i7-8700k equipped with a 40G NIC
(i40e), the connection rate dropped from 556k/s to 555k/s while the
SYN cookie rate grew from 5.38 Mpps to 5.45 Mpps.

Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/
Cc: George Spelvin <lkml@sdf.org>
Cc: Amit Klein <aksecurity@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: tytso@mit.edu
Cc: Florian Westphal <fw@strlen.de>
Cc: Marc Plumb <lkml.mplumb@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2020-10-24 20:21:57 +02:00
George Spelvin
c51f8f88d7 random32: make prandom_u32() output unpredictable
Non-cryptographic PRNGs may have great statistical properties, but
are usually trivially predictable to someone who knows the algorithm,
given a small sample of their output.  An LFSR like prandom_u32() is
particularly simple, even if the sample is widely scattered bits.

It turns out the network stack uses prandom_u32() for some things like
random port numbers which it would prefer are *not* trivially predictable.
Predictability led to a practical DNS spoofing attack.  Oops.

This patch replaces the LFSR with a homebrew cryptographic PRNG based
on the SipHash round function, which is in turn seeded with 128 bits
of strong random key.  (The authors of SipHash have *not* been consulted
about this abuse of their algorithm.)  Speed is prioritized over security;
attacks are rare, while performance is always wanted.

Replacing all callers of prandom_u32() is the quick fix.
Whether to reinstate a weaker PRNG for uses which can tolerate it
is an open question.

Commit f227e3ec3b ("random32: update the net random state on interrupt
and activity") was an earlier attempt at a solution.  This patch replaces
it.

Reported-by: Amit Klein <aksecurity@gmail.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: tytso@mit.edu
Cc: Florian Westphal <fw@strlen.de>
Cc: Marc Plumb <lkml.mplumb@gmail.com>
Fixes: f227e3ec3b ("random32: update the net random state on interrupt and activity")
Signed-off-by: George Spelvin <lkml@sdf.org>
Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/
[ willy: partial reversal of f227e3ec3b5c; moved SIPROUND definitions
  to prandom.h for later use; merged George's prandom_seed() proposal;
  inlined siprand_u32(); replaced the net_rand_state[] array with 4
  members to fix a build issue; cosmetic cleanups to make checkpatch
  happy; fixed RANDOM32_SELFTEST build ]
Signed-off-by: Willy Tarreau <w@1wt.eu>
2020-10-24 20:21:57 +02:00
Linus Torvalds
a5e5c274c9 ring-buffer fix
The success return value of ring_buffer_resize() is stated to be zero,
 and checked that way. But it is incorrectly returning the size allocated.
 
 Also, a fix to a comment.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCX5LnlRQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qtPxAP439tmV1wWK4uBF6TLBahaPdj0tGe5b
 NT/ASnYjokZKWgEA//vmUBMMmNBohcd8DkkTu8Pp3tkc2b4RLR5WJIpXGwk=
 =1Og+
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing ring-buffer fix from Steven Rostedt:
 "The success return value of ring_buffer_resize() is stated to be
  zero and checked that way.

  But it was incorrectly returning the size allocated.

  Also, a fix to a comment"

* tag 'trace-v5.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ring-buffer: Update the description for ring_buffer_wait
  ring-buffer: Return 0 on success from ring_buffer_resize()
2020-10-23 17:09:38 -07:00
Linus Torvalds
41f762a15a More power management updates for 5.10-rc1
- Move the AVS drivers to new platform-specific locations and get
    rid of the drivers/power/avs directory (Ulf Hansson).
 
  - Add on/off notifiers and idle state accounting support to the
    generic power domains (genpd) framework (Ulf Hansson, Lina Iyer).
 
  - Ulf will maintain the PM domain part of cpuidle-psci (Ulf Hansson).
 
  - Make intel_idle disregard ACPI _CST if it cannot use the data
    returned by that method (Mel Gorman).
 
  - Modify intel_pstate to avoid leaving useless sysfs directory
    structure behind if it cannot be registered (Chen Yu).
 
  - Fix domain detection in the RAPL power capping driver and prevent
    it from failing to enumerate the Psys RAPL domain (Zhang Rui).
 
  - Allow acpi-cpufreq to use ACPI _PSD information with Family 19 and
    later AMD chips (Wei Huang).
 
  - Update the driver assumptions comment in intel_idle and fix a
    kerneldoc comment in the runtime PM framework (Alexander Monakov,
    Bean Huo).
 
  - Avoid unnecessary resets of the cached frequency in the schedutil
    cpufreq governor to reduce overhead (Wei Wang).
 
  - Clean up the cpufreq core a bit (Viresh Kumar).
 
  - Make assorted minor janitorial changes (Daniel Lezcano, Geert
    Uytterhoeven, Hubert Jasudowicz, Tom Rix).
 
  - Clean up and optimize the cpupower utility somewhat (Colin Ian
    King, Martin Kaistra).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl+TD4gSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRx3AgP/0Fpi50+Kggr7pIXKElwg7ECJA0nOLT6
 gp4Vc/J/3r6zqK0ANDgCRlEMckAT61ukll+eU+BlavBrI4ZYj/Homi0+u53t1GjM
 AOwj1SmQgSBcBavWsBOc8+12X6wYLzyQbyWc53oYH5os537n8s7zkSZuSBcGFUgb
 wWF4xOeuW/ETsxAzEYmY7LvtBeEmo3UjV0fZPPbo/ro5EHDaOpvO/4EUDjCQxR6b
 CvyjgLlxuAOFWG/B5lVTCx7S6MmBjHXUIFUizt+TA6YjyGd0mG0i0f7mgzs6hqUD
 gzERDSlehBC3zPh5O35HNGUG8ulvDi9+ugxuckFHu/j4wEeZswp8AuIpdLI6Mcnc
 LDb+LTeypAB5d1fzHeSziv8AL08cUAS6QT+q96whYibQs6WA1mE9yXECyg6ZGsLt
 1KPAc8KD4ojwjo9vtk9VU0ZaUcVBMnqyK+GK929l0nXohw2Fae6X/NlpQ0D7joZA
 NM+dWMXpHy6tuVOgdUmrmN+P6vWd8ApWBeufkUFsCzrh3zG57yVaLl2SAjEtpKh0
 Emr/kJ8Ox8cf++6mGKseR2ZbkGn0Tz2GD5l3hIAGnIv9Nda3YgCc6RyV7U9se7OW
 2xnQvrgXqQKyjjziptVFqDotcC/KXFACr3YZX6GlW675NOMXSGk1ZYI3FbrsM8yd
 0/zq7PyYmb0D
 =TFKg
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more power management updates from Rafael Wysocki:
 "First of all, the adaptive voltage scaling (AVS) drivers go to new
  platform-specific locations as planned (this part was reported to have
  merge conflicts against the new arm-soc updates in linux-next).

  In addition to that, there are some fixes (intel_idle, intel_pstate,
  RAPL, acpi_cpufreq), the addition of on/off notifiers and idle state
  accounting support to the generic power domains (genpd) code and some
  janitorial changes all over.

  Specifics:

   - Move the AVS drivers to new platform-specific locations and get rid
     of the drivers/power/avs directory (Ulf Hansson).

   - Add on/off notifiers and idle state accounting support to the
     generic power domains (genpd) framework (Ulf Hansson, Lina Iyer).

   - Ulf will maintain the PM domain part of cpuidle-psci (Ulf Hansson).

   - Make intel_idle disregard ACPI _CST if it cannot use the data
     returned by that method (Mel Gorman).

   - Modify intel_pstate to avoid leaving useless sysfs directory
     structure behind if it cannot be registered (Chen Yu).

   - Fix domain detection in the RAPL power capping driver and prevent
     it from failing to enumerate the Psys RAPL domain (Zhang Rui).

   - Allow acpi-cpufreq to use ACPI _PSD information with Family 19 and
     later AMD chips (Wei Huang).

   - Update the driver assumptions comment in intel_idle and fix a
     kerneldoc comment in the runtime PM framework (Alexander Monakov,
     Bean Huo).

   - Avoid unnecessary resets of the cached frequency in the schedutil
     cpufreq governor to reduce overhead (Wei Wang).

   - Clean up the cpufreq core a bit (Viresh Kumar).

   - Make assorted minor janitorial changes (Daniel Lezcano, Geert
     Uytterhoeven, Hubert Jasudowicz, Tom Rix).

   - Clean up and optimize the cpupower utility somewhat (Colin Ian
     King, Martin Kaistra)"

* tag 'pm-5.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (23 commits)
  PM: sleep: remove unreachable break
  PM: AVS: Drop the avs directory and the corresponding Kconfig
  PM: AVS: qcom-cpr: Move the driver to the qcom specific drivers
  PM: runtime: Fix typo in pm_runtime_set_active() helper comment
  PM: domains: Fix build error for genpd notifiers
  powercap: Fix typo in Kconfig "Plance" -> "Plane"
  cpufreq: schedutil: restore cached freq when next_f is not changed
  acpi-cpufreq: Honor _PSD table setting on new AMD CPUs
  PM: AVS: smartreflex Move driver to soc specific drivers
  PM: AVS: rockchip-io: Move the driver to the rockchip specific drivers
  PM: domains: enable domain idle state accounting
  PM: domains: Add curly braces to delimit comment + statement block
  PM: domains: Add support for PM domain on/off notifiers for genpd
  powercap/intel_rapl: enumerate Psys RAPL domain together with package RAPL domain
  powercap/intel_rapl: Fix domain detection
  intel_idle: Ignore _CST if control cannot be taken from the platform
  cpuidle: Remove pointless stub
  intel_idle: mention assumption that WBINVD is not needed
  MAINTAINERS: Add section for cpuidle-psci PM domain
  cpufreq: intel_pstate: Delete intel_pstate sysfs if failed to register the driver
  ...
2020-10-23 16:27:03 -07:00
Linus Torvalds
3cb12d27ff Fixes for 5.10-rc1 from the networking tree:
Cross-tree/merge window issues:
 
  - rtl8150: don't incorrectly assign random MAC addresses; fix late
    in the 5.9 cycle started depending on a return code from
    a function which changed with the 5.10 PR from the usb subsystem
 
 Current release - regressions:
 
  - Revert "virtio-net: ethtool configurable RXCSUM", it was causing
    crashes at probe when control vq was not negotiated/available
 
 Previous releases - regressions:
 
  - ixgbe: fix probing of multi-port 10 Gigabit Intel NICs with an MDIO
    bus, only first device would be probed correctly
 
  - nexthop: Fix performance regression in nexthop deletion by
    effectively switching from recently added synchronize_rcu()
    to synchronize_rcu_expedited()
 
  - netsec: ignore 'phy-mode' device property on ACPI systems;
    the property is not populated correctly by the firmware,
    but firmware configures the PHY so just keep boot settings
 
 Previous releases - always broken:
 
  - tcp: fix to update snd_wl1 in bulk receiver fast path, addressing
    bulk transfers getting "stuck"
 
  - icmp: randomize the global rate limiter to prevent attackers from
    getting useful signal
 
  - r8169: fix operation under forced interrupt threading, make the
    driver always use hard irqs, even on RT, given the handler is
    light and only wants to schedule napi (and do so through
    a _irqoff() variant, preferably)
 
  - bpf: Enforce pointer id generation for all may-be-null register
    type to avoid pointers erroneously getting marked as null-checked
 
  - tipc: re-configure queue limit for broadcast link
 
  - net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN
    tunnels
 
  - fix various issues in chelsio inline tls driver
 
 Misc:
 
  - bpf: improve just-added bpf_redirect_neigh() helper api to support
    supplying nexthop by the caller - in case BPF program has already
    done a lookup we can avoid doing another one
 
  - remove unnecessary break statements
 
  - make MCTCP not select IPV6, but rather depend on it
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAl+R+5UACgkQMUZtbf5S
 Irt9KxAAiYme2aSvMOni0NQsOgQ5mVsy7tk0/4dyRqkAx0ggrfGcFuhgZYNm8ZKY
 KoQsQyn30Wb/2wAp1vX2I4Fod67rFyBfQg/8iWiEAu47X7Bj1lpPPJexSPKhF9/X
 e0TuGxZtoaDuV9C3Su/FOjRmnShGSFQu1SCyJThshwaGsFL3YQ0Ut07VRgRF8x05
 A5fy2SVVIw0JOQgV1oH0GP5oEK3c50oGnaXt8emm56PxVIfAYY0oq69hQUzrfMFP
 zV9R0XbnbCIibT8R3lEghjtXavtQTzK5rYDKazTeOyDU87M+yuykNYj7MhgDwl9Q
 UdJkH2OpMlJylEH3asUjz/+ObMhXfOuj/ZS3INtO5omBJx7x76egDZPMQe4wlpcC
 NT5EZMS7kBdQL8xXDob7hXsvFpuEErSUGruYTHp4H52A9ke1dRTH2kQszcKk87V3
 s+aVVPtJ5bHzF3oGEvfwP0DFLTF6WvjD0Ts0LmTY2DhpE//tFWV37j60Ni5XU21X
 fCPooihQbLOsq9D8zc0ydEvCg2LLWMXM5ovCkqfIAJzbGVYhnxJSryZwpOlKDS0y
 LiUmLcTZDoNR/szx0aJhVHdUUVgXDX/GsllHoc1w7ZvDRMJn40K+xnaF3dSMwtIl
 imhfc5pPi6fdBgjB0cFYRPfhwiwlPMQ4YFsOq9JvynJzmt6P5FQ=
 =ceke
 -----END PGP SIGNATURE-----

Merge tag 'net-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Cross-tree/merge window issues:

   - rtl8150: don't incorrectly assign random MAC addresses; fix late in
     the 5.9 cycle started depending on a return code from a function
     which changed with the 5.10 PR from the usb subsystem

  Current release regressions:

   - Revert "virtio-net: ethtool configurable RXCSUM", it was causing
     crashes at probe when control vq was not negotiated/available

  Previous release regressions:

   - ixgbe: fix probing of multi-port 10 Gigabit Intel NICs with an MDIO
     bus, only first device would be probed correctly

   - nexthop: Fix performance regression in nexthop deletion by
     effectively switching from recently added synchronize_rcu() to
     synchronize_rcu_expedited()

   - netsec: ignore 'phy-mode' device property on ACPI systems; the
     property is not populated correctly by the firmware, but firmware
     configures the PHY so just keep boot settings

  Previous releases - always broken:

   - tcp: fix to update snd_wl1 in bulk receiver fast path, addressing
     bulk transfers getting "stuck"

   - icmp: randomize the global rate limiter to prevent attackers from
     getting useful signal

   - r8169: fix operation under forced interrupt threading, make the
     driver always use hard irqs, even on RT, given the handler is light
     and only wants to schedule napi (and do so through a _irqoff()
     variant, preferably)

   - bpf: Enforce pointer id generation for all may-be-null register
     type to avoid pointers erroneously getting marked as null-checked

   - tipc: re-configure queue limit for broadcast link

   - net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN
     tunnels

   - fix various issues in chelsio inline tls driver

  Misc:

   - bpf: improve just-added bpf_redirect_neigh() helper api to support
     supplying nexthop by the caller - in case BPF program has already
     done a lookup we can avoid doing another one

   - remove unnecessary break statements

   - make MCTCP not select IPV6, but rather depend on it"

* tag 'net-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits)
  tcp: fix to update snd_wl1 in bulk receiver fast path
  net: Properly typecast int values to set sk_max_pacing_rate
  netfilter: nf_fwd_netdev: clear timestamp in forwarding path
  ibmvnic: save changed mac address to adapter->mac_addr
  selftests: mptcp: depends on built-in IPv6
  Revert "virtio-net: ethtool configurable RXCSUM"
  rtnetlink: fix data overflow in rtnl_calcit()
  net: ethernet: mtk-star-emac: select REGMAP_MMIO
  net: hdlc_raw_eth: Clear the IFF_TX_SKB_SHARING flag after calling ether_setup
  net: hdlc: In hdlc_rcv, check to make sure dev is an HDLC device
  bpf, libbpf: Guard bpf inline asm from bpf_tail_call_static
  bpf, selftests: Extend test_tc_redirect to use modified bpf_redirect_neigh()
  bpf: Fix bpf_redirect_neigh helper api to support supplying nexthop
  mptcp: depends on IPV6 but not as a module
  sfc: move initialisation of efx->filter_sem to efx_init_struct()
  mpls: load mpls_gso after mpls_iptunnel
  net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN tunnels
  net/sched: act_gate: Unlock ->tcfa_lock in tc_setup_flow_action()
  net: dsa: bcm_sf2: make const array static, makes object smaller
  mptcp: MPTCP_IPV6 should depend on IPV6 instead of selecting it
  ...
2020-10-23 12:05:49 -07:00
Linus Torvalds
4a22709e21 arch-cleanup-2020-10-22
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl+SOXIQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgptrcD/93VUDmRAn73ChKNd0TtXUicJlAlNLVjvfs
 VFTXWBDnlJnGkZT7ElkDD9b8dsz8l4xGf/QZ5dzhC/th2OsfObQkSTfe0lv5cCQO
 mX7CRSrDpjaHtW+WGPDa0oQsGgIfpqUz2IOg9NKbZZ1LJ2uzYfdOcf3oyRgwZJ9B
 I3sh1vP6OzjZVVCMmtMTM+sYZEsDoNwhZwpkpiwMmj8tYtOPgKCYKpqCiXrGU0x2
 ML5FtDIwiwU+O3zYYdCBWqvCb2Db0iA9Aov2whEBz/V2jnmrN5RMA/90UOh1E2zG
 br4wM1Wt3hNrtj5qSxZGlF/HEMYJVB8Z2SgMjYu4vQz09qRVVqpGdT/dNvLAHQWg
 w4xNCj071kVZDQdfwnqeWSKYUau9Xskvi8xhTT+WX8a5CsbVrM9vGslnS5XNeZ6p
 h2D3Q+TAYTvT756icTl0qsYVP7PrPY7DdmQYu0q+Lc3jdGI+jyxO2h9OFBRLZ3p6
 zFX2N8wkvvCCzP2DwVnnhIi/GovpSh7ksHnb039F36Y/IhZPqV1bGqdNQVdanv6I
 8fcIDM6ltRQ7dO2Br5f1tKUZE9Pm6x60b/uRVjhfVh65uTEKyGRhcm5j9ztzvQfI
 cCBg4rbVRNKolxuDEkjsAFXVoiiEEsb7pLf4pMO+Dr62wxFG589tQNySySneUIVZ
 J9ILnGAAeQ==
 =aVWo
 -----END PGP SIGNATURE-----

Merge tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-block

Pull arch task_work cleanups from Jens Axboe:
 "Two cleanups that don't fit other categories:

   - Finally get the task_work_add() cleanup done properly, so we don't
     have random 0/1/false/true/TWA_SIGNAL confusing use cases. Updates
     all callers, and also fixes up the documentation for
     task_work_add().

   - While working on some TIF related changes for 5.11, this
     TIF_NOTIFY_RESUME cleanup fell out of that. Remove some arch
     duplication for how that is handled"

* tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-block:
  task_work: cleanup notification modes
  tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume()
2020-10-23 10:06:38 -07:00
Linus Torvalds
746b25b1aa Kbuild updates for v5.10
- Support 'make compile_commands.json' to generate the compilation
    database more easily, avoiding stale entries
 
  - Support 'make clang-analyzer' and 'make clang-tidy' for static checks
    using clang-tidy
 
  - Preprocess scripts/modules.lds.S to allow CONFIG options in the module
    linker script
 
  - Drop cc-option tests from compiler flags supported by our minimal
    GCC/Clang versions
 
  - Use always 12-digits commit hash for CONFIG_LOCALVERSION_AUTO=y
 
  - Use sha1 build id for both BFD linker and LLD
 
  - Improve deb-pkg for reproducible builds and rootless builds
 
  - Remove stale, useless scripts/namespace.pl
 
  - Turn -Wreturn-type warning into error
 
  - Fix build error of deb-pkg when CONFIG_MODULES=n
 
  - Replace 'hostname' command with more portable 'uname -n'
 
  - Various Makefile cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAl+RfS0VHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGG1QP/2hzoMzK1YXErPUhGrhYU1rxz7Nu
 HkLTIkyKF1HPwSJf5XyNW/FTBI4SDlkNoVg/weEDCS1yFxxpvQLIck8ChzA1kIIM
 P+1IfBWOTzqn91XsapU2zwSno3gylphVchVIvYAB3oLUotGeMSluy1cQtBRzyA5D
 rj2Q7H8fzkzk3YoBcBC/BOKDlfo/usqQ1X/gsfRFwN/BJxeZSYoujNBE7KtHaDsd
 8K/ggBIqmST4NBn+M8c11d8CxzvWbtG1gq3EkUL5nG8T13DsGn1EFC0SPt85bkvv
 f9YywfJi37HixhZzK6tXYjN/PWoiEY6z90mhd0NtZghQT7kQMiTQ3sWrM8dX3ssf
 phBzO94uFQDjhyxOaSSsCoI/TIciAPo4+G8PNjcaEtj63IEfhEz/dnlstYwY5Y9P
 Pp3aZtVjSGJwGW2u2EUYj6paFVqjf6DXQjQKPNHnsYCEidIvFTjjguRGvx9gl6mx
 yd8oseOsAtOEf0alRe9MMdvN17O3UrRAxgBdap7fktg02TLVRGxZIbuwKmBf29ho
 ORl9zeFkYBn6XQFyuItJoXy/kYFyHDaBEPYCRQcY4dwqcjZIiAc/FhYbqYthJ59L
 5vLN2etmDIVSuUv1J5nBqHHGCqJChykbqg7riQ651dCNKw4gZB8ctCay2lXhBXMg
 1mqOcoG5WWL7//F+
 =tZRN
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - Support 'make compile_commands.json' to generate the compilation
   database more easily, avoiding stale entries

 - Support 'make clang-analyzer' and 'make clang-tidy' for static checks
   using clang-tidy

 - Preprocess scripts/modules.lds.S to allow CONFIG options in the
   module linker script

 - Drop cc-option tests from compiler flags supported by our minimal
   GCC/Clang versions

 - Use always 12-digits commit hash for CONFIG_LOCALVERSION_AUTO=y

 - Use sha1 build id for both BFD linker and LLD

 - Improve deb-pkg for reproducible builds and rootless builds

 - Remove stale, useless scripts/namespace.pl

 - Turn -Wreturn-type warning into error

 - Fix build error of deb-pkg when CONFIG_MODULES=n

 - Replace 'hostname' command with more portable 'uname -n'

 - Various Makefile cleanups

* tag 'kbuild-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (34 commits)
  kbuild: Use uname for LINUX_COMPILE_HOST detection
  kbuild: Only add -fno-var-tracking-assignments for old GCC versions
  kbuild: remove leftover comment for filechk utility
  treewide: remove DISABLE_LTO
  kbuild: deb-pkg: clean up package name variables
  kbuild: deb-pkg: do not build linux-headers package if CONFIG_MODULES=n
  kbuild: enforce -Werror=return-type
  scripts: remove namespace.pl
  builddeb: Add support for all required debian/rules targets
  builddeb: Enable rootless builds
  builddeb: Pass -n to gzip for reproducible packages
  kbuild: split the build log of kallsyms
  kbuild: explicitly specify the build id style
  scripts/setlocalversion: make git describe output more reliable
  kbuild: remove cc-option test of -Werror=date-time
  kbuild: remove cc-option test of -fno-stack-check
  kbuild: remove cc-option test of -fno-strict-overflow
  kbuild: move CFLAGS_{KASAN,UBSAN,KCSAN} exports to relevant Makefiles
  kbuild: remove redundant CONFIG_KASAN check from scripts/Makefile.kasan
  kbuild: do not create built-in objects for external module builds
  ...
2020-10-22 13:13:57 -07:00
Linus Torvalds
2b71482060 Modules updates for v5.10
Summary of modules changes for the 5.10 merge window:
 
 - Code cleanups. More informative error messages and statically
   initialize init_free_wq to avoid a workqueue warning.
 
 Signed-off-by: Jessica Yu <jeyu@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEVrp26glSWYuDNrCUwEV+OM47wXIFAl+RbXUQHGpleXVAa2Vy
 bmVsLm9yZwAKCRDARX44zjvBcm/QD/42Nm/9UZB1DKL2vh/rWw3UJWN0BU05+1s9
 QvSVQxKEXgEA3tEE1HP7LL5g/6JVFqypghUsX69Uf79P0FlIjlvcCoOQPSaNLXyj
 Md90Jg26AQiI9CrtDCQx2uli8KVUCVAtzuKOqShjUwSiIvI8yBoQpjMMRdnmxNGT
 HTM0yEbdD4Nfa9F/v51J0Tnq3Czj8IlIEjCuHVpyNTbaRZVGfmIJygKDRWQWh4lU
 fnOWLzHeARsvW7T0VvpolenPhoLp+Z+mWKeNg2XJpXpMc9RtZk85DUPn0MDVbL+b
 sZhQXTTsdvF0GVCsdm06wyWLuFc9ti7Oy5Ca+aHdaYsWLoYpoGwW7F7EID4dHu0a
 Sj9cUIWz7ss+TA64Itd5PImaSQasdsU0mGn5PVZNxqnBjlaw6scMpc8AZVXJhqPR
 ihSQgNELTV62AtS4zmpu2K0uVMsjCHmpXMMVkpxZYUfFOoHV4kqFn1iMGPUe9ud3
 dLq3scH6GY0m2nLa0n/Qwk7GLA3X0WO49BDBY7Vwg2XbqY52i5GYiKWJux2wf5hM
 j4+F/iB3IVysocKpP3brGU7hjPKCjRy9KwJ7IHW7w08j/9Q0p/RNGLC+ZyP8op+i
 WgOjXyWiPpLr1XDS8NYVCVAEXiDBa87pKWhdkOahYLvbp24bVWlWRRFk0U8Nl06y
 rx6DGdGrLQ==
 =7nzH
 -----END PGP SIGNATURE-----

Merge tag 'modules-for-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux

Pull modules updates from Jessica Yu:
 "Code cleanups: more informative error messages and statically
  initialize init_free_wq to avoid a workqueue warning"

* tag 'modules-for-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
  module: statically initialize init section freeing data
  module: Add more error message for failed kernel module loading
2020-10-22 13:08:57 -07:00
Linus Torvalds
f56e65dff6 Merge branch 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull initial set_fs() removal from Al Viro:
 "Christoph's set_fs base series + fixups"

* 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs: Allow a NULL pos pointer to __kernel_read
  fs: Allow a NULL pos pointer to __kernel_write
  powerpc: remove address space overrides using set_fs()
  powerpc: use non-set_fs based maccess routines
  x86: remove address space overrides using set_fs()
  x86: make TASK_SIZE_MAX usable from assembly code
  x86: move PAGE_OFFSET, TASK_SIZE & friends to page_{32,64}_types.h
  lkdtm: remove set_fs-based tests
  test_bitmap: remove user bitmap tests
  uaccess: add infrastructure for kernel builds with set_fs()
  fs: don't allow splice read/write without explicit ops
  fs: don't allow kernel reads and writes without iter ops
  sysctl: Convert to iter interfaces
  proc: add a read_iter method to proc proc_ops
  proc: cleanup the compat vs no compat file ops
  proc: remove a level of indentation in proc_get_inode
2020-10-22 09:59:21 -07:00
Qiujun Huang
e1981f75d3 ring-buffer: Update the description for ring_buffer_wait
The function changed at some point, but the description was not
updated.

Link: https://lkml.kernel.org/r/20201017095246.5170-1-hqjagain@gmail.com

Signed-off-by: Qiujun Huang <hqjagain@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-10-22 11:26:26 -04:00
Qiujun Huang
0a1754b2a9 ring-buffer: Return 0 on success from ring_buffer_resize()
We don't need to check the new buffer size, and the return value
had confused resize_buffer_duplicate_size().
...
	ret = ring_buffer_resize(trace_buf->buffer,
		per_cpu_ptr(size_buf->data,cpu_id)->entries, cpu_id);
	if (ret == 0)
		per_cpu_ptr(trace_buf->data, cpu_id)->entries =
			per_cpu_ptr(size_buf->data, cpu_id)->entries;
...

Link: https://lkml.kernel.org/r/20201019142242.11560-1-hqjagain@gmail.com

Cc: stable@vger.kernel.org
Fixes: d60da506cb ("tracing: Add a resize function to make one buffer equivalent to another buffer")
Signed-off-by: Qiujun Huang <hqjagain@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-10-22 11:24:10 -04:00
Peter Zijlstra
f8e48a3dca lockdep: Fix preemption WARN for spurious IRQ-enable
It is valid (albeit uncommon) to call local_irq_enable() without first
having called local_irq_disable(). In this case we enter
lockdep_hardirqs_on*() with IRQs enabled and trip a preemption warning
for using __this_cpu_read().

Use this_cpu_read() instead to avoid the warning.

Fixes: 4d004099a6 ("lockdep: Fix lockdep recursion")
Reported-by: syzbot+53f8ce8bbc07924b6417@syzkaller.appspotmail.com
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2020-10-22 12:37:22 +02:00
Sami Tolvanen
0f6372e522 treewide: remove DISABLE_LTO
This change removes all instances of DISABLE_LTO from
Makefiles, as they are currently unused, and the preferred
method of disabling LTO is to filter out the flags instead.

Note added by Masahiro Yamada:
DISABLE_LTO was added as preparation for GCC LTO, but GCC LTO was
not pulled into the mainline. (https://lkml.org/lkml/2014/4/8/272)

Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-10-21 00:28:53 +09:00
Andrei Vagin
c2f7d08ccc futex: Adjust absolute futex timeouts with per time namespace offset
For all commands except FUTEX_WAIT, the timeout is interpreted as an
absolute value. This absolute value is inside the task's time namespace and
has to be converted to the host's time.

Fixes: 5a590f35ad ("posix-clocks: Wire up clock_gettime() with timens offsets")
Reported-by: Hans van der Laan <j.h.vanderlaan@student.utwente.nl>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20201015160020.293748-1-avagin@gmail.com
2020-10-20 17:02:57 +02:00
Christoph Hellwig
695cebe58d dma-mapping: move more functions to dma-map-ops.h
Due to a mismerge a bunch of prototypes that should have moved to
dma-map-ops.h are still in dma-mapping.h, fix that up.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-20 10:41:07 +02:00
Martin KaFai Lau
93c230e3f5 bpf: Enforce id generation for all may-be-null register type
The commit af7ec13833 ("bpf: Add bpf_skc_to_tcp6_sock() helper")
introduces RET_PTR_TO_BTF_ID_OR_NULL and
the commit eaa6bcb71e ("bpf: Introduce bpf_per_cpu_ptr()")
introduces RET_PTR_TO_MEM_OR_BTF_ID_OR_NULL.
Note that for RET_PTR_TO_MEM_OR_BTF_ID_OR_NULL, the reg0->type
could become PTR_TO_MEM_OR_NULL which is not covered by
BPF_PROBE_MEM.

The BPF_REG_0 will then hold a _OR_NULL pointer type. This _OR_NULL
pointer type requires the bpf program to explicitly do a NULL check first.
After NULL check, the verifier will mark all registers having
the same reg->id as safe to use.  However, the reg->id
is not set for those new _OR_NULL return types.  One of the ways
that may be wrong is, checking NULL for one btf_id typed pointer will
end up validating all other btf_id typed pointers because
all of them have id == 0.  The later tests will exercise
this path.

To fix it and also avoid similar issue in the future, this patch
moves the id generation logic out of each individual RET type
test in check_helper_call().  Instead, it does one
reg_type_may_be_null() test and then do the id generation
if needed.

This patch also adds a WARN_ON_ONCE in mark_ptr_or_null_reg()
to catch future breakage.

The _OR_NULL pointer usage in the bpf_iter_reg.ctx_arg_info is
fine because it just happens that the existing id generation after
check_ctx_access() has covered it.  It is also using the
reg_type_may_be_null() to decide if id generation is needed or not.

Fixes: af7ec13833 ("bpf: Add bpf_skc_to_tcp6_sock() helper")
Fixes: eaa6bcb71e ("bpf: Introduce bpf_per_cpu_ptr()")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201019194212.1050855-1-kafai@fb.com
2020-10-19 15:57:42 -07:00
Tom Rix
76702a2e72 bpf: Remove unneeded break
A break is not needed if it is preceded by a return.

Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201019173846.1021-1-trix@redhat.com
2020-10-19 20:40:21 +02:00
Wei Wang
0070ea2962 cpufreq: schedutil: restore cached freq when next_f is not changed
We have the raw cached freq to reduce the chance in calling cpufreq
driver where it could be costly in some arch/SoC.

Currently, the raw cached freq is reset in sugov_update_single() when
it avoids frequency reduction (which is not desirable sometimes), but
it is better to restore the previous value of it in that case,
because it may not change in the next cycle and it is not necessary
to change the CPU frequency then.

Adapted from https://android-review.googlesource.com/1352810/

Signed-off-by: Wei Wang <wvw@google.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Subject edit and changelog rewrite ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-10-19 17:38:16 +02:00
Linus Torvalds
41eea65e2a Merge tag 'core-rcu-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU changes from Ingo Molnar:

 - Debugging for smp_call_function()

 - RT raw/non-raw lock ordering fixes

 - Strict grace periods for KASAN

 - New smp_call_function() torture test

 - Torture-test updates

 - Documentation updates

 - Miscellaneous fixes

[ This doesn't actually pull the tag - I've dropped the last merge from
  the RCU branch due to questions about the series.   - Linus ]

* tag 'core-rcu-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (77 commits)
  smp: Make symbol 'csd_bug_count' static
  kernel/smp: Provide CSD lock timeout diagnostics
  smp: Add source and destination CPUs to __call_single_data
  rcu: Shrink each possible cpu krcp
  rcu/segcblist: Prevent useless GP start if no CBs to accelerate
  torture: Add gdb support
  rcutorture: Allow pointer leaks to test diagnostic code
  rcutorture: Hoist OOM registry up one level
  refperf: Avoid null pointer dereference when buf fails to allocate
  rcutorture: Properly synchronize with OOM notifier
  rcutorture: Properly set rcu_fwds for OOM handling
  torture: Add kvm.sh --help and update help message
  rcutorture: Add CONFIG_PROVE_RCU_LIST to TREE05
  torture: Update initrd documentation
  rcutorture: Replace HTTP links with HTTPS ones
  locktorture: Make function torture_percpu_rwsem_init() static
  torture: document --allcpus argument added to the kvm.sh script
  rcutorture: Output number of elapsed grace periods
  rcutorture: Remove KCSAN stubs
  rcu: Remove unused "cpu" parameter from rcu_report_qs_rdp()
  ...
2020-10-18 14:34:50 -07:00
Minchan Kim
ecb8ac8b1f mm/madvise: introduce process_madvise() syscall: an external memory hinting API
There is usecase that System Management Software(SMS) want to give a
memory hint like MADV_[COLD|PAGEEOUT] to other processes and in the
case of Android, it is the ActivityManagerService.

The information required to make the reclaim decision is not known to the
app.  Instead, it is known to the centralized userspace
daemon(ActivityManagerService), and that daemon must be able to initiate
reclaim on its own without any app involvement.

To solve the issue, this patch introduces a new syscall
process_madvise(2).  It uses pidfd of an external process to give the
hint.  It also supports vector address range because Android app has
thousands of vmas due to zygote so it's totally waste of CPU and power if
we should call the syscall one by one for each vma.(With testing 2000-vma
syscall vs 1-vector syscall, it showed 15% performance improvement.  I
think it would be bigger in real practice because the testing ran very
cache friendly environment).

Another potential use case for the vector range is to amortize the cost
ofTLB shootdowns for multiple ranges when using MADV_DONTNEED; this could
benefit users like TCP receive zerocopy and malloc implementations.  In
future, we could find more usecases for other advises so let's make it
happens as API since we introduce a new syscall at this moment.  With
that, existing madvise(2) user could replace it with process_madvise(2)
with their own pid if they want to have batch address ranges support
feature.

ince it could affect other process's address range, only privileged
process(PTRACE_MODE_ATTACH_FSCREDS) or something else(e.g., being the same
UID) gives it the right to ptrace the process could use it successfully.
The flag argument is reserved for future use if we need to extend the API.

I think supporting all hints madvise has/will supported/support to
process_madvise is rather risky.  Because we are not sure all hints make
sense from external process and implementation for the hint may rely on
the caller being in the current context so it could be error-prone.  Thus,
I just limited hints as MADV_[COLD|PAGEOUT] in this patch.

If someone want to add other hints, we could hear the usecase and review
it for each hint.  It's safer for maintenance rather than introducing a
buggy syscall but hard to fix it later.

So finally, the API is as follows,

      ssize_t process_madvise(int pidfd, const struct iovec *iovec,
                unsigned long vlen, int advice, unsigned int flags);

    DESCRIPTION
      The process_madvise() system call is used to give advice or directions
      to the kernel about the address ranges from external process as well as
      local process. It provides the advice to address ranges of process
      described by iovec and vlen. The goal of such advice is to improve
      system or application performance.

      The pidfd selects the process referred to by the PID file descriptor
      specified in pidfd. (See pidofd_open(2) for further information)

      The pointer iovec points to an array of iovec structures, defined in
      <sys/uio.h> as:

        struct iovec {
            void *iov_base;         /* starting address */
            size_t iov_len;         /* number of bytes to be advised */
        };

      The iovec describes address ranges beginning at address(iov_base)
      and with size length of bytes(iov_len).

      The vlen represents the number of elements in iovec.

      The advice is indicated in the advice argument, which is one of the
      following at this moment if the target process specified by pidfd is
      external.

        MADV_COLD
        MADV_PAGEOUT

      Permission to provide a hint to external process is governed by a
      ptrace access mode PTRACE_MODE_ATTACH_FSCREDS check; see ptrace(2).

      The process_madvise supports every advice madvise(2) has if target
      process is in same thread group with calling process so user could
      use process_madvise(2) to extend existing madvise(2) to support
      vector address ranges.

    RETURN VALUE
      On success, process_madvise() returns the number of bytes advised.
      This return value may be less than the total number of requested
      bytes, if an error occurred. The caller should check return value
      to determine whether a partial advice occurred.

FAQ:

Q.1 - Why does any external entity have better knowledge?

Quote from Sandeep

"For Android, every application (including the special SystemServer)
are forked from Zygote.  The reason of course is to share as many
libraries and classes between the two as possible to benefit from the
preloading during boot.

After applications start, (almost) all of the APIs end up calling into
this SystemServer process over IPC (binder) and back to the
application.

In a fully running system, the SystemServer monitors every single
process periodically to calculate their PSS / RSS and also decides
which process is "important" to the user for interactivity.

So, because of how these processes start _and_ the fact that the
SystemServer is looping to monitor each process, it does tend to *know*
which address range of the application is not used / useful.

Besides, we can never rely on applications to clean things up
themselves.  We've had the "hey app1, the system is low on memory,
please trim your memory usage down" notifications for a long time[1].
They rely on applications honoring the broadcasts and very few do.

So, if we want to avoid the inevitable killing of the application and
restarting it, some way to be able to tell the OS about unimportant
memory in these applications will be useful.

- ssp

Q.2 - How to guarantee the race(i.e., object validation) between when
giving a hint from an external process and get the hint from the target
process?

process_madvise operates on the target process's address space as it
exists at the instant that process_madvise is called.  If the space
target process can run between the time the process_madvise process
inspects the target process address space and the time that
process_madvise is actually called, process_madvise may operate on
memory regions that the calling process does not expect.  It's the
responsibility of the process calling process_madvise to close this
race condition.  For example, the calling process can suspend the
target process with ptrace, SIGSTOP, or the freezer cgroup so that it
doesn't have an opportunity to change its own address space before
process_madvise is called.  Another option is to operate on memory
regions that the caller knows a priori will be unchanged in the target
process.  Yet another option is to accept the race for certain
process_madvise calls after reasoning that mistargeting will do no
harm.  The suggested API itself does not provide synchronization.  It
also apply other APIs like move_pages, process_vm_write.

The race isn't really a problem though.  Why is it so wrong to require
that callers do their own synchronization in some manner?  Nobody
objects to write(2) merely because it's possible for two processes to
open the same file and clobber each other's writes --- instead, we tell
people to use flock or something.  Think about mmap.  It never
guarantees newly allocated address space is still valid when the user
tries to access it because other threads could unmap the memory right
before.  That's where we need synchronization by using other API or
design from userside.  It shouldn't be part of API itself.  If someone
needs more fine-grained synchronization rather than process level,
there were two ideas suggested - cookie[2] and anon-fd[3].  Both are
applicable via using last reserved argument of the API but I don't
think it's necessary right now since we have already ways to prevent
the race so don't want to add additional complexity with more
fine-grained optimization model.

To make the API extend, it reserved an unsigned long as last argument
so we could support it in future if someone really needs it.

Q.3 - Why doesn't ptrace work?

Injecting an madvise in the target process using ptrace would not work
for us because such injected madvise would have to be executed by the
target process, which means that process would have to be runnable and
that creates the risk of the abovementioned race and hinting a wrong
VMA.  Furthermore, we want to act the hint in caller's context, not the
callee's, because the callee is usually limited in cpuset/cgroups or
even freezed state so they can't act by themselves quick enough, which
causes more thrashing/kill.  It doesn't work if the target process are
ptraced(e.g., strace, debugger, minidump) because a process can have at
most one ptracer.

[1] https://developer.android.com/topic/performance/memory"

[2] process_getinfo for getting the cookie which is updated whenever
    vma of process address layout are changed - Daniel Colascione -
    https://lore.kernel.org/lkml/20190520035254.57579-1-minchan@kernel.org/T/#m7694416fd179b2066a2c62b5b139b14e3894e224

[3] anonymous fd which is used for the object(i.e., address range)
    validation - Michal Hocko -
    https://lore.kernel.org/lkml/20200120112722.GY18451@dhcp22.suse.cz/

[minchan@kernel.org: fix process_madvise build break for arm64]
  Link: http://lkml.kernel.org/r/20200303145756.GA219683@google.com
[minchan@kernel.org: fix build error for mips of process_madvise]
  Link: http://lkml.kernel.org/r/20200508052517.GA197378@google.com
[akpm@linux-foundation.org: fix patch ordering issue]
[akpm@linux-foundation.org: fix arm64 whoops]
[minchan@kernel.org: make process_madvise() vlen arg have type size_t, per Florian]
[akpm@linux-foundation.org: fix i386 build]
[sfr@canb.auug.org.au: fix syscall numbering]
  Link: https://lkml.kernel.org/r/20200905142639.49fc3f1a@canb.auug.org.au
[sfr@canb.auug.org.au: madvise.c needs compat.h]
  Link: https://lkml.kernel.org/r/20200908204547.285646b4@canb.auug.org.au
[minchan@kernel.org: fix mips build]
  Link: https://lkml.kernel.org/r/20200909173655.GC2435453@google.com
[yuehaibing@huawei.com: remove duplicate header which is included twice]
  Link: https://lkml.kernel.org/r/20200915121550.30584-1-yuehaibing@huawei.com
[minchan@kernel.org: do not use helper functions for process_madvise]
  Link: https://lkml.kernel.org/r/20200921175539.GB387368@google.com
[akpm@linux-foundation.org: pidfd_get_pid() gained an argument]
[sfr@canb.auug.org.au: fix up for "iov_iter: transparently handle compat iovecs in import_iovec"]
  Link: https://lkml.kernel.org/r/20200928212542.468e1fef@canb.auug.org.au

Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Christian Brauner <christian@brauner.io>
Cc: Daniel Colascione <dancol@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Dias <joaodias@google.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oleksandr Natalenko <oleksandr@redhat.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: SeongJae Park <sj38.park@gmail.com>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Sonny Rao <sonnyrao@google.com>
Cc: Tim Murray <timmurray@google.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Florian Weimer <fw@deneb.enyo.de>
Cc: <linux-man@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200302193630.68771-3-minchan@kernel.org
Link: http://lkml.kernel.org/r/20200508183320.GA125527@google.com
Link: http://lkml.kernel.org/r/20200622192900.22757-4-minchan@kernel.org
Link: https://lkml.kernel.org/r/20200901000633.1920247-4-minchan@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18 09:27:10 -07:00
Minchan Kim
1aa92cd31c pid: move pidfd_get_pid() to pid.c
process_madvise syscall needs pidfd_get_pid function to translate pidfd to
pid so this patch move the function to kernel/pid.c.

Suggested-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jann Horn <jannh@google.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Daniel Colascione <dancol@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Dias <joaodias@google.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oleksandr Natalenko <oleksandr@redhat.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: SeongJae Park <sj38.park@gmail.com>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Sonny Rao <sonnyrao@google.com>
Cc: Tim Murray <timmurray@google.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Florian Weimer <fw@deneb.enyo.de>
Cc: <linux-man@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200302193630.68771-5-minchan@kernel.org
Link: http://lkml.kernel.org/r/20200622192900.22757-3-minchan@kernel.org
Link: https://lkml.kernel.org/r/20200901000633.1920247-3-minchan@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18 09:27:10 -07:00
Jens Axboe
91989c7078 task_work: cleanup notification modes
A previous commit changed the notification mode from true/false to an
int, allowing notify-no, notify-yes, or signal-notify. This was
backwards compatible in the sense that any existing true/false user
would translate to either 0 (on notification sent) or 1, the latter
which mapped to TWA_RESUME. TWA_SIGNAL was assigned a value of 2.

Clean this up properly, and define a proper enum for the notification
mode. Now we have:

- TWA_NONE. This is 0, same as before the original change, meaning no
  notification requested.
- TWA_RESUME. This is 1, same as before the original change, meaning
  that we use TIF_NOTIFY_RESUME.
- TWA_SIGNAL. This uses TIF_SIGPENDING/JOBCTL_TASK_WORK for the
  notification.

Clean up all the callers, switching their 0/1/false/true to using the
appropriate TWA_* mode for notifications.

Fixes: e91b481623 ("task_work: teach task_work_add() to do signal_wake_up()")
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17 15:05:30 -06:00
Jens Axboe
3c532798ec tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume()
All the callers currently do this, clean it up and move the clearing
into tracehook_notify_resume() instead.

Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17 15:04:36 -06:00
Linus Torvalds
54a4c789ca docs updates for v5.10-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+QmuaPwR3wnBdVwACF8+vY7k4RUFAl+JNGYACgkQCF8+vY7k
 4RV/TA//ZoRoMQE5B6zwO4kOGILMbmW2uepjoEysLgus2ctkTUoRkpNLWS3SozcU
 6c/eW1rC4Fji24te6lwusciZa5zQgbGMjFYk1LhnJ65lJA+kQ+kV1DGz/ZWtklMM
 gLX20+tQADqGl+u2dmFCvmRhPWJ9nzt1C0auN7dGeu+9g97GnhKG6o2Kv/nVCb68
 qMmAs9UrfN24DO5G1ixkdY08nSNJPrpgQnIR2ruUysUII/yTTtcnmHDbH3WWL6+9
 2P87AZ6zsa3FdBhAjmG5YJklQgPkLFWEykHMTqq/Mkcpff/JB/AayrL6XNB2QoZb
 YXLHJp3Na6iBmdmHhecg+VQDgz28UfMk+p+HFoJh8RTtJa9/qJvYdJmIE/mUPrnY
 gL4jNgMVwkptGHXh7IRuSLysT5heJPMQss6TfZ6yYadeOIpx7W8MCAYnGffiElLQ
 hmKdmyCszS3SERJz40EOBdr2NQYcDEUt2NtEhdVfium21A4PFOdJlCejifGhJyzP
 n1QcyMXHnh/d4zecA6fcD0LVyxBgngeKEvdtOLZJ1ubxWwHhgWTN8R4HedoN2Nb9
 cLEUK8Td+9n2RVS8UED4BBI+6vfN3Y6Syjvy8qD3pCs4SBcu3k790mf47t2QhkEq
 +Ho06gdrGJdEcSDO8zVY7qjZX/GX/dbRHCb5CRokL5FmNWhXd/Y=
 =26wi
 -----END PGP SIGNATURE-----

Merge tag 'docs/v5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull documentation updates from Mauro Carvalho Chehab:
 "A series of patches addressing warnings produced by make htmldocs.
  This includes:

   - kernel-doc markup fixes

   - ReST fixes

   - Updates at the build system in order to support newer versions of
     the docs build toolchain (Sphinx)

  After this series, the number of html build warnings should reduce
  significantly, and building with Sphinx 3.1 or later should now be
  supported (although it is still recommended to use Sphinx 2.4.4).

  As agreed with Jon, I should be sending you a late pull request by the
  end of the merge window addressing remaining issues with docs build,
  as there are a number of warning fixes that depends on pull requests
  that should be happening along the merge window.

  The end goal is to have a clean htmldocs build on Kernel 5.10.

  PS. It should be noticed that Sphinx 3.0 is not currently supported,
  as it lacks support for C domain namespaces. Such feature, needed in
  order to document uAPI system calls with Sphinx 3.x, was added only on
  Sphinx 3.1"

* tag 'docs/v5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (75 commits)
  PM / devfreq: remove a duplicated kernel-doc markup
  mm/doc: fix a literal block markup
  workqueue: fix a kernel-doc warning
  docs: virt: user_mode_linux_howto_v2.rst: fix a literal block markup
  Input: sparse-keymap: add a description for @sw
  rcu/tree: docs: document bkvcache new members at struct kfree_rcu_cpu
  nl80211: docs: add a description for s1g_cap parameter
  usb: docs: document altmode register/unregister functions
  kunit: test.h: fix a bad kernel-doc markup
  drivers: core: fix kernel-doc markup for dev_err_probe()
  docs: bio: fix a kerneldoc markup
  kunit: test.h: solve kernel-doc warnings
  block: bio: fix a warning at the kernel-doc markups
  docs: powerpc: syscall64-abi.rst: fix a malformed table
  drivers: net: hamradio: fix document location
  net: appletalk: Kconfig: Fix docs location
  dt-bindings: fix references to files converted to yaml
  memblock: get rid of a :c:type leftover
  math64.h: kernel-docs: Convert some markups into normal comments
  media: uAPI: buffer.rst: remove a left-over documentation
  ...
2020-10-16 15:02:21 -07:00
Linus Torvalds
93f3d8f54a Tracing: Fix mismatch section of adding early trace events
- Fixes the issue of a mismatch section that was missed due to gcc
   inlining the offending function, while clang did not (and reported
   the issue).
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCX4oAlhQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qkyBAQDnmsIHN1BPMdWbSFJeRYobSYFKoSzh
 qwcqy0prvNeFvgD9GeC7wZeJaUWCt4zRbpZhuclIq2BNqoiA/llE67zXQwQ=
 =W1ca
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
 "Fix mismatch section of adding early trace events.

  Fixes the issue of a mismatch section that was missed due to gcc
  inlining the offending function, while clang did not (and reported the
  issue)"

* tag 'trace-v5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Remove __init from __trace_early_add_new_event()
2020-10-16 14:56:52 -07:00
Linus Torvalds
8119c4332d Urgent printk fix for 5.10
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAl+JtNYACgkQUqAMR0iA
 lPLvaQ//Xo1gAe/ZZznTJ/hMeJQf0COAFeQ/oriqiHThu9rRxv9cfQ434JYIQaz4
 f1cKhnMNrdm6/AZynjKeLwoO+Ui9N3DUJ8Wf3V0r3W1mbWXrxSEhFE9AHc8BNWFZ
 vzlVOrHBfPt9RqFQ12k/IUcAElAiuCmceLJ3LsTLfVF2gc07cBsZdpsZLdPcb0fz
 Oo8Qu9afsVWUu9X/pepgKcNO0XbfDJM0cuFpmCrziRpToLvdaTEBihI7C0ktg9WU
 aNZz+mEuhX9mHGt9PvsHsuDsWj3gJJypkc7ccWRdWEbl0lR1HV40tr2j2WdtYtSb
 GvwD34ApxneoX87mWgaSSHWNfZ5B+SDbK1XQVswRkFmZhEqoAulGYd4Uktt+3Yxy
 4cGC1pzEzU5ekj6XcmUjTHxmTKRNju2qC1XTtSE0F7ozDTlMtPovqSMZAQoNNuQK
 +F1TBy49ikEVKB1V6xoGH/IepStO2OSee5JN6yJ+ZzLKPq9hqqtAC6CjA75NQNR0
 tB9e7EKUjGkwZohnCkPpVCA1BYzqWG8+II0Z5EZilZxJiNnh5CuJ4vr0wkIfnVkI
 BlvkoiPHiCBPhov8UJfnrxBASsSYt/EnzKfHisqQPs3/hN9ZgMbF+wbApwosEwNx
 l1fswzO0mBbb2NNJWydQzG305GjoxnkQjFOewOrCii4Z0ZrfMVo=
 =FqTN
 -----END PGP SIGNATURE-----

Merge tag 'printk-for-5.10-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull printk fix from Petr Mladek:
 "Prevent overflow in the new lockless ringbuffer"

* tag 'printk-for-5.10-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  printk: ringbuffer: Wrong data pointer when appending small string
2020-10-16 12:52:37 -07:00
Linus Torvalds
49dc6fbce3 kgdb patches for 5.10-rc1
A fairly modest set of changes for this cycle. Of particular
 note are an earlycon fix from Doug Anderson and my own changes to get
 kgdb/kdb to honour the kprobe blocklist. The later creates a safety
 rail that strongly encourages developers not to place breakpoints in,
 for example, arch specific trap handling code.
 
 Also included are a couple of small fixes and tweaks: an API update,
 eliminate a coverity dead code warning, improved handling of search
 during multi-line printk and a couple of typo corrections.
 
 Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEELzVBU1D3lWq6cKzwfOMlXTn3iKEFAl+IdyAACgkQfOMlXTn3
 iKHeTw//RDAWm0IU00z9+ZlTyksTk0vePuYKwgEm8zp+XYvY0NvpgWyZ5MWd8b3K
 WJmsTfXMHNoPCg3464XCrQDyIhrfhxk0nrdOpgbsQMb7HdjYrnltPdG3l8W9kvVv
 MjMH98QOBaYAY75nd8pGoPVTOmODrhowWo6+y4me2CnJGKOOV/yHmctBhlOJhbeo
 TCUIDP/NmC63N8Oziteym1TZ5dhschBb/85qEb72wXaiGEZTaVC9GEFEgCqfADHX
 51KxbtZoJWirbXu2aYaK5MHEb/0NWPMItiER7y8ZrTiPHMRre4N5DpCMpKpp3/qd
 YRtEnNnT+Ay0ijCt2FjznSsEh2ecLI0qSO4QDQz320QJCj7qgcjJ0++yEayrzz8W
 IxCbwkUP8X5m1srXSxvOTKfuu29wiMCqNkJA0rgjpA2u4Yn5KO0ZRmBoHtW1Sq8E
 MhbRTixU/vFYosjd/mKubj/f4DFrMILo+FJTqdewBUhT/Q6Vr9l660JzvwWnKKJF
 e1EHNYtWo4J+EkL9z++5d9PzDl0d56DcE8rfH53Dkg075Wnma3tdq2Z7WxT3M7EP
 K3U32BI9obu+lPHxl4FtAobCIDjP6NtmmMo3zzzA1fPtXNzAjy7qZ+Ss6POQppkn
 7v+PFYdFJ8VKo3PNxMWnFhgwSDOYYxCPjCxs+bjaMBvHNVgg2Ig=
 =x91W
 -----END PGP SIGNATURE-----

Merge tag 'kgdb-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux

Pull kgdb updates from Daniel Thompson:
 "A fairly modest set of changes for this cycle.

  Of particular note are an earlycon fix from Doug Anderson and my own
  changes to get kgdb/kdb to honour the kprobe blocklist. The later
  creates a safety rail that strongly encourages developers not to place
  breakpoints in, for example, arch specific trap handling code.

  Also included are a couple of small fixes and tweaks: an API update,
  eliminate a coverity dead code warning, improved handling of search
  during multi-line printk and a couple of typo corrections"

* tag 'kgdb-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
  kdb: Fix pager search for multi-line strings
  kernel: debug: Centralize dbg_[de]activate_sw_breakpoints
  kgdb: Add NOKPROBE labels on the trap handler functions
  kgdb: Honour the kprobe blocklist when setting breakpoints
  kernel/debug: Fix spelling mistake in debug_core.c
  kdb: Use newer api for tasklist scanning
  kgdb: Make "kgdbcon" work properly with "kgdb_earlycon"
  kdb: remove unnecessary null check of dbg_io_ops
2020-10-16 12:47:18 -07:00
Linus Torvalds
c4cf498dc0 Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "155 patches.

  Subsystems affected by this patch series: mm (dax, debug, thp,
  readahead, page-poison, util, memory-hotplug, zram, cleanups), misc,
  core-kernel, get_maintainer, MAINTAINERS, lib, bitops, checkpatch,
  binfmt, ramfs, autofs, nilfs, rapidio, panic, relay, kgdb, ubsan,
  romfs, and fault-injection"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (155 commits)
  lib, uaccess: add failure injection to usercopy functions
  lib, include/linux: add usercopy failure capability
  ROMFS: support inode blocks calculation
  ubsan: introduce CONFIG_UBSAN_LOCAL_BOUNDS for Clang
  sched.h: drop in_ubsan field when UBSAN is in trap mode
  scripts/gdb/tasks: add headers and improve spacing format
  scripts/gdb/proc: add struct mount & struct super_block addr in lx-mounts command
  kernel/relay.c: drop unneeded initialization
  panic: dump registers on panic_on_warn
  rapidio: fix the missed put_device() for rio_mport_add_riodev
  rapidio: fix error handling path
  nilfs2: fix some kernel-doc warnings for nilfs2
  autofs: harden ioctl table
  ramfs: fix nommu mmap with gaps in the page cache
  mm: remove the now-unnecessary mmget_still_valid() hack
  mm/gup: take mmap_lock in get_dump_page()
  binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot
  coredump: rework elf/elf_fdpic vma_dump_size() into common helper
  coredump: refactor page range dumping into common helper
  coredump: let dump_emit() bail out on short writes
  ...
2020-10-16 11:31:55 -07:00
Sudip Mukherjee
ac05b7a1b4 kernel/relay.c: drop unneeded initialization
The variable 'consumed' is initialized with the consumed count but
immediately after that the consumed count is updated and assigned to
'consumed' again thus overwriting the previous value.  So, drop the
unneeded initialization.

Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/20201005205727.1147-1-sudipm.mukherjee@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16 11:11:22 -07:00
Alexey Kardashevskiy
3f388f2863 panic: dump registers on panic_on_warn
Currently we print stack and registers for ordinary warnings but we do not
for panic_on_warn which looks as oversight - panic() will reboot the
machine but won't print registers.

This moves printing of registers and modules earlier.

This does not move the stack dumping as panic() dumps it.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Douglas Anderson <dianders@chromium.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Link: https://lkml.kernel.org/r/20200804095054.68724-1-aik@ozlabs.ru
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16 11:11:22 -07:00
Randy Dunlap
b7621ebf8a kernel: acct.c: fix some kernel-doc nits
Fix kernel-doc notation to use the documented Returns: syntax and place
the function description for acct_process() on the first line where it
should be.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Link: https://lkml.kernel.org/r/b4c33e5d-98e8-0c47-77b6-ac1859f94d7f@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16 11:11:19 -07:00
Randy Dunlap
7b7b8a2c95 kernel/: fix repeated words in comments
Fix multiple occurrences of duplicated words in kernel/.

Fix one typo/spello on the same line as a duplicate word.  Change one
instance of "the the" to "that the".  Otherwise just drop one of the
repeated words.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/98202fa6-8919-ef63-9efe-c0fad5ca7af1@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16 11:11:19 -07:00
Liao Pingfang
15ec0fcff6 kernel/sys.c: replace do_brk with do_brk_flags in comment of prctl_set_mm_map()
Replace do_brk with do_brk_flags in comment of prctl_set_mm_map(), since
do_brk was removed in following commit.

Fixes: bb177a732c ("mm: do not bug_on on incorrect length in __mm_populate()")
Signed-off-by: Liao Pingfang <liao.pingfang@zte.com.cn>
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/1600650751-43127-1-git-send-email-wang.yi59@zte.com.cn
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16 11:11:19 -07:00
Andy Shevchenko
b296a6d533 kernel.h: split out min()/max() et al. helpers
kernel.h is being used as a dump for all kinds of stuff for a long time.
Here is the attempt to start cleaning it up by splitting out min()/max()
et al.  helpers.

At the same time convert users in header and lib folder to use new header.
Though for time being include new header back to kernel.h to avoid
twisted indirected includes for other existing users.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Joe Perches <joe@perches.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20200910164152.GA1891694@smile.fi.intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16 11:11:19 -07:00
Miaohe Lin
73eb7f9a4f mm: use helper function put_write_access()
In commit 1da177e4c3 ("Linux-2.6.12-rc2"), the helper put_write_access()
came with the atomic_dec operation of the i_writecount field.  But it
forgot to use this helper in __vma_link_file() and dup_mmap().

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/20200924115235.5111-1-linmiaohe@huawei.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16 11:11:19 -07:00
David Hildenbrand
cb8e3c8b4f kernel/resource: make iomem_resource implicit in release_mem_region_adjustable()
"mem" in the name already indicates the root, similar to
release_mem_region() and devm_request_mem_region().  Make it implicit.
The only single caller always passes iomem_resource, other parents are not
applicable.

Suggested-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Link: https://lkml.kernel.org/r/20200916073041.10355-1-david@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16 11:11:18 -07:00
David Hildenbrand
9ca6551ee2 mm/memory_hotplug: MEMHP_MERGE_RESOURCE to specify merging of System RAM resources
Some add_memory*() users add memory in small, contiguous memory blocks.
Examples include virtio-mem, hyper-v balloon, and the XEN balloon.

This can quickly result in a lot of memory resources, whereby the actual
resource boundaries are not of interest (e.g., it might be relevant for
DIMMs, exposed via /proc/iomem to user space).  We really want to merge
added resources in this scenario where possible.

Let's provide a flag (MEMHP_MERGE_RESOURCE) to specify that a resource
either created within add_memory*() or passed via add_memory_resource()
shall be marked mergeable and merged with applicable siblings.

To implement that, we need a kernel/resource interface to mark selected
System RAM resources mergeable (IORESOURCE_SYSRAM_MERGEABLE) and trigger
merging.

Note: We really want to merge after the whole operation succeeded, not
directly when adding a resource to the resource tree (it would break
add_memory_resource() and require splitting resources again when the
operation failed - e.g., due to -ENOMEM).

Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: Julien Grall <julien@xen.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Leonardo Bras <leobras.c@gmail.com>
Cc: Libor Pechacek <lpechacek@suse.cz>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: "Oliver O'Halloran" <oohall@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pingfan Liu <kernelfans@gmail.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Link: https://lkml.kernel.org/r/20200911103459.10306-6-david@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16 11:11:18 -07:00
David Hildenbrand
7cf603d17d kernel/resource: move and rename IORESOURCE_MEM_DRIVER_MANAGED
IORESOURCE_MEM_DRIVER_MANAGED currently uses an unused PnP bit, which is
always set to 0 by hardware.  This is far from beautiful (and confusing),
and the bit only applies to SYSRAM.  So let's move it out of the
bus-specific (PnP) defined bits.

We'll add another SYSRAM specific bit soon.  If we ever need more bits for
other purposes, we can steal some from "desc", or reshuffle/regroup what
we have.

Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Julien Grall <julien@xen.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Leonardo Bras <leobras.c@gmail.com>
Cc: Libor Pechacek <lpechacek@suse.cz>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: "Oliver O'Halloran" <oohall@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pingfan Liu <kernelfans@gmail.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Wei Liu <wei.liu@kernel.org>
Link: https://lkml.kernel.org/r/20200911103459.10306-3-david@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16 11:11:18 -07:00
David Hildenbrand
ec62d04e3f kernel/resource: make release_mem_region_adjustable() never fail
Patch series "selective merging of system ram resources", v4.

Some add_memory*() users add memory in small, contiguous memory blocks.
Examples include virtio-mem, hyper-v balloon, and the XEN balloon.

This can quickly result in a lot of memory resources, whereby the actual
resource boundaries are not of interest (e.g., it might be relevant for
DIMMs, exposed via /proc/iomem to user space).  We really want to merge
added resources in this scenario where possible.

Resources are effectively stored in a list-based tree.  Having a lot of
resources not only wastes memory, it also makes traversing that tree more
expensive, and makes /proc/iomem explode in size (e.g., requiring
kexec-tools to manually merge resources when creating a kdump header.  The
current kexec-tools resource count limit does not allow for more than
~100GB of memory with a memory block size of 128MB on x86-64).

Let's allow to selectively merge system ram resources by specifying a new
flag for add_memory*().  Patch #5 contains a /proc/iomem example.  Only
tested with virtio-mem.

This patch (of 8):

Let's make sure splitting a resource on memory hotunplug will never fail.
This will become more relevant once we merge selected System RAM resources
- then, we'll trigger that case more often on memory hotunplug.

In general, this function is already unlikely to fail.  When we remove
memory, we free up quite a lot of metadata (memmap, page tables, memory
block device, etc.).  The only reason it could really fail would be when
injecting allocation errors.

All other error cases inside release_mem_region_adjustable() seem to be
sanity checks if the function would be abused in different context - let's
add WARN_ON_ONCE() in these cases so we can catch them.

[natechancellor@gmail.com: fix use of ternary condition in release_mem_region_adjustable]
  Link: https://lkml.kernel.org/r/20200922060748.2452056-1-natechancellor@gmail.com
  Link: https://github.com/ClangBuiltLinux/linux/issues/1159

Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Julien Grall <julien@xen.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Leonardo Bras <leobras.c@gmail.com>
Cc: Libor Pechacek <lpechacek@suse.cz>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: "Oliver O'Halloran" <oohall@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pingfan Liu <kernelfans@gmail.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Roger Pau Monn <roger.pau@citrix.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Wei Liu <wei.liu@kernel.org>
Link: https://lkml.kernel.org/r/20200911103459.10306-2-david@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16 11:11:17 -07:00
Masami Hiramatsu
ce66f61364 tracing: Remove __init from __trace_early_add_new_event()
The commit 720dee53ad ("tracing/boot: Initialize per-instance event
list in early boot") removes __init from __trace_early_add_events()
but __trace_early_add_new_event() still has __init and will cause a
section mismatch.

Remove __init from __trace_early_add_new_event() as same as
__trace_early_add_events().

Link: https://lore.kernel.org/lkml/CAHk-=wjU86UhovK4XuwvCqTOfc+nvtpAuaN2PJBz15z=w=u0Xg@mail.gmail.com/

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-10-16 09:43:36 -04:00
Mauro Carvalho Chehab
3eb6b31bfb workqueue: fix a kernel-doc warning
As warned by Sphinx:

	./Documentation/core-api/workqueue:400: ./kernel/workqueue.c:1218: WARNING: Unexpected indentation.

the return code table is currently not recognized, as it lacks
markups.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-10-16 07:28:20 +02:00
Linus Torvalds
9ff9b0d392 networking changes for the 5.10 merge window
Add redirect_neigh() BPF packet redirect helper, allowing to limit stack
 traversal in common container configs and improving TCP back-pressure.
 Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.
 
 Expand netlink policy support and improve policy export to user space.
 (Ge)netlink core performs request validation according to declared
 policies. Expand the expressiveness of those policies (min/max length
 and bitmasks). Allow dumping policies for particular commands.
 This is used for feature discovery by user space (instead of kernel
 version parsing or trial and error).
 
 Support IGMPv3/MLDv2 multicast listener discovery protocols in bridge.
 
 Allow more than 255 IPv4 multicast interfaces.
 
 Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
 packets of TCPv6.
 
 In Multi-patch TCP (MPTCP) support concurrent transmission of data
 on multiple subflows in a load balancing scenario. Enhance advertising
 addresses via the RM_ADDR/ADD_ADDR options.
 
 Support SMC-Dv2 version of SMC, which enables multi-subnet deployments.
 
 Allow more calls to same peer in RxRPC.
 
 Support two new Controller Area Network (CAN) protocols -
 CAN-FD and ISO 15765-2:2016.
 
 Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
 kernel problem.
 
 Add TC actions for implementing MPLS L2 VPNs.
 
 Improve nexthop code - e.g. handle various corner cases when nexthop
 objects are removed from groups better, skip unnecessary notifications
 and make it easier to offload nexthops into HW by converting
 to a blocking notifier.
 
 Support adding and consuming TCP header options by BPF programs,
 opening the doors for easy experimental and deployment-specific
 TCP option use.
 
 Reorganize TCP congestion control (CC) initialization to simplify life
 of TCP CC implemented in BPF.
 
 Add support for shipping BPF programs with the kernel and loading them
 early on boot via the User Mode Driver mechanism, hence reusing all the
 user space infra we have.
 
 Support sleepable BPF programs, initially targeting LSM and tracing.
 
 Add bpf_d_path() helper for returning full path for given 'struct path'.
 
 Make bpf_tail_call compatible with bpf-to-bpf calls.
 
 Allow BPF programs to call map_update_elem on sockmaps.
 
 Add BPF Type Format (BTF) support for type and enum discovery, as
 well as support for using BTF within the kernel itself (current use
 is for pretty printing structures).
 
 Support listing and getting information about bpf_links via the bpf
 syscall.
 
 Enhance kernel interfaces around NIC firmware update. Allow specifying
 overwrite mask to control if settings etc. are reset during update;
 report expected max time operation may take to users; support firmware
 activation without machine reboot incl. limits of how much impact
 reset may have (e.g. dropping link or not).
 
 Extend ethtool configuration interface to report IEEE-standard
 counters, to limit the need for per-vendor logic in user space.
 
 Adopt or extend devlink use for debug, monitoring, fw update
 in many drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw,
 mv88e6xxx, dpaa2-eth).
 
 In mlxsw expose critical and emergency SFP module temperature alarms.
 Refactor port buffer handling to make the defaults more suitable and
 support setting these values explicitly via the DCBNL interface.
 
 Add XDP support for Intel's igb driver.
 
 Support offloading TC flower classification and filtering rules to
 mscc_ocelot switches.
 
 Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
 fixed interval period pulse generator and one-step timestamping in
 dpaa-eth.
 
 Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
 offload.
 
 Add Lynx PHY/PCS MDIO module, and convert various drivers which have
 this HW to use it. Convert mvpp2 to split PCS.
 
 Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
 7-port Mediatek MT7531 IP.
 
 Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
 and wcn3680 support in wcn36xx.
 
 Improve performance for packets which don't require much offloads
 on recent Mellanox NICs by 20% by making multiple packets share
 a descriptor entry.
 
 Move chelsio inline crypto drivers (for TLS and IPsec) from the crypto
 subtree to drivers/net. Move MDIO drivers out of the phy directory.
 
 Clean up a lot of W=1 warnings, reportedly the actively developed
 subsections of networking drivers should now build W=1 warning free.
 
 Make sure drivers don't use in_interrupt() to dynamically adapt their
 code. Convert tasklets to use new tasklet_setup API (sadly this
 conversion is not yet complete).
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAl+ItRwACgkQMUZtbf5S
 IrtTMg//UxpdR/MirT1DatBU0K/UGAZY82hV7F/UC8tPgjfHZeHvWlDFxfi3YP81
 PtPKbhRZ7DhwBXefUp6nY3UdvjftrJK2lJm8prJUPSsZRye8Wlcb7y65q7/P2y2U
 Efucyopg6RUrmrM0DUsIGYGJgylQLHnMYUl/keCsD4t5Bp4ksyi9R2t5eitGoWzh
 r3QGdbSa0AuWx4iu0i+tqp6Tj0ekMBMXLVb35dtU1t0joj2KTNEnSgABN3prOa8E
 iWYf2erOau68Ogp3yU3miCy0ZU4p/7qGHTtzbcp677692P/ekak6+zmfHLT9/Pjy
 2Stq2z6GoKuVxdktr91D9pA3jxG4LxSJmr0TImcGnXbvkMP3Ez3g9RrpV5fn8j6F
 mZCH8TKZAoD5aJrAJAMkhZmLYE1pvDa7KolSk8WogXrbCnTEb5Nv8FHTS1Qnk3yl
 wSKXuvutFVNLMEHCnWQLtODbTST9DI/aOi6EctPpuOA/ZyL1v3pl+gfp37S+LUTe
 owMnT/7TdvKaTD0+gIyU53M6rAWTtr5YyRQorX9awIu/4Ha0F0gYD7BJZQUGtegp
 HzKt59NiSrFdbSH7UdyemdBF4LuCgIhS7rgfeoUXMXmuPHq7eHXyHZt5dzPPa/xP
 81P0MAvdpFVwg8ij2yp2sHS7sISIRKq17fd1tIewUabxQbjXqPc=
 =bc1U
 -----END PGP SIGNATURE-----

Merge tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:

 - Add redirect_neigh() BPF packet redirect helper, allowing to limit
   stack traversal in common container configs and improving TCP
   back-pressure.

   Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.

 - Expand netlink policy support and improve policy export to user
   space. (Ge)netlink core performs request validation according to
   declared policies. Expand the expressiveness of those policies
   (min/max length and bitmasks). Allow dumping policies for particular
   commands. This is used for feature discovery by user space (instead
   of kernel version parsing or trial and error).

 - Support IGMPv3/MLDv2 multicast listener discovery protocols in
   bridge.

 - Allow more than 255 IPv4 multicast interfaces.

 - Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
   packets of TCPv6.

 - In Multi-patch TCP (MPTCP) support concurrent transmission of data on
   multiple subflows in a load balancing scenario. Enhance advertising
   addresses via the RM_ADDR/ADD_ADDR options.

 - Support SMC-Dv2 version of SMC, which enables multi-subnet
   deployments.

 - Allow more calls to same peer in RxRPC.

 - Support two new Controller Area Network (CAN) protocols - CAN-FD and
   ISO 15765-2:2016.

 - Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
   kernel problem.

 - Add TC actions for implementing MPLS L2 VPNs.

 - Improve nexthop code - e.g. handle various corner cases when nexthop
   objects are removed from groups better, skip unnecessary
   notifications and make it easier to offload nexthops into HW by
   converting to a blocking notifier.

 - Support adding and consuming TCP header options by BPF programs,
   opening the doors for easy experimental and deployment-specific TCP
   option use.

 - Reorganize TCP congestion control (CC) initialization to simplify
   life of TCP CC implemented in BPF.

 - Add support for shipping BPF programs with the kernel and loading
   them early on boot via the User Mode Driver mechanism, hence reusing
   all the user space infra we have.

 - Support sleepable BPF programs, initially targeting LSM and tracing.

 - Add bpf_d_path() helper for returning full path for given 'struct
   path'.

 - Make bpf_tail_call compatible with bpf-to-bpf calls.

 - Allow BPF programs to call map_update_elem on sockmaps.

 - Add BPF Type Format (BTF) support for type and enum discovery, as
   well as support for using BTF within the kernel itself (current use
   is for pretty printing structures).

 - Support listing and getting information about bpf_links via the bpf
   syscall.

 - Enhance kernel interfaces around NIC firmware update. Allow
   specifying overwrite mask to control if settings etc. are reset
   during update; report expected max time operation may take to users;
   support firmware activation without machine reboot incl. limits of
   how much impact reset may have (e.g. dropping link or not).

 - Extend ethtool configuration interface to report IEEE-standard
   counters, to limit the need for per-vendor logic in user space.

 - Adopt or extend devlink use for debug, monitoring, fw update in many
   drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx,
   dpaa2-eth).

 - In mlxsw expose critical and emergency SFP module temperature alarms.
   Refactor port buffer handling to make the defaults more suitable and
   support setting these values explicitly via the DCBNL interface.

 - Add XDP support for Intel's igb driver.

 - Support offloading TC flower classification and filtering rules to
   mscc_ocelot switches.

 - Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
   fixed interval period pulse generator and one-step timestamping in
   dpaa-eth.

 - Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
   offload.

 - Add Lynx PHY/PCS MDIO module, and convert various drivers which have
   this HW to use it. Convert mvpp2 to split PCS.

 - Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
   7-port Mediatek MT7531 IP.

 - Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
   and wcn3680 support in wcn36xx.

 - Improve performance for packets which don't require much offloads on
   recent Mellanox NICs by 20% by making multiple packets share a
   descriptor entry.

 - Move chelsio inline crypto drivers (for TLS and IPsec) from the
   crypto subtree to drivers/net. Move MDIO drivers out of the phy
   directory.

 - Clean up a lot of W=1 warnings, reportedly the actively developed
   subsections of networking drivers should now build W=1 warning free.

 - Make sure drivers don't use in_interrupt() to dynamically adapt their
   code. Convert tasklets to use new tasklet_setup API (sadly this
   conversion is not yet complete).

* tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits)
  Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH"
  net, sockmap: Don't call bpf_prog_put() on NULL pointer
  bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo
  bpf, sockmap: Add locking annotations to iterator
  netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements
  net: fix pos incrementment in ipv6_route_seq_next
  net/smc: fix invalid return code in smcd_new_buf_create()
  net/smc: fix valid DMBE buffer sizes
  net/smc: fix use-after-free of delayed events
  bpfilter: Fix build error with CONFIG_BPFILTER_UMH
  cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr
  net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info
  bpf: Fix register equivalence tracking.
  rxrpc: Fix loss of final ack on shutdown
  rxrpc: Fix bundle counting for exclusive connections
  netfilter: restore NF_INET_NUMHOOKS
  ibmveth: Identify ingress large send packets.
  ibmveth: Switch order of ibmveth_helper calls.
  cxgb4: handle 4-tuple PEDIT to NAT mode translation
  selftests: Add VRF route leaking tests
  ...
2020-10-15 18:42:13 -07:00
Linus Torvalds
fefa636d81 Updates for tracing and bootconfig:
- Add support for "bool" type in synthetic events
 
 - Add per instance tracing for bootconfig
 
 - Support perf-style return probe ("SYMBOL%return") in kprobes and uprobes
 
 - Allow for kprobes to be enabled earlier in boot up
 
 - Added tracepoint helper function to allow testing if tracepoints are
   enabled in headers
 
 - Synthetic events can now have dynamic strings (variable length)
 
 - Various fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCX4iMDRQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qrMPAP0UAfOeQcYxBAw9y8oX7oJnBBylLFTR
 CICOVEhBYC/xIQD/edVPEUt77ozM/Bplwv8BiO4QxFjgZFqtpZI8mskIfAo=
 =sbny
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "Updates for tracing and bootconfig:

   - Add support for "bool" type in synthetic events

   - Add per instance tracing for bootconfig

   - Support perf-style return probe ("SYMBOL%return") in kprobes and
     uprobes

   - Allow for kprobes to be enabled earlier in boot up

   - Added tracepoint helper function to allow testing if tracepoints
     are enabled in headers

   - Synthetic events can now have dynamic strings (variable length)

   - Various fixes and cleanups"

* tag 'trace-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (58 commits)
  tracing: support "bool" type in synthetic trace events
  selftests/ftrace: Add test case for synthetic event syntax errors
  tracing: Handle synthetic event array field type checking correctly
  selftests/ftrace: Change synthetic event name for inter-event-combined test
  tracing: Add synthetic event error logging
  tracing: Check that the synthetic event and field names are legal
  tracing: Move is_good_name() from trace_probe.h to trace.h
  tracing: Don't show dynamic string internals in synthetic event description
  tracing: Fix some typos in comments
  tracing/boot: Add ftrace.instance.*.alloc_snapshot option
  tracing: Fix race in trace_open and buffer resize call
  tracing: Check return value of __create_val_fields() before using its result
  tracing: Fix synthetic print fmt check for use of __get_str()
  tracing: Remove a pointless assignment
  ftrace: ftrace_global_list is renamed to ftrace_ops_list
  ftrace: Format variable declarations of ftrace_allocate_records
  ftrace: Simplify the calculation of page number for ftrace_page->records
  ftrace: Simplify the dyn_ftrace->flags macro
  ftrace: Simplify the hash calculation
  ftrace: Use fls() to get the bits for dup_hash()
  ...
2020-10-15 15:51:28 -07:00
Linus Torvalds
bbf6259903 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial updates from Jiri Kosina:
 "The latest advances in computer science from the trivial queue"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
  xtensa: fix Kconfig typo
  spelling.txt: Remove some duplicate entries
  mtd: rawnand: oxnas: cleanup/simplify code
  selftests: vm: add fragment CONFIG_GUP_BENCHMARK
  perf: Fix opt help text for --no-bpf-event
  HID: logitech-dj: Fix spelling in comment
  bootconfig: Fix kernel message mentioning CONFIG_BOOT_CONFIG
  MAINTAINERS: rectify MMP SUPPORT after moving cputype.h
  scif: Fix spelling of EACCES
  printk: fix global comment
  lib/bitmap.c: fix spello
  fs: Fix missing 'bit' in comment
2020-10-15 15:11:56 -07:00
Linus Torvalds
5a32c3413d dma-mapping updates for 5.10
- rework the non-coherent DMA allocator
  - move private definitions out of <linux/dma-mapping.h>
  - lower CMA_ALIGNMENT (Paul Cercueil)
  - remove the omap1 dma address translation in favor of the common
    code
  - make dma-direct aware of multiple dma offset ranges (Jim Quinlan)
  - support per-node DMA CMA areas (Barry Song)
  - increase the default seg boundary limit (Nicolin Chen)
  - misc fixes (Robin Murphy, Thomas Tai, Xu Wang)
  - various cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAl+IiPwLHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYPKEQ//TM8vxjucnRl/pklpMin49dJorwiVvROLhQqLmdxw
 286ZKpVzYYAPc7LnNqwIBugnFZiXuHu8xPKQkIiOa2OtNDTwhKNoBxOAmOJaV6DD
 8JfEtZYeX5mKJ/Nqd2iSkIqOvCwZ9Wzii+aytJ2U88wezQr1fnyF4X49MegETEey
 FHWreSaRWZKa0MMRu9AQ0QxmoNTHAQUNaPc0PeqEtPULybfkGOGw4/ghSB7WcKrA
 gtKTuooNOSpVEHkTas2TMpcBp6lxtOjFqKzVN0ml+/nqq5NeTSDx91VOCX/6Cj76
 mXIg+s7fbACTk/BmkkwAkd0QEw4fo4tyD6Bep/5QNhvEoAriTuSRbhvLdOwFz0EF
 vhkF0Rer6umdhSK7nPd7SBqn8kAnP4vBbdmB68+nc3lmkqysLyE4VkgkdH/IYYQI
 6TJ0oilXWFmU6DT5Rm4FBqCvfcEfU2dUIHJr5wZHqrF2kLzoZ+mpg42fADoG4GuI
 D/oOsz7soeaRe3eYfWybC0omGR6YYPozZJ9lsfftcElmwSsFrmPsbO1DM5IBkj1B
 gItmEbOB9ZK3RhIK55T/3u1UWY3Uc/RVr+kchWvADGrWnRQnW0kxYIqDgiOytLFi
 JZNH8uHpJIwzoJAv6XXSPyEUBwXTG+zK37Ce769HGbUEaUrE71MxBbQAQsK8mDpg
 7fM=
 =Bkf/
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-5.10' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping updates from Christoph Hellwig:

 - rework the non-coherent DMA allocator

 - move private definitions out of <linux/dma-mapping.h>

 - lower CMA_ALIGNMENT (Paul Cercueil)

 - remove the omap1 dma address translation in favor of the common code

 - make dma-direct aware of multiple dma offset ranges (Jim Quinlan)

 - support per-node DMA CMA areas (Barry Song)

 - increase the default seg boundary limit (Nicolin Chen)

 - misc fixes (Robin Murphy, Thomas Tai, Xu Wang)

 - various cleanups

* tag 'dma-mapping-5.10' of git://git.infradead.org/users/hch/dma-mapping: (63 commits)
  ARM/ixp4xx: add a missing include of dma-map-ops.h
  dma-direct: simplify the DMA_ATTR_NO_KERNEL_MAPPING handling
  dma-direct: factor out a dma_direct_alloc_from_pool helper
  dma-direct check for highmem pages in dma_direct_alloc_pages
  dma-mapping: merge <linux/dma-noncoherent.h> into <linux/dma-map-ops.h>
  dma-mapping: move large parts of <linux/dma-direct.h> to kernel/dma
  dma-mapping: move dma-debug.h to kernel/dma/
  dma-mapping: remove <asm/dma-contiguous.h>
  dma-mapping: merge <linux/dma-contiguous.h> into <linux/dma-map-ops.h>
  dma-contiguous: remove dma_contiguous_set_default
  dma-contiguous: remove dev_set_cma_area
  dma-contiguous: remove dma_declare_contiguous
  dma-mapping: split <linux/dma-mapping.h>
  cma: decrease CMA_ALIGNMENT lower limit to 2
  firewire-ohci: use dma_alloc_pages
  dma-iommu: implement ->alloc_noncoherent
  dma-mapping: add new {alloc,free}_noncoherent dma_map_ops methods
  dma-mapping: add a new dma_alloc_pages API
  dma-mapping: remove dma_cache_sync
  53c700: convert to dma_alloc_noncoherent
  ...
2020-10-15 14:43:29 -07:00
Marc Zyngier
151a535171 genirq: Let GENERIC_IRQ_IPI select IRQ_DOMAIN_HIERARCHY
kernel/irq/ipi.c otherwise fails to compile if nothing else
selects it.

Fixes: 379b656446 ("genirq: Add GENERIC_IRQ_IPI Kconfig symbol")
Reported-by: Pavel Machek <pavel@ucw.cz>
Tested-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201015101222.GA32747@amd
2020-10-15 21:41:44 +01:00
Linus Torvalds
93b694d096 drm next for 5.10-rc1
New driver:
 Cadence MHDP8546 DisplayPort bridge driver
 
 core:
 - cross-driver scatterlist cleanups
 - devm_drm conversions
 - remove drm_dev_init
 - devm_drm_dev_alloc conversion
 
 ttm:
 - lots of refactoring and cleanups
 
 bridges:
 - chained bridge support in more drivers
 
 panel:
 - misc new panels
 
 scheduler:
 - cleanup priority levels
 
 displayport:
 - refactor i915 code into helpers for nouveau
 
 i915:
 - split into display and GT trees
 - WW locking refactoring in GEM
 - execbuf2 extension mechanism
 - syncobj timeline support
 - GEN 12 HOBL display powersaving
 - Rocket Lake display additions
 - Disable FBC on Tigerlake
 - Tigerlake Type-C + DP improvements
 - Hotplug interrupt refactoring
 
 amdgpu:
 - Sienna Cichlid updates
 - Navy Flounder updates
 - DCE6 (SI) support for DC
 - Plane rotation enabled
 - TMZ state info ioctl
 - PCIe DPC recovery support
 - DC interrupt handling refactor
 - OLED panel fixes
 
 amdkfd:
 - add SMI events for thermal throttling
 - SMI interface events ioctl update
 - process eviction counters
 
 radeon:
 - move to dma_ for allocations
 - expose sclk via sysfs
 
 msm:
 - DSI support for sm8150/sm8250
 - per-process GPU pagetable support
 - Displayport support
 
 mediatek:
 - move HDMI phy driver to PHY
 - convert mtk-dpi to bridge API
 - disable mt2701 tmds
 
 tegra:
 - bridge support
 
 exynos:
 - misc cleanups
 
 vc4:
 - dual display cleanups
 
 ast:
 - cleanups
 
 gma500:
 - conversion to GPIOd API
 
 hisilicon:
 - misc reworks
 
 ingenic:
 - clock handling and format improvements
 
 mcde:
 - DSI support
 
 mgag200:
 - desktop g200 support
 
 mxsfb:
 - i.MX7 + i.MX8M
 - alpha plane support
 
 panfrost:
 - devfreq support
 - amlogic SoC support
 
 ps8640:
 - EDID from eDP retrieval
 
 tidss:
 - AM65xx YUV workaround
 
 virtio:
 - virtio-gpu exported resources
 
 rcar-du:
 - R8A7742, R8A774E1 and R8A77961 support
 - YUV planar format fixes
 - non-visible plane handling
 - VSP device reference count fix
 - Kconfig fix to avoid displaying disabled options in .config
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJfh579AAoJEAx081l5xIa+GqoP/0amz+ZN7y/L7+f32CRinJ7/
 3e4xjXNDmtWG4Whe/WKjlYmbAcvSdWV/4HYpurW2BFJnOAB/5lIqYcS/PyqErPzA
 w4EpRoJ+ZdFgmlDH0vdsDwPLT/HFmhUN9AopNkoZpbSMxrManSj5QgmePXyiKReP
 Q+ZAK5UW5AdOVY4bgXUSEkVq2eilCLXf+bSBR/LrVQuNgu7GULX8SIy/Y1CuMtv8
 LgzzjLKfIZaIWC+F/RU7BxJ7YnrVq7z7yXnUx8j2416+k/Wwe+BeSUCSZstT7q9G
 UkX8jWfR7ZKqhwP+UQeSwDbHkALz7lv88nyjQdxJZ3SrXRe4hy14YjxnR4maeNAj
 3TAYSdcAMWyRHqeEZIZ7Hj5sQtTq5OZAoIjxzH3vpVdAnnAkcWoF77pqxV8XPqTC
 nw40DihAxQOshGwMkjd5DqkEwnMv43Hs1WTVYu9dPTOfOdqPNt+Vqp7Xl9Z46+kV
 k6PDcx60T9ayDW1QZ6MoIXHta9E7ixzu7gYBL3vP4LuporY0uNG3bzF3CMvof1BK
 sHYcYTdZkqbTD2d6rHV+TbpPQXgTtlej9qVlQM4SeX37Xtc7LxCYpnpUHKz2S/fK
 1vyeGPgdytHblwlxwZOPZ4R2I/HTfnITdr4kMcJHhxAsEewfW1Rd4+stQqVJ2Mph
 Vz+CFP2BngivGFz5vuky
 =4H8J
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2020-10-15' of git://anongit.freedesktop.org/drm/drm

Pull drm updates from Dave Airlie:
 "Not a major amount of change, the i915 trees got split into display
  and gt trees to better facilitate higher level review, and there's a
  major refactoring of i915 GEM locking to use more core kernel concepts
  (like ww-mutexes). msm gets per-process pagetables, older AMD SI cards
  get DC support, nouveau got a bump in displayport support with common
  code extraction from i915.

  Outside of drm this contains a couple of patches for hexint
  moduleparams which you've acked, and a virtio common code tree that
  you should also get via it's regular path.

  New driver:
   - Cadence MHDP8546 DisplayPort bridge driver

  core:
   - cross-driver scatterlist cleanups
   - devm_drm conversions
   - remove drm_dev_init
   - devm_drm_dev_alloc conversion

  ttm:
   - lots of refactoring and cleanups

  bridges:
   - chained bridge support in more drivers

  panel:
   - misc new panels

  scheduler:
   - cleanup priority levels

  displayport:
   - refactor i915 code into helpers for nouveau

  i915:
   - split into display and GT trees
   - WW locking refactoring in GEM
   - execbuf2 extension mechanism
   - syncobj timeline support
   - GEN 12 HOBL display powersaving
   - Rocket Lake display additions
   - Disable FBC on Tigerlake
   - Tigerlake Type-C + DP improvements
   - Hotplug interrupt refactoring

  amdgpu:
   - Sienna Cichlid updates
   - Navy Flounder updates
   - DCE6 (SI) support for DC
   - Plane rotation enabled
   - TMZ state info ioctl
   - PCIe DPC recovery support
   - DC interrupt handling refactor
   - OLED panel fixes

  amdkfd:
   - add SMI events for thermal throttling
   - SMI interface events ioctl update
   - process eviction counters

  radeon:
   - move to dma_ for allocations
   - expose sclk via sysfs

  msm:
   - DSI support for sm8150/sm8250
   - per-process GPU pagetable support
   - Displayport support

  mediatek:
   - move HDMI phy driver to PHY
   - convert mtk-dpi to bridge API
   - disable mt2701 tmds

  tegra:
   - bridge support

  exynos:
   - misc cleanups

  vc4:
   - dual display cleanups

  ast:
   - cleanups

  gma500:
   - conversion to GPIOd API

  hisilicon:
   - misc reworks

  ingenic:
   - clock handling and format improvements

  mcde:
   - DSI support

  mgag200:
   - desktop g200 support

  mxsfb:
   - i.MX7 + i.MX8M
   - alpha plane support

  panfrost:
   - devfreq support
   - amlogic SoC support

  ps8640:
   - EDID from eDP retrieval

  tidss:
   - AM65xx YUV workaround

  virtio:
   - virtio-gpu exported resources

  rcar-du:
   - R8A7742, R8A774E1 and R8A77961 support
   - YUV planar format fixes
   - non-visible plane handling
   - VSP device reference count fix
   - Kconfig fix to avoid displaying disabled options in .config"

* tag 'drm-next-2020-10-15' of git://anongit.freedesktop.org/drm/drm: (1494 commits)
  drm/ingenic: Fix bad revert
  drm/amdgpu: Fix invalid number of character '{' in amdgpu_acpi_init
  drm/amdgpu: Remove warning for virtual_display
  drm/amdgpu: kfd_initialized can be static
  drm/amd/pm: setup APU dpm clock table in SMU HW initialization
  drm/amdgpu: prevent spurious warning
  drm/amdgpu/swsmu: fix ARC build errors
  drm/amd/display: Fix OPTC_DATA_FORMAT programming
  drm/amd/display: Don't allow pstate if no support in blank
  drm/panfrost: increase readl_relaxed_poll_timeout values
  MAINTAINERS: Update entry for st7703 driver after the rename
  Revert "gpu/drm: ingenic: Add option to mmap GEM buffers cached"
  drm/amd/display: HDMI remote sink need mode validation for Linux
  drm/amd/display: Change to correct unit on audio rate
  drm/amd/display: Avoid set zero in the requested clk
  drm/amdgpu: align frag_end to covered address space
  drm/amdgpu: fix NULL pointer dereference for Renoir
  drm/vmwgfx: fix regression in thp code due to ttm init refactor.
  drm/amdgpu/swsmu: add interrupt work handler for smu11 parts
  drm/amdgpu/swsmu: add interrupt work function
  ...
2020-10-15 10:46:16 -07:00
Linus Torvalds
726eb70e0d Char/Misc driver patches for 5.10-rc1
Here is the big set of char, misc, and other assorted driver subsystem
 patches for 5.10-rc1.
 
 There's a lot of different things in here, all over the drivers/
 directory.  Some summaries:
 	- soundwire driver updates
 	- habanalabs driver updates
 	- extcon driver updates
 	- nitro_enclaves new driver
 	- fsl-mc driver and core updates
 	- mhi core and bus updates
 	- nvmem driver updates
 	- eeprom driver updates
 	- binder driver updates and fixes
 	- vbox minor bugfixes
 	- fsi driver updates
 	- w1 driver updates
 	- coresight driver updates
 	- interconnect driver updates
 	- misc driver updates
 	- other minor driver updates
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCX4g8YQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+yngKgCeNpArCP/9vQJRK9upnDm8ZLunSCUAn1wUT/2A
 /bTQ42c/WRQ+LU828GSM
 =6sO2
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver updates from Greg KH:
 "Here is the big set of char, misc, and other assorted driver subsystem
  patches for 5.10-rc1.

  There's a lot of different things in here, all over the drivers/
  directory. Some summaries:

   - soundwire driver updates

   - habanalabs driver updates

   - extcon driver updates

   - nitro_enclaves new driver

   - fsl-mc driver and core updates

   - mhi core and bus updates

   - nvmem driver updates

   - eeprom driver updates

   - binder driver updates and fixes

   - vbox minor bugfixes

   - fsi driver updates

   - w1 driver updates

   - coresight driver updates

   - interconnect driver updates

   - misc driver updates

   - other minor driver updates

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'char-misc-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (396 commits)
  binder: fix UAF when releasing todo list
  docs: w1: w1_therm: Fix broken xref, mistakes, clarify text
  misc: Kconfig: fix a HISI_HIKEY_USB dependency
  LSM: Fix type of id parameter in kernel_post_load_data prototype
  misc: Kconfig: add a new dependency for HISI_HIKEY_USB
  firmware_loader: fix a kernel-doc markup
  w1: w1_therm: make w1_poll_completion static
  binder: simplify the return expression of binder_mmap
  test_firmware: Test partial read support
  firmware: Add request_partial_firmware_into_buf()
  firmware: Store opt_flags in fw_priv
  fs/kernel_file_read: Add "offset" arg for partial reads
  IMA: Add support for file reads without contents
  LSM: Add "contents" flag to kernel_read_file hook
  module: Call security_kernel_post_load_data()
  firmware_loader: Use security_post_load_data()
  LSM: Introduce kernel_post_load_data() hook
  fs/kernel_read_file: Add file_size output argument
  fs/kernel_read_file: Switch buffer size arg to size_t
  fs/kernel_read_file: Remove redundant size argument
  ...
2020-10-15 10:01:51 -07:00
Axel Rasmussen
6107742d15 tracing: support "bool" type in synthetic trace events
It's common [1] to define tracepoint fields as "bool" when they contain
a true / false value. Currently, defining a synthetic event with a
"bool" field yields EINVAL. It's possible to work around this by using
e.g. u8 (assuming sizeof(bool) is 1, and bool is unsigned; if either of
these properties don't match, you get EINVAL [2]).

Supporting "bool" explicitly makes hooking this up easier and more
portable for userspace.

[1]: grep -r "bool" include/trace/events/
[2]: check_synth_field() in kernel/trace/trace_events_hist.c

Link: https://lkml.kernel.org/r/20201009220524.485102-2-axelrasmussen@google.com

Acked-by: Michel Lespinasse <walken@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-10-15 12:01:14 -04:00
Tom Zanussi
10819e2579 tracing: Handle synthetic event array field type checking correctly
Since synthetic event array types are derived from the field name,
there may be a semicolon at the end of the type which should be
stripped off.

If there are more characters following that, normal type string
checking will result in an invalid type.

Without this patch, you can end up with an invalid field type string
that gets displayed in both the synthetic event description and the
event format:

Before:

  # echo 'myevent char str[16]; int v' >> synthetic_events
  # cat synthetic_events
    myevent	char[16]; str; int v

  name: myevent
  ID: 1936
  format:
  	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
  	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
  	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
  	field:int common_pid;	offset:4;	size:4;	signed:1;

  	field:char str[16];;	offset:8;	size:16;	signed:1;
  	field:int v;	offset:40;	size:4;	signed:1;

  print fmt: "str=%s, v=%d", REC->str, REC->v

After:

  # echo 'myevent char str[16]; int v' >> synthetic_events
  # cat synthetic_events
    myevent	char[16] str; int v

  # cat events/synthetic/myevent/format
  name: myevent
  ID: 1936
  format:
	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
	field:int common_pid;	offset:4;	size:4;	signed:1;

	field:char str[16];	offset:8;	size:16;	signed:1;
	field:int v;	offset:40;	size:4;	signed:1;

  print fmt: "str=%s, v=%d", REC->str, REC->v

Link: https://lkml.kernel.org/r/6587663b56c2d45ab9d8c8472a2110713cdec97d.1602598160.git.zanussi@kernel.org

[ <rostedt@goodmis.org>: wrote parse_synth_field() snippet. ]
Fixes: 4b147936fa (tracing: Add support for 'synthetic' events)
Reported-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-10-15 12:01:13 -04:00
Tom Zanussi
d4d704637d tracing: Add synthetic event error logging
Add support for synthetic event error logging, which entails adding a
logging function for it, a way to save the synthetic event command,
and a set of specific synthetic event parse error strings and
handling.

Link: https://lkml.kernel.org/r/ed099c66df13b40cfc633aaeb17f66c37a923066.1602598160.git.zanussi@kernel.org

[ <rostedt@goodmis.org>: wrote save_cmdstr() seq_buf implementation. ]
Tested-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-10-15 12:01:13 -04:00
Tom Zanussi
9bbb33291f tracing: Check that the synthetic event and field names are legal
Call the is_good_name() function used by probe events to make sure
synthetic event and field names don't contain illegal characters and
cause unexpected parsing of synthetic event commands.

Link: https://lkml.kernel.org/r/c4d4bb59d3ac39bcbd70fba0cf837d6b1cedb015.1602598160.git.zanussi@kernel.org

Fixes: 4b147936fa (tracing: Add support for 'synthetic' events)
Reported-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-10-15 12:01:13 -04:00
Tom Zanussi
42d120e2dd tracing: Move is_good_name() from trace_probe.h to trace.h
is_good_name() is useful for other trace infrastructure, such as
synthetic events, so make it available via trace.h.

Link: https://lkml.kernel.org/r/cc6d6a2d7da6957fcbe1e2922e76d18d2bb459b4.1602598160.git.zanussi@kernel.org

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-10-15 12:01:13 -04:00
Tom Zanussi
7d27adf575 tracing: Don't show dynamic string internals in synthetic event description
For synthetic event dynamic fields, the type contains "__data_loc",
which is basically an internal part of the type which is only meant to
be displayed in the format, not in the event description itself, which
is confusing to users since they can't use __data_loc on the
command-line to define an event field, which printing it would lead
them to believe.

So filter it out from the description, while leaving it in the type.

Link: https://lkml.kernel.org/r/b3b7baf7813298a5ede4ff02e2e837b91c05a724.1602598160.git.zanussi@kernel.org

Reported-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-10-15 12:01:13 -04:00
Qiujun Huang
499f7bb085 tracing: Fix some typos in comments
s/wihin/within/
s/retrieven/retrieved/
s/suppport/support/
s/wil/will/
s/accidently/accidentally/
s/if the if the/if the/

Link: https://lkml.kernel.org/r/20201010140924.3809-1-hqjagain@gmail.com

Signed-off-by: Qiujun Huang <hqjagain@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-10-15 12:01:13 -04:00
Masami Hiramatsu
c163409719 tracing/boot: Add ftrace.instance.*.alloc_snapshot option
Add ftrace.instance.*.alloc_snapshot option.

This option has been described in Documentation/trace/boottime-trace.rst
but not implemented yet.

ftrace.[instance.INSTANCE.]alloc_snapshot
   Allocate snapshot buffer.

The difference from kernel.alloc_snapshot is that the kernel.alloc_snapshot
will allocate the buffer only for the main instance, but this can allocate
buffer for any new instances.

Link: https://lkml.kernel.org/r/160234368948.400560.15313384470765915015.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-10-15 12:01:13 -04:00
Gaurav Kohli
bbeb97464e tracing: Fix race in trace_open and buffer resize call
Below race can come, if trace_open and resize of
cpu buffer is running parallely on different cpus
CPUX                                CPUY
				    ring_buffer_resize
				    atomic_read(&buffer->resize_disabled)
tracing_open
tracing_reset_online_cpus
ring_buffer_reset_cpu
rb_reset_cpu
				    rb_update_pages
				    remove/insert pages
resetting pointer

This race can cause data abort or some times infinte loop in
rb_remove_pages and rb_insert_pages while checking pages
for sanity.

Take buffer lock to fix this.

Link: https://lkml.kernel.org/r/1601976833-24377-1-git-send-email-gkohli@codeaurora.org

Cc: stable@vger.kernel.org
Fixes: b23d7a5f4a ("ring-buffer: speed up buffer resets by avoiding synchronize_rcu for each CPU")
Signed-off-by: Gaurav Kohli <gkohli@codeaurora.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-10-15 12:01:13 -04:00
Steven Rostedt (VMware)
6d9bd13945 tracing: Check return value of __create_val_fields() before using its result
After having a typo for writing a histogram trigger.

Wrote:
  echo 'hist:key=pid:ts=common_timestamp.usec' > events/sched/sched_waking/trigger

Instead of:
  echo 'hist:key=pid:ts=common_timestamp.usecs' > events/sched/sched_waking/trigger

and the following crash happened:

 BUG: kernel NULL pointer dereference, address: 0000000000000008
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] PREEMPT SMP PTI
 CPU: 4 PID: 1641 Comm: sh Not tainted 5.9.0-rc5-test+ #549
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
 RIP: 0010:event_hist_trigger_func+0x70b/0x1ee0
 Code: 24 08 89 d5 49 89 cc e9 8c 00 00 00 4c 89 f2 41 b9 00 10 00 00 4c 89 e1 44 89 ee 4c 89 ff e8 dc d3 ff ff 45 89 ea 4b 8b 14 d7 <f6> 42 08 04 74 17 41 8b 8f c0 00 00 00 8d 71 01 41 89 b7 c0 00 00
 RSP: 0018:ffff959213d53db0 EFLAGS: 00010202
 RAX: ffffffffffffffea RBX: 0000000000000000 RCX: 0000000000084c04
 RDX: 0000000000000000 RSI: df7326aefebd174c RDI: 0000000000031080
 RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000001
 R10: 0000000000000001 R11: 0000000000000046 R12: ffff959211dcf690
 R13: 0000000000000001 R14: ffff95925a36e370 R15: ffff959251c89800
 FS:  00007fb9ea934740(0000) GS:ffff95925ab00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000008 CR3: 00000000c976c005 CR4: 00000000001706e0
 Call Trace:
  ? trigger_process_regex+0x78/0x110
  trigger_process_regex+0xc5/0x110
  event_trigger_write+0x71/0xd0
  vfs_write+0xca/0x210
  ksys_write+0x70/0xf0
  do_syscall_64+0x33/0x40
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 RIP: 0033:0x7fb9eaa29487
 Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24

This was caused by accessing the hlist_data fields after the call to
__create_val_fields() without checking if the creation succeed.

Link: https://lkml.kernel.org/r/20201013154852.3abd8702@gandalf.local.home

Fixes: 63a1e5de30 ("tracing: Save normal string variables")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-10-15 12:00:59 -04:00
Alexei Starovoitov
e688c3db7c bpf: Fix register equivalence tracking.
The 64-bit JEQ/JNE handling in reg_set_min_max() was clearing reg->id in either
true or false branch. In the case 'if (reg->id)' check was done on the other
branch the counter part register would have reg->id == 0 when called into
find_equal_scalars(). In such case the helper would incorrectly identify other
registers with id == 0 as equivalent and propagate the state incorrectly.
Fix it by preserving ID across reg_set_min_max().

In other words any kind of comparison operator on the scalar register
should preserve its ID to recognize:

r1 = r2
if (r1 == 20) {
  #1 here both r1 and r2 == 20
} else if (r2 < 20) {
  #2 here both r1 and r2 < 20
}

The patch is addressing #1 case. The #2 was working correctly already.

Fixes: 75748837b7 ("bpf: Propagate scalar ranges through register assignments.")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Tested-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20201014175608.1416-1-alexei.starovoitov@gmail.com
2020-10-15 16:05:31 +02:00
Petr Mladek
eac48eb6ce printk: ringbuffer: Wrong data pointer when appending small string
data_realloc() returns wrong data pointer when the block is wrapped and
the size is not increased. It might happen when pr_cont() wants to
add only few characters and there is already a space for them because
of alignment.

It might cause writing outsite the buffer. It has been detected by LTP
tests with KASAN enabled:

[  221.921944] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=c,mems_allowed=0,oom_memcg=/0,task_memcg=in
[  221.922108] ==================================================================
[  221.922111] BUG: KASAN: global-out-of-bounds in vprintk_store+0x362/0x3d0
[  221.922112] Write of size 2 at addr ffffffffba51dbcd by task
memcg_test_1/11282
[  221.922113]
[  221.922114] CPU: 1 PID: 11282 Comm: memcg_test_1 Not tainted
5.9.0-next-20201013 #1
[  221.922116] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
2.0b 07/27/2017
[  221.922116] Call Trace:
[  221.922117]  dump_stack+0xa4/0xd9
[  221.922118]  print_address_description.constprop.0+0x21/0x210
[  221.922119]  ? _raw_write_lock_bh+0xe0/0xe0
[  221.922120]  ? vprintk_store+0x362/0x3d0
[  221.922121]  kasan_report.cold+0x37/0x7c
[  221.922122]  ? vprintk_store+0x362/0x3d0
[  221.922123]  check_memory_region+0x18c/0x1f0
[  221.922124]  memcpy+0x3c/0x60
[  221.922125]  vprintk_store+0x362/0x3d0
[  221.922125]  ? __ia32_sys_syslog+0x50/0x50
[  221.922126]  ? _raw_spin_lock_irqsave+0x9b/0x100
[  221.922127]  ? _raw_spin_lock_irq+0xf0/0xf0
[  221.922128]  ? __kasan_check_write+0x14/0x20
[  221.922129]  vprintk_emit+0x8d/0x1f0
[  221.922130]  vprintk_default+0x1d/0x20
[  221.922131]  vprintk_func+0x5a/0x100
[  221.922132]  printk+0xb2/0xe3
[  221.922133]  ? swsusp_write.cold+0x189/0x189
[  221.922134]  ? kernfs_vfs_xattr_set+0x60/0x60
[  221.922134]  ? _raw_write_lock_bh+0xe0/0xe0
[  221.922135]  ? trace_hardirqs_on+0x38/0x100
[  221.922136]  pr_cont_kernfs_path.cold+0x49/0x4b
[  221.922137]  mem_cgroup_print_oom_context.cold+0x74/0xc3
[  221.922138]  dump_header+0x340/0x3bf
[  221.922139]  oom_kill_process.cold+0xb/0x10
[  221.922140]  out_of_memory+0x1e9/0x860
[  221.922141]  ? oom_killer_disable+0x210/0x210
[  221.922142]  mem_cgroup_out_of_memory+0x198/0x1c0
[  221.922143]  ? mem_cgroup_count_precharge_pte_range+0x250/0x250
[  221.922144]  try_charge+0xa9b/0xc50
[  221.922145]  ? arch_stack_walk+0x9e/0xf0
[  221.922146]  ? memory_high_write+0x230/0x230
[  221.922146]  ? avc_has_extended_perms+0x830/0x830
[  221.922147]  ? stack_trace_save+0x94/0xc0
[  221.922148]  ? stack_trace_consume_entry+0x90/0x90
[  221.922149]  __memcg_kmem_charge+0x73/0x120
[  221.922150]  ? cred_has_capability+0x10f/0x200
[  221.922151]  ? mem_cgroup_can_attach+0x260/0x260
[  221.922152]  ? selinux_sb_eat_lsm_opts+0x2f0/0x2f0
[  221.922153]  ? obj_cgroup_charge+0x16b/0x220
[  221.922154]  ? kmem_cache_alloc+0x78/0x4c0
[  221.922155]  obj_cgroup_charge+0x122/0x220
[  221.922156]  ? vm_area_alloc+0x20/0x90
[  221.922156]  kmem_cache_alloc+0x78/0x4c0
[  221.922157]  vm_area_alloc+0x20/0x90
[  221.922158]  mmap_region+0x3ed/0x9a0
[  221.922159]  ? cap_mmap_addr+0x1d/0x80
[  221.922160]  do_mmap+0x3ee/0x720
[  221.922161]  vm_mmap_pgoff+0x16a/0x1c0
[  221.922162]  ? randomize_stack_top+0x90/0x90
[  221.922163]  ? copy_page_range+0x1980/0x1980
[  221.922163]  ksys_mmap_pgoff+0xab/0x350
[  221.922164]  ? find_mergeable_anon_vma+0x110/0x110
[  221.922165]  ? __audit_syscall_entry+0x1a6/0x1e0
[  221.922166]  __x64_sys_mmap+0x8d/0xb0
[  221.922167]  do_syscall_64+0x38/0x50
[  221.922168]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  221.922169] RIP: 0033:0x7fe8f5e75103
[  221.922172] Code: 54 41 89 d4 55 48 89 fd 53 4c 89 cb 48 85 ff 74
56 49 89 d9 45 89 f8 45 89 f2 44 89 e2 4c 89 ee 48 89 ef b8 09 00 00
00 0f 05 <48> 3d 00 f0 ff ff 77 7d 5b 5d 41 5c 41 5d 41 5e 41 5f c3 66
2e 0f
[  221.922173] RSP: 002b:00007ffd38c90198 EFLAGS: 00000246 ORIG_RAX:
0000000000000009
[  221.922175] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fe8f5e75103
[  221.922176] RDX: 0000000000000003 RSI: 0000000000001000 RDI: 0000000000000000
[  221.922178] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  221.922179] R10: 0000000000002022 R11: 0000000000000246 R12: 0000000000000003
[  221.922180] R13: 0000000000001000 R14: 0000000000002022 R15: 0000000000000000
[  221.922181]
[  213O[  221.922182] The buggy address belongs to the variable:
[  221.922183]  clear_seq+0x2d/0x40
[  221.922183]
[  221.922184] Memory state around the buggy address:
[  221.922185]  ffffffffba51da80: 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00
[  221.922187]  ffffffffba51db00: 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00
[  221.922188] >ffffffffba51db80: f9 f9 f9 f9 00 f9 f9 f9 f9 f9 f9 f9
00 f9 f9 f9
[  221.922189]                                               ^
[  221.922190]  ffffffffba51dc00: f9 f9 f9 f9 00 f9 f9 f9 f9 f9 f9 f9
00 f9 f9 f9
[  221.922191]  ffffffffba51dc80: f9 f9 f9 f9 01 f9 f9 f9 f9 f9 f9 f9
00 f9 f9 f9
[  221.922193] ==================================================================
[  221.922194] Disabling lock debugging due to kernel taint
[  221.922196] ,task=memcg_test_1,pid=11280,uid=0
[  221.922205] Memory cgroup out of memory: Killed process 11280

Link: https://lore.kernel.org/r/CA+G9fYt46oC7-BKryNDaaXPJ9GztvS2cs_7GjYRjanRi4+ryCQ@mail.gmail.com
Fixes: 4cfc7258f8 ("printk: ringbuffer: add finalization/extension support")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20201014175051.GC13775@alley
2020-10-15 12:21:13 +02:00
Mauro Carvalho Chehab
72a2fbda53 rcu/tree: docs: document bkvcache new members at struct kfree_rcu_cpu
Changeset 53c72b590b ("rcu/tree: cache specified number of objects")
added new members for struct kfree_rcu_cpu, but didn't add the
corresponding at the kernel-doc markup, as repoted when doing
"make htmldocs":
	./kernel/rcu/tree.c:3113: warning: Function parameter or member 'bkvcache' not described in 'kfree_rcu_cpu'
	./kernel/rcu/tree.c:3113: warning: Function parameter or member 'nr_bkv_objs' not described in 'kfree_rcu_cpu'

So, move the description for bkvcache to kernel-doc, and add a
description for nr_bkv_objs.

Fixes: 53c72b590b ("rcu/tree: cache specified number of objects")
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-10-15 07:57:55 +02:00
Linus Torvalds
2f6c6d0891 Merge branch 'for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
 "Two minor changes.

  One makes cgroup interface files ignore zero-sized writes rather than
  triggering -EINVAL on them. The other change is a cleanup which
  doesn't cause any behavior changes"

* 'for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: Zero sized write should be no-op
  cgroup: remove redundant kernfs_activate in cgroup_setup_root()
2020-10-14 14:58:32 -07:00
Linus Torvalds
4da9af0014 threads-v5.10
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCX4a4sAAKCRCRxhvAZXjc
 ohdRAP9fclgrRkTl3o4cgaK0PUMt8BZ5QCg/SPQrVT58AQlfSwEAsNtWAeo6U2z1
 FLGuCoPBEW1Zghkj1lMbIhj5zyVaEQ8=
 =Z7Q0
 -----END PGP SIGNATURE-----

Merge tag 'threads-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull pidfd updates from Christian Brauner:
 "This introduces a new extension to the pidfd_open() syscall. Users can
  now raise the new PIDFD_NONBLOCK flag to support non-blocking pidfd
  file descriptors. This has been requested for uses in async process
  management libraries such as async-pidfd in Rust.

  Ever since the introduction of pidfds and more advanced async io
  various programming languages such as Rust have grown support for
  async event libraries. These libraries are created to help build
  epoll-based event loops around file descriptors. A common pattern is
  to automatically make all file descriptors they manage to O_NONBLOCK.

  For such libraries the EAGAIN error code is treated specially. When a
  function is called that returns EAGAIN the function isn't called again
  until the event loop indicates the the file descriptor is ready.
  Supporting EAGAIN when waiting on pidfds makes such libraries just
  work with little effort.

  This introduces a new flag PIDFD_NONBLOCK that is equivalent to
  O_NONBLOCK. This follows the same patterns we have for other (anon
  inode) file descriptors such as EFD_NONBLOCK, IN_NONBLOCK,
  SFD_NONBLOCK, TFD_NONBLOCK and the same for close-on-exec flags.

  Passing a non-blocking pidfd to waitid() currently has no effect, i.e.
  is not supported. There are users which would like to use waitid() on
  pidfds that are O_NONBLOCK and mix it with pidfds that are blocking
  and both pass them to waitid().

  The expected behavior is to have waitid() return -EAGAIN for
  non-blocking pidfds and to block for blocking pidfds without needing
  to perform any additional checks for flags set on the pidfd before
  passing it to waitid(). Non-blocking pidfds will return EAGAIN from
  waitid() when no child process is ready yet. Returning -EAGAIN for
  non-blocking pidfds makes it easier for event loops that handle EAGAIN
  specially.

  It also makes the API more consistent and uniform. In essence,
  waitid() is treated like a read on a non-blocking pidfd or a recvmsg()
  on a non-blocking socket.

  With the addition of support for non-blocking pidfds we support the
  same functionality that sockets do. For sockets() recvmsg() supports
  MSG_DONTWAIT for pidfds waitid() supports WNOHANG. Both flags are
  per-call options. In contrast non-blocking pidfds and non-blocking
  sockets are a setting on an open file description affecting all
  threads in the calling process as well as other processes that hold
  file descriptors referring to the same open file description. Both
  behaviors, per call and per open file description, have genuine
  use-cases.

  The interaction with the WNOHANG flag is documented as follows:

   - If a non-blocking pidfd is passed and WNOHANG is not raised we
     simply raise the WNOHANG flag internally. When do_wait() returns
     indicating that there are eligible child processes but none have
     exited yet we set EAGAIN. If no child process exists we continue
     returning ECHILD.

   - If a non-blocking pidfd is passed and WNOHANG is raised waitid()
     will continue returning 0, i.e. it will not set EAGAIN. This ensure
     backwards compatibility with applications passing WNOHANG
     explicitly with pidfds"

* tag 'threads-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  tests: remove O_NONBLOCK before waiting for WSTOPPED
  tests: add waitid() tests for non-blocking pidfds
  tests: port pidfd_wait to kselftest harness
  pidfd: support PIDFD_NONBLOCK in pidfd_open()
  exit: support non-blocking pidfds
2020-10-14 14:39:20 -07:00
Linus Torvalds
612e7a4c16 kernel-clone-v5.9
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXz5bNAAKCRCRxhvAZXjc
 opfjAP9R/J72yxdd2CLGNZ96hyiRX1NgFDOVUhscOvujYJf8ZwD+OoLmKMvAyFW6
 hnMhT1n9Q+aq194hyzChOLQaBTejBQ8=
 =4WCX
 -----END PGP SIGNATURE-----

Merge tag 'kernel-clone-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull kernel_clone() updates from Christian Brauner:
 "During the v5.9 merge window we reworked the process creation
  codepaths across multiple architectures. After this work we were only
  left with the _do_fork() helper based on the struct kernel_clone_args
  calling convention. As was pointed out _do_fork() isn't valid
  kernelese especially for a helper that isn't just static.

  This series removes the _do_fork() helper and introduces the new
  kernel_clone() helper. The process creation cleanup didn't change the
  name to something more reasonable mainly because _do_fork() was used
  in quite a few places. So sending this as a separate series seemed the
  better strategy.

  I originally intended to send this early in the v5.9 development cycle
  after the merge window had closed but given that this was touching
  quite a few places I decided to defer this until the v5.10 merge
  window"

* tag 'kernel-clone-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  sched: remove _do_fork()
  tracing: switch to kernel_clone()
  kgdbts: switch to kernel_clone()
  kprobes: switch to kernel_clone()
  x86: switch to kernel_clone()
  sparc: switch to kernel_clone()
  nios2: switch to kernel_clone()
  m68k: switch to kernel_clone()
  ia64: switch to kernel_clone()
  h8300: switch to kernel_clone()
  fork: introduce kernel_clone()
2020-10-14 14:32:52 -07:00
Linus Torvalds
79db2b74aa Merge branch 'stable/for-linus-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb
Pull swiotlb updates from Konrad Rzeszutek Wilk:
 "Minor enhancement of using %p to print phys_addr_r and also compiler
  warnings"

* 'stable/for-linus-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
  swiotlb: Mark max_segment with static keyword
  swiotlb: Declare swiotlb_late_init_with_default_size() in header
  swiotlb: Use %pa to print phys_addr_t variables
2020-10-14 12:00:02 -07:00
Juri Lelli
a73f863af4 sched/features: Fix !CONFIG_JUMP_LABEL case
Commit:

  765cc3a4b2 ("sched/core: Optimize sched_feat() for !CONFIG_SCHED_DEBUG builds")

made sched features static for !CONFIG_SCHED_DEBUG configurations, but
overlooked the CONFIG_SCHED_DEBUG=y and !CONFIG_JUMP_LABEL cases.

For the latter echoing changes to /sys/kernel/debug/sched_features has
the nasty effect of effectively changing what sched_features reports,
but without actually changing the scheduler behaviour (since different
translation units get different sysctl_sched_features).

Fix CONFIG_SCHED_DEBUG=y and !CONFIG_JUMP_LABEL configurations by properly
restructuring ifdefs.

Fixes: 765cc3a4b2 ("sched/core: Optimize sched_feat() for !CONFIG_SCHED_DEBUG builds")
Co-developed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Patrick Bellasi <patrick.bellasi@matbug.net>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lore.kernel.org/r/20201013053114.160628-1-juri.lelli@redhat.com
2020-10-14 19:55:46 +02:00
zhuguangqing
eba9f08293 sched: Replace zero-length array with flexible-array
In the following commit:

  04f5c362ec: ("sched/fair: Replace zero-length array with flexible-array")

a zero-length array cpumask[0] has been replaced with cpumask[].
But there is still a cpumask[0] in 'struct sched_group_capacity'
which was missed.

The point of using [] instead of [0] is that with [] the compiler will
generate a build warning if it isn't the last member of a struct.

[ mingo: Rewrote the changelog. ]

Signed-off-by: zhuguangqing <zhuguangqing@xiaomi.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20201014140220.11384-1-zhuguangqing83@gmail.com
2020-10-14 19:55:19 +02:00
Linus Torvalds
0b8417c141 Power management updates for 5.10-rc1
- Rework cpufreq statistics collection to allow it to take place
    when fast frequency switching is enabled in the governor (Viresh
    Kumar).
 
  - Make the cpufreq core set the frequency scale on behalf of the
    driver and update several cpufreq drivers accordingly (Ionela
    Voinescu, Valentin Schneider).
 
  - Add new hardware support to the STI and qcom cpufreq drivers and
    improve them (Alain Volmat, Manivannan Sadhasivam).
 
  - Fix multiple assorted issues in cpufreq drivers (Jon Hunter,
    Krzysztof Kozlowski, Matthias Kaehlcke, Pali Rohár, Stephan
    Gerhold, Viresh Kumar).
 
  - Fix several assorted issues in the operating performance points
    (OPP) framework (Stephan Gerhold, Viresh Kumar).
 
  - Allow devfreq drivers to fetch devfreq instances by DT enumeration
    instead of using explicit phandles and modify the devfreq core
    code to support driver-specific devfreq DT bindings (Leonard
    Crestez, Chanwoo Choi).
 
  - Improve initial hardware resetting in the tegra30 devfreq driver
    and clean up the tegra cpuidle driver (Dmitry Osipenko).
 
  - Update the cpuidle core to collect state entry rejection
    statistics and expose them via sysfs (Lina Iyer).
 
  - Improve the ACPI _CST code handling diagnostics (Chen Yu).
 
  - Update the PSCI cpuidle driver to allow the PM domain
    initialization to occur in the OSI mode as well as in the PC
    mode (Ulf Hansson).
 
  - Rework the generic power domains (genpd) core code to allow
    domain power off transition to be aborted in the absence of the
    "power off" domain callback (Ulf Hansson).
 
  - Fix two suspend-to-idle issues in the ACPI EC driver (Rafael
    Wysocki).
 
  - Fix the handling of timer_expires in the PM-runtime framework on
    32-bit systems and the handling of device links in it (Grygorii
    Strashko, Xiang Chen).
 
  - Add IO requests batching support to the hibernate image saving and
    reading code and drop a bogus get_gendisk() from there (Xiaoyi
    Chen, Christoph Hellwig).
 
  - Allow PCIe ports to be put into the D3cold power state if they
    are power-manageable via ACPI (Lukas Wunner).
 
  - Add missing header file include to a power capping driver (Pujin
    Shi).
 
  - Clean up the qcom-cpr AVS driver a bit (Liu Shixin).
 
  - Kevin Hilman steps down as designated reviwer of adaptive voltage
    scaling (AVS) driverrs (Kevin Hilman).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl+F4A4SHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxX6QP/iELq9/OsH0aJdDQlY9tnh2Oa13+HB/Y
 w1e6W+ZR/YjPgUpMVARwRLKf/gn7dUEwRDHVpGvDOyun+HACCPHB2hg8iktbxdVl
 NFAVGZCCRezXqz3opL1hl8C3Dh0CqUPUjWXGMr+Lw2TZQKT+hx9K1dm9Epe3ivyT
 RlVH/wifei80cFRcUUj7DI5KLCAyk+uKkZIFnZHAGKK6qOHMqRL5sDZsMUwWpd2i
 AdghABjePbaiLTAoZuUsJINAGY4DnIt6ASRdMJ4iksiD6pFITwFs0HSOPe7hZLlv
 zbwDPI5+TIkrOy9/aWoMaEIH1OQiFN/O++Slvdjn7gMsRgoW4d300ru4Jo1pOHxb
 5twxagCCqlOf4YAaSrMCH4HT+c6fOWoGj2AKzX3DMJyO3/WN+8XNvUxKtC5Px1u+
 pWRASjfQMO2j6nNjTCTwDJdYzggiKa54rYH2k7svX7XnTIAf+2E1gv8b4rMTgQrZ
 0rq9kULYlhgk3EYjd/DndkvxunRlmiqhzrYB4jc9eDSPNzB8FZEbw1ZMRQTFfjK0
 kp0vaEpTJ7JfKSCfluB4UmTuQoGogLl0xbzc+2NNIpwdNmrH2Srvq6wbj35jEDTU
 tqsTsBP+XZFOWyFOw/L2J47LTOp0TJnz8z4aycLfrmdNUVnXJoU1sXgFlDzETMgT
 0E6cTVwLF7Zi
 =rGhy
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "These rework the collection of cpufreq statistics to allow it to take
  place if fast frequency switching is enabled in the governor, rework
  the frequency invariance handling in the cpufreq core and drivers, add
  new hardware support to a couple of cpufreq drivers, fix a number of
  assorted issues and clean up the code all over.

  Specifics:

   - Rework cpufreq statistics collection to allow it to take place when
     fast frequency switching is enabled in the governor (Viresh Kumar).

   - Make the cpufreq core set the frequency scale on behalf of the
     driver and update several cpufreq drivers accordingly (Ionela
     Voinescu, Valentin Schneider).

   - Add new hardware support to the STI and qcom cpufreq drivers and
     improve them (Alain Volmat, Manivannan Sadhasivam).

   - Fix multiple assorted issues in cpufreq drivers (Jon Hunter,
     Krzysztof Kozlowski, Matthias Kaehlcke, Pali Rohár, Stephan
     Gerhold, Viresh Kumar).

   - Fix several assorted issues in the operating performance points
     (OPP) framework (Stephan Gerhold, Viresh Kumar).

   - Allow devfreq drivers to fetch devfreq instances by DT enumeration
     instead of using explicit phandles and modify the devfreq core code
     to support driver-specific devfreq DT bindings (Leonard Crestez,
     Chanwoo Choi).

   - Improve initial hardware resetting in the tegra30 devfreq driver
     and clean up the tegra cpuidle driver (Dmitry Osipenko).

   - Update the cpuidle core to collect state entry rejection statistics
     and expose them via sysfs (Lina Iyer).

   - Improve the ACPI _CST code handling diagnostics (Chen Yu).

   - Update the PSCI cpuidle driver to allow the PM domain
     initialization to occur in the OSI mode as well as in the PC mode
     (Ulf Hansson).

   - Rework the generic power domains (genpd) core code to allow domain
     power off transition to be aborted in the absence of the "power
     off" domain callback (Ulf Hansson).

   - Fix two suspend-to-idle issues in the ACPI EC driver (Rafael
     Wysocki).

   - Fix the handling of timer_expires in the PM-runtime framework on
     32-bit systems and the handling of device links in it (Grygorii
     Strashko, Xiang Chen).

   - Add IO requests batching support to the hibernate image saving and
     reading code and drop a bogus get_gendisk() from there (Xiaoyi
     Chen, Christoph Hellwig).

   - Allow PCIe ports to be put into the D3cold power state if they are
     power-manageable via ACPI (Lukas Wunner).

   - Add missing header file include to a power capping driver (Pujin
     Shi).

   - Clean up the qcom-cpr AVS driver a bit (Liu Shixin).

   - Kevin Hilman steps down as designated reviwer of adaptive voltage
     scaling (AVS) drivers (Kevin Hilman)"

* tag 'pm-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (65 commits)
  cpufreq: stats: Fix string format specifier mismatch
  arm: disable frequency invariance for CONFIG_BL_SWITCHER
  cpufreq,arm,arm64: restructure definitions of arch_set_freq_scale()
  cpufreq: stats: Add memory barrier to store_reset()
  cpufreq: schedutil: Simplify sugov_fast_switch()
  ACPI: EC: PM: Drop ec_no_wakeup check from acpi_ec_dispatch_gpe()
  ACPI: EC: PM: Flush EC work unconditionally after wakeup
  PCI/ACPI: Whitelist hotplug ports for D3 if power managed by ACPI
  PM: hibernate: remove the bogus call to get_gendisk() in software_resume()
  cpufreq: Move traces and update to policy->cur to cpufreq core
  cpufreq: stats: Enable stats for fast-switch as well
  cpufreq: stats: Mark few conditionals with unlikely()
  cpufreq: stats: Remove locking
  cpufreq: stats: Defer stats update to cpufreq_stats_record_transition()
  PM: domains: Allow to abort power off when no ->power_off() callback
  PM: domains: Rename power state enums for genpd
  PM / devfreq: tegra30: Improve initial hardware resetting
  PM / devfreq: event: Change prototype of devfreq_event_get_edev_by_phandle function
  PM / devfreq: Change prototype of devfreq_get_devfreq_by_phandle function
  PM / devfreq: Add devfreq_get_devfreq_by_node function
  ...
2020-10-14 10:45:41 -07:00
Linus Torvalds
6873139ed0 objtool changes for v5.10:
- Most of the changes are cleanups and reorganization to make the objtool code
    more arch-agnostic. This is in preparation for non-x86 support.
 
 Fixes:
 
  - KASAN fixes.
  - Handle unreachable trap after call to noreturn functions better.
  - Ignore unreachable fake jumps.
  - Misc smaller fixes & cleanups.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl+FgwIRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1juGw/6A6goA5/HHapM965yG1eY/rTLp3eIbcma
 1ZbkUsP0YfT6wVUzw/sOeZzKNOwOq1FuMfkjuH2KcnlxlcMekIaKvLk8uauW4igM
 hbFGuuZfZ0An5ka9iQ1W6HGdsuD3vVlN1w/kxdWk0c3lJCVQSTxdCfzF8fuF3gxX
 lF3Bc1D/ZFcHIHT/hu/jeIUCgCYpD3qZDjQJBScSwVthZC+Fw6weLLGp2rKDaCao
 HhSQft6MUfDrUKfH3LBIUNPRPCOrHo5+AX6BXxLXJVxqlwO/YU3e0GMwSLedMtBy
 TASWo7/9GAp+wNNZe8EliyTKrfC3sLxN1QImfjuojxbBVXx/YQ/ToTt9fVGpF4Y+
 XhhRFv9520v1tS2wPHIgQGwbh7EWG6mdrmo10RAs/31ViONPrbEZ4WmcA08b/5FY
 KEkOVb18yfmDVzVZPpSc+HpIFkppEBOf7wPg27Bj3RTZmzIl/y+rKSnxROpsJsWb
 R6iov7SFVET14lHl1G7tPNXfqRaS7HaOQIj3rSUyAP0ZfX+yIupVJp32dc6Ofg8b
 SddUCwdIHoFdUNz4Y9csUCrewtCVJbxhV4MIdv0GpWbrgSw96RFZgetaH+6mGRpj
 0Kh6M1eC3irDbhBuarWUBAr2doPAq4iOUeQU36Q6YSAbCs83Ws2uKOWOHoFBVwCH
 uSKT0wqqG+E=
 =KX5o
 -----END PGP SIGNATURE-----

Merge tag 'objtool-core-2020-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool updates from Ingo Molnar:
 "Most of the changes are cleanups and reorganization to make the
  objtool code more arch-agnostic. This is in preparation for non-x86
  support.

  Other changes:

   - KASAN fixes

   - Handle unreachable trap after call to noreturn functions better

   - Ignore unreachable fake jumps

   - Misc smaller fixes & cleanups"

* tag 'objtool-core-2020-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
  perf build: Allow nested externs to enable BUILD_BUG() usage
  objtool: Allow nested externs to enable BUILD_BUG()
  objtool: Permit __kasan_check_{read,write} under UACCESS
  objtool: Ignore unreachable trap after call to noreturn functions
  objtool: Handle calling non-function symbols in other sections
  objtool: Ignore unreachable fake jumps
  objtool: Remove useless tests before save_reg()
  objtool: Decode unwind hint register depending on architecture
  objtool: Make unwind hint definitions available to other architectures
  objtool: Only include valid definitions depending on source file type
  objtool: Rename frame.h -> objtool.h
  objtool: Refactor jump table code to support other architectures
  objtool: Make relocation in alternative handling arch dependent
  objtool: Abstract alternative special case handling
  objtool: Move macros describing structures to arch-dependent code
  objtool: Make sync-check consider the target architecture
  objtool: Group headers to check in a single list
  objtool: Define 'struct orc_entry' only when needed
  objtool: Skip ORC entry creation for non-text sections
  objtool: Move ORC logic out of check()
  ...
2020-10-14 10:13:37 -07:00
Linus Torvalds
d5660df4a5 Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
 "181 patches.

  Subsystems affected by this patch series: kbuild, scripts, ntfs,
  ocfs2, vfs, mm (slab, slub, kmemleak, dax, debug, pagecache, fadvise,
  gup, swap, memremap, memcg, selftests, pagemap, mincore, hmm, dma,
  memory-failure, vmallo and migration)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (181 commits)
  mm/migrate: remove obsolete comment about device public
  mm/migrate: remove cpages-- in migrate_vma_finalize()
  mm, oom_adj: don't loop through tasks in __set_oom_adj when not necessary
  memblock: use separate iterators for memory and reserved regions
  memblock: implement for_each_reserved_mem_region() using __next_mem_region()
  memblock: remove unused memblock_mem_size()
  x86/setup: simplify reserve_crashkernel()
  x86/setup: simplify initrd relocation and reservation
  arch, drivers: replace for_each_membock() with for_each_mem_range()
  arch, mm: replace for_each_memblock() with for_each_mem_pfn_range()
  memblock: reduce number of parameters in for_each_mem_range()
  memblock: make memblock_debug and related functionality private
  memblock: make for_each_memblock_type() iterator private
  mircoblaze: drop unneeded NUMA and sparsemem initializations
  riscv: drop unneeded node initialization
  h8300, nds32, openrisc: simplify detection of memory extents
  arm64: numa: simplify dummy_numa_init()
  arm, xtensa: simplify initialization of high memory pages
  dma-contiguous: simplify cma_early_percent_memory()
  KVM: PPC: Book3S HV: simplify kvm_cma_reserve()
  ...
2020-10-14 09:57:24 -07:00
Suren Baghdasaryan
67197a4f28 mm, oom_adj: don't loop through tasks in __set_oom_adj when not necessary
Currently __set_oom_adj loops through all processes in the system to keep
oom_score_adj and oom_score_adj_min in sync between processes sharing
their mm.  This is done for any task with more that one mm_users, which
includes processes with multiple threads (sharing mm and signals).
However for such processes the loop is unnecessary because their signal
structure is shared as well.

Android updates oom_score_adj whenever a tasks changes its role
(background/foreground/...) or binds to/unbinds from a service, making it
more/less important.  Such operation can happen frequently.  We noticed
that updates to oom_score_adj became more expensive and after further
investigation found out that the patch mentioned in "Fixes" introduced a
regression.  Using Pixel 4 with a typical Android workload, write time to
oom_score_adj increased from ~3.57us to ~362us.  Moreover this regression
linearly depends on the number of multi-threaded processes running on the
system.

Mark the mm with a new MMF_MULTIPROCESS flag bit when task is created with
(CLONE_VM && !CLONE_THREAD && !CLONE_VFORK).  Change __set_oom_adj to use
MMF_MULTIPROCESS instead of mm_users to decide whether oom_score_adj
update should be synchronized between multiple processes.  To prevent
races between clone() and __set_oom_adj(), when oom_score_adj of the
process being cloned might be modified from userspace, we use
oom_adj_mutex.  Its scope is changed to global.

The combination of (CLONE_VM && !CLONE_THREAD) is rarely used except for
the case of vfork().  To prevent performance regressions of vfork(), we
skip taking oom_adj_mutex and setting MMF_MULTIPROCESS when CLONE_VFORK is
specified.  Clearing the MMF_MULTIPROCESS flag (when the last process
sharing the mm exits) is left out of this patch to keep it simple and
because it is believed that this threading model is rare.  Should there
ever be a need for optimizing that case as well, it can be done by hooking
into the exit path, likely following the mm_update_next_owner pattern.

With the combination of (CLONE_VM && !CLONE_THREAD && !CLONE_VFORK) being
quite rare, the regression is gone after the change is applied.

[surenb@google.com: v3]
  Link: https://lkml.kernel.org/r/20200902012558.2335613-1-surenb@google.com

Fixes: 44a70adec9 ("mm, oom_adj: make sure processes sharing mm have same view of oom_score_adj")
Reported-by: Tim Murray <timmurray@google.com>
Suggested-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Eugene Syromiatnikov <esyr@redhat.com>
Cc: Christian Kellner <christian@kellner.me>
Cc: Adrian Reber <areber@redhat.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexey Gladkov <gladkov.alexey@gmail.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Bernd Edlinger <bernd.edlinger@hotmail.de>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Yafang Shao <laoar.shao@gmail.com>
Link: https://lkml.kernel.org/r/20200824153036.3201505-1-surenb@google.com
Debugged-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:35 -07:00
Mike Rapoport
e9aa36ccbb dma-contiguous: simplify cma_early_percent_memory()
The memory size calculation in cma_early_percent_memory() traverses
memblock.memory rather than simply call memblock_phys_mem_size().  The
comment in that function suggests that at some point there should have
been call to memblock_analyze() before memblock_phys_mem_size() could be
used.  As of now, there is no memblock_analyze() at all and
memblock_phys_mem_size() can be used as soon as cold-plug memory is
registered with memblock.

Replace loop over memblock.memory with a call to memblock_phys_mem_size().

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Emil Renner Berthing <kernel@esmil.dk>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: https://lkml.kernel.org/r/20200818151634.14343-3-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:35 -07:00
Peter Xu
c78f463649 mm: remove src/dst mm parameter in copy_page_range()
Both of the mm pointers are not needed after commit 7a4830c380
("mm/fork: Pass new vma pointer into copy_page_range()").

Jason Gunthorpe also reported that the ordering of copy_page_range() is
odd.  Since working at it, reorder the parameters to be logical, by (1)
always put the dst_* fields to be before src_* fields, and (2) keep the
same type of parameters together.

[peterx@redhat.com: further reorder some parameters and line format, per Jason]
  Link: https://lkml.kernel.org/r/20201002192647.7161-1-peterx@redhat.com
[peterx@redhat.com: fix warnings]
  Link: https://lkml.kernel.org/r/20201006200138.GA6026@xz-x1

Reported-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: https://lkml.kernel.org/r/20200930204950.6668-1-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:32 -07:00
Miaohe Lin
cf508b5845 mm: use helper function mapping_allow_writable()
Commit 4bb5f5d939 ("mm: allow drivers to prevent new writable mappings")
changed i_mmap_writable from unsigned int to atomic_t and add the helper
function mapping_allow_writable() to atomic_inc i_mmap_writable.  But it
forgot to use this helper function in dup_mmap() and __vma_link_file().

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Christian Kellner <christian@kellner.me>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Adrian Reber <areber@redhat.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200917112736.7789-1-linmiaohe@huawei.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:31 -07:00
Dan Williams
73fb952d83 resource: report parent to walk_iomem_res_desc() callback
In support of detecting whether a resource might have been been claimed,
report the parent to the walk_iomem_res_desc() callback.  For example, the
ACPI HMAT parser publishes "hmem" platform devices per target range.
However, if the HMAT is disabled / missing a fallback driver can attach
devices to the raw memory ranges as a fallback if it sees unclaimed /
orphan "Soft Reserved" resources in the resource tree.

Otherwise, find_next_iomem_res() returns a resource with garbage data from
the stack allocation in __walk_iomem_res_desc() for the res->parent field.

There are currently no users that expect ->child and ->sibling to be
valid, and the resource_lock would be needed to traverse them.  Use a
compound literal to implicitly zero initialize the fields that are not
being returned in addition to setting ->parent.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: David Airlie <airlied@linux.ie>
Cc: David Hildenbrand <david@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Jia He <justin.he@arm.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Jason Yan <yanaijie@huawei.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Link: https://lkml.kernel.org/r/159643097166.4062302.11875688887228572793.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:27 -07:00
Linus Torvalds
8b05418b25 seccomp updates for v5.10-rc1
- heavily refactor seccomp selftests (and clone3 selftests dependency) to
   fix powerpc (Kees Cook, Thadeu Lima de Souza Cascardo)
 - fix style issue in selftests (Zou Wei)
 - upgrade "unknown action" from KILL_THREAD to KILL_PROCESS (Rich Felker)
 - replace task_pt_regs(current) with current_pt_regs() (Denis Efremov)
 - fix corner-case race in USER_NOTIF (Jann Horn)
 - make CONFIG_SECCOMP no longer per-arch (YiFei Zhu)
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl+E1LAWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJgRfD/0cq7W51+o34719vefC+oZaMjJJ
 Bd5HYshmr6NRpMqn0OhtT9kVi6OeV0sK0VJeNxSISDIaGNJ8xCI9YhnXwzY+7myK
 +IQu3i2Hv7dlWvTaXWFLL+mvfk6WopLntFGGJQ8KPMnP2gcfH2AZmOeAKGFGhBDe
 NwpAUZ9zriXg9JCQp6u0FzPJgk8KfgfHjUY6Hsa095gg0aPSJhc8bWEUNBQwjCe6
 uIcxDP/zK2WWaEhO9BfHt6/VTcXw7QgTLS3yM+pwBCgR1JHs7HMhtgcwPT410qES
 LmYD8OiHmv5AZhDjcCcNipKEv3ZnxkLnpU/6hfaKM4zn/DoaR/zbfjO9U017rcNV
 9gf7k5siAP7DH48IFlqf4Erzd3xyF0OJDnVfC7NiPtggPfO9aWOHJJZCuJRQOdrN
 qPMjkaQzFb02qb501PLEn55F24OLDjz1vFOqpkJm2/XamOBVV4uiRKmfpNEo/MOf
 QkhSvzvwEFErWwzPH95uFyVhs42stwnM3ppnwtya2+U5kxXdNvbAR8N5leH7siaU
 ab+YJIHW59+BxXTlKgXIcqBP/6RqJWJtuT9OqGs0K2A7FhQSexh5MOm+9vvGgIwZ
 Qjyijku8dB3aV94BNGnlJq6BV+4Hc6EGadh7h3b8GiRAUTYo0pk5G/iKL6Ii+R6p
 0msJENqalKFtNCr70w==
 =a4u2
 -----END PGP SIGNATURE-----

Merge tag 'seccomp-v5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull seccomp updates from Kees Cook:
 "The bulk of the changes are with the seccomp selftests to accommodate
  some powerpc-specific behavioral characteristics. Additional cleanups,
  fixes, and improvements are also included:

   - heavily refactor seccomp selftests (and clone3 selftests
     dependency) to fix powerpc (Kees Cook, Thadeu Lima de Souza
     Cascardo)

   - fix style issue in selftests (Zou Wei)

   - upgrade "unknown action" from KILL_THREAD to KILL_PROCESS (Rich
     Felker)

   - replace task_pt_regs(current) with current_pt_regs() (Denis
     Efremov)

   - fix corner-case race in USER_NOTIF (Jann Horn)

   - make CONFIG_SECCOMP no longer per-arch (YiFei Zhu)"

* tag 'seccomp-v5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (23 commits)
  seccomp: Make duplicate listener detection non-racy
  seccomp: Move config option SECCOMP to arch/Kconfig
  selftests/clone3: Avoid OS-defined clone_args
  selftests/seccomp: powerpc: Set syscall return during ptrace syscall exit
  selftests/seccomp: Allow syscall nr and ret value to be set separately
  selftests/seccomp: Record syscall during ptrace entry
  selftests/seccomp: powerpc: Fix seccomp return value testing
  selftests/seccomp: Remove SYSCALL_NUM_RET_SHARE_REG in favor of SYSCALL_RET_SET
  selftests/seccomp: Avoid redundant register flushes
  selftests/seccomp: Convert REGSET calls into ARCH_GETREG/ARCH_SETREG
  selftests/seccomp: Convert HAVE_GETREG into ARCH_GETREG/ARCH_SETREG
  selftests/seccomp: Remove syscall setting #ifdefs
  selftests/seccomp: mips: Remove O32-specific macro
  selftests/seccomp: arm64: Define SYSCALL_NUM_SET macro
  selftests/seccomp: arm: Define SYSCALL_NUM_SET macro
  selftests/seccomp: mips: Define SYSCALL_NUM_SET macro
  selftests/seccomp: Provide generic syscall setting macro
  selftests/seccomp: Refactor arch register macros to avoid xtensa special case
  selftests/seccomp: Use __NR_mknodat instead of __NR_mknod
  selftests/seccomp: Use bitwise instead of arithmetic operator for flags
  ...
2020-10-13 16:33:43 -07:00
Linus Torvalds
01fb1e2f42 audit/stable-5.10 PR 20201012
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAl+E9ScUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXMbmxAAqXBBieR0UY5zHcvBf7YD6pOUolXJ
 +tSAIVhGru+XYuDtHgWjNF0t788WX8rTFAjTn+fUzpiWPPGLUnvVYNtNkKkbzXY4
 zY8J+f+s9YqMVxzs1yPI7cKNBu6P7A8XDRqtfBaZ4Zcnt/H9TsMX+zWMf/+W4+tD
 zWhJLeD1oXaCAiCznx94IfocguiIVowO6242N58hQTGgPjNKu+FJoSC4DywhGWij
 RqrfTXRmecaQs9yPtDmOoyFuuDI0tbbIcUNTID0Gmo/9ivo9qDDGAA+qEkqpc5Ys
 VI6XZ7ZD5KJSf8PFsvs4PDQ6MNrI3KDTDBcTwOyEfvofbKiFCUy/UJTA1fI52EgA
 9euA+44mDqKPuiykWT3S50Jpx7FaRxdGJEvEGY3IwuY3YaVdB1kspeyacmd4HQbL
 j3rX38MlHoexUj3a++K5JJTAt0E26JP5M6QNwJ+T9Idnn2A4jKFyFBLWgkPL7Ai0
 JP7CFU8qO8HoPadIhgUoSFYydCxhFUD2kATvD2Oz+PXxC6ATRmV+j2lwnX+uOFxD
 zS8uybeEfQuwsQ65rIdmKlTObjCtABwTLY3z67aWygQDwbRIlSuVBPpBt5VG0q04
 mXKaawX3NKFP0kp0o0f/Z8/J8voS5mEIQyv7Br/fR1Yq6/BOw+lCOxerMucLtyj2
 ljCy6+WeblHsS2A=
 =n+Nq
 -----END PGP SIGNATURE-----

Merge tag 'audit-pr-20201012' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit

Pull audit updates from Paul Moore:
 "A small set of audit patches for v5.10.

  There are only three patches in total, and all three are trivial fixes
  that don't really warrant any explanations beyond their descriptions.
  As usual, all three patches pass our test suite"

* tag 'audit-pr-20201012' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: Remove redundant null check
  audit: uninitialize variable audit_sig_sid
  audit: change unnecessary globals into statics
2020-10-13 16:24:42 -07:00
Linus Torvalds
d594d8f411 printk changes for 5.10
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAl+EN+oACgkQUqAMR0iA
 lPK/gA//WXBjC4FSPNr0j7kPFKQhADS3cUcp+GfuI4rYkYcJHV0yJn1kvctg1rUC
 Je+Hc+Hy5Nk93lwejj5BvQoc31zOeoPDyMje5zi5te4H2NQkaoGXHOMvUnaLcNeo
 g+HJvx+NU9MDjuc5amtK8YD69jzErD+eqrHpQOg4UToMXXcBXLafTThIi9vT1fzP
 9uwWBRlpdQyY7tYbbwFiDuu33PyoWlc6Ksp8qKdLBLz2AmGd1Rvaq+ePsq8b9tHJ
 pfv1agW0GTpzoN2pm5gFXOoYniHB/ooB1L0QLq7ylaociEyb8WbTtkn4v++EjxW8
 aGsO1WdO0MQeIWDxXQR5DYD3s+Me2DMhFPDqUc2/s0q2SGWUPFcsmCsvMAOx/clA
 HDfTWkyzB4FarZOTv0gZ7jYNOVukFzUQ1IBTtWpJifC9fT0xrRkKmKE1UgmWv0ei
 Hx5VFQyQGsDh3sUcRLhW91p4sqJCs7l01zw1A/0rb7a+QTHAqZRtbz5hyTjlViiT
 57XiyXynXW8N4Q5U6uAxCbkFFi+nP/XVQ5ggZ/QLn/4hfWWUcu0vt2bOGkRwryAT
 zYmDqViraEVWKIom74UzZ0nrIBtdhvtbFQIYuyiCQKpKMwytWXUQbUASZL2mfBZi
 h5eJx7etV6f5to5mNRsj8bbN5buX9UheEd0QFD9NJdS6aadqTac=
 =9vEl
 -----END PGP SIGNATURE-----

Merge tag 'printk-for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull printk updates from Petr Mladek:
 "The big new thing is the fully lockless ringbuffer implementation,
  including the support for continuous lines. It will allow to store and
  read messages in any situation wihtout the risk of deadlocks and
  without the need of temporary per-CPU buffers.

  The access is still serialized by logbuf_lock. It synchronizes few
  more operations, for example, temporary buffer for formatting the
  message, syslog and kmsg_dump operations. The lock removal is being
  discussed and should be ready for the next release.

  The continuous lines are handled exactly the same way as before to
  avoid regressions in user space. It means that they are appended to
  the last message when the caller is the same. Only the last message
  can be extended.

  The data ring includes plain text of the messages. Except for an
  integer at the beginning of each message that points back to the
  descriptor ring with other metadata.

  The dictionary has to stay. journalctl uses it to filter the log. It
  allows to show messages related to a given device. The dictionary
  values are stored in the descriptor ring with the other metadata.

  This is the first part of the printk rework as discussed at Plumbers
  2019, see https://lore.kernel.org/r/87k1acz5rx.fsf@linutronix.de. The
  next big step will be handling consoles by kthreads during the normal
  system operation. It will require special handling of situations when
  the kthreads could not get scheduled, for example, early boot,
  suspend, panic.

  Other changes:

   - Add John Ogness as a reviewer for printk subsystem. He is author of
     the rework and is familiar with the code and history.

   - Fix locking in serial8250_do_startup() to prevent lockdep report.

   - Few code cleanups"

* tag 'printk-for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: (27 commits)
  printk: Use fallthrough pseudo-keyword
  printk: reduce setup_text_buf size to LOG_LINE_MAX
  printk: avoid and/or handle record truncation
  printk: remove dict ring
  printk: move dictionary keys to dev_printk_info
  printk: move printk_info into separate array
  printk: reimplement log_cont using record extension
  printk: ringbuffer: add finalization/extension support
  printk: ringbuffer: change representation of states
  printk: ringbuffer: clear initial reserved fields
  printk: ringbuffer: add BLK_DATALESS() macro
  printk: ringbuffer: relocate get_data()
  printk: ringbuffer: avoid memcpy() on state_var
  printk: ringbuffer: fix setting state in desc_read()
  kernel.h: Move oops_in_progress to printk.h
  scripts/gdb: update for lockless printk ringbuffer
  scripts/gdb: add utils.read_ulong()
  docs: vmcoreinfo: add lockless printk ringbuffer vmcoreinfo
  printk: reduce LOG_BUF_SHIFT range for H8300
  printk: ringbuffer: support dataless records
  ...
2020-10-13 15:58:10 -07:00
Linus Torvalds
6ad4bf6ea1 io_uring-5.10-2020-10-12
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl+EXPEQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpiR4EAC3trm1ojXVF7y9/XRhcPpb4Pror+ZA1coO
 gyoy+zUuCEl9WCzzHWqXULMYMP0YzNJnJs0oLQPA1s0sx1H4uDMl/UXg0OXZisYG
 Y59Kca3c1DHFwj9KPQXfGmCEjc/rbDWK5TqRc2iZMz+6E5Mt71UFZHtenwgV1zD8
 hTmZZkzLCu2ePfOvrPONgL5tDqPWGVyn61phoC7cSzMF66juXGYuvQGktzi/m6q+
 jAxUnhKvKTlLB9wsq3s5X/20/QD56Yuba9U+YxeeNDBE8MDWQOsjz0mZCV1fn4p3
 h/6762aRaWaXH7EwMtsHFUWy7arJZg/YoFYNYLv4Ksyy3y4sMABZCy3A+JyzrgQ0
 hMu7vjsP+k22X1WH8nyejBfWNEmxu6dpgckKrgF0dhJcXk/acWA3XaDWZ80UwfQy
 isKRAP1rA0MJKHDMIwCzSQJDPvtUAkPptbNZJcUSU78o+pPoCaQ93V++LbdgGtKn
 iGJJqX05dVbcsDx5X7fluphjkUTC4yFr7ZgLgbOIedXajWRD8iOkO2xxCHk6SKFl
 iv9entvRcX9k3SHK9uffIUkRBUujMU0+HCIQFCO1qGmkCaS5nSrovZl4HoL7L/Dj
 +T8+v7kyJeklLXgJBaE7jk01O4HwZKjwPWMbCjvL9NKk8j7c1soYnRu5uNvi85Mu
 /9wn671s+w==
 =udgj
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.10-2020-10-12' of git://git.kernel.dk/linux-block

Pull io_uring updates from Jens Axboe:

 - Add blkcg accounting for io-wq offload (Dennis)

 - A use-after-free fix for io-wq (Hillf)

 - Cancelation fixes and improvements

 - Use proper files_struct references for offload

 - Cleanup of io_uring_get_socket() since that can now go into our own
   header

 - SQPOLL fixes and cleanups, and support for sharing the thread

 - Improvement to how page accounting is done for registered buffers and
   huge pages, accounting the real pinned state

 - Series cleaning up the xarray code (Willy)

 - Various cleanups, refactoring, and improvements (Pavel)

 - Use raw spinlock for io-wq (Sebastian)

 - Add support for ring restrictions (Stefano)

* tag 'io_uring-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (62 commits)
  io_uring: keep a pointer ref_node in file_data
  io_uring: refactor *files_register()'s error paths
  io_uring: clean file_data access in files_register
  io_uring: don't delay io_init_req() error check
  io_uring: clean leftovers after splitting issue
  io_uring: remove timeout.list after hrtimer cancel
  io_uring: use a separate struct for timeout_remove
  io_uring: improve submit_state.ios_left accounting
  io_uring: simplify io_file_get()
  io_uring: kill extra check in fixed io_file_get()
  io_uring: clean up ->files grabbing
  io_uring: don't io_prep_async_work() linked reqs
  io_uring: Convert advanced XArray uses to the normal API
  io_uring: Fix XArray usage in io_uring_add_task_file
  io_uring: Fix use of XArray in __io_uring_files_cancel
  io_uring: fix break condition for __io_uring_register() waiting
  io_uring: no need to call xa_destroy() on empty xarray
  io_uring: batch account ->req_issue and task struct references
  io_uring: kill callback_head argument for io_req_task_work_add()
  io_uring: move req preps out of io_issue_sqe()
  ...
2020-10-13 12:36:21 -07:00
Linus Torvalds
3ad11d7ac8 block-5.10-2020-10-12
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl+EWUgQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpnoxEADCVSNBRkpV0OVkOEC3wf8EGhXhk01Jnjtl
 u5Mg2V55hcgJ0thQxBV/V28XyqmsEBrmAVi0Yf8Vr9Qbq4Ze08Wae4ChS4rEOyh1
 jTcGYWx5aJB3ChLvV/HI0nWQ3bkj03mMrL3SW8rhhf5DTyKHsVeTenpx42Qu/FKf
 fRzi09FSr3Pjd0B+EX6gunwJnlyXQC5Fa4AA0GhnXJzAznANXxHkkcXu8a6Yw75x
 e28CfhIBliORsK8sRHLoUnPpeTe1vtxCBhBMsE+gJAj9ZUOWMzvNFIPP4FvfawDy
 6cCQo2m1azJ/IdZZCDjFUWyjh+wxdKMp+NNryEcoV+VlqIoc3n98rFwrSL+GIq5Z
 WVwEwq+AcwoMCsD29Lu1ytL2PQ/RVqcJP5UheMrbL4vzefNfJFumQVZLIcX0k943
 8dFL2QHL+H/hM9Dx5y5rjeiWkAlq75v4xPKVjh/DHb4nehddCqn/+DD5HDhNANHf
 c1kmmEuYhvLpIaC4DHjE6DwLh8TPKahJjwsGuBOTr7D93NUQD+OOWsIhX6mNISIl
 FFhP8cd0/ZZVV//9j+q+5B4BaJsT+ZtwmrelKFnPdwPSnh+3iu8zPRRWO+8P8fRC
 YvddxuJAmE6BLmsAYrdz6Xb/wqfyV44cEiyivF0oBQfnhbtnXwDnkDWSfJD1bvCm
 ZwfpDh2+Tg==
 =LzyE
 -----END PGP SIGNATURE-----

Merge tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:

 - Series of merge handling cleanups (Baolin, Christoph)

 - Series of blk-throttle fixes and cleanups (Baolin)

 - Series cleaning up BDI, seperating the block device from the
   backing_dev_info (Christoph)

 - Removal of bdget() as a generic API (Christoph)

 - Removal of blkdev_get() as a generic API (Christoph)

 - Cleanup of is-partition checks (Christoph)

 - Series reworking disk revalidation (Christoph)

 - Series cleaning up bio flags (Christoph)

 - bio crypt fixes (Eric)

 - IO stats inflight tweak (Gabriel)

 - blk-mq tags fixes (Hannes)

 - Buffer invalidation fixes (Jan)

 - Allow soft limits for zone append (Johannes)

 - Shared tag set improvements (John, Kashyap)

 - Allow IOPRIO_CLASS_RT for CAP_SYS_NICE (Khazhismel)

 - DM no-wait support (Mike, Konstantin)

 - Request allocation improvements (Ming)

 - Allow md/dm/bcache to use IO stat helpers (Song)

 - Series improving blk-iocost (Tejun)

 - Various cleanups (Geert, Damien, Danny, Julia, Tetsuo, Tian, Wang,
   Xianting, Yang, Yufen, yangerkun)

* tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (191 commits)
  block: fix uapi blkzoned.h comments
  blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue
  blk-mq: get rid of the dead flush handle code path
  block: get rid of unnecessary local variable
  block: fix comment and add lockdep assert
  blk-mq: use helper function to test hw stopped
  block: use helper function to test queue register
  block: remove redundant mq check
  block: invoke blk_mq_exit_sched no matter whether have .exit_sched
  percpu_ref: don't refer to ref->data if it isn't allocated
  block: ratelimit handle_bad_sector() message
  blk-throttle: Re-use the throtl_set_slice_end()
  blk-throttle: Open code __throtl_de/enqueue_tg()
  blk-throttle: Move service tree validation out of the throtl_rb_first()
  blk-throttle: Move the list operation after list validation
  blk-throttle: Fix IO hang for a corner case
  blk-throttle: Avoid tracking latency if low limit is invalid
  blk-throttle: Avoid getting the current time if tg->last_finish_time is 0
  blk-throttle: Remove a meaningless parameter for throtl_downgrade_state()
  block: Remove redundant 'return' statement
  ...
2020-10-13 12:12:44 -07:00
Thomas Cedeno
111767c1d8 LSM: Signal to SafeSetID when setting group IDs
For SafeSetID to properly gate set*gid() calls, it needs to know whether
ns_capable() is being called from within a sys_set*gid() function or is
being called from elsewhere in the kernel. This allows SafeSetID to deny
CAP_SETGID to restricted groups when they are attempting to use the
capability for code paths other than updating GIDs (e.g. setting up
userns GID mappings). This is the identical approach to what is
currently done for CAP_SETUID.

NOTE: We also add signaling to SafeSetID from the setgroups() syscall,
as we have future plans to restrict a process' ability to set
supplementary groups in addition to what is added in this series for
restricting setting of the primary group.

Signed-off-by: Thomas Cedeno <thomascedeno@google.com>
Signed-off-by: Micah Morton <mortonm@chromium.org>
2020-10-13 09:17:34 -07:00
Rafael J. Wysocki
2cf9ba2905 Merge branches 'pm-core', 'pm-sleep', 'pm-pci' and 'pm-domains'
* pm-core:
  PM: runtime: Fix timer_expires data type on 32-bit arches
  PM: runtime: Remove link state checks in rpm_get/put_supplier()

* pm-sleep:
  ACPI: EC: PM: Drop ec_no_wakeup check from acpi_ec_dispatch_gpe()
  ACPI: EC: PM: Flush EC work unconditionally after wakeup
  PM: hibernate: remove the bogus call to get_gendisk() in software_resume()
  PM: hibernate: Batch hibernate and resume IO requests

* pm-pci:
  PCI/ACPI: Whitelist hotplug ports for D3 if power managed by ACPI

* pm-domains:
  PM: domains: Allow to abort power off when no ->power_off() callback
  PM: domains: Rename power state enums for genpd
2020-10-13 14:48:20 +02:00
Rafael J. Wysocki
9c2ff6650f Merge branch 'pm-cpufreq'
* pm-cpufreq: (30 commits)
  cpufreq: stats: Fix string format specifier mismatch
  arm: disable frequency invariance for CONFIG_BL_SWITCHER
  cpufreq,arm,arm64: restructure definitions of arch_set_freq_scale()
  cpufreq: stats: Add memory barrier to store_reset()
  cpufreq: schedutil: Simplify sugov_fast_switch()
  cpufreq: Move traces and update to policy->cur to cpufreq core
  cpufreq: stats: Enable stats for fast-switch as well
  cpufreq: stats: Mark few conditionals with unlikely()
  cpufreq: stats: Remove locking
  cpufreq: stats: Defer stats update to cpufreq_stats_record_transition()
  arch_topology, arm, arm64: define arch_scale_freq_invariant()
  arch_topology, cpufreq: constify arch_* cpumasks
  cpufreq: report whether cpufreq supports Frequency Invariance (FI)
  cpufreq: move invariance setter calls in cpufreq core
  arch_topology: validate input frequencies to arch_set_freq_scale()
  cpufreq: qcom: Don't add frequencies without an OPP
  cpufreq: qcom-hw: Add cpufreq support for SM8250 SoC
  cpufreq: qcom-hw: Use of_device_get_match_data for offsets and row size
  cpufreq: qcom-hw: Use devm_platform_ioremap_resource() to simplify code
  dt-bindings: cpufreq: cpufreq-qcom-hw: Document Qcom EPSS compatible
  ...
2020-10-13 14:39:35 +02:00
Linus Torvalds
e18afa5bfa Merge branch 'work.quota-compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull compat quotactl cleanups from Al Viro:
 "More Christoph's compat cleanups: quotactl(2)"

* 'work.quota-compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  quota: simplify the quotactl compat handling
  compat: add a compat_need_64bit_alignment_fixup() helper
  compat: lift compat_s64 and compat_u64 to <asm-generic/compat.h>
2020-10-12 16:37:13 -07:00
Jakub Kicinski
ccdf7fae3a Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2020-10-12

The main changes are:

1) The BPF verifier improvements to track register allocation pattern, from Alexei and Yonghong.

2) libbpf relocation support for different size load/store, from Andrii.

3) bpf_redirect_peer() helper and support for inner map array with different max_entries, from Daniel.

4) BPF support for per-cpu variables, form Hao.

5) sockmap improvements, from John.
====================

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-12 16:16:50 -07:00
Linus Torvalds
1c6890707e This tree prepares to unify the kretprobe trampoline handler and make
kretprobe lockless. (Those patches are still work in progress.)
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl+EgmMRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hKQg//WrYVMc+lLG+QP4IuKfolZVGNeS60crZE
 mTvs4iX8gBrU5omrgatrjUiDhrln6MiTf6H0ec072BAho91lom/AlyDUQbta5sls
 uXKzIjHe9J7ca+myXGDiXkGmWXgcBYHBHifyzf04xhPyFXH869HLxFXCHeV1S3m7
 Tga1Lceths425t8nnYb9yao9k26l22BSklzPqEM/XNNnktrMiaiYlfgUxi1g3hMj
 v9IbZy43qpzljyrnfRk/tRGMnZ/BtZpj7swQEjUVOKgmcymX6bQoxqYvpAH5mYX7
 jqKcTLsw/Jm4YhZdeBpjZc2JNQkNJSLjiXMMtQTmncPKx2shuU1s4KhgRtYEEeyI
 BO37k3RwplED7/yBJtojNt0WWYfd7X2ee8SPuSW/VPL6jSDgJii3Um0AldPZ0J3g
 72OT4rJkyqFER0ZKSf8uIym2Zi7F5IvtzK2xJAzquOQlYdCaKSNrWurckOzWHMm9
 JKqUqq3nV4mFUKEE7Kf0Nu3UgQZNKpxUNepWBoJb3j6baK32Qgb6qpNLLPTTi2qJ
 AwxicRlr7jzdyP2cwvU5z2FuilPypOob8ZnowhhIyV+4xQY9CymJ3uluXattDC74
 ZNgydTyyYCo0PwYZGUDeE8o87apYd3+sEOErLtw4CjaoiadxDaMBmfsHzU7W29Rc
 Fow4+FQCK/Y=
 =2jY/
 -----END PGP SIGNATURE-----

Merge tag 'perf-kprobes-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf/kprobes updates from Ingo Molnar:
 "This prepares to unify the kretprobe trampoline handler and make
  kretprobe lockless (those patches are still work in progress)"

* tag 'perf-kprobes-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  kprobes: Fix to check probe enabled before disarm_kprobe_ftrace()
  kprobes: Make local functions static
  kprobes: Free kretprobe_instance with RCU callback
  kprobes: Remove NMI context check
  sparc: kprobes: Use generic kretprobe trampoline handler
  sh: kprobes: Use generic kretprobe trampoline handler
  s390: kprobes: Use generic kretprobe trampoline handler
  powerpc: kprobes: Use generic kretprobe trampoline handler
  parisc: kprobes: Use generic kretprobe trampoline handler
  mips: kprobes: Use generic kretprobe trampoline handler
  ia64: kprobes: Use generic kretprobe trampoline handler
  csky: kprobes: Use generic kretprobe trampoline handler
  arc: kprobes: Use generic kretprobe trampoline handler
  arm64: kprobes: Use generic kretprobe trampoline handler
  arm: kprobes: Use generic kretprobe trampoline handler
  x86/kprobes: Use generic kretprobe trampoline handler
  kprobes: Add generic kretprobe trampoline handler
2020-10-12 14:21:15 -07:00
Linus Torvalds
3bff6112c8 These are the performance events changes for v5.10:
x86 Intel updates:
 
  - Add Jasper Lake support
 
  - Add support for TopDown metrics on Ice Lake
 
  - Fix Ice Lake & Tiger Lake uncore support, add Snow Ridge support
 
  - Add a PCI sub driver to support uncore PMUs where the PCI resources
    have been claimed already - extending the range of supported systems.
 
 x86 AMD updates:
 
  - Restore 'perf stat -a' behaviour to program the uncore PMU
    to count all CPU threads.
 
  - Fix setting the proper count when sampling Large Increment
    per Cycle events / 'paired' events.
 
  - Fix IBS Fetch sampling on F17h and some other IBS fine tuning,
    greatly reducing the number of interrupts when large sample
    periods are specified.
 
  - Extends Family 17h RAPL support to also work on compatible
    F19h machines.
 
 Core code updates:
 
  - Fix race in perf_mmap_close()
 
  - Add PERF_EV_CAP_SIBLING, to denote that sibling events should be
    closed if the leader is removed.
 
  - Smaller fixes and updates.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl+Ef40RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1h7NQ//ZdQ26Yg79ZaxBX1QSINJ9AgXDi6rXs75
 qU9qNwr/6EF+633RZoPQGAE0Iy5v6h7iLFokcJzM9+kK/rE3ax44tSnPlcMa0+6N
 SHXKCa5iL+hH7o2Spo2MZwCYseH79rloX3TSH7ajnN3X8PvwgWshF0lUE3WEWtCs
 eHSojdCk43IuL9TpusuNOBM2FvgnheFYWiMbFHd0MTBUMxul30sLVCG8IIWCPA+q
 TwG4RJS3X42VbL3SuAGFmOv4OmqNsfkvHvjpDs4NF07tRB9zjXzGrxmGhgSw0NAN
 2KK25qbmrpKATIb4Eqsgk/yikX/SCrDEXrjhg3r8FnyPvRfctq1crZjjf672PI2E
 bDda76dH6Lq9jv5fsyJjas5OsYdMKBCnA+tGQxXPGbmTXeEcYMRbDnwhYnevI/Q/
 8pP+xstF0pmBA3tvpDPrQnYH72Qt7CLJSdcTB15NqZftU2tJxaAyJGx4gJy33jxQ
 wu6BIEGHQ7onQYiIyTwsBHyz6xNsF/CRHwAPcGdYrRRbXB5K5nxHiXNb4awciTMx
 2HF31/S4OqURNpfcpxOQo+1fb/cLqj3loGqE4jCTwkbS3lrHcAcfxyv9QNn77l1f
 hdQ0jworbUNVLUYEUQz1bkZ06GD3LSSas2ZlY1NNdHo62mjyXMQmgirNcZmrFgWl
 tl2gNFAU9x4=
 =2fuY
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull performance events updates from Ingo Molnar:
 "x86 Intel updates:

   - Add Jasper Lake support

   - Add support for TopDown metrics on Ice Lake

   - Fix Ice Lake & Tiger Lake uncore support, add Snow Ridge support

   - Add a PCI sub driver to support uncore PMUs where the PCI resources
     have been claimed already - extending the range of supported
     systems.

  x86 AMD updates:

   - Restore 'perf stat -a' behaviour to program the uncore PMU to count
     all CPU threads.

   - Fix setting the proper count when sampling Large Increment per
     Cycle events / 'paired' events.

   - Fix IBS Fetch sampling on F17h and some other IBS fine tuning,
     greatly reducing the number of interrupts when large sample periods
     are specified.

   - Extends Family 17h RAPL support to also work on compatible F19h
     machines.

  Core code updates:

   - Fix race in perf_mmap_close()

   - Add PERF_EV_CAP_SIBLING, to denote that sibling events should be
     closed if the leader is removed.

   - Smaller fixes and updates"

* tag 'perf-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
  perf/core: Fix race in the perf_mmap_close() function
  perf/x86: Fix n_metric for cancelled txn
  perf/x86: Fix n_pair for cancelled txn
  x86/events/amd/iommu: Fix sizeof mismatch
  perf/x86/intel: Check perf metrics feature for each CPU
  perf/x86/intel: Fix Ice Lake event constraint table
  perf/x86/intel/uncore: Fix the scale of the IMC free-running events
  perf/x86/intel/uncore: Fix for iio mapping on Skylake Server
  perf/x86/msr: Add Jasper Lake support
  perf/x86/intel: Add Jasper Lake support
  perf/x86/intel/uncore: Reduce the number of CBOX counters
  perf/x86/intel/uncore: Update Ice Lake uncore units
  perf/x86/intel/uncore: Split the Ice Lake and Tiger Lake MSR uncore support
  perf/x86/intel/uncore: Support PCIe3 unit on Snow Ridge
  perf/x86/intel/uncore: Generic support for the PCI sub driver
  perf/x86/intel/uncore: Factor out uncore_pci_pmu_unregister()
  perf/x86/intel/uncore: Factor out uncore_pci_pmu_register()
  perf/x86/intel/uncore: Factor out uncore_pci_find_dev_pmu()
  perf/x86/intel/uncore: Factor out uncore_pci_get_dev_die_info()
  perf/amd/uncore: Inform the user how many counters each uncore PMU has
  ...
2020-10-12 14:14:35 -07:00
Linus Torvalds
dd502a8107 This tree introduces static_call(), which is the idea of static_branch()
applied to indirect function calls. Remove a data load (indirection) by
 modifying the text.
 
 They give the flexibility of function pointers, but with better
 performance. (This is especially important for cases where
 retpolines would otherwise be used, as retpolines can be pretty
 slow.)
 
 API overview:
 
   DECLARE_STATIC_CALL(name, func);
   DEFINE_STATIC_CALL(name, func);
   DEFINE_STATIC_CALL_NULL(name, typename);
 
   static_call(name)(args...);
   static_call_cond(name)(args...);
   static_call_update(name, func);
 
 x86 is supported via text patching, otherwise basic indirect calls are used,
 with function pointers.
 
 There's a second variant using inline code patching, inspired by jump-labels,
 implemented on x86 as well.
 
 The new APIs are utilized in the x86 perf code, a heavy user of function pointers,
 where static calls speed up the PMU handler by 4.2% (!).
 
 The generic implementation is not really excercised on other architectures,
 outside of the trivial test_static_call_init() self-test.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl+EfAQRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iEAw//divHeVCJnHhV+YBbuI9ROUsERkzu8VhK
 O1DEmW68Fvj7pszT8NZsMjtkt97ZtxDRK7aCJiiup0eItG9qCJ8lpCLb84ZbizHV
 HhCbhBLrpxSvTrWlQnkgP1OkPAbtoryIjVlZzWhjye2MY8UEbVnZWyviBolbAAxH
 Fk1Yi56fIMu19GO+9Ohzy9E2VDnVEH1iMx5YWoLD2H88Qbq/yEMP+U2tIj8hIVKT
 Y/jdogihNXRIau6QB+YPfDPisdty+RHxfU7zct4Rv8cFF5ylglZB5fD34C3sUQF2
 WqsaYz7zjUj9f02F8pw8hIaAT7InzArPhlNVITxf2oMfmdrNqBptnSCddZqCJLvv
 oDGew21k50Zcbqkv9amclpxXH5tTpRvJeqit2pz/85GMeqBRuhzHUAkCpht5YA73
 qJsHWS3z+qIxKi0tDbhDJswuwa51q5sgdUUwo1uCr3wT3DGDlqNhCAZBzX14dcty
 0shDSbv13TCwqAcb7asPzEoPwE15cwa+x+viGEIL901pyZKyQYjs/abDU26It3BW
 roWRkuVJZ9/QMdZJs1v7kaXw1L8YiKIDkBgke+xbfrDwEvvjudQkl2LUL66DB11j
 RJU3GyxKClvdY06SSRh/H13fqZLNKh1JZ0nPEWSTJECDFN9zcDjrDrod/7PFOcpY
 NAlawLoGG+s=
 =JvpF
 -----END PGP SIGNATURE-----

Merge tag 'core-static_call-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull static call support from Ingo Molnar:
 "This introduces static_call(), which is the idea of static_branch()
  applied to indirect function calls. Remove a data load (indirection)
  by modifying the text.

  They give the flexibility of function pointers, but with better
  performance. (This is especially important for cases where retpolines
  would otherwise be used, as retpolines can be pretty slow.)

  API overview:

      DECLARE_STATIC_CALL(name, func);
      DEFINE_STATIC_CALL(name, func);
      DEFINE_STATIC_CALL_NULL(name, typename);

      static_call(name)(args...);
      static_call_cond(name)(args...);
      static_call_update(name, func);

  x86 is supported via text patching, otherwise basic indirect calls are
  used, with function pointers.

  There's a second variant using inline code patching, inspired by
  jump-labels, implemented on x86 as well.

  The new APIs are utilized in the x86 perf code, a heavy user of
  function pointers, where static calls speed up the PMU handler by
  4.2% (!).

  The generic implementation is not really excercised on other
  architectures, outside of the trivial test_static_call_init()
  self-test"

* tag 'core-static_call-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
  static_call: Fix return type of static_call_init
  tracepoint: Fix out of sync data passing by static caller
  tracepoint: Fix overly long tracepoint names
  x86/perf, static_call: Optimize x86_pmu methods
  tracepoint: Optimize using static_call()
  static_call: Allow early init
  static_call: Add some validation
  static_call: Handle tail-calls
  static_call: Add static_call_cond()
  x86/alternatives: Teach text_poke_bp() to emulate RET
  static_call: Add simple self-test for static calls
  x86/static_call: Add inline static call implementation for x86-64
  x86/static_call: Add out-of-line static call implementation
  static_call: Avoid kprobes on inline static_call()s
  static_call: Add inline static call infrastructure
  static_call: Add basic static call infrastructure
  compiler.h: Make __ADDRESSABLE() symbol truly unique
  jump_label,module: Fix module lifetime for __jump_label_mod_text_reserved()
  module: Properly propagate MODULE_STATE_COMING failure
  module: Fix up module_notifier return values
  ...
2020-10-12 13:58:15 -07:00
Linus Torvalds
ed016af52e These are the locking updates for v5.10:
- Add deadlock detection for recursive read-locks. The rationale is outlined
    in:
 
      224ec489d3: ("lockdep/Documention: Recursive read lock detection reasoning")
 
    The main deadlock pattern we want to detect is:
 
            TASK A:                 TASK B:
 
            read_lock(X);
                                    write_lock(X);
            read_lock_2(X);
 
  - Add "latch sequence counters" (seqcount_latch_t):
 
       A sequence counter variant where the counter even/odd value is used to
       switch between two copies of protected data. This allows the read path,
       typically NMIs, to safely interrupt the write side critical section.
 
    We utilize this new variant for sched-clock, and to make x86 TSC handling safer.
 
  - Other seqlock cleanups, fixes and enhancements
 
  - KCSAN updates
 
  - LKMM updates
 
  - Misc updates, cleanups and fixes.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl+EX6QRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1g3gxAAkg+Jy/tcdRxlxlEDOQPFy1mBqvFmulNA
 pGFPkB6dzqmAWF/NfOZSl4g/h/mqGYsq2V+PfK5E8Sq8DQ/yCmnLhjgVOHNUUliv
 x0WWfOysNgJdtdf69NLYJufIQhxhyI0dwFHHoHIsCdGdGqjh2DVevQFPFTBjdpOc
 BUZYo+u3gCaCdB6A2nmlcWYbEw8eVEHgv3qLG6dq46J0KJOV0HfliqJoU3EZqH+s
 977LvEIo+THfuYWMo/Jepwngbi0y36KeeukOAdwm9fK196htBHIUR+YPPrAe+FWD
 z+UXP5IS5XIw9V1sGLmUaC2m+6gpdW19jKBtlzPkxHXmJmsgiZdLLeytEh3WYey7
 nzfH+9Jd4NyyZKucLssYkOjf6P5BxGKCyJ9LXb7vlSthIhiDdFNx47oKtW4hxjOY
 jubsI3BP5c3G1sIBIjTS53XmOhJg+Z52FxTpQ33JswXn1wGidcHZiuNHZuU5q28p
 +tn8rGb2NGJFb4Sw/Vp0yTcqIpEXf+vweiQoaxm6tc9BWzcVzZntGnh0i3gFotx/
 VgKafN4+pgXgo6bwHbN2WBK2FGyvcXFaptfaOMZL48En82hJ1DI6EnBEYN+vuERQ
 JcCXg+iHeeVbxoou7q8NJxITkBmEL5xNBIugXRRqNSP3fXLxKjFuPYqT84/e7yZi
 elGTReYcq6g=
 =Iq51
 -----END PGP SIGNATURE-----

Merge tag 'locking-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar:
 "These are the locking updates for v5.10:

   - Add deadlock detection for recursive read-locks.

     The rationale is outlined in commit 224ec489d3 ("lockdep/
     Documention: Recursive read lock detection reasoning")

     The main deadlock pattern we want to detect is:

           TASK A:                 TASK B:

           read_lock(X);
                                   write_lock(X);
           read_lock_2(X);

   - Add "latch sequence counters" (seqcount_latch_t):

     A sequence counter variant where the counter even/odd value is used
     to switch between two copies of protected data. This allows the
     read path, typically NMIs, to safely interrupt the write side
     critical section.

     We utilize this new variant for sched-clock, and to make x86 TSC
     handling safer.

   - Other seqlock cleanups, fixes and enhancements

   - KCSAN updates

   - LKMM updates

   - Misc updates, cleanups and fixes"

* tag 'locking-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits)
  lockdep: Revert "lockdep: Use raw_cpu_*() for per-cpu variables"
  lockdep: Fix lockdep recursion
  lockdep: Fix usage_traceoverflow
  locking/atomics: Check atomic-arch-fallback.h too
  locking/seqlock: Tweak DEFINE_SEQLOCK() kernel doc
  lockdep: Optimize the memory usage of circular queue
  seqlock: Unbreak lockdep
  seqlock: PREEMPT_RT: Do not starve seqlock_t writers
  seqlock: seqcount_LOCKNAME_t: Introduce PREEMPT_RT support
  seqlock: seqcount_t: Implement all read APIs as statement expressions
  seqlock: Use unique prefix for seqcount_t property accessors
  seqlock: seqcount_LOCKNAME_t: Standardize naming convention
  seqlock: seqcount latch APIs: Only allow seqcount_latch_t
  rbtree_latch: Use seqcount_latch_t
  x86/tsc: Use seqcount_latch_t
  timekeeping: Use seqcount_latch_t
  time/sched_clock: Use seqcount_latch_t
  seqlock: Introduce seqcount_latch_t
  mm/swap: Do not abuse the seqcount_t latching API
  time/sched_clock: Use raw_read_seqcount_latch() during suspend
  ...
2020-10-12 13:06:20 -07:00
Linus Torvalds
edaa5ddf38 Scheduler changes for v5.10:
- Reorganize & clean up the SD* flags definitions and add a bunch
    of sanity checks. These new checks caught quite a few bugs or at
    least inconsistencies, resulting in another set of patches.
 
  - Rseq updates, add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
 
  - Add a new tracepoint to improve CPU capacity tracking
 
  - Improve overloaded SMP system load-balancing behavior
 
  - Tweak SMT balancing
 
  - Energy-aware scheduling updates
 
  - NUMA balancing improvements
 
  - Deadline scheduler fixes and improvements
 
  - CPU isolation fixes
 
  - Misc cleanups, simplifications and smaller optimizations.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl+EWRERHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hV8A/7BB0nt/zYVZ8Z3Di8V0b9hMtr0d1xtRM5
 ZAvg4hcZl/fVgobFndxBw6KdlK8lSce9Mcq+bTTWeD46CS13cK5Vrpiaf7x7Q00P
 m8YHeYEH13ME0pbBrhDoRCR4XzfXukzjkUl7LiyrTekAvRUtFikJ/uKl8MeJtYGZ
 gANEkadqforxUW0v45iUEGepmCWAl8hSlSMb2mDKsVhw4DFMD+px0EBmmA0VDqjE
 e0rkh6dEoUVNqlic2KoaXULld1rLg1xiaOcLUbTAXnucfhmuv5p/H11AC4ABuf+s
 7d0zLrLEfZrcLJkthYxfMHs7DYMtARiQM9Db/a5hAq9Af4Z2bvvVAaHt3gCGvkV1
 llB6BB2yWCki9Qv7oiGOAhANnyJHG/cU4r6WwMuHdlYi4dFT/iN5qkOMUL1IrDgi
 a6ZzvECChXBeisQXHSlMd8Y5O+j0gRvDR7E18z2q0/PlmO8PGJq4w34mEWveWIg3
 LaVF16bmvaARuNFJTQH/zaHhjqVQANSMx5OIv9swp0OkwvQkw21ICYHG0YxfzWCr
 oa/FESEpOL9XdYp8UwMPI0bmVIsEfx79pmDMF3zInYTpJpwMUhV2yjHE8uYVMqEf
 7U8rZv7gdbZ2us38Gjf2l73hY+recp/GrgZKnk0R98OUeMk1l/iVP6dwco6ITUV5
 czGmKlIB1ec=
 =bXy6
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:

 - reorganize & clean up the SD* flags definitions and add a bunch of
   sanity checks. These new checks caught quite a few bugs or at least
   inconsistencies, resulting in another set of patches.

 - rseq updates, add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ

 - add a new tracepoint to improve CPU capacity tracking

 - improve overloaded SMP system load-balancing behavior

 - tweak SMT balancing

 - energy-aware scheduling updates

 - NUMA balancing improvements

 - deadline scheduler fixes and improvements

 - CPU isolation fixes

 - misc cleanups, simplifications and smaller optimizations

* tag 'sched-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits)
  sched/deadline: Unthrottle PI boosted threads while enqueuing
  sched/debug: Add new tracepoint to track cpu_capacity
  sched/fair: Tweak pick_next_entity()
  rseq/selftests: Test MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
  rseq/selftests,x86_64: Add rseq_offset_deref_addv()
  rseq/membarrier: Add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
  sched/fair: Use dst group while checking imbalance for NUMA balancer
  sched/fair: Reduce busy load balance interval
  sched/fair: Minimize concurrent LBs between domain level
  sched/fair: Reduce minimal imbalance threshold
  sched/fair: Relax constraint on task's load during load balance
  sched/fair: Remove the force parameter of update_tg_load_avg()
  sched/fair: Fix wrong cpu selecting from isolated domain
  sched: Remove unused inline function uclamp_bucket_base_value()
  sched/rt: Disable RT_RUNTIME_SHARE by default
  sched/deadline: Fix stale throttling on de-/boosted tasks
  sched/numa: Use runnable_avg to classify node
  sched/topology: Move sd_flag_debug out of #ifdef CONFIG_SYSCTL
  MAINTAINERS: Add myself as SCHED_DEADLINE reviewer
  sched/topology: Move SD_DEGENERATE_GROUPS_MASK out of linux/sched/topology.h
  ...
2020-10-12 12:56:01 -07:00
Linus Torvalds
cc7343724e Surgery of the MSI interrupt handling to prepare the support of upcoming
devices which require non-PCI based MSI handling.
 
   - Cleanup historical leftovers all over the place
 
   - Rework the code to utilize more core functionality
 
   - Wrap XEN PCI/MSI interrupts into an irqdomain to make irqdomain
     assignment to PCI devices possible.
 
   - Assign irqdomains to PCI devices at initialization time which allows
     to utilize the full functionality of hierarchical irqdomains.
 
   - Remove arch_.*_msi_irq() functions from X86 and utilize the irqdomain
     which is assigned to the device for interrupt management.
 
   - Make the arch_.*_msi_irq() support conditional on a config switch and
     let the last few users select it.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl+EUxcTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoagLEACGp5U7a4mk24GsOZJDhrua1PHR/fhb
 enn/5yOPpxDXdYmtFHIjV5qrNjDTV/WqDlI96KOi+oinG1Eoj0O/MA1AcSRhp6nf
 jVdAuK1X0DHDUTEeTAP0JFwqd2j0KlIOphBrIMgeWIf1CRKlYiJaO+ioF9fKgwZ/
 /HigOTSykGYMPggm3JXnWTWtJkKSGFxeADBvVHt5RpVmbWtrI4YoSBxKEMtvjyeM
 5+GsqbCad1CnFYTN74N+QWVGmgGnUWGEzWsPYnJ9hW+yyjad1kWx3n6NcCWhssaC
 E4vAXl6JuCPntL7jBFkbfUkQsgq12ThMZYWpCq8pShJA9O2tDKkxIGasHWrIt4cz
 nYrESiv6hM7edjtOvBc086Gd0A2EyGOM879goHyaNVaTO4rI6jfZG7PlW1HHWibS
 mf/bdTXBtULGNgEt7T8Qnb8sZ+D01WqzLrq/wm645jIrTzvNHUEpOhT1aH/g4TFQ
 cNHD5PcM9OTmiBir9srNd47+1s2mpfwdMYHKBt2QgiXMO8fRgdtr6WLQE4vJjmG8
 sA0yGGsgdTKeg2wW1ERF1pWL0Lt05Iaa42Skm0D3BwcOG2n5ltkBHzVllto9cTUh
 kIldAOgxGE6QeCnnlrnbHz5mvzt/3Ih/PIKqPSUAC94Kx1yvVHRYuOvDExeO8DFB
 P+f0TkrscZObSg==
 =JlqV
 -----END PGP SIGNATURE-----

Merge tag 'x86-irq-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 irq updates from Thomas Gleixner:
 "Surgery of the MSI interrupt handling to prepare the support of
  upcoming devices which require non-PCI based MSI handling:

   - Cleanup historical leftovers all over the place

   - Rework the code to utilize more core functionality

   - Wrap XEN PCI/MSI interrupts into an irqdomain to make irqdomain
     assignment to PCI devices possible.

   - Assign irqdomains to PCI devices at initialization time which
     allows to utilize the full functionality of hierarchical
     irqdomains.

   - Remove arch_.*_msi_irq() functions from X86 and utilize the
     irqdomain which is assigned to the device for interrupt management.

   - Make the arch_.*_msi_irq() support conditional on a config switch
     and let the last few users select it"

* tag 'x86-irq-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
  PCI: MSI: Fix Kconfig dependencies for PCI_MSI_ARCH_FALLBACKS
  x86/apic/msi: Unbreak DMAR and HPET MSI
  iommu/amd: Remove domain search for PCI/MSI
  iommu/vt-d: Remove domain search for PCI/MSI[X]
  x86/irq: Make most MSI ops XEN private
  x86/irq: Cleanup the arch_*_msi_irqs() leftovers
  PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable
  x86/pci: Set default irq domain in pcibios_add_device()
  iommm/amd: Store irq domain in struct device
  iommm/vt-d: Store irq domain in struct device
  x86/xen: Wrap XEN MSI management into irqdomain
  irqdomain/msi: Allow to override msi_domain_alloc/free_irqs()
  x86/xen: Consolidate XEN-MSI init
  x86/xen: Rework MSI teardown
  x86/xen: Make xen_msi_init() static and rename it to xen_hvm_msi_init()
  PCI/MSI: Provide pci_dev_has_special_msi_domain() helper
  PCI_vmd_Mark_VMD_irqdomain_with_DOMAIN_BUS_VMD_MSI
  irqdomain/msi: Provide DOMAIN_BUS_VMD_MSI
  x86/irq: Initialize PCI/MSI domain at PCI init time
  x86/pci: Reducde #ifdeffery in PCI init code
  ...
2020-10-12 11:40:41 -07:00
Linus Torvalds
c457cc800e Updates for the interrupt subsystem:
Core:
     - Allow trimming of interrupt hierarchy to support odd hardware setups
       where only a subset of the interrupts requires the full hierarchy.
 
     - Allow the retrigger mechanism to follow a hierarchy to simplify
       driver code.
 
     - Provide a mechanism to force enable wakeup interrrupts on suspend.
 
     - More infrastructure to handle IPIs in the core code
 
  Architectures:
 
     - Convert ARM/ARM64 IPI handling to utilize the interrupt core code.
 
  Drivers:
 
     - The usual pile of new interrupt chips (MStar, Actions Owl, TI PRUSS,
       Designware ICTL)
 
     - ARM(64) IPI related conversions
 
     - Wakeup support for Qualcom PDC
 
     - Prevent hierarchy corruption in the NVIDIA Tegra driver
 
     - The usual small fixes, improvements and cleanups all over the place.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl+ENDsTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoTyXD/9oGq37/zjpCggtRWdTKGtKvndjodqt
 82zTZ1eSukDSE3UoT7PL8cRQ/4MnRZ7Ke+Iidd2uUbWADfJN28+4d26wN/aYYlX7
 HmI/zowBgK6CJweynHYEF9/C8g2v2SRg5HJCJSOSuVLnTKNLc/aHX5rc/FZXGd6v
 K1BOHJFlzoU1w+OnFfoH4TeJdoKhzXi/T5zJFFtadOVIeCONxTEs4Fxkej2cuBsu
 Nz38WfkPdOnyrVIPhA10KgigczcRkKXU0ot/bNH4s9j2ZIGdgtq3UIbH+itleW2S
 bSWSShnlhSMS918pZNcR49iRyP2CsM+JxcHAmcbA6VPBpKbk2Pb5Zta8g08TZm+X
 XxaDwPFoR4BG00B0L4uygEuHcE89mDy0gCFog0zG7sU+LuY4FYQSSMUqwIC4i/HJ
 DJdWrVqnNHJFCS6wvBl9NO0lyuUrn2be2/IzUtZ3d0xbA0uJXfvI4WgFrbunoPEU
 zgHblQN5nkDLWujjzC10C9vmTi1xxP6FiYcrMScZZ5US0JlHaptkoPOhs82KYQvV
 0DPk06XGWnJMc27+MQYVIMDhQggi3It9pgDRhoyz9Xpgn9fmhhp0goL7KnFk9Hbr
 BKFdW4VBbU0PZacoI6Q186lTQZRptTKfREL+bHvUL2Xyb0RO6nerBPzE5Wxwb2vW
 PmHgFezXDVHbIQ==
 =1ewL
 -----END PGP SIGNATURE-----

Merge tag 'irq-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq updates from Thomas Gleixner:
 "Updates for the interrupt subsystem:

  Core:
   - Allow trimming of interrupt hierarchy to support odd hardware
     setups where only a subset of the interrupts requires the full
     hierarchy.

   - Allow the retrigger mechanism to follow a hierarchy to simplify
     driver code.

   - Provide a mechanism to force enable wakeup interrrupts on suspend.

   - More infrastructure to handle IPIs in the core code

  Architectures:
   - Convert ARM/ARM64 IPI handling to utilize the interrupt core code.

  Drivers:
   - The usual pile of new interrupt chips (MStar, Actions Owl, TI
     PRUSS, Designware ICTL)

   - ARM(64) IPI related conversions

   - Wakeup support for Qualcom PDC

   - Prevent hierarchy corruption in the NVIDIA Tegra driver

   - The usual small fixes, improvements and cleanups all over the
     place"

* tag 'irq-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (59 commits)
  dt-bindings: interrupt-controller: Add MStar interrupt controller
  irqchip/irq-mst: Add MStar interrupt controller support
  soc/tegra: pmc: Don't create fake interrupt hierarchy levels
  soc/tegra: pmc: Allow optional irq parent callbacks
  gpio: tegra186: Allow optional irq parent callbacks
  genirq/irqdomain: Allow partial trimming of irq_data hierarchy
  irqchip/qcom-pdc: Reset PDC interrupts during init
  irqchip/qcom-pdc: Set IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND flag
  pinctrl: qcom: Set IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND flag
  genirq/PM: Introduce IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND flag
  pinctrl: qcom: Use return value from irq_set_wake() call
  pinctrl: qcom: Set IRQCHIP_SET_TYPE_MASKED and IRQCHIP_MASK_ON_SUSPEND flags
  ARM: Handle no IPI being registered in show_ipi_list()
  MAINTAINERS: Add entries for Actions Semi Owl SIRQ controller
  irqchip: Add Actions Semi Owl SIRQ controller
  dt-bindings: interrupt-controller: Add Actions SIRQ controller binding
  dt-bindings: dw-apb-ictl: Update binding to describe use as primary interrupt controller
  irqchip/dw-apb-ictl: Add primary interrupt controller support
  irqchip/dw-apb-ictl: Refactor priot to introducing hierarchical irq domains
  genirq: Add stub for set_handle_irq() when !GENERIC_IRQ_MULTI_HANDLER
  ...
2020-10-12 11:34:32 -07:00
Linus Torvalds
f5f59336a9 Updates for timekeeping, timers and related drivers:
Core:
 
   - Early boot support for the NMI safe timekeeper by utilizing
     local_clock() up to the point where timekeeping is initialized. This
     allows printk() to store multiple timestamps in the ringbuffer which is
     useful for coordinating dmesg information across a fleet of machines.
 
   - Provide a multi-timestamp accessor for printk()
 
   - Make timer init more robust by checking for invalid timer flags.
 
   - Comma vs. semicolon fixes
 
  Drivers:
 
   - Support for new platforms in existing drivers (SP804 and Renesas CMT)
 
   - Comma vs. semicolon fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl+ETs4THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoY/SEACva6YyL5F+GWT3aq1JBkQm55I0BSTS
 KD6XKeT765c88wB+CGzi/huYtSlL9lUonZ+8h2x/Yd9ObYEBqKANWUpzbPFM3aMd
 5UbUHE9rIAbkAm7Ry1/GAQHVLCI/qYXZwaWDi37iHIplXwgY5jSr8AbqHsSBqM92
 e1GMrLo6dxKqVhqPmHYCiZYPNH/15KIgzzrM8Mx7/pxHZaF7rSF/sjFAQObb4UOM
 3ec9dqaKLAmQD04gHG5Y0YDttqHtii1+Gzqi9886Sv9xIvlM020J4elrKQqFnuV3
 GGXRL4Rkhr4rXCJlYYTxE+7kQ7SVQDaztnQEqQCYMi8+DlmsdZsVUU3stsIA8SoF
 T6cC94g0ngoGbtA9Eb+WDT4eIlRPO+Ah/CsMnt78DkgNkI5Vc6U4cVrsWmGUtUDC
 oi/5gJeM8gP/UIzA+N+n3NNpQjC6PaVS0wIQQt/wOpBY6v9GOrcLxwJCpMujW8XG
 th8hXxANimAnyrI4osQhiYrY1zLnmJ7QB1PuuTkb8tyipGg+xkX68qD+oi6tKW+v
 Fo+aMbxv5sadyEA/yqxKLTpnTaVG7bexqrnkFBOxzBS2l3/WLXG4rWN/xYhDWAnm
 4xc5lDOEwSGKk+saU9rs4x1TsLi02Fn++DwuGV0GIqT0qPX+jWsNpVTwE43epaDO
 Cpw7Cx+iGqsfkg==
 =h6YX
 -----END PGP SIGNATURE-----

Merge tag 'timers-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timekeeping updates from Thomas Gleixner:
 "Updates for timekeeping, timers and related drivers:

  Core:

   - Early boot support for the NMI safe timekeeper by utilizing
     local_clock() up to the point where timekeeping is initialized.
     This allows printk() to store multiple timestamps in the ringbuffer
     which is useful for coordinating dmesg information across a fleet
     of machines.

   - Provide a multi-timestamp accessor for printk()

   - Make timer init more robust by checking for invalid timer flags.

   - Comma vs semicolon fixes

  Drivers:

   - Support for new platforms in existing drivers (SP804 and Renesas
     CMT)

   - Comma vs semicolon fixes

* tag 'timers-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource/drivers/armada-370-xp: Use semicolons rather than commas to separate statements
  clocksource/drivers/mps2-timer: Use semicolons rather than commas to separate statements
  timers: Mask invalid flags in do_init_timer()
  clocksource/drivers/sp804: Enable Hisilicon sp804 timer 64bit mode
  clocksource/drivers/sp804: Add support for Hisilicon sp804 timer
  clocksource/drivers/sp804: Support non-standard register offset
  clocksource/drivers/sp804: Prepare for support non-standard register offset
  clocksource/drivers/sp804: Remove a mismatched comment
  clocksource/drivers/sp804: Delete the leading "__" of some functions
  clocksource/drivers/sp804: Remove unused sp804_timer_disable() and timer-sp804.h
  clocksource/drivers/sp804: Cleanup clk_get_sys()
  dt-bindings: timer: renesas,cmt: Document r8a774e1 CMT support
  dt-bindings: timer: renesas,cmt: Document r8a7742 CMT support
  alarmtimer: Convert comma to semicolon
  timekeeping: Provide multi-timestamp accessor to NMI safe timekeeper
  timekeeping: Utilize local_clock() for NMI safe timekeeper during early boot
2020-10-12 11:27:54 -07:00
Linus Torvalds
20d49bfcc3 A small set of updates for debug objects:
- Make all debug object descriptors constant. There is no reason to have
    them writeable.
 
  - Free the per CPU object pool after CPU unplug to avoid memory waste.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl+EMM8THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYobvLD/95P1eqk5Hku2uHeHowj9t4PVqk68Gq
 71rJTj/x32JnGVsNFfwoqm0u8CueAitT1Id5aIjiMAGK2eH83StwdfqFjdHsitwl
 4zH6qywjjgq86jOSXXUQ2TSyYjlL7l2Pr8jwEZkTXUrShZyHAOvD9oW3KwlyV88H
 QXiTGFAm4aAaFctgGlbVafxtp1IJkS+FzFT1/cyPV064d3uGjX6maKwKjssO/8Bv
 WLzPuyOKh7IoEMfyFEXw3Ks3GK3Fo9Rkm4+CNLxjWy7DX7tY2Xvu4kuPqQ0dyn31
 zzxJHOESLOjtw8w0vSEKpuNyIfL/66/MF8p4CXzkUWQTa9h26yos+mDnSzxth++d
 ZERCpx7udTG3BbteJ8ZEf3/QyH6kJw9RccDET180/CvhBTHELEsroy9qOfDMJkgt
 RODtwh+2+O0zqUGjbqEd+PTmkU5p0UwIWAI9t7pic5Dntse7stRwotDtI0FGNnF1
 aj/4ZkTb+lq3EF/x9xh6Hw/SfcVGtmySeMaPlaj9fgq39dtTPIspYeI5GieZlE0s
 1ZfoRSgWA+/K9+4iTsuGvTwXnLhsRYVZl+GIkzeG2FL3euK0t+vaIvIbE9IDHdsS
 mgEFaUXwIosHOfn+8f9KWl6YupBtT/4PyPVu5DseREl4up0W6s26aamM1Ml16A8x
 JUNS/oa+cGpI4A==
 =BP3+
 -----END PGP SIGNATURE-----

Merge tag 'core-debugobjects-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull debugobjects updates from Thomas Gleixner:
 "A small set of updates for debug objects:

   - Make all debug object descriptors constant. There is no reason to
     have them writeable.

   - Free the per CPU object pool after CPU unplug to avoid memory
     waste"

* tag 'core-debugobjects-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  debugobjects: Free per CPU pool after CPU unplug
  treewide: Make all debug_obj_descriptors const
  debugobjects: Allow debug_obj_descr to be const
2020-10-12 11:21:24 -07:00
Linus Torvalds
f94ab23113 * Misc minor cleanups.
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl+ENW0ACgkQEsHwGGHe
 VUrWjw/+O0S/Bf7RQ2OIDnaHGo5u9k+T+FiklYtTO4klYqtfNEt/DFWVOIThVXBQ
 ma4I8Hspj+zUzlq2kqSeqJ2PiikTxRNDqkCUwZhqEFgbXS6/pt8VXXdPniKjeXge
 ZE4lcD1RIyDFxzVlKvVaYt1KryZZVVSRqRIChejLrujN23fI6riWfa0W4Bq54J6m
 fdiujuDJQ9oroak36dF5Ah6g4g8gL8hBLU9Oyzla9V+1O3GSZuDlwTgDsxZZkmC5
 LN4spxwd9tOXOmWhbH7vFfRtQL79KUHkHbUuUvZzZsJ/zs85bxhMa+fUAfjWAEja
 brMpD1GZKOcjUM7xzQ9HngMcKD8lWmlsTBTAO9drD89Z949ntjIA4uCY3d3RTJ1q
 NoYCV8Xw+8Q8e+zjnMW0tph39LCUEeuccT7t09XP5IF5UEXi5T5S14WoCu5Shnt9
 VTQ44NrAxpP7ZNWMpBTaxmr3aXABbdgnvDIxqrohqgQnCnPkWlBJ9FdKj8sQ3y9B
 K010ihIb1pWnmTyKGIC3GOWNjwtCpqz9z3gya76tI7EzAejVS6yUqwMohjaWq6JZ
 Tz/TtTSTUyczKiCCqoOf7P+5LKrhxjWS8IVBeMqMTeN7osCCIT69U+cox1Ih3DST
 pBfy7R3+FXKLHVi/iQv8E+fl3//pTGppKv4MM/wab0E6L+KhqEo=
 =NYxb
 -----END PGP SIGNATURE-----

Merge tag 'x86_cleanups_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cleanups from Borislav Petkov:
 "Misc minor cleanups"

* tag 'x86_cleanups_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/entry: Fix typo in comments for syscall_enter_from_user_mode()
  x86/resctrl: Fix spelling in user-visible warning messages
  x86/entry/64: Do not include inst.h in calling.h
  x86/mpparse: Remove duplicate io_apic.h include
2020-10-12 10:51:02 -07:00
Linus Torvalds
6734e20e39 arm64 updates for 5.10
- Userspace support for the Memory Tagging Extension introduced by Armv8.5.
   Kernel support (via KASAN) is likely to follow in 5.11.
 
 - Selftests for MTE, Pointer Authentication and FPSIMD/SVE context
   switching.
 
 - Fix and subsequent rewrite of our Spectre mitigations, including the
   addition of support for PR_SPEC_DISABLE_NOEXEC.
 
 - Support for the Armv8.3 Pointer Authentication enhancements.
 
 - Support for ASID pinning, which is required when sharing page-tables with
   the SMMU.
 
 - MM updates, including treating flush_tlb_fix_spurious_fault() as a no-op.
 
 - Perf/PMU driver updates, including addition of the ARM CMN PMU driver and
   also support to handle CPU PMU IRQs as NMIs.
 
 - Allow prefetchable PCI BARs to be exposed to userspace using normal
   non-cacheable mappings.
 
 - Implementation of ARCH_STACKWALK for unwinding.
 
 - Improve reporting of unexpected kernel traps due to BPF JIT failure.
 
 - Improve robustness of user-visible HWCAP strings and their corresponding
   numerical constants.
 
 - Removal of TEXT_OFFSET.
 
 - Removal of some unused functions, parameters and prototypes.
 
 - Removal of MPIDR-based topology detection in favour of firmware
   description.
 
 - Cleanups to handling of SVE and FPSIMD register state in preparation
   for potential future optimisation of handling across syscalls.
 
 - Cleanups to the SDEI driver in preparation for support in KVM.
 
 - Miscellaneous cleanups and refactoring work.
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAl+AUXMQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNFc1B/4q2Kabe+pPu7s1f58Q+OTaEfqcr3F1qh27
 F1YpFZUYxg0GPfPsFrnbJpo5WKo7wdR9ceI9yF/GHjs7A/MSoQJis3pG6SlAd9c0
 nMU5tCwhg9wfq6asJtl0/IPWem6cqqhdzC6m808DjeHuyi2CCJTt0vFWH3OeHEhG
 cfmLfaSNXOXa/MjEkT8y1AXJ/8IpIpzkJeCRA1G5s18PXV9Kl5bafIo9iqyfKPLP
 0rJljBmoWbzuCSMc81HmGUQI4+8KRp6HHhyZC/k0WEVgj3LiumT7am02bdjZlTnK
 BeNDKQsv2Jk8pXP2SlrI3hIUTz0bM6I567FzJEokepvTUzZ+CVBi
 =9J8H
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:
 "There's quite a lot of code here, but much of it is due to the
  addition of a new PMU driver as well as some arm64-specific selftests
  which is an area where we've traditionally been lagging a bit.

  In terms of exciting features, this includes support for the Memory
  Tagging Extension which narrowly missed 5.9, hopefully allowing
  userspace to run with use-after-free detection in production on CPUs
  that support it. Work is ongoing to integrate the feature with KASAN
  for 5.11.

  Another change that I'm excited about (assuming they get the hardware
  right) is preparing the ASID allocator for sharing the CPU page-table
  with the SMMU. Those changes will also come in via Joerg with the
  IOMMU pull.

  We do stray outside of our usual directories in a few places, mostly
  due to core changes required by MTE. Although much of this has been
  Acked, there were a couple of places where we unfortunately didn't get
  any review feedback.

  Other than that, we ran into a handful of minor conflicts in -next,
  but nothing that should post any issues.

  Summary:

   - Userspace support for the Memory Tagging Extension introduced by
     Armv8.5. Kernel support (via KASAN) is likely to follow in 5.11.

   - Selftests for MTE, Pointer Authentication and FPSIMD/SVE context
     switching.

   - Fix and subsequent rewrite of our Spectre mitigations, including
     the addition of support for PR_SPEC_DISABLE_NOEXEC.

   - Support for the Armv8.3 Pointer Authentication enhancements.

   - Support for ASID pinning, which is required when sharing
     page-tables with the SMMU.

   - MM updates, including treating flush_tlb_fix_spurious_fault() as a
     no-op.

   - Perf/PMU driver updates, including addition of the ARM CMN PMU
     driver and also support to handle CPU PMU IRQs as NMIs.

   - Allow prefetchable PCI BARs to be exposed to userspace using normal
     non-cacheable mappings.

   - Implementation of ARCH_STACKWALK for unwinding.

   - Improve reporting of unexpected kernel traps due to BPF JIT
     failure.

   - Improve robustness of user-visible HWCAP strings and their
     corresponding numerical constants.

   - Removal of TEXT_OFFSET.

   - Removal of some unused functions, parameters and prototypes.

   - Removal of MPIDR-based topology detection in favour of firmware
     description.

   - Cleanups to handling of SVE and FPSIMD register state in
     preparation for potential future optimisation of handling across
     syscalls.

   - Cleanups to the SDEI driver in preparation for support in KVM.

   - Miscellaneous cleanups and refactoring work"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (148 commits)
  Revert "arm64: initialize per-cpu offsets earlier"
  arm64: random: Remove no longer needed prototypes
  arm64: initialize per-cpu offsets earlier
  kselftest/arm64: Check mte tagged user address in kernel
  kselftest/arm64: Verify KSM page merge for MTE pages
  kselftest/arm64: Verify all different mmap MTE options
  kselftest/arm64: Check forked child mte memory accessibility
  kselftest/arm64: Verify mte tag inclusion via prctl
  kselftest/arm64: Add utilities and a test to validate mte memory
  perf: arm-cmn: Fix conversion specifiers for node type
  perf: arm-cmn: Fix unsigned comparison to less than zero
  arm64: dbm: Invalidate local TLB when setting TCR_EL1.HD
  arm64: mm: Make flush_tlb_fix_spurious_fault() a no-op
  arm64: Add support for PR_SPEC_DISABLE_NOEXEC prctl() option
  arm64: Pull in task_stack_page() to Spectre-v4 mitigation code
  KVM: arm64: Allow patching EL2 vectors even with KASLR is not enabled
  arm64: Get rid of arm64_ssbd_state
  KVM: arm64: Convert ARCH_WORKAROUND_2 to arm64_get_spectre_v4_state()
  KVM: arm64: Get rid of kvm_arm_have_ssbd()
  KVM: arm64: Simplify handling of ARCH_WORKAROUND_2
  ...
2020-10-12 10:00:51 -07:00
Daniel Jordan
fdf09ab887 module: statically initialize init section freeing data
Corentin hit the following workqueue warning when running with
CRYPTO_MANAGER_EXTRA_TESTS:

  WARNING: CPU: 2 PID: 147 at kernel/workqueue.c:1473 __queue_work+0x3b8/0x3d0
  Modules linked in: ghash_generic
  CPU: 2 PID: 147 Comm: modprobe Not tainted
      5.6.0-rc1-next-20200214-00068-g166c9264f0b1-dirty #545
  Hardware name: Pine H64 model A (DT)
  pc : __queue_work+0x3b8/0x3d0
  Call trace:
   __queue_work+0x3b8/0x3d0
   queue_work_on+0x6c/0x90
   do_init_module+0x188/0x1f0
   load_module+0x1d00/0x22b0

I wasn't able to reproduce on x86 or rpi 3b+.

This is

  WARN_ON(!list_empty(&work->entry))

from __queue_work(), and it happens because the init_free_wq work item
isn't initialized in time for a crypto test that requests the gcm
module.  Some crypto tests were recently moved earlier in boot as
explained in commit c4741b2305 ("crypto: run initcalls for generic
implementations earlier"), which went into mainline less than two weeks
before the Fixes commit.

Avoid the warning by statically initializing init_free_wq and the
corresponding llist.

Link: https://lore.kernel.org/lkml/20200217204803.GA13479@Red/
Fixes: 1a7b7d9220 ("modules: Use vmalloc special flag")
Reported-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Tested-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Tested-on: sun50i-h6-pine-h64
Tested-on: imx8mn-ddr4-evk
Tested-on: sun50i-a64-bananapi-m64
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
2020-10-12 18:27:00 +02:00
Jiri Olsa
f91072ed1b perf/core: Fix race in the perf_mmap_close() function
There's a possible race in perf_mmap_close() when checking ring buffer's
mmap_count refcount value. The problem is that the mmap_count check is
not atomic because we call atomic_dec() and atomic_read() separately.

  perf_mmap_close:
  ...
   atomic_dec(&rb->mmap_count);
   ...
   if (atomic_read(&rb->mmap_count))
      goto out_put;

   <ring buffer detach>
   free_uid

out_put:
  ring_buffer_put(rb); /* could be last */

The race can happen when we have two (or more) events sharing same ring
buffer and they go through atomic_dec() and then they both see 0 as refcount
value later in atomic_read(). Then both will go on and execute code which
is meant to be run just once.

The code that detaches ring buffer is probably fine to be executed more
than once, but the problem is in calling free_uid(), which will later on
demonstrate in related crashes and refcount warnings, like:

  refcount_t: addition on 0; use-after-free.
  ...
  RIP: 0010:refcount_warn_saturate+0x6d/0xf
  ...
  Call Trace:
  prepare_creds+0x190/0x1e0
  copy_creds+0x35/0x172
  copy_process+0x471/0x1a80
  _do_fork+0x83/0x3a0
  __do_sys_wait4+0x83/0x90
  __do_sys_clone+0x85/0xa0
  do_syscall_64+0x5b/0x1e0
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Using atomic decrease and check instead of separated calls.

Tested-by: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Wade Mealing <wmealing@redhat.com>
Fixes: 9bb5d40cd9 ("perf: Fix mmap() accounting hole");
Link: https://lore.kernel.org/r/20200916115311.GE2301783@krava
2020-10-12 13:24:26 +02:00