Commit Graph

1138 Commits

Author SHA1 Message Date
Nathan Chancellor d5750cd3c5 kbuild: Disable CONFIG_LD_ORPHAN_WARN for ld.lld 10.0.1
ld.lld 10.0.1 spews a bunch of various warnings about .rela sections,
along with a few others. Newer versions of ld.lld do not have these
warnings. As a result, do not add '--orphan-handling=warn' to
LDFLAGS_vmlinux if ld.lld's version is not new enough.

Link: https://github.com/ClangBuiltLinux/linux/issues/1187
Link: https://github.com/ClangBuiltLinux/linux/issues/1193
Reported-by: Arvind Sankar <nivedita@alum.mit.edu>
Reported-by: kernelci.org bot <bot@kernelci.org>
Reported-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-12-01 22:46:06 +09:00
Nathan Chancellor 59612b24f7 kbuild: Hoist '--orphan-handling' into Kconfig
Currently, '--orphan-handling=warn' is spread out across four different
architectures in their respective Makefiles, which makes it a little
unruly to deal with in case it needs to be disabled for a specific
linker version (in this case, ld.lld 10.0.1).

To make it easier to control this, hoist this warning into Kconfig and
the main Makefile so that disabling it is simpler, as the warning will
only be enabled in a couple places (main Makefile and a couple of
compressed boot folders that blow away LDFLAGS_vmlinx) and making it
conditional is easier due to Kconfig syntax. One small additional
benefit of this is saving a call to ld-option on incremental builds
because we will have already evaluated it for CONFIG_LD_ORPHAN_WARN.

To keep the list of supported architectures the same, introduce
CONFIG_ARCH_WANT_LD_ORPHAN_WARN, which an architecture can select to
gain this automatically after all of the sections are specified and size
asserted. A special thanks to Kees Cook for the help text on this
config.

Link: https://github.com/ClangBuiltLinux/linux/issues/1187
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-12-01 22:45:36 +09:00
Heiko Carstens 334ef6ed06 init/Kconfig: make COMPILE_TEST depend on !S390
While allmodconfig and allyesconfig build for s390 there are also
various bots running compile tests with randconfig, where PCI is
disabled. This reveals that a lot of drivers should actually depend on
HAS_IOMEM.
Adding this to each device driver would be a never ending story,
therefore just disable COMPILE_TEST for s390.

The reasoning is more or less the same as described in
commit bc083a64b6 ("init/Kconfig: make COMPILE_TEST depend on !UML").

Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-11-20 19:19:12 +01:00
Paul Menzel 0f7636e165 init/Kconfig: Fix CPU number in LOG_CPU_MAX_BUF_SHIFT description
Currently, LOG_BUF_SHIFT defaults to 17, which is 2 ^ 17 bytes = 128 KB,
and LOG_CPU_MAX_BUF_SHIFT defaults to 12, which is 2 ^ 12 bytes = 4 KB.

Half of 128 KB is 64 KB, so more than 16 CPUs are required for the value
to be used, as then the sum of contributions is greater than 64 KB for
the first time. My guess is, that the description was written with the
configuration values used in the SUSE in mind.

Fixes: 23b2899f7f ("printk: allow increasing the ring buffer depending on the number of CPUs")
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20200811092924.6256-1-pmenzel@molgen.mpg.de
2020-11-03 09:34:40 +01:00
Linus Torvalds 9ff9b0d392 networking changes for the 5.10 merge window
Add redirect_neigh() BPF packet redirect helper, allowing to limit stack
 traversal in common container configs and improving TCP back-pressure.
 Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.
 
 Expand netlink policy support and improve policy export to user space.
 (Ge)netlink core performs request validation according to declared
 policies. Expand the expressiveness of those policies (min/max length
 and bitmasks). Allow dumping policies for particular commands.
 This is used for feature discovery by user space (instead of kernel
 version parsing or trial and error).
 
 Support IGMPv3/MLDv2 multicast listener discovery protocols in bridge.
 
 Allow more than 255 IPv4 multicast interfaces.
 
 Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
 packets of TCPv6.
 
 In Multi-patch TCP (MPTCP) support concurrent transmission of data
 on multiple subflows in a load balancing scenario. Enhance advertising
 addresses via the RM_ADDR/ADD_ADDR options.
 
 Support SMC-Dv2 version of SMC, which enables multi-subnet deployments.
 
 Allow more calls to same peer in RxRPC.
 
 Support two new Controller Area Network (CAN) protocols -
 CAN-FD and ISO 15765-2:2016.
 
 Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
 kernel problem.
 
 Add TC actions for implementing MPLS L2 VPNs.
 
 Improve nexthop code - e.g. handle various corner cases when nexthop
 objects are removed from groups better, skip unnecessary notifications
 and make it easier to offload nexthops into HW by converting
 to a blocking notifier.
 
 Support adding and consuming TCP header options by BPF programs,
 opening the doors for easy experimental and deployment-specific
 TCP option use.
 
 Reorganize TCP congestion control (CC) initialization to simplify life
 of TCP CC implemented in BPF.
 
 Add support for shipping BPF programs with the kernel and loading them
 early on boot via the User Mode Driver mechanism, hence reusing all the
 user space infra we have.
 
 Support sleepable BPF programs, initially targeting LSM and tracing.
 
 Add bpf_d_path() helper for returning full path for given 'struct path'.
 
 Make bpf_tail_call compatible with bpf-to-bpf calls.
 
 Allow BPF programs to call map_update_elem on sockmaps.
 
 Add BPF Type Format (BTF) support for type and enum discovery, as
 well as support for using BTF within the kernel itself (current use
 is for pretty printing structures).
 
 Support listing and getting information about bpf_links via the bpf
 syscall.
 
 Enhance kernel interfaces around NIC firmware update. Allow specifying
 overwrite mask to control if settings etc. are reset during update;
 report expected max time operation may take to users; support firmware
 activation without machine reboot incl. limits of how much impact
 reset may have (e.g. dropping link or not).
 
 Extend ethtool configuration interface to report IEEE-standard
 counters, to limit the need for per-vendor logic in user space.
 
 Adopt or extend devlink use for debug, monitoring, fw update
 in many drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw,
 mv88e6xxx, dpaa2-eth).
 
 In mlxsw expose critical and emergency SFP module temperature alarms.
 Refactor port buffer handling to make the defaults more suitable and
 support setting these values explicitly via the DCBNL interface.
 
 Add XDP support for Intel's igb driver.
 
 Support offloading TC flower classification and filtering rules to
 mscc_ocelot switches.
 
 Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
 fixed interval period pulse generator and one-step timestamping in
 dpaa-eth.
 
 Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
 offload.
 
 Add Lynx PHY/PCS MDIO module, and convert various drivers which have
 this HW to use it. Convert mvpp2 to split PCS.
 
 Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
 7-port Mediatek MT7531 IP.
 
 Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
 and wcn3680 support in wcn36xx.
 
 Improve performance for packets which don't require much offloads
 on recent Mellanox NICs by 20% by making multiple packets share
 a descriptor entry.
 
 Move chelsio inline crypto drivers (for TLS and IPsec) from the crypto
 subtree to drivers/net. Move MDIO drivers out of the phy directory.
 
 Clean up a lot of W=1 warnings, reportedly the actively developed
 subsections of networking drivers should now build W=1 warning free.
 
 Make sure drivers don't use in_interrupt() to dynamically adapt their
 code. Convert tasklets to use new tasklet_setup API (sadly this
 conversion is not yet complete).
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAl+ItRwACgkQMUZtbf5S
 IrtTMg//UxpdR/MirT1DatBU0K/UGAZY82hV7F/UC8tPgjfHZeHvWlDFxfi3YP81
 PtPKbhRZ7DhwBXefUp6nY3UdvjftrJK2lJm8prJUPSsZRye8Wlcb7y65q7/P2y2U
 Efucyopg6RUrmrM0DUsIGYGJgylQLHnMYUl/keCsD4t5Bp4ksyi9R2t5eitGoWzh
 r3QGdbSa0AuWx4iu0i+tqp6Tj0ekMBMXLVb35dtU1t0joj2KTNEnSgABN3prOa8E
 iWYf2erOau68Ogp3yU3miCy0ZU4p/7qGHTtzbcp677692P/ekak6+zmfHLT9/Pjy
 2Stq2z6GoKuVxdktr91D9pA3jxG4LxSJmr0TImcGnXbvkMP3Ez3g9RrpV5fn8j6F
 mZCH8TKZAoD5aJrAJAMkhZmLYE1pvDa7KolSk8WogXrbCnTEb5Nv8FHTS1Qnk3yl
 wSKXuvutFVNLMEHCnWQLtODbTST9DI/aOi6EctPpuOA/ZyL1v3pl+gfp37S+LUTe
 owMnT/7TdvKaTD0+gIyU53M6rAWTtr5YyRQorX9awIu/4Ha0F0gYD7BJZQUGtegp
 HzKt59NiSrFdbSH7UdyemdBF4LuCgIhS7rgfeoUXMXmuPHq7eHXyHZt5dzPPa/xP
 81P0MAvdpFVwg8ij2yp2sHS7sISIRKq17fd1tIewUabxQbjXqPc=
 =bc1U
 -----END PGP SIGNATURE-----

Merge tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:

 - Add redirect_neigh() BPF packet redirect helper, allowing to limit
   stack traversal in common container configs and improving TCP
   back-pressure.

   Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.

 - Expand netlink policy support and improve policy export to user
   space. (Ge)netlink core performs request validation according to
   declared policies. Expand the expressiveness of those policies
   (min/max length and bitmasks). Allow dumping policies for particular
   commands. This is used for feature discovery by user space (instead
   of kernel version parsing or trial and error).

 - Support IGMPv3/MLDv2 multicast listener discovery protocols in
   bridge.

 - Allow more than 255 IPv4 multicast interfaces.

 - Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
   packets of TCPv6.

 - In Multi-patch TCP (MPTCP) support concurrent transmission of data on
   multiple subflows in a load balancing scenario. Enhance advertising
   addresses via the RM_ADDR/ADD_ADDR options.

 - Support SMC-Dv2 version of SMC, which enables multi-subnet
   deployments.

 - Allow more calls to same peer in RxRPC.

 - Support two new Controller Area Network (CAN) protocols - CAN-FD and
   ISO 15765-2:2016.

 - Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
   kernel problem.

 - Add TC actions for implementing MPLS L2 VPNs.

 - Improve nexthop code - e.g. handle various corner cases when nexthop
   objects are removed from groups better, skip unnecessary
   notifications and make it easier to offload nexthops into HW by
   converting to a blocking notifier.

 - Support adding and consuming TCP header options by BPF programs,
   opening the doors for easy experimental and deployment-specific TCP
   option use.

 - Reorganize TCP congestion control (CC) initialization to simplify
   life of TCP CC implemented in BPF.

 - Add support for shipping BPF programs with the kernel and loading
   them early on boot via the User Mode Driver mechanism, hence reusing
   all the user space infra we have.

 - Support sleepable BPF programs, initially targeting LSM and tracing.

 - Add bpf_d_path() helper for returning full path for given 'struct
   path'.

 - Make bpf_tail_call compatible with bpf-to-bpf calls.

 - Allow BPF programs to call map_update_elem on sockmaps.

 - Add BPF Type Format (BTF) support for type and enum discovery, as
   well as support for using BTF within the kernel itself (current use
   is for pretty printing structures).

 - Support listing and getting information about bpf_links via the bpf
   syscall.

 - Enhance kernel interfaces around NIC firmware update. Allow
   specifying overwrite mask to control if settings etc. are reset
   during update; report expected max time operation may take to users;
   support firmware activation without machine reboot incl. limits of
   how much impact reset may have (e.g. dropping link or not).

 - Extend ethtool configuration interface to report IEEE-standard
   counters, to limit the need for per-vendor logic in user space.

 - Adopt or extend devlink use for debug, monitoring, fw update in many
   drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx,
   dpaa2-eth).

 - In mlxsw expose critical and emergency SFP module temperature alarms.
   Refactor port buffer handling to make the defaults more suitable and
   support setting these values explicitly via the DCBNL interface.

 - Add XDP support for Intel's igb driver.

 - Support offloading TC flower classification and filtering rules to
   mscc_ocelot switches.

 - Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
   fixed interval period pulse generator and one-step timestamping in
   dpaa-eth.

 - Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
   offload.

 - Add Lynx PHY/PCS MDIO module, and convert various drivers which have
   this HW to use it. Convert mvpp2 to split PCS.

 - Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
   7-port Mediatek MT7531 IP.

 - Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
   and wcn3680 support in wcn36xx.

 - Improve performance for packets which don't require much offloads on
   recent Mellanox NICs by 20% by making multiple packets share a
   descriptor entry.

 - Move chelsio inline crypto drivers (for TLS and IPsec) from the
   crypto subtree to drivers/net. Move MDIO drivers out of the phy
   directory.

 - Clean up a lot of W=1 warnings, reportedly the actively developed
   subsections of networking drivers should now build W=1 warning free.

 - Make sure drivers don't use in_interrupt() to dynamically adapt their
   code. Convert tasklets to use new tasklet_setup API (sadly this
   conversion is not yet complete).

* tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits)
  Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH"
  net, sockmap: Don't call bpf_prog_put() on NULL pointer
  bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo
  bpf, sockmap: Add locking annotations to iterator
  netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements
  net: fix pos incrementment in ipv6_route_seq_next
  net/smc: fix invalid return code in smcd_new_buf_create()
  net/smc: fix valid DMBE buffer sizes
  net/smc: fix use-after-free of delayed events
  bpfilter: Fix build error with CONFIG_BPFILTER_UMH
  cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr
  net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info
  bpf: Fix register equivalence tracking.
  rxrpc: Fix loss of final ack on shutdown
  rxrpc: Fix bundle counting for exclusive connections
  netfilter: restore NF_INET_NUMHOOKS
  ibmveth: Identify ingress large send packets.
  ibmveth: Switch order of ibmveth_helper calls.
  cxgb4: handle 4-tuple PEDIT to NAT mode translation
  selftests: Add VRF route leaking tests
  ...
2020-10-15 18:42:13 -07:00
Linus Torvalds d594d8f411 printk changes for 5.10
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAl+EN+oACgkQUqAMR0iA
 lPK/gA//WXBjC4FSPNr0j7kPFKQhADS3cUcp+GfuI4rYkYcJHV0yJn1kvctg1rUC
 Je+Hc+Hy5Nk93lwejj5BvQoc31zOeoPDyMje5zi5te4H2NQkaoGXHOMvUnaLcNeo
 g+HJvx+NU9MDjuc5amtK8YD69jzErD+eqrHpQOg4UToMXXcBXLafTThIi9vT1fzP
 9uwWBRlpdQyY7tYbbwFiDuu33PyoWlc6Ksp8qKdLBLz2AmGd1Rvaq+ePsq8b9tHJ
 pfv1agW0GTpzoN2pm5gFXOoYniHB/ooB1L0QLq7ylaociEyb8WbTtkn4v++EjxW8
 aGsO1WdO0MQeIWDxXQR5DYD3s+Me2DMhFPDqUc2/s0q2SGWUPFcsmCsvMAOx/clA
 HDfTWkyzB4FarZOTv0gZ7jYNOVukFzUQ1IBTtWpJifC9fT0xrRkKmKE1UgmWv0ei
 Hx5VFQyQGsDh3sUcRLhW91p4sqJCs7l01zw1A/0rb7a+QTHAqZRtbz5hyTjlViiT
 57XiyXynXW8N4Q5U6uAxCbkFFi+nP/XVQ5ggZ/QLn/4hfWWUcu0vt2bOGkRwryAT
 zYmDqViraEVWKIom74UzZ0nrIBtdhvtbFQIYuyiCQKpKMwytWXUQbUASZL2mfBZi
 h5eJx7etV6f5to5mNRsj8bbN5buX9UheEd0QFD9NJdS6aadqTac=
 =9vEl
 -----END PGP SIGNATURE-----

Merge tag 'printk-for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull printk updates from Petr Mladek:
 "The big new thing is the fully lockless ringbuffer implementation,
  including the support for continuous lines. It will allow to store and
  read messages in any situation wihtout the risk of deadlocks and
  without the need of temporary per-CPU buffers.

  The access is still serialized by logbuf_lock. It synchronizes few
  more operations, for example, temporary buffer for formatting the
  message, syslog and kmsg_dump operations. The lock removal is being
  discussed and should be ready for the next release.

  The continuous lines are handled exactly the same way as before to
  avoid regressions in user space. It means that they are appended to
  the last message when the caller is the same. Only the last message
  can be extended.

  The data ring includes plain text of the messages. Except for an
  integer at the beginning of each message that points back to the
  descriptor ring with other metadata.

  The dictionary has to stay. journalctl uses it to filter the log. It
  allows to show messages related to a given device. The dictionary
  values are stored in the descriptor ring with the other metadata.

  This is the first part of the printk rework as discussed at Plumbers
  2019, see https://lore.kernel.org/r/87k1acz5rx.fsf@linutronix.de. The
  next big step will be handling consoles by kthreads during the normal
  system operation. It will require special handling of situations when
  the kthreads could not get scheduled, for example, early boot,
  suspend, panic.

  Other changes:

   - Add John Ogness as a reviewer for printk subsystem. He is author of
     the rework and is familiar with the code and history.

   - Fix locking in serial8250_do_startup() to prevent lockdep report.

   - Few code cleanups"

* tag 'printk-for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: (27 commits)
  printk: Use fallthrough pseudo-keyword
  printk: reduce setup_text_buf size to LOG_LINE_MAX
  printk: avoid and/or handle record truncation
  printk: remove dict ring
  printk: move dictionary keys to dev_printk_info
  printk: move printk_info into separate array
  printk: reimplement log_cont using record extension
  printk: ringbuffer: add finalization/extension support
  printk: ringbuffer: change representation of states
  printk: ringbuffer: clear initial reserved fields
  printk: ringbuffer: add BLK_DATALESS() macro
  printk: ringbuffer: relocate get_data()
  printk: ringbuffer: avoid memcpy() on state_var
  printk: ringbuffer: fix setting state in desc_read()
  kernel.h: Move oops_in_progress to printk.h
  scripts/gdb: update for lockless printk ringbuffer
  scripts/gdb: add utils.read_ulong()
  docs: vmcoreinfo: add lockless printk ringbuffer vmcoreinfo
  printk: reduce LOG_BUF_SHIFT range for H8300
  printk: ringbuffer: support dataless records
  ...
2020-10-13 15:58:10 -07:00
Petr Mladek 70333f4ff9 Merge branch 'printk-rework' into for-linus 2020-10-12 13:01:37 +02:00
Stephen Kitt dd19d2938f Fix references to nommu-mmap.rst
nommu-mmap.rst was moved to Documentation/admin-guide/mm; this patch
updates the remaining stale references to Documentation/mm.

Fixes: 800c02f5d0 ("docs: move nommu-mmap.txt to admin-guide and rename to ReST")
Signed-off-by: Stephen Kitt <steve@sk2.org>
Link: https://lore.kernel.org/r/20200812092230.27541-1-steve@sk2.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-09-24 11:03:40 -06:00
John Ogness 550c10d28d printk: reduce LOG_BUF_SHIFT range for H8300
The .bss section for the h8300 is relatively small. A value of
CONFIG_LOG_BUF_SHIFT that is larger than 19 will create a static
printk ringbuffer that is too large. Limit the range appropriately
for the H8300.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20200812073122.25412-1-john.ogness@linutronix.de
2020-09-08 09:33:15 +02:00
Alexei Starovoitov 1e6c62a882 bpf: Introduce sleepable BPF programs
Introduce sleepable BPF programs that can request such property for themselves
via BPF_F_SLEEPABLE flag at program load time. In such case they will be able
to use helpers like bpf_copy_from_user() that might sleep. At present only
fentry/fexit/fmod_ret and lsm programs can request to be sleepable and only
when they are attached to kernel functions that are known to allow sleeping.

The non-sleepable programs are relying on implicit rcu_read_lock() and
migrate_disable() to protect life time of programs, maps that they use and
per-cpu kernel structures used to pass info between bpf programs and the
kernel. The sleepable programs cannot be enclosed into rcu_read_lock().
migrate_disable() maps to preempt_disable() in non-RT kernels, so the progs
should not be enclosed in migrate_disable() as well. Therefore
rcu_read_lock_trace is used to protect the life time of sleepable progs.

There are many networking and tracing program types. In many cases the
'struct bpf_prog *' pointer itself is rcu protected within some other kernel
data structure and the kernel code is using rcu_dereference() to load that
program pointer and call BPF_PROG_RUN() on it. All these cases are not touched.
Instead sleepable bpf programs are allowed with bpf trampoline only. The
program pointers are hard-coded into generated assembly of bpf trampoline and
synchronize_rcu_tasks_trace() is used to protect the life time of the program.
The same trampoline can hold both sleepable and non-sleepable progs.

When rcu_read_lock_trace is held it means that some sleepable bpf program is
running from bpf trampoline. Those programs can use bpf arrays and preallocated
hash/lru maps. These map types are waiting on programs to complete via
synchronize_rcu_tasks_trace();

Updates to trampoline now has to do synchronize_rcu_tasks_trace() and
synchronize_rcu_tasks() to wait for sleepable progs to finish and for
trampoline assembly to finish.

This is the first step of introducing sleepable progs. Eventually dynamically
allocated hash maps can be allowed and networking program types can become
sleepable too.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: KP Singh <kpsingh@google.com>
Link: https://lore.kernel.org/bpf/20200827220114.69225-3-alexei.starovoitov@gmail.com
2020-08-28 21:20:33 +02:00
Alexei Starovoitov d71fa5c976 bpf: Add kernel module with user mode driver that populates bpffs.
Add kernel module with user mode driver that populates bpffs with
BPF iterators.

$ mount bpffs /my/bpffs/ -t bpf
$ ls -la /my/bpffs/
total 4
drwxrwxrwt  2 root root    0 Jul  2 00:27 .
drwxr-xr-x 19 root root 4096 Jul  2 00:09 ..
-rw-------  1 root root    0 Jul  2 00:27 maps.debug
-rw-------  1 root root    0 Jul  2 00:27 progs.debug

The user mode driver will load BPF Type Formats, create BPF maps, populate BPF
maps, load two BPF programs, attach them to BPF iterators, and finally send two
bpf_link IDs back to the kernel.
The kernel will pin two bpf_links into newly mounted bpffs instance under
names "progs.debug" and "maps.debug". These two files become human readable.

$ cat /my/bpffs/progs.debug
  id name            attached
  11 dump_bpf_map    bpf_iter_bpf_map
  12 dump_bpf_prog   bpf_iter_bpf_prog
  27 test_pkt_access
  32 test_main       test_pkt_access test_pkt_access
  33 test_subprog1   test_pkt_access_subprog1 test_pkt_access
  34 test_subprog2   test_pkt_access_subprog2 test_pkt_access
  35 test_subprog3   test_pkt_access_subprog3 test_pkt_access
  36 new_get_skb_len get_skb_len test_pkt_access
  37 new_get_skb_ifindex get_skb_ifindex test_pkt_access
  38 new_get_constant get_constant test_pkt_access

The BPF program dump_bpf_prog() in iterators.bpf.c is printing this data about
all BPF programs currently loaded in the system. This information is unstable
and will change from kernel to kernel as ".debug" suffix conveys.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200819042759.51280-4-alexei.starovoitov@gmail.com
2020-08-20 16:02:36 +02:00
Kees Cook 3404be67bf mm/slab: expand CONFIG_SLAB_FREELIST_HARDENED to include SLAB
Patch series "mm: Expand CONFIG_SLAB_FREELIST_HARDENED to include SLAB"

In reviewing Vlastimil Babka's latest slub debug series, I realized[1]
that several checks under CONFIG_SLAB_FREELIST_HARDENED weren't being
applied to SLAB.  Fix this by expanding the Kconfig coverage, and adding a
simple double-free test for SLAB.

This patch (of 2):

Include SLAB caches when performing kmem_cache pointer verification.  A
defense against such corruption[1] should be applied to all the
allocators.  With this added, the "SLAB_FREE_CROSS" and "SLAB_FREE_PAGE"
LKDTM tests now pass on SLAB:

  lkdtm: Performing direct entry SLAB_FREE_CROSS
  lkdtm: Attempting cross-cache slab free ...
  ------------[ cut here ]------------
  cache_from_obj: Wrong slab cache. lkdtm-heap-b but object is from lkdtm-heap-a
  WARNING: CPU: 2 PID: 2195 at mm/slab.h:530 kmem_cache_free+0x8d/0x1d0
  ...
  lkdtm: Performing direct entry SLAB_FREE_PAGE
  lkdtm: Attempting non-Slab slab free ...
  ------------[ cut here ]------------
  virt_to_cache: Object is not a Slab page!
  WARNING: CPU: 1 PID: 2202 at mm/slab.h:489 kmem_cache_free+0x196/0x1d0

Additionally clean up neighboring Kconfig entries for clarity,
readability, and redundant option removal.

[1] https://github.com/ThomasKing2014/slides/raw/master/Building%20universal%20Android%20rooting%20with%20a%20type%20confusion%20vulnerability.pdf

Fixes: 598a0717a8 ("mm/slab: validate cache membership under freelist hardening")
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexander Popov <alex.popov@linux.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Garrett <mjg59@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Vijayanand Jitta <vjitta@codeaurora.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Link: http://lkml.kernel.org/r/20200625215548.389774-1-keescook@chromium.org
Link: http://lkml.kernel.org/r/20200625215548.389774-2-keescook@chromium.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:22 -07:00
Linus Torvalds 2324d50d05 It's been a busy cycle for documentation - hopefully the busiest for a
while to come.  Changes include:
 
  - Some new Chinese translations
 
  - Progress on the battle against double words words and non-HTTPS URLs
 
  - Some block-mq documentation
 
  - More RST conversions from Mauro.  At this point, that task is
    essentially complete, so we shouldn't see this kind of churn again for a
    while.  Unless we decide to switch to asciidoc or something...:)
 
  - Lots of typo fixes, warning fixes, and more.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAl8oVkwPHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5YoW8H/jJ/xnXFn7tkgVPQAlL3k5HCnK7A5nDP9RVR
 cg1pTx1cEFdjzxPlJyExU6/v+AImOvtweHXC+JDK7YcJ6XFUNYXJI3LxL5KwUXbY
 BL/xRFszDSXH2C7SJF5GECcFYp01e/FWSLN3yWAh+g+XwsKiTJ8q9+CoIDkHfPGO
 7oQsHKFu6s36Af0LfSgxk4sVB7EJbo8e4psuPsP5SUrl+oXRO43Put0rXkR4yJoH
 9oOaB51Do5fZp8I4JVAqGXvpXoExyLMO4yw0mASm6YSZ3KyjR8Fae+HD9Cq4ZuwY
 0uzb9K+9NEhqbfwtyBsi99S64/6Zo/MonwKwevZuhtsDTK4l4iU=
 =JQLZ
 -----END PGP SIGNATURE-----

Merge tag 'docs-5.9' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "It's been a busy cycle for documentation - hopefully the busiest for a
  while to come. Changes include:

   - Some new Chinese translations

   - Progress on the battle against double words words and non-HTTPS
     URLs

   - Some block-mq documentation

   - More RST conversions from Mauro. At this point, that task is
     essentially complete, so we shouldn't see this kind of churn again
     for a while. Unless we decide to switch to asciidoc or
     something...:)

   - Lots of typo fixes, warning fixes, and more"

* tag 'docs-5.9' of git://git.lwn.net/linux: (195 commits)
  scripts/kernel-doc: optionally treat warnings as errors
  docs: ia64: correct typo
  mailmap: add entry for <alobakin@marvell.com>
  doc/zh_CN: add cpu-load Chinese version
  Documentation/admin-guide: tainted-kernels: fix spelling mistake
  MAINTAINERS: adjust kprobes.rst entry to new location
  devices.txt: document rfkill allocation
  PCI: correct flag name
  docs: filesystems: vfs: correct flag name
  docs: filesystems: vfs: correct sync_mode flag names
  docs: path-lookup: markup fixes for emphasis
  docs: path-lookup: more markup fixes
  docs: path-lookup: fix HTML entity mojibake
  CREDITS: Replace HTTP links with HTTPS ones
  docs: process: Add an example for creating a fixes tag
  doc/zh_CN: add Chinese translation prefer section
  doc/zh_CN: add clearing-warn-once Chinese version
  doc/zh_CN: add admin-guide index
  doc:it_IT: process: coding-style.rst: Correct __maybe_unused compiler label
  futex: MAINTAINERS: Re-add selftests directory
  ...
2020-08-04 22:47:54 -07:00
Linus Torvalds c0dfadfed8 The main change in this cycle was to add support for ZSTD-compressed
kernel and initrd images.
 
 ZSTD has a very fast decompressor, yet it compresses better than gzip.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl8oNX0RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jdcg/9GaPGjmNgMqi3tbfzU3z11OrbraRBgMj5
 jHIZ89DuzwsqU+jbwGHGiF45ge85iPK6i2ovR3ePzL0LAlLYT3gqzPcl3kkog4E9
 0E0JAddx974uW4toc8cGFEHNf4vXtvvi45FL2yvDoap9xLEcpJsQRdu9upPB4U3s
 +qotO6wJitM74g4l2WdbStzCAcL4ZXFA/ix19nUyLh4QlFBDqUHwufIhW1G0ciL4
 txMXJ23L7e+b6FUvGyK3vFhba1isPdz5xQdQTy2DCK20rQhGu1IBsqzymEibbgIp
 /j4yHfUKSpxdblFcpZfknI1VM1mbt/WN5dKDKm9UnYBhA/R/2PN0klfrAQAT4SOS
 sP3bxXqTRXBjmop0NjOLCdjGCySYnPLFPlB6REIrMcvs6LYUSTqMZEusj7McwD7h
 IqS4zGEMa5A+c6Q4160Qz+zrXIyh/n/bTR/6uOKUktkUQaJ+079P64NK9RtCYZTk
 dkIHJChjmWZGxxXHEbo+4e7bM8gAMHDmX2pdWE5u72oYJRqBv7PVyl+SHBk+onxM
 crtKvqOp8Q8coirlfjx5UynZeZmH1VuIFjpvnwlAtqxZGvuTWZ0ojq3E3Y/XwHQj
 bVejr9AQ1gS9ZBTKwwd5cf7mnOuiXrHrBP3E7buoRw8bWtL+yqHyybqccZnSOUVN
 lGFshs+7J5o=
 =bARW
 -----END PGP SIGNATURE-----

Merge tag 'x86-boot-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 boot updates from Ingo Molnar:
 "The main change in this cycle was to add support for ZSTD-compressed
  kernel and initrd images.

  ZSTD has a very fast decompressor, yet it compresses better than gzip"

* tag 'x86-boot-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Documentation: dontdiff: Add zstd compressed files
  .gitignore: Add ZSTD-compressed files
  x86: Add support for ZSTD compressed kernel
  x86: Bump ZO_z_extra_bytes margin for zstd
  usr: Add support for zstd compressed initramfs
  init: Add support for zstd compressed kernel
  lib: Add zstd support to decompress
  lib: Prepare zstd for preboot environment, improve performance
2020-08-03 16:03:23 -07:00
Nick Terrell 48f7ddf785 init: Add support for zstd compressed kernel
- Add the zstd and zstd22 cmds to scripts/Makefile.lib

- Add the HAVE_KERNEL_ZSTD and KERNEL_ZSTD options

Architecture specific support is still needed for decompression.

Signed-off-by: Nick Terrell <terrelln@fb.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20200730190841.2071656-4-nickrterrell@gmail.com
2020-07-31 11:49:08 +02:00
Valentin Schneider fcd7c9c3c3 arm, arm64: Fix selection of CONFIG_SCHED_THERMAL_PRESSURE
Qian reported that the current setup forgoes the Kconfig dependencies and
results in warnings such as:

  WARNING: unmet direct dependencies detected for SCHED_THERMAL_PRESSURE
    Depends on [n]: SMP [=y] && CPU_FREQ_THERMAL [=n]
    Selected by [y]:
    - ARM64 [=y]

Revert commit

  e17ae7fea8 ("arm, arm64: Select CONFIG_SCHED_THERMAL_PRESSURE")

and re-implement it by making the option default to 'y' for arm64 and arm,
which respects Kconfig dependencies (i.e. will remain 'n' if
CPU_FREQ_THERMAL=n).

Fixes: e17ae7fea8 ("arm, arm64: Select CONFIG_SCHED_THERMAL_PRESSURE")
Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200729135718.1871-1-valentin.schneider@arm.com
2020-07-29 16:14:16 +02:00
Valentin Schneider 98eb401d09 sched: Cleanup SCHED_THERMAL_PRESSURE kconfig entry
As Russell pointed out [1], this option is severely lacking in the
documentation department, and figuring out if one has the required
dependencies to benefit from turning it on is not straightforward.

Make it non user-visible, and add a bit of help to it. While at it, make it
depend on CPU_FREQ_THERMAL.

[1]: https://lkml.kernel.org/r/20200603173150.GB1551@shell.armlinux.org.uk

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200712165917.9168-3-valentin.schneider@arm.com
2020-07-22 10:22:06 +02:00
Masahiro Yamada b816b3db15 kbuild: fix CONFIG_CC_CAN_LINK(_STATIC) for cross-compilation with Clang
scripts/cc-can-link.sh tests if the compiler can link userspace
programs.

When $(CC) is GCC, it is checked against the target architecture
because the toolchain prefix is specified as a part of $(CC).

When $(CC) is Clang, it is checked against the host architecture
because --target option is missing.

Pass $(CLANG_FLAGS) to scripts/cc-can-link.sh to evaluate the link
capability for the target architecture.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
2020-07-02 00:57:45 +09:00
Mauro Carvalho Chehab 800c02f5d0 docs: move nommu-mmap.txt to admin-guide and rename to ReST
The nommu-mmap.txt file provides description of user visible
behaviuour. So, move it to the admin-guide.

As it is already at the ReST, also rename it.

Suggested-by: Mike Rapoport <rppt@linux.ibm.com>
Suggested-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/3a63d1833b513700755c85bf3bda0a6c4ab56986.1592918949.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-06-26 11:33:35 -06:00
Linus Torvalds 6adc19fd13 Kbuild updates for v5.8 (2nd)
- fix build rules in binderfs sample
 
  - fix build errors when Kbuild recurses to the top Makefile
 
  - covert '---help---' in Kconfig to 'help'
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAl7lBuYVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGHvIP/3iErjPshpg/phwH8NTCS4SFkiti
 BZRM+2lupSn7Qs53BTpVzIkXoHBJQZlJxlQ5HY8ScO+fiz28rKZr+b40us+je1Q+
 SkvSPfwZzxjEg7lAZutznG4KgItJLWJKmDyh9T8Y8TAuG4f8WO0hKnXoAp3YorS2
 zppEIxso8O5spZPjp+fF/fPbxPjIsabGK7Jp2LpSVFR5pVDHI/ycTlKQS+MFpMEx
 6JIpdFRw7TkvKew1dr5uAWT5btWHatEqjSR3JeyVHv3EICTGQwHmcHK67cJzGInK
 T51+DT7/CpKtmRgGMiTEu/INfMzzoQAKl6Fcu+vMaShTN97Hk9DpdtQyvA6P/h3L
 8GA4UBct05J7fjjIB7iUD+GYQ0EZbaFujzRXLYk+dQqEJRbhcCwvdzggGp0WvGRs
 1f8/AIpgnQv8JSL/bOMgGMS5uL2dSLsgbzTdr6RzWf1jlYdI1i4u7AZ/nBrwWP+Z
 iOBkKsVceEoJrTbaynl3eoYqFLtWyDau+//oBc2gUvmhn8ioM5dfqBRiJjxJnPG9
 /giRj6xRIqMMEw8Gg8PCG7WebfWxWyaIQwlWBbPok7DwISURK5mvOyakZL+Q25/y
 6MBr2H8NEJsf35q0GTINpfZnot7NX4JXrrndJH8NIRC7HEhwd29S041xlQJdP0rs
 E76xsOr3hrAmBu4P
 =1NIT
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull more Kbuild updates from Masahiro Yamada:

 - fix build rules in binderfs sample

 - fix build errors when Kbuild recurses to the top Makefile

 - covert '---help---' in Kconfig to 'help'

* tag 'kbuild-v5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  treewide: replace '---help---' in Kconfig files with 'help'
  kbuild: fix broken builds because of GZIP,BZIP2,LZOP variables
  samples: binderfs: really compile this sample and fix build issues
2020-06-13 13:29:16 -07:00
Masahiro Yamada a7f7f6248d treewide: replace '---help---' in Kconfig files with 'help'
Since commit 84af7a6194 ("checkpatch: kconfig: prefer 'help' over
'---help---'"), the number of '---help---' has been gradually
decreasing, but there are still more than 2400 instances.

This commit finishes the conversion. While I touched the lines,
I also fixed the indentation.

There are a variety of indentation styles found.

  a) 4 spaces + '---help---'
  b) 7 spaces + '---help---'
  c) 8 spaces + '---help---'
  d) 1 space + 1 tab + '---help---'
  e) 1 tab + '---help---'    (correct indentation)
  f) 1 tab + 1 space + '---help---'
  g) 1 tab + 2 spaces + '---help---'

In order to convert all of them to 1 tab + 'help', I ran the
following commend:

  $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-14 01:57:21 +09:00
Linus Torvalds 6c32978414 Notifications over pipes + Keyring notifications
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAl7U/i8ACgkQ+7dXa6fL
 C2u2eg/+Oy6ybq0hPovYVkFI9WIG7ZCz7w9Q6BEnfYMqqn3dnfJxKQ3l4pnQEOWw
 f4QfvpvevsYfMtOJkYcG6s66rQgbFdqc5TEyBBy0QNp3acRolN7IXkcopvv9xOpQ
 JxedpbFG1PTFLWjvBpyjlrUPouwLzq2FXAf1Ox0ZIMw6165mYOMWoli1VL8dh0A0
 Ai7JUB0WrvTNbrwhV413obIzXT/rPCdcrgbQcgrrLPex8lQ47ZAE9bq6k4q5HiwK
 KRzEqkQgnzId6cCNTFBfkTWsx89zZunz7jkfM5yx30MvdAtPSxvvpfIPdZRZkXsP
 E2K9Fk1/6OQZTC0Op3Pi/bt+hVG/mD1p0sQUDgo2MO3qlSS+5mMkR8h3mJEgwK12
 72P4YfOJkuAy2z3v4lL0GYdUDAZY6i6G8TMxERKu/a9O3VjTWICDOyBUS6F8YEAK
 C7HlbZxAEOKTVK0BTDTeEUBwSeDrBbvH6MnRlZCG5g1Fos2aWP0udhjiX8IfZLO7
 GN6nWBvK1fYzfsUczdhgnoCzQs3suoDo04HnsTPGJ8De52T4x2RsjV+gPx0nrNAq
 eWChl1JvMWsY2B3GLnl9XQz4NNN+EreKEkk+PULDGllrArrPsp5Vnhb9FJO1PVCU
 hMDJHohPiXnKbc8f4Bd78OhIvnuoGfJPdM5MtNe2flUKy2a2ops=
 =YTGf
 -----END PGP SIGNATURE-----

Merge tag 'notifications-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull notification queue from David Howells:
 "This adds a general notification queue concept and adds an event
  source for keys/keyrings, such as linking and unlinking keys and
  changing their attributes.

  Thanks to Debarshi Ray, we do have a pull request to use this to fix a
  problem with gnome-online-accounts - as mentioned last time:

     https://gitlab.gnome.org/GNOME/gnome-online-accounts/merge_requests/47

  Without this, g-o-a has to constantly poll a keyring-based kerberos
  cache to find out if kinit has changed anything.

  [ There are other notification pending: mount/sb fsinfo notifications
    for libmount that Karel Zak and Ian Kent have been working on, and
    Christian Brauner would like to use them in lxc, but let's see how
    this one works first ]

  LSM hooks are included:

   - A set of hooks are provided that allow an LSM to rule on whether or
     not a watch may be set. Each of these hooks takes a different
     "watched object" parameter, so they're not really shareable. The
     LSM should use current's credentials. [Wanted by SELinux & Smack]

   - A hook is provided to allow an LSM to rule on whether or not a
     particular message may be posted to a particular queue. This is
     given the credentials from the event generator (which may be the
     system) and the watch setter. [Wanted by Smack]

  I've provided SELinux and Smack with implementations of some of these
  hooks.

  WHY
  ===

  Key/keyring notifications are desirable because if you have your
  kerberos tickets in a file/directory, your Gnome desktop will monitor
  that using something like fanotify and tell you if your credentials
  cache changes.

  However, we also have the ability to cache your kerberos tickets in
  the session, user or persistent keyring so that it isn't left around
  on disk across a reboot or logout. Keyrings, however, cannot currently
  be monitored asynchronously, so the desktop has to poll for it - not
  so good on a laptop. This facility will allow the desktop to avoid the
  need to poll.

  DESIGN DECISIONS
  ================

   - The notification queue is built on top of a standard pipe. Messages
     are effectively spliced in. The pipe is opened with a special flag:

        pipe2(fds, O_NOTIFICATION_PIPE);

     The special flag has the same value as O_EXCL (which doesn't seem
     like it will ever be applicable in this context)[?]. It is given up
     front to make it a lot easier to prohibit splice&co from accessing
     the pipe.

     [?] Should this be done some other way?  I'd rather not use up a new
         O_* flag if I can avoid it - should I add a pipe3() system call
         instead?

     The pipe is then configured::

        ioctl(fds[1], IOC_WATCH_QUEUE_SET_SIZE, queue_depth);
        ioctl(fds[1], IOC_WATCH_QUEUE_SET_FILTER, &filter);

     Messages are then read out of the pipe using read().

   - It should be possible to allow write() to insert data into the
     notification pipes too, but this is currently disabled as the
     kernel has to be able to insert messages into the pipe *without*
     holding pipe->mutex and the code to make this work needs careful
     auditing.

   - sendfile(), splice() and vmsplice() are disabled on notification
     pipes because of the pipe->mutex issue and also because they
     sometimes want to revert what they just did - but one or more
     notification messages might've been interleaved in the ring.

   - The kernel inserts messages with the wait queue spinlock held. This
     means that pipe_read() and pipe_write() have to take the spinlock
     to update the queue pointers.

   - Records in the buffer are binary, typed and have a length so that
     they can be of varying size.

     This allows multiple heterogeneous sources to share a common
     buffer; there are 16 million types available, of which I've used
     just a few, so there is scope for others to be used. Tags may be
     specified when a watchpoint is created to help distinguish the
     sources.

   - Records are filterable as types have up to 256 subtypes that can be
     individually filtered. Other filtration is also available.

   - Notification pipes don't interfere with each other; each may be
     bound to a different set of watches. Any particular notification
     will be copied to all the queues that are currently watching for it
     - and only those that are watching for it.

   - When recording a notification, the kernel will not sleep, but will
     rather mark a queue as having lost a message if there's
     insufficient space. read() will fabricate a loss notification
     message at an appropriate point later.

   - The notification pipe is created and then watchpoints are attached
     to it, using one of:

        keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fds[1], 0x01);
        watch_mount(AT_FDCWD, "/", 0, fd, 0x02);
        watch_sb(AT_FDCWD, "/mnt", 0, fd, 0x03);

     where in both cases, fd indicates the queue and the number after is
     a tag between 0 and 255.

   - Watches are removed if either the notification pipe is destroyed or
     the watched object is destroyed. In the latter case, a message will
     be generated indicating the enforced watch removal.

  Things I want to avoid:

   - Introducing features that make the core VFS dependent on the
     network stack or networking namespaces (ie. usage of netlink).

   - Dumping all this stuff into dmesg and having a daemon that sits
     there parsing the output and distributing it as this then puts the
     responsibility for security into userspace and makes handling
     namespaces tricky. Further, dmesg might not exist or might be
     inaccessible inside a container.

   - Letting users see events they shouldn't be able to see.

  TESTING AND MANPAGES
  ====================

   - The keyutils tree has a pipe-watch branch that has keyctl commands
     for making use of notifications. Proposed manual pages can also be
     found on this branch, though a couple of them really need to go to
     the main manpages repository instead.

     If the kernel supports the watching of keys, then running "make
     test" on that branch will cause the testing infrastructure to spawn
     a monitoring process on the side that monitors a notifications pipe
     for all the key/keyring changes induced by the tests and they'll
     all be checked off to make sure they happened.

        https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/log/?h=pipe-watch

   - A test program is provided (samples/watch_queue/watch_test) that
     can be used to monitor for keyrings, mount and superblock events.
     Information on the notifications is simply logged to stdout"

* tag 'notifications-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  smack: Implement the watch_key and post_notification hooks
  selinux: Implement the watch_key security hook
  keys: Make the KEY_NEED_* perms an enum rather than a mask
  pipe: Add notification lossage handling
  pipe: Allow buffers to be marked read-whole-or-error for notifications
  Add sample notification program
  watch_queue: Add a key/keyring notification facility
  security: Add hooks to rule on setting a watch
  pipe: Add general notification queue support
  pipe: Add O_NOTIFICATION_PIPE
  security: Add a hook for the point of notification insertion
  uapi: General notification queue definitions
2020-06-13 09:56:21 -07:00
Linus Torvalds 4152d146ee Merge branch 'rwonce/rework' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux
Pull READ/WRITE_ONCE rework from Will Deacon:
 "This the READ_ONCE rework I've been working on for a while, which
  bumps the minimum GCC version and improves code-gen on arm64 when
  stack protector is enabled"

[ Side note: I'm _really_ tempted to raise the minimum gcc version to
  4.9, so that we can just say that we require _Generic() support.

  That would allow us to more cleanly handle a lot of the cases where we
  depend on very complex macros with 'sizeof' or __builtin_choose_expr()
  with __builtin_types_compatible_p() etc.

  This branch has a workaround for sparse not handling _Generic(),
  either, but that was already fixed in the sparse development branch,
  so it's really just gcc-4.9 that we'd require.   - Linus ]

* 'rwonce/rework' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux:
  compiler_types.h: Use unoptimized __unqual_scalar_typeof for sparse
  compiler_types.h: Optimize __unqual_scalar_typeof compilation time
  compiler.h: Enforce that READ_ONCE_NOCHECK() access size is sizeof(long)
  compiler-types.h: Include naked type in __pick_integer_type() match
  READ_ONCE: Fix comment describing 2x32-bit atomicity
  gcov: Remove old GCC 3.4 support
  arm64: barrier: Use '__unqual_scalar_typeof' for acquire/release macros
  locking/barriers: Use '__unqual_scalar_typeof' for load-acquire macros
  READ_ONCE: Drop pointer qualifiers when reading from scalar types
  READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses
  READ_ONCE: Simplify implementations of {READ,WRITE}_ONCE()
  arm64: csum: Disable KASAN for do_csum()
  fault_inject: Don't rely on "return value" from WRITE_ONCE()
  net: tls: Avoid assigning 'const' pointer to non-const pointer
  netfilter: Avoid assigning 'const' pointer to non-const pointer
  compiler/gcc: Raise minimum GCC version for kernel builds to 4.8
2020-06-10 14:46:54 -07:00
Linus Torvalds cff11abeca Kbuild updates for v5.8
- fix warnings in 'make clean' for ARCH=um, hexagon, h8300, unicore32
 
  - ensure to rebuild all objects when the compiler is upgraded
 
  - exclude system headers from dependency tracking and fixdep processing
 
  - fix potential bit-size mismatch between the kernel and BPF user-mode
    helper
 
  - add the new syntax 'userprogs' to build user-space programs for the
    target architecture (the same arch as the kernel)
 
  - compile user-space sample code under samples/ for the target arch
    instead of the host arch
 
  - make headers_install fail if a CONFIG option is leaked to user-space
 
  - sanitize the output format of scripts/checkstack.pl
 
  - handle ARM 'push' instruction in scripts/checkstack.pl
 
  - error out before modpost if a module name conflict is found
 
  - error out when multiple directories are passed to M= because this
    feature is broken for a long time
 
  - add CONFIG_DEBUG_INFO_COMPRESSED to support compressed debug info
 
  - a lot of cleanups of modpost
 
  - dump vmlinux symbols out into vmlinux.symvers, and reuse it in the
    second pass of modpost
 
  - do not run the second pass of modpost if nothing in modules is updated
 
  - install modules.builtin(.modinfo) by 'make install' as well as by
    'make modules_install' because it is useful even when CONFIG_MODULES=n
 
  - add new command line variables, GZIP, BZIP2, LZOP, LZMA, LZ4, and XZ
    to allow users to use alternatives such as pigz, pbzip2, etc.
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAl7brm0VHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGjeEP/Rrf8H9cp/Tq+ALQCBycI3W5ZEHg
 n2EqprZkVP2MlOV0d+8b9t4PdZf6E5Wmfv26sMaBAhl6X1KQI/0NgPMnTINvy5jJ
 Q2SMhj9y8Gwr3XKFu9Hd/0U+Sax5rz+LmY84tdF95dXzPIUWjAEVnbmN+ofY6T++
 sNf2YGNFSR6iiqr3uCYA0hHZmpKlfhVgDPAdncWa5aadSsuQb79nZQWefGeVEsuD
 HrISpwnkhBc0qY1xyWry6agE92xWmkNkdjKq6A7peguZL02XySWLRWjyHoiiaPOB
 6U4urKs/NSXqPgxGxwZthhwERHryC3+g4s8wRBDKE6ISRWKBBA2ruHpgdF5h/utu
 re1ZP2qRcAt8NBFynr4MEb2AU0mYkv7iEgfLJ7NUCRlMOtqrn5RFwnS4r8ReyQp5
 1UM11RbPhYgYjM5g9hBHJ7nK944/kfvy1/4jF4I1+M5O7QL6f00pu3r2bBIa/65g
 DWrNOpIliKG27GgnRlxi7HgLfxs9etFcXTpHO0ymgnMmlz+7FQsdceR9qqybGU9o
 yBWw6zculMQjb3E+k0DTnE5kLWsycbua921wxM9ABSxRmJi7WciNF73RdLUIBoAY
 VUbwrP2aIpdL+2uyX6RqdTaWzEBpW8omszr46aQ96pX+RiqMrPvJRLaA/tr3ZH8g
 tdHenJPWdHSaOcO4
 =GKe5
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - fix warnings in 'make clean' for ARCH=um, hexagon, h8300, unicore32

 - ensure to rebuild all objects when the compiler is upgraded

 - exclude system headers from dependency tracking and fixdep processing

 - fix potential bit-size mismatch between the kernel and BPF user-mode
   helper

 - add the new syntax 'userprogs' to build user-space programs for the
   target architecture (the same arch as the kernel)

 - compile user-space sample code under samples/ for the target arch
   instead of the host arch

 - make headers_install fail if a CONFIG option is leaked to user-space

 - sanitize the output format of scripts/checkstack.pl

 - handle ARM 'push' instruction in scripts/checkstack.pl

 - error out before modpost if a module name conflict is found

 - error out when multiple directories are passed to M= because this
   feature is broken for a long time

 - add CONFIG_DEBUG_INFO_COMPRESSED to support compressed debug info

 - a lot of cleanups of modpost

 - dump vmlinux symbols out into vmlinux.symvers, and reuse it in the
   second pass of modpost

 - do not run the second pass of modpost if nothing in modules is
   updated

 - install modules.builtin(.modinfo) by 'make install' as well as by
   'make modules_install' because it is useful even when
   CONFIG_MODULES=n

 - add new command line variables, GZIP, BZIP2, LZOP, LZMA, LZ4, and XZ
   to allow users to use alternatives such as pigz, pbzip2, etc.

* tag 'kbuild-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (96 commits)
  kbuild: add variables for compression tools
  Makefile: install modules.builtin even if CONFIG_MODULES=n
  mksysmap: Fix the mismatch of '.L' symbols in System.map
  kbuild: doc: rename LDFLAGS to KBUILD_LDFLAGS
  modpost: change elf_info->size to size_t
  modpost: remove is_vmlinux() helper
  modpost: strip .o from modname before calling new_module()
  modpost: set have_vmlinux in new_module()
  modpost: remove mod->skip struct member
  modpost: add mod->is_vmlinux struct member
  modpost: remove is_vmlinux() call in check_for_{gpl_usage,unused}()
  modpost: remove mod->is_dot_o struct member
  modpost: move -d option in scripts/Makefile.modpost
  modpost: remove -s option
  modpost: remove get_next_text() and make {grab,release_}file static
  modpost: use read_text_file() and get_line() for reading text files
  modpost: avoid false-positive file open error
  modpost: fix potential mmap'ed file overrun in get_src_version()
  modpost: add read_text_file() and get_line() helpers
  modpost: do not call get_modinfo() for vmlinux(.o)
  ...
2020-06-06 12:00:25 -07:00
Nick Desaulniers 587f17018a Kconfig: add config option for asm goto w/ outputs
This allows C code to make use of compilers with support for output
variables along the fallthrough path via preprocessor define:

  CONFIG_CC_HAS_ASM_GOTO_OUTPUT

[ This is not used anywhere yet, and currently released compilers don't
  support this yet, but it's coming, and I have some local experimental
  patches to take advantage of it when it does   - Linus ]

Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04 19:28:07 -07:00
Chris Down ada4ab7af1 init: allow distribution configuration of default init
Some init systems (eg.  systemd) have init at their own paths, for
example, /usr/lib/systemd/systemd.  A compatibility symlink to one of the
hardcoded init paths is provided by another package, usually named
something like systemd-sysvcompat or similar.

Currently distro maintainers who are hands-off on the bootloader are more
or less required to include those compatibility links as part of their
base distribution, because it's hard to migrate away from them since
there's a risk some users will not get the message to set init= on the
kernel command line appropriately.

Moreover, for distributions where the init system is something the
distribution itself is opinionated about (eg.  Arch, which has systemd in
the required `base` package), we could usually reasonably configure this
ahead of time when building the distribution kernel.  However, we
currently simply don't have any way to configure the kernel to do this.
Here's an example discussion where removing sysvcompat was discussed by
distro maintainers[0].

This patch adds a new Kconfig tunable, CONFIG_DEFAULT_INIT, which if set
is tried before the hardcoded fallback list.  So the order of precedence
is now thus:

1. init= on command line (on failure: panic)
2. CONFIG_DEFAULT_INIT (on failure: try #3)
3. Hardcoded fallback list (on failure: panic)

This new config parameter will allow distribution maintainers to move away
from these compatibility links safely, without having to worry that their
users might not have the right init=.

There are also two other benefits of this over having the distribution
maintain a symlink:

1. One of the value propositions over simply having distributions
   maintain a /sbin/init symlink via a package is that it also frees
   distributions which have a preferred default, but not mandatory, init
   system from having their package manager fight with their users for
   control of /{s,}bin/init.  Instead, the distribution simply makes
   their preference known in CONFIG_DEFAULT_INIT, and if the user
   installs another init system and uninstalls the default one they can
   still make use of /{s,}bin/init and friends for their own uses. This
   makes more cases Just Work(tm) without the user having to perform
   extra configuration via init=.

2. Since before this we don't know which path the distribution actually
   _intends_ to serve init from, we don't pr_err if it is simply
   missing, and usually will just silently put the user in a /bin/sh
   shell. Now that the distribution can make a declaration of intent, we
   can be more vocal when this init system fails to launch for any
   reason, even if it's simply because no file exists at that location,
   speeding up the palaver of init/mount dependency/etc debugging a bit.

[0]: https://lists.archlinux.org/pipermail/arch-dev-public/2019-January/029435.html

Signed-off-by: Chris Down <chris@chrisdown.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Link: http://lkml.kernel.org/r/20200522160234.GA1487022@chrisdown.name
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04 19:06:25 -07:00
Linus Torvalds ee01c4d72a Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "More mm/ work, plenty more to come

  Subsystems affected by this patch series: slub, memcg, gup, kasan,
  pagealloc, hugetlb, vmscan, tools, mempolicy, memblock, hugetlbfs,
  thp, mmap, kconfig"

* akpm: (131 commits)
  arm64: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
  x86: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
  riscv: support DEBUG_WX
  mm: add DEBUG_WX support
  drivers/base/memory.c: cache memory blocks in xarray to accelerate lookup
  mm/thp: rename pmd_mknotpresent() as pmd_mkinvalid()
  powerpc/mm: drop platform defined pmd_mknotpresent()
  mm: thp: don't need to drain lru cache when splitting and mlocking THP
  hugetlbfs: get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs
  sparc32: register memory occupied by kernel as memblock.memory
  include/linux/memblock.h: fix minor typo and unclear comment
  mm, mempolicy: fix up gup usage in lookup_node
  tools/vm/page_owner_sort.c: filter out unneeded line
  mm: swap: memcg: fix memcg stats for huge pages
  mm: swap: fix vmstats for huge pages
  mm: vmscan: limit the range of LRU type balancing
  mm: vmscan: reclaim writepage is IO cost
  mm: vmscan: determine anon/file pressure balance at the reclaim root
  mm: balance LRU lists based on relative thrashing
  mm: only count actual rotations as LRU reclaim cost
  ...
2020-06-03 20:24:15 -07:00
Johannes Weiner 2d1c498072 mm: memcontrol: make swap tracking an integral part of memory control
Without swap page tracking, users that are otherwise memory controlled can
easily escape their containment and allocate significant amounts of memory
that they're not being charged for.  That's because swap does readahead,
but without the cgroup records of who owned the page at swapout, readahead
pages don't get charged until somebody actually faults them into their
page table and we can identify an owner task.  This can be maliciously
exploited with MADV_WILLNEED, which triggers arbitrary readahead
allocations without charging the pages.

Make swap swap page tracking an integral part of memcg and remove the
Kconfig options.  In the first place, it was only made configurable to
allow users to save some memory.  But the overhead of tracking cgroup
ownership per swap page is minimal - 2 byte per page, or 512k per 1G of
swap, or 0.04%.  Saving that at the expense of broken containment
semantics is not something we should present as a coequal option.

The swapaccount=0 boot option will continue to exist, and it will
eliminate the page_counter overhead and hide the swap control files, but
it won't disable swap slot ownership tracking.

This patch makes sure we always have the cgroup records at swapin time;
the next patch will fix the actual bug by charging readahead swap pages at
swapin time rather than at fault time.

v2: fix double swap charge bug in cgroup1/cgroup2 code gating

[hannes@cmpxchg.org: fix crash with cgroup_disable=memory]
  Link: http://lkml.kernel.org/r/20200521215855.GB815153@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Link: http://lkml.kernel.org/r/20200508183105.225460-16-hannes@cmpxchg.org
Debugged-by: Hugh Dickins <hughd@google.com>
Debugged-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-03 20:09:48 -07:00
Linus Torvalds 8226f11318 MIPS updates for v5.8:
- added support for MIPSr5 and P5600 cores
 - converted Loongson PCI driver into a PCI host driver using the generic
   PCI framework
 - added emulation of CPUCFG command for Loogonson64 cpus
 - removed of LASAT, PMC MSP71xx and NEC MARKEINS/EMMA
 - ioremap cleanup
 - fix for a race between two threads faulting the same page
 - various cleanups and fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQJOBAABCAA4FiEEbt46xwy6kEcDOXoUeZbBVTGwZHAFAl7WK54aHHRzYm9nZW5k
 QGFscGhhLmZyYW5rZW4uZGUACgkQeZbBVTGwZHAbjA/9EEFeqNg9UNUH6/TS18QV
 qkxKp0+LC4Jk+SduzLyYsYy6l/dSaKYl8m9jyJsWjM6BvBZTcMJJOnzIPRafI0s+
 MK8GCSZunAkm25DsDvfobQUkbQ/UHjY/fuRpNslbDcsYqIKv90hUMd21ccXY6KC5
 RY+aMlpjgksg1X8JJ7k1Rs05sXyUPqpESteyqehF1b/+Iyv7H2L3v5EvQwvPDs6f
 TyVgNJU2B3RCU6/uAcWmHdVLxXd+Y8fM0vC8DCO0pg0rGf4be0FbZztHmeq6r2wy
 g7wsO7acKWGzulFQD5ftVSQ6i8KHIDNDePmDMtU5oFcXkzUDdGvd3j3Gst19/nve
 ZftNmQHOY1JqGUOhdq1fDG/4M3Vc5bvh3W6eMG22TuMLEWsOF8teY8uUa/vxOb+B
 2NsJ9q6ylRS7RDWWOrApJWfFYPvhr5wlLxT+azWNa9y3bjV8vDLjNdU0mRLA1nsu
 yLzYMwIhtWfZhkJZ+xJVSmQ6LjAHDN5TF/LEx/9itLg5t9wrEosFPAtOv8V15hy4
 KBNvvWeoy7RRmBTNuKh7r9Ui4jw7GgxL4D1OwzCsF//GAiGyuuh0zMuUE8EXA6K5
 MpdGt+bSOcLl8ILTtGir8e4MXLawDH8n94f8QWLb9FcOvU4KHUjRKU7EQ6dyD5dk
 a7xskGLXWdVO3IJ/Xvxcaeo=
 =eAtN
 -----END PGP SIGNATURE-----

Merge tag 'mips_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS updates from Thomas Bogendoerfer:

 - added support for MIPSr5 and P5600 cores

 - converted Loongson PCI driver into a PCI host driver using the
   generic PCI framework

 - added emulation of CPUCFG command for Loogonson64 cpus

 - removed of LASAT, PMC MSP71xx and NEC MARKEINS/EMMA

 - ioremap cleanup

 - fix for a race between two threads faulting the same page

 - various cleanups and fixes

* tag 'mips_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (143 commits)
  MIPS: ralink: drop ralink_clk_init for mt7621
  MIPS: ralink: bootrom: mark a function as __init to save some memory
  MIPS: Loongson64: Reorder CPUCFG model match arms
  MIPS: Expose Loongson CPUCFG availability via HWCAP
  MIPS: Loongson64: Guard against future cores without CPUCFG
  MIPS: Fix build warning about "PTR_STR" redefinition
  MIPS: Loongson64: Remove not used pci.c
  MIPS: Loongson64: Define PCI_IOBASE
  MIPS: CPU_LOONGSON2EF need software to maintain cache consistency
  MIPS: DTS: Fix build errors used with various configs
  MIPS: Loongson64: select NO_EXCEPT_FILL
  MIPS: Fix IRQ tracing when call handle_fpe() and handle_msa_fpe()
  MIPS: mm: add page valid judgement in function pte_modify
  mm/memory.c: Add memory read privilege on page fault handling
  mm/memory.c: Update local TLB if PTE entry exists
  MIPS: Do not flush tlb page when updating PTE entry
  MIPS: ingenic: Default to a generic board
  MIPS: ingenic: Add support for GCW Zero prototype
  MIPS: ingenic: DTS: Add memory info of GCW Zero
  MIPS: Loongson64: Switch to generic PCI driver
  ...
2020-06-03 13:32:21 -07:00
David Howells c73be61ced pipe: Add general notification queue support
Make it possible to have a general notification queue built on top of a
standard pipe.  Notifications are 'spliced' into the pipe and then read
out.  splice(), vmsplice() and sendfile() are forbidden on pipes used for
notifications as post_one_notification() cannot take pipe->mutex.  This
means that notifications could be posted in between individual pipe
buffers, making iov_iter_revert() difficult to effect.

The way the notification queue is used is:

 (1) An application opens a pipe with a special flag and indicates the
     number of messages it wishes to be able to queue at once (this can
     only be set once):

	pipe2(fds, O_NOTIFICATION_PIPE);
	ioctl(fds[0], IOC_WATCH_QUEUE_SET_SIZE, queue_depth);

 (2) The application then uses poll() and read() as normal to extract data
     from the pipe.  read() will return multiple notifications if the
     buffer is big enough, but it will not split a notification across
     buffers - rather it will return a short read or EMSGSIZE.

     Notification messages include a length in the header so that the
     caller can split them up.

Each message has a header that describes it:

	struct watch_notification {
		__u32	type:24;
		__u32	subtype:8;
		__u32	info;
	};

The type indicates the source (eg. mount tree changes, superblock events,
keyring changes, block layer events) and the subtype indicates the event
type (eg. mount, unmount; EIO, EDQUOT; link, unlink).  The info field
indicates a number of things, including the entry length, an ID assigned to
a watchpoint contributing to this buffer and type-specific flags.

Supplementary data, such as the key ID that generated an event, can be
attached in additional slots.  The maximum message size is 127 bytes.
Messages may not be padded or aligned, so there is no guarantee, for
example, that the notification type will be on a 4-byte bounary.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-05-19 15:08:24 +01:00
Masahiro Yamada b1183b6dca bpfilter: check if $(CC) can link static libc in Kconfig
On Fedora, linking static glibc requires the glibc-static RPM package,
which is not part of the glibc-devel package.

CONFIG_CC_CAN_LINK does not check the capability of static linking,
so you can enable CONFIG_BPFILTER_UMH, then fail to build:

    HOSTLD  net/bpfilter/bpfilter_umh
  /usr/bin/ld: cannot find -lc
  collect2: error: ld returned 1 exit status

Add CONFIG_CC_CAN_LINK_STATIC, and make CONFIG_BPFILTER_UMH depend
on it.

Reported-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
2020-05-17 18:52:01 +09:00
Masahiro Yamada 9371f86ecb bpfilter: match bit size of bpfilter_umh to that of the kernel
bpfilter_umh is built for the default machine bit of the compiler,
which may not match to the bit size of the kernel.

This happens in the scenario below:

You can use biarch GCC that defaults to 64-bit for building the 32-bit
kernel. In this case, Kbuild passes -m32 to teach the compiler to
produce 32-bit kernel space objects. However, it is missing when
building bpfilter_umh. It is built as a 64-bit ELF, and then embedded
into the 32-bit kernel.

The 32-bit kernel and 64-bit umh is a bad combination.

In theory, we can have 32-bit umh running on 64-bit kernel, but we do
not have a good reason to support such a usecase.

The best is to match the bit size between them.

Pass -m32 or -m64 to the umh build command if it is found in
$(KBUILD_CFLAGS). Evaluate CC_CAN_LINK against the kernel bit-size.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-05-17 18:52:01 +09:00
Linus Torvalds f85c1598dd Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Fix sk_psock reference count leak on receive, from Xiyu Yang.

 2) CONFIG_HNS should be invisible, from Geert Uytterhoeven.

 3) Don't allow locking route MTUs in ipv6, RFCs actually forbid this,
    from Maciej Żenczykowski.

 4) ipv4 route redirect backoff wasn't actually enforced, from Paolo
    Abeni.

 5) Fix netprio cgroup v2 leak, from Zefan Li.

 6) Fix infinite loop on rmmod in conntrack, from Florian Westphal.

 7) Fix tcp SO_RCVLOWAT hangs, from Eric Dumazet.

 8) Various bpf probe handling fixes, from Daniel Borkmann.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (68 commits)
  selftests: mptcp: pm: rm the right tmp file
  dpaa2-eth: properly handle buffer size restrictions
  bpf: Restrict bpf_trace_printk()'s %s usage and add %pks, %pus specifier
  bpf: Add bpf_probe_read_{user, kernel}_str() to do_refine_retval_range
  bpf: Restrict bpf_probe_read{, str}() only to archs where they work
  MAINTAINERS: Mark networking drivers as Maintained.
  ipmr: Add lockdep expression to ipmr_for_each_table macro
  ipmr: Fix RCU list debugging warning
  drivers: net: hamradio: Fix suspicious RCU usage warning in bpqether.c
  net: phy: broadcom: fix BCM54XX_SHD_SCR3_TRDDAPD value for BCM54810
  tcp: fix error recovery in tcp_zerocopy_receive()
  MAINTAINERS: Add Jakub to networking drivers.
  MAINTAINERS: another add of Karsten Graul for S390 networking
  drivers: ipa: fix typos for ipa_smp2p structure doc
  pppoe: only process PADT targeted at local interfaces
  selftests/bpf: Enforce returning 0 for fentry/fexit programs
  bpf: Enforce returning 0 for fentry/fexit progs
  net: stmmac: fix num_por initialization
  security: Fix the default value of secid_to_secctx hook
  libbpf: Fix register naming in PT_REGS s390 macros
  ...
2020-05-15 13:10:06 -07:00
Daniel Borkmann 0ebeea8ca8 bpf: Restrict bpf_probe_read{, str}() only to archs where they work
Given the legacy bpf_probe_read{,str}() BPF helpers are broken on archs
with overlapping address ranges, we should really take the next step to
disable them from BPF use there.

To generally fix the situation, we've recently added new helper variants
bpf_probe_read_{user,kernel}() and bpf_probe_read_{user,kernel}_str().
For details on them, see 6ae08ae3de ("bpf: Add probe_read_{user, kernel}
and probe_read_{user,kernel}_str helpers").

Given bpf_probe_read{,str}() have been around for ~5 years by now, there
are plenty of users at least on x86 still relying on them today, so we
cannot remove them entirely w/o breaking the BPF tracing ecosystem.

However, their use should be restricted to archs with non-overlapping
address ranges where they are working in their current form. Therefore,
move this behind a CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE and
have x86, arm64, arm select it (other archs supporting it can follow-up
on it as well).

For the remaining archs, they can workaround easily by relying on the
feature probe from bpftool which spills out defines that can be used out
of BPF C code to implement the drop-in replacement for old/new kernels
via: bpftool feature probe macro

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/bpf/20200515101118.6508-2-daniel@iogearbox.net
2020-05-15 08:10:36 -07:00
Sami Tolvanen b744b43f79 kbuild: add CONFIG_LD_IS_LLD
Similarly to the CC_IS_CLANG config, add LD_IS_LLD to avoid GNU ld
specific logic such as ld-version or ld-ifversion and gain the
ability to select potential features that depend on the linker at
configuration time such as LTO.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Masahiro Yamada <masahiroy@kernel.org>
[nc: Reword commit message]
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-05-12 10:01:31 +02:00
Masahiro Yamada 8b59cd81dc kbuild: ensure full rebuild when the compiler is updated
Commit 21c54b7747 ("kconfig: show compiler version text in the top
comment") added the environment variable, CC_VERSION_TEXT in the comment
of the top Kconfig file. It can detect the compiler update, and invoke
the syncconfig because all environment variables referenced in Kconfig
files are recorded in include/config/auto.conf.cmd

This commit makes it a CONFIG option in order to ensure the full rebuild
when the compiler is updated.

This works like follows:

include/config/kconfig.h contains "CONFIG_CC_VERSION_TEXT" in the comment
block.

The top Makefile specifies "-include $(srctree)/include/linux/kconfig.h"
to guarantee it is included from all kernel source files.

fixdep parses every source file and all headers included from it,
searching for words prefixed with "CONFIG_". Then, fixdep finds
CONFIG_CC_VERSION_TEXT in include/config/kconfig.h and adds
include/config/cc/version/text.h into every .*.cmd file.

When the compiler is updated, syncconfig is invoked because init/Kconfig
contains the reference to the environment variable CC_VERTION_TEXT.
CONFIG_CC_VERSION_TEXT is updated to the new version string, and
include/config/cc/version/text.h is touched.

In the next rebuild, Make will rebuild every files since the timestamp
of include/config/cc/version/text.h is newer than that of target.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-05-12 13:28:33 +09:00
Masahiro Yamada e33ae3ed33 kbuild: use $(CC_VERSION_TEXT) to evaluate CC_IS_GCC and CC_IS_CLANG
The result of '$(CC) --version | head -n 1' has already been computed
by the top Makefile, and stored in the environment variable,
CC_VERSION_TEXT.

'echo' is cheaper than the two commands $(CC) and 'head' although this
optimization is not noticeable level.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
2020-05-12 13:28:33 +09:00
Linus Torvalds 78a5255ffb Stop the ad-hoc games with -Wno-maybe-initialized
We have some rather random rules about when we accept the
"maybe-initialized" warnings, and when we don't.

For example, we consider it unreliable for gcc versions < 4.9, but also
if -O3 is enabled, or if optimizing for size.  And then various kernel
config options disabled it, because they know that they trigger that
warning by confusing gcc sufficiently (ie PROFILE_ALL_BRANCHES).

And now gcc-10 seems to be introducing a lot of those warnings too, so
it falls under the same heading as 4.9 did.

At the same time, we have a very straightforward way to _enable_ that
warning when wanted: use "W=2" to enable more warnings.

So stop playing these ad-hoc games, and just disable that warning by
default, with the known and straight-forward "if you want to work on the
extra compiler warnings, use W=123".

Would it be great to have code that is always so obvious that it never
confuses the compiler whether a variable is used initialized or not?
Yes, it would.  In a perfect world, the compilers would be smarter, and
our source code would be simpler.

That's currently not the world we live in, though.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-09 13:57:10 -07:00
Will Deacon 5429ef62bc compiler/gcc: Raise minimum GCC version for kernel builds to 4.8
It is very rare to see versions of GCC prior to 4.8 being used to build
the mainline kernel. These old compilers are also know to have codegen
issues which can lead to silent miscompilation:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145

Raise the minimum GCC version for kernel build to 4.8 and remove some
tautological Kconfig dependencies as a consequence.

Cc: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
2020-04-15 21:36:20 +01:00
Linus Torvalds 87ebc45d2d arm64 fixes:
- Ensure that the compiler and linker versions are aligned so that ld
   doesn't complain about not understanding a .note.gnu.property section
   (emitted when pointer authentication is enabled).
 
 - Force -mbranch-protection=none when the feature is not enabled, in
   case a compiler may choose a different default value.
 
 - Remove CONFIG_DEBUG_ALIGN_RODATA. It was never in defconfig and rarely
   enabled.
 
 - Fix checking 16-bit Thumb-2 instructions checking mask in the
   emulation of the SETEND instruction (it could match the bottom half of
   a 32-bit Thumb-2 instruction).
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAl6PUYAACgkQa9axLQDI
 XvH83g/7B5v0RFqjqVW4/cQKoN1rii7qSA8pBfNgGiCMJKtoGvliAlp3xWEtlW0h
 nYJ4gCvey946r5kvZrjdBXC/Ulo2CcGYtX0n8d+8IB6wXAnGcQ0DUBUFZ4+fAU9Z
 F7+R7its24dma9R1wIFHFmQUdlO+EgQTfQFvhQKYMSNVaFQF73Sp/vk3oKhJ2E0x
 QevgDBQSmmcX3DFxhUW7BdcdboBgtTDUGdhcImdorgp7QmI1r40espJKX4VMKvmb
 pfzwg+i7KM6N1RDhRfA2oFMegXwI3rvM3XesqYaua8+xWD5vJuIQfq+ysEq9F9x/
 Hnu+W9nbcN8RKQ9JToiqkE7ifuOBTvaIJaqsgIXYSqtYjatuPAh85MkrorHi9Ji2
 9i7fc0GMTgtgYDo/93++l8SmmRJMX+h+9KtGtxx39+UqGjToJMCnPGjwBSwe4wdK
 lKOAgj488HHsNwTlrRUnq1hXjNjd1w+ON7JM2L3IyRNX/eWN60VxwzwHkZMByCOj
 jlcY4ISWquigW4w9Sp4nxEhLF9dWT1+OrE33Xh3CUxPU94jSEvgcDHcxuGeGOlrA
 QjN1B2APZFox8XbOsLgeG2kKe5C3Fui90SEn0GyA0ncVLsXDI78VnVJR9uz5+6Pd
 ALVQKkJxswhSDPQFlH+7CmQAcr8jWyLEEvyXXaZsoJmewzCpEPM=
 =pHRG
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:

 - Ensure that the compiler and linker versions are aligned so that ld
   doesn't complain about not understanding a .note.gnu.property section
   (emitted when pointer authentication is enabled).

 - Force -mbranch-protection=none when the feature is not enabled, in
   case a compiler may choose a different default value.

 - Remove CONFIG_DEBUG_ALIGN_RODATA. It was never in defconfig and
   rarely enabled.

 - Fix checking 16-bit Thumb-2 instructions checking mask in the
   emulation of the SETEND instruction (it could match the bottom half
   of a 32-bit Thumb-2 instruction).

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: armv8_deprecated: Fix undef_hook mask for thumb setend
  arm64: remove CONFIG_DEBUG_ALIGN_RODATA feature
  arm64: Always force a branch protection mode when the compiler has one
  arm64: Kconfig: ptrauth: Add binutils version check to fix mismatch
  init/kconfig: Add LD_VERSION Kconfig
2020-04-09 11:04:16 -07:00
Krzysztof Kozlowski 7baf219982 init/Kconfig: clean up ANON_INODES and old IO schedulers options
CONFIG_ANON_INODES is gone since commit 5dd50aaeb1 ("Make anon_inodes
unconditional").

CONFIG_CFQ_GROUP_IOSCHED was replaced with CONFIG_BFQ_GROUP_IOSCHED in
commit f382fb0bce ("block: remove legacy IO schedulers").

Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: http://lkml.kernel.org/r/20200130192419.3026-1-krzk@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:44 -07:00
Andrea Arcangeli 5a281062af userfaultfd: wp: add WP pagetable tracking to x86
Accurate userfaultfd WP tracking is possible by tracking exactly which
virtual memory ranges were writeprotected by userland.  We can't relay
only on the RW bit of the mapped pagetable because that information is
destroyed by fork() or KSM or swap.  If we were to relay on that, we'd
need to stay on the safe side and generate false positive wp faults for
every swapped out page.

[peterx@redhat.com: append _PAGE_UFD_WP to _PAGE_CHG_MASK]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Link: http://lkml.kernel.org/r/20200220163112.11409-4-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:39 -07:00
Linus Torvalds c48b07226b perf updates all over the place:
core:
 
    - Support for cgroup tracking in samples to allow cgroup based
      analysis
 
  tools:
 
    - Support for cgroup analysis
 
    - Commandline option and hotkey for perf top to change the sort order
 
    - A set of fixes all over the place
 
    - Various build system related improvements
 
    - Updates of the X86 pmu event JSON data
 
    - Documentation updates
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6J2iATHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoXMEEACy6WaNabX7EzkLUnX0WCgxnZVSryNR
 4EnxLJSg5lChEe4q2mOS3mRMpzlXHQieWcFxlwda7kjIbFlgQvjJQUiYlAvxRRO/
 giY/GwCtTi/Flcb+7wKxTgMYmtAUOZDWeQbBZUlFLi9vyeCHVkjget9EyVsgbe/W
 EmRsrPuKOVMUTeEwm3zpIE051DObpiWLNge++My70q5W/yNsS94PbNydgKO7osqk
 pX37YVyBFpI2IQxMGzaE3yK7OxXRjYljZaz1tONFDMEYOX9gmxpDsCCflsP1ZOzL
 4/P4faRvAOElwxtYBelKmRl8eboqhRpTEK0Et0TI0LYbUZrE2nisDi0LTKPWQb0k
 Om2Qi6AfZs67PVzx9htlx6rfee72+sUluz5BDKOGH0pNJ8CFy70ns8InLsZqbBZ7
 SgFVNjx6bHxB58VuVE9WEzr/KVs6zI/SuJlH7WG7FLXbm5j0cETfCjg40JlDpSNh
 CZs+Epky1zyytrVJ9Gc7KnRlw8VB2eWEQ2cQ0sqj5w7WxhfrnsCmQf1zk4sofhOF
 iz3Dvb9Llz/pYWGZbEiQAuI+8bo0psJptidzpmpbIXs/woKDhW49w1FxqowJQMWe
 +lGWcauSfo3gjgEnTkOWAx3yiH4i9rvRChX8Jh0Z07d6Kwf19YYfxcuhlkx0Wutj
 eKaaErWDtWAxpA==
 =2UUD
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-2020-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull more perf updates from Thomas Gleixner:
 "Perf updates all over the place:

  core:

   - Support for cgroup tracking in samples to allow cgroup based
     analysis

  tools:

   - Support for cgroup analysis

   - Commandline option and hotkey for perf top to change the sort order

   - A set of fixes all over the place

   - Various build system related improvements

   - Updates of the X86 pmu event JSON data

   - Documentation updates"

* tag 'perf-urgent-2020-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits)
  perf python: Fix clang detection to strip out options passed in $CC
  perf tools: Support Python 3.8+ in Makefile
  perf script: Fix invalid read of directory entry after closedir()
  perf script report: Fix SEGFAULT when using DWARF mode
  perf script: add -S/--symbols documentation
  perf pmu-events x86: Use CPU_CLK_UNHALTED.THREAD in Kernel_Utilization metric
  perf events parser: Add missing Intel CPU events to parser
  perf script: Allow --symbol to accept hexadecimal addresses
  perf report/top TUI: Fix title line formatting
  perf top: Support hotkey to change sort order
  perf top: Support --group-sort-idx to change the sort order
  perf symbols: Fix arm64 gap between kernel start and module end
  perf build-test: Honour JOBS to override detection of number of cores
  perf script: Add --show-cgroup-events option
  perf top: Add --all-cgroups option
  perf record: Add --all-cgroups option
  perf record: Support synthesizing cgroup events
  perf report: Add 'cgroup' sort key
  perf cgroup: Maintain cgroup hierarchy
  perf tools: Basic support for CGROUP event
  ...
2020-04-05 12:26:24 -07:00
Ingo Molnar 7dc41b9b99 perf/urgent fixes and improvements:
perf python:
 
   Arnaldo Carvalho de Melo:
 
   - Fix clang detection to strip out options passed in $CC.
 
 build:
 
   He Zhe:
 
   - Normalize gcc parameter when generating arch errno table, fixing
     the build by removing options from $(CC).
 
   Sam Lunt:
 
   - Support Python 3.8+ in Makefile.
 
 perf report/top:
 
   Arnaldo Carvalho de Melo:
 
   - Fix title line formatting.
 
 perf script:
 
   Andreas Gerstmayr:
 
   - Fix SEGFAULT when using DWARF mode.
 
   - Fix invalid read of directory entry after closedir(), found with valgrind.
 
   Hagen Paul Pfeifer:
 
   - Introduce --deltatime option.
 
   Stephane Eranian:
 
   - Allow --symbol to accept hexadecimal addresses.
 
   Ian Rogers:
 
   - Add -S/--symbols documentation
 
   Namhyung Kim:
 
   - Add --show-cgroup-events option.
 
 perf python:
 
   Arnaldo Carvalho de Melo:
 
   - Include rwsem.c in the python binding, needed by the cgroups improvements.
 
 build-test:
 
   Arnaldo Carvalho de Melo:
 
   - Honour JOBS to override detection of number of cores
 
 perf top:
 
   Jin Yao:
 
   - Support --group-sort-idx to change the sort order
 
   - perf top: Support hotkey to change sort order
 
 perf pmu-events x86:
 
   Jin Yao:
 
   - Use CPU_CLK_UNHALTED.THREAD in Kernel_Utilization metric
 
 perf symbols arm64:
 
   Kemeng Shi:
 
   - Fix arm64 gap between kernel start and module end
 
 kernel perf subsystem:
 
   Namhyung Kim:
 
   - Add PERF_RECORD_CGROUP event and Add PERF_SAMPLE_CGROUP feature,
     to allow cgroup tracking, saving a link between cgroup path and
     its id number.
 
 perf cgroup:
 
   Namhyung Kim:
 
   - Maintain cgroup hierarchy.
 
 perf report:
 
   Namhyung Kim:
 
   - Add 'cgroup' sort key.
 
 perf record:
 
   Namhyung Kim:
 
   - Support synthesizing cgroup events for pre-existing cgroups.
 
   - Add --all-cgroups option
 
 Documentation:
 
   Tony Jones:
 
   - Update docs regarding kernel/user space unwinding.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCXodMtwAKCRCyPKLppCJ+
 J+leAP0Ws0dGzMSIwcMVc7zvK1IsYOTlZ8lYXJePxD+Po/YPdAEA6Squf4gwZ2wm
 b9R7w50dlCkMJ9LaueCeZZjh/4asFwQ=
 =auOb
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-for-mingo-5.7-20200403' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent fixes and improvements from Arnaldo Carvalho de Melo:

perf python:

  Arnaldo Carvalho de Melo:

  - Fix clang detection to strip out options passed in $CC.

build:

  He Zhe:

  - Normalize gcc parameter when generating arch errno table, fixing
    the build by removing options from $(CC).

  Sam Lunt:

  - Support Python 3.8+ in Makefile.

perf report/top:

  Arnaldo Carvalho de Melo:

  - Fix title line formatting.

perf script:

  Andreas Gerstmayr:

  - Fix SEGFAULT when using DWARF mode.

  - Fix invalid read of directory entry after closedir(), found with valgrind.

  Hagen Paul Pfeifer:

  - Introduce --deltatime option.

  Stephane Eranian:

  - Allow --symbol to accept hexadecimal addresses.

  Ian Rogers:

  - Add -S/--symbols documentation

  Namhyung Kim:

  - Add --show-cgroup-events option.

perf python:

  Arnaldo Carvalho de Melo:

  - Include rwsem.c in the python binding, needed by the cgroups improvements.

build-test:

  Arnaldo Carvalho de Melo:

  - Honour JOBS to override detection of number of cores

perf top:

  Jin Yao:

  - Support --group-sort-idx to change the sort order

  - perf top: Support hotkey to change sort order

perf pmu-events x86:

  Jin Yao:

  - Use CPU_CLK_UNHALTED.THREAD in Kernel_Utilization metric

perf symbols arm64:

  Kemeng Shi:

  - Fix arm64 gap between kernel start and module end

kernel perf subsystem:

  Namhyung Kim:

  - Add PERF_RECORD_CGROUP event and Add PERF_SAMPLE_CGROUP feature,
    to allow cgroup tracking, saving a link between cgroup path and
    its id number.

perf cgroup:

  Namhyung Kim:

  - Maintain cgroup hierarchy.

perf report:

  Namhyung Kim:

  - Add 'cgroup' sort key.

perf record:

  Namhyung Kim:

  - Support synthesizing cgroup events for pre-existing cgroups.

  - Add --all-cgroups option

Documentation:

  Tony Jones:

  - Update docs regarding kernel/user space unwinding.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-04-04 10:35:15 +02:00
Amit Daniel Kachhap 9553d16fa6 init/kconfig: Add LD_VERSION Kconfig
This option can be used in Kconfig files to compare the ld version
and enable/disable incompatible config options if required.

This option is used in the subsequent patch along with GCC_VERSION to
filter out an incompatible feature.

Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-04-01 13:18:38 +01:00
Linus Torvalds 29d9f30d4c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from David Miller:
 "Highlights:

   1) Fix the iwlwifi regression, from Johannes Berg.

   2) Support BSS coloring and 802.11 encapsulation offloading in
      hardware, from John Crispin.

   3) Fix some potential Spectre issues in qtnfmac, from Sergey
      Matyukevich.

   4) Add TTL decrement action to openvswitch, from Matteo Croce.

   5) Allow paralleization through flow_action setup by not taking the
      RTNL mutex, from Vlad Buslov.

   6) A lot of zero-length array to flexible-array conversions, from
      Gustavo A. R. Silva.

   7) Align XDP statistics names across several drivers for consistency,
      from Lorenzo Bianconi.

   8) Add various pieces of infrastructure for offloading conntrack, and
      make use of it in mlx5 driver, from Paul Blakey.

   9) Allow using listening sockets in BPF sockmap, from Jakub Sitnicki.

  10) Lots of parallelization improvements during configuration changes
      in mlxsw driver, from Ido Schimmel.

  11) Add support to devlink for generic packet traps, which report
      packets dropped during ACL processing. And use them in mlxsw
      driver. From Jiri Pirko.

  12) Support bcmgenet on ACPI, from Jeremy Linton.

  13) Make BPF compatible with RT, from Thomas Gleixnet, Alexei
      Starovoitov, and your's truly.

  14) Support XDP meta-data in virtio_net, from Yuya Kusakabe.

  15) Fix sysfs permissions when network devices change namespaces, from
      Christian Brauner.

  16) Add a flags element to ethtool_ops so that drivers can more simply
      indicate which coalescing parameters they actually support, and
      therefore the generic layer can validate the user's ethtool
      request. Use this in all drivers, from Jakub Kicinski.

  17) Offload FIFO qdisc in mlxsw, from Petr Machata.

  18) Support UDP sockets in sockmap, from Lorenz Bauer.

  19) Fix stretch ACK bugs in several TCP congestion control modules,
      from Pengcheng Yang.

  20) Support virtual functiosn in octeontx2 driver, from Tomasz
      Duszynski.

  21) Add region operations for devlink and use it in ice driver to dump
      NVM contents, from Jacob Keller.

  22) Add support for hw offload of MACSEC, from Antoine Tenart.

  23) Add support for BPF programs that can be attached to LSM hooks,
      from KP Singh.

  24) Support for multiple paths, path managers, and counters in MPTCP.
      From Peter Krystad, Paolo Abeni, Florian Westphal, Davide Caratti,
      and others.

  25) More progress on adding the netlink interface to ethtool, from
      Michal Kubecek"

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2121 commits)
  net: ipv6: rpl_iptunnel: Fix potential memory leak in rpl_do_srh_inline
  cxgb4/chcr: nic-tls stats in ethtool
  net: dsa: fix oops while probing Marvell DSA switches
  net/bpfilter: remove superfluous testing message
  net: macb: Fix handling of fixed-link node
  net: dsa: ksz: Select KSZ protocol tag
  netdevsim: dev: Fix memory leak in nsim_dev_take_snapshot_write
  net: stmmac: add EHL 2.5Gbps PCI info and PCI ID
  net: stmmac: add EHL PSE0 & PSE1 1Gbps PCI info and PCI ID
  net: stmmac: create dwmac-intel.c to contain all Intel platform
  net: dsa: bcm_sf2: Support specifying VLAN tag egress rule
  net: dsa: bcm_sf2: Add support for matching VLAN TCI
  net: dsa: bcm_sf2: Move writing of CFP_DATA(5) into slicing functions
  net: dsa: bcm_sf2: Check earlier for FLOW_EXT and FLOW_MAC_EXT
  net: dsa: bcm_sf2: Disable learning for ASP port
  net: dsa: b53: Deny enslaving port 7 for 7278 into a bridge
  net: dsa: b53: Prevent tagged VLAN on port 7 for 7278
  net: dsa: b53: Restore VLAN entries upon (re)configuration
  net: dsa: bcm_sf2: Fix overflow checks
  hv_netvsc: Remove unnecessary round_up for recv_completion_cnt
  ...
2020-03-31 17:29:33 -07:00
Linus Torvalds 5b67fbfc32 Kbuild updates for v5.7
[Build system]
 
  - add CONFIG_UNUSED_KSYMS_WHITELIST, which will be useful to define
    a fixed set of export symbols for Generic Kernel Image (GKI)
 
  - allow to run 'make dt_binding_check' without .config
 
  - use full schema for checking DT examples in *.yaml files
 
  - make modpost fail for missing MODULE_IMPORT_NS(), which makes more
    sense because we know the produced modules are never loadable
 
  - Remove unused 'AS' variable
 
 [Kconfig]
 
  - sanitize DEFCONFIG_LIST, and remove ARCH_DEFCONFIG from Kconfig files
 
  - relax the 'imply' behavior so that symbols implied by y can become m
 
  - make 'imply' obey 'depends on' in order to make 'imply' really weak
 
 [Misc]
 
  - add documentation on building the kernel with Clang/LLVM
 
  - revive __HAVE_ARCH_STRLEN for 32bit sparc to use optimized strlen()
 
  - fix warning from deb-pkg builds when CONFIG_DEBUG_INFO=n
 
  - various script and Makefile cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAl6DbP8VHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGAfkQALZqMCqtX9cAJej04+lnBCzwVPep
 6s8/s6vW6PF92sHv+SJtHvKSnDekcZT2xT8dkPDaVmuOye8xhENs5dFZ4tSKO5D0
 F8YkkM17mu/cylNZ2UCy/8weh6/TjsD7pa+mFqWo/++30JiXm12v3mVFR568KPXI
 kFau/3ALvY1NIr2wUAI2SOd6A4v/Epzpk0ltnFg3f5iWVFKlE03MGueAF+YZzq7v
 UrU73HdUxF/SBW2Jz3UtV9XY8P38uQmmtoDE8SZikG4PjW03q9w6pnhntDBl/H2b
 dZFg40eG7SHXN4L+OOI32ae9jePHvKpsnjeaeNoT/DZpwpuuxXu7C2EmUy+wCAnM
 Rw4+kiAVNppRMRH1GTdp1XjLY6PwPqizzZGmufwX+W3MI8oZdlLSUJLbrO73P/aF
 QR3MgkJkjvgmRVPP9fr8SNcZ39tDGI4KqLdWvjVVSC/s86aDnw/34puEfw0lj4vs
 gCi923iJQ7Y/QWX63TYZhy96pnedlwE2s6aR1InVER3+XMH9K1nW34CDaKQsp1CB
 6zyrd40+K5ETOKo3OAjq4FttlhRkEpX9nIsffCzOz6tybysHTSrCzYhfjpIAzzYj
 Et5HpXbegHShIqN44yqBumt6YkTZac6Aub9FzInW2LPzZgiofDaNesDQmnQmIZOa
 JlUyBrjXRfwkvCH0
 =wT8A
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:
 "Build system:

   - add CONFIG_UNUSED_KSYMS_WHITELIST, which will be useful to define a
     fixed set of export symbols for Generic Kernel Image (GKI)

   - allow to run 'make dt_binding_check' without .config

   - use full schema for checking DT examples in *.yaml files

   - make modpost fail for missing MODULE_IMPORT_NS(), which makes more
     sense because we know the produced modules are never loadable

   - Remove unused 'AS' variable

  Kconfig:

   - sanitize DEFCONFIG_LIST, and remove ARCH_DEFCONFIG from Kconfig
     files

   - relax the 'imply' behavior so that symbols implied by 'y' can
     become 'm'

   - make 'imply' obey 'depends on' in order to make 'imply' really weak

  Misc:

   - add documentation on building the kernel with Clang/LLVM

   - revive __HAVE_ARCH_STRLEN for 32bit sparc to use optimized strlen()

   - fix warning from deb-pkg builds when CONFIG_DEBUG_INFO=n

   - various script and Makefile cleanups"

* tag 'kbuild-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (34 commits)
  Makefile: Update kselftest help information
  kbuild: deb-pkg: fix warning when CONFIG_DEBUG_INFO is unset
  kbuild: add outputmakefile to no-dot-config-targets
  kbuild: remove AS variable
  net: wan: wanxl: refactor the firmware rebuild rule
  net: wan: wanxl: use $(M68KCC) instead of $(M68KAS) for rebuilding firmware
  net: wan: wanxl: use allow to pass CROSS_COMPILE_M68k for rebuilding firmware
  kbuild: add comment about grouped target
  kbuild: add -Wall to KBUILD_HOSTCXXFLAGS
  kconfig: remove unused variable in qconf.cc
  sparc: revive __HAVE_ARCH_STRLEN for 32bit sparc
  kbuild: refactor Makefile.dtbinst more
  kbuild: compute the dtbs_install destination more simply
  Makefile: disallow data races on gcc-10 as well
  kconfig: make 'imply' obey the direct dependency
  kconfig: allow symbols implied by y to become m
  net: drop_monitor: use IS_REACHABLE() to guard net_dm_hw_report()
  modpost: return error if module is missing ns imports and MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS=n
  modpost: rework and consolidate logging interface
  kbuild: allow to run dt_binding_check without kernel configuration
  ...
2020-03-31 16:03:39 -07:00
David S. Miller ed52f2c608 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 19:52:37 -07:00
Linus Torvalds 642e53ead6 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main changes in this cycle are:

   - Various NUMA scheduling updates: harmonize the load-balancer and
     NUMA placement logic to not work against each other. The intended
     result is better locality, better utilization and fewer migrations.

   - Introduce Thermal Pressure tracking and optimizations, to improve
     task placement on thermally overloaded systems.

   - Implement frequency invariant scheduler accounting on (some) x86
     CPUs. This is done by observing and sampling the 'recent' CPU
     frequency average at ~tick boundaries. The CPU provides this data
     via the APERF/MPERF MSRs. This hopefully makes our capacity
     estimates more precise and keeps tasks on the same CPU better even
     if it might seem overloaded at a lower momentary frequency. (As
     usual, turbo mode is a complication that we resolve by observing
     the maximum frequency and renormalizing to it.)

   - Add asymmetric CPU capacity wakeup scan to improve capacity
     utilization on asymmetric topologies. (big.LITTLE systems)

   - PSI fixes and optimizations.

   - RT scheduling capacity awareness fixes & improvements.

   - Optimize the CONFIG_RT_GROUP_SCHED constraints code.

   - Misc fixes, cleanups and optimizations - see the changelog for
     details"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (62 commits)
  threads: Update PID limit comment according to futex UAPI change
  sched/fair: Fix condition of avg_load calculation
  sched/rt: cpupri_find: Trigger a full search as fallback
  kthread: Do not preempt current task if it is going to call schedule()
  sched/fair: Improve spreading of utilization
  sched: Avoid scale real weight down to zero
  psi: Move PF_MEMSTALL out of task->flags
  MAINTAINERS: Add maintenance information for psi
  psi: Optimize switching tasks inside shared cgroups
  psi: Fix cpu.pressure for cpu.max and competing cgroups
  sched/core: Distribute tasks within affinity masks
  sched/fair: Fix enqueue_task_fair warning
  thermal/cpu-cooling, sched/core: Move the arch_set_thermal_pressure() API to generic scheduler code
  sched/rt: Remove unnecessary push for unfit tasks
  sched/rt: Allow pulling unfitting task
  sched/rt: Optimize cpupri_find() on non-heterogenous systems
  sched/rt: Re-instate old behavior in select_task_rq_rt()
  sched/rt: cpupri_find: Implement fallback mechanism for !fit case
  sched/fair: Fix reordering of enqueue/dequeue_task_fair()
  sched/fair: Fix runnable_avg for throttled cfs
  ...
2020-03-30 17:01:51 -07:00
KP Singh 4edf16b72c bpf, lsm: Make BPF_LSM depend on BPF_EVENTS
LSM and tracing programs share their helpers with bpf_tracing_func_proto
which is only defined (in bpf_trace.c) when BPF_EVENTS is enabled.

Instead of adding __weak symbol, make BPF_LSM depend on BPF_EVENTS so
that both tracing and LSM programs can actually share helpers.

Fixes: fc611f47f2 ("bpf: Introduce BPF_PROG_TYPE_LSM")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200330204059.13024-1-kpsingh@chromium.org
2020-03-30 22:57:23 +02:00
KP Singh fc611f47f2 bpf: Introduce BPF_PROG_TYPE_LSM
Introduce types and configs for bpf programs that can be attached to
LSM hooks. The programs can be enabled by the config option
CONFIG_BPF_LSM.

Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Brendan Jackman <jackmanb@google.com>
Reviewed-by: Florent Revest <revest@google.com>
Reviewed-by: Thomas Garnier <thgarnie@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Link: https://lore.kernel.org/bpf/20200329004356.27286-2-kpsingh@chromium.org
2020-03-30 01:34:00 +02:00
Namhyung Kim 6546b19f95 perf/core: Add PERF_SAMPLE_CGROUP feature
The PERF_SAMPLE_CGROUP bit is to save (perf_event) cgroup information in
the sample.  It will add a 64-bit id to identify current cgroup and it's
the file handle in the cgroup file system.  Userspace should use this
information with PERF_RECORD_CGROUP event to match which cgroup it
belongs.

I put it before PERF_SAMPLE_AUX for simplicity since it just needs a
64-bit word.  But if we want bigger samples, I can work on that
direction too.

Committer testing:

  $ pahole perf_sample_data | grep -w cgroup -B5 -A5
  	/* --- cacheline 4 boundary (256 bytes) was 56 bytes ago --- */
  	struct perf_regs           regs_intr;            /*   312    16 */
  	/* --- cacheline 5 boundary (320 bytes) was 8 bytes ago --- */
  	u64                        stack_user_size;      /*   328     8 */
  	u64                        phys_addr;            /*   336     8 */
  	u64                        cgroup;               /*   344     8 */

  	/* size: 384, cachelines: 6, members: 22 */
  	/* padding: 32 */
  };
  $

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Zefan Li <lizefan@huawei.com>
Link: http://lore.kernel.org/lkml/20200325124536.2800725-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-03-27 10:41:44 -03:00
Masahiro Yamada 3a7c733165 int128: fix __uint128_t compiler test in Kconfig
The support for __uint128_t is dependent on the target bit size.

GCC that defaults to the 32-bit can still build the 64-bit kernel
with -m64 flag passed.

However, $(cc-option,-D__SIZEOF_INT128__=0) is evaluated against the
default machine bit, which may not match to the kernel it is building.

Theoretically, this could be evaluated separately for 64BIT/32BIT.

  config CC_HAS_INT128
          bool
          default !$(cc-option,$(m64-flag) -D__SIZEOF_INT128__=0) if 64BIT
          default !$(cc-option,$(m32-flag) -D__SIZEOF_INT128__=0)

I simplified it more because the 32-bit compiler is unlikely to support
__uint128_t.

Fixes: c12d3362a7 ("int128: move __uint128_t compiler test to Kconfig")
Reported-by: George Spelvin <lkml@sdf.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: George Spelvin <lkml@sdf.org>
2020-03-12 07:43:03 +09:00
Thara Gopinath 765047932f sched/pelt: Add support to track thermal pressure
Extrapolating on the existing framework to track rt/dl utilization using
pelt signals, add a similar mechanism to track thermal pressure. The
difference here from rt/dl utilization tracking is that, instead of
tracking time spent by a CPU running a RT/DL task through util_avg, the
average thermal pressure is tracked through load_avg. This is because
thermal pressure signal is weighted time "delta" capacity unlike util_avg
which is binary. "delta capacity" here means delta between the actual
capacity of a CPU and the decreased capacity a CPU due to a thermal event.

In order to track average thermal pressure, a new sched_avg variable
avg_thermal is introduced. Function update_thermal_load_avg can be called
to do the periodic bookkeeping (accumulate, decay and average) of the
thermal pressure.

Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200222005213.3873-2-thara.gopinath@linaro.org
2020-03-06 12:57:17 +01:00
Quentin Perret 1518c633df kbuild: allow symbol whitelisting with TRIM_UNUSED_KSYMS
CONFIG_TRIM_UNUSED_KSYMS currently removes all unused exported symbols
from ksymtab. This works really well when using in-tree drivers, but
cannot be used in its current form if some of them are out-of-tree.

Indeed, even if the list of symbols required by out-of-tree drivers is
known at compile time, the only solution today to guarantee these don't
get trimmed is to set CONFIG_TRIM_UNUSED_KSYMS=n. This not only wastes
space, but also makes it difficult to control the ABI usable by vendor
modules in distribution kernels such as Android. Being able to control
the kernel ABI surface is particularly useful to ship a unique Generic
Kernel Image (GKI) for all vendors, which is a first step in the
direction of getting all vendors to contribute their code upstream.

As such, attempt to improve the situation by enabling users to specify a
symbol 'whitelist' at compile time. Any symbol specified in this
whitelist will be kept exported when CONFIG_TRIM_UNUSED_KSYMS is set,
even if it has no in-tree user. The whitelist is defined as a simple
text file, listing symbols, one per line.

Acked-by: Jessica Yu <jeyu@kernel.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Tested-by: Matthias Maennich <maennich@google.com>
Reviewed-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-03-03 20:49:21 +09:00
Masahiro Yamada 2a86f66121 kbuild: use KBUILD_DEFCONFIG as the fallback for DEFCONFIG_LIST
Most of the Kconfig commands (except defconfig and all*config) read
the .config file as a base set of CONFIG options.

When it does not exist, the files in DEFCONFIG_LIST are searched in
this order and loaded if found.

I do not see much sense in the last two lines in DEFCONFIG_LIST.

[1] ARCH_DEFCONFIG

The entry for DEFCONFIG_LIST is guarded by 'depends on !UML'. So, the
ARCH_DEFCONFIG definition in arch/x86/um/Kconfig is meaningless.

arch/{sh,sparc,x86}/Kconfig define ARCH_DEFCONFIG depending on 32 or
64 bit variant symbols. This is a little bit strange; ARCH_DEFCONFIG
should be a fixed string because the base config file is loaded before
the symbol evaluation stage.

Using KBUILD_DEFCONFIG makes more sense because it is fixed before
Kconfig is invoked. Fortunately, arch/{sh,sparc,x86}/Makefile define it
in the same way, and it works as expected. Hence, replace ARCH_DEFCONFIG
with "arch/$(SRCARCH)/configs/$(KBUILD_DEFCONFIG)".

[2] arch/$(ARCH)/defconfig

This file path is no longer valid. The defconfig files are always located
in the arch configs/ directories.

  $ find arch -name defconfig | sort
  arch/alpha/configs/defconfig
  arch/arm64/configs/defconfig
  arch/csky/configs/defconfig
  arch/nds32/configs/defconfig
  arch/riscv/configs/defconfig
  arch/s390/configs/defconfig
  arch/unicore32/configs/defconfig

The path arch/*/configs/defconfig is already covered by
"arch/$(SRCARCH)/configs/$(KBUILD_DEFCONFIG)". So, this file path is
not necessary.

I moved the default KBUILD_DEFCONFIG to the top Makefile. Otherwise,
the 7 architectures listed above would end up with endless loop of
syncconfig.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-03-03 20:49:21 +09:00
Linus Torvalds 91ad64a84e Tracing updates:
Change in API of bootconfig (before it comes live in a release)
   - Have a magic value "BOOTCONFIG" in initrd to know a bootconfig exists
   - Set CONFIG_BOOT_CONFIG to 'n' by default
   - Show error if "bootconfig" on cmdline but not compiled in
   - Prevent redefining the same value
   - Have a way to append values
   - Added a SELECT BLK_DEV_INITRD to fix a build failure
 
  Synthetic event fixes:
   - Switch to raw_smp_processor_id() for recording CPU value in preempt
     section. (No care for what the value actually is)
   - Fix samples always recording u64 values
   - Fix endianess
   - Check number of values matches number of fields
   - Fix a printing bug
 
  Fix of trace_printk() breaking postponed start up tests
 
  Make a function static that is only used in a single file.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXlW4vxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qtioAP0WLEm3dWO0z3321h/a0DSshC+Bslu3
 HDPTsGVGrXmvggEA/lr1ikRHd8PsO7zW8BfaZMxoXaTqXiuSrzEWxnMlFw0=
 =O8PM
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing and bootconfig updates:
 "Fixes and changes to bootconfig before it goes live in a release.

  Change in API of bootconfig (before it comes live in a release):
  - Have a magic value "BOOTCONFIG" in initrd to know a bootconfig
    exists
  - Set CONFIG_BOOT_CONFIG to 'n' by default
  - Show error if "bootconfig" on cmdline but not compiled in
  - Prevent redefining the same value
  - Have a way to append values
  - Added a SELECT BLK_DEV_INITRD to fix a build failure

  Synthetic event fixes:
  - Switch to raw_smp_processor_id() for recording CPU value in preempt
    section. (No care for what the value actually is)
  - Fix samples always recording u64 values
  - Fix endianess
  - Check number of values matches number of fields
  - Fix a printing bug

  Fix of trace_printk() breaking postponed start up tests

  Make a function static that is only used in a single file"

* tag 'trace-v5.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  bootconfig: Fix CONFIG_BOOTTIME_TRACING dependency issue
  bootconfig: Add append value operator support
  bootconfig: Prohibit re-defining value on same key
  bootconfig: Print array as multiple commands for legacy command line
  bootconfig: Reject subkey and value on same parent key
  tools/bootconfig: Remove unneeded error message silencer
  bootconfig: Add bootconfig magic word for indicating bootconfig explicitly
  bootconfig: Set CONFIG_BOOT_CONFIG=n by default
  tracing: Clear trace_state when starting trace
  bootconfig: Mark boot_config_checksum() static
  tracing: Disable trace_printk() on post poned tests
  tracing: Have synthetic event test use raw_smp_processor_id()
  tracing: Fix number printing bug in print_synth_event()
  tracing: Check that number of vals matches number of synth event fields
  tracing: Make synth_event trace functions endian-correct
  tracing: Make sure synth_event_trace() example always uses u64
2020-02-26 10:34:42 -08:00
Masami Hiramatsu 2910b5aa6f bootconfig: Fix CONFIG_BOOTTIME_TRACING dependency issue
Since commit d8a953ddde ("bootconfig: Set CONFIG_BOOT_CONFIG=n by
default") also changed the CONFIG_BOOTTIME_TRACING to select
CONFIG_BOOT_CONFIG to show the boot-time tracing on the menu,
it introduced wrong dependencies with BLK_DEV_INITRD as below.

WARNING: unmet direct dependencies detected for BOOT_CONFIG
  Depends on [n]: BLK_DEV_INITRD [=n]
  Selected by [y]:
  - BOOTTIME_TRACING [=y] && TRACING_SUPPORT [=y] && FTRACE [=y] && TRACING [=y]

This makes the CONFIG_BOOT_CONFIG selects CONFIG_BLK_DEV_INITRD to
fix this error and make CONFIG_BOOTTIME_TRACING=n by default, so
that both boot-time tracing and boot configuration off but those
appear on the menu list.

Link: http://lkml.kernel.org/r/158264140162.23842.11237423518607465535.stgit@devnote2

Fixes: d8a953ddde ("bootconfig: Set CONFIG_BOOT_CONFIG=n by default")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Compiled-tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-02-25 19:07:58 -05:00
Masami Hiramatsu 85c46b78da bootconfig: Add bootconfig magic word for indicating bootconfig explicitly
Add bootconfig magic word to the end of bootconfig on initrd
image for indicating explicitly the bootconfig is there.
Also tools/bootconfig treats wrong size or wrong checksum or
parse error as an error, because if there is a bootconfig magic
word, there must be a bootconfig.

The bootconfig magic word is "#BOOTCONFIG\n", 12 bytes word.
Thus the block image of the initrd file with bootconfig is
as follows.

[Initrd][bootconfig][size][csum][#BOOTCONFIG\n]

Link: http://lkml.kernel.org/r/158220112263.26565.3944814205960612841.stgit@devnote2

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-02-20 17:52:34 -05:00
Masami Hiramatsu d8a953ddde bootconfig: Set CONFIG_BOOT_CONFIG=n by default
Set CONFIG_BOOT_CONFIG=n by default. This also warns
user if CONFIG_BOOT_CONFIG=n but "bootconfig" is given
in the kernel command line.

Link: http://lkml.kernel.org/r/158220111291.26565.9036889083940367969.stgit@devnote2

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-02-20 17:52:12 -05:00
Linus Torvalds 61a7595403 Various fixes:
- Fix an uninitialized variable
 
  - Fix compile bug to bootconfig userspace tool (in tools directory)
 
  - Suppress some error messages of bootconfig userspace tool
 
  - Remove unneded CONFIG_LIBXBC from bootconfig
 
  - Allocate bootconfig xbc_nodes dynamically.
    To ease complaints about taking up static memory at boot up
 
  - Use of parse_args() to parse bootconfig instead of strstr() usage
    Prevents issues of double quotes containing the interested string
 
  - Fix missing ring_buffer_nest_end() on synthetic event error path
 
  - Return zero not -EINVAL on soft disabled synthetic event
    (soft disabling must be the same as hard disabling, which returns zero)
 
  - Consolidate synthetic event code (remove duplicate code)
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXkMTwRQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qnJQAQD5aKiD6jx+zGtLsFTuZMcEGvhhUuJ6
 oUaSvhKO3UqezwD/V5avKuuC0wWt//gOWDY+0+4QNjmvn1GLnaNYohs6fg0=
 =NLT9
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "Various fixes:

   - Fix an uninitialized variable

   - Fix compile bug to bootconfig userspace tool (in tools directory)

   - Suppress some error messages of bootconfig userspace tool

   - Remove unneded CONFIG_LIBXBC from bootconfig

   - Allocate bootconfig xbc_nodes dynamically. To ease complaints about
     taking up static memory at boot up

   - Use of parse_args() to parse bootconfig instead of strstr() usage
     Prevents issues of double quotes containing the interested string

   - Fix missing ring_buffer_nest_end() on synthetic event error path

   - Return zero not -EINVAL on soft disabled synthetic event (soft
     disabling must be the same as hard disabling, which returns zero)

   - Consolidate synthetic event code (remove duplicate code)"

* tag 'trace-v5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Consolidate trace() functions
  tracing: Don't return -EINVAL when tracing soft disabled synth events
  tracing: Add missing nest end to synth_event_trace_start() error case
  tools/bootconfig: Suppress non-error messages
  bootconfig: Allocate xbc_nodes array dynamically
  bootconfig: Use parse_args() to find bootconfig and '--'
  tracing/kprobe: Fix uninitialized variable bug
  bootconfig: Remove unneeded CONFIG_LIBXBC
  tools/bootconfig: Fix wrong __VA_ARGS__ usage
2020-02-11 16:39:18 -08:00
Masami Hiramatsu 26445f98ea bootconfig: Remove unneeded CONFIG_LIBXBC
Since there is no user except CONFIG_BOOT_CONFIG and no plan
to use it from other functions, CONFIG_LIBXBC can be removed
and we can use CONFIG_BOOT_CONFIG directly.

Link: http://lkml.kernel.org/r/158098769281.939.16293492056419481105.stgit@devnote2

Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-02-10 12:07:42 -05:00
Linus Torvalds e310396bb8 Tracing updates:
- Added new "bootconfig".
    Looks for a file appended to initrd to add boot config options.
    This has been discussed thoroughly at Linux Plumbers.
    Very useful for adding kprobes at bootup.
    Only enabled if "bootconfig" is on the real kernel command line.
 
  - Created dynamic event creation.
    Merges common code between creating synthetic events and
      kprobe events.
 
  - Rename perf "ring_buffer" structure to "perf_buffer"
 
  - Rename ftrace "ring_buffer" structure to "trace_buffer"
    Had to rename existing "trace_buffer" to "array_buffer"
 
  - Allow trace_printk() to work withing (some) tracing code.
 
  - Sort of tracing configs to be a little better organized
 
  - Fixed bug where ftrace_graph hash was not being protected properly
 
  - Various other small fixes and clean ups
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXjtAURQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qshOAQDzopQmvAVrrI6oogghr8JQA30Z2yqT
 i+Ld7vPWL2MV9wEA1S+zLGDSYrj8f/vsCq6BxRYT1ApO+YtmY6LTXiUejwg=
 =WNds
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:

 - Added new "bootconfig".

   This looks for a file appended to initrd to add boot config options,
   and has been discussed thoroughly at Linux Plumbers.

   Very useful for adding kprobes at bootup.

   Only enabled if "bootconfig" is on the real kernel command line.

 - Created dynamic event creation.

   Merges common code between creating synthetic events and kprobe
   events.

 - Rename perf "ring_buffer" structure to "perf_buffer"

 - Rename ftrace "ring_buffer" structure to "trace_buffer"

   Had to rename existing "trace_buffer" to "array_buffer"

 - Allow trace_printk() to work withing (some) tracing code.

 - Sort of tracing configs to be a little better organized

 - Fixed bug where ftrace_graph hash was not being protected properly

 - Various other small fixes and clean ups

* tag 'trace-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (88 commits)
  bootconfig: Show the number of nodes on boot message
  tools/bootconfig: Show the number of bootconfig nodes
  bootconfig: Add more parse error messages
  bootconfig: Use bootconfig instead of boot config
  ftrace: Protect ftrace_graph_hash with ftrace_sync
  ftrace: Add comment to why rcu_dereference_sched() is open coded
  tracing: Annotate ftrace_graph_notrace_hash pointer with __rcu
  tracing: Annotate ftrace_graph_hash pointer with __rcu
  bootconfig: Only load bootconfig if "bootconfig" is on the kernel cmdline
  tracing: Use seq_buf for building dynevent_cmd string
  tracing: Remove useless code in dynevent_arg_pair_add()
  tracing: Remove check_arg() callbacks from dynevent args
  tracing: Consolidate some synth_event_trace code
  tracing: Fix now invalid var_ref_vals assumption in trace action
  tracing: Change trace_boot to use synth_event interface
  tracing: Move tracing selftests to bottom of menu
  tracing: Move mmio tracer config up with the other tracers
  tracing: Move tracing test module configs together
  tracing: Move all function tracing configs together
  tracing: Documentation for in-kernel synthetic event API
  ...
2020-02-06 07:12:11 +00:00
Linus Torvalds fad7bdc9b0 This pull request contains the following changes for UML:
- Fix for time travel mode
 - Disable CONFIG_CONSTRUCTORS again
 - A new command line option to have an non-raw serial line
 - Preparations to remove obsolete UML network drivers
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAl4k2EYWHHJpY2hhcmRA
 c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wTe2EACDEsoWZvvKnocFH/umFfZdxciU
 Ys5noEPElnILVIwV+Gm9SHq/RQWzG8BqSOirfOn1iGhEqWjDTPzwqPuqFGxKtRVp
 VoaYDA506oDH903i4vj1OuGDHxgModEmR/GFqU9uEtXUws2qbeZQcG0COkquJU8X
 URMz4XB+KLqDI2TvOTnbWevjJnslwLIqRuDdZ2q0d685J1XhRhuq/srgZGMiUpGn
 4H/E4k0UxlC082oh9QWRFYYyc6vhyvlguupphzBgICZQmP4P4ck3pe23OT+vOWBl
 +e2ti9MlB9/Tv3dGhzmq2180U0D74RvtHIi7RjUdaTcEoOkgDwXqKsZ1CY4kCV78
 mxrXHCE6YUMvsQcTBxobXYD/zUXeqXtlSHyGQ4MUATCvI6ag8vWKWjGXV/kDVWdf
 FEeL0O6AHjruTrPxi1aSJ3TFG+JerXCGZpSt2DG67sCcWJ/RqYnrs45DF4U6ywf4
 BQ/nA0bpdZouLrhtCS6yBRvPiA5TVXHmrQMpK/LsOpBD4sKCV+MXghbYoWAwcSoM
 H+RSpf1em3zQrlRcuNPW8XGVkqOmUKn9pFzT9ybWv0h2hVhrDiutjJEPgbpJooIr
 yB0G/MVTtk3Xrok2lq8TT+Hp13TWCTFynsmKYvgv4s37p5jA5fvKL0vhdhIlAxHE
 FCyGsZIkAcMLfjvC3Q==
 =yi/o
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml

Pull UML updates from Anton Ivanov:
 "I am sending this on behalf of Richard who is traveling.

  This contains the following changes for UML:

   - Fix for time travel mode

   - Disable CONFIG_CONSTRUCTORS again

   - A new command line option to have an non-raw serial line

   - Preparations to remove obsolete UML network drivers"

* tag 'for-linus-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  um: Fix time-travel=inf-cpu with xor/raid6
  Revert "um: Enable CONFIG_CONSTRUCTORS"
  um: Mark non-vector net transports as obsolete
  um: Add an option to make serial driver non-raw
2020-01-28 18:29:25 -08:00
Linus Torvalds bd2463ac7d Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from David Miller:

 1) Add WireGuard

 2) Add HE and TWT support to ath11k driver, from John Crispin.

 3) Add ESP in TCP encapsulation support, from Sabrina Dubroca.

 4) Add variable window congestion control to TIPC, from Jon Maloy.

 5) Add BCM84881 PHY driver, from Russell King.

 6) Start adding netlink support for ethtool operations, from Michal
    Kubecek.

 7) Add XDP drop and TX action support to ena driver, from Sameeh
    Jubran.

 8) Add new ipv4 route notifications so that mlxsw driver does not have
    to handle identical routes itself. From Ido Schimmel.

 9) Add BPF dynamic program extensions, from Alexei Starovoitov.

10) Support RX and TX timestamping in igc, from Vinicius Costa Gomes.

11) Add support for macsec HW offloading, from Antoine Tenart.

12) Add initial support for MPTCP protocol, from Christoph Paasch,
    Matthieu Baerts, Florian Westphal, Peter Krystad, and many others.

13) Add Octeontx2 PF support, from Sunil Goutham, Geetha sowjanya, Linu
    Cherian, and others.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1469 commits)
  net: phy: add default ARCH_BCM_IPROC for MDIO_BCM_IPROC
  udp: segment looped gso packets correctly
  netem: change mailing list
  qed: FW 8.42.2.0 debug features
  qed: rt init valid initialization changed
  qed: Debug feature: ilt and mdump
  qed: FW 8.42.2.0 Add fw overlay feature
  qed: FW 8.42.2.0 HSI changes
  qed: FW 8.42.2.0 iscsi/fcoe changes
  qed: Add abstraction for different hsi values per chip
  qed: FW 8.42.2.0 Additional ll2 type
  qed: Use dmae to write to widebus registers in fw_funcs
  qed: FW 8.42.2.0 Parser offsets modified
  qed: FW 8.42.2.0 Queue Manager changes
  qed: FW 8.42.2.0 Expose new registers and change windows
  qed: FW 8.42.2.0 Internal ram offsets modifications
  MAINTAINERS: Add entry for Marvell OcteonTX2 Physical Function driver
  Documentation: net: octeontx2: Add RVU HW and drivers overview
  octeontx2-pf: ethtool RSS config support
  octeontx2-pf: Add basic ethtool support
  ...
2020-01-28 16:02:33 -08:00
Linus Torvalds 8b561778f2 Merge branch 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool updates from Ingo Molnar:
 "The main changes are to move the ORC unwind table sorting from early
  init to build-time - this speeds up booting.

  No change in functionality intended"

* 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/unwind/orc: Fix !CONFIG_MODULES build warning
  x86/unwind/orc: Remove boot-time ORC unwind tables sorting
  scripts/sorttable: Implement build-time ORC unwind table sorting
  scripts/sorttable: Rename 'sortextable' to 'sorttable'
  scripts/sortextable: Refactor the do_func() function
  scripts/sortextable: Remove dead code
  scripts/sortextable: Clean up the code to meet the kernel coding style better
  scripts/sortextable: Rewrite error/success handling
2020-01-28 08:38:25 -08:00
Masami Hiramatsu 0947db01d9 bootconfig: Fix Kconfig help message for BOOT_CONFIG
Fix Kconfig help message since the bootconfig file is
only available to be appended to initramfs. And also
add a reference to the documentation.

Link: http://lkml.kernel.org/r/157949058031.25888.18399447161895787505.stgit@devnote2

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-21 18:24:08 -05:00
Johannes Berg 87c9366e17 Revert "um: Enable CONFIG_CONSTRUCTORS"
This reverts commit 786b2384bf ("um: Enable CONFIG_CONSTRUCTORS").

There are two issues with this commit, uncovered by Anton in tests
on some (Debian) systems:

1) I completely forgot to call any constructors if CONFIG_CONSTRUCTORS
   isn't set. Don't recall now if it just wasn't needed on my system, or
   if I never tested this case.

2) With that fixed, it works - with CONFIG_CONSTRUCTORS *unset*. If I
   set CONFIG_CONSTRUCTORS, it fails again, which isn't totally
   unexpected since whatever wanted to run is likely to have to run
   before the kernel init etc. that calls the constructors in this case.

Basically, some constructors that gcc emits (libc has?) need to run
very early during init; the failure mode otherwise was that the ptrace
fork test already failed:

----------------------
$ ./linux mem=512M
Core dump limits :
	soft - 0
	hard - NONE
Checking that ptrace can change system call numbers...check_ptrace : child exited with exitcode 6, while expecting 0; status 0x67f
Aborted
----------------------

Thinking more about this, it's clear that we simply cannot support
CONFIG_CONSTRUCTORS in UML. All the cases we need now (gcov, kasan)
involve not use of the __attribute__((constructor)), but instead
some constructor code/entry generated by gcc. Therefore, we cannot
distinguish between kernel constructors and system constructors.

Thus, revert this commit.

Cc: stable@vger.kernel.org [5.4+]
Fixes: 786b2384bf ("um: Enable CONFIG_CONSTRUCTORS")
Reported-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.co.uk>

Signed-off-by: Richard Weinberger <richard@nod.at>
2020-01-19 22:42:06 +01:00
Thomas Gleixner 660fd04f93 lib/vdso: Prepare for time namespace support
To support time namespaces in the vdso with a minimal impact on regular non
time namespace affected tasks, the namespace handling needs to be hidden in
a slow path.

The most obvious place is vdso_seq_begin(). If a task belongs to a time
namespace then the VVAR page which contains the system wide vdso data is
replaced with a namespace specific page which has the same layout as the
VVAR page. That page has vdso_data->seq set to 1 to enforce the slow path
and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time
namespace handling path.

The extra check in the case that vdso_data->seq is odd, e.g. a concurrent
update of the vdso data is in progress, is not really affecting regular
tasks which are not part of a time namespace as the task is spin waiting
for the update to finish and vdso_data->seq to become even again.

If a time namespace task hits that code path, it invokes the corresponding
time getter function which retrieves the real VVAR page, reads host time
and then adds the offset for the requested clock which is stored in the
special VVAR page.

If VDSO time namespace support is disabled the whole magic is compiled out.

Initial testing shows that the disabled case is almost identical to the
host case which does not take the slow timens path. With the special timens
page installed the performance hit is constant time and in the range of
5-7%.

For the vdso functions which are not using the sequence count an
unconditional check for vdso_data->clock_mode is added which switches to
the real vdso when the clock_mode is VCLOCK_TIMENS.

[avagin: Make do_hres_timens() work with raw clocks too: choose vdso_data
 pointer by CS_RAW offset.]

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-21-dima@arista.com
2020-01-14 12:20:57 +01:00
Andrei Vagin 769071ac9f ns: Introduce Time Namespace
Time Namespace isolates clock values.

The kernel provides access to several clocks CLOCK_REALTIME,
CLOCK_MONOTONIC, CLOCK_BOOTTIME, etc.

CLOCK_REALTIME
      System-wide clock that measures real (i.e., wall-clock) time.

CLOCK_MONOTONIC
      Clock that cannot be set and represents monotonic time since
      some unspecified starting point.

CLOCK_BOOTTIME
      Identical to CLOCK_MONOTONIC, except it also includes any time
      that the system is suspended.

For many users, the time namespace means the ability to changes date and
time in a container (CLOCK_REALTIME). Providing per namespace notions of
CLOCK_REALTIME would be complex with a massive overhead, but has a dubious
value.

But in the context of checkpoint/restore functionality, monotonic and
boottime clocks become interesting. Both clocks are monotonic with
unspecified starting points. These clocks are widely used to measure time
slices and set timers. After restoring or migrating processes, it has to be
guaranteed that they never go backward. In an ideal case, the behavior of
these clocks should be the same as for a case when a whole system is
suspended. All this means that it is required to set CLOCK_MONOTONIC and
CLOCK_BOOTTIME clocks, which can be achieved by adding per-namespace
offsets for clocks.

A time namespace is similar to a pid namespace in the way how it is
created: unshare(CLONE_NEWTIME) system call creates a new time namespace,
but doesn't set it to the current process. Then all children of the process
will be born in the new time namespace, or a process can use the setns()
system call to join a namespace.

This scheme allows setting clock offsets for a namespace, before any
processes appear in it.

All available clone flags have been used, so CLONE_NEWTIME uses the highest
bit of CSIGNAL. It means that it can be used only with the unshare() and
the clone3() system calls.

[ tglx: Adjusted paragraph about clone3() to reality and massaged the
  	changelog a bit. ]

Co-developed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://criu.org/Time_namespace
Link: https://lists.openvz.org/pipermail/criu/2018-June/041504.html
Link: https://lore.kernel.org/r/20191112012724.250792-4-dima@arista.com
2020-01-14 12:20:48 +01:00
Masami Hiramatsu 7684b8582c bootconfig: Load boot config from the tail of initrd
Load the extended boot config data from the tail of initrd
image. If there is an SKC data there, it has
[(u32)size][(u32)checksum] header (in really, this is a
footer) at the end of initrd. If the checksum (simple sum
of bytes) is match, this starts parsing it from there.

Link: http://lkml.kernel.org/r/157867222435.17873.9936667353335606867.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13 13:19:39 -05:00
Masami Hiramatsu 76db5a27a8 bootconfig: Add Extra Boot Config support
Extra Boot Config (XBC) allows admin to pass a tree-structured
boot configuration file when boot up the kernel. This extends
the kernel command line in an efficient way.

Boot config will contain some key-value commands, e.g.

key.word = value1
another.key.word = value2

It can fold same keys with braces, also you can write array
data. For example,

key {
   word1 {
      setting1 = data
      setting2
   }
   word2.array = "val1", "val2"
}

User can access these key-value pair and tree structure via
SKC APIs.

Link: http://lkml.kernel.org/r/157867221257.17873.1775090991929862549.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13 13:19:38 -05:00
Shile Zhang 1091670637 scripts/sorttable: Rename 'sortextable' to 'sorttable'
Use a more generic name for additional table sorting usecases,
such as the upcoming ORC table sorting feature. This tool is
not tied to exception table sorting anymore.

No functional changes intended.

[ mingo: Rewrote the changelog. ]

Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: linux-kbuild@vger.kernel.org
Link: https://lkml.kernel.org/r/20191204004633.88660-6-shile.zhang@linux.alibaba.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-13 10:47:58 +01:00
Daniel Borkmann 81c22041d9 bpf, x86, arm64: Enable jit by default when not built as always-on
After Spectre 2 fix via 290af86629 ("bpf: introduce BPF_JIT_ALWAYS_ON
config") most major distros use BPF_JIT_ALWAYS_ON configuration these days
which compiles out the BPF interpreter entirely and always enables the
JIT. Also given recent fix in e1608f3fa8 ("bpf: Avoid setting bpf insns
pages read-only when prog is jited"), we additionally avoid fragmenting
the direct map for the BPF insns pages sitting in the general data heap
since they are not used during execution. Latter is only needed when run
through the interpreter.

Since both x86 and arm64 JITs have seen a lot of exposure over the years,
are generally most up to date and maintained, there is more downside in
!BPF_JIT_ALWAYS_ON configurations to have the interpreter enabled by default
rather than the JIT. Add a ARCH_WANT_DEFAULT_BPF_JIT config which archs can
use to set the bpf_jit_{enable,kallsyms} to 1. Back in the days the
bpf_jit_kallsyms knob was set to 0 by default since major distros still
had /proc/kallsyms addresses exposed to unprivileged user space which is
not the case anymore. Hence both knobs are set via BPF_JIT_DEFAULT_ON which
is set to 'y' in case of BPF_JIT_ALWAYS_ON or ARCH_WANT_DEFAULT_BPF_JIT.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/f78ad24795c2966efcc2ee19025fa3459f622185.1575903816.git.daniel@iogearbox.net
2019-12-11 16:16:01 -08:00
Krzysztof Kozlowski e8cf4e9ca0 init/Kconfig: fix indentation
Adjust indentation from spaces to tab (+optional two spaces) as in
coding style with command like:
	$ sed -e 's/^        /	/' -i */Kconfig

Link: http://lkml.kernel.org/r/1574306670-30234-1-git-send-email-krzk@kernel.org
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Jiri Kosina <trivial@kernel.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-04 19:44:13 -08:00
Linus Torvalds 76bb8b0596 Kbuild updates for v5.5
- remove unneeded asm headers from hexagon, ia64
 
  - add 'dir-pkg' target, which works like 'tar-pkg' but skips archiving
 
  - add 'helpnewconfig' target, which shows help for new CONFIG options
 
  - support 'make nsdeps' for external modules
 
  - make rebuilds faster by deleting $(wildcard $^) checks
 
  - remove compile tests for kernel-space headers
 
  - refactor modpost to simplify modversion handling
 
  - make single target builds faster
 
  - optimize and clean up scripts/kallsyms.c
 
  - refactor various Makefiles and scripts
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAl3lKCUVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGu9sP/iTW/RjDxbAsu0aP8jFqzLK/xKB/
 NQn/+dD76TjEmjgew9AXszf2rJL+ixKVymGM08FV59Bbguvi8XmAB/QXK21Sjb5j
 rVl3N97TWNkvXM+QJyly23G2UtbubRSPo3g+e70BZrw3lcmrsK+sAmTOL5KtIrNX
 9BHM803JwqsMJyvBwTBBw3UFeeBqb38Qx6gmigfSihuDf6pvjoVDKskpsDno3wX7
 rdiXYxAsKQLQ/P2ym/bV/Oqe90RqRtV/2/WCpLshlwHkiM9huflv6GjgCkkbAx5H
 N3TSptlS7l/2B/XKHgA5ALjHjUlxTGBzLLoevarCd8loKcQXFlgx+vd3nM/WJlHJ
 x9UpTklDwGP9eUBsa9W980tEyUVsFGMAC8EcTdW6NN2IRtuCOSA5N2FYYt8/SDd0
 2b3PhElTJIp4pTWSYN6JZxB1R8n/YBgxLqOJ6N2U6B9CdKFUCHlwGH23QfN89km/
 WEMP85bsaab/dnyxbwelkoYYYyPgUHsC13AbpkHdrDxMbAGO+G1PwpHxC6ErF2en
 wRGrcUxWTfHRykO5aJIQtCB9b1fv73134mTzB5fTYd6GtjepGBSBCO9xb2Iy4sc9
 Y+nHVVDUrihvSOpJgqh677PcLDutOZR8fFCoc1ZMDAbBsDvrb0Qsee6oEidj98xc
 5kXp9YZh/tdh/tdo
 =zUaB
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - remove unneeded asm headers from hexagon, ia64

 - add 'dir-pkg' target, which works like 'tar-pkg' but skips archiving

 - add 'helpnewconfig' target, which shows help for new CONFIG options

 - support 'make nsdeps' for external modules

 - make rebuilds faster by deleting $(wildcard $^) checks

 - remove compile tests for kernel-space headers

 - refactor modpost to simplify modversion handling

 - make single target builds faster

 - optimize and clean up scripts/kallsyms.c

 - refactor various Makefiles and scripts

* tag 'kbuild-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (59 commits)
  MAINTAINERS: update Kbuild/Kconfig maintainer's email address
  scripts/kallsyms: remove redundant initializers
  scripts/kallsyms: put check_symbol_range() calls close together
  scripts/kallsyms: make check_symbol_range() void function
  scripts/kallsyms: move ignored symbol types to is_ignored_symbol()
  scripts/kallsyms: move more patterns to the ignored_prefixes array
  scripts/kallsyms: skip ignored symbols very early
  scripts/kallsyms: add const qualifiers where possible
  scripts/kallsyms: make find_token() return (unsigned char *)
  scripts/kallsyms: replace prefix_underscores_count() with strspn()
  scripts/kallsyms: add sym_name() to mitigate cast ugliness
  scripts/kallsyms: remove unneeded length check for prefix matching
  scripts/kallsyms: remove redundant is_arm_mapping_symbol()
  scripts/kallsyms: set relative_base more effectively
  scripts/kallsyms: shrink table before sorting it
  scripts/kallsyms: fix definitely-lost memory leak
  scripts/kallsyms: remove unneeded #ifndef ARRAY_SIZE
  kbuild: make single target builds even faster
  modpost: respect the previous export when 'exported twice' is warned
  modpost: do not set ->preloaded for symbols from Module.symvers
  ...
2019-12-02 17:35:04 -08:00
Linus Torvalds ad0b314e00 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull sysctl system call removal from Eric Biederman:
 "As far as I can tell we have reached the point where no one enables
  the sysctl system call anymore. It still is enabled in a few
  defconfigs but they are mostly the rarely used one and in asking
  people about that it was more cut & paste enabled than anything else.

  This is single commit that just deletes code. Leaving just enough code
  so that the deprecated sysctl warning continues to be printed. If my
  analysis turns out to be wrong and someone actually cares it will be
  easy to revert this commit and have the system call again.

  There was one new xtensa defconfig in linux-next that enabled the
  system call this cycle and when asked about it the maintainer of the
  code replied that it was not enabled on purpose. As of today's
  linux-next tree that defconfig no longer enables the system call.

  What we saw in the review discussion was that if we go a step farther
  than my patch and mess with uapi headers there are pieces of code that
  won't compile, but nothing minds the system call actually disappearing
  from the kernel"

Link: https://lore.kernel.org/lkml/201910011140.EA0181F13@keescook/

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  sysctl: Remove the sysctl system call
2019-12-01 13:26:18 -08:00
Eric W. Biederman 61a47c1ad3 sysctl: Remove the sysctl system call
This system call has been deprecated almost since it was introduced, and
in a survey of the linux distributions I can no longer find any of them
that enable CONFIG_SYSCTL_SYSCALL.  The only indication that I can find
that anyone might care is that a few of the defconfigs in the kernel
enable CONFIG_SYSCTL_SYSCALL.  However this appears in only 31 of 414
defconfigs in the kernel, so I suspect this symbols presence is simply
because it is harmless to include rather than because it is necessary.

As there appear to be no users of the sysctl system call, remove the
code.  As this removes one of the few uses of the internal kernel mount
of proc I hope this allows for even more simplifications of the proc
filesystem.

Cc: Alex Smith <alex.smith@imgtec.com>
Cc: Anders Berg <anders.berg@lsi.com>
Cc: Apelete Seketeli <apelete@seketeli.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Chee Nouk Phoon <cnphoon@altera.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Christian Ruppert <christian.ruppert@abilis.com>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Harvey Hunt <harvey.hunt@imgtec.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hongliang Tao <taohl@lemote.com>
Cc: Hua Yan <yanh@lemote.com>
Cc: Huacai Chen <chenhc@lemote.com>
Cc: John Crispin <blogic@openwrt.org>
Cc: Jonas Jensen <jonas.jensen@gmail.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: Jun Nie <jun.nie@linaro.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Kevin Wells <kevin.wells@nxp.com>
Cc: Kumar Gala <galak@codeaurora.org>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Noam Camus <noamc@ezchip.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Phil Edworthy <phil.edworthy@renesas.com>
Cc: Pierrick Hascoet <pierrick.hascoet@abilis.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Roland Stigge <stigge@antcom.de>
Cc: Santosh Shilimkar <santosh.shilimkar@ti.com>
Cc: Scott Telford <stelford@cadence.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Steven J. Hill <Steven.Hill@imgtec.com>
Cc: Tanmay Inamdar <tinamdar@apm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Wolfram Sang <w.sang@pengutronix.de>
Acked-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2019-11-26 13:03:56 -06:00
Linus Torvalds 642356cb5f Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
 "API:
   - Add library interfaces of certain crypto algorithms for WireGuard
   - Remove the obsolete ablkcipher and blkcipher interfaces
   - Move add_early_randomness() out of rng_mutex

  Algorithms:
   - Add blake2b shash algorithm
   - Add blake2s shash algorithm
   - Add curve25519 kpp algorithm
   - Implement 4 way interleave in arm64/gcm-ce
   - Implement ciphertext stealing in powerpc/spe-xts
   - Add Eric Biggers's scalar accelerated ChaCha code for ARM
   - Add accelerated 32r2 code from Zinc for MIPS
   - Add OpenSSL/CRYPTOGRAMS poly1305 implementation for ARM and MIPS

  Drivers:
   - Fix entropy reading failures in ks-sa
   - Add support for sam9x60 in atmel
   - Add crypto accelerator for amlogic GXL
   - Add sun8i-ce Crypto Engine
   - Add sun8i-ss cryptographic offloader
   - Add a host of algorithms to inside-secure
   - Add NPCM RNG driver
   - add HiSilicon HPRE accelerator
   - Add HiSilicon TRNG driver"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (285 commits)
  crypto: vmx - Avoid weird build failures
  crypto: lib/chacha20poly1305 - use chacha20_crypt()
  crypto: x86/chacha - only unregister algorithms if registered
  crypto: chacha_generic - remove unnecessary setkey() functions
  crypto: amlogic - enable working on big endian kernel
  crypto: sun8i-ce - enable working on big endian
  crypto: mips/chacha - select CRYPTO_SKCIPHER, not CRYPTO_BLKCIPHER
  hwrng: ks-sa - Enable COMPILE_TEST
  crypto: essiv - remove redundant null pointer check before kfree
  crypto: atmel-aes - Change data type for "lastc" buffer
  crypto: atmel-tdes - Set the IV after {en,de}crypt
  crypto: sun4i-ss - fix big endian issues
  crypto: sun4i-ss - hide the Invalid keylen message
  crypto: sun4i-ss - use crypto_ahash_digestsize
  crypto: sun4i-ss - remove dependency on not 64BIT
  crypto: sun4i-ss - Fix 64-bit size_t warnings on sun4i-ss-hash.c
  MAINTAINERS: Add maintainer for HiSilicon SEC V2 driver
  crypto: hisilicon - add DebugFS for HiSilicon SEC
  Documentation: add DebugFS doc for HiSilicon SEC
  crypto: hisilicon - add SRIOV for HiSilicon SEC
  ...
2019-11-25 19:49:58 -08:00
Ard Biesheuvel c12d3362a7 int128: move __uint128_t compiler test to Kconfig
In order to use 128-bit integer arithmetic in C code, the architecture
needs to have declared support for it by setting ARCH_SUPPORTS_INT128,
and it requires a version of the toolchain that supports this at build
time. This is why all existing tests for ARCH_SUPPORTS_INT128 also test
whether __SIZEOF_INT128__ is defined, since this is only the case for
compilers that can support 128-bit integers.

Let's fold this additional test into the Kconfig declaration of
ARCH_SUPPORTS_INT128 so that we can also use the symbol in Makefiles,
e.g., to decide whether a certain object needs to be included in the
first place.

Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-11-17 09:02:42 +08:00
Masahiro Yamada fcbb8461fd kbuild: remove header compile test
There are both positive and negative options about this feature.
At first, I thought it was a good idea, but actually Linus stated a
negative opinion (https://lkml.org/lkml/2019/9/29/227). I admit it
is ugly and annoying.

The baseline I'd like to keep is the compile-test of uapi headers.
(Otherwise, kernel developers have no way to ensure the correctness
of the exported headers.)

I will maintain a small build rule in usr/include/Makefile.
Remove the other header test functionality.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-11-15 00:22:35 +09:00
Jens Axboe 561fb04a6a io_uring: replace workqueue usage with io-wq
Drop various work-arounds we have for workqueues:

- We no longer need the async_list for tracking sequential IO.

- We don't have to maintain our own mm tracking/setting.

- We don't need a separate workqueue for buffered writes. This didn't
  even work that well to begin with, as it was suboptimal for multiple
  buffered writers on multiple files.

- We can properly cancel pending interruptible work. This fixes
  deadlocks with particularly socket IO, where we cannot cancel them
  when the io_uring is closed. Hence the ring will wait forever for
  these requests to complete, which may never happen. This is different
  from disk IO where we know requests will complete in a finite amount
  of time.

- Due to being able to cancel work interruptible work that is already
  running, we can implement file table support for work. We need that
  for supporting system calls that add to a process file table.

- It gets us one step closer to adding async support for any system
  call.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-29 12:43:06 -06:00
Linus Torvalds aefcf2f4b5 Merge branch 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull kernel lockdown mode from James Morris:
 "This is the latest iteration of the kernel lockdown patchset, from
  Matthew Garrett, David Howells and others.

  From the original description:

    This patchset introduces an optional kernel lockdown feature,
    intended to strengthen the boundary between UID 0 and the kernel.
    When enabled, various pieces of kernel functionality are restricted.
    Applications that rely on low-level access to either hardware or the
    kernel may cease working as a result - therefore this should not be
    enabled without appropriate evaluation beforehand.

    The majority of mainstream distributions have been carrying variants
    of this patchset for many years now, so there's value in providing a
    doesn't meet every distribution requirement, but gets us much closer
    to not requiring external patches.

  There are two major changes since this was last proposed for mainline:

   - Separating lockdown from EFI secure boot. Background discussion is
     covered here: https://lwn.net/Articles/751061/

   -  Implementation as an LSM, with a default stackable lockdown LSM
      module. This allows the lockdown feature to be policy-driven,
      rather than encoding an implicit policy within the mechanism.

  The new locked_down LSM hook is provided to allow LSMs to make a
  policy decision around whether kernel functionality that would allow
  tampering with or examining the runtime state of the kernel should be
  permitted.

  The included lockdown LSM provides an implementation with a simple
  policy intended for general purpose use. This policy provides a coarse
  level of granularity, controllable via the kernel command line:

    lockdown={integrity|confidentiality}

  Enable the kernel lockdown feature. If set to integrity, kernel features
  that allow userland to modify the running kernel are disabled. If set to
  confidentiality, kernel features that allow userland to extract
  confidential information from the kernel are also disabled.

  This may also be controlled via /sys/kernel/security/lockdown and
  overriden by kernel configuration.

  New or existing LSMs may implement finer-grained controls of the
  lockdown features. Refer to the lockdown_reason documentation in
  include/linux/security.h for details.

  The lockdown feature has had signficant design feedback and review
  across many subsystems. This code has been in linux-next for some
  weeks, with a few fixes applied along the way.

  Stephen Rothwell noted that commit 9d1f8be5cf ("bpf: Restrict bpf
  when kernel lockdown is in confidentiality mode") is missing a
  Signed-off-by from its author. Matthew responded that he is providing
  this under category (c) of the DCO"

* 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (31 commits)
  kexec: Fix file verification on S390
  security: constify some arrays in lockdown LSM
  lockdown: Print current->comm in restriction messages
  efi: Restrict efivar_ssdt_load when the kernel is locked down
  tracefs: Restrict tracefs when the kernel is locked down
  debugfs: Restrict debugfs when the kernel is locked down
  kexec: Allow kexec_file() with appropriate IMA policy when locked down
  lockdown: Lock down perf when in confidentiality mode
  bpf: Restrict bpf when kernel lockdown is in confidentiality mode
  lockdown: Lock down tracing and perf kprobes when in confidentiality mode
  lockdown: Lock down /proc/kcore
  x86/mmiotrace: Lock down the testmmiotrace module
  lockdown: Lock down module params that specify hardware parameters (eg. ioport)
  lockdown: Lock down TIOCSSERIAL
  lockdown: Prohibit PCMCIA CIS storage when the kernel is locked down
  acpi: Disable ACPI table override if the kernel is locked down
  acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down
  ACPI: Limit access to custom_method when the kernel is locked down
  x86/msr: Restrict MSR access when the kernel is locked down
  x86: Lock down IO port access when the kernel is locked down
  ...
2019-09-28 08:14:15 -07:00
Linus Torvalds f1f2f614d5 Merge branch 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull integrity updates from Mimi Zohar:
 "The major feature in this time is IMA support for measuring and
  appraising appended file signatures. In addition are a couple of bug
  fixes and code cleanup to use struct_size().

  In addition to the PE/COFF and IMA xattr signatures, the kexec kernel
  image may be signed with an appended signature, using the same
  scripts/sign-file tool that is used to sign kernel modules.

  Similarly, the initramfs may contain an appended signature.

  This contained a lot of refactoring of the existing appended signature
  verification code, so that IMA could retain the existing framework of
  calculating the file hash once, storing it in the IMA measurement list
  and extending the TPM, verifying the file's integrity based on a file
  hash or signature (eg. xattrs), and adding an audit record containing
  the file hash, all based on policy. (The IMA support for appended
  signatures patch set was posted and reviewed 11 times.)

  The support for appended signature paves the way for adding other
  signature verification methods, such as fs-verity, based on a single
  system-wide policy. The file hash used for verifying the signature and
  the signature, itself, can be included in the IMA measurement list"

* 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
  ima: ima_api: Use struct_size() in kzalloc()
  ima: use struct_size() in kzalloc()
  sefltest/ima: support appended signatures (modsig)
  ima: Fix use after free in ima_read_modsig()
  MODSIGN: make new include file self contained
  ima: fix freeing ongoing ahash_request
  ima: always return negative code for error
  ima: Store the measurement again when appraising a modsig
  ima: Define ima-modsig template
  ima: Collect modsig
  ima: Implement support for module-style appended signatures
  ima: Factor xattr_verify() out of ima_appraise_measurement()
  ima: Add modsig appraise_type option for module-style appended signatures
  integrity: Select CONFIG_KEYS instead of depending on it
  PKCS#7: Introduce pkcs7_get_digest()
  PKCS#7: Refactor verify_pkcs7_signature()
  MODSIGN: Export module signature definitions
  ima: initialize the "template" field with the default template
2019-09-27 19:37:27 -07:00
Linus Torvalds e070355664 Modules updates for v5.4
Summary of modules changes for the 5.4 merge window:
 
 - Introduce exported symbol namespaces.
 
   This new feature allows subsystem maintainers to partition and
   categorize their exported symbols into explicit namespaces. Module
   authors are now required to import the namespaces they need.
 
   Some of the main motivations of this feature include: allowing kernel
   developers to better manage the export surface, allow subsystem
   maintainers to explicitly state that usage of some exported symbols
   should only be limited to certain users (think: inter-module or
   inter-driver symbols, debugging symbols, etc), as well as more easily
   limiting the availability of namespaced symbols to other parts of the
   kernel. With the module import requirement, it is also easier to spot
   the misuse of exported symbols during patch review. Two new macros are
   introduced: EXPORT_SYMBOL_NS() and EXPORT_SYMBOL_NS_GPL(). The API is
   thoroughly documented in Documentation/kbuild/namespaces.rst.
 
 - Some small code and kbuild cleanups here and there.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCgAGBQJdh3n8AAoJEMBFfjjOO8Fy94kP+QHZF39QDvLbxAzEYAETAS+o
 CFu6wix/DrAwFkTU/kX1eAsAwDBEz0xkMciR4BsLX3sIafUVERxtDXVAui/dA1+6
 zfw2c3ObyVwPEk6aUPFprgkj+08gxujsJFlYTsQQUhtRbmxg6R7hD6t6ANxiHaY2
 AQe5TzOWXoIa2hHO+7rPMqf8l6qiFCaL0s3v5jrmBXa5mHmc4PVy95h1J6xQVw2u
 b+SlvKeylHv+OtCtvthkAJS3hfS35J/1TNb/RNYIvh60IfEguEuFsGuQ9JiSSAZS
 pv1cJ+I5d4v8Y/md1rZpdjTJL9gCrq/UUC67+UkejCOn0C+7XM2eR4Bu/jWvdMSn
 ZQDHcPhFSIfmP7FaKomPogaBbw1sI1FvM5930pPJzHnyO9+cefBXe7rWaaB+y0At
 GAxOtmk1dKh01BT7YO/C0oVuX87csWd74NHypVsbs0TgQo5jBFdZRheyDrq5YB+s
 tVK+5H0nqQrCcfo/TvhcsZlgITTGtgTPenaW99/i7qNa9mRUtxC/VkE+aob6HNRF
 1iBxxopOTxGN8akyKOVumtkuTQH3EJfouZee//pWbXLzyDmScg/k67vuao8kxbyq
 NA1piFAGJAHFsHATxrbvNOq6jZ5bfUT8pwSTs83JppuR++8Hxk7zaShS3/LvsvHt
 6ist/epOwTZ7oiNQ04nj
 =72Uy
 -----END PGP SIGNATURE-----

Merge tag 'modules-for-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux

Pull modules updates from Jessica Yu:
 "The main bulk of this pull request introduces a new exported symbol
  namespaces feature. The number of exported symbols is increasingly
  growing with each release (we're at about 31k exports as of 5.3-rc7)
  and we currently have no way of visualizing how these symbols are
  "clustered" or making sense of this huge export surface.

  Namespacing exported symbols allows kernel developers to more
  explicitly partition and categorize exported symbols, as well as more
  easily limiting the availability of namespaced symbols to other parts
  of the kernel. For starters, we have introduced the USB_STORAGE
  namespace to demonstrate the API's usage. I have briefly summarized
  the feature and its main motivations in the tag below.

  Summary:

   - Introduce exported symbol namespaces.

     This new feature allows subsystem maintainers to partition and
     categorize their exported symbols into explicit namespaces. Module
     authors are now required to import the namespaces they need.

     Some of the main motivations of this feature include: allowing
     kernel developers to better manage the export surface, allow
     subsystem maintainers to explicitly state that usage of some
     exported symbols should only be limited to certain users (think:
     inter-module or inter-driver symbols, debugging symbols, etc), as
     well as more easily limiting the availability of namespaced symbols
     to other parts of the kernel.

     With the module import requirement, it is also easier to spot the
     misuse of exported symbols during patch review.

     Two new macros are introduced: EXPORT_SYMBOL_NS() and
     EXPORT_SYMBOL_NS_GPL(). The API is thoroughly documented in
     Documentation/kbuild/namespaces.rst.

   - Some small code and kbuild cleanups here and there"

* tag 'modules-for-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
  module: Remove leftover '#undef' from export header
  module: remove unneeded casts in cmp_name()
  module: move CONFIG_UNUSED_SYMBOLS to the sub-menu of MODULES
  module: remove redundant 'depends on MODULES'
  module: Fix link failure due to invalid relocation on namespace offset
  usb-storage: export symbols in USB_STORAGE namespace
  usb-storage: remove single-use define for debugging
  docs: Add documentation for Symbol Namespaces
  scripts: Coccinelle script for namespace dependencies.
  modpost: add support for generating namespace dependencies
  export: allow definition default namespaces in Makefiles or sources
  module: add config option MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS
  modpost: add support for symbol namespaces
  module: add support for symbol namespaces.
  export: explicitly align struct kernel_symbol
  module: support reading multiple values per modinfo tag
2019-09-22 10:34:46 -07:00
Linus Torvalds 9dca3432ee This pull request contains the following changes for UML:
- virtio support
 - Fixes for our new time travel mode
 - Various improvements to make lockdep and kasan work better
 - SPDX header updates
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAl2Fx9kWHHJpY2hhcmRA
 c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wa2kD/9UJ5JOe6yBeMfPO5Vv8vpJRc10
 0gS8qDbzfutrWddq1wUvEaNCIQY4NOf4tsqjauYHpTUA/0AWwruz++iyI9u3XWEQ
 0b+ZMhKXkws3UgPwWIxrgLr0106wz6Xuz6d36nqpAc6F4MJhC3LqUCC9yEp3hxMd
 pSF65ueQXp7NKfOAqqKU1m3FnfmyBTpsL5PpA6OEZn//kt/Qz5PhIjHpC3JwIBQb
 z0OUhE/6mmWb66wtqHIx4Zd2ybLLnsfby24q+1e8J2B+gcORxhubvgCIGY+PU98o
 EW3N4aMevUdgG9MJbnlZUgWeZ1bsByail2z8aFElRKefT2xkEnjxfQZgKahI6LnO
 jzLm9pk3RjTiZxvYkEbgRAjBkZD514M6FvOlyrHtLxMDfWE6/z71VKDqFjEyeIHQ
 QpDjwEjdJTxVHr4Ol+VnZe1lE5zXLNuCFT5qdPQBqyr8g151T7jwYXnGK2SqGo2D
 UQ6/KnaN+pgM7BaqcNtwciKk3Xjng0BDLfdZs7z8F3bzv53rg2mpQt5iPm+nWFPa
 aNt4B3FKXv3+YnjuSbi5NlvKKK9alRcvZTOk8jFjwOVmFJXlvMCzegZnuTxtqU+j
 XpwmUlsT6aMV7vPZN2ta7y1bjOijzZIjL0O7rP4Obxwfp3dTGGYX/T6vW8F2o9V6
 evyx/KSD6nqlY1bvwQ==
 =oxpp
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml

Pull UML updates from Richard Weinberger:

 - virtio support

 - fixes for our new time travel mode

 - various improvements to make lockdep and kasan work better

 - SPDX header updates

* tag 'for-linus-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: (25 commits)
  um: irq: Fix LAST_IRQ usage in init_IRQ()
  um: Add SPDX headers for files in arch/um/include
  um: Add SPDX headers for files in arch/um/os-Linux
  um: Add SPDX headers to files in arch/um/kernel/
  um: Add SPDX headers for files in arch/um/drivers
  um: virtio: Implement VHOST_USER_PROTOCOL_F_REPLY_ACK
  um: virtio: Implement VHOST_USER_PROTOCOL_F_SLAVE_REQ
  um: drivers: Add virtio vhost-user driver
  um: Use real DMA barriers
  um: Don't use generic barrier.h
  um: time-travel: Restrict time update in IRQ handler
  um: time-travel: Fix periodic timers
  um: Enable CONFIG_CONSTRUCTORS
  um: Place (soft)irq text with macros
  um: Fix VDSO compiler warning
  um: Implement TRACE_IRQFLAGS_SUPPORT
  um: Remove misleading #define ARCh_IRQ_ENABLED
  um: Avoid using uninitialized regs
  um: Remove sig_info[SIGALRM]
  um: Error handling fixes in vector drivers
  ...
2019-09-21 11:07:02 -07:00
Linus Torvalds 227c3e9eb5 Make use of gcc 9's "asm inline()" (Rasmus Villemoes):
gcc 9+ (and gcc 8.3, 7.5) provides a way to override the otherwise
     crude heuristic that gcc uses to estimate the size of the code
     represented by an asm() statement. From the gcc docs
 
       If you use 'asm inline' instead of just 'asm', then for inlining
       purposes the size of the asm is taken as the minimum size, ignoring
       how many instructions GCC thinks it is.
 
     For compatibility with older compilers, we obviously want a
 
       #if [understands asm inline]
       #define asm_inline asm inline
       #else
       #define asm_inline asm
       #endif
 
     But since we #define the identifier inline to attach some attributes,
     we have to use an alternate spelling of that keyword. gcc provides
     both __inline__ and __inline, and we currently #define both to inline,
     so they all have the same semantics. We have to free up one of
     __inline__ and __inline, and the latter is by far the easiest.
 
     The two x86 changes cause smaller code gen differences than I'd
     expect, but I think we do want the asm_inline thing available sooner
     or later, so this is just to get the ball rolling.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAl2D9iAACgkQGXyLc2ht
 IW31DxAA1ajlhPor7NccuhJrz3hXMkPfEBEaTEYp98a5XYGARqrRwMEDzB7j9O/t
 6BZ/ZD8/IehuoNjNP3E5IOnDfvP+a94WL25/hjpTN4aBuZKWJz0X7As8TJJ+bwQc
 v2Hyo+yqzcSEhCI7Mc34uo+TuPaFYEoKHvg+hhSXi4h7c5eqtGKCaB2286iEkk/6
 bAo7n6ogYN64wXjbVXePmpQWgVJG2tsz/blG0hHMISW5UTzWkK/hYZkSf6jdFGSN
 aft1l9EMGx5skAQwFfnDOgf805/TE4jliD2nvaZzT0f/UtYkGjx7C77dcFlPY9Mf
 9R1M01rCQ0KfxR5BqHPN/DbTrd+GzBvKswjrTIB+TwopEfy8yVY2uRIq1lP+aobD
 V3HOtdicgw3wB/fF40pjqoCp3ByawKLlzRpQuqdSmqHs0kS6ForbfLClg8riru6a
 VnJqOXkLlZXU/VysdxuCPbAqN/Rvf9YV3H25x8BOJJWsXa0RdotSbyPIpyPcUo5K
 NycxR/vrLJ8hQqEdOY9iKJP+IMGmAfOJD8fppG/Xsnav8c+NtHPAQlW1c8ee/x2g
 Dbm2AWPCZnrSrFiQ1mDUdc921d/sLTUC/nOHzR8FiZEJGmYVmYSSw3PekFjQW1aV
 TIqvghk15XLB7Ye8JE/eKmoNIBtOsftXpYBzI3716tX46bnPu+4=
 =+BVw
 -----END PGP SIGNATURE-----

Merge tag 'compiler-attributes-for-linus-v5.4' of git://github.com/ojeda/linux

Pull asm inline support from Miguel Ojeda:
 "Make use of gcc 9's "asm inline()" (Rasmus Villemoes):

  gcc 9+ (and gcc 8.3, 7.5) provides a way to override the otherwise
  crude heuristic that gcc uses to estimate the size of the code
  represented by an asm() statement. From the gcc docs

      If you use 'asm inline' instead of just 'asm', then for inlining
      purposes the size of the asm is taken as the minimum size, ignoring
      how many instructions GCC thinks it is.

  For compatibility with older compilers, we obviously want a

      #if [understands asm inline]
      #define asm_inline asm inline
      #else
      #define asm_inline asm
      #endif

  But since we #define the identifier inline to attach some attributes,
  we have to use an alternate spelling of that keyword. gcc provides
  both __inline__ and __inline, and we currently #define both to inline,
  so they all have the same semantics.

  We have to free up one of __inline__ and __inline, and the latter is
  by far the easiest.

  The two x86 changes cause smaller code gen differences than I'd
  expect, but I think we do want the asm_inline thing available sooner
  or later, so this is just to get the ball rolling"

* tag 'compiler-attributes-for-linus-v5.4' of git://github.com/ojeda/linux:
  x86: bug.h: use asm_inline in _BUG_FLAGS definitions
  x86: alternative.h: use asm_inline for all alternative variants
  compiler-types.h: add asm_inline definition
  compiler_types.h: don't #define __inline
  lib/zstd/mem.h: replace __inline by inline
  staging: rtl8723bs: replace __inline by inline
2019-09-21 09:47:19 -07:00
Linus Torvalds d7b0827f28 Kbuild updates for v5.4
- add modpost warn exported symbols marked as 'static' because 'static'
    and EXPORT_SYMBOL is an odd combination
 
  - break the build early if gold linker is used
 
  - optimize the Bison rule to produce .c and .h files by a single
    pattern rule
 
  - handle PREEMPT_RT in the module vermagic and UTS_VERSION
 
  - warn CONFIG options leaked to the user-space except existing ones
 
  - make single targets work properly
 
  - rebuild modules when module linker scripts are updated
 
  - split the module final link stage into scripts/Makefile.modfinal
 
  - fix the missed error code in merge_config.sh
 
  - improve the error message displayed on the attempt of the O= build
    in unclean source tree
 
  - remove 'clean-dirs' syntax
 
  - disable -Wimplicit-fallthrough warning for Clang
 
  - add CONFIG_CC_OPTIMIZE_FOR_SIZE_O3 for ARC
 
  - remove ARCH_{CPP,A,C}FLAGS variables
 
  - add $(BASH) to run bash scripts
 
  - change *CFLAGS_<basetarget>.o to take the relative path to $(obj)
    instead of the basename
 
  - stop suppressing Clang's -Wunused-function warnings when W=1
 
  - fix linux/export.h to avoid genksyms calculating CRC of trimmed
    exported symbols
 
  - misc cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQJSBAABCgA8FiEEbmPs18K1szRHjPqEPYsBB53g2wYFAl1+OnoeHHlhbWFkYS5t
 YXNhaGlyb0Bzb2Npb25leHQuY29tAAoJED2LAQed4NsGoKEQAKcid9lDacMe5KWT
 4Ic93hANMFKZ9Qy8WoxivnOr1a93NcloZ0Bhka96QUt7hYUkLmDCs99eMbxKuMfP
 m/ViHepojOBPzq+VtAGWOiIyPMCA7XDrTPph4wcPDKeOURTreK1PZ20fxDoAR4to
 +qaqKZJGdRcNf2DpJN1yIosz8Wj0Sa2LQrRi9jgUHi3bzgvLfL7P9WM2xyZMggAc
 GaSktCEFL0UzMFlMpYyDrKh2EV6ryOnN8+bVAKbmWP89tuU3njutycKdWOoL+bsj
 tH2kjFThxQyIcZGNHS1VzNunYAFE2q5nj2q47O1EDN6sjTYUoRn5cHwPam6x3Kly
 NH88xDEtJ7sUUc9GZEIXADWWD0f08QIhAH5x+jxFg3529lNgyrNHRSQ2XceYNAnG
 i/GnMJ0EhODOFKusXw7sNlWFKtukep+8/pwnvfTXWQu6plEm5EQ3a3RL5SESubVo
 mHzXsQDFCE0x/UrsJxEAww+3YO3pQEelfVi74W9z0cckpbRF8FuUq/69ltOT15l4
 X+gCz80lXMWBKw/kNoR4GQoAJo3KboMEociawwoj72HXEHTPLJnCdUOsAf3n+opj
 xuz/UPZ4WYSgKdnbmmDbJ+1POA1NqtARZZXpMVyKVVCOiLafbJkLQYwLKEpE2mOO
 TP9igzP1i3/jPWec8cJ6Fa8UwuGh
 =VGqV
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - add modpost warn exported symbols marked as 'static' because 'static'
   and EXPORT_SYMBOL is an odd combination

 - break the build early if gold linker is used

 - optimize the Bison rule to produce .c and .h files by a single
   pattern rule

 - handle PREEMPT_RT in the module vermagic and UTS_VERSION

 - warn CONFIG options leaked to the user-space except existing ones

 - make single targets work properly

 - rebuild modules when module linker scripts are updated

 - split the module final link stage into scripts/Makefile.modfinal

 - fix the missed error code in merge_config.sh

 - improve the error message displayed on the attempt of the O= build in
   unclean source tree

 - remove 'clean-dirs' syntax

 - disable -Wimplicit-fallthrough warning for Clang

 - add CONFIG_CC_OPTIMIZE_FOR_SIZE_O3 for ARC

 - remove ARCH_{CPP,A,C}FLAGS variables

 - add $(BASH) to run bash scripts

 - change *CFLAGS_<basetarget>.o to take the relative path to $(obj)
   instead of the basename

 - stop suppressing Clang's -Wunused-function warnings when W=1

 - fix linux/export.h to avoid genksyms calculating CRC of trimmed
   exported symbols

 - misc cleanups

* tag 'kbuild-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (63 commits)
  genksyms: convert to SPDX License Identifier for lex.l and parse.y
  modpost: use __section in the output to *.mod.c
  modpost: use MODULE_INFO() for __module_depends
  export.h, genksyms: do not make genksyms calculate CRC of trimmed symbols
  export.h: remove defined(__KERNEL__), which is no longer needed
  kbuild: allow Clang to find unused static inline functions for W=1 build
  kbuild: rename KBUILD_ENABLE_EXTRA_GCC_CHECKS to KBUILD_EXTRA_WARN
  kbuild: refactor scripts/Makefile.extrawarn
  merge_config.sh: ignore unwanted grep errors
  kbuild: change *FLAGS_<basetarget>.o to take the path relative to $(obj)
  modpost: add NOFAIL to strndup
  modpost: add guid_t type definition
  kbuild: add $(BASH) to run scripts with bash-extension
  kbuild: remove ARCH_{CPP,A,C}FLAGS
  kbuild,arc: add CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3 for ARC
  kbuild: Do not enable -Wimplicit-fallthrough for clang for now
  kbuild: clean up subdir-ymn calculation in Makefile.clean
  kbuild: remove unneeded '+' marker from cmd_clean
  kbuild: remove clean-dirs syntax
  kbuild: check clean srctree even earlier
  ...
2019-09-20 08:36:47 -07:00
Linus Torvalds 7e67a85999 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:

 - MAINTAINERS: Add Mark Rutland as perf submaintainer, Juri Lelli and
   Vincent Guittot as scheduler submaintainers. Add Dietmar Eggemann,
   Steven Rostedt, Ben Segall and Mel Gorman as scheduler reviewers.

   As perf and the scheduler is getting bigger and more complex,
   document the status quo of current responsibilities and interests,
   and spread the review pain^H^H^H^H fun via an increase in the Cc:
   linecount generated by scripts/get_maintainer.pl. :-)

 - Add another series of patches that brings the -rt (PREEMPT_RT) tree
   closer to mainline: split the monolithic CONFIG_PREEMPT dependencies
   into a new CONFIG_PREEMPTION category that will allow the eventual
   introduction of CONFIG_PREEMPT_RT. Still a few more hundred patches
   to go though.

 - Extend the CPU cgroup controller with uclamp.min and uclamp.max to
   allow the finer shaping of CPU bandwidth usage.

 - Micro-optimize energy-aware wake-ups from O(CPUS^2) to O(CPUS).

 - Improve the behavior of high CPU count, high thread count
   applications running under cpu.cfs_quota_us constraints.

 - Improve balancing with SCHED_IDLE (SCHED_BATCH) tasks present.

 - Improve CPU isolation housekeeping CPU allocation NUMA locality.

 - Fix deadline scheduler bandwidth calculations and logic when cpusets
   rebuilds the topology, or when it gets deadline-throttled while it's
   being offlined.

 - Convert the cpuset_mutex to percpu_rwsem, to allow it to be used from
   setscheduler() system calls without creating global serialization.
   Add new synchronization between cpuset topology-changing events and
   the deadline acceptance tests in setscheduler(), which were broken
   before.

 - Rework the active_mm state machine to be less confusing and more
   optimal.

 - Rework (simplify) the pick_next_task() slowpath.

 - Improve load-balancing on AMD EPYC systems.

 - ... and misc cleanups, smaller fixes and improvements - please see
   the Git log for more details.

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
  sched/psi: Correct overly pessimistic size calculation
  sched/fair: Speed-up energy-aware wake-ups
  sched/uclamp: Always use 'enum uclamp_id' for clamp_id values
  sched/uclamp: Update CPU's refcount on TG's clamp changes
  sched/uclamp: Use TG's clamps to restrict TASK's clamps
  sched/uclamp: Propagate system defaults to the root group
  sched/uclamp: Propagate parent clamps
  sched/uclamp: Extend CPU's cgroup controller
  sched/topology: Improve load balancing on AMD EPYC systems
  arch, ia64: Make NUMA select SMP
  sched, perf: MAINTAINERS update, add submaintainers and reviewers
  sched/fair: Use rq_lock/unlock in online_fair_sched_group
  cpufreq: schedutil: fix equation in comment
  sched: Rework pick_next_task() slow-path
  sched: Allow put_prev_task() to drop rq->lock
  sched/fair: Expose newidle_balance()
  sched: Add task_struct pointer to sched_class::set_curr_task
  sched: Rework CPU hotplug task selection
  sched/{rt,deadline}: Fix set_next_task vs pick_next_task
  sched: Fix kerneldoc comment for ia64_set_curr_task
  ...
2019-09-16 17:25:49 -07:00
Johannes Berg 786b2384bf um: Enable CONFIG_CONSTRUCTORS
We do need to call the constructors for *modules*, and
at least for KASAN in the future, we must call even the
kernel constructors only later when the kernel has been
initialized.

Instead of relying on libc to call them, emit an empty
section for libc and let the kernel's CONSTRUCTORS code
do the rest of the job.

Tested that it indeed doesn't work in modules, and does
work after the fixes in both, with a few functions with
__attribute__((constructor)) in both dynamic and static
builds.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2019-09-15 21:37:13 +02:00
Rasmus Villemoes eb11186930 compiler-types.h: add asm_inline definition
This adds an asm_inline macro which expands to "asm inline" [1] when
the compiler supports it. This is currently gcc 9.1+, gcc 8.3
and (once released) gcc 7.5 [2]. It expands to just "asm" for other
compilers.

Using asm inline("foo") instead of asm("foo") overrules gcc's
heuristic estimate of the size of the code represented by the asm()
statement, and makes gcc use the minimum possible size instead. That
can in turn affect gcc's inlining decisions.

I wasn't sure whether to make this a function-like macro or not - this
way, it can be combined with volatile as

  asm_inline volatile()

but perhaps we'd prefer to spell that

  asm_inline_volatile()

anyway.

The Kconfig logic is taken from an RFC patch by Masahiro Yamada [3].

[1] Technically, asm __inline, since both inline and __inline__
are macros that attach various attributes, making gcc barf if one
literally does "asm inline()". However, the third spelling __inline is
available for referring to the bare keyword.

[2] https://lore.kernel.org/lkml/20190907001411.GG9749@gate.crashing.org/

[3] https://lore.kernel.org/lkml/1544695154-15250-1-git-send-email-yamada.masahiro@socionext.com/

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
2019-09-15 20:14:15 +02:00
Masahiro Yamada efd9763d88 module: move CONFIG_UNUSED_SYMBOLS to the sub-menu of MODULES
When CONFIG_MODULES is disabled, CONFIG_UNUSED_SYMBOLS is pointless,
thus it should be invisible.

Instead of adding "depends on MODULES", I moved it to the sub-menu
"Enable loadable module support", which is a better fit. I put it
close to TRIM_UNUSED_KSYMS because it depends on !UNUSED_SYMBOLS.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
2019-09-11 21:40:27 +02:00
Masahiro Yamada d189c2a4b6 module: remove redundant 'depends on MODULES'
These are located in the 'if MODULES' ... 'endif' block.

Remove the redundant dependencies.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
2019-09-11 21:40:23 +02:00
Matthias Maennich 3d52ec5e5d module: add config option MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS
If MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS is enabled (default=n), the
requirement for modules to import all namespaces that are used by
the module is relaxed.

Enabling this option effectively allows (invalid) modules to be loaded
while only a warning is emitted.

Disabling this option keeps the enforcement at module loading time and
loading is denied if the module's imports are not satisfactory.

Reviewed-by: Martijn Coenen <maco@android.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
2019-09-10 10:30:27 +02:00
Masahiro Yamada 15f5db60a1 kbuild,arc: add CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3 for ARC
arch/arc/Makefile overrides -O2 with -O3. This is the only user of
ARCH_CFLAGS. There is no user of ARCH_CPPFLAGS or ARCH_AFLAGS.
My plan is to remove ARCH_{CPP,A,C}FLAGS after refactoring the ARC
Makefile.

Currently, ARC has no way to enable -Wmaybe-uninitialized because both
-O3 and -Os disable it. Enabling it will be useful for compile-testing.
This commit allows allmodconfig (, which defaults to -O2) to enable it.

Add CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3=y to all the defconfig files
in arch/arc/configs/ in order to keep the current config settings.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
2019-09-04 10:01:17 +09:00
Patrick Bellasi 2480c09313 sched/uclamp: Extend CPU's cgroup controller
The cgroup CPU bandwidth controller allows to assign a specified
(maximum) bandwidth to the tasks of a group. However this bandwidth is
defined and enforced only on a temporal base, without considering the
actual frequency a CPU is running on. Thus, the amount of computation
completed by a task within an allocated bandwidth can be very different
depending on the actual frequency the CPU is running that task.
The amount of computation can be affected also by the specific CPU a
task is running on, especially when running on asymmetric capacity
systems like Arm's big.LITTLE.

With the availability of schedutil, the scheduler is now able
to drive frequency selections based on actual task utilization.
Moreover, the utilization clamping support provides a mechanism to
bias the frequency selection operated by schedutil depending on
constraints assigned to the tasks currently RUNNABLE on a CPU.

Giving the mechanisms described above, it is now possible to extend the
cpu controller to specify the minimum (or maximum) utilization which
should be considered for tasks RUNNABLE on a cpu.
This makes it possible to better defined the actual computational
power assigned to task groups, thus improving the cgroup CPU bandwidth
controller which is currently based just on time constraints.

Extend the CPU controller with a couple of new attributes uclamp.{min,max}
which allow to enforce utilization boosting and capping for all the
tasks in a group.

Specifically:

- uclamp.min: defines the minimum utilization which should be considered
	      i.e. the RUNNABLE tasks of this group will run at least at a
	      minimum frequency which corresponds to the uclamp.min
	      utilization

- uclamp.max: defines the maximum utilization which should be considered
	      i.e. the RUNNABLE tasks of this group will run up to a
	      maximum frequency which corresponds to the uclamp.max
	      utilization

These attributes:

a) are available only for non-root nodes, both on default and legacy
   hierarchies, while system wide clamps are defined by a generic
   interface which does not depends on cgroups. This system wide
   interface enforces constraints on tasks in the root node.

b) enforce effective constraints at each level of the hierarchy which
   are a restriction of the group requests considering its parent's
   effective constraints. Root group effective constraints are defined
   by the system wide interface.
   This mechanism allows each (non-root) level of the hierarchy to:
   - request whatever clamp values it would like to get
   - effectively get only up to the maximum amount allowed by its parent

c) have higher priority than task-specific clamps, defined via
   sched_setattr(), thus allowing to control and restrict task requests.

Add two new attributes to the cpu controller to collect "requested"
clamp values. Allow that at each non-root level of the hierarchy.
Keep it simple by not caring now about "effective" values computation
and propagation along the hierarchy.

Update sysctl_sched_uclamp_handler() to use the newly introduced
uclamp_mutex so that we serialize system default updates with cgroup
relate updates.

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Michal Koutny <mkoutny@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Alessio Balsini <balsini@android.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lkml.kernel.org/r/20190822132811.31294-2-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-03 09:17:37 +02:00
Masahiro Yamada ce3b487f60 init/Kconfig: rework help of CONFIG_CC_OPTIMIZE_FOR_SIZE
CONFIG_CC_OPTIMIZE_FOR_SIZE was originally an independent boolean
option, but commit 877417e6ff ("Kbuild: change CC_OPTIMIZE_FOR_SIZE
definition") turned it into a choice between _PERFORMANCE and _SIZE.

The phrase "If unsure, say N." sounds like an independent option.
Reword the help text to make it appropriate for the choice menu.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-08-29 01:39:35 +09:00
Masahiro Yamada 2ff2b7ec65 kbuild: add CONFIG_ASM_MODVERSIONS
Add CONFIG_ASM_MODVERSIONS. This allows to remove one if-conditional
nesting in scripts/Makefile.build.

scripts/Makefile.build is run every time Kbuild descends into a
sub-directory. So, I want to avoid $(wildcard ...) evaluation
where possible although computing $(wildcard ...) is so cheap that
it may not make measurable performance difference.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
2019-08-22 01:14:11 +09:00
Will Deacon 2d12294248 Revert "init/Kconfig: Fix infinite Kconfig recursion on PPC"
This reverts commit 71c67a31f0.

Commit 117acf5c29 ("powerpc/Makefile: Always pass --synthetic to nm if
supported") removed the only conditional definition of $(NM), so we can
revert our temporary bodge to avoid Kconfig recursion and go back to
passing $(NM) through to the 'tools-support-relr.sh' when detecting
support for RELR relocations.

Signed-off-by: Will Deacon <will@kernel.org>
2019-08-20 10:11:54 +01:00
David Howells 49fcf732bd lockdown: Enforce module signatures if the kernel is locked down
If the kernel is locked down, require that all modules have valid
signatures that we can verify.

I have adjusted the errors generated:

 (1) If there's no signature (ENODATA) or we can't check it (ENOPKG,
     ENOKEY), then:

     (a) If signatures are enforced then EKEYREJECTED is returned.

     (b) If there's no signature or we can't check it, but the kernel is
	 locked down then EPERM is returned (this is then consistent with
	 other lockdown cases).

 (2) If the signature is unparseable (EBADMSG, EINVAL), the signature fails
     the check (EKEYREJECTED) or a system error occurs (eg. ENOMEM), we
     return the error we got.

Note that the X.509 code doesn't check for key expiry as the RTC might not
be valid or might not have been transferred to the kernel's clock yet.

 [Modified by Matthew Garrett to remove the IMA integration. This will
  be replaced with integration with the IMA architecture policy
  patchset.]

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Matthew Garrett <matthewgarrett@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Jessica Yu <jeyu@kernel.org>
Signed-off-by: James Morris <jmorris@namei.org>
2019-08-19 21:54:15 -07:00
Will Deacon 71c67a31f0 init/Kconfig: Fix infinite Kconfig recursion on PPC
Commit 5cf896fb6b ("arm64: Add support for relocating the kernel with
RELR relocations") introduced CONFIG_TOOLS_SUPPORT_RELR, which checks
for RELR support in the toolchain as part of the kernel configuration.
During this procedure, "$(NM)" is invoked to see if it supports the new
relocation format, however PowerPC conditionally overrides this variable
in the architecture Makefile in order to pass '--synthetic' when
targetting PPC64.

This conditional override causes Kconfig to recurse forever, since
CONFIG_TOOLS_SUPPORT_RELR cannot be determined without $(NM) being
defined, but that in turn depends on CONFIG_PPC64:

  $ make ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu-
  scripts/kconfig/conf  --syncconfig Kconfig
  scripts/kconfig/conf  --syncconfig Kconfig
  scripts/kconfig/conf  --syncconfig Kconfig
  [...]

In this particular case, it looks like PowerPC may be able to pass
'--synthetic' unconditionally to nm or even drop it altogether. While
that is being resolved, let's just bodge the RELR check by picking up
$(NM) directly from the environment in whatever state it happens to be
in.

Cc: Peter Collingbourne <pcc@google.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Will Deacon <will@kernel.org>
2019-08-07 16:20:57 +01:00
Thiago Jung Bauermann c8424e776b MODSIGN: Export module signature definitions
IMA will use the module_signature format for append signatures, so export
the relevant definitions and factor out the code which verifies that the
appended signature trailer is valid.

Also, create a CONFIG_MODULE_SIG_FORMAT option so that IMA can select it
and be able to use mod_check_sig() without having to depend on either
CONFIG_MODULE_SIG or CONFIG_MODULES.

s390 duplicated the definition of struct module_signature so now they can
use the new <linux/module_signature.h> header instead.

Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Acked-by: Jessica Yu <jeyu@kernel.org>
Reviewed-by: Philipp Rudo <prudo@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2019-08-05 18:39:56 -04:00
Peter Collingbourne 5cf896fb6b arm64: Add support for relocating the kernel with RELR relocations
RELR is a relocation packing format for relative relocations.
The format is described in a generic-abi proposal:
https://groups.google.com/d/topic/generic-abi/bX460iggiKg/discussion

The LLD linker can be instructed to pack relocations in the RELR
format by passing the flag --pack-dyn-relocs=relr.

This patch adds a new config option, CONFIG_RELR. Enabling this option
instructs the linker to pack vmlinux's relative relocations in the RELR
format, and causes the kernel to apply the relocations at startup along
with the RELA relocations. RELA relocations still need to be applied
because the linker will emit RELA relative relocations if they are
unrepresentable in the RELR format (i.e. address not a multiple of 2).

Enabling CONFIG_RELR reduces the size of a defconfig kernel image
with CONFIG_RANDOMIZE_BASE by 3.5MB/16% uncompressed, or 550KB/5%
compressed (lz4).

Signed-off-by: Peter Collingbourne <pcc@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
2019-08-05 12:35:35 +01:00
Linus Torvalds 57a8ec387e Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "VM:
   - z3fold fixes and enhancements by Henry Burns and Vitaly Wool

   - more accurate reclaimed slab caches calculations by Yafang Shao

   - fix MAP_UNINITIALIZED UAPI symbol to not depend on config, by
     Christoph Hellwig

   - !CONFIG_MMU fixes by Christoph Hellwig

   - new novmcoredd parameter to omit device dumps from vmcore, by
     Kairui Song

   - new test_meminit module for testing heap and pagealloc
     initialization, by Alexander Potapenko

   - ioremap improvements for huge mappings, by Anshuman Khandual

   - generalize kprobe page fault handling, by Anshuman Khandual

   - device-dax hotplug fixes and improvements, by Pavel Tatashin

   - enable synchronous DAX fault on powerpc, by Aneesh Kumar K.V

   - add pte_devmap() support for arm64, by Robin Murphy

   - unify locked_vm accounting with a helper, by Daniel Jordan

   - several misc fixes

  core/lib:
   - new typeof_member() macro including some users, by Alexey Dobriyan

   - make BIT() and GENMASK() available in asm, by Masahiro Yamada

   - changed LIST_POISON2 on x86_64 to 0xdead000000000122 for better
     code generation, by Alexey Dobriyan

   - rbtree code size optimizations, by Michel Lespinasse

   - convert struct pid count to refcount_t, by Joel Fernandes

  get_maintainer.pl:
   - add --no-moderated switch to skip moderated ML's, by Joe Perches

  misc:
   - ptrace PTRACE_GET_SYSCALL_INFO interface

   - coda updates

   - gdb scripts, various"

[ Using merge message suggestion from Vlastimil Babka, with some editing - Linus ]

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (100 commits)
  fs/select.c: use struct_size() in kmalloc()
  mm: add account_locked_vm utility function
  arm64: mm: implement pte_devmap support
  mm: introduce ARCH_HAS_PTE_DEVMAP
  mm: clean up is_device_*_page() definitions
  mm/mmap: move common defines to mman-common.h
  mm: move MAP_SYNC to asm-generic/mman-common.h
  device-dax: "Hotremove" persistent memory that is used like normal RAM
  mm/hotplug: make remove_memory() interface usable
  device-dax: fix memory and resource leak if hotplug fails
  include/linux/lz4.h: fix spelling and copy-paste errors in documentation
  ipc/mqueue.c: only perform resource calculation if user valid
  include/asm-generic/bug.h: fix "cut here" for WARN_ON for __WARN_TAINT architectures
  scripts/gdb: add helpers to find and list devices
  scripts/gdb: add lx-genpd-summary command
  drivers/pps/pps.c: clear offset flags in PPS_SETPARAMS ioctl
  kernel/pid.c: convert struct pid count to refcount_t
  drivers/rapidio/devices/rio_mport_cdev.c: NUL terminate some strings
  select: shift restore_saved_sigmask_unless() into poll_select_copy_remaining()
  select: change do_poll() to return -ERESTARTNOHAND rather than -EINTR
  ...
2019-07-17 08:58:04 -07:00
Kees Cook 92bae787c4 init/Kconfig: fix neighboring typos
This fixes a couple typos I noticed in the slab Kconfig:

	sacrifies -> sacrifices
	accellerate -> accelerate

Seeing as no other instances of these typos are found elsewhere in the
kernel and that I originally added one of the two, I can only assume
working on slab must have caused damage to the spelling centers of my
brain.

Link: http://lkml.kernel.org/r/201905292203.CD000546EB@keescook
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16 19:23:22 -07:00
Mauro Carvalho Chehab da82c92f11 docs: cgroup-v1: add it to the admin-guide book
Those files belong to the admin guide, so add them.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-07-15 11:03:02 -03:00
Mauro Carvalho Chehab c3123552aa docs: accounting: convert to ReST
Rename the accounting documentation files to ReST, add an
index for them and adjust in order to produce a nice html
output via the Sphinx build system.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-07-15 09:20:25 -03:00
Linus Torvalds 39ceda5ce1 Kbuild updates for v5.3
- remove headers_{install,check}_all targets
 
 - remove unreasonable 'depends on !UML' from CONFIG_SAMPLES
 
 - re-implement 'make headers_install' more cleanly
 
 - add new header-test-y syntax to compile-test headers
 
 - compile-test exported headers to ensure they are compilable in
   user-space
 
 - compile-test headers under include/ to ensure they are self-contained
 
 - remove -Waggregate-return, -Wno-uninitialized, -Wno-unused-value flags
 
 - add -Werror=unknown-warning-option for Clang
 
 - add 128-bit built-in types support to genksyms
 
 - fix missed rebuild of modules.builtin
 
 - propagate 'No space left on device' error in fixdep to Make
 
 - allow Clang to use its integrated assembler
 
 - improve some coccinelle scripts
 
 - add a new flag KBUILD_ABS_SRCTREE to request Kbuild to use absolute
   path for $(srctree).
 
 - do not ignore errors when compression utility is missing
 
 - misc cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQJSBAABCgA8FiEEbmPs18K1szRHjPqEPYsBB53g2wYFAl0oxNkeHHlhbWFkYS5t
 YXNhaGlyb0Bzb2Npb25leHQuY29tAAoJED2LAQed4NsGnhcP/AuM8s+3SYFiLitJ
 ISbznLFP2Xatq0SPXp5+moez/AMTK6Mm1biPcdo20d+TjVEh4+9F2nq12Ii9U8/D
 tds9A6G8+Bb28r9GMIVQPdFohijW6ijtDziS31iQnIWyPsP/yx6PKfLAD9F4ca1x
 7/4btmu+BOMjtN0NrMWSNz5MM47xUzoWIALL40SV4PzGVXLCQZ2PBNPeSRIk22Jt
 ynDNPuNsmDWcFfwAE+sLSDrhCHZlwM8rg8rf6jmYdc4LcN4cj0oho5+K1TRyC9mn
 fO3PT25juFejthxQulxEfyGggnyLM6BNTgPDGcCHSP4nD7mlXA9GcpZICtJOgGGu
 SlDadMZ0GRMK5zcZ0MF0GQboeyViwsbXgrRcYuXt6cUFWX4P/1SeAQ5Mf4u1EKqf
 hEbwFXV/g81ht0lFS8gyWkvdpoNPtxGHNPusLjp65C4rc0/48/s+7EE/u8JTPl1g
 dQTeIOds6XUOkJgqhEfuq+8gfngbjKc9bYhs+ACbkCzBltQdnb6m5aLgk0ODxe8I
 WbGn0+cQcS9VVwre7E5DnFSVWVOHAG5taiUwj0KDcHB0Jxw9Gvorq9WU1ppHHYH2
 XQIFBx7XHdn28d+plS8R23vAPgDgrGdvE5RYK5tNQLhTJ6BbjlZ1n/Tmxzu62scK
 deG3aCOB13Om7OTzTUh9+C3TC9ZQ
 =E2Rz
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - remove headers_{install,check}_all targets

 - remove unreasonable 'depends on !UML' from CONFIG_SAMPLES

 - re-implement 'make headers_install' more cleanly

 - add new header-test-y syntax to compile-test headers

 - compile-test exported headers to ensure they are compilable in
   user-space

 - compile-test headers under include/ to ensure they are self-contained

 - remove -Waggregate-return, -Wno-uninitialized, -Wno-unused-value
   flags

 - add -Werror=unknown-warning-option for Clang

 - add 128-bit built-in types support to genksyms

 - fix missed rebuild of modules.builtin

 - propagate 'No space left on device' error in fixdep to Make

 - allow Clang to use its integrated assembler

 - improve some coccinelle scripts

 - add a new flag KBUILD_ABS_SRCTREE to request Kbuild to use absolute
   path for $(srctree).

 - do not ignore errors when compression utility is missing

 - misc cleanups

* tag 'kbuild-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (49 commits)
  kbuild: use -- separater intead of $(filter-out ...) for cc-cross-prefix
  kbuild: Inform user to pass ARCH= for make mrproper
  kbuild: fix compression errors getting ignored
  kbuild: add a flag to force absolute path for srctree
  kbuild: replace KBUILD_SRCTREE with boolean building_out_of_srctree
  kbuild: remove src and obj from the top Makefile
  scripts/tags.sh: remove unused environment variables from comments
  scripts/tags.sh: drop SUBARCH support for ARM
  kbuild: compile-test kernel headers to ensure they are self-contained
  kheaders: include only headers into kheaders_data.tar.xz
  kheaders: remove meaningless -R option of 'ls'
  kbuild: support header-test-pattern-y
  kbuild: do not create wrappers for header-test-y
  kbuild: compile-test exported headers to ensure they are self-contained
  init/Kconfig: add CONFIG_CC_CAN_LINK
  kallsyms: exclude kasan local symbols on s390
  kbuild: add more hints about SUBDIRS replacement
  coccinelle: api/stream_open: treat all wait_.*() calls as blocking
  coccinelle: put_device: Add a cast to an expression for an assignment
  coccinelle: put_device: Adjust a message construction
  ...
2019-07-12 16:03:16 -07:00
Linus Torvalds e9a83bd232 It's been a relatively busy cycle for docs:
- A fair pile of RST conversions, many from Mauro.  These create more
    than the usual number of simple but annoying merge conflicts with other
    trees, unfortunately.  He has a lot more of these waiting on the wings
    that, I think, will go to you directly later on.
 
  - A new document on how to use merges and rebases in kernel repos, and one
    on Spectre vulnerabilities.
 
  - Various improvements to the build system, including automatic markup of
    function() references because some people, for reasons I will never
    understand, were of the opinion that :c:func:``function()`` is
    unattractive and not fun to type.
 
  - We now recommend using sphinx 1.7, but still support back to 1.4.
 
  - Lots of smaller improvements, warning fixes, typo fixes, etc.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAl0krAEPHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5Yg98H/AuLqO9LpOgUjF4LhyjxGPdzJkY9RExSJ7km
 gznyreLCZgFaJR+AY6YDsd4Jw6OJlPbu1YM/Qo3C3WrZVFVhgL/s2ebvBgCo50A8
 raAFd8jTf4/mGCHnAqRotAPQ3mETJUk315B66lBJ6Oc+YdpRhwXWq8ZW2bJxInFF
 3HDvoFgMf0KhLuMHUkkL0u3fxH1iA+KvDu8diPbJYFjOdOWENz/CV8wqdVkXRSEW
 DJxIq89h/7d+hIG3d1I7Nw+gibGsAdjSjKv4eRKauZs4Aoxd1Gpl62z0JNk6aT3m
 dtq4joLdwScydonXROD/Twn2jsu4xYTrPwVzChomElMowW/ZBBY=
 =D0eO
 -----END PGP SIGNATURE-----

Merge tag 'docs-5.3' of git://git.lwn.net/linux

Pull Documentation updates from Jonathan Corbet:
 "It's been a relatively busy cycle for docs:

   - A fair pile of RST conversions, many from Mauro. These create more
     than the usual number of simple but annoying merge conflicts with
     other trees, unfortunately. He has a lot more of these waiting on
     the wings that, I think, will go to you directly later on.

   - A new document on how to use merges and rebases in kernel repos,
     and one on Spectre vulnerabilities.

   - Various improvements to the build system, including automatic
     markup of function() references because some people, for reasons I
     will never understand, were of the opinion that
     :c:func:``function()`` is unattractive and not fun to type.

   - We now recommend using sphinx 1.7, but still support back to 1.4.

   - Lots of smaller improvements, warning fixes, typo fixes, etc"

* tag 'docs-5.3' of git://git.lwn.net/linux: (129 commits)
  docs: automarkup.py: ignore exceptions when seeking for xrefs
  docs: Move binderfs to admin-guide
  Disable Sphinx SmartyPants in HTML output
  doc: RCU callback locks need only _bh, not necessarily _irq
  docs: format kernel-parameters -- as code
  Doc : doc-guide : Fix a typo
  platform: x86: get rid of a non-existent document
  Add the RCU docs to the core-api manual
  Documentation: RCU: Add TOC tree hooks
  Documentation: RCU: Rename txt files to rst
  Documentation: RCU: Convert RCU UP systems to reST
  Documentation: RCU: Convert RCU linked list to reST
  Documentation: RCU: Convert RCU basic concepts to reST
  docs: filesystems: Remove uneeded .rst extension on toctables
  scripts/sphinx-pre-install: fix out-of-tree build
  docs: zh_CN: submitting-drivers.rst: Remove a duplicated Documentation/
  Documentation: PGP: update for newer HW devices
  Documentation: Add section about CPU vulnerabilities for Spectre
  Documentation: platform: Delete x86-laptop-drivers.txt
  docs: Note that :c:func: should no longer be used
  ...
2019-07-09 12:34:26 -07:00
Linus Torvalds 3b99107f0e for-5.3/block-20190708
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl0jrIMQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgptlFD/9CNsBX+Aap2lO6wKNr6QISwNAK76GMzEay
 s4LSY2kGkXvzv8i89mCuY+8UVNI8WH2/22WnU+8CBAJOjWyFQMsIwH/mrq0oZWRD
 J6STJE8rTr6Fc2MvJUWryp/xdBh3+eDIsAdIZVHVAkIzqYPBnpIAwEIeIw8t0xsm
 v9ngpQ3WD6ep8tOj9pnG1DGKFg1CmukZCC/Y4CQV1vZtmm2I935zUwNV/TB+Egfx
 G8JSC0cSV02LMK88HCnA6MnC/XSUC0qgfXbnmP+TpKlgjVX+P/fuB3oIYcZEu2Rk
 3YBpIkhsQytKYbF42KRLsmBH72u6oB9G+tNZTgB1STUDrZqdtD9xwX1rjDlY0ZzP
 EUDnk48jl/cxbs+VZrHoE2TcNonLiymV7Kb92juHXdIYmKFQStprGcQUbMaTkMfB
 6BYrYLifWx0leu1JJ1i7qhNmug94BYCSCxcRmH0p6kPazPcY9LXNmDWMfMuBPZT7
 z79VLZnHF2wNXJyT1cBluwRYYJRT4osWZ3XUaBWFKDgf1qyvXJfrN/4zmgkEIyW7
 ivXC+KLlGkhntDlWo2pLKbbyOIKY1HmU6aROaI11k5Zyh0ixKB7tHKavK39l+NOo
 YB41+4l6VEpQEyxyRk8tO0sbHpKaKB+evVIK3tTwbY+Q0qTExErxjfWUtOgRWhjx
 iXJssPRo4w==
 =VSYT
 -----END PGP SIGNATURE-----

Merge tag 'for-5.3/block-20190708' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:
 "This is the main block updates for 5.3. Nothing earth shattering or
  major in here, just fixes, additions, and improvements all over the
  map. This contains:

   - Series of documentation fixes (Bart)

   - Optimization of the blk-mq ctx get/put (Bart)

   - null_blk removal race condition fix (Bob)

   - req/bio_op() cleanups (Chaitanya)

   - Series cleaning up the segment accounting, and request/bio mapping
     (Christoph)

   - Series cleaning up the page getting/putting for bios (Christoph)

   - block cgroup cleanups and moving it to where it is used (Christoph)

   - block cgroup fixes (Tejun)

   - Series of fixes and improvements to bcache, most notably a write
     deadlock fix (Coly)

   - blk-iolatency STS_AGAIN and accounting fixes (Dennis)

   - Series of improvements and fixes to BFQ (Douglas, Paolo)

   - debugfs_create() return value check removal for drbd (Greg)

   - Use struct_size(), where appropriate (Gustavo)

   - Two lighnvm fixes (Heiner, Geert)

   - MD fixes, including a read balance and corruption fix (Guoqing,
     Marcos, Xiao, Yufen)

   - block opal shadow mbr additions (Jonas, Revanth)

   - sbitmap compare-and-exhange improvemnts (Pavel)

   - Fix for potential bio->bi_size overflow (Ming)

   - NVMe pull requests:
       - improved PCIe suspent support (Keith Busch)
       - error injection support for the admin queue (Akinobu Mita)
       - Fibre Channel discovery improvements (James Smart)
       - tracing improvements including nvmetc tracing support (Minwoo Im)
       - misc fixes and cleanups (Anton Eidelman, Minwoo Im, Chaitanya
         Kulkarni)"

   - Various little fixes and improvements to drivers and core"

* tag 'for-5.3/block-20190708' of git://git.kernel.dk/linux-block: (153 commits)
  blk-iolatency: fix STS_AGAIN handling
  block: nr_phys_segments needs to be zero for REQ_OP_WRITE_ZEROES
  blk-mq: simplify blk_mq_make_request()
  blk-mq: remove blk_mq_put_ctx()
  sbitmap: Replace cmpxchg with xchg
  block: fix .bi_size overflow
  block: sed-opal: check size of shadow mbr
  block: sed-opal: ioctl for writing to shadow mbr
  block: sed-opal: add ioctl for done-mark of shadow mbr
  block: never take page references for ITER_BVEC
  direct-io: use bio_release_pages in dio_bio_complete
  block_dev: use bio_release_pages in bio_unmap_user
  block_dev: use bio_release_pages in blkdev_bio_end_io
  iomap: use bio_release_pages in iomap_dio_bio_end_io
  block: use bio_release_pages in bio_map_user_iov
  block: use bio_release_pages in bio_unmap_user
  block: optionally mark pages dirty in bio_release_pages
  block: move the BIO_NO_PAGE_REF check into bio_release_pages
  block: skd_main.c: Remove call to memset after dma_alloc_coherent
  block: mtip32xx: Remove call to memset after dma_alloc_coherent
  ...
2019-07-09 10:45:06 -07:00
Masahiro Yamada 43c78d8803 kbuild: compile-test kernel headers to ensure they are self-contained
The headers in include/ are globally used in the kernel source tree
to provide common APIs. They are included from external modules, too.

It will be useful to make as many headers self-contained as possible
so that we do not have to rely on a specific include order.

There are more than 4000 headers in include/. In my rough analysis,
70% of them are already self-contained. With efforts, most of them
can be self-contained.

For now, we must exclude more than 1000 headers just because they
cannot be compiled as standalone units. I added them to header-test-.
The blacklist was mostly generated by a script, so the reason of the
breakage should be checked later.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Tested-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2019-07-09 21:44:37 +09:00
Linus Torvalds 92c1d65221 Merge branch 'for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
 "Documentation updates and the addition of cgroup_parse_float() which
  will be used by new controllers including blk-iocost"

* 'for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  docs: cgroup-v1: convert docs to ReST and rename to *.rst
  cgroup: Move cgroup_parse_float() implementation out of CONFIG_SYSFS
  cgroup: add cgroup_parse_float()
2019-07-08 21:35:12 -07:00
Masahiro Yamada d6fc9fcbaa kbuild: compile-test exported headers to ensure they are self-contained
Multiple people have suggested compile-testing UAPI headers to ensure
they can be really included from user-space. "make headers_check" is
obviously not enough to catch bugs, and we often leak unresolved
references to user-space.

Use the new header-test-y syntax to implement it. Please note exported
headers are compile-tested with a completely different set of compiler
flags. The header search path is set to $(objtree)/usr/include since
exported headers should not include unexported ones.

We use -std=gnu89 for the kernel space since the kernel code highly
depends on GNU extensions. On the other hand, UAPI headers should be
written in more standardized C, so they are compiled with -std=c90.
This will emit errors if C++ style comments, the keyword 'inline', etc.
are used. Please use C style comments (/* ... */), '__inline__', etc.
in UAPI headers.

There is additional compiler requirement to enable this test because
many of UAPI headers include <stdlib.h>, <sys/ioctl.h>, <sys/time.h>,
etc. directly or indirectly. You cannot use kernel.org pre-built
toolchains [1] since they lack <stdlib.h>.

I reused CONFIG_CC_CAN_LINK to check the system header availability.
The intention is slightly different, but a compiler that can link
userspace programs provide system headers.

For now, a lot of headers need to be excluded because they cannot
be compiled standalone, but this is a good start point.

[1] https://mirrors.edge.kernel.org/pub/tools/crosstool/index.html

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
2019-07-08 23:13:57 +09:00
Masahiro Yamada 1a927fd347 init/Kconfig: add CONFIG_CC_CAN_LINK
Currently, scripts/cc-can-link.sh is run just for BPFILTER_UMH, but
defining CC_CAN_LINK will be useful in other places.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-07-08 02:25:59 +09:00
Patrick Bellasi 69842cba9a sched/uclamp: Add CPU's clamp buckets refcounting
Utilization clamping allows to clamp the CPU's utilization within a
[util_min, util_max] range, depending on the set of RUNNABLE tasks on
that CPU. Each task references two "clamp buckets" defining its minimum
and maximum (util_{min,max}) utilization "clamp values". A CPU's clamp
bucket is active if there is at least one RUNNABLE tasks enqueued on
that CPU and refcounting that bucket.

When a task is {en,de}queued {on,from} a rq, the set of active clamp
buckets on that CPU can change. If the set of active clamp buckets
changes for a CPU a new "aggregated" clamp value is computed for that
CPU. This is because each clamp bucket enforces a different utilization
clamp value.

Clamp values are always MAX aggregated for both util_min and util_max.
This ensures that no task can affect the performance of other
co-scheduled tasks which are more boosted (i.e. with higher util_min
clamp) or less capped (i.e. with higher util_max clamp).

A task has:
   task_struct::uclamp[clamp_id]::bucket_id
to track the "bucket index" of the CPU's clamp bucket it refcounts while
enqueued, for each clamp index (clamp_id).

A runqueue has:
   rq::uclamp[clamp_id]::bucket[bucket_id].tasks
to track how many RUNNABLE tasks on that CPU refcount each
clamp bucket (bucket_id) of a clamp index (clamp_id).
It also has a:
   rq::uclamp[clamp_id]::bucket[bucket_id].value
to track the clamp value of each clamp bucket (bucket_id) of a clamp
index (clamp_id).

The rq::uclamp::bucket[clamp_id][] array is scanned every time it's
needed to find a new MAX aggregated clamp value for a clamp_id. This
operation is required only when it's dequeued the last task of a clamp
bucket tracking the current MAX aggregated clamp value. In this case,
the CPU is either entering IDLE or going to schedule a less boosted or
more clamped task.
The expected number of different clamp values configured at build time
is small enough to fit the full unordered array into a single cache
line, for configurations of up to 7 buckets.

Add to struct rq the basic data structures required to refcount the
number of RUNNABLE tasks for each clamp bucket. Add also the max
aggregation required to update the rq's clamp value at each
enqueue/dequeue event.

Use a simple linear mapping of clamp values into clamp buckets.
Pre-compute and cache bucket_id to avoid integer divisions at
enqueue/dequeue time.

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alessio Balsini <balsini@android.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lkml.kernel.org/r/20190621084217.8167-2-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-24 19:23:44 +02:00
Christoph Hellwig 8060c47ba8 block: rename CONFIG_DEBUG_BLK_CGROUP to CONFIG_BFQ_CGROUP_DEBUG
This option is entirely bfq specific, give it an appropinquate name.

Also make it depend on CONFIG_BFQ_GROUP_IOSCHED in Kconfig, as all
the functionality already does so anyway.

Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-20 10:32:35 -06:00
Jani Nikula e846f0dc57 kbuild: add support for ensuring headers are self-contained
Sometimes it's useful to be able to explicitly ensure certain headers
remain self-contained, i.e. that they are compilable as standalone
units, by including and/or forward declaring everything they depend on.

Add special target header-test-y where individual Makefiles can add
headers to be tested if CONFIG_HEADER_TEST is enabled. This will
generate a dummy C file per header that gets built as part of extra-y.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-06-15 19:57:02 +09:00
Mauro Carvalho Chehab d6a3b24762 docs: scheduler: convert docs to ReST and rename to *.rst
In order to prepare to add them to the Kernel API book,
convert the files to ReST format.

The conversion is actually:
  - add blank lines and identation in order to identify paragraphs;
  - fix tables markups;
  - add some lists markups;
  - mark literal blocks;
  - adjust title markups.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-06-14 14:32:18 -06:00
Mauro Carvalho Chehab 99c8b231ae docs: cgroup-v1: convert docs to ReST and rename to *.rst
Convert the cgroup-v1 files to ReST format, in order to
allow a later addition to the admin-guide.

The conversion is actually:
  - add blank lines and identation in order to identify paragraphs;
  - fix tables markups;
  - add some lists markups;
  - mark literal blocks;
  - adjust title markups.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2019-06-14 13:29:54 -07:00
Linus Torvalds 1ce2c85137 Char/Misc driver fixes for 5.2-rc4
Here are some small char and misc driver fixes for 5.2-rc4 to resolve a
 number of reported issues.
 
 The most "notable" one here is the kernel headers in proc^Wsysfs fixes.
 Those changes move the header file info into sysfs and fixes the build
 issues that you reported.
 
 Other than that, a bunch of small habanalabs driver fixes, some fpga
 driver fixes, and a few other tiny driver fixes.
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXPuEVg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+yma0QCfVPa7r1rHqljz1UgvjKJTzVg8g9wAn1W1mddx
 MIlG+0+ZnBdaPzyzoY1O
 =0LJD
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
 "Here are some small char and misc driver fixes for 5.2-rc4 to resolve
  a number of reported issues.

  The most "notable" one here is the kernel headers in proc^Wsysfs
  fixes. Those changes move the header file info into sysfs and fixes
  the build issues that you reported.

  Other than that, a bunch of small habanalabs driver fixes, some fpga
  driver fixes, and a few other tiny driver fixes.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'char-misc-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  habanalabs: Read upper bits of trace buffer from RWPHI
  habanalabs: Fix virtual address access via debugfs for 2MB pages
  fpga: zynqmp-fpga: Correctly handle error pointer
  habanalabs: fix bug in checking huge page optimization
  habanalabs: Avoid using a non-initialized MMU cache mutex
  habanalabs: fix debugfs code
  uapi/habanalabs: add opcode for enable/disable device debug mode
  habanalabs: halt debug engines on user process close
  test_firmware: Use correct snprintf() limit
  genwqe: Prevent an integer overflow in the ioctl
  parport: Fix mem leak in parport_register_dev_model
  fpga: dfl: expand minor range when registering chrdev region
  fpga: dfl: Add lockdep classes for pdata->lock
  fpga: dfl: afu: Pass the correct device to dma_mapping_error()
  fpga: stratix10-soc: fix use-after-free on s10_init()
  w1: ds2408: Fix typo after 49695ac468 (reset on output_write retry with readback)
  kheaders: Do not regenerate archive if config is not changed
  kheaders: Move from proc to sysfs
  lkdtm/bugs: Adjust recursion test to avoid elision
  lkdtm/usercopy: Moves the KERNEL_DS test to non-canonical
2019-06-08 12:50:36 -07:00
Joel Fernandes (Google) f7b101d330 kheaders: Move from proc to sysfs
The kheaders archive consisting of the kernel headers used for compiling
bpf programs is in /proc. However there is concern that moving it here
will make it permanent. Let us move it to /sys/kernel as discussed [1].

[1] https://lore.kernel.org/patchwork/patch/1067310/#1265969

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-24 20:16:01 +02:00
Thomas Gleixner ec8f24b7fa treewide: Add SPDX license identifier - Makefile/Kconfig
Add SPDX license identifiers to all Make/Kconfig files which:

 - Have no license information of any form

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

  GPL-2.0-only

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21 10:50:46 +02:00
Dan Williams e900a918b0 mm: shuffle initial free memory to improve memory-side-cache utilization
Patch series "mm: Randomize free memory", v10.

This patch (of 3):

Randomization of the page allocator improves the average utilization of
a direct-mapped memory-side-cache.  Memory side caching is a platform
capability that Linux has been previously exposed to in HPC
(high-performance computing) environments on specialty platforms.  In
that instance it was a smaller pool of high-bandwidth-memory relative to
higher-capacity / lower-bandwidth DRAM.  Now, this capability is going
to be found on general purpose server platforms where DRAM is a cache in
front of higher latency persistent memory [1].

Robert offered an explanation of the state of the art of Linux
interactions with memory-side-caches [2], and I copy it here:

    It's been a problem in the HPC space:
    http://www.nersc.gov/research-and-development/knl-cache-mode-performance-coe/

    A kernel module called zonesort is available to try to help:
    https://software.intel.com/en-us/articles/xeon-phi-software

    and this abandoned patch series proposed that for the kernel:
    https://lkml.kernel.org/r/20170823100205.17311-1-lukasz.daniluk@intel.com

    Dan's patch series doesn't attempt to ensure buffers won't conflict, but
    also reduces the chance that the buffers will. This will make performance
    more consistent, albeit slower than "optimal" (which is near impossible
    to attain in a general-purpose kernel).  That's better than forcing
    users to deploy remedies like:
        "To eliminate this gradual degradation, we have added a Stream
         measurement to the Node Health Check that follows each job;
         nodes are rebooted whenever their measured memory bandwidth
         falls below 300 GB/s."

A replacement for zonesort was merged upstream in commit cc9aec03e5
("x86/numa_emulation: Introduce uniform split capability").  With this
numa_emulation capability, memory can be split into cache sized
("near-memory" sized) numa nodes.  A bind operation to such a node, and
disabling workloads on other nodes, enables full cache performance.
However, once the workload exceeds the cache size then cache conflicts
are unavoidable.  While HPC environments might be able to tolerate
time-scheduling of cache sized workloads, for general purpose server
platforms, the oversubscribed cache case will be the common case.

The worst case scenario is that a server system owner benchmarks a
workload at boot with an un-contended cache only to see that performance
degrade over time, even below the average cache performance due to
excessive conflicts.  Randomization clips the peaks and fills in the
valleys of cache utilization to yield steady average performance.

Here are some performance impact details of the patches:

1/ An Intel internal synthetic memory bandwidth measurement tool, saw a
   3X speedup in a contrived case that tries to force cache conflicts.
   The contrived cased used the numa_emulation capability to force an
   instance of the benchmark to be run in two of the near-memory sized
   numa nodes.  If both instances were placed on the same emulated they
   would fit and cause zero conflicts.  While on separate emulated nodes
   without randomization they underutilized the cache and conflicted
   unnecessarily due to the in-order allocation per node.

2/ A well known Java server application benchmark was run with a heap
   size that exceeded cache size by 3X.  The cache conflict rate was 8%
   for the first run and degraded to 21% after page allocator aging.  With
   randomization enabled the rate levelled out at 11%.

3/ A MongoDB workload did not observe measurable difference in
   cache-conflict rates, but the overall throughput dropped by 7% with
   randomization in one case.

4/ Mel Gorman ran his suite of performance workloads with randomization
   enabled on platforms without a memory-side-cache and saw a mix of some
   improvements and some losses [3].

While there is potentially significant improvement for applications that
depend on low latency access across a wide working-set, the performance
may be negligible to negative for other workloads.  For this reason the
shuffle capability defaults to off unless a direct-mapped
memory-side-cache is detected.  Even then, the page_alloc.shuffle=0
parameter can be specified to disable the randomization on those systems.

Outside of memory-side-cache utilization concerns there is potentially
security benefit from randomization.  Some data exfiltration and
return-oriented-programming attacks rely on the ability to infer the
location of sensitive data objects.  The kernel page allocator, especially
early in system boot, has predictable first-in-first out behavior for
physical pages.  Pages are freed in physical address order when first
onlined.

Quoting Kees:
    "While we already have a base-address randomization
     (CONFIG_RANDOMIZE_MEMORY), attacks against the same hardware and
     memory layouts would certainly be using the predictability of
     allocation ordering (i.e. for attacks where the base address isn't
     important: only the relative positions between allocated memory).
     This is common in lots of heap-style attacks. They try to gain
     control over ordering by spraying allocations, etc.

     I'd really like to see this because it gives us something similar
     to CONFIG_SLAB_FREELIST_RANDOM but for the page allocator."

While SLAB_FREELIST_RANDOM reduces the predictability of some local slab
caches it leaves vast bulk of memory to be predictably in order allocated.
However, it should be noted, the concrete security benefits are hard to
quantify, and no known CVE is mitigated by this randomization.

Introduce shuffle_free_memory(), and its helper shuffle_zone(), to perform
a Fisher-Yates shuffle of the page allocator 'free_area' lists when they
are initially populated with free memory at boot and at hotplug time.  Do
this based on either the presence of a page_alloc.shuffle=Y command line
parameter, or autodetection of a memory-side-cache (to be added in a
follow-on patch).

The shuffling is done in terms of CONFIG_SHUFFLE_PAGE_ORDER sized free
pages where the default CONFIG_SHUFFLE_PAGE_ORDER is MAX_ORDER-1 i.e.  10,
4MB this trades off randomization granularity for time spent shuffling.
MAX_ORDER-1 was chosen to be minimally invasive to the page allocator
while still showing memory-side cache behavior improvements, and the
expectation that the security implications of finer granularity
randomization is mitigated by CONFIG_SLAB_FREELIST_RANDOM.  The
performance impact of the shuffling appears to be in the noise compared to
other memory initialization work.

This initial randomization can be undone over time so a follow-on patch is
introduced to inject entropy on page free decisions.  It is reasonable to
ask if the page free entropy is sufficient, but it is not enough due to
the in-order initial freeing of pages.  At the start of that process
putting page1 in front or behind page0 still keeps them close together,
page2 is still near page1 and has a high chance of being adjacent.  As
more pages are added ordering diversity improves, but there is still high
page locality for the low address pages and this leads to no significant
impact to the cache conflict rate.

[1]: https://itpeernetwork.intel.com/intel-optane-dc-persistent-memory-operating-modes/
[2]: https://lkml.kernel.org/r/AT5PR8401MB1169D656C8B5E121752FC0F8AB120@AT5PR8401MB1169.NAMPRD84.PROD.OUTLOOK.COM
[3]: https://lkml.org/lkml/2018/10/12/309

[dan.j.williams@intel.com: fix shuffle enable]
  Link: http://lkml.kernel.org/r/154943713038.3858443.4125180191382062871.stgit@dwillia2-desk3.amr.corp.intel.com
[cai@lca.pw: fix SHUFFLE_PAGE_ALLOCATOR help texts]
  Link: http://lkml.kernel.org/r/20190425201300.75650-1-cai@lca.pw
Link: http://lkml.kernel.org/r/154899811738.3165233.12325692939590944259.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Robert Elliott <elliott@hpe.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 19:52:48 -07:00
Linus Torvalds cf482a49af Driver core/kobject patches for 5.2-rc1
Here is the "big" set of driver core patches for 5.2-rc1
 
 There are a number of ACPI patches in here as well, as Rafael said they
 should go through this tree due to the driver core changes they
 required.  They have all been acked by the ACPI developers.
 
 There are also a number of small subsystem-specific changes in here, due
 to some changes to the kobject core code.  Those too have all been acked
 by the various subsystem maintainers.
 
 As for content, it's pretty boring outside of the ACPI changes:
   - spdx cleanups
   - kobject documentation updates
   - default attribute groups for kobjects
   - other minor kobject/driver core fixes
 
 All have been in linux-next for a while with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXNHDbw8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynDAgCfbb4LBR6I50wFXb8JM/R6cAS7qrsAn1unshKV
 8XCYcif2RxjtdJWXbjdm
 =/rLh
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core/kobject updates from Greg KH:
 "Here is the "big" set of driver core patches for 5.2-rc1

  There are a number of ACPI patches in here as well, as Rafael said
  they should go through this tree due to the driver core changes they
  required. They have all been acked by the ACPI developers.

  There are also a number of small subsystem-specific changes in here,
  due to some changes to the kobject core code. Those too have all been
  acked by the various subsystem maintainers.

  As for content, it's pretty boring outside of the ACPI changes:
   - spdx cleanups
   - kobject documentation updates
   - default attribute groups for kobjects
   - other minor kobject/driver core fixes

  All have been in linux-next for a while with no reported issues"

* tag 'driver-core-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (47 commits)
  kobject: clean up the kobject add documentation a bit more
  kobject: Fix kernel-doc comment first line
  kobject: Remove docstring reference to kset
  firmware_loader: Fix a typo ("syfs" -> "sysfs")
  kobject: fix dereference before null check on kobj
  Revert "driver core: platform: Fix the usage of platform device name(pdev->name)"
  init/config: Do not select BUILD_BIN2C for IKCONFIG
  Provide in-kernel headers to make extending kernel easier
  kobject: Improve doc clarity kobject_init_and_add()
  kobject: Improve docs for kobject_add/del
  driver core: platform: Fix the usage of platform device name(pdev->name)
  livepatch: Replace klp_ktype_patch's default_attrs with groups
  cpufreq: schedutil: Replace default_attrs field with groups
  padata: Replace padata_attr_type default_attrs field with groups
  irqdesc: Replace irq_kobj_type's default_attrs field with groups
  net-sysfs: Replace ktype default_attrs field with groups
  block: Replace all ktype default_attrs with groups
  samples/kobject: Replace foo_ktype's default_attrs field with groups
  kobject: Add support for default attribute groups to kobj_type
  driver core: Postpone DMA tear-down until after devres release for probe failure
  ...
2019-05-07 13:01:40 -07:00
Joel Fernandes (Google) bc0c60457c init/config: Do not select BUILD_BIN2C for IKCONFIG
Since commit 13610aa908 ("kernel/configs: use .incbin directive to
embed config_data.gz"), IKCONFIG no longer uses BUILD_BIN2C so prevent
it from being selected in Kconfig.

Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-29 16:48:04 +02:00
Joel Fernandes (Google) 43d8ce9d65 Provide in-kernel headers to make extending kernel easier
Introduce in-kernel headers which are made available as an archive
through proc (/proc/kheaders.tar.xz file). This archive makes it
possible to run eBPF and other tracing programs that need to extend the
kernel for tracing purposes without any dependency on the file system
having headers.

A github PR is sent for the corresponding BCC patch at:
https://github.com/iovisor/bcc/pull/2312

On Android and embedded systems, it is common to switch kernels but not
have kernel headers available on the file system. Further once a
different kernel is booted, any headers stored on the file system will
no longer be useful. This is an issue even well known to distros.
By storing the headers as a compressed archive within the kernel, we can
avoid these issues that have been a hindrance for a long time.

The best way to use this feature is by building it in. Several users
have a need for this, when they switch debug kernels, they do not want to
update the filesystem or worry about it where to store the headers on
it. However, the feature is also buildable as a module in case the user
desires it not being part of the kernel image. This makes it possible to
load and unload the headers from memory on demand. A tracing program can
load the module, do its operations, and then unload the module to save
kernel memory. The total memory needed is 3.3MB.

By having the archive available at a fixed location independent of
filesystem dependencies and conventions, all debugging tools can
directly refer to the fixed location for the archive, without concerning
with where the headers on a typical filesystem which significantly
simplifies tooling that needs kernel headers.

The code to read the headers is based on /proc/config.gz code and uses
the same technique to embed the headers.

Other approaches were discussed such as having an in-memory mountable
filesystem, but that has drawbacks such as requiring an in-kernel xz
decompressor which we don't have today, and requiring usage of 42 MB of
kernel memory to host the decompressed headers at anytime. Also this
approach is simpler than such approaches.

Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-29 16:48:03 +02:00
David Howells 5dd50aaeb1
Make anon_inodes unconditional
Make the anon_inodes facility unconditional so that it can be used by core
VFS code and pidfd code.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[christian@brauner.io: adapt commit message to mention pidfds]
Signed-off-by: Christian Brauner <christian@brauner.io>
2019-04-19 14:03:11 +02:00
Linus Torvalds ffd602eb46 Kbuild updates for v5.1
- do not generate unneeded top-level built-in.a
 
  - let git ignore O= directory entirely
 
  - optimize scripts/kallsyms slightly
 
  - exclude DWARF info from *.s regardless of config options
 
  - fix GCC toolchain search path for Clang to prepare ld.lld support
 
  - do not generate modules.order when CONFIG_MODULES is disabled
 
  - simplify single target rules and remove VPATH for external module build
 
  - allow to add optional flags to dpkg-buildpackage when building deb-pkg
 
  - move some compiler option tests from Makefile to Kconfig
 
  - various Makefile cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJcgxYUAAoJED2LAQed4NsGr7YQAJq4LmN/aZDI9Mt0YAQjEyyA
 PCpm8J2HI9HO1sMoY7J/ksWmV0BU25G+uspKD7dXAQo3l9fmahQM5e4dsyZ4Xqs8
 DyyYSGtJJnMJaWmupIZNA4UKDCVtwPoVW8YeuK9rwADVokCux9avogof9O1OoA/E
 Pylo+I4UCM82kbpZSd+UxnCx6B0v8XGtW+d31Q4yZXCkw5nw14chrlaprcqB3UgB
 +7C3xOnDWCi7gyxaTqmD7dLay2DM8KCDlznEvBL733Y/cK3to1fywzEPzp0JQCLX
 BLgmmpW13NF++q5BCoTW6sFjZAhBVbiYZwesMrCi75Y32T8zt4G5l4pkvGkSuGF/
 UQh5aoCxaMIp70VPj/loZ0lh78nwVGTok9zRb0rfztM0X4DbmiPi5MNiHRzRpIeE
 1jjEa/GK1t0TDnXc/MuDFK8cWwdhttIqUL5yWfAxjXbtP27eLtsopQUdW7EPHs7d
 sMnfuSUuhOC28yByVxIkBcwawLyYrcWRphJ3ixCO70CoJWt2DT6aOKxcFJefoJix
 Pto6Oo3oQ4iypMM5M9/0Uo+AK2TKRejWIqtZdbo+ir70tNxVH3WDZq++fG0drXOB
 r2I/GY6nRjuzLOe2jzEqywFTFd2xpk4Qo84LGb1R3U6aU5qS2gA0W/q00JS5c2qU
 R8uReJ7bvmLmrVNZ/NI4
 =y9YG
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - do not generate unneeded top-level built-in.a

 - let git ignore O= directory entirely

 - optimize scripts/kallsyms slightly

 - exclude DWARF info from *.s regardless of config options

 - fix GCC toolchain search path for Clang to prepare ld.lld support

 - do not generate modules.order when CONFIG_MODULES is disabled

 - simplify single target rules and remove VPATH for external module
   build

 - allow to add optional flags to dpkg-buildpackage when building
   deb-pkg

 - move some compiler option tests from Makefile to Kconfig

 - various Makefile cleanups

* tag 'kbuild-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (40 commits)
  kbuild: remove scripts/basic/% build target
  kbuild: use -Werror=implicit-... instead of -Werror-implicit-...
  kbuild: clean up scripts/gcc-version.sh
  kbuild: remove cc-version macro
  kbuild: update comment block of scripts/clang-version.sh
  kbuild: remove commented-out INITRD_COMPRESS
  kbuild: move -gsplit-dwarf, -gdwarf-4 option tests to Kconfig
  kbuild: [bin]deb-pkg: add DPKG_FLAGS variable
  kbuild: move ".config not found!" message from Kconfig to Makefile
  kbuild: invoke syncconfig if include/config/auto.conf.cmd is missing
  kbuild: simplify single target rules
  kbuild: remove empty rules for makefiles
  kbuild: make -r/-R effective in top Makefile for old Make versions
  kbuild: move tools_silent to a more relevant place
  kbuild: compute false-positive -Wmaybe-uninitialized cases in Kconfig
  kbuild: refactor cc-cross-prefix implementation
  kbuild: hardcode genksyms path and remove GENKSYMS variable
  scripts/gdb: refactor rules for symlink creation
  kbuild: create symlink to vmlinux-gdb.py in scripts_gdb target
  scripts/gdb: do not descend into scripts/gdb from scripts
  ...
2019-03-10 17:48:21 -07:00
Linus Torvalds a15f6b923e Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
 "A single fix to prevent a unmet dependencies warning in Kconfig"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  time: Make VIRT_CPU_ACCOUNTING_GEN depend on GENERIC_CLOCKEVENTS
2019-03-10 13:58:33 -07:00
Arnd Bergmann 041a15744a time: Make VIRT_CPU_ACCOUNTING_GEN depend on GENERIC_CLOCKEVENTS
Moving the CONTEXT_TRACKING Kconfig option into kernel/time/Kconfig added
an implicit dependency on the surrounding GENERIC_CLOCKEVENTS option, but
this is not always enabled when it is possible to select
VIRT_CPU_ACCOUNTING_GEN:

WARNING: unmet direct dependencies detected for CONTEXT_TRACKING
  Depends on [n]: GENERIC_CLOCKEVENTS [=n]
  Selected by [y]:
  - VIRT_CPU_ACCOUNTING_GEN [=y] && <choice> && HAVE_CONTEXT_TRACKING [=y] && HAVE_VIRT_CPU_ACCOUNTING_GEN [=y]

Platforms without GENERIC_CLOCKEVENTS are rare enough so that corner case
can be just ignored. Make it a dependency for VIRT_CPU_ACCOUNTING_GEN to
simplify the configuration.

Fixes: a4cffdad73 ("time: Move CONTEXT_TRACKING to kernel/time/Kconfig")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Paul E . McKenney" <paulmck@linux.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: https://lkml.kernel.org/r/20190304200202.1163250-1-arnd@arndb.de
2019-03-06 20:43:08 +01:00
Masahiro Yamada fa7295ab69 kbuild: clean up scripts/gcc-version.sh
Now that the Kconfig is the only user of this script, we can drop
unneeded code.

Remove the -p option, and stop prepending the output with zero,
so that Kconfig can directly use the output from this script.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-03-04 22:35:04 +09:00
Jens Axboe 2b188cc1bb Add io_uring IO interface
The submission queue (SQ) and completion queue (CQ) rings are shared
between the application and the kernel. This eliminates the need to
copy data back and forth to submit and complete IO.

IO submissions use the io_uring_sqe data structure, and completions
are generated in the form of io_uring_cqe data structures. The SQ
ring is an index into the io_uring_sqe array, which makes it possible
to submit a batch of IOs without them being contiguous in the ring.
The CQ ring is always contiguous, as completion events are inherently
unordered, and hence any io_uring_cqe entry can point back to an
arbitrary submission.

Two new system calls are added for this:

io_uring_setup(entries, params)
	Sets up an io_uring instance for doing async IO. On success,
	returns a file descriptor that the application can mmap to
	gain access to the SQ ring, CQ ring, and io_uring_sqes.

io_uring_enter(fd, to_submit, min_complete, flags, sigset, sigsetsize)
	Initiates IO against the rings mapped to this fd, or waits for
	them to complete, or both. The behavior is controlled by the
	parameters passed in. If 'to_submit' is non-zero, then we'll
	try and submit new IO. If IORING_ENTER_GETEVENTS is set, the
	kernel will wait for 'min_complete' events, if they aren't
	already available. It's valid to set IORING_ENTER_GETEVENTS
	and 'min_complete' == 0 at the same time, this allows the
	kernel to return already completed events without waiting
	for them. This is useful only for polling, as for IRQ
	driven IO, the application can just check the CQ ring
	without entering the kernel.

With this setup, it's possible to do async IO with a single system
call. Future developments will enable polled IO with this interface,
and polled submission as well. The latter will enable an application
to do IO without doing ANY system calls at all.

For IRQ driven IO, an application only needs to enter the kernel for
completions if it wants to wait for them to occur.

Each io_uring is backed by a workqueue, to support buffered async IO
as well. We will only punt to an async context if the command would
need to wait for IO on the device side. Any data that can be accessed
directly in the page cache is done inline. This avoids the slowness
issue of usual threadpools, since cached data is accessed as quickly
as a sync interface.

Sample application: http://git.kernel.dk/cgit/fio/plain/t/io_uring.c

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-28 08:24:23 -07:00
Masahiro Yamada b303c6df80 kbuild: compute false-positive -Wmaybe-uninitialized cases in Kconfig
Since -Wmaybe-uninitialized was introduced by GCC 4.7, we have patched
various false positives:

 - commit e74fc973b6 ("Turn off -Wmaybe-uninitialized when building
   with -Os") turned off this option for -Os.

 - commit 815eb71e71 ("Kbuild: disable 'maybe-uninitialized' warning
   for CONFIG_PROFILE_ALL_BRANCHES") turned off this option for
   CONFIG_PROFILE_ALL_BRANCHES

 - commit a76bcf557e ("Kbuild: enable -Wmaybe-uninitialized warning
   for "make W=1"") turned off this option for GCC < 4.9
   Arnd provided more explanation in https://lkml.org/lkml/2017/3/14/903

I think this looks better by shifting the logic from Makefile to Kconfig.

Link: https://github.com/ClangBuiltLinux/linux/issues/350
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
2019-02-27 21:43:20 +09:00
Johannes Weiner 7b2489d37e psi: clarify the Kconfig text for the default-disable option
The current help text caused some confusion in online forums about
whether or not to default-enable or default-disable psi in vendor
kernels.  This is because it doesn't communicate the reason for why we
made this setting configurable in the first place: that the overhead is
non-zero in an artificial scheduler stress test.

Since this isn't representative of real workloads, and the effect was
not measurable in scheduler-heavy real world applications such as the
webservers and memcache installations at Facebook, it's fair to point
out that this is a pretty cautious option to select.

Link: http://lkml.kernel.org/r/20190129233617.16767-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01 15:46:24 -08:00
Jonathan Neuschäfer 9807683384 init/Kconfig: fix grammar by moving a closing parenthesis
Link: http://lkml.kernel.org/r/20190129150813.15785-1-j.neuschaefer@gmx.net
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01 15:46:23 -08:00
Paul Burton 16fd20aa98 kbuild: Disable LD_DEAD_CODE_DATA_ELIMINATION with ftrace & GCC <= 4.7
When building using GCC 4.7 or older, -ffunction-sections & the -pg flag
used by ftrace are incompatible. This causes warnings or build failures
(where -Werror applies) such as the following:

  arch/mips/generic/init.c:
    error: -ffunction-sections disabled; it makes profiling impossible

This used to be taken into account by the ordering of calls to cc-option
from within the top-level Makefile, which was introduced by commit
90ad4052e8 ("kbuild: avoid conflict between -ffunction-sections and
-pg on gcc-4.7"). Unfortunately this was broken when the
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION cc-option check was moved to
Kconfig in commit e85d1d65cd ("kbuild: test dead code/data elimination
support in Kconfig"), because the flags used by this check no longer
include -pg.

Fix this by not allowing CONFIG_LD_DEAD_CODE_DATA_ELIMINATION to be
enabled at the same time as ftrace/CONFIG_FUNCTION_TRACER when building
using GCC 4.7 or older.

Signed-off-by: Paul Burton <paul.burton@mips.com>
Fixes: e85d1d65cd ("kbuild: test dead code/data elimination support in Kconfig")
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-01-14 10:37:09 +09:00
Masahiro Yamada e9666d10a5 jump_label: move 'asm goto' support test to Kconfig
Currently, CONFIG_JUMP_LABEL just means "I _want_ to use jump label".

The jump label is controlled by HAVE_JUMP_LABEL, which is defined
like this:

  #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
  # define HAVE_JUMP_LABEL
  #endif

We can improve this by testing 'asm goto' support in Kconfig, then
make JUMP_LABEL depend on CC_HAS_ASM_GOTO.

Ugly #ifdef HAVE_JUMP_LABEL will go away, and CONFIG_JUMP_LABEL will
match to the real kernel capability.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
2019-01-06 09:46:51 +09:00
Linus Torvalds 047ce6d380 audit/stable-4.21 PR 20181224
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAlwhAwIUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNl1w/+PKsewN5VkmmfibIxZ+iZwe1KGB+L
 iOwkdHDkG1Bae5A7TBdbKMbHq0FdhaiDXAIFrfunBG/tbgBF9O0056edekR4rRLp
 ReGQVNpGMggiATyVKrc3vi+4+UYQqtS6N7Y8q+mMMX/hVeeESXrTAZdgxSWwsZAX
 LbYwXXYUyupLvelpkpakE6VPZEcatcYWrVK/vFKLkTt2jLLlLPtanbMf0B71TULi
 5EZSVBYWS71a6yvrrYcVDDZjgot31nVQfX4EIqE6CVcXLuL9vqbZBGKZh+iAGbjs
 UdKgaQMZ/eJ4CRYDJca0Bnba3n1AKO4uNssY0nrMW4s/inDPrJnMZ0kgGWfayE3d
 QR96aHEP5W3SZoiJCUlYm8a4JFfndYKn4YBvqjvLgIkbd784/rvI+sNGM9BF1DNP
 f05frIJVHLNO3sECKWMmQyMGWGglj7bLsjtKrai5UQReyFLpM/q/Lh3J1IHZ9KZq
 YWFTA4G0rg7x2bdEB4Qh/SaLOOHW7uyQ7IJCYfzSKsZCIO++RqCQoArxiKRE6++C
 hv0UG6NGb6Z6a+k1JSzlxCXPmcui0zow7aqEpZSl/9kiYzkLpBITha/ERP7at5M2
 W3JVNfQNn6kPtZFgmNuP7rNE9Yn6jnbIdks0nsi/J/4KUr/p2Mfc5LamyTj1unk6
 xf7S+xmOFKHAc2s=
 =PCHx
 -----END PGP SIGNATURE-----

Merge tag 'audit-pr-20181224' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit

Pull audit updates from Paul Moore:
 "In the finest of holiday of traditions, I have a number of gifts to
  share today. While most of them are re-gifts from others, unlike the
  typical re-gift, these are things you will want in and around your
  tree; I promise.

  This pull request is perhaps a bit larger than our typical PR, but
  most of it comes from Jan's rework of audit's fanotify code; a very
  welcome improvement. We ran this through our normal regression tests,
  as well as some newly created stress tests and everything looks good.

  Richard added a few patches, mostly cleaning up a few things and and
  shortening some of the audit records that we send to userspace; a
  change the userspace folks are quite happy about.

  Finally YueHaibing and I kick in a few patches to simplify things a
  bit and make the code less prone to errors.

  Lastly, I want to say thanks one more time to everyone who has
  contributed patches, testing, and code reviews for the audit subsystem
  over the past year. The project is what it is due to your help and
  contributions - thank you"

* tag 'audit-pr-20181224' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: (22 commits)
  audit: remove duplicated include from audit.c
  audit: shorten PATH cap values when zero
  audit: use current whenever possible
  audit: minimize our use of audit_log_format()
  audit: remove WATCH and TREE config options
  audit: use session_info helper
  audit: localize audit_log_session_info prototype
  audit: Use 'mark' name for fsnotify_mark variables
  audit: Replace chunk attached to mark instead of replacing mark
  audit: Simplify locking around untag_chunk()
  audit: Drop all unused chunk nodes during deletion
  audit: Guarantee forward progress of chunk untagging
  audit: Allocate fsnotify mark independently of chunk
  audit: Provide helper for dropping mark's chunk reference
  audit: Remove pointless check in insert_hash()
  audit: Factor out chunk replacement code
  audit: Make hash table insertion safe against concurrent lookups
  audit: Embed key into chunk
  audit: Fix possible tagging failures
  audit: Fix possible spurious -ENOSPC error
  ...
2018-12-27 11:58:50 -08:00
Baruch Siach 428a1cb4ba psi: fix reference to kernel commandline enable
The kernel commandline parameter named in CONFIG_PSI_DEFAULT_DISABLED
help text contradicts the documentation in kernel-parameters.txt, and
the code.  Fix that.

Link: http://lkml.kernel.org/r/20181203213416.GA12627@cmpxchg.org
Fixes: e0c274472d ("psi: make disabling/enabling easier for vendor kernels")
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-14 15:05:45 -08:00
Johannes Weiner e0c274472d psi: make disabling/enabling easier for vendor kernels
Mel Gorman reports a hackbench regression with psi that would prohibit
shipping the suse kernel with it default-enabled, but he'd still like
users to be able to opt in at little to no cost to others.

With the current combination of CONFIG_PSI and the psi_disabled bool set
from the commandline, this is a challenge.  Do the following things to
make it easier:

1. Add a config option CONFIG_PSI_DEFAULT_DISABLED that allows distros
   to enable CONFIG_PSI in their kernel but leave the feature disabled
   unless a user requests it at boot-time.

   To avoid double negatives, rename psi_disabled= to psi=.

2. Make psi_disabled a static branch to eliminate any branch costs
   when the feature is disabled.

In terms of numbers before and after this patch, Mel says:

: The following is a comparision using CONFIG_PSI=n as a baseline against
: your patch and a vanilla kernel
:
:                          4.20.0-rc4             4.20.0-rc4             4.20.0-rc4
:                 kconfigdisable-v1r1                vanilla        psidisable-v1r1
: Amean     1       1.3100 (   0.00%)      1.3923 (  -6.28%)      1.3427 (  -2.49%)
: Amean     3       3.8860 (   0.00%)      4.1230 *  -6.10%*      3.8860 (  -0.00%)
: Amean     5       6.8847 (   0.00%)      8.0390 * -16.77%*      6.7727 (   1.63%)
: Amean     7       9.9310 (   0.00%)     10.8367 *  -9.12%*      9.9910 (  -0.60%)
: Amean     12     16.6577 (   0.00%)     18.2363 *  -9.48%*     17.1083 (  -2.71%)
: Amean     18     26.5133 (   0.00%)     27.8833 *  -5.17%*     25.7663 (   2.82%)
: Amean     24     34.3003 (   0.00%)     34.6830 (  -1.12%)     32.0450 (   6.58%)
: Amean     30     40.0063 (   0.00%)     40.5800 (  -1.43%)     41.5087 (  -3.76%)
: Amean     32     40.1407 (   0.00%)     41.2273 (  -2.71%)     39.9417 (   0.50%)
:
: It's showing that the vanilla kernel takes a hit (as the bisection
: indicated it would) and that disabling PSI by default is reasonably
: close in terms of performance for this particular workload on this
: particular machine so;

Link: http://lkml.kernel.org/r/20181127165329.GA29728@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30 14:56:14 -08:00
Richard Guy Briggs c8fc5d49c3 audit: remove WATCH and TREE config options
Remove the CONFIG_AUDIT_WATCH and CONFIG_AUDIT_TREE config options since
they are both dependent on CONFIG_AUDITSYSCALL and force
CONFIG_FSNOTIFY.

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-11-19 16:29:50 -05:00
Johannes Weiner 2ce7135adc psi: cgroup support
On a system that executes multiple cgrouped jobs and independent
workloads, we don't just care about the health of the overall system, but
also that of individual jobs, so that we can ensure individual job health,
fairness between jobs, or prioritize some jobs over others.

This patch implements pressure stall tracking for cgroups.  In kernels
with CONFIG_PSI=y, cgroup2 groups will have cpu.pressure, memory.pressure,
and io.pressure files that track aggregate pressure stall times for only
the tasks inside the cgroup.

Link: http://lkml.kernel.org/r/20180828172258.3185-10-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Daniel Drake <drake@endlessm.com>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <jweiner@fb.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Enderborg <peter.enderborg@sony.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-26 16:26:32 -07:00
Johannes Weiner eb414681d5 psi: pressure stall information for CPU, memory, and IO
When systems are overcommitted and resources become contended, it's hard
to tell exactly the impact this has on workload productivity, or how close
the system is to lockups and OOM kills.  In particular, when machines work
multiple jobs concurrently, the impact of overcommit in terms of latency
and throughput on the individual job can be enormous.

In order to maximize hardware utilization without sacrificing individual
job health or risk complete machine lockups, this patch implements a way
to quantify resource pressure in the system.

A kernel built with CONFIG_PSI=y creates files in /proc/pressure/ that
expose the percentage of time the system is stalled on CPU, memory, or IO,
respectively.  Stall states are aggregate versions of the per-task delay
accounting delays:

       cpu: some tasks are runnable but not executing on a CPU
       memory: tasks are reclaiming, or waiting for swapin or thrashing cache
       io: tasks are waiting for io completions

These percentages of walltime can be thought of as pressure percentages,
and they give a general sense of system health and productivity loss
incurred by resource overcommit.  They can also indicate when the system
is approaching lockup scenarios and OOMs.

To do this, psi keeps track of the task states associated with each CPU
and samples the time they spend in stall states.  Every 2 seconds, the
samples are averaged across CPUs - weighted by the CPUs' non-idle time to
eliminate artifacts from unused CPUs - and translated into percentages of
walltime.  A running average of those percentages is maintained over 10s,
1m, and 5m periods (similar to the loadaverage).

[hannes@cmpxchg.org: doc fixlet, per Randy]
  Link: http://lkml.kernel.org/r/20180828205625.GA14030@cmpxchg.org
[hannes@cmpxchg.org: code optimization]
  Link: http://lkml.kernel.org/r/20180907175015.GA8479@cmpxchg.org
[hannes@cmpxchg.org: rename psi_clock() to psi_update_work(), per Peter]
  Link: http://lkml.kernel.org/r/20180907145404.GB11088@cmpxchg.org
[hannes@cmpxchg.org: fix build]
  Link: http://lkml.kernel.org/r/20180913014222.GA2370@cmpxchg.org
Link: http://lkml.kernel.org/r/20180828172258.3185-9-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Daniel Drake <drake@endlessm.com>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <jweiner@fb.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Enderborg <peter.enderborg@sony.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-26 16:26:32 -07:00
Vincent Guittot 11d4afd4ff sched/pelt: Fix warning and clean up IRQ PELT config
Create a config for enabling irq load tracking in the scheduler.
irq load tracking is useful only when irq or paravirtual time is
accounted but it's only possible with SMP for now.

Also use __maybe_unused to remove the compilation warning in
update_rq_clock_task() that has been introduced by:

  2e62c4743a ("sched/fair: Remove #ifdefs from scale_rt_capacity()")

Suggested-by: Ingo Molnar <mingo@redhat.com>
Reported-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Reported-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@alien8.de
Cc: dou_liyang@163.com
Fixes: 2e62c4743a ("sched/fair: Remove #ifdefs from scale_rt_capacity()")
Link: http://lkml.kernel.org/r/1537867062-27285-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-02 09:45:00 +02:00
Linus Torvalds 1bc276775d Kbuild updates for v4.19 (2nd)
- add build_{menu,n,g,x}config targets for compile-testing Kconfig
 
  - fix and improve recursive dependency detection in Kconfig
 
  - fix parallel building of menuconfig/nconfig
 
  - fix syntax error in clang-version.sh
 
  - suppress distracting log from syncconfig
 
  - remove obsolete "rpm" target
 
  - remove VMLINUX_SYMBOL(_STR) macro entirely
 
  - fix microblaze build with CONFIG_DYNAMIC_FTRACE
 
  - move compiler test for dead code/data elimination to Kconfig
 
  - rename well-known LDFLAGS variable to KBUILD_LDFLAGS
 
  - misc fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJbgYhCAAoJED2LAQed4NsGErAP/jt7gt76+N0PZmADBZqyVR/H
 4k286g3OiT7DIcdvwqE5BRvu+zNOamDujnnXw63/jwu2RjrkLX/JnhzTbC0IZleZ
 KeO4bU4ZH0WFa0Ny9pp0LAnzbXGMnQjDXygcUd5BFoEd5JSLKW2PISEEjRh6b5B7
 swJRdgySFaMrUBRNf13FwH5EvX/D0xZQe/wFhFCOv6L4gJZFMmpGUIepgTjTUmxZ
 wcNN6xxXg+ulLHVcPdPQ9EYssNHN5xNys02+IdIrhhXuNHji/TFm4dGYuU+dDGeE
 Eu4O6Qs7pg0PFGrZ5gLxXDJEp75W+uaTNOqV+jcjq8MRxJuWxyy2biUeelKRT/KH
 0iv4ZQJVOMOhl8fZgLtQaXHyQ++5uwd6kvPPf+XFdkogGAIXK0wKWLoALFEOXwb6
 z1BBnFx09LrKPGt0ZlKX624OEczedv/UAFiSh3Ic2S3PFEpq4oHrEGhTnyKRobPv
 OEcF3RqKjmAdK7PLy4kVpTLhkutkWWhw6Giy9qXUkXYJWonJR7NTQ1mIan2LoGZC
 sGi+qKae/8xgO2Nerx59tZpkiHYTMfYeAo8frzWurOxm3YzEfaxNNGPl+IMW7VKz
 cNPzQZ5tMUy4i4PAhk/gIWibnUTPfjDbWsZSMtIbO0GFcao56EvllwD8/awuy7lO
 QkaAeZHFcF+qgU3muaYK
 =Vsb2
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull more Kbuild updates from Masahiro Yamada:

 - add build_{menu,n,g,x}config targets for compile-testing Kconfig

 - fix and improve recursive dependency detection in Kconfig

 - fix parallel building of menuconfig/nconfig

 - fix syntax error in clang-version.sh

 - suppress distracting log from syncconfig

 - remove obsolete "rpm" target

 - remove VMLINUX_SYMBOL(_STR) macro entirely

 - fix microblaze build with CONFIG_DYNAMIC_FTRACE

 - move compiler test for dead code/data elimination to Kconfig

 - rename well-known LDFLAGS variable to KBUILD_LDFLAGS

 - misc fixes and cleanups

* tag 'kbuild-v4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: rename LDFLAGS to KBUILD_LDFLAGS
  kbuild: pass LDFLAGS to recordmcount.pl
  kbuild: test dead code/data elimination support in Kconfig
  initramfs: move gen_initramfs_list.sh from scripts/ to usr/
  vmlinux.lds.h: remove stale <linux/export.h> include
  export.h: remove VMLINUX_SYMBOL() and VMLINUX_SYMBOL_STR()
  Coccinelle: remove pci_alloc_consistent semantic to detect in zalloc-simple.cocci
  kbuild: make sorting initramfs contents independent of locale
  kbuild: remove "rpm" target, which is alias of "rpm-pkg"
  kbuild: Fix LOADLIBES rename in Documentation/kbuild/makefiles.txt
  kconfig: suppress "configuration written to .config" for syncconfig
  kconfig: fix "Can't open ..." in parallel build
  kbuild: Add a space after `!` to prevent parsing as file pattern
  scripts: modpost: check memory allocation results
  kconfig: improve the recursive dependency report
  kconfig: report recursive dependency involving 'imply'
  kconfig: error out when seeing recursive dependency
  kconfig: add build-only configurator targets
  scripts/dtc: consolidate include path options in Makefile
2018-08-25 13:40:38 -07:00
Masahiro Yamada e85d1d65cd kbuild: test dead code/data elimination support in Kconfig
This config option should be enabled only when both the compiler and
the linker support necessary flags.  Add proper dependencies to Kconfig.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-08-24 08:10:00 +09:00
Adrian Reber 5cb366bb3a init/Kconfig: remove EXPERT from CHECKPOINT_RESTORE
The CHECKPOINT_RESTORE configuration option was introduced in 2012 and
combined with EXPERT.  CHECKPOINT_RESTORE is already enabled in many
distribution kernels and also part of the defconfigs of various
architectures.

To make it easier for distributions to enable CHECKPOINT_RESTORE this
removes EXPERT and moves the configuration option out of the EXPERT block.

Link: http://lkml.kernel.org/r/20180712130733.11510-1-adrian@lisas.de
Signed-off-by: Adrian Reber <adrian@lisas.de>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22 10:52:51 -07:00
Randy Dunlap 3903bf940b init/Kconfig: fix its typos
Correct typos of "it's" to "its.

Link: http://lkml.kernel.org/r/0ac627b6-5527-55f4-0489-1631aa34fc11@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22 10:52:49 -07:00
Kirill Tkhai 84c07d11aa mm: introduce CONFIG_MEMCG_KMEM as combination of CONFIG_MEMCG && !CONFIG_SLOB
Introduce new config option, which is used to replace repeating
CONFIG_MEMCG && !CONFIG_SLOB pattern.  Next patches add a little more
memcg+kmem related code, so let's keep the defines more clearly.

Link: http://lkml.kernel.org/r/153063053670.1818.15013136946600481138.stgit@localhost.localdomain
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Tested-by: Shakeel Butt <shakeelb@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Li RongQing <lirongqing@baidu.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Sahitya Tummala <stummala@codeaurora.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-17 16:20:30 -07:00
Linus Torvalds fa1b5d09d0 Consolidation of Kconfig files by Christoph Hellwig.
Move the source statements of arch-independent Kconfig files instead of
 duplicating the includes in every arch/$(SRCARCH)/Kconfig.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJbdFsfAAoJED2LAQed4NsGxHsP/1tmA57OOOj8oGxO2OXhXVbr
 Q0MZqCoV4bqMvK/hgCQdl9f+tp0m+j12x4xDLdVf4OqnTXMbqvPDu3uQVKvaj/k1
 gHhsFA1tFgSbuJ8InltUsrPEQqbceeJsj50xHVAKijqI6LYeRPPSU7aE9obn+OzH
 n2nd5sLKvMI/dqdJvW6i5KPydqTH3r3iA7D+ne/XQj0s0EMXvXUPmDT1+ijTnM4a
 yfm6W5p7L/c3Ugf1Pz5PfnPl4BxBwZMfW5ie/UO8j5C6Rl0iPaOGuuHurocaaJb3
 MefR/7NEAR3G8MhJyL2+70jbbwhjpqR2b5ooz1vpuulPHxjeU45BY60XIBWq1afR
 ewsc12MMCYB695ieYWoHdaWgxD/jhffyRuajfpkXKIZEMgDxS03sMhdULXENVMx1
 M0ZQ01g/NLWt9ti9DY3eTKB3ymOhnBa1sa77nGGUHkITq4DQKwPX1J9FP/HT6RNt
 uOvzeH5kGzc7tqOlZAO0kHbwhQG1uqGcd78IYd4lgf/XfkSgDERTWjnJmnQbwr9m
 3PFuST2u8eyO+8Lh1MK76TXOEkXsHMdFugPmb6SlgtMEPKGVLDPlsj52o/LFtgzl
 eygfMiBFr2+ttkZ6IpNcpmQ4IztmDpz6XoMk3PqDAfUTUSYpCnq1gAEuff/eisCM
 Odva1ZZaeQ7WpxhsP8rr
 =gsQJ
 -----END PGP SIGNATURE-----

Merge tag 'kconfig-v4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kconfig consolidation from Masahiro Yamada:
 "Consolidation of Kconfig files by Christoph Hellwig.

  Move the source statements of arch-independent Kconfig files instead
  of duplicating the includes in every arch/$(SRCARCH)/Kconfig"

* tag 'kconfig-v4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kconfig: add a Memory Management options" menu
  kconfig: move the "Executable file formats" menu to fs/Kconfig.binfmt
  kconfig: use a menu in arch/Kconfig to reduce clutter
  kconfig: include kernel/Kconfig.preempt from init/Kconfig
  Kconfig: consolidate the "Kernel hacking" menu
  kconfig: include common Kconfig files from top-level Kconfig
  kconfig: remove duplicate SWAP symbol defintions
  um: create a proper drivers Kconfig
  um: cleanup Kconfig files
  um: stop abusing KBUILD_KCONFIG
2018-08-15 13:05:12 -07:00