Commit graph

38809 commits

Author SHA1 Message Date
Jakub Kicinski
0e55546b18 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
include/linux/netdevice.h
net/core/dev.c
  6510ea973d ("net: Use this_cpu_inc() to increment net->core_stats")
  794c24e992 ("net-core: rx_otherhost_dropped to core_stats")
https://lore.kernel.org/all/20220428111903.5f4304e0@canb.auug.org.au/

drivers/net/wan/cosa.c
  d48fea8401 ("net: cosa: fix error check return value of register_chrdev()")
  89fbca3307 ("net: wan: remove support for COSA and SRP synchronous serial boards")
https://lore.kernel.org/all/20220428112130.1f689e5e@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-28 13:02:01 -07:00
Linus Torvalds
249aca0d3d Networking fixes for 5.18-rc5, including fixes from bluetooth, bpf
and netfilter.
 
 Current release - new code bugs:
 
  - bridge: switchdev: check br_vlan_group() return value
 
  - use this_cpu_inc() to increment net->core_stats, fix preempt-rt
 
 Previous releases - regressions:
 
  - eth: stmmac: fix write to sgmii_adapter_base
 
 Previous releases - always broken:
 
  - netfilter: nf_conntrack_tcp: re-init for syn packets only,
    resolving issues with TCP fastopen
 
  - tcp: md5: fix incorrect tcp_header_len for incoming connections
 
  - tcp: fix F-RTO may not work correctly when receiving DSACK
 
  - tcp: ensure use of most recently sent skb when filling rate samples
 
  - tcp: fix potential xmit stalls caused by TCP_NOTSENT_LOWAT
 
  - virtio_net: fix wrong buf address calculation when using xdp
 
  - xsk: fix forwarding when combining copy mode with busy poll
 
  - xsk: fix possible crash when multiple sockets are created
 
  - bpf: lwt: fix crash when using bpf_skb_set_tunnel_key() from
    bpf_xmit lwt hook
 
  - sctp: null-check asoc strreset_chunk in sctp_generate_reconf_event
 
  - wireguard: device: check for metadata_dst with skb_valid_dst()
 
  - netfilter: update ip6_route_me_harder to consider L3 domain
 
  - gre: make o_seqno start from 0 in native mode
 
  - gre: switch o_seqno to atomic to prevent races in collect_md mode
 
 Misc:
 
  - add Eric Dumazet to networking maintainers
 
  - dt: dsa: realtek: remove realtek,rtl8367s string
 
  - netfilter: flowtable: Remove the empty file
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmJq2r4ACgkQMUZtbf5S
 IrthIxAAjGEcLr25lkB0IWcjOD5wqOuhaRKeSWXnbm5bPkWIaxVMsssBAR8DS78S
 bsaJ0yTKQqv4vLtlMjtQpVC/azr0NTmsb0y5+6C5d4IObBf2Mv1LPpkiqs0d+phQ
 khPsCh0QGtSJT9VbaMu5+JW+c6Jo0kVmnmOgmhZMo1QKFw/bocdJrxQoZjcve9/X
 /uTDFEn8dPWbQKOm1TmSvQhuEPh1V6ZAf8/cBikN2Yul1R0EYbZO6RKSfrgqaa7T
 aRMMTEwRoTEuRdjF97F7ZgGXhoRxP9rW+bdzoQ0ewRu+KgKKPjCL2eVgeNfTNptj
 FjaUpNrImkYUxQ8+x7sQVPdwFaScVVtjHFfqFl0CsvhddT6nQw2trElccfy3/16K
 0GEKBXKCB3B9h02fhillPFveZzDChy/5NTezARqMYP0eG5SBUHLCCymZnqnoNkwV
 hdcmciZTnJzIxPJcmlp8F7D5etueDOwh03nMcFxRf3eW0IcC+Hl6qtbOFUzr6KhB
 V/TLh7N+Smy3JtsavU9aj4iSQGR+kChCt5zhH9idkuEsUgdYo3apB4q3k3ShzWqM
 SJx4gxp5HxwobLx+uW/HJMJTmwA8fApMbIl0OOPm+qiHfULufCZb6BS3AZGb7jdX
 wrcEPeZxBTCZqHeQc0kuo9/CtTTvbawJYtP5LRiZ5bWfoKn5F9o=
 =XOSt
 -----END PGP SIGNATURE-----

Merge tag 'net-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bluetooth, bpf and netfilter.

  Current release - new code bugs:

   - bridge: switchdev: check br_vlan_group() return value

   - use this_cpu_inc() to increment net->core_stats, fix preempt-rt

  Previous releases - regressions:

   - eth: stmmac: fix write to sgmii_adapter_base

  Previous releases - always broken:

   - netfilter: nf_conntrack_tcp: re-init for syn packets only,
     resolving issues with TCP fastopen

   - tcp: md5: fix incorrect tcp_header_len for incoming connections

   - tcp: fix F-RTO may not work correctly when receiving DSACK

   - tcp: ensure use of most recently sent skb when filling rate samples

   - tcp: fix potential xmit stalls caused by TCP_NOTSENT_LOWAT

   - virtio_net: fix wrong buf address calculation when using xdp

   - xsk: fix forwarding when combining copy mode with busy poll

   - xsk: fix possible crash when multiple sockets are created

   - bpf: lwt: fix crash when using bpf_skb_set_tunnel_key() from
     bpf_xmit lwt hook

   - sctp: null-check asoc strreset_chunk in sctp_generate_reconf_event

   - wireguard: device: check for metadata_dst with skb_valid_dst()

   - netfilter: update ip6_route_me_harder to consider L3 domain

   - gre: make o_seqno start from 0 in native mode

   - gre: switch o_seqno to atomic to prevent races in collect_md mode

  Misc:

   - add Eric Dumazet to networking maintainers

   - dt: dsa: realtek: remove realtek,rtl8367s string

   - netfilter: flowtable: Remove the empty file"

* tag 'net-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits)
  tcp: fix F-RTO may not work correctly when receiving DSACK
  Revert "ibmvnic: Add ethtool private flag for driver-defined queue limits"
  net: enetc: allow tc-etf offload even with NETIF_F_CSUM_MASK
  ixgbe: ensure IPsec VF<->PF compatibility
  MAINTAINERS: Update BNXT entry with firmware files
  netfilter: nft_socket: only do sk lookups when indev is available
  net: fec: add missing of_node_put() in fec_enet_init_stop_mode()
  bnx2x: fix napi API usage sequence
  tls: Skip tls_append_frag on zero copy size
  Add Eric Dumazet to networking maintainers
  netfilter: conntrack: fix udp offload timeout sysctl
  netfilter: nf_conntrack_tcp: re-init for syn packets only
  net: dsa: lantiq_gswip: Don't set GSWIP_MII_CFG_RMII_CLK
  net: Use this_cpu_inc() to increment net->core_stats
  Bluetooth: hci_sync: Cleanup hci_conn if it cannot be aborted
  Bluetooth: hci_event: Fix creating hci_conn object on error status
  Bluetooth: hci_event: Fix checking for invalid handle on error status
  ice: fix use-after-free when deinitializing mailbox snapshot
  ice: wait 5 s for EMP reset after firmware flash
  ice: Protect vf_state check by cfg_lock in ice_vc_process_vf_msg()
  ...
2022-04-28 12:34:50 -07:00
Jakub Kicinski
50c6afabfd Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2022-04-27

We've added 85 non-merge commits during the last 18 day(s) which contain
a total of 163 files changed, 4499 insertions(+), 1521 deletions(-).

The main changes are:

1) Teach libbpf to enhance BPF verifier log with human-readable and relevant
   information about failed CO-RE relocations, from Andrii Nakryiko.

2) Add typed pointer support in BPF maps and enable it for unreferenced pointers
   (via probe read) and referenced ones that can be passed to in-kernel helpers,
   from Kumar Kartikeya Dwivedi.

3) Improve xsk to break NAPI loop when rx queue gets full to allow for forward
   progress to consume descriptors, from Maciej Fijalkowski & Björn Töpel.

4) Fix a small RCU read-side race in BPF_PROG_RUN routines which dereferenced
   the effective prog array before the rcu_read_lock, from Stanislav Fomichev.

5) Implement BPF atomic operations for RV64 JIT, and add libbpf parsing logic
   for USDT arguments under riscv{32,64}, from Pu Lehui.

6) Implement libbpf parsing of USDT arguments under aarch64, from Alan Maguire.

7) Enable bpftool build for musl and remove nftw with FTW_ACTIONRETVAL usage
   so it can be shipped under Alpine which is musl-based, from Dominique Martinet.

8) Clean up {sk,task,inode} local storage trace RCU handling as they do not
   need to use call_rcu_tasks_trace() barrier, from KP Singh.

9) Improve libbpf API documentation and fix error return handling of various
   API functions, from Grant Seltzer.

10) Enlarge offset check for bpf_skb_{load,store}_bytes() helpers given data
    length of frags + frag_list may surpass old offset limit, from Liu Jian.

11) Various improvements to prog_tests in area of logging, test execution
    and by-name subtest selection, from Mykola Lysenko.

12) Simplify map_btf_id generation for all map types by moving this process
    to build time with help of resolve_btfids infra, from Menglong Dong.

13) Fix a libbpf bug in probing when falling back to legacy bpf_probe_read*()
    helpers; the probing caused always to use old helpers, from Runqing Yang.

14) Add support for ARCompact and ARCv2 platforms for libbpf's PT_REGS
    tracing macros, from Vladimir Isaev.

15) Cleanup BPF selftests to remove old & unneeded rlimit code given kernel
    switched to memcg-based memory accouting a while ago, from Yafang Shao.

16) Refactor of BPF sysctl handlers to move them to BPF core, from Yan Zhu.

17) Fix BPF selftests in two occasions to work around regressions caused by latest
    LLVM to unblock CI until their fixes are worked out, from Yonghong Song.

18) Misc cleanups all over the place, from various others.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (85 commits)
  selftests/bpf: Add libbpf's log fixup logic selftests
  libbpf: Fix up verifier log for unguarded failed CO-RE relos
  libbpf: Simplify bpf_core_parse_spec() signature
  libbpf: Refactor CO-RE relo human description formatting routine
  libbpf: Record subprog-resolved CO-RE relocations unconditionally
  selftests/bpf: Add CO-RE relos and SEC("?...") to linked_funcs selftests
  libbpf: Avoid joining .BTF.ext data with BPF programs by section name
  libbpf: Fix logic for finding matching program for CO-RE relocation
  libbpf: Drop unhelpful "program too large" guess
  libbpf: Fix anonymous type check in CO-RE logic
  bpf: Compute map_btf_id during build time
  selftests/bpf: Add test for strict BTF type check
  selftests/bpf: Add verifier tests for kptr
  selftests/bpf: Add C tests for kptr
  libbpf: Add kptr type tag macros to bpf_helpers.h
  bpf: Make BTF type match stricter for release arguments
  bpf: Teach verifier about kptr_get kfunc helpers
  bpf: Wire up freeing of referenced kptr
  bpf: Populate pairs of btf_id and destructor kfunc in btf
  bpf: Adapt copy_map_value for multiple offset case
  ...
====================

Link: https://lore.kernel.org/r/20220427224758.20976-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-27 17:09:32 -07:00
Jakub Kicinski
347cb5deae Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2022-04-27

We've added 5 non-merge commits during the last 20 day(s) which contain
a total of 6 files changed, 34 insertions(+), 12 deletions(-).

The main changes are:

1) Fix xsk sockets when rx and tx are separately bound to the same umem, also
   fix xsk copy mode combined with busy poll, from Maciej Fijalkowski.

2) Fix BPF tunnel/collect_md helpers with bpf_xmit lwt hook usage which triggered
   a crash due to invalid metadata_dst access, from Eyal Birger.

3) Fix release of page pool in XDP live packet mode, from Toke Høiland-Jørgensen.

4) Fix potential NULL pointer dereference in kretprobes, from Adam Zabrocki.

   (Masami & Steven preferred this small fix to be routed via bpf tree given it's
    follow-up fix to Masami's rethook work that went via bpf earlier, too.)

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  xsk: Fix possible crash when multiple sockets are created
  kprobes: Fix KRETPROBES when CONFIG_KRETPROBE_ON_RETHOOK is set
  bpf, lwt: Fix crash when using bpf_skb_set_tunnel_key() from bpf_xmit lwt hook
  bpf: Fix release of page_pool in BPF_PROG_RUN in test runner
  xsk: Fix l2fwd for copy mode + busy poll combo
====================

Link: https://lore.kernel.org/r/20220427212748.9576-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-27 15:18:40 -07:00
Menglong Dong
c317ab71fa bpf: Compute map_btf_id during build time
For now, the field 'map_btf_id' in 'struct bpf_map_ops' for all map
types are computed during vmlinux-btf init:

  btf_parse_vmlinux() -> btf_vmlinux_map_ids_init()

It will lookup the btf_type according to the 'map_btf_name' field in
'struct bpf_map_ops'. This process can be done during build time,
thanks to Jiri's resolve_btfids.

selftest of map_ptr has passed:

  $96 map_ptr:OK
  Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-04-26 11:35:21 -07:00
Adam Zabrocki
1d661ed54d kprobes: Fix KRETPROBES when CONFIG_KRETPROBE_ON_RETHOOK is set
The recent kernel change in 73f9b911fa ("kprobes: Use rethook for kretprobe
if possible"), introduced a potential NULL pointer dereference bug in the
KRETPROBE mechanism. The official Kprobes documentation defines that "Any or
all handlers can be NULL". Unfortunately, there is a missing return handler
verification to fulfill these requirements and can result in a NULL pointer
dereference bug.

This patch adds such verification in kretprobe_rethook_handler() function.

Fixes: 73f9b911fa ("kprobes: Use rethook for kretprobe if possible")
Signed-off-by: Adam Zabrocki <pi3@pi3.com.pl>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.ibm.com>
Cc: Anil S. Keshavamurthy <anil.s.keshavamurthy@intel.com>
Link: https://lore.kernel.org/bpf/20220422164027.GA7862@pi3.com.pl
2022-04-26 16:09:36 +02:00
Kumar Kartikeya Dwivedi
2ab3b3808e bpf: Make BTF type match stricter for release arguments
The current of behavior of btf_struct_ids_match for release arguments is
that when type match fails, it retries with first member type again
(recursively). Since the offset is already 0, this is akin to just
casting the pointer in normal C, since if type matches it was just
embedded inside parent sturct as an object. However, we want to reject
cases for release function type matching, be it kfunc or BPF helpers.

An example is the following:

struct foo {
	struct bar b;
};

struct foo *v = acq_foo();
rel_bar(&v->b); // btf_struct_ids_match fails btf_types_are_same, then
		// retries with first member type and succeeds, while
		// it should fail.

Hence, don't walk the struct and only rely on btf_types_are_same for
strict mode. All users of strict mode must be dealing with zero offset
anyway, since otherwise they would want the struct to be walked.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220424214901.2743946-10-memxor@gmail.com
2022-04-25 20:26:44 -07:00
Kumar Kartikeya Dwivedi
a1ef195996 bpf: Teach verifier about kptr_get kfunc helpers
We introduce a new style of kfunc helpers, namely *_kptr_get, where they
take pointer to the map value which points to a referenced kernel
pointer contained in the map. Since this is referenced, only
bpf_kptr_xchg from BPF side and xchg from kernel side is allowed to
change the current value, and each pointer that resides in that location
would be referenced, and RCU protected (this must be kept in mind while
adding kernel types embeddable as reference kptr in BPF maps).

This means that if do the load of the pointer value in an RCU read
section, and find a live pointer, then as long as we hold RCU read lock,
it won't be freed by a parallel xchg + release operation. This allows us
to implement a safe refcount increment scheme. Hence, enforce that first
argument of all such kfunc is a proper PTR_TO_MAP_VALUE pointing at the
right offset to referenced pointer.

For the rest of the arguments, they are subjected to typical kfunc
argument checks, hence allowing some flexibility in passing more intent
into how the reference should be taken.

For instance, in case of struct nf_conn, it is not freed until RCU grace
period ends, but can still be reused for another tuple once refcount has
dropped to zero. Hence, a bpf_ct_kptr_get helper not only needs to call
refcount_inc_not_zero, but also do a tuple match after incrementing the
reference, and when it fails to match it, put the reference again and
return NULL.

This can be implemented easily if we allow passing additional parameters
to the bpf_ct_kptr_get kfunc, like a struct bpf_sock_tuple * and a
tuple__sz pair.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220424214901.2743946-9-memxor@gmail.com
2022-04-25 20:26:44 -07:00
Kumar Kartikeya Dwivedi
14a324f6a6 bpf: Wire up freeing of referenced kptr
A destructor kfunc can be defined as void func(type *), where type may
be void or any other pointer type as per convenience.

In this patch, we ensure that the type is sane and capture the function
pointer into off_desc of ptr_off_tab for the specific pointer offset,
with the invariant that the dtor pointer is always set when 'kptr_ref'
tag is applied to the pointer's pointee type, which is indicated by the
flag BPF_MAP_VALUE_OFF_F_REF.

Note that only BTF IDs whose destructor kfunc is registered, thus become
the allowed BTF IDs for embedding as referenced kptr. Hence it serves
the purpose of finding dtor kfunc BTF ID, as well acting as a check
against the whitelist of allowed BTF IDs for this purpose.

Finally, wire up the actual freeing of the referenced pointer if any at
all available offsets, so that no references are leaked after the BPF
map goes away and the BPF program previously moved the ownership a
referenced pointer into it.

The behavior is similar to BPF timers, where bpf_map_{update,delete}_elem
will free any existing referenced kptr. The same case is with LRU map's
bpf_lru_push_free/htab_lru_push_free functions, which are extended to
reset unreferenced and free referenced kptr.

Note that unlike BPF timers, kptr is not reset or freed when map uref
drops to zero.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220424214901.2743946-8-memxor@gmail.com
2022-04-25 20:26:44 -07:00
Kumar Kartikeya Dwivedi
5ce937d613 bpf: Populate pairs of btf_id and destructor kfunc in btf
To support storing referenced PTR_TO_BTF_ID in maps, we require
associating a specific BTF ID with a 'destructor' kfunc. This is because
we need to release a live referenced pointer at a certain offset in map
value from the map destruction path, otherwise we end up leaking
resources.

Hence, introduce support for passing an array of btf_id, kfunc_btf_id
pairs that denote a BTF ID and its associated release function. Then,
add an accessor 'btf_find_dtor_kfunc' which can be used to look up the
destructor kfunc of a certain BTF ID. If found, we can use it to free
the object from the map free path.

The registration of these pairs also serve as a whitelist of structures
which are allowed as referenced PTR_TO_BTF_ID in a BPF map, because
without finding the destructor kfunc, we will bail and return an error.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220424214901.2743946-7-memxor@gmail.com
2022-04-25 20:26:44 -07:00
Kumar Kartikeya Dwivedi
4d7d7f69f4 bpf: Adapt copy_map_value for multiple offset case
Since now there might be at most 10 offsets that need handling in
copy_map_value, the manual shuffling and special case is no longer going
to work. Hence, let's generalise the copy_map_value function by using
a sorted array of offsets to skip regions that must be avoided while
copying into and out of a map value.

When the map is created, we populate the offset array in struct map,
Then, copy_map_value uses this sorted offset array is used to memcpy
while skipping timer, spin lock, and kptr. The array is allocated as
in most cases none of these special fields would be present in map
value, hence we can save on space for the common case by not embedding
the entire object inside bpf_map struct.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220424214901.2743946-6-memxor@gmail.com
2022-04-25 20:26:44 -07:00
Kumar Kartikeya Dwivedi
6efe152d40 bpf: Prevent escaping of kptr loaded from maps
While we can guarantee that even for unreferenced kptr, the object
pointer points to being freed etc. can be handled by the verifier's
exception handling (normal load patching to PROBE_MEM loads), we still
cannot allow the user to pass these pointers to BPF helpers and kfunc,
because the same exception handling won't be done for accesses inside
the kernel. The same is true if a referenced pointer is loaded using
normal load instruction. Since the reference is not guaranteed to be
held while the pointer is used, it must be marked as untrusted.

Hence introduce a new type flag, PTR_UNTRUSTED, which is used to mark
all registers loading unreferenced and referenced kptr from BPF maps,
and ensure they can never escape the BPF program and into the kernel by
way of calling stable/unstable helpers.

In check_ptr_to_btf_access, the !type_may_be_null check to reject type
flags is still correct, as apart from PTR_MAYBE_NULL, only MEM_USER,
MEM_PERCPU, and PTR_UNTRUSTED may be set for PTR_TO_BTF_ID. The first
two are checked inside the function and rejected using a proper error
message, but we still want to allow dereference of untrusted case.

Also, we make sure to inherit PTR_UNTRUSTED when chain of pointers are
walked, so that this flag is never dropped once it has been set on a
PTR_TO_BTF_ID (i.e. trusted to untrusted transition can only be in one
direction).

In convert_ctx_accesses, extend the switch case to consider untrusted
PTR_TO_BTF_ID in addition to normal PTR_TO_BTF_ID for PROBE_MEM
conversion for BPF_LDX.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220424214901.2743946-5-memxor@gmail.com
2022-04-25 20:26:44 -07:00
Kumar Kartikeya Dwivedi
c0a5a21c25 bpf: Allow storing referenced kptr in map
Extending the code in previous commits, introduce referenced kptr
support, which needs to be tagged using 'kptr_ref' tag instead. Unlike
unreferenced kptr, referenced kptr have a lot more restrictions. In
addition to the type matching, only a newly introduced bpf_kptr_xchg
helper is allowed to modify the map value at that offset. This transfers
the referenced pointer being stored into the map, releasing the
references state for the program, and returning the old value and
creating new reference state for the returned pointer.

Similar to unreferenced pointer case, return value for this case will
also be PTR_TO_BTF_ID_OR_NULL. The reference for the returned pointer
must either be eventually released by calling the corresponding release
function, otherwise it must be transferred into another map.

It is also allowed to call bpf_kptr_xchg with a NULL pointer, to clear
the value, and obtain the old value if any.

BPF_LDX, BPF_STX, and BPF_ST cannot access referenced kptr. A future
commit will permit using BPF_LDX for such pointers, but attempt at
making it safe, since the lifetime of object won't be guaranteed.

There are valid reasons to enforce the restriction of permitting only
bpf_kptr_xchg to operate on referenced kptr. The pointer value must be
consistent in face of concurrent modification, and any prior values
contained in the map must also be released before a new one is moved
into the map. To ensure proper transfer of this ownership, bpf_kptr_xchg
returns the old value, which the verifier would require the user to
either free or move into another map, and releases the reference held
for the pointer being moved in.

In the future, direct BPF_XCHG instruction may also be permitted to work
like bpf_kptr_xchg helper.

Note that process_kptr_func doesn't have to call
check_helper_mem_access, since we already disallow rdonly/wronly flags
for map, which is what check_map_access_type checks, and we already
ensure the PTR_TO_MAP_VALUE refers to kptr by obtaining its off_desc,
so check_map_access is also not required.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220424214901.2743946-4-memxor@gmail.com
2022-04-25 20:26:05 -07:00
Kumar Kartikeya Dwivedi
8f14852e89 bpf: Tag argument to be released in bpf_func_proto
Add a new type flag for bpf_arg_type that when set tells verifier that
for a release function, that argument's register will be the one for
which meta.ref_obj_id will be set, and which will then be released
using release_reference. To capture the regno, introduce a new field
release_regno in bpf_call_arg_meta.

This would be required in the next patch, where we may either pass NULL
or a refcounted pointer as an argument to the release function
bpf_kptr_xchg. Just releasing only when meta.ref_obj_id is set is not
enough, as there is a case where the type of argument needed matches,
but the ref_obj_id is set to 0. Hence, we must enforce that whenever
meta.ref_obj_id is zero, the register that is to be released can only
be NULL for a release function.

Since we now indicate whether an argument is to be released in
bpf_func_proto itself, is_release_function helper has lost its utitlity,
hence refactor code to work without it, and just rely on
meta.release_regno to know when to release state for a ref_obj_id.
Still, the restriction of one release argument and only one ref_obj_id
passed to BPF helper or kfunc remains. This may be lifted in the future.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220424214901.2743946-3-memxor@gmail.com
2022-04-25 17:31:35 -07:00
Kumar Kartikeya Dwivedi
61df10c779 bpf: Allow storing unreferenced kptr in map
This commit introduces a new pointer type 'kptr' which can be embedded
in a map value to hold a PTR_TO_BTF_ID stored by a BPF program during
its invocation. When storing such a kptr, BPF program's PTR_TO_BTF_ID
register must have the same type as in the map value's BTF, and loading
a kptr marks the destination register as PTR_TO_BTF_ID with the correct
kernel BTF and BTF ID.

Such kptr are unreferenced, i.e. by the time another invocation of the
BPF program loads this pointer, the object which the pointer points to
may not longer exist. Since PTR_TO_BTF_ID loads (using BPF_LDX) are
patched to PROBE_MEM loads by the verifier, it would safe to allow user
to still access such invalid pointer, but passing such pointers into
BPF helpers and kfuncs should not be permitted. A future patch in this
series will close this gap.

The flexibility offered by allowing programs to dereference such invalid
pointers while being safe at runtime frees the verifier from doing
complex lifetime tracking. As long as the user may ensure that the
object remains valid, it can ensure data read by it from the kernel
object is valid.

The user indicates that a certain pointer must be treated as kptr
capable of accepting stores of PTR_TO_BTF_ID of a certain type, by using
a BTF type tag 'kptr' on the pointed to type of the pointer. Then, this
information is recorded in the object BTF which will be passed into the
kernel by way of map's BTF information. The name and kind from the map
value BTF is used to look up the in-kernel type, and the actual BTF and
BTF ID is recorded in the map struct in a new kptr_off_tab member. For
now, only storing pointers to structs is permitted.

An example of this specification is shown below:

	#define __kptr __attribute__((btf_type_tag("kptr")))

	struct map_value {
		...
		struct task_struct __kptr *task;
		...
	};

Then, in a BPF program, user may store PTR_TO_BTF_ID with the type
task_struct into the map, and then load it later.

Note that the destination register is marked PTR_TO_BTF_ID_OR_NULL, as
the verifier cannot know whether the value is NULL or not statically, it
must treat all potential loads at that map value offset as loading a
possibly NULL pointer.

Only BPF_LDX, BPF_STX, and BPF_ST (with insn->imm = 0 to denote NULL)
are allowed instructions that can access such a pointer. On BPF_LDX, the
destination register is updated to be a PTR_TO_BTF_ID, and on BPF_STX,
it is checked whether the source register type is a PTR_TO_BTF_ID with
same BTF type as specified in the map BTF. The access size must always
be BPF_DW.

For the map in map support, the kptr_off_tab for outer map is copied
from the inner map's kptr_off_tab. It was chosen to do a deep copy
instead of introducing a refcount to kptr_off_tab, because the copy only
needs to be done when paramterizing using inner_map_fd in the map in map
case, hence would be unnecessary for all other users.

It is not permitted to use MAP_FREEZE command and mmap for BPF map
having kptrs, similar to the bpf_timer case. A kptr also requires that
BPF program has both read and write access to the map (hence both
BPF_F_RDONLY_PROG and BPF_F_WRONLY_PROG are disallowed).

Note that check_map_access must be called from both
check_helper_mem_access and for the BPF instructions, hence the kptr
check must distinguish between ACCESS_DIRECT and ACCESS_HELPER, and
reject ACCESS_HELPER cases. We rename stack_access_src to bpf_access_src
and reuse it for this purpose.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220424214901.2743946-2-memxor@gmail.com
2022-04-25 17:31:35 -07:00
Stanislav Fomichev
d9d31cf887 bpf: Use bpf_prog_run_array_cg_flags everywhere
Rename bpf_prog_run_array_cg_flags to bpf_prog_run_array_cg and
use it everywhere. check_return_code already enforces sane
return ranges for all cgroup types. (only egress and bind hooks have
uncanonical return ranges, the rest is using [0, 1])

No functional changes.

v2:
- 'func_ret & 1' under explicit test (Andrii & Martin)

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220425220448.3669032-1-sdf@google.com
2022-04-25 17:03:57 -07:00
Linus Torvalds
42740a2ff5 - Fix a corner case when calculating sched runqueue variables
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmJlHhcACgkQEsHwGGHe
 VUrPyQ/9FE7zLj8euC4HJ4HPJwf7vkwaeRHz1T2gS+izChBI+QSo5Ipe5zFOKz55
 vYSfaYF0MIvVJtKSHMbnQf6f/2i+y5j0ozMjKEkHRZdYP26okPoj+M2effgbceiJ
 pOIZUsdr8SdBQv313icuUfsXIGfMv/xIw20OtIhVpOFQPB4ZbLASn6AhusZL7U6Z
 0BIcfGmmOwV6p4petOJVUXRcwkgfT812UOBLV71DEz9jzP8dXYGVvPV8ZnSYoVQW
 tm6rcmnpzsOqb3xnp7hqFHevyoIzBT31KVo4xnB80CtCoWB/tbEIPIbNjUPaREp0
 ezE8yXv6euob92+Uh5DH/+8oWuzlctKv1Pc98rFnrGGfW4ocDsr5ibsi9472Mkec
 s+waTwemZMGN3bQHH5QvjWxPGdGuPsqrNvgHbZRFGYGJcMoC+2F9p+vKOXK00fMF
 9ivhhuFqH8OVAFu3WUvvD8zO18tfnST7fQflQJNxZ/TqPumNc0+zLrpKDp+7ZE+r
 qgdvxvXO3ZRnPttiEP1/J+uKxQGNMuDEU8NcfdA7nOzEv9yPyKLLcwo2qu3IYgP0
 XuM3Gqt5/Cf38b+1ddR1LWai3KjxVTn7HV4G9YPdvYP296YcZlFjGOtzfNOfw905
 djGEFTFyGQuS7BEHKhD3OoDbegT4FvB+69k2ddy4Dut99WkDk84=
 =S7Ux
 -----END PGP SIGNATURE-----

Merge tag 'sched_urgent_for_v5.18_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fix from Borislav Petkov:

 - Fix a corner case when calculating sched runqueue variables

That fix also removes a check for a zero divisor in the code, without
mentioning it.  Vincent clarified that it's ok after I whined about it:

  https://lore.kernel.org/all/CAKfTPtD2QEyZ6ADd5WrwETMOX0XOwJGnVddt7VHgfURdqgOS-Q@mail.gmail.com/

* tag 'sched_urgent_for_v5.18_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/pelt: Fix attach_entity_load_avg() corner case
2022-04-24 13:28:06 -07:00
Linus Torvalds
f48ffef19d - Add Sapphire Rapids CPU support
- Fix a perf vmalloc-ed buffer mapping error (PERF_USE_VMALLOC in use)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmJlIAIACgkQEsHwGGHe
 VUrrqRAAjXX/9J6eQNAHNLHwNhAJYptDq/O2s9rkzOfittWzflfEKy/QeQqWT0Gp
 fIqo0tO+QAcj9h6PFYHL0tqAPSkTzCBAoOEbjatkBKYj1BIXETQbfkVvG/lfP5Hz
 sbtDJE2lk2mG8FHnTbQR4FJZHJR1fWsdxdDyJoFxQ/Ww8E3gB9BW8qgJAJZkqttv
 L3D8bFYd9LnTjrs5+lq7ZxyqIMNF14kfd07uxjpJW13TpuXIIQ+enz0bTGllJv/y
 uHIupgd2RHgAe3HFAkU9fWBs8qNJpqWIOZKiNNETIxOxcS9zUt+Th5hJeMonCLg7
 WAoS/Y1I5mu0qf2mxm6gGPbUJGHc/fsWO0+StA54a/MnG/MwGafYtiZv/7qYlYCE
 ia0UEuKZjCqYbNOBgrDnP8iWHFIaAtjAi4zBTRzTaIIv1+JthKTkRJ60NNArmQ+f
 vZyv0YvLLujJLBNShfSIWy77/6qap8I+nvvvbfUy2ylhm7eu3AgaTkq3kJS+1pnc
 NEOPhG1qVYITpu1vSkC8V5mpumgcAomnLxgZf8O/bfe+AQlBUyNDMgBZVI9sesyv
 5Wuh5O8obHKnvr6FJ67bUv9fOg62Qs2ywcBtQdmd+l/DmhyqGqPVfBKaeSLLoXcU
 lqP9Bp6iLq+WgSCqSUq8CPjoRYKduzzes6AZVdcSNLMFZ+P5+x8=
 =uiO0
 -----END PGP SIGNATURE-----

Merge tag 'perf_urgent_for_v5.18_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Borislav Petkov:

 - Add Sapphire Rapids CPU support

 - Fix a perf vmalloc-ed buffer mapping error (PERF_USE_VMALLOC in use)

* tag 'perf_urgent_for_v5.18_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/cstate: Add SAPPHIRERAPIDS_X CPU support
  perf/core: Fix perf_mmap fail when CONFIG_PERF_USE_VMALLOC enabled
2022-04-24 12:01:16 -07:00
Andrii Nakryiko
df86ca0d2f bpf: Allow attach TRACING programs through LINK_CREATE command
Allow attaching BTF-aware TRACING programs, previously attachable only
through BPF_RAW_TRACEPOINT_OPEN command, through LINK_CREATE command:

  - BTF-aware raw tracepoints (tp_btf in libbpf lingo);
  - fentry/fexit/fmod_ret programs;
  - BPF LSM programs.

This change converges all bpf_link-based attachments under LINK_CREATE
command allowing to further extend the API with features like BPF cookie
under "multiplexed" link_create section of bpf_attr.

Non-BTF-aware raw tracepoints are left under BPF_RAW_TRACEPOINT_OPEN,
but there is nothing preventing opening them up to LINK_CREATE as well.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Kuifeng Lee <kuifeng@fb.com>
Link: https://lore.kernel.org/bpf/20220421033945.3602803-2-andrii@kernel.org
2022-04-23 00:36:55 +02:00
Paolo Abeni
f70925bf99 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/ethernet/microchip/lan966x/lan966x_main.c
  d08ed85256 ("net: lan966x: Make sure to release ptp interrupt")
  c834963932 ("net: lan966x: Add FDMA functionality")

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-22 09:56:00 +02:00
Aleksandr Nogikh
ecc04463d1 kcov: don't generate a warning on vm_insert_page()'s failure
vm_insert_page()'s failure is not an unexpected condition, so don't do
WARN_ONCE() in such a case.

Instead, print a kernel message and just return an error code.

This flaw has been reported under an OOM condition by sysbot [1].

The message is mainly for the benefit of the test log, in this case the
fuzzer's log so that humans inspecting the log can figure out what was
going on.  KCOV is a testing tool, so I think being a little more chatty
when KCOV unexpectedly is about to fail will save someone debugging
time.

We don't want the WARN, because it's not a kernel bug that syzbot should
report, and failure can happen if the fuzzer tries hard enough (as
above).

Link: https://lkml.kernel.org/r/Ylkr2xrVbhQYwNLf@elver.google.com [1]
Link: https://lkml.kernel.org/r/20220401182512.249282-1-nogikh@google.com
Fixes: b3d7fe86fb ("kcov: properly handle subsequent mmap calls"),
Signed-off-by: Aleksandr Nogikh <nogikh@google.com>
Acked-by: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Taras Madan <tarasmadan@google.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-21 20:01:10 -07:00
Kumar Kartikeya Dwivedi
e9147b4422 bpf: Move check_ptr_off_reg before check_map_access
Some functions in next patch want to use this function, and those
functions will be called by check_map_access, hence move it before
check_map_access.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/bpf/20220415160354.1050687-3-memxor@gmail.com
2022-04-21 16:31:10 +02:00
Kumar Kartikeya Dwivedi
42ba130807 bpf: Make btf_find_field more generic
Next commit introduces field type 'kptr' whose kind will not be struct,
but pointer, and it will not be limited to one offset, but multiple
ones. Make existing btf_find_struct_field and btf_find_datasec_var
functions amenable to use for finding kptrs in map value, by moving
spin_lock and timer specific checks into their own function.

The alignment, and name are checked before the function is called, so it
is the last point where we can skip field or return an error before the
next loop iteration happens. Size of the field and type is meant to be
checked inside the function.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220415160354.1050687-2-memxor@gmail.com
2022-04-21 16:31:10 +02:00
KP Singh
dcf456c9a0 bpf: Fix usage of trace RCU in local storage.
bpf_{sk,task,inode}_storage_free() do not need to use
call_rcu_tasks_trace as no BPF program should be accessing the owner
as it's being destroyed. The only other reader at this point is
bpf_local_storage_map_free() which uses normal RCU.

The only path that needs trace RCU are:

* bpf_local_storage_{delete,update} helpers
* map_{delete,update}_elem() syscalls

Fixes: 0fe4b381a5 ("bpf: Allow bpf_local_storage to be used by sleepable programs")
Signed-off-by: KP Singh <kpsingh@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220418155158.2865678-1-kpsingh@kernel.org
2022-04-19 17:55:45 -07:00
Kumar Kartikeya Dwivedi
eb596b0905 bpf: Ensure type tags precede modifiers in BTF
It is guaranteed that for modifiers, clang always places type tags
before other modifiers, and then the base type. We would like to rely on
this guarantee inside the kernel to make it simple to parse type tags
from BTF.

However, a user would be allowed to construct a BTF without such
guarantees. Hence, add a pass to check that in modifier chains, type
tags only occur at the head of the chain, and then don't occur later in
the chain.

If we see a type tag, we can have one or more type tags preceding other
modifiers that then never have another type tag. If we see other
modifiers, all modifiers following them should never be a type tag.

Instead of having to walk chains we verified previously, we can remember
the last good modifier type ID which headed a good chain. At that point,
we must have verified all other chains headed by type IDs less than it.
This makes the verification process less costly, and it becomes a simple
O(n) pass.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220419164608.1990559-2-memxor@gmail.com
2022-04-19 14:02:49 -07:00
Zhipeng Xie
60490e7966 perf/core: Fix perf_mmap fail when CONFIG_PERF_USE_VMALLOC enabled
This problem can be reproduced with CONFIG_PERF_USE_VMALLOC enabled on
both x86_64 and aarch64 arch when using sysdig -B(using ebpf)[1].
sysdig -B works fine after rebuilding the kernel with
CONFIG_PERF_USE_VMALLOC disabled.

I tracked it down to the if condition event->rb->nr_pages != nr_pages
in perf_mmap is true when CONFIG_PERF_USE_VMALLOC is enabled where
event->rb->nr_pages = 1 and nr_pages = 2048 resulting perf_mmap to
return -EINVAL. This is because when CONFIG_PERF_USE_VMALLOC is
enabled, rb->nr_pages is always equal to 1.

Arch with CONFIG_PERF_USE_VMALLOC enabled by default:
	arc/arm/csky/mips/sh/sparc/xtensa

Arch with CONFIG_PERF_USE_VMALLOC disabled by default:
	x86_64/aarch64/...

Fix this problem by using data_page_nr()

[1] https://github.com/draios/sysdig

Fixes: 906010b213 ("perf_event: Provide vmalloc() based mmap() backing")
Signed-off-by: Zhipeng Xie <xiezhipeng1@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220209145417.6495-1-xiezhipeng1@huawei.com
2022-04-19 21:15:42 +02:00
kuyo chang
40f5aa4c5e sched/pelt: Fix attach_entity_load_avg() corner case
The warning in cfs_rq_is_decayed() triggered:

    SCHED_WARN_ON(cfs_rq->avg.load_avg ||
		  cfs_rq->avg.util_avg ||
		  cfs_rq->avg.runnable_avg)

There exists a corner case in attach_entity_load_avg() which will
cause load_sum to be zero while load_avg will not be.

Consider se_weight is 88761 as per the sched_prio_to_weight[] table.
Further assume the get_pelt_divider() is 47742, this gives:
se->avg.load_avg is 1.

However, calculating load_sum:

  se->avg.load_sum = div_u64(se->avg.load_avg * se->avg.load_sum, se_weight(se));
  se->avg.load_sum = 1*47742/88761 = 0.

Then enqueue_load_avg() adds this to the cfs_rq totals:

  cfs_rq->avg.load_avg += se->avg.load_avg;
  cfs_rq->avg.load_sum += se_weight(se) * se->avg.load_sum;

Resulting in load_avg being 1 with load_sum is 0, which will trigger
the WARN.

Fixes: f207934fb7 ("sched/fair: Align PELT windows between cfs_rq and its se")
Signed-off-by: kuyo chang <kuyo.chang@mediatek.com>
[peterz: massage changelog]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://lkml.kernel.org/r/20220414090229.342-1-kuyo.chang@mediatek.com
2022-04-19 21:15:41 +02:00
Stanislav Fomichev
055eb95533 bpf: Move rcu lock management out of BPF_PROG_RUN routines
Commit 7d08c2c911 ("bpf: Refactor BPF_PROG_RUN_ARRAY family of macros
into functions") switched a bunch of BPF_PROG_RUN macros to inline
routines. This changed the semantic a bit. Due to arguments expansion
of macros, it used to be:

	rcu_read_lock();
	array = rcu_dereference(cgrp->bpf.effective[atype]);
	...

Now, with with inline routines, we have:
	array_rcu = rcu_dereference(cgrp->bpf.effective[atype]);
	/* array_rcu can be kfree'd here */
	rcu_read_lock();
	array = rcu_dereference(array_rcu);

I'm assuming in practice rcu subsystem isn't fast enough to trigger
this but let's use rcu API properly.

Also, rename to lower caps to not confuse with macros. Additionally,
drop and expand BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY.

See [1] for more context.

  [1] https://lore.kernel.org/bpf/CAKH8qBs60fOinFdxiiQikK_q0EcVxGvNTQoWvHLEUGbgcj1UYg@mail.gmail.com/T/#u

v2
- keep rcu locks inside by passing cgroup_bpf

Fixes: 7d08c2c911 ("bpf: Refactor BPF_PROG_RUN_ARRAY family of macros into functions")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220414161233.170780-1-sdf@google.com
2022-04-19 09:45:47 -07:00
Linus Torvalds
fbb9c58e56 A small set of fixes for the timers core:
- Fix the warning condition in __run_timers() which does not take into
     account, that a CPU base (especially the deferrable base) has never a
     timer armed on it and therefore the next_expiry value can become stale.
 
   - Replace a WARN_ON() in the NOHZ code with a WARN_ON_ONCE() to prevent
     endless spam in dmesg.
 
   - Remove the double star from a comment which is not meant to be in
     kernel-doc format.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmJb44gTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoXptD/4tKI9W4q0CAZnXw/G9cJdUK2u2VQGb
 QfB0DMABuDLMlHUSO92T49IcfjQUR7hz36kHXY0OxwALaEB8ftap6yZLfI1864mZ
 RFatWcTl71ZYY0SSio0rMnbDawTeN9N69WLPR+A6fDH0ahBqTwpeJoa1zj5w99LF
 kFoEtI5vEBtf+m94PljBTclxOKTXmRRCzEZRpJ6L4Ze9SHR7aMKtBgp84M7a3IAw
 IzXq0Kw0MRIm4P26uOpsVzi2KOq6+F1O1zy4L8cHVeU+Y6HC1qWLPXsyhipb0liE
 6STFs1L7ANX+dpMW8kmorXc3RcN4xCuCY4Sb17W6Hqiq3syQ+Ho2G0CTheL0yOpM
 GVqf7SVes3EGAJwWyKaFmhpgWGKcS93p18XGUrf5V+rVIU+ggQcc2vM9E8FiM03d
 TA6T9+1u0TvtPlmd/KQ3OPLMEtu6/3R0fOLEXnv+Cb/PYIYMvPpb9cu6LjJkNqGv
 6R+3dWORI7mYLZD+vS3Y5mrmB5OECGXTPNJ+aBr46PfdvSVLxvmOhFlNR7eQKHZ8
 wFwOC6n4wkD5Y1VOVGEn0GVGWVkZXdlQzZOQGqBhJPjsfPfOaGacyLqCcsZH4V6w
 kitKI8z7A7evLozZMQqKZskwLG1BTQ13YeElb5jJMpG3aTG7lSmPk+iGgLFMDWkc
 Z2k4YD0Uw+92Tw==
 =SffK
 -----END PGP SIGNATURE-----

Merge tag 'timers-urgent-2022-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fixes from Thomas Gleixner:
 "A small set of fixes for the timers core:

   - Fix the warning condition in __run_timers() which does not take
     into account that a CPU base (especially the deferrable base) never
     has a timer armed on it and therefore the next_expiry value can
     become stale.

   - Replace a WARN_ON() in the NOHZ code with a WARN_ON_ONCE() to
     prevent endless spam in dmesg.

   - Remove the double star from a comment which is not meant to be in
     kernel-doc format"

* tag 'timers-urgent-2022-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tick/sched: Fix non-kernel-doc comment
  tick/nohz: Use WARN_ON_ONCE() to prevent console saturation
  timers: Fix warning condition in __run_timers()
2022-04-17 09:53:01 -07:00
Linus Torvalds
0e59732ed6 Two fixes for the SMP core:
- Make the warning condition in flush_smp_call_function_queue() correct,
    which checks a just emptied list head for being empty instead of
    validating that there was no pending entry on the offlined CPU at all.
 
  - The @cpu member of struct cpuhp_cpu_state is initialized when the CPU
    hotplug thread for the upcoming CPU is created. That's too late because
    the creation of the thread can fail and then the following rollback
    operates on CPU0. Get rid of the CPU member and hand the CPU number to
    the involved functions directly.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmJb4skTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoUYaD/9SWzwEqwKICAhWWSMtRg6cvtVTAKGL
 xWXNb5ZxLA8MdG9CYu7WaUF7mU3E8o7kF/dItvkdj0/9mg/IqcVnP1CEOfp0ioHc
 1cbAZShORfFlOowO1WR9Xe3MSZB2mXPHsjh3FWPeGTDxnSByMDXROMFyotxji+7q
 g1ZsyVbq3+hSVTOhnhpxrh8MS3qcCAsgYtHKRnPjOE/tqbNXSmbhmeuN7QBKLVp6
 AD6DaWj8jGtZ8yzbpm0Ve3ZsjH+1SdZAzm4yhh3FHMKsSOwB8yuNwq1oNbbEjxWu
 mTg+LCBBMGYybSTa9sUpDG5Xfc3/dyWnzkGnmp7M8bfElrI3x4d+sMdVhfFRjEXB
 xlXmNqRoExgZwifyoEJbmSbG2SgLiWqyLqgpUvLS/yHud6IurFbu36TrByYRMMsG
 CyaiLMdWIlBibo+xGkVgnLyc2io98KeoLJoc35HM7VYyn9oNI4hUU9OJDIkG5D2i
 lyS+qMTFInFSOm7L5u/SUTYe4I0sXYJ6uEXxhR/Gi3ITXb7aLOJyxhNtq//u1dmg
 6vZHgs8HZWylG3LxVn1QERWotVlPuNPRilsUGCVMsCHjOFnoPy+eM3ANtR2x/uaE
 XpQ58jnhkhPXZfmnsjDEILkWB67J0/9FujbcPvixH8SRvUgBB7xRRbuo9zXPchhw
 VDKy1BCIcZ93gA==
 =XNLP
 -----END PGP SIGNATURE-----

Merge tag 'smp-urgent-2022-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull SMP fixes from Thomas Gleixner:
 "Two fixes for the SMP core:

   - Make the warning condition in flush_smp_call_function_queue()
     correct, which checked a just emptied list head for being empty
     instead of validating that there was no pending entry on the
     offlined CPU at all.

   - The @cpu member of struct cpuhp_cpu_state is initialized when the
     CPU hotplug thread for the upcoming CPU is created. That's too late
     because the creation of the thread can fail and then the following
     rollback operates on CPU0. Get rid of the CPU member and hand the
     CPU number to the involved functions directly"

* tag 'smp-urgent-2022-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  cpu/hotplug: Remove the 'cpu' member of cpuhp_cpu_state
  smp: Fix offline cpu check in flush_smp_call_function_queue()
2022-04-17 09:46:15 -07:00
Linus Torvalds
7e1777f5ec A single fix for the interrupt affinity spreading logic to take into
account that there can be an imbalance between present and possible CPUs,
 which causes already assigned bits to be overwritten.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmJb4cETHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoWrDD/wOd9X4zQz6cZci2Xl091IhMlS2/sBm
 afY98mTK56rX/BEgybMhzXZtBOSdDL2UDnBYznuYiPWof09+mqJGp4cLJAtVvK7E
 cDoLdpqmFuQzO8Ka1lUgdgwR+d32pFmaBASbl/0COdl8qgNskQp+jD6BaztNxYbZ
 HjkN2chMXqRlCOkKCthNajPEB3thmW2H0gDiZT/VEGYkQfp+HdoH7QhpIXt8HkWd
 lb1g1bpDBUwsGrVJ2PlI+XvwmYdW/WOyebIq2D1XBhSS1RXVTt5xfXh/ZfhULzgn
 2bnhvXU7XwDZRJNte1r8EFHSufXZUwhZIa9DRKHoy1fxNaxufRy0VJLEqcRHbcBO
 xGIvXFfjmn+vmO6hwC1/DeP7Dz6WiTYoxTwJVpvgURqcEdr1HvnSy1CdMEOsAMUa
 ZQDJzudV47a2qj80MRO1TSovW2gtDPdSWEUp5oOx2nYUZbqME146ubglZEcXLzq7
 Y7BeMaO1VzXkfyk6ImtrBYY+qrXaT1ZQvCOGPrOltnlU0cBLkQMSUpjTstIcaOAp
 iIvPLjVR1fDdgEBRck+ZAsM9snefz160wEbx23PzGuh2BenLpQfKZHIYhDO8wo2l
 4uUmnp7VY0FS5I38f+rWNvqhFfJV1oda0dQfprP+s3Usz5oSegl764as5hjGcu+y
 0Z/mCvaPODGctg==
 =wSdp
 -----END PGP SIGNATURE-----

Merge tag 'irq-urgent-2022-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fix from Thomas Gleixner:
 "A single fix for the interrupt affinity spreading logic to take into
  account that there can be an imbalance between present and possible
  CPUs, which causes already assigned bits to be overwritten"

* tag 'irq-urgent-2022-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq/affinity: Consider that CPUs on nodes can be unbalanced
2022-04-17 09:42:03 -07:00
Linus Torvalds
b00868396d dma-mapping fixes for Linux 5.18
- avoid a double memory copy for swiotlb (Chao Gao)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmJaWd0LHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYM8MhAAsFPr3ebQD5iEGDuDTY0MPRFUgQzIWXymEgsy5c5t
 maJkJMA19ouOnQiLxNu8T750fo5ywuFHHDp9WJHjplA69KXJIWczdbBNVkHFzfkc
 oG2hzNsouVdLXz0speaFpvcuRFscWH8HQ0AqUO4/PrQmJudOEF8ekEu/8IloYULs
 np1IKrIlIpGk5JEzgIHGcIqxmeclYtETQ1XkNyDOOUZAlsEcQY5pTNgujq++a1uL
 /zku2SOg2EPdeEXFHlH3N7AwsDq/s7So4qLrMUoeXzC8qO9n60kuYKhnMd0nz0H0
 49a4YFqUrBVI8yW4EW6MBMLkZnK4f0ATjdn7SYToh1gVGrCOypUWPjjBtnOStqC8
 DqjByjQra0xf2re4ED/oPtRtQLKSC5seht4HDThOIG62A8uuKySk2NUFQ2u3d8mp
 KOz/v/xg/3BBjjxidhiQtuYv2WxaI5ilfNl1bpThm23Dw4Mqi00f77swUWxJlESj
 qsjKgdE3We9GlkihyPf4/M5PtPphrWOHoUMBH8IAe10o4DuI29aJnXrerCEaKLGr
 RIA2ALiNeVnoUKhSLPDB86pcTZufoEiVtAGRDEADjgU2ic+Ytw312Aswh1PFznzH
 r/6X4tcKoWzjmzpBkCrxm+RYfqipoNb+sbgiaI7O6z/AiBQUR0itc72Tug1lR0xq
 2KI=
 =D8o8
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-5.18-2' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fix from Christoph Hellwig:

 - avoid a double memory copy for swiotlb (Chao Gao)

* tag 'dma-mapping-5.18-2' of git://git.infradead.org/users/hch/dma-mapping:
  dma-direct: avoid redundant memory sync for swiotlb
2022-04-16 11:20:21 -07:00
Zqiang
25934fcfb9 irq_work: use kasan_record_aux_stack_noalloc() record callstack
On PREEMPT_RT kernel and KASAN is enabled.  the kasan_record_aux_stack()
may call alloc_pages(), and the rt-spinlock will be acquired, if currently
in atomic context, will trigger warning:

  BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
  in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 239, name: bootlogd
  Preemption disabled at:
  [<ffffffffbab1a531>] rt_mutex_slowunlock+0xa1/0x4e0
  CPU: 3 PID: 239 Comm: bootlogd Tainted: G        W 5.17.1-rt17-yocto-preempt-rt+ #105
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
  Call Trace:
     __might_resched.cold+0x13b/0x173
     rt_spin_lock+0x5b/0xf0
     get_page_from_freelist+0x20c/0x1610
     __alloc_pages+0x25e/0x5e0
     __stack_depot_save+0x3c0/0x4a0
     kasan_save_stack+0x3a/0x50
     __kasan_record_aux_stack+0xb6/0xc0
     kasan_record_aux_stack+0xe/0x10
     irq_work_queue_on+0x6a/0x1c0
     pull_rt_task+0x631/0x6b0
     do_balance_callbacks+0x56/0x80
     __balance_callbacks+0x63/0x90
     rt_mutex_setprio+0x349/0x880
     rt_mutex_slowunlock+0x22a/0x4e0
     rt_spin_unlock+0x49/0x80
     uart_write+0x186/0x2b0
     do_output_char+0x2e9/0x3a0
     n_tty_write+0x306/0x800
     file_tty_write.isra.0+0x2af/0x450
     tty_write+0x22/0x30
     new_sync_write+0x27c/0x3a0
     vfs_write+0x3f7/0x5d0
     ksys_write+0xd9/0x180
     __x64_sys_write+0x43/0x50
     do_syscall_64+0x44/0x90
     entry_SYSCALL_64_after_hwframe+0x44/0xae

Fix it by using kasan_record_aux_stack_noalloc() to avoid the call to
alloc_pages().

Link: https://lkml.kernel.org/r/20220402142555.2699582-1-qiang1.zhang@intel.com
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-15 14:49:55 -07:00
Paolo Abeni
edf45f007a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-04-15 09:26:00 +02:00
Chao Gao
9e02977bfa dma-direct: avoid redundant memory sync for swiotlb
When we looked into FIO performance with swiotlb enabled in VM, we found
swiotlb_bounce() is always called one more time than expected for each DMA
read request.

It turns out that the bounce buffer is copied to original DMA buffer twice
after the completion of a DMA request (one is done by in
dma_direct_sync_single_for_cpu(), the other by swiotlb_tbl_unmap_single()).
But the content in bounce buffer actually doesn't change between the two
rounds of copy. So, one round of copy is redundant.

Pass DMA_ATTR_SKIP_CPU_SYNC flag to swiotlb_tbl_unmap_single() to
skip the memory copy in it.

This fix increases FIO 64KB sequential read throughput in a guest with
swiotlb=force by 5.6%.

Fixes: 55897af630 ("dma-direct: merge swiotlb_dma_ops into the dma_direct code")
Reported-by: Wang Zhaoyang1 <zhaoyang1.wang@intel.com>
Reported-by: Gao Liang <liang.gao@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-04-14 06:30:39 +02:00
Yu Zhe
241d50ec5d bpf: Remove unnecessary type castings
Remove/clean up unnecessary void * type castings.

Signed-off-by: Yu Zhe <yuzhe@nfschina.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220413015048.12319-1-yuzhe@nfschina.com
2022-04-14 01:02:24 +02:00
Daniel Borkmann
68477ede43 Merge branch 'pr/bpf-sysctl' into bpf-next
Pull the migration of the BPF sysctl handling into BPF core. This work is
needed in both sysctl-next and bpf-next tree.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/bpf/b615bd44-6bd1-a958-7e3f-dd2ff58931a1@iogearbox.net
2022-04-13 21:55:11 +02:00
Yan Zhu
2900005ea2 bpf: Move BPF sysctls from kernel/sysctl.c to BPF core
We're moving sysctls out of kernel/sysctl.c as it is a mess. We
already moved all filesystem sysctls out. And with time the goal
is to move all sysctls out to their own subsystem/actual user.

kernel/sysctl.c has grown to an insane mess and its easy to run
into conflicts with it. The effort to move them out into various
subsystems is part of this.

Signed-off-by: Yan Zhu <zhuyan34@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/bpf/20220407070759.29506-1-zhuyan34@huawei.com
2022-04-13 21:36:56 +02:00
Steven Price
b7ba6d8dc3 cpu/hotplug: Remove the 'cpu' member of cpuhp_cpu_state
Currently the setting of the 'cpu' member of struct cpuhp_cpu_state in
cpuhp_create() is too late as it is used earlier in _cpu_up().

If kzalloc_node() in __smpboot_create_thread() fails then the rollback will
be done with st->cpu==0 causing CPU0 to be erroneously set to be dying,
causing the scheduler to get mightily confused and throw its toys out of
the pram.

However the cpu number is actually available directly, so simply remove
the 'cpu' member and avoid the problem in the first place.

Fixes: 2ea46c6fc9 ("cpumask/hotplug: Fix cpu_dying() state tracking")
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220411152233.474129-2-steven.price@arm.com
2022-04-13 21:25:40 +02:00
Nadav Amit
9e949a3886 smp: Fix offline cpu check in flush_smp_call_function_queue()
The check in flush_smp_call_function_queue() for callbacks that are sent
to offline CPUs currently checks whether the queue is empty.

However, flush_smp_call_function_queue() has just deleted all the
callbacks from the queue and moved all the entries into a local list.
This checks would only be positive if some callbacks were added in the
short time after llist_del_all() was called. This does not seem to be
the intention of this check.

Change the check to look at the local list to which the entries were
moved instead of the queue from which all the callbacks were just
removed.

Fixes: 8d056c48e4 ("CPU hotplug, smp: flush any pending IPI callbacks before CPU offline")
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220319072015.1495036-1-namit@vmware.com
2022-04-13 18:44:35 +02:00
Yuntao Wang
aa1b02e674 bpf: Remove redundant assignment to meta.seq in __task_seq_show()
The seq argument is assigned to meta.seq twice, the second one is
redundant, remove it.

This patch also removes a redundant space in bpf_iter_link_attach().

Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20220410060020.307283-1-ytcoode@gmail.com
2022-04-11 21:14:34 +02:00
Rei Yamamoto
08d835dff9 genirq/affinity: Consider that CPUs on nodes can be unbalanced
If CPUs on a node are offline at boot time, the number of nodes is
different when building affinity masks for present cpus and when building
affinity masks for possible cpus. This causes the following problem:

In the case that the number of vectors is less than the number of nodes
there are cases where bits of masks for present cpus are overwritten when
building masks for possible cpus.

Fix this by excluding CPUs, which are not part of the current build mask
(present/possible).

[ tglx: Massaged changelog and added comment ]

Fixes: b825921990 ("genirq/affinity: Spread IRQs to all available NUMA nodes")
Signed-off-by: Rei Yamamoto <yamamoto.rei@jp.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220331003309.10891-1-yamamoto.rei@jp.fujitsu.com
2022-04-11 09:58:03 +02:00
Linus Torvalds
b51f86e990 - A couple of fixes to cgroup-related handling of perf events
- A couple of fixes to event encoding on Sapphire Rapids
 
 - Pass event caps of inherited events so that perf doesn't fail wrongly at fork()
 
 - Add support for a new Raptor Lake CPU
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmJSvt0ACgkQEsHwGGHe
 VUpKBRAAtegwh4ilwoRM0LePH2TX752pREy+M1qfEUp/XyH3tF8VAixCAmIg7qlI
 IyjRX0AKDC1F08sM7/JmTf0M+hnl/oH2YPG8Q6p3igtfARvn+5bPZdSBpTAC9P5L
 QX3S2WVzv5X78IomIfENqbg5HyZP3IXeg7R7sqZhHbtoG54n5NEv/+aJl5HmHFTt
 gLTrXetL46OSMnLzKfd3hlJqCWSnTz1aGKgGX2cZy9ipI63+XrYMuNmiwJ+CrA3G
 pI98RmKnCPqV2rXij1GpVQNyG2aPR+VVZM3aaq6XBAmiNTaCfnvWbEBGhCkjaSgA
 UU7Y6D1Qxc0OZ1plcjhKc4l/W1oj8jqmG9nS6J2Xy4szdpZIdxBhlWq89xCrb9AC
 yIgKif2iVl7eMVKVG1Jq1u2wTwurBAamH73sCCNn8ndctBjicoM8pbtHMHxzceyZ
 w4Cff0yUNzHgPiqSHQRARw/CaUceL9kDoGzPeEQOR0A+27MpNulchts4HCtIvwzI
 yLIK1JFPHDrCACLTMuAhvov3EMTeoTIfc91eOZRjubRTPx7TxujaZHdP7N+R3nkk
 Giehc/l6IhFPhT8QACk0bziTVJ9in+Jx8pCnocGKuj80Uqs7Sq7swjlasy1Zoy7r
 x9Qzy1gZhPHnvPd6LWU4WyPa767D07DlG/zFdg+P3EeWa/3efdw=
 =ba3V
 -----END PGP SIGNATURE-----

Merge tag 'perf_urgent_for_v5.18_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Borislav Petkov:

 - A couple of fixes to cgroup-related handling of perf events

 - A couple of fixes to event encoding on Sapphire Rapids

 - Pass event caps of inherited events so that perf doesn't fail wrongly
   at fork()

 - Add support for a new Raptor Lake CPU

* tag 'perf_urgent_for_v5.18_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Always set cpuctx cgrp when enable cgroup event
  perf/core: Fix perf_cgroup_switch()
  perf/core: Use perf_cgroup_info->active to check if cgroup is active
  perf/core: Don't pass task around when ctx sched in
  perf/x86/intel: Update the FRONTEND MSR mask on Sapphire Rapids
  perf/x86/intel: Don't extend the pseudo-encoding to GP counters
  perf/core: Inherit event_caps
  perf/x86/uncore: Add Raptor Lake uncore support
  perf/x86/msr: Add Raptor Lake CPU support
  perf/x86/cstate: Add Raptor Lake support
  perf/x86: Add Intel Raptor Lake support
2022-04-10 07:08:22 -10:00
Linus Torvalds
50c94de67c - Allow the compiler to optimize away unused percpu accesses and change
the local_lock_* macros back to inline functions
 
 - A couple of fixes to static call insn patching
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmJStZ4ACgkQEsHwGGHe
 VUpUpA/8DHOMUQa7rM8z49ZWBV01HNVCLECTeeKshQBLyJfWc84MNOfdPbpgEGvY
 XE/eIZDnTMB5UKD0bfRqD+AQ0fXjl3NiLnJrdDZJqEQAiP/wGBswKNXMire8xPT8
 9MfaOKYWYPl0LY2uZBWVLcdC+lVe4kRGfhqAcl4LRx0ZSvMzgjcFy34NeXY8LlXD
 kFQJEzHa97CTROje54mtmXEt7Y5bxjxWwVTSyfEt0hJPGo1bJtJP6FaY01Muj+Xu
 h/OGNx3KLOYf9MqQC31caAwKgtUOptm8bTpvG3onaHg29qJgz2umKwONyOjYrUUn
 2PE3NREfMuKI38nf88pX+lOCs6/I1uVIjJPvAVJijIcuI1ZBXrfm26IP0lZ3LqG1
 h/9Y5gChiZPn1j90VnF4UCJUm4u3bYEAHqKIQgUdpcpUqX0NlxbDiXoYxJWfHnmB
 PBJ0PE7Vdo4MPK0n3BGVrzXAFeOyHsohAsKFijT8afRCMAOF/ebmVs/tI5NygFrK
 11e/U13/78iKkazZSxWew8vU3yXA39W5Rym7aPnhR2lWxvN+xQOjNTgZTxF9hUcZ
 6AcsaYJgHR7nD8SM7Y9+cwHWOWaDEdZMg9XSkgvyd1p0tHb4u+Ve/SQK7sA3j9q7
 ZmZyFSE1X3K+M1i+75rUSVmIEVM5cpfhodN89iRje/JIZ1KyRT8=
 =hSOc
 -----END PGP SIGNATURE-----

Merge tag 'locking_urgent_for_v5.18_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fixes from Borislav Petkov:

 - Allow the compiler to optimize away unused percpu accesses and change
   the local_lock_* macros back to inline functions

 - A couple of fixes to static call insn patching

* tag 'locking_urgent_for_v5.18_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "mm/page_alloc: mark pagesets as __maybe_unused"
  Revert "locking/local_lock: Make the empty local_lock_*() function a macro."
  x86/percpu: Remove volatile from arch_raw_cpu_ptr().
  static_call: Remove __DEFINE_STATIC_CALL macro
  static_call: Properly initialise DEFINE_STATIC_CALL_RET0()
  static_call: Don't make __static_call_return0 static
  x86,static_call: Fix __static_call_return0 for i386
2022-04-10 06:56:46 -10:00
Linus Torvalds
7136849ea9 - Use the correct static key checking primitive on the IRQ exit path
- Two fixes for the new forceidle balancer
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmJSsskACgkQEsHwGGHe
 VUqHRg//UvcM+Ygrx17yhWzHAKi5Mm5sVxSbsY08WU9cQsAQTZ4c9I8Rs2OHkyKC
 8k0uQDxrrOeJRcuJoP87pRPqefvep+Gk1873KEvBRUe7+sQATGO6KMshoiPrnYO5
 kQzd98rK9vrNu5ZwZFWADnCAYco+5nlsyFVXkRY0ZXhDUvfInus2OLkAm4ULXrt1
 FVtO69QXDK3y42NzLSPDCoQyeM/bcCCts6wpR2WWHE2F9zD8tiM8A4DBNpu6Iker
 wli2la27V4U0236ar7Md2HwD8AkQUuOSGYh9JD5RBNZjJpoHKPNNIv35dI7r9yXz
 6/r52pM+idMmfV7MjDWg7cIyHgJKbfBJ54+ibvoqd3Gi5R9IZLJOXGEQaaPLaMT0
 7movvEm/NDzHbQDGKmPoiRr4PYinRKFN85zTuzirscbHInMkmshciHmK2TG9Qt9m
 2L5DG/LnA0EQkhFyrGoxXTgnZwGWpmpWu7tRZfFTUsjbri4CRGThmQYpl7tAEzF7
 TC60WA2RfYXaJgtguZJZfiHXSYfzQriXLd4Mj1WRv6FU+IKedZmAPgjYO2dKu++y
 We8ZOER8Ysy2lJR/DDQ0waDp5UrTarX/WCFzIWNLKcFLvgLEPKrfeO8AtFHDVUZd
 58g9DCn1Jed8ZEYwxpPYpbLVcqQ790oShlU+/EA+FwN5pehuJNU=
 =elMK
 -----END PGP SIGNATURE-----

Merge tag 'sched_urgent_for_v5.18_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Borislav Petkov:

 - Use the correct static key checking primitive on the IRQ exit path

 - Two fixes for the new forceidle balancer

* tag 'sched_urgent_for_v5.18_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  entry: Fix compile error in dynamic_irqentry_exit_cond_resched()
  sched: Teach the forced-newidle balancer about CPU affinity limitation.
  sched/core: Fix forceidle balancing
2022-04-10 06:47:49 -10:00
Jiapeng Chong
9c95bc25ad tick/sched: Fix non-kernel-doc comment
Fixes the following W=1 kernel build warning:

kernel/time/tick-sched.c:1563: warning: This comment starts with '/**',
but isn't a kernel-doc comment.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220214084739.63228-1-jiapeng.chong@linux.alibaba.com
2022-04-10 12:23:34 +02:00
Paul Gortmaker
40e97e4296 tick/nohz: Use WARN_ON_ONCE() to prevent console saturation
While running some testing on code that happened to allow the variable
tick_nohz_full_running to get set but with no "possible" NOHZ cores to
back up that setting, this warning triggered:

        if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE))
                WARN_ON(tick_nohz_full_running);

The console was overwhemled with an endless stream of one WARN per tick
per core and there was no way to even see what was going on w/o using a
serial console to capture it and then trace it back to this.

Change it to WARN_ON_ONCE().

Fixes: 08ae95f4fd ("nohz_full: Allow the boot CPU to be nohz_full")
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20211206145950.10927-3-paul.gortmaker@windriver.com
2022-04-10 12:23:34 +02:00
Anna-Maria Behnsen
c54bc0fc84 timers: Fix warning condition in __run_timers()
When the timer base is empty, base::next_expiry is set to base::clk +
NEXT_TIMER_MAX_DELTA and base::next_expiry_recalc is false. When no timer
is queued until jiffies reaches base::next_expiry value, the warning for
not finding any expired timer and base::next_expiry_recalc is false in
__run_timers() triggers.

To prevent triggering the warning in this valid scenario
base::timers_pending needs to be added to the warning condition.

Fixes: 31cd0e119d ("timers: Recalculate next timer interrupt only when necessary")
Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20220405191732.7438-3-anna-maria@linutronix.de
2022-04-09 22:17:47 +02:00
Jakub Kicinski
34ba23b44c Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2022-04-09

We've added 63 non-merge commits during the last 9 day(s) which contain
a total of 68 files changed, 4852 insertions(+), 619 deletions(-).

The main changes are:

1) Add libbpf support for USDT (User Statically-Defined Tracing) probes.
   USDTs are an abstraction built on top of uprobes, critical for tracing
   and BPF, and widely used in production applications, from Andrii Nakryiko.

2) While Andrii was adding support for x86{-64}-specific logic of parsing
   USDT argument specification, Ilya followed-up with USDT support for s390
   architecture, from Ilya Leoshkevich.

3) Support name-based attaching for uprobe BPF programs in libbpf. The format
   supported is `u[ret]probe/binary_path:[raw_offset|function[+offset]]`, e.g.
   attaching to libc malloc can be done in BPF via SEC("uprobe/libc.so.6:malloc")
   now, from Alan Maguire.

4) Various load/store optimizations for the arm64 JIT to shrink the image
   size by using arm64 str/ldr immediate instructions. Also enable pointer
   authentication to verify return address for JITed code, from Xu Kuohai.

5) BPF verifier fixes for write access checks to helper functions, e.g.
   rd-only memory from bpf_*_cpu_ptr() must not be passed to helpers that
   write into passed buffers, from Kumar Kartikeya Dwivedi.

6) Fix overly excessive stack map allocation for its base map structure and
   buckets which slipped-in from cleanups during the rlimit accounting removal
   back then, from Yuntao Wang.

7) Extend the unstable CT lookup helpers for XDP and tc/BPF to report netfilter
   connection tracking tuple direction, from Lorenzo Bianconi.

8) Improve bpftool dump to show BPF program/link type names, Milan Landaverde.

9) Minor cleanups all over the place from various others.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (63 commits)
  bpf: Fix excessive memory allocation in stack_map_alloc()
  selftests/bpf: Fix return value checks in perf_event_stackmap test
  selftests/bpf: Add CO-RE relos into linked_funcs selftests
  libbpf: Use weak hidden modifier for USDT BPF-side API functions
  libbpf: Don't error out on CO-RE relos for overriden weak subprogs
  samples, bpf: Move routes monitor in xdp_router_ipv4 in a dedicated thread
  libbpf: Allow WEAK and GLOBAL bindings during BTF fixup
  libbpf: Use strlcpy() in path resolution fallback logic
  libbpf: Add s390-specific USDT arg spec parsing logic
  libbpf: Make BPF-side of USDT support work on big-endian machines
  libbpf: Minor style improvements in USDT code
  libbpf: Fix use #ifdef instead of #if to avoid compiler warning
  libbpf: Potential NULL dereference in usdt_manager_attach_usdt()
  selftests/bpf: Uprobe tests should verify param/return values
  libbpf: Improve string parsing for uprobe auto-attach
  libbpf: Improve library identification for uprobe binary path resolution
  selftests/bpf: Test for writes to map key from BPF helpers
  selftests/bpf: Test passing rdonly mem to global func
  bpf: Reject writes for PTR_TO_MAP_KEY in check_helper_mem_access
  bpf: Check PTR_TO_MEM | MEM_RDONLY in check_helper_mem_access
  ...
====================

Link: https://lore.kernel.org/r/20220408231741.19116-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-08 17:07:29 -07:00
Yuntao Wang
b45043192b bpf: Fix excessive memory allocation in stack_map_alloc()
The 'n_buckets * (value_size + sizeof(struct stack_map_bucket))' part of the
allocated memory for 'smap' is never used after the memlock accounting was
removed, thus get rid of it.

[ Note, Daniel:

Commit b936ca643a ("bpf: rework memlock-based memory accounting for maps")
moved `cost += n_buckets * (value_size + sizeof(struct stack_map_bucket))`
up and therefore before the bpf_map_area_alloc() allocation, sigh. In a later
step commit c85d69135a ("bpf: move memory size checks to bpf_map_charge_init()"),
and the overflow checks of `cost >= U32_MAX - PAGE_SIZE` moved into
bpf_map_charge_init(). And then 370868107b ("bpf: Eliminate rlimit-based
memory accounting for stackmap maps") finally removed the bpf_map_charge_init().
Anyway, the original code did the allocation same way as /after/ this fix. ]

Fixes: b936ca643a ("bpf: rework memlock-based memory accounting for maps")
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220407130423.798386-1-ytcoode@gmail.com
2022-04-09 00:28:21 +02:00