Commit graph

40805 commits

Author SHA1 Message Date
Connor O'Brien
9cb61e50bf bpf: btf: limit logging of ignored BTF mismatches
Enabling CONFIG_MODULE_ALLOW_BTF_MISMATCH is an indication that BTF
mismatches are expected and module loading should proceed
anyway. Logging with pr_warn() on every one of these "benign"
mismatches creates unnecessary noise when many such modules are
loaded. Instead, handle this case with a single log warning that BTF
info may be unavailable.

Mismatches also result in calls to __btf_verifier_log() via
__btf_verifier_log_type() or btf_verifier_log_member(), adding several
additional lines of logging per mismatched module. Add checks to these
paths to skip logging for module BTF mismatches in the "allow
mismatch" case.

All existing logging behavior is preserved in the default
CONFIG_MODULE_ALLOW_BTF_MISMATCH=n case.

Signed-off-by: Connor O'Brien <connoro@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20230107025331.3240536-1-connoro@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-01-10 15:58:30 -08:00
Jakub Kicinski
4aea86b403 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05 15:34:11 -08:00
Linus Torvalds
50011c32f4 Including fixes from bpf, wifi, and netfilter.
Current release - regressions:
 
  - bpf: fix nullness propagation for reg to reg comparisons,
    avoid null-deref
 
  - inet: control sockets should not use current thread task_frag
 
  - bpf: always use maximal size for copy_array()
 
  - eth: bnxt_en: don't link netdev to a devlink port for VFs
 
 Current release - new code bugs:
 
  - rxrpc: fix a couple of potential use-after-frees
 
  - netfilter: conntrack: fix IPv6 exthdr error check
 
  - wifi: iwlwifi: fw: skip PPAG for JF, avoid FW crashes
 
  - eth: dsa: qca8k: various fixes for the in-band register access
 
  - eth: nfp: fix schedule in atomic context when sync mc address
 
  - eth: renesas: rswitch: fix getting mac address from device tree
 
  - mobile: ipa: use proper endpoint mask for suspend
 
 Previous releases - regressions:
 
  - tcp: add TIME_WAIT sockets in bhash2, fix regression caught
    by Jiri / python tests
 
  - net: tc: don't intepret cls results when asked to drop, fix
    oob-access
 
  - vrf: determine the dst using the original ifindex for multicast
 
  - eth: bnxt_en:
    - fix XDP RX path if BPF adjusted packet length
    - fix HDS (header placement) and jumbo thresholds for RX packets
 
  - eth: ice: xsk: do not use xdp_return_frame() on tx_buf->raw_buf,
    avoid memory corruptions
 
 Previous releases - always broken:
 
  - ulp: prevent ULP without clone op from entering the LISTEN status
 
  - veth: fix race with AF_XDP exposing old or uninitialized descriptors
 
  - bpf:
    - pull before calling skb_postpull_rcsum() (fix checksum support
      and avoid a WARN())
    - fix panic due to wrong pageattr of im->image (when livepatch
      and kretfunc coexist)
    - keep a reference to the mm, in case the task is dead
 
  - mptcp: fix deadlock in fastopen error path
 
  - netfilter:
    - nf_tables: perform type checking for existing sets
    - nf_tables: honor set timeout and garbage collection updates
    - ipset: fix hash:net,port,net hang with /0 subnet
    - ipset: avoid hung task warning when adding/deleting entries
 
  - selftests: net:
    - fix cmsg_so_mark.sh test hang on non-x86 systems
    - fix the arp_ndisc_evict_nocarrier test for IPv6
 
  - usb: rndis_host: secure rndis_query check against int overflow
 
  - eth: r8169: fix dmar pte write access during suspend/resume with WOL
 
  - eth: lan966x: fix configuration of the PCS
 
  - eth: sparx5: fix reading of the MAC address
 
  - eth: qed: allow sleep in qed_mcp_trace_dump()
 
  - eth: hns3:
    - fix interrupts re-initialization after VF FLR
    - fix handling of promisc when MAC addr table gets full
    - refine the handling for VF heartbeat
 
  - eth: mlx5:
    - properly handle ingress QinQ-tagged packets on VST
    - fix io_eq_size and event_eq_size params validation on big endian
    - fix RoCE setting at HCA level if not supported at all
    - don't turn CQE compression on by default for IPoIB
 
  - eth: ena:
    - fix toeplitz initial hash key value
    - account for the number of XDP-processed bytes in interface stats
    - fix rx_copybreak value update
 
 Misc:
 
  - ethtool: harden phy stat handling against buggy drivers
 
  - docs: netdev: convert maintainer's doc from FAQ to a normal document
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmO3MLcACgkQMUZtbf5S
 IrsEQBAAijPrpxsGMfX+VMqZ8RPKA3Qg8XF3ji2fSp4c0kiKv6lYI7PzPTR3u/fj
 CAlhQMHv7z53uM6Zd7FdUVl23paaEycu8YnlwSubg9z+wSeh/RQ6iq94mSk1PV+K
 LLVR/yop2N35Yp/oc5KZMb9fMLkxRG9Ci73QUVVYgvIrSd4Zdm13FjfVjL2C1MZH
 Yp003wigMs9IkIHOpHjNqwn/5s//0yXsb1PgKxCsaMdMQsG0yC+7eyDmxshCqsji
 xQm15mkGMjvWEYJaa4Tj4L3JW6lWbQzCu9nqPUX16KpmrnScr8S8Is+aifFZIBeW
 GZeDYgvjSxNWodeOrJnD3X+fnbrR9+qfx7T9y7XighfytAz5DNm1LwVOvZKDgPFA
 s+LlxOhzkDNEqbIsusK/LW+04EFc5gJyTI2iR6s4SSqmH3c3coJZQJeyRFWDZy/x
 1oqzcCcq8SwGUTJ9g6HAmDQoVkhDWDT/ZcRKhpWG0nJub972lB2iwM7LrAu+HoHI
 r8hyCkHpOi5S3WZKI9gPiGD+yOlpVAuG2wHg2IpjhKQvtd9DFUChGDhFeoB2rqJf
 9uI3RJBBYTDkeNu3kpfy5uMh2XhvbIZntK5kwpJ4VettZWFMaOAzn7KNqk8iT4gJ
 ASMrUrX59X0TAN0MgpJJm7uGtKbKZOu4lHNm74TUxH7V7bYn7dk=
 =TlcN
 -----END PGP SIGNATURE-----

Merge tag 'net-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bpf, wifi, and netfilter.

  Current release - regressions:

   - bpf: fix nullness propagation for reg to reg comparisons, avoid
     null-deref

   - inet: control sockets should not use current thread task_frag

   - bpf: always use maximal size for copy_array()

   - eth: bnxt_en: don't link netdev to a devlink port for VFs

  Current release - new code bugs:

   - rxrpc: fix a couple of potential use-after-frees

   - netfilter: conntrack: fix IPv6 exthdr error check

   - wifi: iwlwifi: fw: skip PPAG for JF, avoid FW crashes

   - eth: dsa: qca8k: various fixes for the in-band register access

   - eth: nfp: fix schedule in atomic context when sync mc address

   - eth: renesas: rswitch: fix getting mac address from device tree

   - mobile: ipa: use proper endpoint mask for suspend

  Previous releases - regressions:

   - tcp: add TIME_WAIT sockets in bhash2, fix regression caught by
     Jiri / python tests

   - net: tc: don't intepret cls results when asked to drop, fix
     oob-access

   - vrf: determine the dst using the original ifindex for multicast

   - eth: bnxt_en:
      - fix XDP RX path if BPF adjusted packet length
      - fix HDS (header placement) and jumbo thresholds for RX packets

   - eth: ice: xsk: do not use xdp_return_frame() on tx_buf->raw_buf,
     avoid memory corruptions

  Previous releases - always broken:

   - ulp: prevent ULP without clone op from entering the LISTEN status

   - veth: fix race with AF_XDP exposing old or uninitialized
     descriptors

   - bpf:
      - pull before calling skb_postpull_rcsum() (fix checksum support
        and avoid a WARN())
      - fix panic due to wrong pageattr of im->image (when livepatch and
        kretfunc coexist)
      - keep a reference to the mm, in case the task is dead

   - mptcp: fix deadlock in fastopen error path

   - netfilter:
      - nf_tables: perform type checking for existing sets
      - nf_tables: honor set timeout and garbage collection updates
      - ipset: fix hash:net,port,net hang with /0 subnet
      - ipset: avoid hung task warning when adding/deleting entries

   - selftests: net:
      - fix cmsg_so_mark.sh test hang on non-x86 systems
      - fix the arp_ndisc_evict_nocarrier test for IPv6

   - usb: rndis_host: secure rndis_query check against int overflow

   - eth: r8169: fix dmar pte write access during suspend/resume with
     WOL

   - eth: lan966x: fix configuration of the PCS

   - eth: sparx5: fix reading of the MAC address

   - eth: qed: allow sleep in qed_mcp_trace_dump()

   - eth: hns3:
      - fix interrupts re-initialization after VF FLR
      - fix handling of promisc when MAC addr table gets full
      - refine the handling for VF heartbeat

   - eth: mlx5:
      - properly handle ingress QinQ-tagged packets on VST
      - fix io_eq_size and event_eq_size params validation on big endian
      - fix RoCE setting at HCA level if not supported at all
      - don't turn CQE compression on by default for IPoIB

   - eth: ena:
      - fix toeplitz initial hash key value
      - account for the number of XDP-processed bytes in interface stats
      - fix rx_copybreak value update

  Misc:

   - ethtool: harden phy stat handling against buggy drivers

   - docs: netdev: convert maintainer's doc from FAQ to a normal
     document"

* tag 'net-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (112 commits)
  caif: fix memory leak in cfctrl_linkup_request()
  inet: control sockets should not use current thread task_frag
  net/ulp: prevent ULP without clone op from entering the LISTEN status
  qed: allow sleep in qed_mcp_trace_dump()
  MAINTAINERS: Update maintainers for ptp_vmw driver
  usb: rndis_host: Secure rndis_query check against int overflow
  net: dpaa: Fix dtsec check for PCS availability
  octeontx2-pf: Fix lmtst ID used in aura free
  drivers/net/bonding/bond_3ad: return when there's no aggregator
  netfilter: ipset: Rework long task execution when adding/deleting entries
  netfilter: ipset: fix hash:net,port,net hang with /0 subnet
  net: sparx5: Fix reading of the MAC address
  vxlan: Fix memory leaks in error path
  net: sched: htb: fix htb_classify() kernel-doc
  net: sched: cbq: dont intepret cls results when asked to drop
  net: sched: atm: dont intepret cls results when asked to drop
  dt-bindings: net: marvell,orion-mdio: Fix examples
  dt-bindings: net: sun8i-emac: Add phy-supply property
  net: ipa: use proper endpoint mask for suspend
  selftests: net: return non-zero for failures reported in arp_ndisc_evict_nocarrier
  ...
2023-01-05 12:40:50 -08:00
Jakub Kicinski
d75858ef10 bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY7X/4wAKCRDbK58LschI
 g7gzAQCjKsLtAWg1OplW+B7pvEPwkQ8g3O1+PYWlToCUACTlzQD+PEMrqGnxB573
 oQAk6I2yOTwLgvlHkrm+TIdKSouI4gs=
 =2hUY
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
bpf-next 2023-01-04

We've added 45 non-merge commits during the last 21 day(s) which contain
a total of 50 files changed, 1454 insertions(+), 375 deletions(-).

The main changes are:

1) Fixes, improvements and refactoring of parts of BPF verifier's
   state equivalence checks, from Andrii Nakryiko.

2) Fix a few corner cases in libbpf's BTF-to-C converter in particular
   around padding handling and enums, also from Andrii Nakryiko.

3) Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to better
  support decap on GRE tunnel devices not operating in collect metadata,
  from Christian Ehrig.

4) Improve x86 JIT's codegen for PROBE_MEM runtime error checks,
   from Dave Marchevsky.

5) Remove the need for trace_printk_lock for bpf_trace_printk
   and bpf_trace_vprintk helpers, from Jiri Olsa.

6) Add proper documentation for BPF_MAP_TYPE_SOCK{MAP,HASH} maps,
   from Maryam Tahhan.

7) Improvements in libbpf's btf_parse_elf error handling, from Changbin Du.

8) Bigger batch of improvements to BPF tracing code samples,
   from Daniel T. Lee.

9) Add LoongArch support to libbpf's bpf_tracing helper header,
   from Hengqi Chen.

10) Fix a libbpf compiler warning in perf_event_open_probe on arm32,
    from Khem Raj.

11) Optimize bpf_local_storage_elem by removing 56 bytes of padding,
    from Martin KaFai Lau.

12) Use pkg-config to locate libelf for resolve_btfids build,
    from Shen Jiamin.

13) Various libbpf improvements around API documentation and errno
    handling, from Xin Liu.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (45 commits)
  libbpf: Return -ENODATA for missing btf section
  libbpf: Add LoongArch support to bpf_tracing.h
  libbpf: Restore errno after pr_warn.
  libbpf: Added the description of some API functions
  libbpf: Fix invalid return address register in s390
  samples/bpf: Use BPF_KSYSCALL macro in syscall tracing programs
  samples/bpf: Fix tracex2 by using BPF_KSYSCALL macro
  samples/bpf: Change _kern suffix to .bpf with syscall tracing program
  samples/bpf: Use vmlinux.h instead of implicit headers in syscall tracing program
  samples/bpf: Use kyscall instead of kprobe in syscall tracing program
  bpf: rename list_head -> graph_root in field info types
  libbpf: fix errno is overwritten after being closed.
  bpf: fix regs_exact() logic in regsafe() to remap IDs correctly
  bpf: perform byte-by-byte comparison only when necessary in regsafe()
  bpf: reject non-exact register type matches in regsafe()
  bpf: generalize MAYBE_NULL vs non-MAYBE_NULL rule
  bpf: reorganize struct bpf_reg_state fields
  bpf: teach refsafe() to take into account ID remapping
  bpf: Remove unused field initialization in bpf's ctl_table
  selftests/bpf: Add jit probe_mem corner case tests to s390x denylist
  ...
====================

Link: https://lore.kernel.org/r/20230105000926.31350-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-04 20:21:25 -08:00
Linus Torvalds
150aae354b - Pass only an initialized perf event attribute to the LSM hook
- Fix a use-after-free on the perf syscall's error path
 
 - A potential integer overflow fix in amd_core_pmu_init()
 
 - Fix the cgroup events tracking after the context handling rewrite
 
 - Return the proper value from the inherit_event() function on error
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmOxotcACgkQEsHwGGHe
 VUqN0g//fCMUbsO/TRfDavqFMdAYi93EHWwd2I8KUDd2TPKY6KA2gQndw8aEvAF8
 iO/zO0osD0+23ANmpLBIGuyrM41UqiG+U/Q70++t/0yaD4nYegcpIamg518MEAK9
 HxP4Gl3A9Yy4QbcOgLGq+ouOvUljJAB3jhPc+KsWmgxsLrua4JZeEgIwNm/4yLOG
 cgetqX7l1A5ASPxU3MO+wSyjEHVIq6rA6y3HllpIHLYy5/TyvkhNZiykAxRn8Exy
 yznyYkifNUfCh0TOusMLmK5qR/UFPbGbZfxBlw8ni9sIocsz02U3N4XOwBPVgi8x
 2OVyH1j+cZUuBuk4AH39FmaFDe/PZxFiYvInH/Y1vmS5uqi5v7hn23VLxqykPbdw
 Drz/vo7YiuPvM3R1ibf0yXN53mn+zMzKFRYRQwPWN6c1ocxEltaeaDcRsLhh8ql+
 RM2mKVvWjELRuFWQpV8KCKuEpCz/niQ/8wvwXCxtPyJt9wiTPu4+j7K+p27T7FQR
 33zPSGgN/nCuVEBOIsyi3B7zzIoRy0l3nM6tOTZXM/TVFhLyda0fJUV6Ydi39CGy
 Mf1FnLd+9nrR+Oqinh/DzJ5nqoSrSxPV9FXRu+0bMsYD+lbiMtOZFw8VaQzxR1gD
 dUOve8zUgQAusdvGeJU1+iOmdAxbiSS9s3pqw7G+d+O4a/x7PBE=
 =M9iU
 -----END PGP SIGNATURE-----

Merge tag 'perf_urgent_for_v6.2_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Borislav Petkov:

 - Pass only an initialized perf event attribute to the LSM hook

 - Fix a use-after-free on the perf syscall's error path

 - A potential integer overflow fix in amd_core_pmu_init()

 - Fix the cgroup events tracking after the context handling rewrite

 - Return the proper value from the inherit_event() function on error

* tag 'perf_urgent_for_v6.2_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Call LSM hook after copying perf_event_attr
  perf: Fix use-after-free in error path
  perf/x86/amd: fix potential integer overflow on shift of a int
  perf/core: Fix cgroup events tracking
  perf core: Return error pointer if inherit_event() fails to find pmu_ctx
2023-01-01 11:27:00 -08:00
Linus Torvalds
95d248d16f - Prevent the leaking of a debug timer in futex_waitv()
- A preempt-RT mutex locking fix, adding the proper acquire semantics
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmOxn1cACgkQEsHwGGHe
 VUp+AQ/9GilVlaWqloC+sDK/7qTLfmQjR/XwQPii2pMBUO1oRFT7mG+dG7lKNpRL
 WCLTVIrzG528ORJytvMGuqx+CqenuTGPFI9rtizxdtRlNZyiS33hvlaGBPY7XYY7
 G2qugGKuZ7WVKZpq6szPwnrMuWjdsd+XOO+NE3h/5kI2T7ZukmlXl9/EtVii7+y3
 LGVjMhN+R0XCMCPSSPtFx5eUGalemPnGSBcX/Qb4zG1MphSPURFKHEQmWHZzgrkg
 6fmqx773u2mquxTYeEy0FDLjOYFEHlzDFgRtIDBaziHIxjzO/5r09m8RDF2mXetW
 W/pf/Iduh2TZWwJDngmUi0JxqOzz0l8T/6vSCuArYWkjb+0zDOPHuCYwif2iHAQo
 pMn+O5nAtLMZhdk+yjTs4abQ11q9owLh0nGps4A2SKgJIH0mL2BICwRotzuV1sxf
 CGnoOxIxZuZGq8xMoHp6sJ8e4RnJHBAt7PKrCsLTkPAjKjgmBpqshVav//kyCFeY
 2R4rgRY/a42Z86uNSysVnxxkop4vRgTZ3l7TijzyeIbycI7v9WMADk0TKTDgsMbL
 nEnlZPWXJGYNXH71ZPV5TPgidHfiO68sVKerXWRrWRLyGHhY2TC67Lz29vTAlbMK
 NdOM6pO0DkaKSLxEJ6qso819ePnRA/2O5B7xU/Ll7vONjCpHlFw=
 =9LHF
 -----END PGP SIGNATURE-----

Merge tag 'locking_urgent_for_v6.2_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fixes from Borislav Petkov:

 - Prevent the leaking of a debug timer in futex_waitv()

 - A preempt-RT mutex locking fix, adding the proper acquire semantics

* tag 'locking_urgent_for_v6.2_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  futex: Fix futex_waitv() hrtimer debug object leak on kcalloc error
  rtmutex: Add acquire semantics for rtmutex lock acquisition slow path
2023-01-01 11:15:05 -08:00
Dave Marchevsky
30465003ad bpf: rename list_head -> graph_root in field info types
Many of the structs recently added to track field info for linked-list
head are useful as-is for rbtree root. So let's do a mechanical renaming
of list_head-related types and fields:

include/linux/bpf.h:
  struct btf_field_list_head -> struct btf_field_graph_root
  list_head -> graph_root in struct btf_field union
kernel/bpf/btf.c:
  list_head -> graph_root in struct btf_field_info

This is a nonfunctional change, functionality to actually use these
fields for rbtree will be added in further patches.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20221217082506.1570898-5-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-28 20:14:22 -08:00
Kees Cook
45435d8da7 bpf: Always use maximal size for copy_array()
Instead of counting on prior allocations to have sized allocations to
the next kmalloc bucket size, always perform a krealloc that is at least
ksize(dst) in size (which is a no-op), so the size can be correctly
tracked by all the various allocation size trackers (KASAN,
__alloc_size, etc).

Reported-by: Hyunwoo Kim <v4bel@theori.io>
Link: https://lore.kernel.org/bpf/20221223094551.GA1439509@ubuntu
Fixes: ceb35b666d ("bpf/verifier: Use kmalloc_size_roundup() to match ksize() usage")
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Song Liu <song@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: bpf@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221223182836.never.866-kees@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-28 14:54:53 -08:00
Kui-Feng Lee
7ff94f276f bpf: keep a reference to the mm, in case the task is dead.
Fix the system crash that happens when a task iterator travel through
vma of tasks.

In task iterators, we used to access mm by following the pointer on
the task_struct; however, the death of a task will clear the pointer,
even though we still hold the task_struct.  That can cause an
unexpected crash for a null pointer when an iterator is visiting a
task that dies during the visit.  Keeping a reference of mm on the
iterator ensures we always have a valid pointer to mm.

Co-developed-by: Song Liu <song@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Reported-by: Nathan Slingerland <slinger@meta.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221216221855.4122288-2-kuifeng@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-28 14:11:48 -08:00
Chuang Wang
9ed1d9aeef bpf: Fix panic due to wrong pageattr of im->image
In the scenario where livepatch and kretfunc coexist, the pageattr of
im->image is rox after arch_prepare_bpf_trampoline in
bpf_trampoline_update, and then modify_fentry or register_fentry returns
-EAGAIN from bpf_tramp_ftrace_ops_func, the BPF_TRAMP_F_ORIG_STACK flag
will be configured, and arch_prepare_bpf_trampoline will be re-executed.

At this time, because the pageattr of im->image is rox,
arch_prepare_bpf_trampoline will read and write im->image, which causes
a fault. as follows:

  insmod livepatch-sample.ko    # samples/livepatch/livepatch-sample.c
  bpftrace -e 'kretfunc:cmdline_proc_show {}'

BUG: unable to handle page fault for address: ffffffffa0206000
PGD 322d067 P4D 322d067 PUD 322e063 PMD 1297e067 PTE d428061
Oops: 0003 [#1] PREEMPT SMP PTI
CPU: 2 PID: 270 Comm: bpftrace Tainted: G            E K    6.1.0 #5
RIP: 0010:arch_prepare_bpf_trampoline+0xed/0x8c0
RSP: 0018:ffffc90001083ad8 EFLAGS: 00010202
RAX: ffffffffa0206000 RBX: 0000000000000020 RCX: 0000000000000000
RDX: ffffffffa0206001 RSI: ffffffffa0206000 RDI: 0000000000000030
RBP: ffffc90001083b70 R08: 0000000000000066 R09: ffff88800f51b400
R10: 000000002e72c6e5 R11: 00000000d0a15080 R12: ffff8880110a68c8
R13: 0000000000000000 R14: ffff88800f51b400 R15: ffffffff814fec10
FS:  00007f87bc0dc780(0000) GS:ffff88803e600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffa0206000 CR3: 0000000010b70000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
 bpf_trampoline_update+0x25a/0x6b0
 __bpf_trampoline_link_prog+0x101/0x240
 bpf_trampoline_link_prog+0x2d/0x50
 bpf_tracing_prog_attach+0x24c/0x530
 bpf_raw_tp_link_attach+0x73/0x1d0
 __sys_bpf+0x100e/0x2570
 __x64_sys_bpf+0x1c/0x30
 do_syscall_64+0x5b/0x80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

With this patch, when modify_fentry or register_fentry returns -EAGAIN
from bpf_tramp_ftrace_ops_func, the pageattr of im->image will be reset
to nx+rw.

Cc: stable@vger.kernel.org
Fixes: 00963a2e75 ("bpf: Support bpf_trampoline on functions with IPMODIFY (e.g. livepatch)")
Signed-off-by: Chuang Wang <nashuiliang@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20221224133146.780578-1-nashuiliang@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-28 13:46:28 -08:00
Andrii Nakryiko
4633a00682 bpf: fix regs_exact() logic in regsafe() to remap IDs correctly
Comparing IDs exactly between two separate states is not just
suboptimal, but also incorrect in some cases. So update regs_exact()
check to do byte-by-byte memcmp() only up to id/ref_obj_id. For id and
ref_obj_id perform proper check_ids() checks, taking into account idmap.

This change makes more states equivalent improving insns and states
stats across a bunch of selftest BPF programs:

File                                         Program                           Insns (A)  Insns (B)  Insns   (DIFF)  States (A)  States (B)  States (DIFF)
-------------------------------------------  --------------------------------  ---------  ---------  --------------  ----------  ----------  -------------
cgrp_kfunc_success.bpf.linked1.o             test_cgrp_get_release                   141        137     -4 (-2.84%)          13          13    +0 (+0.00%)
cgrp_kfunc_success.bpf.linked1.o             test_cgrp_xchg_release                  142        139     -3 (-2.11%)          14          13    -1 (-7.14%)
connect6_prog.bpf.linked1.o                  connect_v6_prog                         139        102   -37 (-26.62%)           9           6   -3 (-33.33%)
ima.bpf.linked1.o                            bprm_creds_for_exec                      68         61    -7 (-10.29%)           6           5   -1 (-16.67%)
linked_list.bpf.linked1.o                    global_list_in_list                     569        499   -70 (-12.30%)          60          52   -8 (-13.33%)
linked_list.bpf.linked1.o                    global_list_push_pop                    167        150   -17 (-10.18%)          18          16   -2 (-11.11%)
linked_list.bpf.linked1.o                    global_list_push_pop_multiple           881        815    -66 (-7.49%)          74          63  -11 (-14.86%)
linked_list.bpf.linked1.o                    inner_map_list_in_list                  579        534    -45 (-7.77%)          61          55    -6 (-9.84%)
linked_list.bpf.linked1.o                    inner_map_list_push_pop                 190        181     -9 (-4.74%)          19          18    -1 (-5.26%)
linked_list.bpf.linked1.o                    inner_map_list_push_pop_multiple        916        850    -66 (-7.21%)          75          64  -11 (-14.67%)
linked_list.bpf.linked1.o                    map_list_in_list                        588        525   -63 (-10.71%)          62          55   -7 (-11.29%)
linked_list.bpf.linked1.o                    map_list_push_pop                       183        174     -9 (-4.92%)          18          17    -1 (-5.56%)
linked_list.bpf.linked1.o                    map_list_push_pop_multiple              909        843    -66 (-7.26%)          75          64  -11 (-14.67%)
map_kptr.bpf.linked1.o                       test_map_kptr                           264        256     -8 (-3.03%)          26          26    +0 (+0.00%)
map_kptr.bpf.linked1.o                       test_map_kptr_ref                        95         91     -4 (-4.21%)           9           8   -1 (-11.11%)
task_kfunc_success.bpf.linked1.o             test_task_xchg_release                  139        136     -3 (-2.16%)          14          13    -1 (-7.14%)
test_bpf_nf.bpf.linked1.o                    nf_skb_ct_test                          815        509  -306 (-37.55%)          57          30  -27 (-47.37%)
test_bpf_nf.bpf.linked1.o                    nf_xdp_ct_test                          815        509  -306 (-37.55%)          57          30  -27 (-47.37%)
test_cls_redirect.bpf.linked1.o              cls_redirect                          78925      78390   -535 (-0.68%)        4782        4704   -78 (-1.63%)
test_cls_redirect_subprogs.bpf.linked1.o     cls_redirect                          64901      63897  -1004 (-1.55%)        4612        4470  -142 (-3.08%)
test_sk_lookup.bpf.linked1.o                 access_ctx_sk                           181         95   -86 (-47.51%)          19          10   -9 (-47.37%)
test_sk_lookup.bpf.linked1.o                 ctx_narrow_access                       447        437    -10 (-2.24%)          38          37    -1 (-2.63%)
test_sk_lookup_kern.bpf.linked1.o            sk_lookup_success                       148        133   -15 (-10.14%)          14          12   -2 (-14.29%)
test_tcp_check_syncookie_kern.bpf.linked1.o  check_syncookie_clsact                  304        300     -4 (-1.32%)          23          22    -1 (-4.35%)
test_tcp_check_syncookie_kern.bpf.linked1.o  check_syncookie_xdp                     304        300     -4 (-1.32%)          23          22    -1 (-4.35%)
test_verify_pkcs7_sig.bpf.linked1.o          bpf                                      87         76   -11 (-12.64%)           7           6   -1 (-14.29%)
-------------------------------------------  --------------------------------  ---------  ---------  --------------  ----------  ----------  -------------

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221223054921.958283-7-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-27 17:37:07 -08:00
Andrii Nakryiko
4a95c85c99 bpf: perform byte-by-byte comparison only when necessary in regsafe()
Extract byte-by-byte comparison of bpf_reg_state in regsafe() into
a helper function, which makes it more convenient to use it "on demand"
only for registers that benefit from such checks, instead of doing it
all the time, even if result of such comparison is ignored.

Also, remove WARN_ON_ONCE(1)+return false dead code. There is no risk of
missing some case as compiler will warn about non-void function not
returning value in some branches (and that under assumption that default
case is removed in the future).

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221223054921.958283-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-27 17:37:07 -08:00
Andrii Nakryiko
910f699966 bpf: reject non-exact register type matches in regsafe()
Generalize the (somewhat implicit) rule of regsafe(), which states that
if register types in old and current states do not match *exactly*, they
can't be safely considered equivalent.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221223054921.958283-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-27 17:37:07 -08:00
Andrii Nakryiko
7f4ce97cd5 bpf: generalize MAYBE_NULL vs non-MAYBE_NULL rule
Make generic check to prevent XXX_OR_NULL and XXX register types to be
intermixed. While technically in some situations it could be safe, it's
impossible to enforce due to the loss of an ID when converting
XXX_OR_NULL to its non-NULL variant. So prevent this in general, not
just for PTR_TO_MAP_KEY and PTR_TO_MAP_VALUE.

PTR_TO_MAP_KEY_OR_NULL and PTR_TO_MAP_VALUE_OR_NULL checks, which were
previously special-cased, are simplified to generic check that takes
into account range_within() and tnum_in(). This is correct as BPF
verifier doesn't allow arithmetic on XXX_OR_NULL register types, so
var_off and ranges should stay zero. But even if in the future this
restriction is lifted, it's even more important to enforce that var_off
and ranges are compatible, otherwise it's possible to construct case
where this can be exploited to bypass verifier's memory range safety
checks.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221223054921.958283-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-27 17:37:07 -08:00
Andrii Nakryiko
a73bf9f2d9 bpf: reorganize struct bpf_reg_state fields
Move id and ref_obj_id fields after scalar data section (var_off and
ranges). This is necessary to simplify next patch which will change
regsafe()'s logic to be safer, as it makes the contents that has to be
an exact match (type-specific parts, off, type, and var_off+ranges)
a single sequential block of memory, while id and ref_obj_id should
always be remapped and thus can't be memcp()'ed.

There are few places that assume that var_off is after id/ref_obj_id to
clear out id/ref_obj_id with the single memset(0). These are changed to
explicitly zero-out id/ref_obj_id fields. Other places are adjusted to
preserve exact byte-by-byte comparison behavior.

No functional changes.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221223054921.958283-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-27 17:37:07 -08:00
Andrii Nakryiko
e8f55fcf77 bpf: teach refsafe() to take into account ID remapping
states_equal() check performs ID mapping between old and new states to
establish a 1-to-1 correspondence between IDs, even if their absolute
numberic values across two equivalent states differ. This is important
both for correctness and to avoid unnecessary work when two states are
equivalent.

With recent changes we partially fixed this logic by maintaining ID map
across all function frames. This patch also makes refsafe() check take
into account (and maintain) ID map, making states_equal() behavior more
optimal and correct.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221223054921.958283-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-27 17:37:07 -08:00
Mathieu Desnoyers
94cd8fa09f futex: Fix futex_waitv() hrtimer debug object leak on kcalloc error
In a scenario where kcalloc() fails to allocate memory, the futex_waitv
system call immediately returns -ENOMEM without invoking
destroy_hrtimer_on_stack(). When CONFIG_DEBUG_OBJECTS_TIMERS=y, this
results in leaking a timer debug object.

Fixes: bf69bad38c ("futex: Implement sys_futex_waitv()")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: stable@vger.kernel.org
Cc: stable@vger.kernel.org # v5.16+
Link: https://lore.kernel.org/r/20221214222008.200393-1-mathieu.desnoyers@efficios.com
2022-12-27 12:52:02 +01:00
Namhyung Kim
0a041ebca4 perf/core: Call LSM hook after copying perf_event_attr
It passes the attr struct to the security_perf_event_open() but it's
not initialized yet.

Fixes: da97e18458 ("perf_event: Add support for LSM and SELinux checks")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20221220223140.4020470-1-namhyung@kernel.org
2022-12-27 12:44:01 +01:00
Peter Zijlstra
a551844e34 perf: Fix use-after-free in error path
The syscall error path has a use-after-free; put_pmu_ctx() will
reference ctx, therefore we must ensure ctx is destroyed after pmu_ctx
is.

Fixes: bd27568117 ("perf: Rewrite core context handling")
Reported-by: syzbot+b8e8c01c8ade4fe6e48f@syzkaller.appspotmail.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Chengming Zhou <zhouchengming@bytedance.com>
Link: https://lkml.kernel.org/r/Y6B3xEgkbmFUCeni@hirez.programming.kicks-ass.net
2022-12-27 12:44:01 +01:00
Chengming Zhou
f841b682ba perf/core: Fix cgroup events tracking
We encounter perf warnings when using cgroup events like:

  cd /sys/fs/cgroup
  mkdir test
  perf stat -e cycles -a -G test

Which then triggers:

  WARNING: CPU: 0 PID: 690 at kernel/events/core.c:849 perf_cgroup_switch+0xb2/0xc0
  Call Trace:
   <TASK>
   __schedule+0x4ae/0x9f0
   ? _raw_spin_unlock_irqrestore+0x23/0x40
   ? __cond_resched+0x18/0x20
   preempt_schedule_common+0x2d/0x70
   __cond_resched+0x18/0x20
   wait_for_completion+0x2f/0x160
   ? cpu_stop_queue_work+0x9e/0x130
   affine_move_task+0x18a/0x4f0

  WARNING: CPU: 0 PID: 690 at kernel/events/core.c:829 ctx_sched_in+0x1cf/0x1e0
  Call Trace:
   <TASK>
   ? ctx_sched_out+0xb7/0x1b0
   perf_cgroup_switch+0x88/0xc0
   __schedule+0x4ae/0x9f0
   ? _raw_spin_unlock_irqrestore+0x23/0x40
   ? __cond_resched+0x18/0x20
   preempt_schedule_common+0x2d/0x70
   __cond_resched+0x18/0x20
   wait_for_completion+0x2f/0x160
   ? cpu_stop_queue_work+0x9e/0x130
   affine_move_task+0x18a/0x4f0

The above two warnings are not complete here since I remove other
unimportant information. The problem is caused by the perf cgroup
events tracking:

  CPU0					CPU1
  perf_event_open()
    perf_event_alloc()
      account_event()
	account_event_cpu()
	  atomic_inc(perf_cgroup_events)
					  __perf_event_task_sched_out()
					    if (atomic_read(perf_cgroup_events))
					      perf_cgroup_switch()
						// kernel/events/core.c:849
						WARN_ON_ONCE(cpuctx->ctx.nr_cgroups == 0)
						if (READ_ONCE(cpuctx->cgrp) == cgrp) // false
						  return
						perf_ctx_lock()
						ctx_sched_out()
						cpuctx->cgrp = cgrp
						ctx_sched_in()
						  perf_cgroup_set_timestamp()
						    // kernel/events/core.c:829
						    WARN_ON_ONCE(!ctx->nr_cgroups)
						perf_ctx_unlock()
    perf_install_in_context()
      cpu_function_call()
					  __perf_install_in_context()
					    add_event_to_ctx()
					      list_add_event()
						perf_cgroup_event_enable()
						  ctx->nr_cgroups++
						  cpuctx->cgrp = X

We can see from above that we wrongly use percpu atomic perf_cgroup_events
to check if we need to perf_cgroup_switch(), which should only be used
when we know this CPU has cgroup events enabled.

The commit bd27568117 ("perf: Rewrite core context handling") change
to have only one context per-CPU, so we can just use cpuctx->cgrp to
check if this CPU has cgroup events enabled.

So percpu atomic perf_cgroup_events is not needed.

Fixes: bd27568117 ("perf: Rewrite core context handling")
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lkml.kernel.org/r/20221207124023.66252-1-zhouchengming@bytedance.com
2022-12-27 12:44:00 +01:00
Ravi Bangoria
e2d3714846 perf core: Return error pointer if inherit_event() fails to find pmu_ctx
inherit_event() returns NULL only when it finds orphaned events
otherwise it returns either valid child_event pointer or an error
pointer. Follow the same when it fails to find pmu_ctx.

Fixes: bd27568117 ("perf: Rewrite core context handling")
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221118051539.820-1-ravi.bangoria@amd.com
2022-12-27 12:44:00 +01:00
David S. Miller
be1236fce5 bpf-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY6YkXgAKCRDbK58LschI
 g25kAP4jYi+YomSlmGUzN/fUbEIHkXXyh85Yh2/yHGYdVuIuvwEA0uXeC7JHQTca
 dkcyYvgY6zJwFBV0lAVnhTRzFirFkQk=
 =THs1
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
The following pull-request contains BPF updates for your *net* tree.

We've added 7 non-merge commits during the last 5 day(s) which contain
a total of 11 files changed, 231 insertions(+), 3 deletions(-).

The main changes are:

1) Fix a splat in bpf_skb_generic_pop() under CHECKSUM_PARTIAL due to
   misuse of skb_postpull_rcsum(), from Jakub Kicinski with test case
   from Martin Lau.

2) Fix BPF verifier's nullness propagation when registers are of
   type PTR_TO_BTF_ID, from Hao Sun.

3) Fix bpftool build for JIT disassembler under statically built
   libllvm, from Anton Protopopov.

4) Fix warnings reported by resolve_btfids when building vmlinux
   with CONFIG_SECURITY_NETWORK disabled, from Hou Tao.

5) Minor fix up for BPF selftest gitignore, from Stanislav Fomichev.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-24 09:39:02 +00:00
Linus Torvalds
51094a24b8 kernel hardening fixes for v6.2-rc1
- Fix CFI failure with KASAN (Sami Tolvanen)
 
 - Fix LKDTM + CFI under GCC 7 and 8 (Kristina Martsenko)
 
 - Limit CONFIG_ZERO_CALL_USED_REGS to Clang > 15.0.6 (Nathan Chancellor)
 
 - Ignore "contents" argument in LoadPin's LSM hook handling
 
 - Fix paste-o in /sys/kernel/warn_count API docs
 
 - Use READ_ONCE() consistently for oops/warn limit reading
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmOl9ckWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJlOKEACX/N/ukox8Kg16wfeetiB/Gu2H
 AdFF1MqoLjVRxNfFDf2RNolVVW88QTlu3eQqi5n8wnBiMAlkIqM6vtQXidWW/MEz
 xfaZjIECFxalAdZDoQI/7QjPt0TmRyG2LGxMxpxE7ZczLUOn1nAIeBm6JvbGFCJY
 6Sj7vdhwuqMuPAw3obefLgKs1e7fkJW7Uow3YokSWrRuLYnzd6CsF/QVs9fI1KJ7
 GVnrpD+hzdeYbLKIvXYycks5M8WJfCkB8vGx5OVqUk9e9XghIc8YnIO/k+aHow+1
 UB/ZfhciwTae04gA8GKK6mM595N1arnJBUtopvl3laH8QmuOa6oxUsWrn8Sf0IHe
 7GZXYeJ6cMPeHz+AnkJDSx1nzzcmXGmuMrCJRic3WurVMZVHU4aP8amDqdhl1Taq
 O/SvwutoWZJ8OJ0JtEguBsKM8so8h8PSO9LAfj//h8rRWh4beMD9p/epEGNeX7RF
 +KpiMmQ0bCpcJiA4uSJpInzewtuNnsd+hTLOwdflC777odxnHwli9mhhStmz5rA+
 UeoHy4tGYCMO7TRkZdWSP9bu+c5SuKZSuFVSTFDOo9zQuw/SIWk5MJRWPzCcIBCH
 5T/s3JoneziFlyWSQ861hH560RdN9vC8cE+Se2bttOXFBF42Gm+LLa3vItbecxz+
 DamqIuIfnrAfNRo9Ww==
 =NHw1
 -----END PGP SIGNATURE-----

Merge tag 'hardening-v6.2-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull kernel hardening fixes from Kees Cook:

 - Fix CFI failure with KASAN (Sami Tolvanen)

 - Fix LKDTM + CFI under GCC 7 and 8 (Kristina Martsenko)

 - Limit CONFIG_ZERO_CALL_USED_REGS to Clang > 15.0.6 (Nathan
   Chancellor)

 - Ignore "contents" argument in LoadPin's LSM hook handling

 - Fix paste-o in /sys/kernel/warn_count API docs

 - Use READ_ONCE() consistently for oops/warn limit reading

* tag 'hardening-v6.2-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  cfi: Fix CFI failure with KASAN
  exit: Use READ_ONCE() for all oops/warn limit reads
  security: Restrict CONFIG_ZERO_CALL_USED_REGS to gcc or clang > 15.0.6
  lkdtm: cfi: Make PAC test work with GCC 7 and 8
  docs: Fix path paste-o for /sys/kernel/warn_count
  LoadPin: Ignore the "contents" argument of the LSM hooks
2022-12-23 12:00:24 -08:00
Linus Torvalds
59d2c635f6 dma-mapping fixes for Linux 6.2
Fix up the sound code to not pass __GFP_COMP to the non-coherent DMA
 allocator, as it copes with that just as badly as the coherent allocator,
 and then add a check to make sure no one passes the flag ever again.
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmOlvh8LHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYN/Ig//SHLj6xlAPHiBSMY44j0qQj4w63G4WivPs4B0Nr7j
 Wne6yh0x8pg2L/BDY1KWoMqo8XhNIXVYKx6jahdqrik9IKJinIZPKT9nYkK5TbO+
 RQSu4HKqAsrMO7u1+LzVyhNZqOtavRXj4EJkkienHDTh7Rqry5o8hBAe40G2BLqe
 TMrnPLAJ0VZ/I45LakzCSCR2WC8GA+33849zJO1Vd4glnxL+VdM5WigYi1N2j6Ng
 d4ymVbprnJljXjIDK2aMy1AGw5ctlQE1hslD2OoUr/onJ6PPU1rmYmf/B1fm8Cqh
 ahxv2jJJsZ0BvwO7uouU6rsJfVGaZx7dMkABmX+A9b7RtP9mC19qc76VleYUouKB
 v28SJD+jFUoQIO5ylly2LEvSgiEPspymv1VJtYdX/y6/YAtm2XFgmIEJdv6dGQu0
 ZMpAs2nCfxzsGy4mQNpXEnBZwrP7GnEtm1ynBLQ1/SJ7toqBmX9/4JzZuRZ0TPEh
 gw+Tkl3fC/jfLUDaLSjCnJw4OpH4+ai9tJuhXtPZk6UN5VPrQbz5xu9bh6wiyPqX
 Pm6ubCDv0Y3I1rwoXwVIcrWfOMTAm8OwfMHCBRoRVp1vfuFSihptpWjIIx6rQCrI
 sdZtVla+FblL9zO5WvGStlx9FiZa39HJm2d6rQIazI5of6iNf9/xSyoWNcNWSrcu
 EFY=
 =q0m6
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-2022-12-23' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fixes from Christoph Hellwig:
 "Fix up the sound code to not pass __GFP_COMP to the non-coherent DMA
  allocator, as it copes with that just as badly as the coherent
  allocator, and then add a check to make sure no one passes the flag
  ever again"

* tag 'dma-mapping-2022-12-23' of git://git.infradead.org/users/hch/dma-mapping:
  dma-mapping: reject GFP_COMP for noncoherent allocations
  ALSA: memalloc: don't use GFP_COMP for non-coherent dma allocations
2022-12-23 11:44:20 -08:00
Linus Torvalds
699aee7b47 Eight fixes, all cc:stable. One is for gcov and the remainder are MM.
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY6TcIgAKCRDdBJ7gKXxA
 ji/zAQDucpSw+HKksgDpO385EAdF4gQgYDi06zu/vjpF7Hd4KAEAoIX1ygHqHy3u
 z9xuulA9q84COV48Is9cU7eiijd0aQo=
 =QlXq
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2022-12-22-14-34' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull hotfixes from Andrew Morton:
 "Eight fixes, all cc:stable. One is for gcov and the remainder are MM"

* tag 'mm-hotfixes-stable-2022-12-22-14-34' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  gcov: add support for checksum field
  test_maple_tree: add test for mas_spanning_rebalance() on insufficient data
  maple_tree: fix mas_spanning_rebalance() on insufficient data
  hugetlb: really allocate vma lock for all sharable vmas
  kmsan: export kmsan_handle_urb
  kmsan: include linux/vmalloc.h
  mm/mempolicy: fix memory leak in set_mempolicy_home_node system call
  mm, mremap: fix mremap() expanding vma with addr inside vma
2022-12-23 10:45:00 -08:00
Sami Tolvanen
cf8016408d cfi: Fix CFI failure with KASAN
When CFI_CLANG and KASAN are both enabled, LLVM doesn't generate a
CFI type hash for asan.module_ctor functions in translation units
where CFI is disabled, which leads to a CFI failure during boot when
do_ctors calls the affected constructors:

  CFI failure at do_basic_setup+0x64/0x90 (target:
  asan.module_ctor+0x0/0x28; expected type: 0xa540670c)

Specifically, this happens because CFI is disabled for
kernel/cfi.c. There's no reason to keep CFI disabled here anymore, so
fix the failure by not filtering out CC_FLAGS_CFI for the file.

Note that https://reviews.llvm.org/rG3b14862f0a96 fixed the issue
where LLVM didn't emit CFI type hashes for any sanitizer constructors,
but now type hashes are emitted correctly for TUs that use CFI.

Link: https://github.com/ClangBuiltLinux/linux/issues/1742
Fixes: 8924560094 ("cfi: Switch to -fsanitize=kcfi")
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221222225747.3538676-1-samitolvanen@google.com
2022-12-23 10:04:31 -08:00
Hao Sun
8374bfd5a3 bpf: fix nullness propagation for reg to reg comparisons
After befae75856, the verifier would propagate null information after
JEQ/JNE, e.g., if two pointers, one is maybe_null and the other is not,
the former would be marked as non-null in eq path. However, as comment
"PTR_TO_BTF_ID points to a kernel struct that does not need to be null
checked by the BPF program ... The verifier must keep this in mind and
can make no assumptions about null or non-null when doing branch ...".
If one pointer is maybe_null and the other is PTR_TO_BTF, the former is
incorrectly marked non-null. The following BPF prog can trigger a
null-ptr-deref, also see this report for more details[1]:

	0: (18) r1 = map_fd	        ; R1_w=map_ptr(ks=4, vs=4)
	2: (79) r6 = *(u64 *)(r1 +8)    ; R6_w=bpf_map->inner_map_data
					; R6 is PTR_TO_BTF_ID
					; equals to null at runtime
	3: (bf) r2 = r10
	4: (07) r2 += -4
	5: (62) *(u32 *)(r2 +0) = 0
	6: (85) call bpf_map_lookup_elem#1    ; R0_w=map_value_or_null
	7: (1d) if r6 == r0 goto pc+1
	8: (95) exit
	; from 7 to 9: R0=map_value R6=ptr_bpf_map
	9: (61) r0 = *(u32 *)(r0 +0)          ; null-ptr-deref
	10: (95) exit

So, make the verifier propagate nullness information for reg to reg
comparisons only if neither reg is PTR_TO_BTF_ID.

[1] https://lore.kernel.org/bpf/CACkBjsaFJwjC5oiw-1KXvcazywodwXo4zGYsRHwbr2gSG9WcSw@mail.gmail.com/T/#u

Fixes: befae75856 ("bpf: propagate nullness information for reg to reg comparisons")
Signed-off-by: Hao Sun <sunhao.th@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221222024414.29539-1-sunhao.th@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-12-22 17:19:06 -08:00
Ricardo Ribalda
cfca00767f bpf: Remove unused field initialization in bpf's ctl_table
Maxlen is used by standard proc_handlers such as proc_dointvec(), but in this
case we have our own proc_handler via bpf_stats_handler(). Therefore, remove
the initialization.

Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221221-bpf-syscall-v1-0-9550f5f2c3fc@chromium.org
2022-12-22 15:48:11 +01:00
Linus Torvalds
9d2f6060fe Tracing fix for 6.2:
- Make monitor structures read only
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCY6J+vxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qohJAP9Yx3A4xmopkMjpfK1HBzuB7j4U7blN
 2NhqKM626unbeQEAi3FhPRc5N/sGBdsUClYZIKau0p3ip1TVfYbhk8vSgwg=
 =VcGm
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fix from Steven Rostedt:
 "I missed this minor hardening of the kernel in the first pull.

   - Make monitor structures read only"

* tag 'trace-v6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  rv/monitors: Move monitor structure in rodata
2022-12-21 19:03:42 -08:00
Linus Torvalds
af9b3fa15d Trace probes updates for 6.2:
- New "symstr" type for dynamic events that writes the name of the
   function+offset into the ring buffer and not just the address
 
 - Prevent kernel symbol processing on addresses in user space probes
   (uprobes).
 
 - And minor fixes and clean ups
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCY5yAHxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qoWoAP9ZLmqgIqlH3Zcms31SR250kLXxsxT3
 JHe82hiuI1I3fAD/Z93QLHw9wngLqIMx/wXsdFjTNOGGWdxfclSWI2qI6Q0=
 =KaJg
 -----END PGP SIGNATURE-----

Merge tag 'trace-probes-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull trace probes updates from Steven Rostedt:

 - New "symstr" type for dynamic events that writes the name of the
   function+offset into the ring buffer and not just the address

 - Prevent kernel symbol processing on addresses in user space probes
   (uprobes).

 - And minor fixes and clean ups

* tag 'trace-probes-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing/probes: Reject symbol/symstr type for uprobe
  tracing/probes: Add symstr type for dynamic events
  kprobes: kretprobe events missing on 2-core KVM guest
  kprobes: Fix check for probe enabled in kill_kprobe()
  test_kprobes: Fix implicit declaration error of test_kprobes
  tracing: Fix race where eprobes can be called before the event
2022-12-21 18:57:24 -08:00
Rickard x Andersson
e96b95c2b7 gcov: add support for checksum field
In GCC version 12.1 a checksum field was added.

This patch fixes a kernel crash occurring during boot when using
gcov-kernel with GCC version 12.2.  The crash occurred on a system running
on i.MX6SX.

Link: https://lkml.kernel.org/r/20221220102318.3418501-1-rickaran@axis.com
Fixes: 977ef30a7d ("gcov: support GCC 12.1 and newer compilers")
Signed-off-by: Rickard x Andersson <rickaran@axis.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Tested-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Reviewed-by: Martin Liska <mliska@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-21 14:31:52 -08:00
Linus Torvalds
609d3bc623 Including fixes from bpf, netfilter and can.
Current release - regressions:
 
  - bpf: synchronize dispatcher update with bpf_dispatcher_xdp_func
 
  - rxrpc:
   - fix security setting propagation
   - fix null-deref in rxrpc_unuse_local()
   - fix switched parameters in peer tracing
 
 Current release - new code bugs:
 
  - rxrpc:
    - fix I/O thread startup getting skipped
    - fix locking issues in rxrpc_put_peer_locked()
    - fix I/O thread stop
    - fix uninitialised variable in rxperf server
    - fix the return value of rxrpc_new_incoming_call()
 
  - microchip: vcap: fix initialization of value and mask
 
  - nfp: fix unaligned io read of capabilities word
 
 Previous releases - regressions:
 
  - stop in-kernel socket users from corrupting socket's task_frag
 
  - stream: purge sk_error_queue in sk_stream_kill_queues()
 
  - openvswitch: fix flow lookup to use unmasked key
 
  - dsa: mv88e6xxx: avoid reg_lock deadlock in mv88e6xxx_setup_port()
 
  - devlink:
    - hold region lock when flushing snapshots
    - protect devlink dump by the instance lock
 
 Previous releases - always broken:
 
  - bpf:
    - prevent leak of lsm program after failed attach
    - resolve fext program type when checking map compatibility
 
  - skbuff: account for tail adjustment during pull operations
 
  - macsec: fix net device access prior to holding a lock
 
  - bonding: switch back when high prio link up
 
  - netfilter: flowtable: really fix NAT IPv6 offload
 
  - enetc: avoid buffer leaks on xdp_do_redirect() failure
 
  - unix: fix race in SOCK_SEQPACKET's unix_dgram_sendmsg()
 
  - dsa: microchip: remove IRQF_TRIGGER_FALLING in request_threaded_irq
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmOiGa4ACgkQMUZtbf5S
 IrvetBAAg/AjgG51gboLsuGjgRSwAi5T6ijgVR+pW+kMuoOdaamOF+h/zC1ox/H9
 QrWvTBipy+EqSD8bM4Xz0FNgidch8X4iWYhKGZuBht/4NP5FOzPUG2mNlUy5ANGq
 QZcCw6CUsir8HTb+IJpFEIq0JMwzKCm3WyAkYjEj4iuft0Y93cAgjkMVwoX0RERO
 o/pslC5dsozCLJxEglpw1aJq7aoroNuRSGSXl95nv8fU3UxmUXajnA3HNscXImdV
 6uqSIuyPIaGocpCBPRKUQd0sctkTY4cm8wmxxMCDVsBRVusoaq5eg1VRvxJm9Rxj
 gvDvHvfhnEuSigFF5A+paBp4c+i3C8g/UTBJTtptdAC+Y2tt4UT3Q5aaazYUOAqd
 W4TSJ3bk5zhkhpRF9clb0fNQaM1HOT4rkDEEGTfVN62dtHfPKpNwYufQKaYHdVj1
 RJ3ooH6c7TMVaRs6ZgEWNYToKZj94SIfPhfEhuqWXdNMDBkUMp2BXFFOp9fZDWju
 PsMQrRD7n6+XXpNvScYtnJDORqfIL9yHGZE9kxZA5QSDl9cnPA3SUbNruQPlXHrl
 w0yQlYuG3gcciua4dXaLfz1iN4rPdenuYhVBHhztEwDKl+b61CVQYlOHGkXPVURp
 oft74qCCFbva+Hf/7jENQotjT1tLfxAGdUARuFeDBueJgDRAPsw=
 =goV5
 -----END PGP SIGNATURE-----

Merge tag 'net-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bpf, netfilter and can.

  Current release - regressions:

   - bpf: synchronize dispatcher update with bpf_dispatcher_xdp_func

   - rxrpc:
      - fix security setting propagation
      - fix null-deref in rxrpc_unuse_local()
      - fix switched parameters in peer tracing

  Current release - new code bugs:

   - rxrpc:
      - fix I/O thread startup getting skipped
      - fix locking issues in rxrpc_put_peer_locked()
      - fix I/O thread stop
      - fix uninitialised variable in rxperf server
      - fix the return value of rxrpc_new_incoming_call()

   - microchip: vcap: fix initialization of value and mask

   - nfp: fix unaligned io read of capabilities word

  Previous releases - regressions:

   - stop in-kernel socket users from corrupting socket's task_frag

   - stream: purge sk_error_queue in sk_stream_kill_queues()

   - openvswitch: fix flow lookup to use unmasked key

   - dsa: mv88e6xxx: avoid reg_lock deadlock in mv88e6xxx_setup_port()

   - devlink:
      - hold region lock when flushing snapshots
      - protect devlink dump by the instance lock

  Previous releases - always broken:

   - bpf:
      - prevent leak of lsm program after failed attach
      - resolve fext program type when checking map compatibility

   - skbuff: account for tail adjustment during pull operations

   - macsec: fix net device access prior to holding a lock

   - bonding: switch back when high prio link up

   - netfilter: flowtable: really fix NAT IPv6 offload

   - enetc: avoid buffer leaks on xdp_do_redirect() failure

   - unix: fix race in SOCK_SEQPACKET's unix_dgram_sendmsg()

   - dsa: microchip: remove IRQF_TRIGGER_FALLING in
     request_threaded_irq"

* tag 'net-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (64 commits)
  net: fec: check the return value of build_skb()
  net: simplify sk_page_frag
  Treewide: Stop corrupting socket's task_frag
  net: Introduce sk_use_task_frag in struct sock.
  mctp: Remove device type check at unregister
  net: dsa: microchip: remove IRQF_TRIGGER_FALLING in request_threaded_irq
  can: kvaser_usb: hydra: help gcc-13 to figure out cmd_len
  can: flexcan: avoid unbalanced pm_runtime_enable warning
  Documentation: devlink: add missing toc entry for etas_es58x devlink doc
  mctp: serial: Fix starting value for frame check sequence
  nfp: fix unaligned io read of capabilities word
  net: stream: purge sk_error_queue in sk_stream_kill_queues()
  myri10ge: Fix an error handling path in myri10ge_probe()
  net: microchip: vcap: Fix initialization of value and mask
  rxrpc: Fix the return value of rxrpc_new_incoming_call()
  rxrpc: rxperf: Fix uninitialised variable
  rxrpc: Fix I/O thread stop
  rxrpc: Fix switched parameters in peer tracing
  rxrpc: Fix locking issues in rxrpc_put_peer_locked()
  rxrpc: Fix I/O thread startup getting skipped
  ...
2022-12-21 08:41:32 -08:00
Linus Torvalds
19822e3ee4 Urgent RCU pull request for v6.2
This commit fixes a lockdep false positive in synchronize_rcu() that
 can otherwise occur during early boot.  Theis fix simply avoids invoking
 lockdep if the scheduler has not yet been initialized, that is, during
 that portion of boot when interrupts are disabled.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmOeXj8THHBhdWxtY2tA
 a2VybmVsLm9yZwAKCRCevxLzctn7jPmZEACaI5JqO6Dr2U4HojJJBYEfLVaSYxDp
 JrUi5D5WzzZidyjM2fyyZZkdRVQ24i1aV2H/fbLoIIH/smYjE/KLEFHQmclpphw5
 BSOyapotjdt5YhIavvAeOjdUd7jPyMqhbDVnwzjnblhUD1ObLVlhIs8Pjn7/03sF
 gzlIhYgp3EL7GenT9j9kud2FwWP+wrVQ7SdJ+Ni/WAHYO8860xQAmFXH/07bYzx7
 fbp5iPkCOSSUoRMw/qQ8s7CE3XhBNKufv1BtcvV/uxEtutfV1qvEQBv/l2RBd0Vg
 wOVBZnWXze+7IUx13M90R/d04Nn7RaGwon6xBMlvIwL3qzEj8x/r1FYz7zZhQPkv
 wwChAxFHQACnLCZSu48WBtVrawNdZHM57KHUK4rloAbrK92FpVznhQU+5pBDy4c6
 rfY2my+SNO4kWvePEg/2fd8aQycrZr99fK/ojCIerEn8MNboxuVOYTjzy0qtUcVT
 yJ/80O8ADI3QL/NRhjMFWgEnBDbHN1PcGhiRoutApdLQkg/UPTJjCRZ7ibmIFYY2
 ViW3cSndr/f0I7sOex2EILHwiZ2bUKiwyeTW6vWuFl/7MEWsvpJaWoUxXgQj99Bt
 ncAOaxtmmuhbwrOCt2kab90A0c/thNx9kNYYIkG3vUNcSRzyHQtg3ydEljBpaTFR
 OzhrqdUA7W9Sfg==
 =UKUo
 -----END PGP SIGNATURE-----

Merge tag 'rcu-urgent.2022.12.17a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu

Pull RCU fix from Paul McKenney:
 "This fixes a lockdep false positive in synchronize_rcu() that can
  otherwise occur during early boot.

  The fix simply avoids invoking lockdep if the scheduler has not yet
  been initialized, that is, during that portion of boot when interrupts
  are disabled"

* tag 'rcu-urgent.2022.12.17a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  rcu: Don't assert interrupts enabled too early in boot
2022-12-21 07:59:57 -08:00
Martin KaFai Lau
552d42a356 bpf: Reduce smap->elem_size
'struct bpf_local_storage_elem' has an unused 56 byte padding at the
end due to struct's cache-line alignment requirement. This padding
space is overlapped by storage value contents, so if we use sizeof()
to calculate the total size, we overinflate it by 56 bytes. Use
offsetof() instead to calculate more exact memory use.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221221013036.3427431-1-martin.lau@linux.dev
2022-12-21 15:42:39 +01:00
Christoph Hellwig
3622b86f49 dma-mapping: reject GFP_COMP for noncoherent allocations
While not quite as bogus as for the dma-coherent allocations that were
fixed earlier, GFP_COMP for these allocations has no benefits for
the dma-direct case, and can't be supported at all by dma dma-iommu
backend which splits up allocations into smaller orders.  Due to an
oversight in ffcb754584 that flag stopped being cleared for all
dma allocations, but only got rejected for coherent ones, so fix up
these callers to not allow __GFP_COMP as well after the sound code
has been fixed to not ask for it.

Fixes: ffcb754584 ("dma-mapping: reject __GFP_COMP in dma_alloc_attrs")
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Reported-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Tested-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
2022-12-21 08:45:38 +01:00
Alessandro Carminati
bfa87ac86c rv/monitors: Move monitor structure in rodata
It makes sense to move the important monitor structure into rodata to
prevent accidental structure modification.

Link: https://lkml.kernel.org/r/20221122173648.4732-1-acarmina@redhat.com

Signed-off-by: Alessandro Carminati <acarmina@redhat.com>
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-12-20 11:46:40 -05:00
Jiri Olsa
e2bb9e01d5 bpf: Remove trace_printk_lock
Both bpf_trace_printk and bpf_trace_vprintk helpers use static buffer guarded
with trace_printk_lock spin lock.

The spin lock contention causes issues with bpf programs attached to
contention_begin tracepoint [1][2].

Andrii suggested we could get rid of the contention by using trylock, but we
could actually get rid of the spinlock completely by using percpu buffers the
same way as for bin_args in bpf_bprintf_prepare function.

Adding new return 'buf' argument to struct bpf_bprintf_data and making
bpf_bprintf_prepare to return also the buffer for printk helpers.

  [1] https://lore.kernel.org/bpf/CACkBjsakT_yWxnSWr4r-0TpPvbKm9-OBmVUhJb7hV3hY8fdCkw@mail.gmail.com/
  [2] https://lore.kernel.org/bpf/CACkBjsaCsTovQHFfkqJKto6S4Z8d02ud1D7MPESrHa1cVNNTrw@mail.gmail.com/

Reported-by: Hao Sun <sunhao.th@gmail.com>
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221215214430.1336195-4-jolsa@kernel.org
2022-12-19 22:08:37 +01:00
Jiri Olsa
f19a405045 bpf: Do cleanup in bpf_bprintf_cleanup only when needed
Currently we always cleanup/decrement bpf_bprintf_nest_level variable
in bpf_bprintf_cleanup if it's > 0.

There's possible scenario where this could cause a problem, when
bpf_bprintf_prepare does not get bin_args buffer (because num_args is 0)
and following bpf_bprintf_cleanup call decrements bpf_bprintf_nest_level
variable, like:

  in task context:
    bpf_bprintf_prepare(num_args != 0) increments 'bpf_bprintf_nest_level = 1'
    -> first irq :
       bpf_bprintf_prepare(num_args == 0)
       bpf_bprintf_cleanup decrements 'bpf_bprintf_nest_level = 0'
    -> second irq:
       bpf_bprintf_prepare(num_args != 0) bpf_bprintf_nest_level = 1
       gets same buffer as task context above

Adding check to bpf_bprintf_cleanup and doing the real cleanup only if we
got bin_args data in the first place.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221215214430.1336195-3-jolsa@kernel.org
2022-12-19 22:08:06 +01:00
Jiri Olsa
78aa1cc940 bpf: Add struct for bin_args arg in bpf_bprintf_prepare
Adding struct bpf_bprintf_data to hold bin_args argument for
bpf_bprintf_prepare function.

We will add another return argument to bpf_bprintf_prepare and
pass the struct to bpf_bprintf_cleanup for proper cleanup in
following changes.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221215214430.1336195-2-jolsa@kernel.org
2022-12-19 22:07:52 +01:00
Hou Tao
cc07482246 bpf: Define sock security related BTF IDs under CONFIG_SECURITY_NETWORK
There are warnings reported from resolve_btfids when building vmlinux
with CONFIG_SECURITY_NETWORK disabled:

  WARN: resolve_btfids: unresolved symbol bpf_lsm_sk_free_security
  WARN: resolve_btfids: unresolved symbol bpf_lsm_sk_alloc_security

So only define BTF IDs for these LSM hooks when CONFIG_SECURITY_NETWORK
is enabled.

Fixes: c0c852dd18 ("bpf: Do not mark certain LSM hook arguments as trusted")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221217062144.2507222-1-houtao@huaweicloud.com
2022-12-19 22:02:17 +01:00
Linus Torvalds
6feb57c2fd Kbuild updates for v6.2
- Support zstd-compressed debug info
 
  - Allow W=1 builds to detect objects shared among multiple modules
 
  - Add srcrpm-pkg target to generate a source RPM package
 
  - Make the -s option detection work for future GNU Make versions
 
  - Add -Werror to KBUILD_CPPFLAGS when CONFIG_WERROR=y
 
  - Allow W=1 builds to detect -Wundef warnings in any preprocessed files
 
  - Raise the minimum supported version of binutils to 2.25
 
  - Use $(intcmp ...) to compare integers if GNU Make >= 4.4 is used
 
  - Use $(file ...) to read a file if GNU Make >= 4.2 is used
 
  - Print error if GNU Make older than 3.82 is used
 
  - Allow modpost to detect section mismatches with Clang LTO
 
  - Include vmlinuz.efi into kernel tarballs for arm64 CONFIG_EFI_ZBOOT=y
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmOeImsVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsG06IP/iVjuWFvnjDZT4X8X6zN8aKp1vtR
 EMkmoRtt5cD4CLb1MG4N7irYHgedQSx4rYceP45MyW1I3egl6Ct14RDyeQ1xSIZb
 XFTLDCZvfl/up3MdiqNAqKRS7x5lk9++7F0t+2SoQxKQyJvm735XreX+VhZ1FeLB
 qcHrmzJ5veky5Ry/3OkNUgKFBjKEAL+qKMc55uvkXqfTb3KoBa2r4VC1OaoYGRru
 R8oF9qQRnGVQAl/LbBVchmgSjxryxPrCvBGiKlK03VkXdzEMHMimEJh3BQ6e0PGo
 gajdk+4liy7z+jQnI7jFhvJjGKzkEP/Bc99M/uS92QX5MgpH6mqpHMoqqPiqW87K
 RmZH37FqRu1Vo8dpibmH6r2K6YD/HHRjaDHk1VuuCQYEn0dsNmokPXOqd/1v0I1i
 TXPjWOw1AID5vMJWllqxFhpeVvf0vx5BT/UNrh68MLqlJZzv2eMVJb4fNy6640ml
 U0NclMnOa3eOmf5z1T7/LqDRTa63Q0kpanRrBpcmVOaqW+ZpQ3SQjh4uBN1PyJHL
 cX3Skc341DyRlFiT54QhGKlm57MEb2gjhBZ3Z4J+b7sEFgvjXH/W8vcOGIKlppmA
 CfYMyres4OV+fJc89ONkWsvLiOP1OeUGPvytm33J5QMKXc8SzOLP0D/F8kjrDflm
 EROKuZ4EA5ej/rOy
 =Ig/Y
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - Support zstd-compressed debug info

 - Allow W=1 builds to detect objects shared among multiple modules

 - Add srcrpm-pkg target to generate a source RPM package

 - Make the -s option detection work for future GNU Make versions

 - Add -Werror to KBUILD_CPPFLAGS when CONFIG_WERROR=y

 - Allow W=1 builds to detect -Wundef warnings in any preprocessed files

 - Raise the minimum supported version of binutils to 2.25

 - Use $(intcmp ...) to compare integers if GNU Make >= 4.4 is used

 - Use $(file ...) to read a file if GNU Make >= 4.2 is used

 - Print error if GNU Make older than 3.82 is used

 - Allow modpost to detect section mismatches with Clang LTO

 - Include vmlinuz.efi into kernel tarballs for arm64 CONFIG_EFI_ZBOOT=y

* tag 'kbuild-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (29 commits)
  buildtar: fix tarballs with EFI_ZBOOT enabled
  modpost: Include '.text.*' in TEXT_SECTIONS
  padata: Mark padata_work_init() as __ref
  kbuild: ensure Make >= 3.82 is used
  kbuild: refactor the prerequisites of the modpost rule
  kbuild: change module.order to list *.o instead of *.ko
  kbuild: use .NOTINTERMEDIATE for future GNU Make versions
  kconfig: refactor Makefile to reduce process forks
  kbuild: add read-file macro
  kbuild: do not sort after reading modules.order
  kbuild: add test-{ge,gt,le,lt} macros
  Documentation: raise minimum supported version of binutils to 2.25
  kbuild: add -Wundef to KBUILD_CPPFLAGS for W=1 builds
  kbuild: move -Werror from KBUILD_CFLAGS to KBUILD_CPPFLAGS
  kbuild: Port silent mode detection to future gnu make.
  init/version.c: remove #include <generated/utsrelease.h>
  firmware_loader: remove #include <generated/utsrelease.h>
  modpost: Mark uuid_le type to be suitable only for MEI
  kbuild: add ability to make source rpm buildable using koji
  kbuild: warn objects shared among multiple modules
  ...
2022-12-19 12:33:32 -06:00
Linus Torvalds
5f6e430f93 powerpc updates for 6.2
- Add powerpc qspinlock implementation optimised for large system scalability and
    paravirt. See the merge message for more details.
 
  - Enable objtool to be built on powerpc to generate mcount locations.
 
  - Use a temporary mm for code patching with the Radix MMU, so the writable mapping is
    restricted to the patching CPU.
 
  - Add an option to build the 64-bit big-endian kernel with the ELFv2 ABI.
 
  - Sanitise user registers on interrupt entry on 64-bit Book3S.
 
  - Many other small features and fixes.
 
 Thanks to: Aboorva Devarajan, Angel Iglesias, Benjamin Gray, Bjorn Helgaas, Bo Liu, Chen
 Lifu, Christoph Hellwig, Christophe JAILLET, Christophe Leroy, Christopher M. Riedl, Colin
 Ian King, Deming Wang, Disha Goel, Dmitry Torokhov, Finn Thain, Geert Uytterhoeven,
 Gustavo A. R. Silva, Haowen Bai, Joel Stanley, Jordan Niethe, Julia Lawall, Kajol Jain,
 Laurent Dufour, Li zeming, Miaoqian Lin, Michael Jeanson, Nathan Lynch, Naveen N. Rao,
 Nayna Jain, Nicholas Miehlbradt, Nicholas Piggin, Pali Rohár, Randy Dunlap, Rohan McLure,
 Russell Currey, Sathvika Vasireddy, Shaomin Deng, Stephen Kitt, Stephen Rothwell, Thomas
 Weißschuh, Tiezhu Yang, Uwe Kleine-König, Xie Shaowen, Xiu Jianfeng, XueBing Chen, Yang
 Yingliang, Zhang Jiaming, ruanjinjie, Jessica Yu, Wolfram Sang.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmOfrj8THG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgIWtD/9mGF/ze2k+qFTo+30fb7bO8WJIDgsR
 dIASnZjXV7q/45elvymhUdkQv4R7xL3pzC40P1+ZKtWzGTNe+zWUQLoALNwRK85j
 8CsxZbqefGNKE5Z6ZHo9s37wsu3+jJu9yEQpGFo1LINyzeclCn5St5oqfRam+Hd/
 cPF+VfvREwZ0+YOKGBhJ2EgC+Gc9xsFY7DLQsoYlu71iZZr6Z6rgZW/EY5h3RMGS
 YKBoVwDsWaU0FpFWrr/rYTI6DqSr3AHr1+ftDg7ncCZMD6vQva6aMCCt94aLB1aE
 vC+DNdhZlA558bXGa5yA7Wr//7aUBUIwyC60DogOeZ6vw3kD9tdEd1fbH5hmqNKY
 K5bfqm28XU2959CTE8RDgsYYZvwDcfrjBIML14WZGdCQOTcGKpgOGp22o6yNb1Pq
 JKpHHnVpvu2PZ/p2XdKSm9+etr2yI6lXZAEVTS7ehdtMukButjSHEVbSCEZ8tlWz
 KokQt2J23BMHuSrXK6+67wWQBtdsLEk+LBOQmweiwarMocqvL/Zjz/5J7DR2DtH8
 wlY3wOtB1+E5j7xZ+RgK3c3jNg5dH39ZwvFsSATWTI3P+iq6OK/bbk4q4LmZt2l9
 ZIfH/CXPf9BvGCHzHa3AAd3UBbJLFwj17btMEv1wFVPS0T4LPUzkgTNTNUYeP6zL
 h1e5QfgUxvKPuQ==
 =7k3p
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Add powerpc qspinlock implementation optimised for large system
   scalability and paravirt. See the merge message for more details

 - Enable objtool to be built on powerpc to generate mcount locations

 - Use a temporary mm for code patching with the Radix MMU, so the
   writable mapping is restricted to the patching CPU

 - Add an option to build the 64-bit big-endian kernel with the ELFv2
   ABI

 - Sanitise user registers on interrupt entry on 64-bit Book3S

 - Many other small features and fixes

Thanks to Aboorva Devarajan, Angel Iglesias, Benjamin Gray, Bjorn
Helgaas, Bo Liu, Chen Lifu, Christoph Hellwig, Christophe JAILLET,
Christophe Leroy, Christopher M. Riedl, Colin Ian King, Deming Wang,
Disha Goel, Dmitry Torokhov, Finn Thain, Geert Uytterhoeven, Gustavo A.
R. Silva, Haowen Bai, Joel Stanley, Jordan Niethe, Julia Lawall, Kajol
Jain, Laurent Dufour, Li zeming, Miaoqian Lin, Michael Jeanson, Nathan
Lynch, Naveen N. Rao, Nayna Jain, Nicholas Miehlbradt, Nicholas Piggin,
Pali Rohár, Randy Dunlap, Rohan McLure, Russell Currey, Sathvika
Vasireddy, Shaomin Deng, Stephen Kitt, Stephen Rothwell, Thomas
Weißschuh, Tiezhu Yang, Uwe Kleine-König, Xie Shaowen, Xiu Jianfeng,
XueBing Chen, Yang Yingliang, Zhang Jiaming, ruanjinjie, Jessica Yu,
and Wolfram Sang.

* tag 'powerpc-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (181 commits)
  powerpc/code-patching: Fix oops with DEBUG_VM enabled
  powerpc/qspinlock: Fix 32-bit build
  powerpc/prom: Fix 32-bit build
  powerpc/rtas: mandate RTAS syscall filtering
  powerpc/rtas: define pr_fmt and convert printk call sites
  powerpc/rtas: clean up includes
  powerpc/rtas: clean up rtas_error_log_max initialization
  powerpc/pseries/eeh: use correct API for error log size
  powerpc/rtas: avoid scheduling in rtas_os_term()
  powerpc/rtas: avoid device tree lookups in rtas_os_term()
  powerpc/rtasd: use correct OF API for event scan rate
  powerpc/rtas: document rtas_call()
  powerpc/pseries: unregister VPA when hot unplugging a CPU
  powerpc/pseries: reset the RCU watchdogs after a LPM
  powerpc: Take in account addition CPU node when building kexec FDT
  powerpc: export the CPU node count
  powerpc/cpuidle: Set CPUIDLE_FLAG_POLLING for snooze state
  powerpc/dts/fsl: Fix pca954x i2c-mux node names
  cxl: Remove unnecessary cxl_pci_window_alignment()
  selftests/powerpc: Fix resource leaks
  ...
2022-12-19 07:13:33 -06:00
Paul E. McKenney
3f6c3d29df rcu: Don't assert interrupts enabled too early in boot
The rcu_poll_gp_seq_end() and rcu_poll_gp_seq_end_unlocked() both check
that interrupts are enabled, as they normally should be when waiting for
an RCU grace period.  Except that it is legal to wait for grace periods
during early boot, before interrupts have been enabled for the first time,
and polling for grace periods is required to work during this time.
This can result in false-positive lockdep splats in the presence of
boot-time-initiated tracing.

This commit therefore conditions those interrupts-enabled checks on
rcu_scheduler_active having advanced past RCU_SCHEDULER_INACTIVE, by
which time interrupts have been enabled.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-12-17 16:12:20 -08:00
Linus Torvalds
4f292c4de4 New Feature:
* Randomize the per-cpu entry areas
 Cleanups:
 * Have CR3_ADDR_MASK use PHYSICAL_PAGE_MASK instead of open
   coding it
 * Move to "native" set_memory_rox() helper
 * Clean up pmd_get_atomic() and i386-PAE
 * Remove some unused page table size macros
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmOc53UACgkQaDWVMHDJ
 krCUHw//SGZ+La0hLZLAiAiZTXLZZHpYkOmg1Oj1+11qSU11uZzTFqDpauhaKpRS
 cJCSh+D+RXe5e2ipgt0+Zl0hESLt7pJf8258OE4ra0DL/IlyO9uqruAs9Kn3eRS/
 Fk76nG8gdEU+JKJqpG02GqOLslYQuIy96n9hpuj1x25b614+uezPfC7S4XEat0NT
 MbJQ+jnVDf16aJIJkzT+iSwhubDVeh+bSHeO0SSCzX23WLUqDeg5NvlyxoCHGbBh
 UpUTWggV/0pYAkBKRHToeJs8qTWREwuuH/8JGewpe9A0tjdB5wyZfNL2PuracweN
 9MauXC3T5f0+Ca4yIIaPq1fF7Ny/PR2dBFihk27rOD0N7tjaZxNwal2pB1sZcmvZ
 +PAokjyTPVH5ZXjkMYGGAUe1jyjwr2+TgFSZxhTnDuGtyVQiY4pihGKOifLCX6tv
 x6khvYeTBw7wfaDRtKEAf+2kLHYn+71HszHP/8bNKX9T03h+Zf0i1wdZu5xbM5Gc
 VK2wR7bCC+UftJJYG0pldcHg2qaF19RBHK2tLwp7zngUv7lTbkKfkgKjre73KV2a
 D4b76lrqdUMo6UYwYdw7WtDyarZS4OVLq2DcNhwwMddBCaX8kyN5a4AqwQlZYJ0u
 dM+kuMofE8U3yMxmMhJimkZUsj09yLHIqfynY0jbAcU3nhKZZNY=
 =wwVF
 -----END PGP SIGNATURE-----

Merge tag 'x86_mm_for_6.2_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 mm updates from Dave Hansen:
 "New Feature:

   - Randomize the per-cpu entry areas

  Cleanups:

   - Have CR3_ADDR_MASK use PHYSICAL_PAGE_MASK instead of open coding it

   - Move to "native" set_memory_rox() helper

   - Clean up pmd_get_atomic() and i386-PAE

   - Remove some unused page table size macros"

* tag 'x86_mm_for_6.2_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits)
  x86/mm: Ensure forced page table splitting
  x86/kasan: Populate shadow for shared chunk of the CPU entry area
  x86/kasan: Add helpers to align shadow addresses up and down
  x86/kasan: Rename local CPU_ENTRY_AREA variables to shorten names
  x86/mm: Populate KASAN shadow for entire per-CPU range of CPU entry area
  x86/mm: Recompute physical address for every page of per-CPU CEA mapping
  x86/mm: Rename __change_page_attr_set_clr(.checkalias)
  x86/mm: Inhibit _PAGE_NX changes from cpa_process_alias()
  x86/mm: Untangle __change_page_attr_set_clr(.checkalias)
  x86/mm: Add a few comments
  x86/mm: Fix CR3_ADDR_MASK
  x86/mm: Remove P*D_PAGE_MASK and P*D_PAGE_SIZE macros
  mm: Convert __HAVE_ARCH_P..P_GET to the new style
  mm: Remove pointless barrier() after pmdp_get_lockless()
  x86/mm/pae: Get rid of set_64bit()
  x86_64: Remove pointless set_64bit() usage
  x86/mm/pae: Be consistent with pXXp_get_and_clear()
  x86/mm/pae: Use WRITE_ONCE()
  x86/mm/pae: Don't (ab)use atomic64
  mm/gup: Fix the lockless PMD access
  ...
2022-12-17 14:06:53 -06:00
Linus Torvalds
03d84bd6d4 MSI fixes for 6.2:
- Return MSI_XA_DOMAIN_SIZE as the maximum MSI index when the architecture
   does not make use of irq domains instead of returning 0, which is pretty
   limiting.
 
 - Check for the presence of an irq domain when validating the MSI iterator,
   as s390/powerpc won't have one.
 
 - Fix powerpc's MSI backends which fail to clear the descriptor's IRQ field
   on teardown, leading to a splat and leaked descriptors.
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmOdpC8PHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDqHEP+wYuEti5Pib6PIxDgmm9eT7uvCPvOw31LojJ
 n+MMbOht5qx0vQ/9w/e5N5gwkao0i870s3vCA0vDh9JurHTS/NDo/PWsf2eyVY/0
 OOVEYFJS87wewXENK3mLabP1AD5+lv8RWe7HKEhrwl741K5NRs4xpKyOxOlgo4CO
 LXDG/kXN25tQ4HTKgbPXxU8P16PqsJI3H/2NKgZW0ntggldhhgO+Lb+TDgloAPo9
 dk9gTG5EEhJZu1em3gDAuX71Tjyr8/OcyNa5WEec78lBqlgyt9gMqEffhoBigM7d
 tVLsi667J94qdCJtG7Zeeo3996HQ4YqdqmO2csPzJq+d0TCKrTTwyvkyAmSJ51nV
 pUywGkhLRYsvA4PX/ZFcrT/GfJLIGhXmzqV5fWpAWfoDPAw/s3PfrTzugQ6cPpYE
 Ox8pcfo6xEhVPfCzeIlYShPEz746Kyje8vUuHbXKjekZolW0FE7qOQaVxGEY7qal
 IvI6MotDjqbJV0ancGgZPIU2r7zYmx8fXiEnqMJEg3xNf93O867u4GqnvoqKoQAP
 hU7lPFKyX9UYkZUyTR8Vp6v+tAObXbzkR+22nf0ktFT8toi7Ujq27r5zwdfr2Jjs
 Rg1X5wSigy7RVDVjVc8oXlYhNMrQXLWQUhh/OwtUBI4mhCEpOL2EWr5O2A3dRbXT
 ZL+syvmh
 =yuHx
 -----END PGP SIGNATURE-----

Merge tag 'msi-fixes-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms

Pull MSI fixes from Marc Zyngier:
 "Thomas tasked me with sending out a few urgent fixes after the giant
  MSI rework that landed in 6.2, as both s390 and powerpc ended-up
  suffering from it (they do not use the full core code infrastructure,
  leading to these previously undetected issues):

   - Return MSI_XA_DOMAIN_SIZE as the maximum MSI index when the
     architecture does not make use of irq domains instead of returning
     0, which is pretty limiting.

   - Check for the presence of an irq domain when validating the MSI
     iterator, as s390/powerpc won't have one.

   - Fix powerpc's MSI backends which fail to clear the descriptor's IRQ
     field on teardown, leading to a splat and leaked descriptors"

* tag 'msi-fixes-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms:
  powerpc/msi: Fix deassociation of MSI descriptors
  genirq/msi: Return MSI_XA_DOMAIN_SIZE as the maximum MSI index when no domain is present
  genirq/msi: Check for the presence of an irq domain when validating msi_ctrl
2022-12-17 13:58:09 -06:00
Kees Cook
7535b832c6 exit: Use READ_ONCE() for all oops/warn limit reads
Use a temporary variable to take full advantage of READ_ONCE() behavior.
Without this, the report (and even the test) might be out of sync with
the initial test.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/Y5x7GXeluFmZ8E0E@hirez.programming.kicks-ass.net
Fixes: 9fc9e278a5 ("panic: Introduce warn_limit")
Fixes: d4ccd54d28 ("exit: Put an upper limit on how often we can oops")
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Jann Horn <jannh@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Marco Elver <elver@google.com>
Cc: tangmeng <tangmeng@uniontech.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Kees Cook <keescook@chromium.org>
2022-12-16 12:26:57 -08:00
Thomas Gleixner
e982ad82bd genirq/msi: Return MSI_XA_DOMAIN_SIZE as the maximum MSI index when no domain is present
On architectures such as s390 that do not use irq domains for MSI,
returning 0 as the maximum MSI index is a bit counter-productive,
as it indicates that no MSI can be allocated. Bad idea.

Instead, return the maximum we're willing to support in the MSI
backing store (MSI_XA_DOMAIN_SIZE), and let the arch code do its
usual thing.

Thanks to Matthew Rosato for fixing the fix.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[maz: commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/87fsdgzpqs.ffs@tglx
2022-12-16 14:04:04 +00:00
Marc Zyngier
db3568fd80 genirq/msi: Check for the presence of an irq domain when validating msi_ctrl
For architectures such as s390 and powerpc that do not use
irq domains for MSIs, dev->msi.domain is always NULL, so
the per-device, per-bus MSI domain is also guaranteed to
be NULL.

So checking one without checking the other is bound to result
in a splat, followed by a memory leak as we don't free the MSI
descriptors.

Add the missing check.

Reported-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/e570e70d-19bc-101b-0481-ff9a3cab3504@linux.ibm.com
2022-12-16 13:27:18 +00:00
Linus Torvalds
71a7507afb Driver Core changes for 6.2-rc1
Here is the set of driver core and kernfs changes for 6.2-rc1.
 
 The "big" change in here is the addition of a new macro,
 container_of_const() that will preserve the "const-ness" of a pointer
 passed into it.
 
 The "problem" of the current container_of() macro is that if you pass in
 a "const *", out of it can comes a non-const pointer unless you
 specifically ask for it.  For many usages, we want to preserve the
 "const" attribute by using the same call.  For a specific example, this
 series changes the kobj_to_dev() macro to use it, allowing it to be used
 no matter what the const value is.  This prevents every subsystem from
 having to declare 2 different individual macros (i.e.
 kobj_const_to_dev() and kobj_to_dev()) and having the compiler enforce
 the const value at build time, which having 2 macros would not do
 either.
 
 The driver for all of this have been discussions with the Rust kernel
 developers as to how to properly mark driver core, and kobject, objects
 as being "non-mutable".  The changes to the kobject and driver core in
 this pull request are the result of that, as there are lots of paths
 where kobjects and device pointers are not modified at all, so marking
 them as "const" allows the compiler to enforce this.
 
 So, a nice side affect of the Rust development effort has been already
 to clean up the driver core code to be more obvious about object rules.
 
 All of this has been bike-shedded in quite a lot of detail on lkml with
 different names and implementations resulting in the tiny version we
 have in here, much better than my original proposal.  Lots of subsystem
 maintainers have acked the changes as well.
 
 Other than this change, included in here are smaller stuff like:
   - kernfs fixes and updates to handle lock contention better
   - vmlinux.lds.h fixes and updates
   - sysfs and debugfs documentation updates
   - device property updates
 
 All of these have been in the linux-next tree for quite a while with no
 problems, OTHER than some merge issues with other trees that should be
 obvious when you hit them (block tree deletes a driver that this tree
 modifies, iommufd tree modifies code that this tree also touches).  If
 there are merge problems with these trees, please let me know.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCY5wz3A8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+yks0ACeKYUlVgCsER8eYW+x18szFa2QTXgAn2h/VhZe
 1Fp53boFaQkGBjl8mGF8
 =v+FB
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the set of driver core and kernfs changes for 6.2-rc1.

  The "big" change in here is the addition of a new macro,
  container_of_const() that will preserve the "const-ness" of a pointer
  passed into it.

  The "problem" of the current container_of() macro is that if you pass
  in a "const *", out of it can comes a non-const pointer unless you
  specifically ask for it. For many usages, we want to preserve the
  "const" attribute by using the same call. For a specific example, this
  series changes the kobj_to_dev() macro to use it, allowing it to be
  used no matter what the const value is. This prevents every subsystem
  from having to declare 2 different individual macros (i.e.
  kobj_const_to_dev() and kobj_to_dev()) and having the compiler enforce
  the const value at build time, which having 2 macros would not do
  either.

  The driver for all of this have been discussions with the Rust kernel
  developers as to how to properly mark driver core, and kobject,
  objects as being "non-mutable". The changes to the kobject and driver
  core in this pull request are the result of that, as there are lots of
  paths where kobjects and device pointers are not modified at all, so
  marking them as "const" allows the compiler to enforce this.

  So, a nice side affect of the Rust development effort has been already
  to clean up the driver core code to be more obvious about object
  rules.

  All of this has been bike-shedded in quite a lot of detail on lkml
  with different names and implementations resulting in the tiny version
  we have in here, much better than my original proposal. Lots of
  subsystem maintainers have acked the changes as well.

  Other than this change, included in here are smaller stuff like:

   - kernfs fixes and updates to handle lock contention better

   - vmlinux.lds.h fixes and updates

   - sysfs and debugfs documentation updates

   - device property updates

  All of these have been in the linux-next tree for quite a while with
  no problems"

* tag 'driver-core-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (58 commits)
  device property: Fix documentation for fwnode_get_next_parent()
  firmware_loader: fix up to_fw_sysfs() to preserve const
  usb.h: take advantage of container_of_const()
  device.h: move kobj_to_dev() to use container_of_const()
  container_of: add container_of_const() that preserves const-ness of the pointer
  driver core: fix up missed drivers/s390/char/hmcdrv_dev.c class.devnode() conversion.
  driver core: fix up missed scsi/cxlflash class.devnode() conversion.
  driver core: fix up some missing class.devnode() conversions.
  driver core: make struct class.devnode() take a const *
  driver core: make struct class.dev_uevent() take a const *
  cacheinfo: Remove of_node_put() for fw_token
  device property: Add a blank line in Kconfig of tests
  device property: Rename goto label to be more precise
  device property: Move PROPERTY_ENTRY_BOOL() a bit down
  device property: Get rid of __PROPERTY_ENTRY_ARRAY_EL*SIZE*()
  kernfs: fix all kernel-doc warnings and multiple typos
  driver core: pass a const * into of_device_uevent()
  kobject: kset_uevent_ops: make name() callback take a const *
  kobject: kset_uevent_ops: make filter() callback take a const *
  kobject: make kobject_namespace take a const *
  ...
2022-12-16 03:54:54 -08:00
Linus Torvalds
fe36bb8736 Tracing updates for 6.2:
- Add options to the osnoise tracer
   o panic_on_stop option that panics the kernel if osnoise is greater than some
     user defined threshold.
   o preempt option, to test noise while preemption is disabled
   o irq option, to test noise when interrupts are disabled
 
 - Add .percent and .graph suffix to histograms to give different outputs
 
 - Add nohitcount to disable showing hitcount in histogram output
 
 - Add new __cpumask() to trace event fields to annotate that a unsigned long
   array is a cpumask to user space and should be treated as one.
 
 - Add trace_trigger kernel command line parameter to enable trace event
   triggers at boot up. Useful to trace stack traces, disable tracing and take
   snapshots.
 
 - Fix x86/kmmio mmio tracer to work with the updates to lockdep
 
 - Unify the panic and die notifiers
 
 - Add back ftrace_expect reference that is used to extract more information in
   the ftrace_bug() code.
 
 - Have trigger filter parsing errors show up in the tracing error log.
 
 - Updated MAINTAINERS file to add kernel tracing  mailing list and patchwork
   info
 
 - Use IDA to keep track of event type numbers.
 
 - And minor fixes and clean ups
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCY5vIcxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qlO7AQCmtZbriadAR6N7Llj092YXmYfzrxyi
 1WS35vhpZsBJ8gEA8j68l+LrgNt51N2gXlTXEHgXzdBgL/TKAPSX4D99GQY=
 =z1pe
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing updates from Steven Rostedt:

 - Add options to the osnoise tracer:
      - 'panic_on_stop' option that panics the kernel if osnoise is
        greater than some user defined threshold.
      - 'preempt' option, to test noise while preemption is disabled
      - 'irq' option, to test noise when interrupts are disabled

 - Add .percent and .graph suffix to histograms to give different
   outputs

 - Add nohitcount to disable showing hitcount in histogram output

 - Add new __cpumask() to trace event fields to annotate that a unsigned
   long array is a cpumask to user space and should be treated as one.

 - Add trace_trigger kernel command line parameter to enable trace event
   triggers at boot up. Useful to trace stack traces, disable tracing
   and take snapshots.

 - Fix x86/kmmio mmio tracer to work with the updates to lockdep

 - Unify the panic and die notifiers

 - Add back ftrace_expect reference that is used to extract more
   information in the ftrace_bug() code.

 - Have trigger filter parsing errors show up in the tracing error log.

 - Updated MAINTAINERS file to add kernel tracing mailing list and
   patchwork info

 - Use IDA to keep track of event type numbers.

 - And minor fixes and clean ups

* tag 'trace-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (44 commits)
  tracing: Fix cpumask() example typo
  tracing: Improve panic/die notifiers
  ftrace: Prevent RCU stall on PREEMPT_VOLUNTARY kernels
  tracing: Do not synchronize freeing of trigger filter on boot up
  tracing: Remove pointer (asterisk) and brackets from cpumask_t field
  tracing: Have trigger filter parsing errors show up in error_log
  x86/mm/kmmio: Remove redundant preempt_disable()
  tracing: Fix infinite loop in tracing_read_pipe on overflowed print_trace_line
  Documentation/osnoise: Add osnoise/options documentation
  tracing/osnoise: Add preempt and/or irq disabled options
  tracing/osnoise: Add PANIC_ON_STOP option
  Documentation/osnoise: Escape underscore of NO_ prefix
  tracing: Fix some checker warnings
  tracing/osnoise: Make osnoise_options static
  tracing: remove unnecessary trace_trigger ifdef
  ring-buffer: Handle resize in early boot up
  tracing/hist: Fix issue of losting command info in error_log
  tracing: Fix issue of missing one synthetic field
  tracing/hist: Fix out-of-bound write on 'action_data.var_ref_idx'
  tracing/hist: Fix wrong return value in parse_action_params()
  ...
2022-12-15 18:01:16 -08:00