Commit Graph

2869 Commits

Author SHA1 Message Date
Linus Torvalds 6fd600d742 media updates for v6.10-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+QmuaPwR3wnBdVwACF8+vY7k4RUFAmZFpvgACgkQCF8+vY7k
 4RXffg//UOFGd12GwhBtkU1a3cBqT1DAUG8GRnmhLnGypRaiP7ypRhI/LV1ZZ0SQ
 vjKuDuXrbk+JJ4hxNTH8GoisYpnRqqC2vIm5cnjCiMxN/pY/GkzPm7MU5zEhuWMB
 Rtz5RS4UrTtpJ95XxuDhXY5rRb3uPXMF2LUHLUbYq3IoUGz8x/ta1aKE56B35vY+
 jDg9JQugR1ciIf0OL7kvDJJfDUKkGGsr/u4gRWBxntYHtVMdUJXso3tYa78F1mBX
 oTWKc8IFms1JgA7NdDnKttOCO0Ykb0IJxE0qO094xuOPW50wLsLByJXdxJtOBj/Q
 iLvSIVrk//U+re0j6xLJgKES6ldZvDKn5AU3O22lbm9cgeXrbONIHQOSqLumYPCi
 HLnuc0eq4oED1UHj695pNyjgigUmZL9mDMB31AU92r0pfOKpGFRnexT1tyhqFonN
 88HMKInudnLsE7lVPzbUSVZxJfhOFj7jf8LILnRzqzy0HOD7te5KhxdjxtBmXvoN
 lpQ3Cs+i/n3Fe510mO0rcpeR73nYkNnX7EoJWOjojCK+Cz7/GnXICF53T0yAYANA
 W6ZGKNCEEgs8ce6dFrRG33jv0I8b/u6L5BVuWT/Ndam+KwMw59OjKlNPDiTvtwSR
 OZDL9eifturMuMUe0HT6k6k3u6VYWWjn2cvMFHg4g7Y6JOrllfQ=
 =JM5r
 -----END PGP SIGNATURE-----

Merge tag 'media/v6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media updates from Mauro Carvalho Chehab:

 - New V4L2 ioctl VIDIOC_REMOVE_BUFS

 - experimental support for using generic metaformats on V4L2 core

 - New drivers: Intel IPU6 controller driver, Broadcom BCM283x/BCM271x

 - More cleanups at atomisp driver

 - Usual bunch of driver cleanups, improvements and fixes

* tag 'media/v6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (328 commits)
  media: bcm2835-unicam: Depend on COMMON_CLK
  Revert "media: v4l2-ctrls: show all owned controls in log_status"
  media: ov2740: Ensure proper reset sequence on probe()
  media: intel/ipu6: Don't print user-triggerable errors to kernel log
  media: bcm2835-unicam: Fix driver path in MAINTAINERS
  media: bcm2835-unicam: Fix a NULL vs IS_ERR() check
  media: bcm2835-unicam: Do not print error when irq not found
  media: bcm2835-unicam: Do not replace IRQ retcode during probe
  media: bcm2835-unicam: Convert to platform remove callback returning void
  media: media: intel/ipu6: Fix spelling mistake "remappinp" -> "remapping"
  media: intel/ipu6: explicitly include vmalloc.h
  media: cec.h: Fix kerneldoc
  media: uvcvideo: Refactor iterators
  media: v4l: async: refactor v4l2_async_create_ancillary_links
  media: intel/ipu6: Don't re-allocate memory for firmware
  media: dvb-frontends: tda10048: Fix integer overflow
  media: tc358746: Use the correct div_ function
  media: i2c: st-mipid02: Use the correct div function
  media: tegra-vde: Refactor timeout handling
  media: stk1160: Use min macro
  ...
2024-05-16 08:45:44 -07:00
Linus Torvalds de6fef50ea cgroup: Changes for v6.10
- The locking around cpuset hotplug processing has always been a bit of mess
   which was worked around by making hotplug processing asynchronous. The
   asynchronity isn't great and led to other issues. We tried to make the
   behavior synchronous a while ago but that led to lockdep splats. Waiman
   took another stab at cleaning up and making it synchronous. The patch has
   been in -next for well over a month and there haven't been any complaints,
   so fingers crossed.
 
 - Tracepoints added to help understanding rstat lock contentions.
 
 - A bunch of minor changes - doc updates, code cleanups and selftests.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZkUrFA4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGfTyAQCwd0aNQOqaKRhJGtWYShqV/aYzurCy1Z2tB9/3
 dkdy9gD7BHNk6kZQEbT97RrHPIduFansLtc76VziACibWBuomgg=
 =2DNQ
 -----END PGP SIGNATURE-----

Merge tag 'cgroup-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup updates from Tejun Heo:

 - The locking around cpuset hotplug processing has always been a bit of
   mess which was worked around by making hotplug processing
   asynchronous. The asynchronity isn't great and led to other issues.

   We tried to make the behavior synchronous a while ago but that led to
   lockdep splats. Waiman took another stab at cleaning up and making it
   synchronous. The patch has been in -next for well over a month and
   there haven't been any complaints, so fingers crossed.

 - Tracepoints added to help understanding rstat lock contentions.

 - A bunch of minor changes - doc updates, code cleanups and selftests.

* tag 'cgroup-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (24 commits)
  cgroup/rstat: add cgroup_rstat_cpu_lock helpers and tracepoints
  selftests/cgroup: Drop define _GNU_SOURCE
  docs: cgroup-v1: Update page cache removal functions
  selftests/cgroup: fix uninitialized variables in test_zswap.c
  selftests/cgroup: cpu_hogger init: use {} instead of {NULL}
  selftests/cgroup: fix clang warnings: uninitialized fd variable
  selftests/cgroup: fix clang build failures for abs() calls
  cgroup/cpuset: Remove outdated comment in sched_partition_write()
  cgroup/cpuset: Fix incorrect top_cpuset flags
  cgroup/cpuset: Avoid clearing CS_SCHED_LOAD_BALANCE twice
  cgroup/cpuset: Statically initialize more members of top_cpuset
  cgroup: Avoid unnecessary looping in cgroup_no_v1()
  cgroup, legacy_freezer: update comment for freezer_css_offline()
  docs, cgroup: add entries for pids to cgroup-v2.rst
  cgroup: don't call cgroup1_pidlist_destroy_all() for v2
  cgroup_freezer: update comment for freezer_css_online()
  cgroup/rstat: desc member cgrp in cgroup_rstat_flush_release
  cgroup/rstat: add cgroup_rstat_lock helpers and tracepoints
  cgroup/pids: Remove superfluous zeroing
  docs: cgroup-v1: Fix description for css_online
  ...
2024-05-15 17:06:08 -07:00
Linus Torvalds 1b294a1f35 Networking changes for 6.10.
Core & protocols
 ----------------
 
  - Complete rework of garbage collection of AF_UNIX sockets.
    AF_UNIX is prone to forming reference count cycles due to fd passing
    functionality. New method based on Tarjan's Strongly Connected Components
    algorithm should be both faster and remove a lot of workarounds
    we accumulated over the years.
 
  - Add TCP fraglist GRO support, allowing chaining multiple TCP packets
    and forwarding them together. Useful for small switches / routers which
    lack basic checksum offload in some scenarios (e.g. PPPoE).
 
  - Support using SMP threads for handling packet backlog i.e. packet
    processing from software interfaces and old drivers which don't
    use NAPI. This helps move the processing out of the softirq jumble.
 
  - Continue work of converting from rtnl lock to RCU protection.
    Don't require rtnl lock when reading: IPv6 routing FIB, IPv6 address
    labels, netdev threaded NAPI sysfs files, bonding driver's sysfs files,
    MPLS devconf, IPv4 FIB rules, netns IDs, tcp metrics, TC Qdiscs,
    neighbor entries, ARP entries via ioctl(SIOCGARP), a lot of the link
    information available via rtnetlink.
 
  - Small optimizations from Eric to UDP wake up handling, memory accounting,
    RPS/RFS implementation, TCP packet sizing etc.
 
  - Allow direct page recycling in the bulk API used by XDP, for +2% PPS.
 
  - Support peek with an offset on TCP sockets.
 
  - Add MPTCP APIs for querying last time packets were received/sent/acked,
    and whether MPTCP "upgrade" succeeded on a TCP socket.
 
  - Add intra-node communication shortcut to improve SMC performance.
 
  - Add IPv6 (and IPv{4,6}-over-IPv{4,6}) support to the GTP protocol driver.
 
  - Add HSR-SAN (RedBOX) mode of operation to the HSR protocol driver.
 
  - Add reset reasons for tracing what caused a TCP reset to be sent.
 
  - Introduce direction attribute for xfrm (IPSec) states.
    State can be used either for input or output packet processing.
 
 Things we sprinkled into general kernel code
 --------------------------------------------
 
  - Add bitmap_{read,write}(), bitmap_size(), expose BYTES_TO_BITS().
    This required touch-ups and renaming of a few existing users.
 
  - Add Endian-dependent __counted_by_{le,be} annotations.
 
  - Make building selftests "quieter" by printing summaries like
    "CC object.o" rather than full commands with all the arguments.
 
 Netfilter
 ---------
 
  - Use GFP_KERNEL to clone elements, to deal better with OOM situations
    and avoid failures in the .commit step.
 
 BPF
 ---
 
  - Add eBPF JIT for ARCv2 CPUs.
 
  - Support attaching kprobe BPF programs through kprobe_multi link in
    a session mode, meaning, a BPF program is attached to both function entry
    and return, the entry program can decide if the return program gets
    executed and the entry program can share u64 cookie value with return
    program. "Session mode" is a common use-case for tetragon and bpftrace.
 
  - Add the ability to specify and retrieve BPF cookie for raw tracepoint
    programs in order to ease migration from classic to raw tracepoints.
 
  - Add an internal-only BPF per-CPU instruction for resolving per-CPU
    memory addresses and implement support in x86, ARM64 and RISC-V JITs.
    This allows inlining functions which need to access per-CPU state.
 
  - Optimize x86 BPF JIT's emit_mov_imm64, and add support for various
    atomics in bpf_arena which can be JITed as a single x86 instruction.
    Support BPF arena on ARM64.
 
  - Add a new bpf_wq API for deferring events and refactor process-context
    bpf_timer code to keep common code where possible.
 
  - Harden the BPF verifier's and/or/xor value tracking.
 
  - Introduce crypto kfuncs to let BPF programs call kernel crypto APIs.
 
  - Support bpf_tail_call_static() helper for BPF programs with GCC 13.
 
  - Add bpf_preempt_{disable,enable}() kfuncs in order to allow a BPF
    program to have code sections where preemption is disabled.
 
 Driver API
 ----------
 
  - Skip software TC processing completely if all installed rules are
    marked as HW-only, instead of checking the HW-only flag rule by rule.
 
  - Add support for configuring PoE (Power over Ethernet), similar to
    the already existing support for PoDL (Power over Data Line) config.
 
  - Initial bits of a queue control API, for now allowing a single queue
    to be reset without disturbing packet flow to other queues.
 
  - Common (ethtool) statistics for hardware timestamping.
 
 Tests and tooling
 -----------------
 
  - Remove the need to create a config file to run the net forwarding tests
    so that a naive "make run_tests" can exercise them.
 
  - Define a method of writing tests which require an external endpoint
    to communicate with (to send/receive data towards the test machine).
    Add a few such tests.
 
  - Create a shared code library for writing Python tests. Expose the YAML
    Netlink library from tools/ to the tests for easy Netlink access.
 
  - Move netfilter tests under net/, extend them, separate performance tests
    from correctness tests, and iron out issues found by running them
    "on every commit".
 
  - Refactor BPF selftests to use common network helpers.
 
  - Further work filling in YAML definitions of Netlink messages for:
    nftables, team driver, bonding interfaces, vlan interfaces, VF info,
    TC u32 mark, TC police action.
 
  - Teach Python YAML Netlink to decode attribute policies.
 
  - Extend the definition of the "indexed array" construct in the specs
    to cover arrays of scalars rather than just nests.
 
  - Add hyperlinks between definitions in generated Netlink docs.
 
 Drivers
 -------
 
  - Make sure unsupported flower control flags are rejected by drivers,
    and make more drivers report errors directly to the application rather
    than dmesg (large number of driver changes from Asbjørn Sloth Tønnesen).
 
  - Ethernet high-speed NICs:
    - Broadcom (bnxt):
      - support multiple RSS contexts and steering traffic to them
      - support XDP metadata
      - make page pool allocations more NUMA aware
    - Intel (100G, ice, idpf):
      - extract datapath code common among Intel drivers into a library
      - use fewer resources in switchdev by sharing queues with the PF
      - add PFCP filter support
      - add Ethernet filter support
      - use a spinlock instead of HW lock in PTP clock ops
      - support 5 layer Tx scheduler topology
    - nVidia/Mellanox:
      - 800G link modes and 100G SerDes speeds
      - per-queue IRQ coalescing configuration
    - Marvell Octeon:
      - support offloading TC packet mark action
 
  - Ethernet NICs consumer, embedded and virtual:
    - stop lying about skb->truesize in USB Ethernet drivers, it messes up
      TCP memory calculations
    - Google cloud vNIC:
      - support changing ring size via ethtool
      - support ring reset using the queue control API
    - VirtIO net:
      - expose flow hash from RSS to XDP
      - per-queue statistics
      - add selftests
    - Synopsys (stmmac):
      - support controllers which require an RX clock signal from the MII
        bus to perform their hardware initialization
    - TI:
      - icssg_prueth: support ICSSG-based Ethernet on AM65x SR1.0 devices
      - icssg_prueth: add SW TX / RX Coalescing based on hrtimers
      - cpsw: minimal XDP support
    - Renesas (ravb):
      - support describing the MDIO bus
    - Realtek (r8169):
      - add support for RTL8168M
    - Microchip Sparx5:
      - matchall and flower actions mirred and redirect
 
  - Ethernet switches:
    - nVidia/Mellanox:
      - improve events processing performance
    - Marvell:
      - add support for MV88E6250 family internal PHYs
    - Microchip:
      - add DCB and DSCP mapping support for KSZ switches
      - vsc73xx: convert to PHYLINK
    - Realtek:
      - rtl8226b/rtl8221b: add C45 instances and SerDes switching
 
  - Many driver changes related to PHYLIB and PHYLINK deprecated API cleanup.
 
  - Ethernet PHYs:
    - Add a new driver for Airoha EN8811H 2.5 Gigabit PHY.
    - micrel: lan8814: add support for PPS out and external timestamp trigger
 
  - WiFi:
    - Disable Wireless Extensions (WEXT) in all Wi-Fi 7 devices drivers.
      Modern devices can only be configured using nl80211.
    - mac80211/cfg80211
      - handle color change per link for WiFi 7 Multi-Link Operation
    - Intel (iwlwifi):
      - don't support puncturing in 5 GHz
      - support monitor mode on passive channels
      - BZ-W device support
      - P2P with HE/EHT support
      - re-add support for firmware API 90
      - provide channel survey information for Automatic Channel Selection
    - MediaTek (mt76):
      - mt7921 LED control
      - mt7925 EHT radiotap support
      - mt7920e PCI support
    - Qualcomm (ath11k):
      - P2P support for QCA6390, WCN6855 and QCA2066
      - support hibernation
      - ieee80211-freq-limit Device Tree property support
    - Qualcomm (ath12k):
      - refactoring in preparation of multi-link support
      - suspend and hibernation support
      - ACPI support
      - debugfs support, including dfs_simulate_radar support
    - RealTek:
      - rtw88: RTL8723CS SDIO device support
      - rtw89: RTL8922AE Wi-Fi 7 PCI device support
      - rtw89: complete features of new WiFi 7 chip 8922AE including
        BT-coexistence and Wake-on-WLAN
      - rtw89: use BIOS ACPI settings to set TX power and channels
      - rtl8xxxu: enable Management Frame Protection (MFP) support
 
  - Bluetooth:
    - support for Intel BlazarI and Filmore Peak2 (BE201)
    - support for MediaTek MT7921S SDIO
    - initial support for Intel PCIe BT driver
    - remove HCI_AMP support
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmZD6sQACgkQMUZtbf5S
 IrtLYw/+I73ePGIye37o2jpbodcLAUZVfF3r6uYUzK8hokEcKD0QVJa9w7PizLZ3
 UO45ClOXFLJCkfP4reFenLfxGCel2AJI+F7VFl2xaO2XgrcH/lnVrHqKZEAEXjls
 KoYMnShIolv7h2MKP6hHtyTi2j1wvQUKsZC71o9/fuW+4fUT8gECx1YtYcL73wrw
 gEMdlUgBYC3jiiCUHJIFX6iPJ2t/TC+q1eIIF2K/Osrk2kIqQhzoozcL4vpuAZQT
 99ljx/qRelXa8oppDb7nM5eulg7WY8ZqxEfFZphTMC5nLEGzClxuOTTl2kDYI/D/
 UZmTWZDY+F5F0xvNk2gH84qVJXBOVDoobpT7hVA/tDuybobc/kvGDzRayEVqVzKj
 Q0tPlJs+xBZpkK5TVnxaFLJVOM+p1Xosxy3kNVXmuYNBvT/R89UbJiCrUKqKZF+L
 z/1mOYUv8UklHqYAeuJSptHvqJjTGa/fsEYP7dAUBbc1N2eVB8mzZ4mgU5rYXbtC
 E6UXXiWnoSRm8bmco9QmcWWoXt5UGEizHSJLz6t1R5Df/YmXhWlytll5aCwY1ksf
 FNoL7S4u7AZThL1Nwi7yUs4CAjhk/N4aOsk+41S0sALCx30BJuI6UdesAxJ0lu+Z
 fwCQYbs27y4p7mBLbkYwcQNxAxGm7PSK4yeyRIy2njiyV4qnLf8=
 =EsC2
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core & protocols:

   - Complete rework of garbage collection of AF_UNIX sockets.

     AF_UNIX is prone to forming reference count cycles due to fd
     passing functionality. New method based on Tarjan's Strongly
     Connected Components algorithm should be both faster and remove a
     lot of workarounds we accumulated over the years.

   - Add TCP fraglist GRO support, allowing chaining multiple TCP
     packets and forwarding them together. Useful for small switches /
     routers which lack basic checksum offload in some scenarios (e.g.
     PPPoE).

   - Support using SMP threads for handling packet backlog i.e. packet
     processing from software interfaces and old drivers which don't use
     NAPI. This helps move the processing out of the softirq jumble.

   - Continue work of converting from rtnl lock to RCU protection.

     Don't require rtnl lock when reading: IPv6 routing FIB, IPv6
     address labels, netdev threaded NAPI sysfs files, bonding driver's
     sysfs files, MPLS devconf, IPv4 FIB rules, netns IDs, tcp metrics,
     TC Qdiscs, neighbor entries, ARP entries via ioctl(SIOCGARP), a lot
     of the link information available via rtnetlink.

   - Small optimizations from Eric to UDP wake up handling, memory
     accounting, RPS/RFS implementation, TCP packet sizing etc.

   - Allow direct page recycling in the bulk API used by XDP, for +2%
     PPS.

   - Support peek with an offset on TCP sockets.

   - Add MPTCP APIs for querying last time packets were received/sent/acked
     and whether MPTCP "upgrade" succeeded on a TCP socket.

   - Add intra-node communication shortcut to improve SMC performance.

   - Add IPv6 (and IPv{4,6}-over-IPv{4,6}) support to the GTP protocol
     driver.

   - Add HSR-SAN (RedBOX) mode of operation to the HSR protocol driver.

   - Add reset reasons for tracing what caused a TCP reset to be sent.

   - Introduce direction attribute for xfrm (IPSec) states. State can be
     used either for input or output packet processing.

  Things we sprinkled into general kernel code:

   - Add bitmap_{read,write}(), bitmap_size(), expose BYTES_TO_BITS().

     This required touch-ups and renaming of a few existing users.

   - Add Endian-dependent __counted_by_{le,be} annotations.

   - Make building selftests "quieter" by printing summaries like
     "CC object.o" rather than full commands with all the arguments.

  Netfilter:

   - Use GFP_KERNEL to clone elements, to deal better with OOM
     situations and avoid failures in the .commit step.

  BPF:

   - Add eBPF JIT for ARCv2 CPUs.

   - Support attaching kprobe BPF programs through kprobe_multi link in
     a session mode, meaning, a BPF program is attached to both function
     entry and return, the entry program can decide if the return
     program gets executed and the entry program can share u64 cookie
     value with return program. "Session mode" is a common use-case for
     tetragon and bpftrace.

   - Add the ability to specify and retrieve BPF cookie for raw
     tracepoint programs in order to ease migration from classic to raw
     tracepoints.

   - Add an internal-only BPF per-CPU instruction for resolving per-CPU
     memory addresses and implement support in x86, ARM64 and RISC-V
     JITs. This allows inlining functions which need to access per-CPU
     state.

   - Optimize x86 BPF JIT's emit_mov_imm64, and add support for various
     atomics in bpf_arena which can be JITed as a single x86
     instruction. Support BPF arena on ARM64.

   - Add a new bpf_wq API for deferring events and refactor
     process-context bpf_timer code to keep common code where possible.

   - Harden the BPF verifier's and/or/xor value tracking.

   - Introduce crypto kfuncs to let BPF programs call kernel crypto
     APIs.

   - Support bpf_tail_call_static() helper for BPF programs with GCC 13.

   - Add bpf_preempt_{disable,enable}() kfuncs in order to allow a BPF
     program to have code sections where preemption is disabled.

  Driver API:

   - Skip software TC processing completely if all installed rules are
     marked as HW-only, instead of checking the HW-only flag rule by
     rule.

   - Add support for configuring PoE (Power over Ethernet), similar to
     the already existing support for PoDL (Power over Data Line)
     config.

   - Initial bits of a queue control API, for now allowing a single
     queue to be reset without disturbing packet flow to other queues.

   - Common (ethtool) statistics for hardware timestamping.

  Tests and tooling:

   - Remove the need to create a config file to run the net forwarding
     tests so that a naive "make run_tests" can exercise them.

   - Define a method of writing tests which require an external endpoint
     to communicate with (to send/receive data towards the test
     machine). Add a few such tests.

   - Create a shared code library for writing Python tests. Expose the
     YAML Netlink library from tools/ to the tests for easy Netlink
     access.

   - Move netfilter tests under net/, extend them, separate performance
     tests from correctness tests, and iron out issues found by running
     them "on every commit".

   - Refactor BPF selftests to use common network helpers.

   - Further work filling in YAML definitions of Netlink messages for:
     nftables, team driver, bonding interfaces, vlan interfaces, VF
     info, TC u32 mark, TC police action.

   - Teach Python YAML Netlink to decode attribute policies.

   - Extend the definition of the "indexed array" construct in the specs
     to cover arrays of scalars rather than just nests.

   - Add hyperlinks between definitions in generated Netlink docs.

  Drivers:

   - Make sure unsupported flower control flags are rejected by drivers,
     and make more drivers report errors directly to the application
     rather than dmesg (large number of driver changes from Asbjørn
     Sloth Tønnesen).

   - Ethernet high-speed NICs:
      - Broadcom (bnxt):
         - support multiple RSS contexts and steering traffic to them
         - support XDP metadata
         - make page pool allocations more NUMA aware
      - Intel (100G, ice, idpf):
         - extract datapath code common among Intel drivers into a library
         - use fewer resources in switchdev by sharing queues with the PF
         - add PFCP filter support
         - add Ethernet filter support
         - use a spinlock instead of HW lock in PTP clock ops
         - support 5 layer Tx scheduler topology
      - nVidia/Mellanox:
         - 800G link modes and 100G SerDes speeds
         - per-queue IRQ coalescing configuration
      - Marvell Octeon:
         - support offloading TC packet mark action

   - Ethernet NICs consumer, embedded and virtual:
      - stop lying about skb->truesize in USB Ethernet drivers, it
        messes up TCP memory calculations
      - Google cloud vNIC:
         - support changing ring size via ethtool
         - support ring reset using the queue control API
      - VirtIO net:
         - expose flow hash from RSS to XDP
         - per-queue statistics
         - add selftests
      - Synopsys (stmmac):
         - support controllers which require an RX clock signal from the
           MII bus to perform their hardware initialization
      - TI:
         - icssg_prueth: support ICSSG-based Ethernet on AM65x SR1.0 devices
         - icssg_prueth: add SW TX / RX Coalescing based on hrtimers
         - cpsw: minimal XDP support
      - Renesas (ravb):
         - support describing the MDIO bus
      - Realtek (r8169):
         - add support for RTL8168M
      - Microchip Sparx5:
         - matchall and flower actions mirred and redirect

   - Ethernet switches:
      - nVidia/Mellanox:
         - improve events processing performance
      - Marvell:
         - add support for MV88E6250 family internal PHYs
      - Microchip:
         - add DCB and DSCP mapping support for KSZ switches
         - vsc73xx: convert to PHYLINK
      - Realtek:
         - rtl8226b/rtl8221b: add C45 instances and SerDes switching

   - Many driver changes related to PHYLIB and PHYLINK deprecated API
     cleanup

   - Ethernet PHYs:
      - Add a new driver for Airoha EN8811H 2.5 Gigabit PHY.
      - micrel: lan8814: add support for PPS out and external timestamp trigger

   - WiFi:
      - Disable Wireless Extensions (WEXT) in all Wi-Fi 7 devices
        drivers. Modern devices can only be configured using nl80211.
      - mac80211/cfg80211
         - handle color change per link for WiFi 7 Multi-Link Operation
      - Intel (iwlwifi):
         - don't support puncturing in 5 GHz
         - support monitor mode on passive channels
         - BZ-W device support
         - P2P with HE/EHT support
         - re-add support for firmware API 90
         - provide channel survey information for Automatic Channel Selection
      - MediaTek (mt76):
         - mt7921 LED control
         - mt7925 EHT radiotap support
         - mt7920e PCI support
      - Qualcomm (ath11k):
         - P2P support for QCA6390, WCN6855 and QCA2066
         - support hibernation
         - ieee80211-freq-limit Device Tree property support
      - Qualcomm (ath12k):
         - refactoring in preparation of multi-link support
         - suspend and hibernation support
         - ACPI support
         - debugfs support, including dfs_simulate_radar support
      - RealTek:
         - rtw88: RTL8723CS SDIO device support
         - rtw89: RTL8922AE Wi-Fi 7 PCI device support
         - rtw89: complete features of new WiFi 7 chip 8922AE including
           BT-coexistence and Wake-on-WLAN
         - rtw89: use BIOS ACPI settings to set TX power and channels
         - rtl8xxxu: enable Management Frame Protection (MFP) support

   - Bluetooth:
      - support for Intel BlazarI and Filmore Peak2 (BE201)
      - support for MediaTek MT7921S SDIO
      - initial support for Intel PCIe BT driver
      - remove HCI_AMP support"

* tag 'net-next-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1827 commits)
  selftests: netfilter: fix packetdrill conntrack testcase
  net: gro: fix napi_gro_cb zeroed alignment
  Bluetooth: btintel_pcie: Refactor and code cleanup
  Bluetooth: btintel_pcie: Fix warning reported by sparse
  Bluetooth: hci_core: Fix not handling hdev->le_num_of_adv_sets=1
  Bluetooth: btintel: Fix compiler warning for multi_v7_defconfig config
  Bluetooth: btintel_pcie: Fix compiler warnings
  Bluetooth: btintel_pcie: Add *setup* function to download firmware
  Bluetooth: btintel_pcie: Add support for PCIe transport
  Bluetooth: btintel: Export few static functions
  Bluetooth: HCI: Remove HCI_AMP support
  Bluetooth: L2CAP: Fix div-by-zero in l2cap_le_flowctl_init()
  Bluetooth: qca: Fix error code in qca_read_fw_build_info()
  Bluetooth: hci_conn: Use __counted_by() and avoid -Wfamnae warning
  Bluetooth: btintel: Add support for Filmore Peak2 (BE201)
  Bluetooth: btintel: Add support for BlazarI
  LE Create Connection command timeout increased to 20 secs
  dt-bindings: net: bluetooth: Add MediaTek MT7921S SDIO Bluetooth
  Bluetooth: compute LE flow credits based on recvbuf space
  Bluetooth: hci_sync: Use cmd->num_cis instead of magic number
  ...
2024-05-14 19:42:24 -07:00
Linus Torvalds 4f8b6f25eb - Add a dm-crypt optional "high_priority" flag that enables the crypt
workqueues to use WQ_HIGHPRI.
 
 - Export dm-crypt workqueues via sysfs (by enabling WQ_SYSFS) to allow
   for improved visibility and controls over IO and crypt workqueues.
 
 - Fix dm-crypt to no longer constrain max_segment_size to PAGE_SIZE.
   This limit isn't needed given that the block core provides late bio
   splitting if bio exceeds underlying limits (e.g. max_segment_size).
 
 - Fix dm-crypt crypt_queue's use of WQ_UNBOUND to not use
   WQ_CPU_INTENSIVE because it is meaningless with WQ_UNBOUND.
 
 - Fix various issues with dm-delay target (ranging from a resource
   teardown fix, a fix for hung task when using kthread mode, and other
   improvements that followed from code inspection).
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAmZDiggACgkQxSPxCi2d
 A1oQrwf7BUHy7ehwCjRrVlFTteIlx0ULTpPxictakN/S+xZfcZUcbE20OjNVzdk9
 m5dx4Gn557rlMkiC4NDlHazVEVM5BbTpih27rvUgvX2nchUUdfHIT1OvU0isT4Yi
 h9g/o7i9DnBPjvyNjpXjP9YE7Xg8u2X9mxpv8DyU5M+QpFuofwzsfkCP7g14B0g2
 btGxT3AZ5Bo8A/csKeSqHq13Nbq/dcAZZ3IvjIg1xSXjm6CoQ04rfO0TN6SKfsFJ
 GXteBS2JT1MMcXf3oKweAeQduTE+psVFea7gt/8haKFldnV+DpPNg1gU/7rza5Os
 eL1+L1iPY5fuEJIkaqPjBVRGkxQqHg==
 =+OhH
 -----END PGP SIGNATURE-----

Merge tag 'for-6.10/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - Add a dm-crypt optional "high_priority" flag that enables the crypt
   workqueues to use WQ_HIGHPRI.

 - Export dm-crypt workqueues via sysfs (by enabling WQ_SYSFS) to allow
   for improved visibility and controls over IO and crypt workqueues.

 - Fix dm-crypt to no longer constrain max_segment_size to PAGE_SIZE.
   This limit isn't needed given that the block core provides late bio
   splitting if bio exceeds underlying limits (e.g. max_segment_size).

 - Fix dm-crypt crypt_queue's use of WQ_UNBOUND to not use
   WQ_CPU_INTENSIVE because it is meaningless with WQ_UNBOUND.

 - Fix various issues with dm-delay target (ranging from a resource
   teardown fix, a fix for hung task when using kthread mode, and other
   improvements that followed from code inspection).

* tag 'for-6.10/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm-delay: remove timer_lock
  dm-delay: change locking to avoid contention
  dm-delay: fix max_delay calculations
  dm-delay: fix hung task introduced by kthread mode
  dm-delay: fix workqueue delay_timer race
  dm-crypt: don't set WQ_CPU_INTENSIVE for WQ_UNBOUND crypt_queue
  dm: use queue_limits_set
  dm-crypt: stop constraining max_segment_size to PAGE_SIZE
  dm-crypt: export sysfs of all workqueues
  dm-crypt: add the optional "high_priority" flag
2024-05-14 18:34:19 -07:00
Linus Torvalds 103916ffe2 arm64 updates for 6.10
ACPI:
 * Support for the Firmware ACPI Control Structure (FACS) signature
   feature which is used to reboot out of hibernation on some systems.
 
 Kbuild:
 * Support for building Flat Image Tree (FIT) images, where the kernel
   Image is compressed alongside a set of devicetree blobs.
 
 Memory management:
 * Optimisation of our early page-table manipulation for creation of the
   linear mapping.
 
 * Support for userfaultfd write protection, which brings along some nice
   cleanups to our handling of invalid but present ptes.
 
 * Extend our use of range TLBI invalidation at EL1.
 
 Perf and PMUs:
 * Ensure that the 'pmu->parent' pointer is correctly initialised by PMU
   drivers.
 
 * Avoid allocating 'cpumask_t' types on the stack in some PMU drivers.
 
 * Fix parsing of the CPU PMU "version" field in assembly code, as it
   doesn't follow the usual architectural rules.
 
 * Add best-effort unwinding support for USER_STACKTRACE
 
 * Minor driver fixes and cleanups.
 
 Selftests:
 * Minor cleanups to the arm64 selftests (missing NULL check, unused
   variable).
 
 Miscellaneous
 * Add a command-line alias for disabling 32-bit application support.
 
 * Add part number for Neoverse-V2 CPUs.
 
 * Minor fixes and cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmY+IWkQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNBVNB/9JG4jlmgxzbTDoer0md31YFvWCDGeOKx1x
 g3XhE24W5w8eLXnc75p7/tOUKfo0TNWL4qdUs0hJCEUAOSy6a4Qz13bkkkvvBtDm
 nnHvEjidx5yprHggocsoTF29CKgHMJ3bt8rJe6g+O3Lp1JAFlXXNgplX5koeaVtm
 TtaFvX9MGyDDNkPIcQ/SQTFZJ2Oz51+ik6O8SYuGYtmAcR7MzlxH77lHl2mrF1bf
 Jzv/f5n0lS+Gt9tRuFWhbfEm4aKdUlLha4ufzUq42/vJvELboZbG3LqLxRG8DbqR
 +HvyZOG/xtu2dbzDqHkRumMToWmwzD4oBGSK4JAoJxeHavEdAvSG
 =JMvT
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:
 "The most interesting parts are probably the mm changes from Ryan which
  optimise the creation of the linear mapping at boot and (separately)
  implement write-protect support for userfaultfd.

  Outside of our usual directories, the Kbuild-related changes under
  scripts/ have been acked by Masahiro whilst the drivers/acpi/ parts
  have been acked by Rafael and the addition of cpumask_any_and_but()
  has been acked by Yury.

  ACPI:

   - Support for the Firmware ACPI Control Structure (FACS) signature
     feature which is used to reboot out of hibernation on some systems

  Kbuild:

   - Support for building Flat Image Tree (FIT) images, where the kernel
     Image is compressed alongside a set of devicetree blobs

  Memory management:

   - Optimisation of our early page-table manipulation for creation of
     the linear mapping

   - Support for userfaultfd write protection, which brings along some
     nice cleanups to our handling of invalid but present ptes

   - Extend our use of range TLBI invalidation at EL1

  Perf and PMUs:

   - Ensure that the 'pmu->parent' pointer is correctly initialised by
     PMU drivers

   - Avoid allocating 'cpumask_t' types on the stack in some PMU drivers

   - Fix parsing of the CPU PMU "version" field in assembly code, as it
     doesn't follow the usual architectural rules

   - Add best-effort unwinding support for USER_STACKTRACE

   - Minor driver fixes and cleanups

  Selftests:

   - Minor cleanups to the arm64 selftests (missing NULL check, unused
     variable)

  Miscellaneous:

   - Add a command-line alias for disabling 32-bit application support

   - Add part number for Neoverse-V2 CPUs

   - Minor fixes and cleanups"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (64 commits)
  arm64/mm: Fix pud_user_accessible_page() for PGTABLE_LEVELS <= 2
  arm64/mm: Add uffd write-protect support
  arm64/mm: Move PTE_PRESENT_INVALID to overlay PTE_NG
  arm64/mm: Remove PTE_PROT_NONE bit
  arm64/mm: generalize PMD_PRESENT_INVALID for all levels
  arm64: simplify arch_static_branch/_jump function
  arm64: Add USER_STACKTRACE support
  arm64: Add the arm64.no32bit_el0 command line option
  drivers/perf: hisi: hns3: Actually use devm_add_action_or_reset()
  drivers/perf: hisi: hns3: Fix out-of-bound access when valid event group
  drivers/perf: hisi_pcie: Fix out-of-bound access when valid event group
  kselftest: arm64: Add a null pointer check
  arm64: defer clearing DAIF.D
  arm64: assembler: update stale comment for disable_step_tsk
  arm64/sysreg: Update PIE permission encodings
  kselftest/arm64: Remove unused parameters in abi test
  perf/arm-spe: Assign parents for event_source device
  perf/arm-smmuv3: Assign parents for event_source device
  perf/arm-dsu: Assign parents for event_source device
  perf/arm-dmc620: Assign parents for event_source device
  ...
2024-05-14 11:09:39 -07:00
Linus Torvalds 9776dd3609 X86 interrupt handling update:
Support for posted interrupts on bare metal
 
     Posted interrupts is a virtualization feature which allows to inject
     interrupts directly into a guest without host interaction. The VT-d
     interrupt remapping hardware sets the bit which corresponds to the
     interrupt vector in a vector bitmap which is either used to inject the
     interrupt directly into the guest via a virtualized APIC or in case
     that the guest is scheduled out provides a host side notification
     interrupt which informs the host that an interrupt has been marked
     pending in the bitmap.
 
     This can be utilized on bare metal for scenarios where multiple
     devices, e.g. NVME storage, raise interrupts with a high frequency.  In
     the default mode these interrupts are handles independently and
     therefore require a full roundtrip of interrupt entry/exit.
 
     Utilizing posted interrupts this roundtrip overhead can be avoided by
     coalescing these interrupt entries to a single entry for the posted
     interrupt notification. The notification interrupt then demultiplexes
     the pending bits in a memory based bitmap and invokes the corresponding
     device specific handlers.
 
     Depending on the usage scenario and device utilization throughput
     improvements between 10% and 130% have been measured.
 
     As this is only relevant for high end servers with multiple device
     queues per CPU attached and counterproductive for situations where
     interrupts are arriving at distinct times, the functionality is opt-in
     via a kernel command line parameter.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmZBGUITHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYod3xD/98Xa4qZN7eceyyGUhgXnPLOKQzGQ7k
 7cmhsoAYjABeXLvuAvtKePL7ky7OPcqVW2E/g0+jdZuRDkRDbnVkM7CDMRTyL0/b
 BZLhVAXyANKjK79a5WvjL0zDasYQRQ16MQJ6TPa++mX0KhZSI7KvXWIqPWov5i02
 n8UbPUraH5bJi3qGKm6u4n2261Be1gtDag0ZjmGma45/3wsn3bWPoB7iPK6qxmq3
 Q7VARPXAcRp5wYACk6mCOM1dOXMUV9CgI5AUk92xGfXi4RAdsFeNSzeQWn9jHWOf
 CYbbJjNl4QmGP4IWmy6/Up4vIiEhUCOT2DmHsygrQTs/G+nPnMAe1qUuDuECiofj
 iToBL3hn1dHG8uINKOB81MJ33QEGWyYWY8PxxoR3LMTrhVpfChUlJO8T2XK5nu+i
 2EA6XLtJiHacpXhn8HQam0aQN9nvi4wT1LzpkhmboyCQuXTiXuJNbyLIh5TdFa1n
 DzqAGhRB67z6eGevJJ7kTI1X71W0poMwYlzCU8itnLOK8np0zFQ8bgwwqm9opZGq
 V2eSDuZAbqXVolzmaF8NSfM+b/R9URQtWsZ8cEc+/OdVV4HR4zfeqejy60TuV/4G
 39CTnn8vPBKcRSS6CAcJhKPhzIvHw4EMhoU4DJKBtwBdM58RyP9NY1wF3rIPJIGh
 sl61JBuYYuIZXg==
 =bqLN
 -----END PGP SIGNATURE-----

Merge tag 'x86-irq-2024-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 interrupt handling updates from Thomas Gleixner:
 "Add support for posted interrupts on bare metal.

  Posted interrupts is a virtualization feature which allows to inject
  interrupts directly into a guest without host interaction. The VT-d
  interrupt remapping hardware sets the bit which corresponds to the
  interrupt vector in a vector bitmap which is either used to inject the
  interrupt directly into the guest via a virtualized APIC or in case
  that the guest is scheduled out provides a host side notification
  interrupt which informs the host that an interrupt has been marked
  pending in the bitmap.

  This can be utilized on bare metal for scenarios where multiple
  devices, e.g. NVME storage, raise interrupts with a high frequency. In
  the default mode these interrupts are handles independently and
  therefore require a full roundtrip of interrupt entry/exit.

  Utilizing posted interrupts this roundtrip overhead can be avoided by
  coalescing these interrupt entries to a single entry for the posted
  interrupt notification. The notification interrupt then demultiplexes
  the pending bits in a memory based bitmap and invokes the
  corresponding device specific handlers.

  Depending on the usage scenario and device utilization throughput
  improvements between 10% and 130% have been measured.

  As this is only relevant for high end servers with multiple device
  queues per CPU attached and counterproductive for situations where
  interrupts are arriving at distinct times, the functionality is opt-in
  via a kernel command line parameter"

* tag 'x86-irq-2024-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/irq: Use existing helper for pending vector check
  iommu/vt-d: Enable posted mode for device MSIs
  iommu/vt-d: Make posted MSI an opt-in command line option
  x86/irq: Extend checks for pending vectors to posted interrupts
  x86/irq: Factor out common code for checking pending interrupts
  x86/irq: Install posted MSI notification handler
  x86/irq: Factor out handler invocation from common_interrupt()
  x86/irq: Set up per host CPU posted interrupt descriptors
  x86/irq: Reserve a per CPU IDT vector for posted MSIs
  x86/irq: Add a Kconfig option for posted MSI
  x86/irq: Remove bitfields in posted interrupt descriptor
  x86/irq: Unionize PID.PIR for 64bit access w/o casting
  KVM: VMX: Move posted interrupt descriptor out of VMX code
2024-05-14 10:01:29 -07:00
Linus Torvalds 6e5a0c30b6 Scheduler changes for v6.10:
- Add cpufreq pressure feedback for the scheduler
 
  - Rework misfit load-balancing wrt. affinity restrictions
 
  - Clean up and simplify the code around ::overutilized and
    ::overload access.
 
  - Simplify sched_balance_newidle()
 
  - Bump SCHEDSTAT_VERSION to 16 due to a cleanup of CPU_MAX_IDLE_TYPES
    handling that changed the output.
 
  - Rework & clean up <asm/vtime.h> interactions wrt. arch_vtime_task_switch()
 
  - Reorganize, clean up and unify most of the higher level
    scheduler balancing function names around the sched_balance_*()
    prefix.
 
  - Simplify the balancing flag code (sched_balance_running)
 
  - Miscellaneous cleanups & fixes
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmZBtA0RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gQEw//WiCiV7zTlWShSiG/g8GTfoAvl53QTWXF
 0jQ8TUcoIhxB5VeGgxVG1srYt8f505UXjH7L0MJLrbC3nOgRCg4NK57WiQEachKK
 HORIJHT0tMMsKIwX9D5Ovo4xYJn+j7mv7j/caB+hIlzZAbWk+zZPNWcS84p0ZS/4
 appY6RIcp7+cI7bisNMGUuNZS14+WMdWoX3TgoI6ekgDZ7Ky+kQvkwGEMBXsNElO
 qZOj6yS/QUE4Htwz0tVfd6h5svoPM/VJMIvl0yfddPGurfNw6jEh/fjcXnLdAzZ6
 9mgcosETncQbm0vfSac116lrrZIR9ygXW/yXP5S7I5dt+r+5pCrBZR2E5g7U4Ezp
 GjX1+6J9U6r6y12AMLRjadFOcDvxdwtszhZq4/wAcmS3B9dvupnH/w7zqY9ho3wr
 hTdtDHoAIzxJh7RNEHgeUC0/yQX3wJ9THzfYltDRIIjHTuvl4d5lHgsug+4Y9ClE
 pUIQm/XKouweQN9TZz2ULle4ZhRrR9sM9QfZYfirJ/RppmuKool4riWyQFQNHLCy
 mBRMjFFsTpFIOoZXU6pD4EabOpWdNrRRuND/0yg3WbDat2gBWq6jvSFv2UN1/v7i
 Un5jijTuN7t8yP5lY5Tyf47kQfLlA9bUx1v56KnF9mrpI87FyiDD3MiQVhDsvpGX
 rP96BIOrkSo=
 =obph
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:

 - Add cpufreq pressure feedback for the scheduler

 - Rework misfit load-balancing wrt affinity restrictions

 - Clean up and simplify the code around ::overutilized and
   ::overload access.

 - Simplify sched_balance_newidle()

 - Bump SCHEDSTAT_VERSION to 16 due to a cleanup of CPU_MAX_IDLE_TYPES
   handling that changed the output.

 - Rework & clean up <asm/vtime.h> interactions wrt arch_vtime_task_switch()

 - Reorganize, clean up and unify most of the higher level
   scheduler balancing function names around the sched_balance_*()
   prefix

 - Simplify the balancing flag code (sched_balance_running)

 - Miscellaneous cleanups & fixes

* tag 'sched-core-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
  sched/pelt: Remove shift of thermal clock
  sched/cpufreq: Rename arch_update_thermal_pressure() => arch_update_hw_pressure()
  thermal/cpufreq: Remove arch_update_thermal_pressure()
  sched/cpufreq: Take cpufreq feedback into account
  cpufreq: Add a cpufreq pressure feedback for the scheduler
  sched/fair: Fix update of rd->sg_overutilized
  sched/vtime: Do not include <asm/vtime.h> header
  s390/irq,nmi: Include <asm/vtime.h> header directly
  s390/vtime: Remove unused __ARCH_HAS_VTIME_TASK_SWITCH leftover
  sched/vtime: Get rid of generic vtime_task_switch() implementation
  sched/vtime: Remove confusing arch_vtime_task_switch() declaration
  sched/balancing: Simplify the sg_status bitmask and use separate ->overloaded and ->overutilized flags
  sched/fair: Rename set_rd_overutilized_status() to set_rd_overutilized()
  sched/fair: Rename SG_OVERLOAD to SG_OVERLOADED
  sched/fair: Rename {set|get}_rd_overload() to {set|get}_rd_overloaded()
  sched/fair: Rename root_domain::overload to ::overloaded
  sched/fair: Use helper functions to access root_domain::overload
  sched/fair: Check root_domain::overload value before update
  sched/fair: Combine EAS check with root_domain::overutilized access
  sched/fair: Simplify the continue_balancing logic in sched_balance_newidle()
  ...
2024-05-13 17:18:51 -07:00
Jakub Kicinski 6e62702feb bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZkGcZAAKCRDbK58LschI
 g6o6APwLsqhrM2w71VUN5ciCxu4H5VDtZp6wkdqtVbxxU4qNxQEApKgYgKt8ZLF3
 Kily5c7m+S4ZXhMX21rb8JhSAz0dfQk=
 =5Dk7
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2024-05-13

We've added 119 non-merge commits during the last 14 day(s) which contain
a total of 134 files changed, 9462 insertions(+), 4742 deletions(-).

The main changes are:

1) Add BPF JIT support for 32-bit ARCv2 processors, from Shahab Vahedi.

2) Add BPF range computation improvements to the verifier in particular
   around XOR and OR operators, refactoring of checks for range computation
   and relaxing MUL range computation so that src_reg can also be an unknown
   scalar, from Cupertino Miranda.

3) Add support to attach kprobe BPF programs through kprobe_multi link in
   a session mode, meaning, a BPF program is attached to both function entry
   and return, the entry program can decide if the return program gets
   executed and the entry program can share u64 cookie value with return
   program. Session mode is a common use-case for tetragon and bpftrace,
   from Jiri Olsa.

4) Fix a potential overflow in libbpf's ring__consume_n() and improve libbpf
   as well as BPF selftest's struct_ops handling, from Andrii Nakryiko.

5) Improvements to BPF selftests in context of BPF gcc backend,
   from Jose E. Marchesi & David Faust.

6) Migrate remaining BPF selftest tests from test_sock_addr.c to prog_test-
   -style in order to retire the old test, run it in BPF CI and additionally
   expand test coverage, from Jordan Rife.

7) Big batch for BPF selftest refactoring in order to remove duplicate code
   around common network helpers, from Geliang Tang.

8) Another batch of improvements to BPF selftests to retire obsolete
   bpf_tcp_helpers.h as everything is available vmlinux.h,
   from Martin KaFai Lau.

9) Fix BPF map tear-down to not walk the map twice on free when both timer
   and wq is used, from Benjamin Tissoires.

10) Fix BPF verifier assumptions about socket->sk that it can be non-NULL,
    from Alexei Starovoitov.

11) Change BTF build scripts to using --btf_features for pahole v1.26+,
    from Alan Maguire.

12) Small improvements to BPF reusing struct_size() and krealloc_array(),
    from Andy Shevchenko.

13) Fix s390 JIT to emit a barrier for BPF_FETCH instructions,
    from Ilya Leoshkevich.

14) Extend TCP ->cong_control() callback in order to feed in ack and
    flag parameters and allow write-access to tp->snd_cwnd_stamp
    from BPF program, from Miao Xu.

15) Add support for internal-only per-CPU instructions to inline
    bpf_get_smp_processor_id() helper call for arm64 and riscv64 BPF JITs,
    from Puranjay Mohan.

16) Follow-up to remove the redundant ethtool.h from tooling infrastructure,
    from Tushar Vyavahare.

17) Extend libbpf to support "module:<function>" syntax for tracing
    programs, from Viktor Malik.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (119 commits)
  bpf: make list_for_each_entry portable
  bpf: ignore expected GCC warning in test_global_func10.c
  bpf: disable strict aliasing in test_global_func9.c
  selftests/bpf: Free strdup memory in xdp_hw_metadata
  selftests/bpf: Fix a few tests for GCC related warnings.
  bpf: avoid gcc overflow warning in test_xdp_vlan.c
  tools: remove redundant ethtool.h from tooling infra
  selftests/bpf: Expand ATTACH_REJECT tests
  selftests/bpf: Expand getsockname and getpeername tests
  sefltests/bpf: Expand sockaddr hook deny tests
  selftests/bpf: Expand sockaddr program return value tests
  selftests/bpf: Retire test_sock_addr.(c|sh)
  selftests/bpf: Remove redundant sendmsg test cases
  selftests/bpf: Migrate ATTACH_REJECT test cases
  selftests/bpf: Migrate expected_attach_type tests
  selftests/bpf: Migrate wildcard destination rewrite test
  selftests/bpf: Migrate sendmsg6 v4 mapped address tests
  selftests/bpf: Migrate sendmsg deny test cases
  selftests/bpf: Migrate WILDCARD_IP test
  selftests/bpf: Handle SYSCALL_EPERM and SYSCALL_ENOTSUPP test cases
  ...
====================

Link: https://lore.kernel.org/r/20240513134114.17575-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13 16:41:10 -07:00
Linus Torvalds 8815da98e0 Another not-too-busy cycle for documentation, including:
- Some build-system changes to detect the variable fonts installed by some
   distributions that can break the PDF build.
 
 - Various updates and additions to the Spanish, Chinese, Italian, and
   Japanese translations.
 
 - Update the stable-kernel rules to match modern practice
 
 ...and the usual array of corrections, updates, and typo fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmY9ASYACgkQF0NaE2wM
 flhPAwf/SYwHTBhKo0Xy3WsY3PHm4hsYVDwQ/Nfr6oa1mF+x4npxcN1RzPJd8iB9
 zXlynnBkptwvEoukJV2hw+gVwO9ixyqJzIt7AmRFgA5cywhklpxQQAVelQG4ISR2
 8M7LOXIjROJdY3OymPcQ2YF1m000tB9Khx7uvWrvMZEasXND/ITi9mFIJiOk841C
 5wGTHmYKjJwuqTm6CsghAgLJkRYGHD+gtp4w8wQwQzIHJ6B8SnbVPSnYYqJ8Qt/V
 31AEBgV3WJhmNiyNgP/p3rtDTCXBowSK8klOMa5CW3FQEIb4SQL/uBZ8qR8FQo2c
 l1zsuPKKJOqe9T+POWHXdjoryZn1Ug==
 =8fUD
 -----END PGP SIGNATURE-----

Merge tag 'docs-6.10' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "Another not-too-busy cycle for documentation, including:

   - Some build-system changes to detect the variable fonts installed by
     some distributions that can break the PDF build.

   - Various updates and additions to the Spanish, Chinese, Italian, and
     Japanese translations.

   - Update the stable-kernel rules to match modern practice

  ... and the usual array of corrections, updates, and typo fixes"

* tag 'docs-6.10' of git://git.lwn.net/linux: (42 commits)
  cgroup: Add documentation for missing zswap memory.stat
  kernel-doc: Added "*" in $type_constants2 to fix 'make htmldocs' warning.
  docs:core-api: fixed typos and grammar in printk-index page
  Documentation: tracing: Fix spelling mistakes
  docs/zh_CN/rust: Update the translation of quick-start to 6.9-rc4
  docs/zh_CN/rust: Update the translation of general-information to 6.9-rc4
  docs/zh_CN/rust: Update the translation of coding-guidelines to 6.9-rc4
  docs/zh_CN/rust: Update the translation of arch-support to 6.9-rc4
  docs: stable-kernel-rules: fix typo sent->send
  docs/zh_CN: remove two inconsistent spaces
  docs: scripts/check-variable-fonts.sh: Improve commands for detection
  docs: stable-kernel-rules: create special tag to flag 'no backporting'
  docs: stable-kernel-rules: explain use of stable@kernel.org (w/o @vger.)
  docs: stable-kernel-rules: remove code-labels tags and a indention level
  docs: stable-kernel-rules: call mainline by its name and change example
  docs: stable-kernel-rules: reduce redundancy
  docs, kprobes: Add riscv as supported architecture
  Docs: typos/spelling
  docs: kernel_include.py: Cope with docutils 0.21
  docs: ja_JP/howto: Catch up update in v6.8
  ...
2024-05-13 10:51:53 -07:00
Linus Torvalds c024814828 Hi,
This is pull request for trusted keys subsystem containing a new key
 type for the Data Co-Processor (DCP), which is an IP core built into
 many NXP SoCs such as i.mx6ull.
 
 BR, Jarkko
 -----BEGIN PGP SIGNATURE-----
 
 iJYEABYKAD4WIQRE6pSOnaBC00OEHEIaerohdGur0gUCZjzswCAcamFya2tvLnNh
 a2tpbmVuQGxpbnV4LmludGVsLmNvbQAKCRAaerohdGur0iVQAP9lxVjTKjMHQB01
 KFAXUogNU42JuJjzEiC5TaDxFPNHlAEAqVBYnPIZdP4VMF3UalVgIu/eRfvxTW/t
 klC+q7WiEwg=
 =+33z
 -----END PGP SIGNATURE-----

Merge tag 'keys-trusted-next-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd

Pull trusted keys updates from Jarkko Sakkinen:
 "This contains a new key type for the Data Co-Processor (DCP), which is
  an IP core built into many NXP SoCs such as i.mx6ull"

* tag 'keys-trusted-next-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  docs: trusted-encrypted: add DCP as new trust source
  docs: document DCP-backed trusted keys kernel params
  MAINTAINERS: add entry for DCP-based trusted keys
  KEYS: trusted: Introduce NXP DCP-backed trusted keys
  KEYS: trusted: improve scalability of trust source config
  crypto: mxs-dcp: Add support for hardware-bound keys
2024-05-13 10:38:13 -07:00
Illia Ostapyshyn 62158261a8 docs: cgroup-v1: Update page cache removal functions
Commit 452e9e6992 ("filemap: Add filemap_remove_folio and
__filemap_remove_folio") reimplemented __delete_from_page_cache() as
__filemap_remove_folio() and delete_from_page_cache() as
filemap_remove_folio().  The compatibility wrappers were finally removed
in ece62684dc ("hugetlbfs: convert hugetlb_delete_from_page_cache() to
use folios") and 6ffcd825e7 ("mm: Remove __delete_from_page_cache()").

Update the remaining references to dead functions in the memcg
implementation memo.

Signed-off-by: Illia Ostapyshyn <illia@yshyn.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-05-13 07:00:43 -10:00
Linus Torvalds c0b9620bc3 RCU pull request for v6.10
This pull request contains the following branches:
 
 fixes.2024.04.15a: Fix a lockdep complain for lazy-preemptible kernel,
 remove redundant BH disable for TINY_RCU, remove redundant READ_ONCE()
 in tree.c, fix false positives KCSAN splat and fix buffer overflow in
 the print_cpu_stall_info().
 
 misc.2024.04.12a: Misc updates related to bpf, tracing and update the
 MAINTAINERS file.
 
 rcu-sync-normal-improve.2024.04.15a: An improvement of a normal
 synchronize_rcu() call in terms of latency. It maintains a separate
 track for sync. users only. This approach bypasses per-cpu nocb-lists
 thus sync-users do not depend on nocb-list length and how fast regular
 callbacks are processed.
 
 rcu-tasks.2024.04.15a: RCU tasks, switch tasks RCU grace periods to
 sleep at TASK_IDLE priority, fix some comments, add some diagnostic
 warning to the exit_tasks_rcu_start() and fix a buffer overflow in
 the show_rcu_tasks_trace_gp_kthread().
 
 rcutorture.2024.04.15a: Increase memory to guest OS, fix a Tasks
 Rude RCU testing, some updates for TREE09, dump mode information
 to debug GP kthread state, remove redundant READ_ONCE(), fix some
 comments about RCU_TORTURE_PIPE_LEN and pipe_count, remove some
 redundant pointer initialization, fix a hung splat task by when
 the rcutorture tests start to exit, fix invalid context warning,
 add '--do-kvfree' parameter to torture test and use slow register
 unregister callbacks only for rcutype test.
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEEu6QRe/mAUYNn5U0PBYqkjnKWLM8FAmYzsmUACgkQBYqkjnKW
 LM/FAwv+LcIJ9lO/wzUpnH3d3djBOPmyu7Us8ERNY5lcVZ+neS2m3vxq0kOk/cnV
 RGgZc7qjWqMQ9hAx/MmIodmiw036ceRDe5CP/Ec/TYx68m+NPG3VnP08s/xLXLlx
 n8aSJJu37y0ElMQMwvuQaoNJ2xqlZ8AHCR6iaqJtzmPBR6zHLyeCPVpdPJQfcSO7
 +9ABzqo8isGxeuaAE7y0WUp0ZsSpdYvdext5SStjtvZ+hKERdVluhBF+OxZIZByp
 RSBoZJrbTKKpzTUBSE0ci+mlfqBPmSVjjqvygscuwOoKhm+601E51DYb1QXkGujq
 vuc1f/c7VjTAXyvs9k4An2x3XcN5SFhA6Bhc+L6aU/UJBzAWrJJkVOwS79gHNSn1
 qshyhpDLE8MiBEi0QxaEmBZLkz3BX1aYbQA0+5wvgoz0u8QglrpRrPRIWUWC0wvq
 SOLIibZkJuPUOZuD5AP4tg80swTuSCvyWuiKUVRnJK9FsYKdcyNUCnOLIwUzQlrg
 1/hatlvS
 =cq8V
 -----END PGP SIGNATURE-----

Merge tag 'rcu.next.v6.10' of https://github.com/urezki/linux

Pull RCU updates from Uladzislau Rezki:

 - Fix a lockdep complain for lazy-preemptible kernel, remove redundant
   BH disable for TINY_RCU, remove redundant READ_ONCE() in tree.c, fix
   false positives KCSAN splat and fix buffer overflow in the
   print_cpu_stall_info().

 - Misc updates related to bpf, tracing and update the MAINTAINERS file.

 - An improvement of a normal synchronize_rcu() call in terms of
   latency. It maintains a separate track for sync. users only. This
   approach bypasses per-cpu nocb-lists thus sync-users do not depend on
   nocb-list length and how fast regular callbacks are processed.

 - RCU tasks: switch tasks RCU grace periods to sleep at TASK_IDLE
   priority, fix some comments, add some diagnostic warning to the
   exit_tasks_rcu_start() and fix a buffer overflow in the
   show_rcu_tasks_trace_gp_kthread().

 - RCU torture: Increase memory to guest OS, fix a Tasks Rude RCU
   testing, some updates for TREE09, dump mode information to debug GP
   kthread state, remove redundant READ_ONCE(), fix some comments about
   RCU_TORTURE_PIPE_LEN and pipe_count, remove some redundant pointer
   initialization, fix a hung splat task by when the rcutorture tests
   start to exit, fix invalid context warning, add '--do-kvfree'
   parameter to torture test and use slow register unregister callbacks
   only for rcutype test.

* tag 'rcu.next.v6.10' of https://github.com/urezki/linux: (48 commits)
  rcutorture: Use rcu_gp_slow_register/unregister() only for rcutype test
  torture: Scale --do-kvfree test time
  rcutorture: Fix invalid context warning when enable srcu barrier testing
  rcutorture: Make stall-tasks directly exit when rcutorture tests end
  rcutorture: Removing redundant function pointer initialization
  rcutorture: Make rcutorture support print rcu-tasks gp state
  rcutorture: Use the gp_kthread_dbg operation specified by cur_ops
  rcutorture: Re-use value stored to ->rtort_pipe_count instead of re-reading
  rcutorture: Fix rcu_torture_one_read() pipe_count overflow comment
  rcutorture: Remove extraneous rcu_torture_pipe_update_one() READ_ONCE()
  rcu: Allocate WQ with WQ_MEM_RECLAIM bit set
  rcu: Support direct wake-up of synchronize_rcu() users
  rcu: Add a trace event for synchronize_rcu_normal()
  rcu: Reduce synchronize_rcu() latency
  rcu: Fix buffer overflow in print_cpu_stall_info()
  rcu: Mollify sparse with RCU guard
  rcu-tasks: Fix show_rcu_tasks_trace_gp_kthread buffer overflow
  rcu-tasks: Fix the comments for tasks_rcu_exit_srcu_stall_timer
  rcu-tasks: Replace exit_tasks_rcu_start() initialization with WARN_ON_ONCE()
  rcu: Remove redundant CONFIG_PROVE_RCU #if condition
  ...
2024-05-13 09:49:06 -07:00
Linus Torvalds d65e1a0f30 - Store AP Query Configuration Information in a static buffer
- Rework the AP initialization and add missing cleanups to the error path
 
 - Swap IRQ and AP bus/device registration to avoid race conditions
 
 - Export prot_virt_guest symbol
 
 - Introduce AP configuration changes notifier interface to facilitate
   modularization of the AP bus
 
 - Add CONFIG_AP kernel configuration option to allow modularization of
   the AP bus
 
 - Rework CONFIG_ZCRYPT_DEBUG kernel configuration option description and
   dependency and rename it to CONFIG_AP_DEBUG
 
 - Convert sprintf() and snprintf() to sysfs_emit() in CIO code
 
 - Adjust indentation of RELOCS command build step
 
 - Make crypto performance counters upward compatible
 
 - Convert make_page_secure() and gmap_make_secure() to use folio
 
 - Rework channel-utilization-block (CUB) handling in preparation of
   introducing additional CUBs
 
 - Use attribute groups to simplify registration, removal and extension
   of measurement-related channel-path sysfs attributes
 
 - Add a per-channel-path binary "ext_measurement" sysfs attribute that
   provides access to extended channel-path measurement data
 
 - Export measurement data for all channel-measurement-groups (CMG), not
   only for a specific ones. This enables support of new CMG data formats
   in userspace without the need for kernel changes
 
 - Add a per-channel-path sysfs attribute "speed_bps" that provides the
   operating speed in bits per second or 0 if the operating speed is not
   available
 
 - The CIO tracepoint subchannel-type field "st" is incorrectly set to
   the value of subchannel-enabled SCHIB "ena" field. Fix that
 
 - Do not forcefully limit vmemmap starting address to MAX_PHYSMEM_BITS
 
 - Consider the maximum physical address available to a DCSS segment
   (512GB) when memory layout is set up
 
 - Simplify the virtual memory layout setup by reducing the size of
   identity mapping vs vmemmap overlap
 
 - Swap vmalloc and Lowcore/Real Memory Copy areas in virtual memory.
   This will allow to place the kernel image next to kernel modules
 
 - Move everyting KASLR related from <asm/setup.h> to <asm/page.h>
 
 - Put virtual memory layout information into a structure to improve
   code generation
 
 - Currently __kaslr_offset is the kernel offset in both physical and
   virtual memory spaces. Uncouple these offsets to allow uncoupling
   of the addresses spaces
 
 - Currently the identity mapping base address is implicit and is always
   set to zero. Make it explicit by putting into __identity_base persistent
   boot variable and use it in proper context
 
 - Introduce .amode31 section start and end macros AMODE31_START and
   AMODE31_END
 
 - Introduce OS_INFO entries that do not reference any data in memory,
   but rather provide only values
 
 - Store virtual memory layout in OS_INFO. It is read out by makedumpfile,
   crash and other tools
 
 - Store virtual memory layout in VMCORE_INFO. It is read out by crash and
   other tools when /proc/kcore device is used
 
 - Create additional PT_LOAD ELF program header that covers kernel image
   only, so that vmcore tools could locate kernel text and data when virtual
   and physical memory spaces are uncoupled
 
 - Uncouple physical and virtual address spaces
 
 - Map kernel at fixed location when KASLR mode is disabled. The location is
   defined by CONFIG_KERNEL_IMAGE_BASE kernel configuration value.
 
 - Rework deployment of kernel image for both compressed and uncompressed
   variants as defined by CONFIG_KERNEL_UNCOMPRESSED kernel configuration
   value
 
 - Move .vmlinux.relocs section in front of the compressed kernel.
   The interim section rescue step is avoided as result
 
 - Correct modules thunk offset calculation when branch target is more
   than 2GB away
 
 - Kernel modules contain their own set of expoline thunks. Now that the
   kernel modules area is less than 4GB away from kernel expoline thunks,
   make modules use kernel expolines. Also make EXPOLINE_EXTERN the default
   if the compiler supports it
 
 - userfaultfd can insert shared zeropages into processes running VMs,
   but that is not allowed for s390. Fallback to allocating a fresh
   zeroed anonymous folio and insert that instead
 
 - Re-enable shared zeropages for non-PV and non-skeys KVM guests
 
 - Rename hex2bitmap() to ap_hex2bitmap() and export it for external use
 
 - Add ap_config sysfs attribute to provide the means for setting or
   displaying adapters, domains and control domains assigned to a vfio-ap
   mediated device in a single operation
 
 - Make vfio_ap_mdev_link_queue() ignore duplicate link requests
 
 - Add write support to ap_config sysfs attribute to allow atomic update
   a vfio-ap mediated device state
 
 - Document ap_config sysfs attribute
 
 - Function os_info_old_init() is expected to be called only from a regular
   kdump kernel. Enable it to be called from a stand-alone dump kernel
 
 - Address gcc -Warray-bounds warning and fix array size in struct os_info
 
 - s390 does not support SMBIOS, so drop unneeded CONFIG_DMI checks
 
 - Use unwinder instead of __builtin_return_address() with ftrace to
   prevent returning of undefined values
 
 - Sections .hash and .gnu.hash are only created when CONFIG_PIE_BUILD
   kernel is enabled. Drop these for the case CONFIG_PIE_BUILD is disabled
 
 - Compile kernel with -fPIC and link with -no-pie to allow kpatch feature
   always succeed and drop the whole CONFIG_PIE_BUILD option-enabled code
 
 - Add missing virt_to_phys() converter for VSIE facility and crypto
   control blocks
 -----BEGIN PGP SIGNATURE-----
 
 iI0EABYIADUWIQQrtrZiYVkVzKQcYivNdxKlNrRb8AUCZjkp5xccYWdvcmRlZXZA
 bGludXguaWJtLmNvbQAKCRDNdxKlNrRb8D99AQCEby+KHssuZe9m0NvvikWREYBC
 myqob4EmdU3KdTEbNAEAt2OB7mzSQc90yjawI+Je7vwVyh3uc2Nb4Qg05yO6owI=
 =eOYN
 -----END PGP SIGNATURE-----

Merge tag 's390-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Alexander Gordeev:

 - Store AP Query Configuration Information in a static buffer

 - Rework the AP initialization and add missing cleanups to the error
   path

 - Swap IRQ and AP bus/device registration to avoid race conditions

 - Export prot_virt_guest symbol

 - Introduce AP configuration changes notifier interface to facilitate
   modularization of the AP bus

 - Add CONFIG_AP kernel configuration option to allow modularization of
   the AP bus

 - Rework CONFIG_ZCRYPT_DEBUG kernel configuration option description
   and dependency and rename it to CONFIG_AP_DEBUG

 - Convert sprintf() and snprintf() to sysfs_emit() in CIO code

 - Adjust indentation of RELOCS command build step

 - Make crypto performance counters upward compatible

 - Convert make_page_secure() and gmap_make_secure() to use folio

 - Rework channel-utilization-block (CUB) handling in preparation of
   introducing additional CUBs

 - Use attribute groups to simplify registration, removal and extension
   of measurement-related channel-path sysfs attributes

 - Add a per-channel-path binary "ext_measurement" sysfs attribute that
   provides access to extended channel-path measurement data

 - Export measurement data for all channel-measurement-groups (CMG), not
   only for a specific ones. This enables support of new CMG data
   formats in userspace without the need for kernel changes

 - Add a per-channel-path sysfs attribute "speed_bps" that provides the
   operating speed in bits per second or 0 if the operating speed is not
   available

 - The CIO tracepoint subchannel-type field "st" is incorrectly set to
   the value of subchannel-enabled SCHIB "ena" field. Fix that

 - Do not forcefully limit vmemmap starting address to MAX_PHYSMEM_BITS

 - Consider the maximum physical address available to a DCSS segment
   (512GB) when memory layout is set up

 - Simplify the virtual memory layout setup by reducing the size of
   identity mapping vs vmemmap overlap

 - Swap vmalloc and Lowcore/Real Memory Copy areas in virtual memory.
   This will allow to place the kernel image next to kernel modules

 - Move everyting KASLR related from <asm/setup.h> to <asm/page.h>

 - Put virtual memory layout information into a structure to improve
   code generation

 - Currently __kaslr_offset is the kernel offset in both physical and
   virtual memory spaces. Uncouple these offsets to allow uncoupling of
   the addresses spaces

 - Currently the identity mapping base address is implicit and is always
   set to zero. Make it explicit by putting into __identity_base
   persistent boot variable and use it in proper context

 - Introduce .amode31 section start and end macros AMODE31_START and
   AMODE31_END

 - Introduce OS_INFO entries that do not reference any data in memory,
   but rather provide only values

 - Store virtual memory layout in OS_INFO. It is read out by
   makedumpfile, crash and other tools

 - Store virtual memory layout in VMCORE_INFO. It is read out by crash
   and other tools when /proc/kcore device is used

 - Create additional PT_LOAD ELF program header that covers kernel image
   only, so that vmcore tools could locate kernel text and data when
   virtual and physical memory spaces are uncoupled

 - Uncouple physical and virtual address spaces

 - Map kernel at fixed location when KASLR mode is disabled. The
   location is defined by CONFIG_KERNEL_IMAGE_BASE kernel configuration
   value.

 - Rework deployment of kernel image for both compressed and
   uncompressed variants as defined by CONFIG_KERNEL_UNCOMPRESSED kernel
   configuration value

 - Move .vmlinux.relocs section in front of the compressed kernel. The
   interim section rescue step is avoided as result

 - Correct modules thunk offset calculation when branch target is more
   than 2GB away

 - Kernel modules contain their own set of expoline thunks. Now that the
   kernel modules area is less than 4GB away from kernel expoline
   thunks, make modules use kernel expolines. Also make EXPOLINE_EXTERN
   the default if the compiler supports it

 - userfaultfd can insert shared zeropages into processes running VMs,
   but that is not allowed for s390. Fallback to allocating a fresh
   zeroed anonymous folio and insert that instead

 - Re-enable shared zeropages for non-PV and non-skeys KVM guests

 - Rename hex2bitmap() to ap_hex2bitmap() and export it for external use

 - Add ap_config sysfs attribute to provide the means for setting or
   displaying adapters, domains and control domains assigned to a
   vfio-ap mediated device in a single operation

 - Make vfio_ap_mdev_link_queue() ignore duplicate link requests

 - Add write support to ap_config sysfs attribute to allow atomic update
   a vfio-ap mediated device state

 - Document ap_config sysfs attribute

 - Function os_info_old_init() is expected to be called only from a
   regular kdump kernel. Enable it to be called from a stand-alone dump
   kernel

 - Address gcc -Warray-bounds warning and fix array size in struct
   os_info

 - s390 does not support SMBIOS, so drop unneeded CONFIG_DMI checks

 - Use unwinder instead of __builtin_return_address() with ftrace to
   prevent returning of undefined values

 - Sections .hash and .gnu.hash are only created when CONFIG_PIE_BUILD
   kernel is enabled. Drop these for the case CONFIG_PIE_BUILD is
   disabled

 - Compile kernel with -fPIC and link with -no-pie to allow kpatch
   feature always succeed and drop the whole CONFIG_PIE_BUILD
   option-enabled code

 - Add missing virt_to_phys() converter for VSIE facility and crypto
   control blocks

* tag 's390-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (54 commits)
  Revert "s390: Relocate vmlinux ELF data to virtual address space"
  KVM: s390: vsie: Use virt_to_phys for crypto control block
  s390: Relocate vmlinux ELF data to virtual address space
  s390: Compile kernel with -fPIC and link with -no-pie
  s390: vmlinux.lds.S: Drop .hash and .gnu.hash for !CONFIG_PIE_BUILD
  s390/ftrace: Use unwinder instead of __builtin_return_address()
  s390/pci: Drop unneeded reference to CONFIG_DMI
  s390/os_info: Fix array size in struct os_info
  s390/os_info: Initialize old os_info in standalone dump kernel
  docs: Update s390 vfio-ap doc for ap_config sysfs attribute
  s390/vfio-ap: Add write support to sysfs attr ap_config
  s390/vfio-ap: Ignore duplicate link requests in vfio_ap_mdev_link_queue
  s390/vfio-ap: Add sysfs attr, ap_config, to export mdev state
  s390/ap: Externalize AP bus specific bitmap reading function
  s390/mm: Re-enable the shared zeropage for !PV and !skeys KVM guests
  mm/userfaultfd: Do not place zeropages when zeropages are disallowed
  s390/expoline: Make modules use kernel expolines
  s390/nospec: Correct modules thunk offset calculation
  s390/boot: Do not rescue .vmlinux.relocs section
  s390/boot: Rework deployment of the kernel image
  ...
2024-05-13 08:33:52 -07:00
Shahab Vahedi f122668ddc ARC: Add eBPF JIT support
This will add eBPF JIT support to the 32-bit ARCv2 processors. The
implementation is qualified by running the BPF tests on a Synopsys HSDK
board with "ARC HS38 v2.1c at 500 MHz" as the 4-core CPU.

The test_bpf.ko reports 2-10 fold improvements in execution time of its
tests. For instance:

test_bpf: #33 tcpdump port 22 jited:0 704 1766 2104 PASS
test_bpf: #33 tcpdump port 22 jited:1 120  224  260 PASS

test_bpf: #141 ALU_DIV_X: 4294967295 / 4294967295 = 1 jited:0 238 PASS
test_bpf: #141 ALU_DIV_X: 4294967295 / 4294967295 = 1 jited:1  23 PASS

test_bpf: #776 JMP32_JGE_K: all ... magnitudes jited:0 2034681 PASS
test_bpf: #776 JMP32_JGE_K: all ... magnitudes jited:1 1020022 PASS

Deployment and structure
------------------------
The related codes are added to "arch/arc/net":

- bpf_jit.h       -- The interface that a back-end translator must provide
- bpf_jit_core.c  -- Knows how to handle the input eBPF byte stream
- bpf_jit_arcv2.c -- The back-end code that knows the translation logic

The bpf_int_jit_compile() at the end of bpf_jit_core.c is the entrance
to the whole process. Normally, the translation is done in one pass,
namely the "normal pass". In case some relocations are not known during
this pass, some data (arc_jit_data) is allocated for the next pass to
come. This possible next (and last) pass is called the "extra pass".

1. Normal pass       # The necessary pass
     1a. Dry run       # Get the whole JIT length, epilogue offset, etc.
     1b. Emit phase    # Allocate memory and start emitting instructions
2. Extra pass        # Only needed if there are relocations to be fixed
     2a. Patch relocations

Support status
--------------
The JIT compiler supports BPF instructions up to "cpu=v4". However, it
does not yet provide support for:

- Tail calls
- Atomic operations
- 64-bit division/remainder
- BPF_PROBE_MEM* (exception table)

The result of "test_bpf" test suite on an HSDK board is:

hsdk-lnx# insmod test_bpf.ko test_suite=test_bpf

  test_bpf: Summary: 863 PASSED, 186 FAILED, [851/851 JIT'ed]

All the failing test cases are due to the ones that were not JIT'ed.
Categorically, they can be represented as:

  .-----------.------------.-------------.
  | test type |   opcodes  | # of cases  |
  |-----------+------------+-------------|
  | atomic    | 0xC3, 0xDB |         149 |
  | div64     | 0x37, 0x3F |          22 |
  | mod64     | 0x97, 0x9F |          15 |
  `-----------^------------+-------------|
                           | (total) 186 |
                           `-------------'

Setup: build config
-------------------
The following configs must be set to have a working JIT test:

  CONFIG_BPF_JIT=y
  CONFIG_BPF_JIT_ALWAYS_ON=y
  CONFIG_TEST_BPF=m

The following options are not necessary for the tests module,
but are good to have:

  CONFIG_DEBUG_INFO=y             # prerequisite for below
  CONFIG_DEBUG_INFO_BTF=y         # so bpftool can generate vmlinux.h

  CONFIG_FTRACE=y                 #
  CONFIG_BPF_SYSCALL=y            # all these options lead to
  CONFIG_KPROBE_EVENTS=y          # having CONFIG_BPF_EVENTS=y
  CONFIG_PERF_EVENTS=y            #

Some BPF programs provide data through /sys/kernel/debug:
  CONFIG_DEBUG_FS=y
arc# mount -t debugfs debugfs /sys/kernel/debug

Setup: elfutils
---------------
The libdw.{so,a} library that is used by pahole for processing
the final binary must come from elfutils 0.189 or newer. The
support for ARCv2 [1] has been added since that version.

[1]
https://sourceware.org/git/?p=elfutils.git;a=commit;h=de3d46b3e7

Setup: pahole
-------------
The line below in linux/scripts/Makefile.btf must be commented out:

pahole-flags-$(call test-ge, $(pahole-ver), 121) += --btf_gen_floats

Or else, the build will fail:

$ make V=1
  ...
  BTF     .btf.vmlinux.bin.o
pahole -J --btf_gen_floats                    \
       -j --lang_exclude=rust                 \
       --skip_encoding_btf_inconsistent_proto \
       --btf_gen_optimized .tmp_vmlinux.btf
Complex, interval and imaginary float types are not supported
Encountered error while encoding BTF.
  ...
  BTFIDS  vmlinux
./tools/bpf/resolve_btfids/resolve_btfids vmlinux
libbpf: failed to find '.BTF' ELF section in vmlinux
FAILED: load BTF from vmlinux: No data available

This is due to the fact that the ARC toolchains generate
"complex float" DIE entries in libgcc and at the moment, pahole
can't handle such entries.

Running the tests
-----------------
host$ scp /bld/linux/lib/test_bpf.ko arc:
arc # sysctl net.core.bpf_jit_enable=1
arc # insmod test_bpf.ko test_suite=test_bpf
      ...
      test_bpf: #1048 Staggered jumps: JMP32_JSLE_X jited:1 697811 PASS
      test_bpf: Summary: 863 PASSED, 186 FAILED, [851/851 JIT'ed]

Acknowledgments
---------------
- Claudiu Zissulescu for his unwavering support
- Yuriy Kolerov for testing and troubleshooting
- Vladimir Isaev for the pahole workaround
- Sergey Matyukevich for paving the road by adding the interpreter support

Signed-off-by: Shahab Vahedi <shahab@synopsys.com>
Link: https://lore.kernel.org/r/20240430145604.38592-1-list+bpf@vahedi.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-12 16:51:36 -07:00
Usama Arif db5b4f3253 cgroup: Add documentation for missing zswap memory.stat
This includes zswpin, zswpout and zswpwb.

Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20240502185307.3942173-2-usamaarif642@gmail.com>
2024-05-09 10:54:37 -06:00
David Gstir b85b253e23 docs: document DCP-backed trusted keys kernel params
Document the kernel parameters trusted.dcp_use_otp_key
and trusted.dcp_skip_zk_test for DCP-backed trusted keys.

Co-developed-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Richard Weinberger <richard@nod.at>
Co-developed-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
Signed-off-by: David Gstir <david@sigma-star.at>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2024-05-09 18:29:03 +03:00
Will Deacon 42e7ddbaf1 Merge branch 'for-next/perf' into for-next/core
* for-next/perf: (41 commits)
  arm64: Add USER_STACKTRACE support
  drivers/perf: hisi: hns3: Actually use devm_add_action_or_reset()
  drivers/perf: hisi: hns3: Fix out-of-bound access when valid event group
  drivers/perf: hisi_pcie: Fix out-of-bound access when valid event group
  perf/arm-spe: Assign parents for event_source device
  perf/arm-smmuv3: Assign parents for event_source device
  perf/arm-dsu: Assign parents for event_source device
  perf/arm-dmc620: Assign parents for event_source device
  perf/arm-ccn: Assign parents for event_source device
  perf/arm-cci: Assign parents for event_source device
  perf/alibaba_uncore: Assign parents for event_source device
  perf/arm_pmu: Assign parents for event_source devices
  perf/imx_ddr: Assign parents for event_source devices
  perf/qcom: Assign parents for event_source devices
  Documentation: qcom-pmu: Use /sys/bus/event_source/devices paths
  perf/riscv: Assign parents for event_source devices
  perf/thunderx2: Assign parents for event_source devices
  Documentation: thunderx2-pmu: Use /sys/bus/event_source/devices paths
  perf/xgene: Assign parents for event_source devices
  Documentation: xgene-pmu: Use /sys/bus/event_source/devices paths
  ...
2024-05-09 15:56:10 +01:00
Linus Torvalds 1ab1a19db1 pci-v6.9-fixes-2
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmY61iEUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vwtwA//Zw8a27/+cHuciYCOYMIrjhucBUCc
 qHBdzDWTy+h3gkfbcRFfXs3XaIBhlGbtI1d0GG5FyMuqicxCsF/mCIyc2LSTMIUo
 4201qVl/EGrNIBhOVcZtK+CFQmwmw1AaBdz7q4dS4/549xXGQ+/8DibAjfUlcDgC
 2iIkcvfNW9Hj9n4tFezNSPLewGVgFY2yFpImLHZc2hAuSXQ0P0D9JEDUUVVIWg/c
 PSJQKKita/fxgKk8RRCTRdpVezAtd7QO8V4Ae5gGH+oho4nRvCO0kYGteglx7/ab
 ReNtfNUPJN9h7M5ZYpyiNp1aZTaMEp3P+gMsD9ohV0/+5MNNAiZhDLPguQaEy/2n
 ZiQh5K3vwQb2NStJXauiBqJ+NHeqf8m3mk76X3/hxma6wqDfEOsRvYaexwY+Wxfa
 I0tzjZF1LBepsoFyDJM/5S+3nCJoqaCUAy1ZbGXwsBAAzZHw6x9+ieJfJhnCOL96
 kkNiNlxs8OJTTMl6F8W88NvMnhmCF0JxSOVTfxTaVwCaD6GwpnMNpXSEXqXPlVL1
 jMRHr/hZ7JjHarELC/TGe1uUmPsBhIym862XV+7E+9uUbcWldwjBijuSZQ9zUpX2
 UL0Cc2gJzyh/GwpeDVCyBGaINxzNVq3D5H6rYHlWQP+dp59stt/UBqBWJmerTHjy
 QQ+9+XS/CiElUL8=
 =EQN9
 -----END PGP SIGNATURE-----

Merge tag 'pci-v6.9-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci fixes from Bjorn Helgaas:

 - Update kernel-parameters doc to describe "pcie_aspm=off" more
   accurately (Bjorn Helgaas)

 - Restore the parent's (not the child's) ASPM state to the parent
   during resume, which fixes a reboot during resume (Kai-Heng Feng)

* tag 'pci-v6.9-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  PCI/ASPM: Restore parent state to parent, child state to child
  PCI/ASPM: Clarify that pcie_aspm=off means leave ASPM untouched
2024-05-08 09:37:58 -07:00
Bjorn Helgaas 2e0239d47d PCI/ASPM: Clarify that pcie_aspm=off means leave ASPM untouched
Previously we claimed "pcie_aspm=off" meant that ASPM would be disabled,
which is wrong.

Correct this to say that with "pcie_aspm=off", Linux doesn't touch any ASPM
configuration at all.  ASPM may have been enabled by firmware, and that
will be left unchanged.  See "aspm_support_enabled".

Link: https://lore.kernel.org/r/20240429191821.691726-1-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: David E. Box <david.e.box@linux.intel.com>
2024-05-03 11:45:32 -05:00
Andrea della Porta 1279e8d0dc arm64: Add the arm64.no32bit_el0 command line option
Introducing the field 'el0' to the idreg-override for register
ID_AA64PFR0_EL1. This field is also aliased to the new kernel
command line option 'arm64.no32bit_el0' as a more recognizable
and mnemonic name to disable the execution of 32 bit userspace
applications (i.e. avoid Aarch32 execution state in EL0) from
kernel command line.

Link: https://lore.kernel.org/all/20240207105847.7739-1-andrea.porta@suse.com/
Signed-off-by: Andrea della Porta <andrea.porta@suse.com>
Link: https://lore.kernel.org/r/20240429102833.6426-1-andrea.porta@suse.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-05-03 13:08:06 +01:00
Remington Brasga da51bbcdba Docs: typos/spelling
Fix spelling and grammar in Docs descriptions

Signed-off-by: Remington Brasga <rbrasga@uci.edu>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240429225527.2329-1-rbrasga@uci.edu
2024-05-02 10:02:29 -06:00
Jacob Pan be9be07b22 iommu/vt-d: Make posted MSI an opt-in command line option
Add a command line opt-in option for posted MSI if CONFIG_X86_POSTED_MSI=y.

Also introduce a helper function for testing if posted MSI is supported on
the platform.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240423174114.526704-12-jacob.jun.pan@linux.intel.com
2024-04-30 00:54:43 +02:00
Bingbu Cao ba124c8cf3 media: Documentation: add Intel IPU6 ISYS driver admin-guide doc
This document mainly describe the functionality of IPU6 and IPU6 isys
driver, and gives an example that how user can do imaging capture with
tools.

Signed-off-by: Bingbu Cao <bingbu.cao@intel.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-04-29 14:56:38 +02:00
Linus Torvalds aec147c188 Misc fixes:
- Make the CPU_MITIGATIONS=n interaction with conflicting
    mitigation-enabling boot parameters a bit saner.
 
  - Re-enable CPU mitigations by default on non-x86
 
  - Fix TDX shared bit propagation on mprotect()
 
  - Fix potential show_regs() system hang when PKE
    initialization is not fully finished yet.
 
  - Add the 0x10-0x1f model IDs to the Zen5 range
 
  - Harden #VC instruction emulation some more
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmYuCVMRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1h0Hw/1HVlmRGTrQQBvVMlzt6Y3GlUk2uHSiSh0
 pO57sh9tMu/3kWdcrUi4xkEVHmfBjMxXY5sw/7VXQ9mG7wv+SVgF3gAaAl+5q73K
 JKPPAhkPqUmXP3Sm1rqTt8iZtTViY3ilP6QEZaOIfL2Pwa7X3QP8TJRBKAJCrXEM
 hOEMXSd1W1Escs/uPlhCXHx8TRVTr9f4bv8TdHBXZGHTida5vejj+yhMSdaM94qw
 ywZ4an1NOnLGcNEMMYhOQ6Kbh9Ckj46JRjpodTfmjodLd/jOhVU5C7nTZfHRXSRU
 3UQBZtTZIYYCs8Urg2l/W5IhywWV3P9Jg+D+vl/bdEKJ+yINLAnOgVhVPqeG2GWt
 Ww3FelgRz0AkQKTegRCK2jQWnHActSrYmkr4M24wa/cVkMrcpXT3LHj8PgRnllx5
 q5JqQ37G3QYHMzslbBqyUHzJv8KzgdZdgyFTN3dX1q9n5FPy7Ul9Ue1Zp2SoId8i
 K6u+IjCkftWwIbv8AhXiEVo0ynfBkmV4UNVGJks1xIPA3lmNv3ax5nQMJLvZzJ48
 n+Id8ALEWxyOrKR6bdWdPtJqd0Nw/q4e6AOTzVYE94X8+uVuug4m4X7QPo+Ctbz1
 IkhTxmBbHzgKylbddK6LkdnXnHCGidOmXsF3VS6TRfz7ALaMUgpaHw34reEhiOlT
 xsIw+XVOKg==
 =AfRR
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2024-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:

 - Make the CPU_MITIGATIONS=n interaction with conflicting
   mitigation-enabling boot parameters a bit saner.

 - Re-enable CPU mitigations by default on non-x86

 - Fix TDX shared bit propagation on mprotect()

 - Fix potential show_regs() system hang when PKE initialization
   is not fully finished yet.

 - Add the 0x10-0x1f model IDs to the Zen5 range

 - Harden #VC instruction emulation some more

* tag 'x86-urgent-2024-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  cpu: Ignore "mitigations" kernel parameter if CPU_MITIGATIONS=n
  cpu: Re-enable CPU mitigations by default for !X86 architectures
  x86/tdx: Preserve shared bit on mprotect()
  x86/cpu: Fix check for RDPKRU in __show_regs()
  x86/CPU/AMD: Add models 0x10-0x1f to the Zen5 range
  x86/sev: Check for MWAITX and MONITORX opcodes in the #VC handler
2024-04-28 11:58:16 -07:00
Sean Christopherson ce0abef6a1 cpu: Ignore "mitigations" kernel parameter if CPU_MITIGATIONS=n
Explicitly disallow enabling mitigations at runtime for kernels that were
built with CONFIG_CPU_MITIGATIONS=n, as some architectures may omit code
entirely if mitigations are disabled at compile time.

E.g. on x86, a large pile of Kconfigs are buried behind CPU_MITIGATIONS,
and trying to provide sane behavior for retroactively enabling mitigations
is extremely difficult, bordering on impossible.  E.g. page table isolation
and call depth tracking require build-time support, BHI mitigations will
still be off without additional kernel parameters, etc.

  [ bp: Touchups. ]

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240420000556.2645001-3-seanjc@google.com
2024-04-25 15:47:39 +02:00
Maíra Canal b413f9cd4c mm: Update shuffle documentation to match its current state
Commit 839195352d ("mm/shuffle: remove dynamic reconfiguration")
removed the dynamic reconfiguration capabilities from the shuffle page
allocator. This means that, now, we don't have any perspective of an
"autodetection of memory-side-cache" that triggers the enablement of the
shuffle page allocator.

Therefore, let the documentation reflect that the only way to enable
the shuffle page allocator is by setting `page_alloc.shuffle=1`.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240422142007.1062231-1-mcanal@igalia.com
2024-04-24 13:05:01 -06:00
Thomas Weißschuh 8af2d1ab78 admin-guide/hw-vuln/core-scheduling: fix return type of PR_SCHED_CORE_GET
sched_core_share_pid() copies the cookie to userspace with
put_user(id, (u64 __user *)uaddr), expecting 64 bits of space.
The "unsigned long" datatype that is documented in core-scheduling.rst
however is only 32 bits large on 32 bit architectures.

Document "unsigned long long" as the correct data type that is always
64bits large.

This matches what the selftest cs_prctl_test.c has been doing all along.

Fixes: 0159bb020c ("Documentation: Add usecases, design and interface for core scheduling")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/util-linux/df7a25a0-7923-4f8b-a527-5e6f0064074d@t-8ch.de/
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Chris Hyser <chris.hyser@oracle.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240423-core-scheduling-cookie-v1-1-5753a35f8dfc@weissschuh.net
2024-04-24 13:04:27 -06:00
Vincent Guittot 97450eb909 sched/pelt: Remove shift of thermal clock
The optional shift of the clock used by thermal/hw load avg has been
introduced to handle case where the signal was not always a high frequency
hw signal. Now that cpufreq provides a signal for firmware and
SW pressure, we can remove this exception and always keep this PELT signal
aligned with other signals.
Mark sysctl_sched_migration_cost boot parameter as deprecated

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Qais Yousef <qyousef@layalina.io>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lore.kernel.org/r/20240326091616.3696851-6-vincent.guittot@linaro.org
2024-04-24 12:08:02 +02:00
Linus Torvalds 4d2008430c A set of updates from Thorsten to his (new) guide to verifying bugs and
tracking down regressions.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmYmeN4PHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5YtzcH/1plJvgfTDMWZWoOgNuPSiExWFgqoWk3KyDw
 hqBfQ0p5QvTnC/rQx8mgGP3EqUXegwFHormRT2oSNyCSBHBSMdJU5ZT/WOp4rDpa
 NvjM7FurBhb3b83ieottl6dDxnbUiRy6+z/uayEeV03KEolL7/nljfBozg9282Ep
 p6cc4D5zwNcYdeqEom32AVOPFUUtQAvPhGj4OUHBwjzn7WF2lfzb4z4RudmsyE5E
 1Whd35HEOpKsNlosy9XJ8ZP/QHAb0Nkwdg2irT71lZ+37nUX/KgMxFuJUxviYI9m
 4jtZsjRew7RS/EUtwbUsyD4DTXla0N2H+//ZF5AatRz8RuQKZgQ=
 =A/2u
 -----END PGP SIGNATURE-----

Merge tag 'docs-6.9-fixes2' of git://git.lwn.net/linux

Pull documentation fixes from Jonathan Corbet:
 "A set of updates from Thorsten to his (new) guide to verifying bugs
  and tracking down regressions"

* tag 'docs-6.9-fixes2' of git://git.lwn.net/linux:
  docs: verify/bisect: stable regressions: first stable, then mainline
  docs: verify/bisect: describe how to use a build host
  docs: verify/bisect: explain testing reverts, patches and newer code
  docs: verify/bisect: proper headlines and more spacing
  docs: verify/bisect: add and fetch stable branches ahead of time
  docs: verify/bisect: use git switch, tag kernel, and various fixes
2024-04-22 09:41:03 -07:00
Jonathan Cameron 556da13434 Documentation: qcom-pmu: Use /sys/bus/event_source/devices paths
To allow setting an appropriate parent for the struct pmu device
remove existing references to /sys/devices/ path.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240412161057.14099-14-Jonathan.Cameron@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-04-19 15:59:30 +01:00
Jonathan Cameron 90b4a1a927 Documentation: thunderx2-pmu: Use /sys/bus/event_source/devices paths
To allow setting an appropriate parent for the struct pmu device
remove existing references to /sys/devices/ path.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240412161057.14099-11-Jonathan.Cameron@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-04-19 15:59:29 +01:00
Jonathan Cameron 867ba6d204 Documentation: xgene-pmu: Use /sys/bus/event_source/devices paths
To allow setting an appropriate parent for the struct pmu device
remove existing references to /sys/devices/ path.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240412161057.14099-9-Jonathan.Cameron@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-04-19 15:59:29 +01:00
Jonathan Cameron eff6af5313 Documentation: hns-pmu: Use /sys/bus/event_source/devices paths
To allow setting an appropriate parent for the struct pmu device
remove existing references to /sys/devices/ path.

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240412161057.14099-5-Jonathan.Cameron@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-04-19 15:59:28 +01:00
Jonathan Cameron d0412b6ecb Documentation: hisi-pmu: Drop reference to /sys/devices path
Having assigned a parent to the device, the suggested path is
no longer valid.  As /sys/bus/event_sources based path is also
provided, simply drop mention of alternative.

Reviewed-by: Yicong Yang <yangyicong@hisilicon.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240412161057.14099-3-Jonathan.Cameron@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-04-19 15:59:28 +01:00
Xiu Jianfeng c9169291be docs, cgroup: add entries for pids to cgroup-v2.rst
This patch add two entries (pids.peak and pids.events) for pids
controller, and also update pids.current because it's on non-root.

Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-04-18 06:00:58 -10:00
Alexander Gordeev 54f2ecc318 s390: Map kernel at fixed location when KASLR is disabled
Since kernel virtual and physical address spaces are
uncoupled the kernel is mapped at the top of the virtual
address space in case KASLR is disabled.

That does not pose any issue with regard to the kernel
booting and operation, but makes it difficult to use a
generated vmlinux with some debugging tools (e.g. gdb),
because the exact location of the kernel image in virtual
memory is unknown. Make that location known and introduce
CONFIG_KERNEL_IMAGE_BASE configuration option.

A custom CONFIG_KERNEL_IMAGE_BASE value that would break
the virtual memory layout leads to a build error.

The kernel image size is defined by KERNEL_IMAGE_SIZE
macro and set to 512 MB, by analogy with x86.

Suggested-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-04-17 13:38:02 +02:00
Mikulas Patocka 5268de78e1 dm-crypt: add the optional "high_priority" flag
When WQ_HIGHPRI was used for the dm-crypt kcryptd workqueue it was
reported that dm-crypt performs badly when the system is loaded[1].
Because of reports of audio skipping, dm-crypt stopped using
WQ_HIGHPRI with commit f612b2132d (Revert "dm crypt: use WQ_HIGHPRI
for the IO and crypt workqueues").

But it has since been determined that WQ_HIGHPRI provides improved
performance (with reduced latency) for highend systems with much more
resources than those laptop/desktop users which suffered from the use
of WQ_HIGHPRI.

As such, add an option "high_priority" that allows the use of
WQ_HIGHPRI for dm-crypt's workqueues and also sets the write_thread to
nice level MIN_NICE (-20). This commit makes it optional, so that
normal users won't be harmed by it.

[1] https://listman.redhat.com/archives/dm-devel/2023-February/053410.html

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2024-04-16 11:34:47 -04:00
Uladzislau Rezki (Sony) 988f569ae0 rcu: Reduce synchronize_rcu() latency
A call to a synchronize_rcu() can be optimized from a latency
point of view. Workloads which depend on this can benefit of it.

The delay of wakeme_after_rcu() callback, which unblocks a waiter,
depends on several factors:

- how fast a process of offloading is started. Combination of:
    - !CONFIG_RCU_NOCB_CPU/CONFIG_RCU_NOCB_CPU;
    - !CONFIG_RCU_LAZY/CONFIG_RCU_LAZY;
    - other.
- when started, invoking path is interrupted due to:
    - time limit;
    - need_resched();
    - if limit is reached.
- where in a nocb list it is located;
- how fast previous callbacks completed;

Example:

1. On our embedded devices i can easily trigger the scenario when
it is a last in the list out of ~3600 callbacks:

<snip>
  <...>-29      [001] d..1. 21950.145313: rcu_batch_start: rcu_preempt CBs=3613 bl=28
...
  <...>-29      [001] ..... 21950.152578: rcu_invoke_callback: rcu_preempt rhp=00000000b2d6dee8 func=__free_vm_area_struct.cfi_jt
  <...>-29      [001] ..... 21950.152579: rcu_invoke_callback: rcu_preempt rhp=00000000a446f607 func=__free_vm_area_struct.cfi_jt
  <...>-29      [001] ..... 21950.152580: rcu_invoke_callback: rcu_preempt rhp=00000000a5cab03b func=__free_vm_area_struct.cfi_jt
  <...>-29      [001] ..... 21950.152581: rcu_invoke_callback: rcu_preempt rhp=0000000013b7e5ee func=__free_vm_area_struct.cfi_jt
  <...>-29      [001] ..... 21950.152582: rcu_invoke_callback: rcu_preempt rhp=000000000a8ca6f9 func=__free_vm_area_struct.cfi_jt
  <...>-29      [001] ..... 21950.152583: rcu_invoke_callback: rcu_preempt rhp=000000008f162ca8 func=wakeme_after_rcu.cfi_jt
  <...>-29      [001] d..1. 21950.152625: rcu_batch_end: rcu_preempt CBs-invoked=3612 idle=....
<snip>

2. We use cpuset/cgroup to classify tasks and assign them into
different cgroups. For example "backgrond" group which binds tasks
only to little CPUs or "foreground" which makes use of all CPUs.
Tasks can be migrated between groups by a request if an acceleration
is needed.

See below an example how "surfaceflinger" task gets migrated.
Initially it is located in the "system-background" cgroup which
allows to run only on little cores. In order to speed it up it
can be temporary moved into "foreground" cgroup which allows
to use big/all CPUs:

cgroup_attach_task():
 -> cgroup_migrate_execute()
   -> cpuset_can_attach()
     -> percpu_down_write()
       -> rcu_sync_enter()
         -> synchronize_rcu()
   -> now move tasks to the new cgroup.
 -> cgroup_migrate_finish()

<snip>
         rcuop/1-29      [000] .....  7030.528570: rcu_invoke_callback: rcu_preempt rhp=00000000461605e0 func=wakeme_after_rcu.cfi_jt
    PERFD-SERVER-1855    [000] d..1.  7030.530293: cgroup_attach_task: dst_root=3 dst_id=22 dst_level=1 dst_path=/foreground pid=1900 comm=surfaceflinger
   TimerDispatch-2768    [002] d..5.  7030.537542: sched_migrate_task: comm=surfaceflinger pid=1900 prio=98 orig_cpu=0 dest_cpu=4
<snip>

"Boosting a task" depends on synchronize_rcu() latency:

- first trace shows a completion of synchronize_rcu();
- second shows attaching a task to a new group;
- last shows a final step when migration occurs.

3. To address this drawback, maintain a separate track that consists
of synchronize_rcu() callers only. After completion of a grace period
users are deferred to a dedicated worker to process requests.

4. This patch reduces the latency of synchronize_rcu() approximately
by ~30-40% on synthetic tests. The real test case, camera launch time,
shows(time is in milliseconds):

1-run 542 vs 489 improvement 9%
2-run 540 vs 466 improvement 13%
3-run 518 vs 468 improvement 9%
4-run 531 vs 457 improvement 13%
5-run 548 vs 475 improvement 13%
6-run 509 vs 484 improvement 4%

Synthetic test(no "noise" from other callbacks):
Hardware: x86_64 64 CPUs, 64GB of memory
Linux-6.6

- 10K tasks(simultaneous);
- each task does(1000 loops)
     synchronize_rcu();
     kfree(p);

default: CONFIG_RCU_NOCB_CPU: takes 54 seconds to complete all users;
patch: CONFIG_RCU_NOCB_CPU: takes 35 seconds to complete all users.

Running 60K gives approximately same results on my setup. Please note
it is without any interaction with another type of callbacks, otherwise
it will impact a lot a default case.

5. By default it is disabled. To enable this perform one of the
below sequence:

echo 1 > /sys/module/rcutree/parameters/rcu_normal_wake_from_gp
or pass a boot parameter "rcutree.rcu_normal_wake_from_gp=1"

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Co-developed-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
2024-04-15 19:47:49 +02:00
Thorsten Leemhuis 8d939ae349 docs: verify/bisect: stable regressions: first stable, then mainline
Rearrange the instructions so that readers facing a regression within a
stable or longterm series first test its latest release before testing
mainline. This is less scary for some people. It also reduces the chance
that something goes sideways for readers that compile their first
kernel, as mainline can cause slightly more trouble.

Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/efd3cb9c68db450091021326bf9c334553df0ec2.1712647788.git.linux@leemhuis.info
2024-04-15 09:41:56 -06:00
Thorsten Leemhuis 2bcfd71e8d docs: verify/bisect: describe how to use a build host
Describe how to build kernels on another system (with and without
cross-compiling), as building locally can be quite painfully on some
slow systems. This is done in an add-on section, as it would make the
step-by-step guide to complicated if this special case would be
described there.

Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/288160cb4769e46a3280250ca71da0abc4aa002d.1712647788.git.linux@leemhuis.info
2024-04-15 09:41:56 -06:00
Thorsten Leemhuis a421835a2a docs: verify/bisect: explain testing reverts, patches and newer code
Rename 'Supplementary tasks' to 'Complementary tasks' while introducing
a section 'Optional tasks: test reverts, patches, or later versions':
the latter is something readers occasionally will have to do after
reporting a bug and thus is best covered here.

Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/dacf26a4c48e9e8f04ecbc77e0a74c9b2a6a1103.1712647788.git.linux@leemhuis.info
2024-04-15 09:41:56 -06:00
Thorsten Leemhuis 453de3207f docs: verify/bisect: proper headlines and more spacing
Various small improvements and fixes:

* Separate ref links from their target with a space for better
  readability.

* Add a proper heading for the note at the end of the step-by-step
  guide.

* Use proper 3rd and 4th level headlines in the reference section and
  add short intros for the 2nd level headlines that lacked one.

Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/f59f0f235a2192ed93899a7338153e4cb71075f0.1712647788.git.linux@leemhuis.info
2024-04-15 09:41:56 -06:00
Thorsten Leemhuis 932c9a5398 docs: verify/bisect: add and fetch stable branches ahead of time
Add and fetch all required stable branches ahead of time. This fixes a
bug, as readers that wanted to bisect a regression within a stable or
longterm series otherwise did not have them available at the right time.
This way also matches the flow somewhat better and avoids some "if you
haven't already added it" phrases that otherwise become necessary in
future changes.

Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/57dcf312959476abe6151bf3d35eb79e3e9a83d1.1712647788.git.linux@leemhuis.info
2024-04-15 09:41:56 -06:00
Thorsten Leemhuis abbb99301e docs: verify/bisect: use git switch, tag kernel, and various fixes
Various small improvements and fixes:

* Use the more modern 'git switch' instead of 'git checkout', which
  makes it more obvious what's happening (among others due to the
  --discard-changes parameter that is more clear than --force).

* Provide a hint how a mainline version number and one from a stable
  series look like.

* When trying to validate the bisection result with a revert, add a
  special tag to facilitate the identification.

* Sync version numbers used in various examples for consistency: stick
  to 6.0.13, 6.0.15, and 6.1.5.

* Fix a few typos and oddities.

Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/85029aa004447b0eeb5043fb014630f2acafacec.1712647788.git.linux@leemhuis.info
2024-04-15 09:41:55 -06:00
Josh Poimboeuf 36d4fe147c x86/bugs: Remove CONFIG_BHI_MITIGATION_AUTO and spectre_bhi=auto
Unlike most other mitigations' "auto" options, spectre_bhi=auto only
mitigates newer systems, which is confusing and not particularly useful.

Remove it.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/412e9dc87971b622bbbaf64740ebc1f140bff343.1712813475.git.jpoimboe@kernel.org
2024-04-12 12:05:54 +02:00
Josh Poimboeuf 5f882f3b0a x86/bugs: Clarify that syscall hardening isn't a BHI mitigation
While syscall hardening helps prevent some BHI attacks, there's still
other low-hanging fruit remaining.  Don't classify it as a mitigation
and make it clear that the system may still be vulnerable if it doesn't
have a HW or SW mitigation enabled.

Fixes: ec9404e40e ("x86/bhi: Add BHI mitigation knob")
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/b5951dae3fdee7f1520d5136a27be3bdfe95f88b.1712813475.git.jpoimboe@kernel.org
2024-04-11 10:30:33 +02:00
Josh Poimboeuf dfe648903f x86/bugs: Fix BHI documentation
Fix up some inaccuracies in the BHI documentation.

Fixes: ec9404e40e ("x86/bhi: Add BHI mitigation knob")
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/8c84f7451bfe0dd08543c6082a383f390d4aa7e2.1712813475.git.jpoimboe@kernel.org
2024-04-11 10:30:25 +02:00
Karel Balej 9e66f74ce7 docs: *-regressions.rst: unify quoting, add missing word
Quoting of the '"no regressions" rule' expression differs between
occurrences, sometimes being presented as '"no regressions rule"'. Unify
the quoting using the first form which seems semantically correct or is
at least used dominantly, albeit marginally.

One of the occurrences is obviously missing the 'rule' part -- add it.

Signed-off-by: Karel Balej <balejk@matfyz.cz>
Reviewed-by: Thorsten Leemhuis <linux@leemhuis.info>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240328194342.11760-2-balejk@matfyz.cz
2024-04-10 15:01:32 -06:00
Linus Torvalds 2bb69f5fc7 x86 mitigations for the native BHI hardware vulnerabilty:
Branch History Injection (BHI) attacks may allow a malicious application to
 influence indirect branch prediction in kernel by poisoning the branch
 history. eIBRS isolates indirect branch targets in ring0.  The BHB can
 still influence the choice of indirect branch predictor entry, and although
 branch predictor entries are isolated between modes when eIBRS is enabled,
 the BHB itself is not isolated between modes.
 
 Add mitigations against it either with the help of microcode or with
 software sequences for the affected CPUs.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmYUKPMTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYofT8EACJJix+GzGUcJjOvfWFZcxwziY152hO
 5XSzHOZZL6oz5Yk/Rye/S9RVTN7aDjn1CEvI0cD/ULxaTP869sS9dDdUcHhEJ//5
 6hjqWsWiKc1QmLjBy3Pcb97GZHQXM5a9D1f6jXnJD+0FMLbQHpzSEBit0H4tv/TC
 75myGgYihvUbhN9/bL10M5fz+UADU42nChvPWDMr9ukljjCqa46tPTmKUIAW5TWj
 /xsyf+Nk+4kZpdaidKGhpof6KCV2rNeevvzUGN8Pv5y13iAmvlyplqTcQ6dlubnZ
 CuDX5Ji9spNF9WmhKpLgy5N+Ocb64oVHov98N2zw1sT1N8XOYcSM0fBj7SQIFURs
 L7T4jBZS+1c3ZGJPPFWIaGjV8w1ZMhelglwJxjY7ZgRD6fK3mwRx/ks54J8H4HjE
 FbirXaZLeKlscDIOKtnxxKoIGwpdGwLKQYi/wEw7F9NhCLSj9wMia+j3uYIUEEHr
 6xEiYEtyjcV3ocxagH7eiHyrasOKG64vjx2h1XodusBA2Wrvgm/jXlchUu+wb6B4
 LiiZJt+DmOdQ1h5j3r2rt3hw7+nWa7kyq34qfN6NSUCHiedp6q7BClueSaKiOCGk
 RoNibNiS+CqaxwGxj/RGuvajEJeEMCsLuCxzT3aeaDBsqscW6Ka/HkGA76Tpb5nJ
 E3JyjYE7AlG4rw==
 =W0W3
 -----END PGP SIGNATURE-----

Merge tag 'nativebhi' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 mitigations from Thomas Gleixner:
 "Mitigations for the native BHI hardware vulnerabilty:

  Branch History Injection (BHI) attacks may allow a malicious
  application to influence indirect branch prediction in kernel by
  poisoning the branch history. eIBRS isolates indirect branch targets
  in ring0. The BHB can still influence the choice of indirect branch
  predictor entry, and although branch predictor entries are isolated
  between modes when eIBRS is enabled, the BHB itself is not isolated
  between modes.

  Add mitigations against it either with the help of microcode or with
  software sequences for the affected CPUs"

[ This also ends up enabling the full mitigation by default despite the
  system call hardening, because apparently there are other indirect
  calls that are still sufficiently reachable, and the 'auto' case just
  isn't hardened enough.

  We'll have some more inevitable tweaking in the future    - Linus ]

* tag 'nativebhi' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  KVM: x86: Add BHI_NO
  x86/bhi: Mitigate KVM by default
  x86/bhi: Add BHI mitigation knob
  x86/bhi: Enumerate Branch History Injection (BHI) bug
  x86/bhi: Define SPEC_CTRL_BHI_DIS_S
  x86/bhi: Add support for clearing branch history at syscall entry
  x86/syscall: Don't force use of indirect calls for system calls
  x86/bugs: Change commas to semicolons in 'spectre_v2' sysfs file
2024-04-08 20:07:51 -07:00
I Hsin Cheng a24e3b7d27 docs: cgroup-v1: Fix description for css_online
The original description refers to the comment on
cgroup_for_each_descendant_pre() for more details. However, the macro
cgroup_for_each_descendant_pre() no longer exist, we replace it with the
corresponding macro cgroup_for_each_live_descendant_pre().

Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-04-08 08:01:36 -10:00