Commit graph

2417 commits

Author SHA1 Message Date
Xianting Tian
c5b4216929
Documentation: kdump: describe VMCOREINFO export for RISCV64
The following interrelated definitions and ranges are needed by the
kdump crash tool, which are exported by
"arch/riscv/kernel/crash_core.c":

    VA_BITS,
    PAGE_OFFSET,
    phys_ram_base,
    KERNEL_LINK_ADDR,
    MODULES_VADDR ~ MODULES_END,
    VMALLOC_START ~ VMALLOC_END,
    VMEMMAP_START ~ VMEMMAP_END,

Document these RISCV64 exports above.

Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
Acked-by: Baoquan He <bhe@redhat.com>
Link: https://lore.kernel.org/r/20221026144208.373504-3-xianting.tian@linux.alibaba.com
[Palmer: wrap commit text]
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-02 16:48:58 -08:00
Peter Korsgaard
035641b01e dm init: add dm-mod.waitfor to wait for asynchronously probed block devices
Just calling wait_for_device_probe() is not enough to ensure that
asynchronously probed block devices are available (E.G. mmc, usb), so
add a "dm-mod.waitfor=<device1>[,..,<deviceN>]" parameter to get
dm-init to explicitly wait for specific block devices before
initializing the tables with logic similar to the rootwait logic that
was introduced with commit  cc1ed7542c ("init: wait for
asynchronously scanned block devices").

E.G. with dm-verity on mmc using:
dm-mod.waitfor="PARTLABEL=hash-a,PARTLABEL=root-a"

[    0.671671] device-mapper: init: waiting for all devices to be available before creating mapped devices
[    0.671679] device-mapper: init: waiting for device PARTLABEL=hash-a ...
[    0.710695] mmc0: new HS200 MMC card at address 0001
[    0.711158] mmcblk0: mmc0:0001 004GA0 3.69 GiB
[    0.715954] mmcblk0boot0: mmc0:0001 004GA0 partition 1 2.00 MiB
[    0.722085] mmcblk0boot1: mmc0:0001 004GA0 partition 2 2.00 MiB
[    0.728093] mmcblk0rpmb: mmc0:0001 004GA0 partition 3 512 KiB, chardev (249:0)
[    0.738274]  mmcblk0: p1 p2 p3 p4 p5 p6 p7
[    0.751282] device-mapper: init: waiting for device PARTLABEL=root-a ...
[    0.751306] device-mapper: init: all devices available
[    0.751683] device-mapper: verity: sha256 using implementation "sha256-generic"
[    0.759344] device-mapper: ioctl: dm-0 (vroot) is ready
[    0.766540] VFS: Mounted root (squashfs filesystem) readonly on device 254:0.

Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-12-02 17:37:45 -05:00
Kees Cook
9fc9e278a5 panic: Introduce warn_limit
Like oops_limit, add warn_limit for limiting the number of warnings when
panic_on_warn is not set.

Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: tangmeng <tangmeng@uniontech.com>
Cc: "Guilherme G. Piccoli" <gpiccoli@igalia.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-doc@vger.kernel.org
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221117234328.594699-5-keescook@chromium.org
2022-12-02 13:04:44 -08:00
Kees Cook
de92f65719 exit: Allow oops_limit to be disabled
In preparation for keeping oops_limit logic in sync with warn_limit,
have oops_limit == 0 disable checking the Oops counter.

Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
2022-12-02 13:04:39 -08:00
Nicholas Piggin
6b34a099fa powerpc/64s/hash: add stress_hpt kernel boot option to increase hash faults
This option increases the number of hash misses by limiting the number
of kernel HPT entries, by keeping a per-CPU record of the last kernel
HPTEs installed, and removing that from the hash table on the next hash
insertion. A timer round-robins CPUs removing remaining kernel HPTEs and
clearing the TLB (in the case of bare metal) to increase and slightly
randomise kernel fault activity.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Add comment about NR_CPUS usage, fixup whitespace]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221024030150.852517-1-npiggin@gmail.com
2022-12-02 18:04:25 +11:00
Jann Horn
d4ccd54d28 exit: Put an upper limit on how often we can oops
Many Linux systems are configured to not panic on oops; but allowing an
attacker to oops the system **really** often can make even bugs that look
completely unexploitable exploitable (like NULL dereferences and such) if
each crash elevates a refcount by one or a lock is taken in read mode, and
this causes a counter to eventually overflow.

The most interesting counters for this are 32 bits wide (like open-coded
refcounts that don't use refcount_t). (The ldsem reader count on 32-bit
platforms is just 16 bits, but probably nobody cares about 32-bit platforms
that much nowadays.)

So let's panic the system if the kernel is constantly oopsing.

The speed of oopsing 2^32 times probably depends on several factors, like
how long the stack trace is and which unwinder you're using; an empirically
important one is whether your console is showing a graphical environment or
a text console that oopses will be printed to.
In a quick single-threaded benchmark, it looks like oopsing in a vfork()
child with a very short stack trace only takes ~510 microseconds per run
when a graphical console is active; but switching to a text console that
oopses are printed to slows it down around 87x, to ~45 milliseconds per
run.
(Adding more threads makes this faster, but the actual oops printing
happens under &die_lock on x86, so you can maybe speed this up by a factor
of around 2 and then any further improvement gets eaten up by lock
contention.)

It looks like it would take around 8-12 days to overflow a 32-bit counter
with repeated oopsing on a multi-core X86 system running a graphical
environment; both me (in an X86 VM) and Seth (with a distro kernel on
normal hardware in a standard configuration) got numbers in that ballpark.

12 days aren't *that* short on a desktop system, and you'd likely need much
longer on a typical server system (assuming that people don't run graphical
desktop environments on their servers), and this is a *very* noisy and
violent approach to exploiting the kernel; and it also seems to take orders
of magnitude longer on some machines, probably because stuff like EFI
pstore will slow it down a ton if that's active.

Signed-off-by: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/r/20221107201317.324457-1-jannh@google.com
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221117234328.594699-2-keescook@chromium.org
2022-12-01 08:50:38 -08:00
Jian Wen
9b34a307f3 docs: admin-guide: cgroup-v1: update description of inactive_file
MADV_FREE pages have been moved into the LRU_INACTIVE_FILE list by commit
f7ad2a6cb9 ("mm: move MADV_FREE pages into LRU_INACTIVE_FILE list").

Link: https://lkml.kernel.org/r/20221111034639.3593380-1-wenjian1@xiaomi.com
Signed-off-by: Jian Wen <wenjian1@xiaomi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30 15:58:54 -08:00
Sergey Senozhatsky
77db7bb56b zram: add incompressible flag to read_block_state()
Add a new flag to zram block state that shows if the page is
incompressible: that none of the algorithm (including secondary ones)
could compress it.

Link: https://lkml.kernel.org/r/20221109115047.2921851-14-senozhatsky@chromium.org
Suggested-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Alexey Romanov <avromanov@sberdevices.ru>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30 15:58:53 -08:00
Sergey Senozhatsky
b46f9ea3cb zram: add incompressible writeback
Add support for incompressible pages writeback:

  echo incompressible > /sys/block/zramX/writeback

Link: https://lkml.kernel.org/r/20221109115047.2921851-13-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Alexey Romanov <avromanov@sberdevices.ru>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30 15:58:53 -08:00
Sergey Senozhatsky
443dd79806 documentation: add zram recompression documentation
Document user-space visible device attributes that are enabled by
ZRAM_MULTI_COMP.

Link: https://lkml.kernel.org/r/20221109115047.2921851-12-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Alexey Romanov <avromanov@sberdevices.ru>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30 15:58:52 -08:00
Sergey Senozhatsky
60e9b39ebe zram: add recompress flag to read_block_state()
Add a new flag to zram block state that shows if the page was recompressed
(using alternative compression algorithm).

Link: https://lkml.kernel.org/r/20221109115047.2921851-6-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Alexey Romanov <avromanov@sberdevices.ru>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30 15:58:51 -08:00
SeongJae Park
7f0a86f3c9 Docs/admin-guide/mm/damon/usage: document schemes/<s>/tried_regions sysfs directory
Document 'tried_regions' directory in DAMON sysfs interface usage in the
administrator guide.

Link: https://lkml.kernel.org/r/20221101220328.95765-8-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30 15:58:45 -08:00
Johannes Weiner
57e9cc50f4 mm: vmscan: split khugepaged stats from direct reclaim stats
Direct reclaim stats are useful for identifying a potential source for
application latency, as well as spotting issues with kswapd.  However,
khugepaged currently distorts the picture: as a kernel thread it doesn't
impose allocation latencies on userspace, and it explicitly opts out of
kswapd reclaim.  Its activity showing up in the direct reclaim stats is
misleading.  Counting it as kswapd reclaim could also cause confusion when
trying to understand actual kswapd behavior.

Break out khugepaged from the direct reclaim counters into new
pgsteal_khugepaged, pgdemote_khugepaged, pgscan_khugepaged counters.

Test with a huge executable (CONFIG_READ_ONLY_THP_FOR_FS):

pgsteal_kswapd 1342185
pgsteal_direct 0
pgsteal_khugepaged 3623
pgscan_kswapd 1345025
pgscan_direct 0
pgscan_khugepaged 3623

Link: https://lkml.kernel.org/r/20221026180133.377671-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Eric Bergen <ebergen@meta.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30 15:58:41 -08:00
SeongJae Park
1b01663875 Docs/admin-guide/mm/damon/usage: fix wrong usage example of init_regions file
DAMON debugfs interface assumes the users will write all inputs at once. 
However, redirecting a string of multiple lines sometimes end up writing
line by line.  Therefore, the example usage of 'init_regions' file, which
writes input as a string of multiple lines can fail.  Fix it to use a
single line string instead.  Also update the description of the usage to
not assume users will write inputs in multiple lines.

Link: https://lkml.kernel.org/r/20221024174619.15600-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Vinicius Petrucci <vpetrucci@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30 15:58:41 -08:00
SeongJae Park
bd4149290c Docs/admin-guide/mm/damon/usage: describe the rules of sysfs region directories
Patch series "Docs/admin-buide/mm/damon/usage: minor fixes".

DAMON usage document contains an unclear description and a wrong usage
example.  This patchset fixes the two minor problems.


This patch (of 2):

Target region directories of DAMON sysfs interface should contain no
overlap and sorted by the address, but not clearly documented.  Actually,
a user had an issue[1] due to the poor documentation.  Add clear
description of it on the usage document.

[1] https://lore.kernel.org/damon/CAEZ6=UNUcH2BvJj++OrT=XQLdkidU79wmCO=tantSOB36pPNTg@mail.gmail.com/

Link: https://lkml.kernel.org/r/20221024174619.15600-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20221024174619.15600-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Reported-by: Vinicius Petrucci <vpetrucci@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30 15:58:41 -08:00
Dave Airlie
795bd9bb21 This tag contains the patches that add the new compute acceleration
subsystem, which is part of the DRM subsystem.
 
 The patches:
 - Add a new directory at drivers/accel.
 - Add a new major (261) for compute accelerators.
 - Add a new DRM minor type for compute accelerators.
 - Integrate the accel core code with DRM core code.
 - Add documentation for the accel subsystem.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEE7TEboABC71LctBLFZR1NuKta54AFAmN8r2UACgkQZR1NuKta
 54CJzwf/SkjfYoo8bLi1eedSSLZQZiNFtkqUqH0jho2UT5MG3yoPasXe/hovnf6+
 SOE1fMO/VpfQPJ5xqjuHwUaGxXIFFi1vkMllhZrODljTP0dNjhgWc7WTTsKTj586
 pMpYWA8bDfoF0SSGfEYbDi7qqiZFFVP6UeM4jvQgPhURNbLhF2dx69D3K1ZLvJJE
 673ZzJkZcnZRdTjypCE2gCOcXpXclbNenFHqMkuaX6JfGu+8YPR0GGVWiBfo+FvO
 gNnxXDhBmYdn+Khzzxr+ssewOklHM+Moe2Iv20jZQgVsHot3c+bJx+7BPBHpg5Ib
 4uqIrwMZENIQoH+FgYxSRMfEn0JI/w==
 =RCyB
 -----END PGP SIGNATURE-----

Merge tag 'drm-accel-2022-11-22' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/accel into drm-next

This tag contains the patches that add the new compute acceleration
subsystem, which is part of the DRM subsystem.

The patches:
- Add a new directory at drivers/accel.
- Add a new major (261) for compute accelerators.
- Add a new DRM minor type for compute accelerators.
- Integrate the accel core code with DRM core code.
- Add documentation for the accel subsystem.

Signed-off-by: Dave Airlie <airlied@redhat.com>

some acks from the list (some are in the patch series):
Acked-by: Daniel Stone <daniels@collabora.com>
Acked-by: Sonal Santan <sonal.santan@amd.com>
Acked-by: Maxime Ripard <maxime@cerno.tech>
Acked-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Tested-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>

From: Oded Gabbay <ogabbay@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20221122112222.GA352082@ogabbay-vm-u20.habana-labs.com
2022-11-30 12:01:27 +10:00
Yicong Yang
17d573984d drivers/perf: hisi: Add TLP filter support
The PMU support to filter the TLP when counting the bandwidth with below
options:

- only count the TLP headers
- only count the TLP payloads
- count both TLP headers and payloads

In the current driver it's default to count the TLP payloads only, which
will have an implicity side effects that on the traffic only have header
only TLPs, we'll get no data.

Make this user configuration through "len_mode" parameter and make it
default to count both TLP headers and payloads when user not specified.
Also update the documentation for it.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20221117084136.53572-5-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-29 14:30:55 +00:00
Bagas Sanjaya
c8dff677e6 Documentation: perf: Indent filter options list of hisi-pcie-pmu
The "Filter options" list have a rather ugly indentation. Also, the first
paragraph after list name is rendered without separator (as continuation
from the name).

Align the list by indenting the list items and add a blank line
separator for each list name.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20221117084136.53572-4-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-29 14:30:54 +00:00
Yicong Yang
eb79f12b4c docs: perf: Fix PMU instance name of hisi-pcie-pmu
The PMU instance will be called hisi_pcie<sicl>_core<core> rather than
hisi_pcie<sicl>_<core>. Fix this in the documentation.

Fixes: c8602008e2 ("docs: perf: Add description for HiSilicon PCIe PMU driver")
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20221117084136.53572-3-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-29 14:30:54 +00:00
Deming Wang
8dec5779a0 media: vivid.rst: fix TV and S-Video Inputs section
remove the double word 'in'.

Signed-off-by: Deming Wang <wangdeming@inspur.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-11-25 11:18:25 +00:00
Hans Verkuil
e9305a003f media: admin-guide: cec.rst
Document administration details about CEC devices. This was formerly
documented in a cec-status.txt I kept on my website, but this really
belongs here as an admin guide.

Updated the original cec-status.txt, and converted it to .rst.

Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-11-25 07:39:21 +00:00
Daniel Almeida
0c078e310b media: visl: add virtual stateless decoder driver
A virtual stateless device for stateless uAPI development purposes.

This tool's objective is to help the development and testing of
userspace applications that use the V4L2 stateless API to decode media.

A userspace implementation can use visl to run a decoding loop even when
no hardware is available or when the kernel uAPI for the codec has not
been upstreamed yet. This can reveal bugs at an early stage.

This driver can also trace the contents of the V4L2 controls submitted
to it.  It can also dump the contents of the vb2 buffers through a
debugfs interface. This is in many ways similar to the tracing
infrastructure available for other popular encode/decode APIs out there
and can help develop a userspace application by using another (working)
one as a reference.

Note that no actual decoding of video frames is performed by visl. The
V4L2 test pattern generator is used to write various debug information
to the capture buffers instead.

Signed-off-by: Daniel Almeida <daniel.almeida@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-11-25 07:32:16 +00:00
Dave Airlie
d47f958083 Linux 6.1-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmN6wAgeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG0EYH/3/RO90NbrFItraN
 Lzr+d3VdbGjTu8xd1M+PRTmwh3zxLpB+Jwqr0T0A2gzL9B/D+AUPUJdrCVbv9DqS
 FLJAVqoeV20dNBAHSffOOLPsgCZ+Eu+LzlNN7Iqde0e8cyZICFMNktitui84Xm/i
 1NgFVgz9OZ6+aieYvUj3FrFq0p8GTIaC/oybDZrxYKcO8ZzKVMJ11swRw10wwq0g
 qOOECvV3w7wlQ8upQZkzFxItKFc7EexZI6R4elXeGSJJ9Hlc092dv/zsKB9dwV+k
 WcwkJrZRoezYXzgGBFxUcQtzi+ethjrPjuJuM1rYLUSIcfIW/0lkaSLgRoBu8D+I
 1GfXkXs=
 =gt6P
 -----END PGP SIGNATURE-----

Backmerge tag 'v6.1-rc6' into drm-next

Linux 6.1-rc6

This is needed for drm-misc-next and tegra.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2022-11-24 11:05:43 +10:00
Steven Rostedt (Google)
a01fdc897f tracing: Add trace_trigger kernel command line option
Allow triggers to be enabled at kernel boot up. For example:

  trace_trigger="sched_switch.stacktrace if prev_state == 2"

The above will enable the stacktrace trigger on top of the sched_switch
event and only trigger if its prev_state is 2 (TASK_UNINTERRUPTIBLE). Then
at boot up, a stacktrace will trigger and be recorded in the tracing ring
buffer every time the sched_switch happens where the previous state is
TASK_INTERRUPTIBLE.

Another useful trigger would be "traceoff" which can stop tracing on an
event if a field of the event matches a certain value defined by the
filter ("if" statement).

Link: https://lore.kernel.org/linux-trace-kernel/20221020210056.0d8d0a5b@gandalf.local.home

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-23 19:08:30 -05:00
Perry Yuan
1056d31470 Documentation: add amd-pstate kernel command line options
Add a new amd pstate driver command line option to enable driver passive
working mode via MSR and shared memory interface to request desired
performance within abstract scale and the power management firmware
(SMU) convert the perf requests into actual hardware pstates.

Also the `disable` parameter can disable the pstate driver loading by
adding `amd_pstate=disable` to kernel command line.

Acked-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Tested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Perry Yuan <Perry.Yuan@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-22 19:57:15 +01:00
Perry Yuan
8a2cbf72a4 Documentation: amd-pstate: add driver working mode introduction
Introduce the `amd_pstate` driver new working mode with
`amd_pstate=passive` added to kernel command line.
If there is no passive mode enabled by user, amd_pstate driver will be
disabled by default for now.

Acked-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Tested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Perry Yuan <Perry.Yuan@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-22 19:57:15 +01:00
Oded Gabbay
8bf4889762 drivers/accel: define kconfig and register a new major
Add a new Kconfig for the accel subsystem. The Kconfig currently
contains only the basic CONFIG_DRM_ACCEL option that will be used to
decide whether to compile the accel registration code. Therefore, the
kconfig option is defined as bool.

The accel code will be compiled as part of drm.ko and will be called
directly from the DRM core code. The reason we compile it as part of
drm.ko and not as a separate module is because of cyclic dependency
between drm.ko and the separate module (if it would have existed).
This is due to the fact that DRM core code calls accel functions and
vice-versa.

The accelerator devices will be exposed to the user space with a new,
dedicated major number - 261.

The accel init function registers the new major number as a char device
and create corresponding sysfs and debugfs root entries, similar to
what is done in DRM init function.

I added a new header called drm_accel.h to include/drm/, that will hold
the prototypes of the drm_accel.c functions. In case CONFIG_DRM_ACCEL
is set to 'N', that header will contain empty inline implementations of
those functions, to allow DRM core code to compile successfully
without dependency on CONFIG_DRM_ACCEL.

I Updated the MAINTAINERS file accordingly with the newly added folder
and I have taken the liberty to appropriate the dri-devel mailing list
and the dri-devel IRC channel for the accel subsystem.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Tested-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Reviewed-by: Melissa Wen <mwen@igalia.com>
2022-11-22 13:13:51 +02:00
Jiucheng Xu
537216e59f docs/perf: Add documentation for the Amlogic G12 DDR PMU
Add a user guide to show how to use DDR PMU to
monitor DDR bandwidth on Amlogic G12 SoC

Signed-off-by: Jiucheng Xu <jiucheng.xu@amlogic.com>
Reviewed-by: Chris Healy <healych@amazon.com>
Link: https://lore.kernel.org/r/20221121021602.3306998-2-jiucheng.xu@amlogic.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-21 18:28:45 +00:00
Kim Phillips
1198d2316d iommu/amd: Fix ill-formed ivrs_ioapic, ivrs_hpet and ivrs_acpihid options
Currently, these options cause the following libkmod error:

libkmod: ERROR ../libkmod/libkmod-config.c:489 kcmdline_parse_result: \
	Ignoring bad option on kernel command line while parsing module \
	name: 'ivrs_xxxx[XX:XX'

Fix by introducing a new parameter format for these options and
throw a warning for the deprecated format.

Users are still allowed to omit the PCI Segment if zero.

Adding a Link: to the reason why we're modding the syntax parsing
in the driver and not in libkmod.

Fixes: ca3bf5d47c ("iommu/amd: Introduces ivrs_acpihid kernel parameter")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/linux-modules/20200310082308.14318-2-lucas.demarchi@intel.com/
Reported-by: Kim Phillips <kim.phillips@amd.com>
Co-developed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Link: https://lore.kernel.org/r/20220919155638.391481-2-kim.phillips@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-11-19 10:05:28 +01:00
Oleksandr Natalenko
8603b6f586 core_pattern: add CPU specifier
Statistically, in a large deployment regular segfaults may indicate a CPU
issue.

Currently, it is not possible to find out what CPU the segfault happened
on.  There are at least two attempts to improve segfault logging with this
regard, but they do not help in case the logs rotate.

Hence, lets make sure it is possible to permanently record a CPU the task
ran on using a new core_pattern specifier.

Link: https://lkml.kernel.org/r/20220903064330.20772-1-oleksandr@redhat.com
Signed-off-by: Oleksandr Natalenko <oleksandr@redhat.com>
Suggested-by: Renaud Métrich <rmetrich@redhat.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Grzegorz Halat <ghalat@redhat.com>
Cc: "Guilherme G. Piccoli" <gpiccoli@igalia.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Joel Savitz <jsavitz@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Will Deacon <will@kernel.org>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-18 13:55:06 -08:00
Zhen Lei
a9ae89df73 arm64: kdump: Support crashkernel=X fall back to reserve region above DMA zones
For crashkernel=X without '@offset', select a region within DMA zones
first, and fall back to reserve region above DMA zones. This allows
users to use the same configuration on multiple platforms.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20221116121044.1690-3-thunder.leizhen@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-18 14:14:35 +00:00
Zhen Lei
a149cf00b1 arm64: kdump: Provide default size when crashkernel=Y,low is not specified
Try to allocate at least 128 MiB low memory automatically for the case
that crashkernel=,high is explicitly specified, while crashkenrel=,low
is omitted. This allows users to focus more on the high memory
requirements of their business rather than the low memory requirements
of the crash kernel booting.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20221116121044.1690-2-thunder.leizhen@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-18 14:14:35 +00:00
Jason A. Donenfeld
b9b01a5625 random: use random.trust_{bootloader,cpu} command line option only
It's very unusual to have both a command line option and a compile time
option, and apparently that's confusing to people. Also, basically
everybody enables the compile time option now, which means people who
want to disable this wind up having to use the command line option to
ensure that anyway. So just reduce the number of moving pieces and nix
the compile time option in favor of the more versatile command line
option.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-11-18 02:18:10 +01:00
Thomas Zimmermann
9a758d8756 drm: Move nomodeset kernel parameter to drivers/video
Move the nomodeset kernel parameter to drivers/video to make it
available to non-DRM drivers. Adapt the interface, but keep the DRM
interface drm_firmware_drivers_only() to avoid churn within DRM. The
function should later be inlined into callers.

The parameter disables any DRM graphics driver that would replace a
driver for firmware-provided scanout buffers. It is an option to easily
fallback to basic graphics output if the hardware's native driver is
broken. Moving it to a more prominent location wil make it available
to fbdev as well.

v2:
	* clarify the meaning of the nomodeset parameter (Javier)

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221111133024.9897-2-tzimmermann@suse.de
2022-11-16 13:26:12 +01:00
Besar Wicaksono
84481be716 perf: arm_cspmu: Add support for NVIDIA SCF and MCF attribute
Add support for NVIDIA System Cache Fabric (SCF) and Memory Control
Fabric (MCF) PMU attributes for CoreSight PMU implementation in
NVIDIA devices.

Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Link: https://lore.kernel.org/r/20221111222330.48602-3-bwicaksono@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-15 13:48:08 +00:00
Guilherme G. Piccoli
727209376f x86/split_lock: Add sysctl to control the misery mode
Commit b041b525da ("x86/split_lock: Make life miserable for split lockers")
changed the way the split lock detector works when in "warn" mode;
basically, it not only shows the warn message, but also intentionally
introduces a slowdown through sleeping plus serialization mechanism
on such task. Based on discussions in [0], seems the warning alone
wasn't enough motivation for userspace developers to fix their
applications.

This slowdown is enough to totally break some proprietary (aka.
unfixable) userspace[1].

Happens that originally the proposal in [0] was to add a new mode
which would warns + slowdown the "split locking" task, keeping the
old warn mode untouched. In the end, that idea was discarded and
the regular/default "warn" mode now slows down the applications. This
is quite aggressive with regards proprietary/legacy programs that
basically are unable to properly run in kernel with this change.
While it is understandable that a malicious application could DoS
by split locking, it seems unacceptable to regress old/proprietary
userspace programs through a default configuration that previously
worked. An example of such breakage was reported in [1].

Add a sysctl to allow controlling the "misery mode" behavior, as per
Thomas suggestion on [2]. This way, users running legacy and/or
proprietary software are allowed to still execute them with a decent
performance while still observing the warning messages on kernel log.

[0] https://lore.kernel.org/lkml/20220217012721.9694-1-tony.luck@intel.com/
[1] https://github.com/doitsujin/dxvk/issues/2938
[2] https://lore.kernel.org/lkml/87pmf4bter.ffs@tglx/

[ dhansen: minor changelog tweaks, including clarifying the actual
  	   problem ]

Fixes: b041b525da ("x86/split_lock: Make life miserable for split lockers")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Andre Almeida <andrealmeid@igalia.com>
Link: https://lore.kernel.org/all/20221024200254.635256-1-gpiccoli%40igalia.com
2022-11-10 10:14:22 -08:00
Jonathan Neuschäfer
7a96be33c8 docs: admin-guide: hw_random: Make document title more generic and concise
The hw_random subsystem no longer works only on specific Intel chipsets;
make the title of hw_random.rst reflect this fact.

While we're at it, also remove the words "Linux support for", since it's
clear from context that this is a document about Linux.

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Link: https://lore.kernel.org/r/20221101160119.955997-1-j.neuschaefer@gmx.net
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-11-09 14:02:40 -07:00
David Heidelberg
b592f9ee1f Docs/admin-guide/mm/zswap: remove a paragraph about zswap being a new feature
Nine years have passed since Linux 3.11.

Signed-off-by: David Heidelberg <david@ixit.cz>
Link: https://lore.kernel.org/r/20221104122612.14906-1-david@ixit.cz
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-11-09 13:56:18 -07:00
Dafna Hirschfeld
e0eee57eba media: vimc: Update device configuration in the documentation
Current configuration in the document is outdated and doesn't work. Update
it to match the configuration described in the following commit:

commit 9b4a9b31b9 ("media: vimc: Enable set resolution at the scaler src pad")

Fixed commit description:
Shuah Khan <skhan@linuxfoundation.org>

Signed-off-by: Dafna Hirschfeld <dafna@fastmail.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-11-08 08:42:30 +00:00
Meng Li
67c0b2b529 Documentation: amd-pstate: Add tbench and gitsource test introduction
Introduce tbench and gitsource test cases design and implementation.
Monitor cpus changes about performance and power consumption etc.

Signed-off-by: Meng Li <li.meng@amd.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-11-01 03:23:25 -06:00
Thomas Richter
1f3307cf3a s390/con3215: Drop console data printout when buffer full
Using z/VM the 3270 terminal emulator also emulates an IBM 3215 console
which outputs line by line. When the screen is full, the console enters
the MORE... state and waits for the operator to confirm the data
on the screen by pressing a clear key. If this does not happen in the
default time frame (currently 50 seconds) the console enters the HOLDING
state.
It then waits another time frame (currently 10 seconds) before the output
continues on the next screen. When the operator presses the clear key
during these wait times, the output continues immediately.

This may lead to a very long boot time when the console
has to print many messages, also the system may hang because of the
console's limited buffer space and the system waits for the console
output to drain and finally to finish. This problem can only occur
when a terminal emulator is actually connected to the 3215 console
driver. If not z/VM simply drops console output.

Remedy this rare situation and add a kernel boot command line parameter
con3215_drop. It can be set to 0 (do not drop) or 1 (do drop) which is
the default. This instructs the kernel drop console data when the
console buffer is full. This speeds up the boot time considerable and
also does not hang the system anymore.

Add a sysfs attribute file for console IBM 3215 named con_drop.
This allows for changing the behavior after the boot, for example when
during interactive debugging a panic/crash is expected.

Here is a test of the new behavior using the following test program:
 #/bin/bash
 declare -i cnt=4

 mode=$(cat /sys/bus/ccw/drivers/3215/con_drop)
 [ $mode = yes ] && cnt=25

 echo "cons_drop $(cat /sys/bus/ccw/drivers/3215/con_drop)"
 echo "vmcp term more 5 2"
 vmcp term more 5 2
 echo "Run $cnt iterations of "'echo t > /proc/sysrq-trigger'

 for i in $(seq $cnt)
 do
	echo "$i. command 'echo t > /proc/sysrq-trigger' at $(date +%F,%T)"
	echo t > /proc/sysrq-trigger
	sleep 1
 done
 echo "droptest done" > /dev/kmsg
 #

Output with sysfs attribute con_drop set to 1:
 # ./droptest.sh
 cons_drop yes
 vmcp term more 5 2
 Run 25 iterations of echo t > /proc/sysrq-trigger
 1. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:09
 2. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:10
 3. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:11
 4. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:12
 5. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:13
 6. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:14
 7. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:15
 8. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:16
 9. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:17
 10. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:18
 11. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:19
 12. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:20
 13. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:21
 14. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:22
 15. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:23
 16. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:24
 17. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:25
 18. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:26
 19. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:27
 20. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:28
 21. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:29
 22. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:30
 23. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:31
 24. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:32
 25. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:33
 #

There are no hangs anymore.

Output with sysfs attribute con_drop set to 0 and identical
setting for z/VM console 'term more 5 2'. Sometimes hitting the
clear key at the x3270 console to progress output.

 # ./droptest.sh
 cons_drop no
 vmcp term more 5 2
 Run 4 iterations of echo t > /proc/sysrq-trigger
 1. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:20:58
 2. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:24:32
 3. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:28:04
 4. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:31:37
 #

Details:
Enable function raw3215_write() to handle tab expansion and newlines
and feed it with input not larger than the console buffer of 65536
bytes. Function raw3125_putchar() just forwards its character for
output to raw3215_write().

This moves tab to blank conversion to one function raw3215_write()
which also does call raw3215_make_room() to wait for enough free
buffer space.

Function handle_write() loops over all its input and segments input
into chunks of console buffer size (should the input be larger).

Rework tab expansion handling logic to avoid code duplication.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-10-26 14:46:51 +02:00
Hans Verkuil
de547896aa media: vivid.rst: loop_video is set on the capture devnode
The example on how to use and test Capture Overlay specified
the wrong video device node. Back in 2015 the loop_video control
moved from the output device to the capture device, but this
example code is still referring to the output video device.

Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-10-25 16:43:54 +01:00
Linus Torvalds
9d6e681d33 ACPI fixes for 6.1-rc2
- Add missing device reference counting to acpi_get_pci_dev() after
    changing it recently (Rafael Wysocki).
 
  - Fix resource list walk in acpi_dma_get_range() (Robin Murphy).
 
  - Add IRQ override quirk for LENOVO IdeaPad and extend the IRQ
    override warning message (Jiri Slaby).
 
  - Fix integer overflow in ghes_estatus_pool_init() (Ashish Kalra).
 
  - Fix multiple error records handling in one of the ACPI extlog driver
    code paths (Tony Luck).
 
  - Prune DSDT override documentation from index after dropping it (Bagas
    Sanjaya).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmNS5j0SHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxBCIQALRw66b/rUiobFH2shHthACDeIuFa25U
 35/F97bitR6kTPkQKiKtoWTGd+d+1TL7z7kOWS9ontQB91OXN877K3j7h5lFtemw
 PPBL+e13VCg6ZfUxpVocMTnH/nyfLIMAAX335i+ogknyW0SrISwmBhb/luvzzpiR
 0htCcnH3ia6pDiCbr1+Q6wXOYA3anWsm7hN6QfE4QE0AjjP/TXe+VdA5ntX7zKKO
 TIjMgso5drLoGTVA0gUWSIR5mqvqYE+9QWTU0yr90fG3sap9Jj3DzSaF5Ctb2cYB
 B9w5KWL5H8KfeJxgYtSPGG7iZ11IrMUWTFiKG+NBAZ2+qvonW6kcmXWi0jr9Lqt/
 hW4t1luA/jk/b9Gg+izSmvMSJmJLeRLXp1echTUn3dnl8vWE04Di1oKIaNoE3s1p
 ufn5auOJLvdgHqqHRjbZUPiYqfoEp7Awqy1tbg+g+BKZwKUwMCX2ej2QlI8M1nBI
 ayB7YqfLblBuvS7GEs0Xsuv/T9Q3Ntl5UVHWhCzekBjwFmtuOItOxVmbubtZgSex
 j66CtPhw9LpGeiOqs28GcOBYU4iFDeY10T70u5XApJ2Qo+A8gH38hUNOuBTd6Nu+
 qU4qira6Ij72lcpnA7MfwULYY3tmhHxe0N9F7ql8MiXQVUMsihb1Y9q/5qHz3bVu
 p6gepNzValgR
 =w4gn
 -----END PGP SIGNATURE-----

Merge tag 'acpi-6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
 "These fix issues introduced during this merge window (ACPI/PCI, device
  enumeration and documentation) and some other ones found recently.

  Specifics:

   - Add missing device reference counting to acpi_get_pci_dev() after
     changing it recently (Rafael Wysocki)

   - Fix resource list walk in acpi_dma_get_range() (Robin Murphy)

   - Add IRQ override quirk for LENOVO IdeaPad and extend the IRQ
     override warning message (Jiri Slaby)

   - Fix integer overflow in ghes_estatus_pool_init() (Ashish Kalra)

   - Fix multiple error records handling in one of the ACPI extlog
     driver code paths (Tony Luck)

   - Prune DSDT override documentation from index after dropping it
     (Bagas Sanjaya)"

* tag 'acpi-6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: scan: Fix DMA range assignment
  ACPI: PCI: Fix device reference counting in acpi_get_pci_dev()
  ACPI: resource: note more about IRQ override
  ACPI: resource: do IRQ override on LENOVO IdeaPad
  ACPI: extlog: Handle multiple records
  ACPI: APEI: Fix integer overflow in ghes_estatus_pool_init()
  Documentation: ACPI: Prune DSDT override documentation from index
2022-10-21 18:08:30 -07:00
Rafael J. Wysocki
3f8deab61e Merge branches 'acpi-scan', 'acpi-resource', 'acpi-apei', 'acpi-extlog' and 'acpi-docs'
Merge assorted ACPI fixes for 6.1-rc2:

 - Fix resource list walk in acpi_dma_get_range() (Robin Murphy).

 - Add IRQ override quirk for LENOVO IdeaPad and extend the IRQ
   override warning message (Jiri Slaby).

 - Fix integer overflow in ghes_estatus_pool_init() (Ashish Kalra).

 - Fix multiple error records handling in one of the ACPI extlog driver
   code paths (Tony Luck).

 - Prune DSDT override documentation from index after dropping it (Bagas
   Sanjaya).

* acpi-scan:
  ACPI: scan: Fix DMA range assignment

* acpi-resource:
  ACPI: resource: note more about IRQ override
  ACPI: resource: do IRQ override on LENOVO IdeaPad

* acpi-apei:
  ACPI: APEI: Fix integer overflow in ghes_estatus_pool_init()

* acpi-extlog:
  ACPI: extlog: Handle multiple records

* acpi-docs:
  Documentation: ACPI: Prune DSDT override documentation from index
2022-10-21 20:07:41 +02:00
Stephen Kitt
aadc0cd52f docs: sysctl/fs: re-order, prettify
This brings the text markup in line with sysctl/abi and
sysctl/kernel:

* the entries are ordered alphabetically
* the table of contents is automatically generated
* markup is used as appropriate for constants etc.

The content isn't fully up-to-date but the obsolete entries are gone,
so remove the kernel version mention.

Signed-off-by: Stephen Kitt <steve@sk2.org>
Link: https://lore.kernel.org/r/20220930102937.135841-6-steve@sk2.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-10-19 16:42:09 -06:00
Stephen Kitt
4ce463179a docs: sysctl/fs: remove references to super-max/-nr
These were removed in 2.4.7.8. Remove references to super-max and
super-nr in the sysctl documentation.

Signed-off-by: Stephen Kitt <steve@sk2.org>
Link: https://lore.kernel.org/r/20220930102937.135841-5-steve@sk2.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-10-19 16:42:09 -06:00
Stephen Kitt
c46ff5c5b8 docs: sysctl/fs: merge the aio sections
There are two sections documenting aio-nr and aio-max-nr, merge them.
I kept the second explanation of aio-nr, which seems clearer to me,
along with the effects of the values from the first section.

Signed-off-by: Stephen Kitt <steve@sk2.org>
Link: https://lore.kernel.org/r/20220930102937.135841-4-steve@sk2.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-10-19 16:42:09 -06:00
Stephen Kitt
81885a45b3 docs: sysctl/fs: remove references to dquot-max/-nr
dquot-max was removed in 2.4.10.5; dquot-nr was replaced with dqstats
in 2.5.18 which is now /proc/sys/fs/quota. Remove references to
dquot-max and dquot-nr in the sysctl documentation.

Signed-off-by: Stephen Kitt <steve@sk2.org>
Link: https://lore.kernel.org/r/20220930102937.135841-3-steve@sk2.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-10-19 16:42:09 -06:00
Stephen Kitt
a75c6589fe docs: sysctl/fs: remove references to inode-max
inode-max was removed in 2.3.20pre1, remove references to it in the
sysctl documentation.

Signed-off-by: Stephen Kitt <steve@sk2.org>
Link: https://lore.kernel.org/r/20220930102937.135841-2-steve@sk2.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-10-19 16:42:08 -06:00
Milan Broz
dc3efedf9f dm verity: Add documentation for try_verify_in_tasklet option
Add documentation that was missing from commit 5721d4e5a9 ("dm
verity: Add optional "try_verify_in_tasklet" feature").

Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-10-18 17:17:48 -04:00
Linus Torvalds
bd9a3dba18 PSI updates for v6.1:
- Various performance optimizations, resulting in a 4%-9% speedup
    in the mmtests/config-scheduler-perfpipe micro-benchmark.
 
  - New interface to turn PSI on/off on a per cgroup level.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmNJKPsRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iPmg//aovCitAQX2lLoHJDIgdQibU40oaEpKTX
 wM549EGz3Dr6qmwF8+qT1U2Ge6af/hHQc5G/ZqDpKbuTjUIc3RmBkqX80dNKFLuH
 uyi9UtfsSriw+ks8fWuDdjr+S4oppwW9ZoIXvK8v4bisd3F31DNGvKPTayNxt73m
 lExfzJiD1oJixDxGX8MGO9QpcoywmjWjzjrB2P+J8hnTpArouHx/HOKdQOpG6wXq
 ZRr9kZvju6ucDpXCTa1HJrfVRxNAh35tx/b4cDtXbBFifVAeKaPOrHapMTVsqfel
 Z7T+2DymhidNYK0hrRJoGUwa/vkz+2Sm1ZLG9LlgUCXVco/9S1zw1ZuQakVvzPen
 wriuxRaAkR+szCP0L8js5+/DAkGa43MjKsvQHmDVnetQtlsAD4eYnn+alQ837SXv
 MP3jwFqF+e4mcWdoQcfh0OWUgGec5XZzdgRYrFkBKyTWGLB2iPivcAMNf0X/h82Q
 xxv4DQJIIJ017GOQ/ho2saq+GbtFCvX8YnGYas9T47Bjjluhjo7jgTVtPTo+mhtN
 RfwMdG718Ap/gvnAX7wMe/t+L/4AP8AIgDRi5L35dTRqETwOjH+LAvOYjleQFYgu
 kMVtLMyzU+TGwHscuzPFRh7TnvSJ4sD48Ll1BPnyZsh3SS9u0gAs1bml7Cu7JbmW
 SIZD/S/hzdI=
 =91tB
 -----END PGP SIGNATURE-----

Merge tag 'sched-psi-2022-10-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull PSI updates from Ingo Molnar:

 - Various performance optimizations, resulting in a 4%-9% speedup in
   the mmtests/config-scheduler-perfpipe micro-benchmark.

 - New interface to turn PSI on/off on a per cgroup level.

* tag 'sched-psi-2022-10-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/psi: Per-cgroup PSI accounting disable/re-enable interface
  sched/psi: Cache parent psi_group to speed up group iteration
  sched/psi: Consolidate cgroup_psi()
  sched/psi: Add PSI_IRQ to track IRQ/SOFTIRQ pressure
  sched/psi: Remove NR_ONCPU task accounting
  sched/psi: Optimize task switch inside shared cgroups again
  sched/psi: Move private helpers to sched/stats.h
  sched/psi: Save percpu memory when !psi_cgroups_enabled
  sched/psi: Don't create cgroup PSI files when psi_disabled
  sched/psi: Fix periodic aggregation shut off
2022-10-14 13:03:00 -07:00
Bagas Sanjaya
83439a0f1c Documentation: ACPI: Prune DSDT override documentation from index
Commit d206cef03c ("ACPI: docs: Drop useless DSDT override documentation")
removes useless DSDT override documentation. However, the commit forgets
to prune the documentation entry from table of contents of ACPI admin
guide documentation, hence triggers Sphinx warning:

Documentation/admin-guide/acpi/index.rst:8: WARNING: toctree contains reference to nonexisting document 'admin-guide/acpi/dsdt-override'

Prune the entry to fix the warning.

Fixes: d206cef03c ("ACPI: docs: Drop useless DSDT override documentation")
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-10-13 20:33:12 +02:00
Linus Torvalds
778ce723e9 xen: branch for v6.1-rc1
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCY0ZjFAAKCRCAXGG7T9hj
 vjEsAP4rFMnqc6AXy4Mpvv8cxBtEuQZbwEqgBrMJUvK1jZQrBQD/dOJK2GBCVcfD
 2yaVlefFiJGTw5WUlbPeohUlTZ8pJwg=
 =xsHV
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-6.1-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from Juergen Gross:

 - Some minor typo fixes

 - A fix of the Xen pcifront driver for supporting the device model to
   run in a Linux stub domain

 - A cleanup of the pcifront driver

 - A series to enable grant-based virtio with Xen on x86

 - A cleanup of Xen PV guests to distinguish between safe and faulting
   MSR accesses

 - Two fixes of the Xen gntdev driver

 - Two fixes of the new xen grant DMA driver

* tag 'for-linus-6.1-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: Kconfig: Fix spelling mistake "Maxmium" -> "Maximum"
  xen/pv: support selecting safe/unsafe msr accesses
  xen/pv: refactor msr access functions to support safe and unsafe accesses
  xen/pv: fix vendor checks for pmu emulation
  xen/pv: add fault recovery control to pmu msr accesses
  xen/virtio: enable grant based virtio on x86
  xen/virtio: use dom0 as default backend for CONFIG_XEN_VIRTIO_FORCE_GRANT
  xen/virtio: restructure xen grant dma setup
  xen/pcifront: move xenstore config scanning into sub-function
  xen/gntdev: Accommodate VMA splitting
  xen/gntdev: Prevent leaking grants
  xen/virtio: Fix potential deadlock when accessing xen_grant_dma_devices
  xen/virtio: Fix n_pages calculation in xen_grant_dma_map(unmap)_page()
  xen/xenbus: Fix spelling mistake "hardward" -> "hardware"
  xen-pcifront: Handle missed Connected state
2022-10-12 14:39:38 -07:00
Linus Torvalds
676cb49573 - hfs and hfsplus kmap API modernization from Fabio Francesco
- Valentin Schneider makes crash-kexec work properly when invoked from
   an NMI-time panic.
 
 - ntfs bugfixes from Hawkins Jiawei
 
 - Jiebin Sun improves IPC msg scalability by replacing atomic_t's with
   percpu counters.
 
 - nilfs2 cleanups from Minghao Chi
 
 - lots of other single patches all over the tree!
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY0Yf0gAKCRDdBJ7gKXxA
 joapAQDT1d1zu7T8yf9cQXkYnZVuBKCjxKE/IsYvqaq1a42MjQD/SeWZg0wV05B8
 DhJPj9nkEp6R3Rj3Mssip+3vNuceAQM=
 =lUQY
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - hfs and hfsplus kmap API modernization (Fabio Francesco)

 - make crash-kexec work properly when invoked from an NMI-time panic
   (Valentin Schneider)

 - ntfs bugfixes (Hawkins Jiawei)

 - improve IPC msg scalability by replacing atomic_t's with percpu
   counters (Jiebin Sun)

 - nilfs2 cleanups (Minghao Chi)

 - lots of other single patches all over the tree!

* tag 'mm-nonmm-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (71 commits)
  include/linux/entry-common.h: remove has_signal comment of arch_do_signal_or_restart() prototype
  proc: test how it holds up with mapping'less process
  mailmap: update Frank Rowand email address
  ia64: mca: use strscpy() is more robust and safer
  init/Kconfig: fix unmet direct dependencies
  ia64: update config files
  nilfs2: replace WARN_ONs by nilfs_error for checkpoint acquisition failure
  fork: remove duplicate included header files
  init/main.c: remove unnecessary (void*) conversions
  proc: mark more files as permanent
  nilfs2: remove the unneeded result variable
  nilfs2: delete unnecessary checks before brelse()
  checkpatch: warn for non-standard fixes tag style
  usr/gen_init_cpio.c: remove unnecessary -1 values from int file
  ipc/msg: mitigate the lock contention with percpu counter
  percpu: add percpu_counter_add_local and percpu_counter_sub_local
  fs/ocfs2: fix repeated words in comments
  relay: use kvcalloc to alloc page array in relay_alloc_page_array
  proc: make config PROC_CHILDREN depend on PROC_FS
  fs: uninline inode_maybe_inc_iversion()
  ...
2022-10-12 11:00:22 -07:00
Juergen Gross
3fac3734c4 xen/pv: support selecting safe/unsafe msr accesses
Instead of always doing the safe variants for reading and writing MSRs
in Xen PV guests, make the behavior controllable via Kconfig option
and a boot parameter.

The default will be the current behavior, which is to always use the
safe variant.

Signed-off-by: Juergen Gross <jgross@suse.com>
2022-10-11 10:51:05 +02:00
Linus Torvalds
27bc50fc90 - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in
linux-next for a couple of months without, to my knowledge, any negative
   reports (or any positive ones, come to that).
 
 - Also the Maple Tree from Liam R.  Howlett.  An overlapping range-based
   tree for vmas.  It it apparently slight more efficient in its own right,
   but is mainly targeted at enabling work to reduce mmap_lock contention.
 
   Liam has identified a number of other tree users in the kernel which
   could be beneficially onverted to mapletrees.
 
   Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
   (https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com).
   This has yet to be addressed due to Liam's unfortunately timed
   vacation.  He is now back and we'll get this fixed up.
 
 - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer.  It uses
   clang-generated instrumentation to detect used-unintialized bugs down to
   the single bit level.
 
   KMSAN keeps finding bugs.  New ones, as well as the legacy ones.
 
 - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
   memory into THPs.
 
 - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to support
   file/shmem-backed pages.
 
 - userfaultfd updates from Axel Rasmussen
 
 - zsmalloc cleanups from Alexey Romanov
 
 - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and memory-failure
 
 - Huang Ying adds enhancements to NUMA balancing memory tiering mode's
   page promotion, with a new way of detecting hot pages.
 
 - memcg updates from Shakeel Butt: charging optimizations and reduced
   memory consumption.
 
 - memcg cleanups from Kairui Song.
 
 - memcg fixes and cleanups from Johannes Weiner.
 
 - Vishal Moola provides more folio conversions
 
 - Zhang Yi removed ll_rw_block() :(
 
 - migration enhancements from Peter Xu
 
 - migration error-path bugfixes from Huang Ying
 
 - Aneesh Kumar added ability for a device driver to alter the memory
   tiering promotion paths.  For optimizations by PMEM drivers, DRM
   drivers, etc.
 
 - vma merging improvements from Jakub Matěn.
 
 - NUMA hinting cleanups from David Hildenbrand.
 
 - xu xin added aditional userspace visibility into KSM merging activity.
 
 - THP & KSM code consolidation from Qi Zheng.
 
 - more folio work from Matthew Wilcox.
 
 - KASAN updates from Andrey Konovalov.
 
 - DAMON cleanups from Kaixu Xia.
 
 - DAMON work from SeongJae Park: fixes, cleanups.
 
 - hugetlb sysfs cleanups from Muchun Song.
 
 - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY0HaPgAKCRDdBJ7gKXxA
 joPjAQDZ5LlRCMWZ1oxLP2NOTp6nm63q9PWcGnmY50FjD/dNlwEAnx7OejCLWGWf
 bbTuk6U2+TKgJa4X7+pbbejeoqnt5QU=
 =xfWx
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in
   linux-next for a couple of months without, to my knowledge, any
   negative reports (or any positive ones, come to that).

 - Also the Maple Tree from Liam Howlett. An overlapping range-based
   tree for vmas. It it apparently slightly more efficient in its own
   right, but is mainly targeted at enabling work to reduce mmap_lock
   contention.

   Liam has identified a number of other tree users in the kernel which
   could be beneficially onverted to mapletrees.

   Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
   at [1]. This has yet to be addressed due to Liam's unfortunately
   timed vacation. He is now back and we'll get this fixed up.

 - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses
   clang-generated instrumentation to detect used-unintialized bugs down
   to the single bit level.

   KMSAN keeps finding bugs. New ones, as well as the legacy ones.

 - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
   memory into THPs.

 - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to
   support file/shmem-backed pages.

 - userfaultfd updates from Axel Rasmussen

 - zsmalloc cleanups from Alexey Romanov

 - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and
   memory-failure

 - Huang Ying adds enhancements to NUMA balancing memory tiering mode's
   page promotion, with a new way of detecting hot pages.

 - memcg updates from Shakeel Butt: charging optimizations and reduced
   memory consumption.

 - memcg cleanups from Kairui Song.

 - memcg fixes and cleanups from Johannes Weiner.

 - Vishal Moola provides more folio conversions

 - Zhang Yi removed ll_rw_block() :(

 - migration enhancements from Peter Xu

 - migration error-path bugfixes from Huang Ying

 - Aneesh Kumar added ability for a device driver to alter the memory
   tiering promotion paths. For optimizations by PMEM drivers, DRM
   drivers, etc.

 - vma merging improvements from Jakub Matěn.

 - NUMA hinting cleanups from David Hildenbrand.

 - xu xin added aditional userspace visibility into KSM merging
   activity.

 - THP & KSM code consolidation from Qi Zheng.

 - more folio work from Matthew Wilcox.

 - KASAN updates from Andrey Konovalov.

 - DAMON cleanups from Kaixu Xia.

 - DAMON work from SeongJae Park: fixes, cleanups.

 - hugetlb sysfs cleanups from Muchun Song.

 - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.

Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1]

* tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits)
  hugetlb: allocate vma lock for all sharable vmas
  hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer
  hugetlb: fix vma lock handling during split vma and range unmapping
  mglru: mm/vmscan.c: fix imprecise comments
  mm/mglru: don't sync disk for each aging cycle
  mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol
  mm: memcontrol: use do_memsw_account() in a few more places
  mm: memcontrol: deprecate swapaccounting=0 mode
  mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled
  mm/secretmem: remove reduntant return value
  mm/hugetlb: add available_huge_pages() func
  mm: remove unused inline functions from include/linux/mm_inline.h
  selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory
  selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd
  selftests/vm: add thp collapse shmem testing
  selftests/vm: add thp collapse file and tmpfs testing
  selftests/vm: modularize thp collapse memory operations
  selftests/vm: dedup THP helpers
  mm/khugepaged: add tracepoint to hpage_collapse_scan_file()
  mm/madvise: add file and shmem support to MADV_COLLAPSE
  ...
2022-10-10 17:53:04 -07:00
Linus Torvalds
f23cdfcd04 IOMMU Updates for Linux v6.1:
Including:
 
 	- Removal of the bus_set_iommu() interface which became
 	  unnecesary because of IOMMU per-device probing
 
 	- Make the dma-iommu.h header private
 
 	- Intel VT-d changes from Lu Baolu:
 	  - Decouple PASID and PRI from SVA
 	  - Add ESRTPS & ESIRTPS capability check
 	  - Cleanups
 
 	- Apple DART support for the M1 Pro/MAX SOCs
 
 	- Support for AMD IOMMUv2 page-tables for the DMA-API layer. The
 	  v2 page-tables are compatible with the x86 CPU page-tables.
 	  Using them for DMA-API prepares support for hardware-assisted
 	  IOMMU virtualization
 
 	- Support for MT6795 Helio X10 M4Us in the Mediatek IOMMU driver
 
 	- Some smaller fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmNEC5oACgkQK/BELZcB
 GuNcOQ/6A5SXmcvDRLYZW1ENM5Z6xsZ1LabSZkjhYSpmbJyu8Uny/Z2aRWqxPMLJ
 hJeHTsWSLhrTq1VfjFhELHB3kgT2DRr7H3LXXaMNC6qz690EcavX1wKX2AxH0m22
 8YrktkyAmFQ3BG6rsQLdlMMasLph/x06ix/xO9opQZVFdj/fV0Jx7ekX1JK+U3hx
 MI96i5W3G5PBVHBypAvjxSlmA4saj9Fhk7l3IZL7py9AOKz7NypuwWRs+86PMBiO
 EzLt5aF4g8pmKChF/c9BsoIbjBYvTG/s3NbycIng0ACc2SOvf+EvtoVZQclWifbT
 lwti9PLdsoVUnPOZHLYOTx4xSf/UyoLVzaLxJ52aoXnNYe2qaX5DANXhT2mWIY/Y
 z1mzOkShmK7WF7a8arRyqJeLJ4SvDx8GrbvLiom3DAzmqVHzzFGadHtt5fvGYN4F
 Jet/JIN3HjECQbamqtPBpWquBFhLmgusPksIiyMFscRvYdZqkaVkTkElcF3WqAMm
 QkeecfoTQ9Vdtdz44ZVLRjKpS77yRZmHshp1r/rfSI+9Ok8uRI+xmmcyrAI6ElqH
 DH14tLHPzw694rTHF+bTCd+pPMGOoFLi0xAfUXAeGWm1uzC1JIRrVu5JeQNOUOSD
 5SQDXB7dPrhXngaws5Fx2u3amCO3688mslcGgM7q54kC+LyVo0E=
 =h0sT
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu updates from Joerg Roedel:

 - remove the bus_set_iommu() interface which became unnecesary because
   of IOMMU per-device probing

 - make the dma-iommu.h header private

 - Intel VT-d changes from Lu Baolu:
	  - Decouple PASID and PRI from SVA
	  - Add ESRTPS & ESIRTPS capability check
	  - Cleanups

 - Apple DART support for the M1 Pro/MAX SOCs

 - support for AMD IOMMUv2 page-tables for the DMA-API layer.

   The v2 page-tables are compatible with the x86 CPU page-tables. Using
   them for DMA-API prepares support for hardware-assisted IOMMU
   virtualization

 - support for MT6795 Helio X10 M4Us in the Mediatek IOMMU driver

 - some smaller fixes and cleanups

* tag 'iommu-updates-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (59 commits)
  iommu/vt-d: Avoid unnecessary global DMA cache invalidation
  iommu/vt-d: Avoid unnecessary global IRTE cache invalidation
  iommu/vt-d: Rename cap_5lp_support to cap_fl5lp_support
  iommu/vt-d: Remove pasid_set_eafe()
  iommu/vt-d: Decouple PASID & PRI enabling from SVA
  iommu/vt-d: Remove unnecessary SVA data accesses in page fault path
  dt-bindings: iommu: arm,smmu-v3: Relax order of interrupt names
  iommu: dart: Support t6000 variant
  iommu/io-pgtable-dart: Add DART PTE support for t6000
  iommu/io-pgtable: Add DART subpage protection support
  iommu/io-pgtable: Move Apple DART support to its own file
  iommu/mediatek: Add support for MT6795 Helio X10 M4Us
  iommu/mediatek: Introduce new flag TF_PORT_TO_ADDR_MT8173
  dt-bindings: mediatek: Add bindings for MT6795 M4U
  iommu/iova: Fix module config properly
  iommu/amd: Fix sparse warning
  iommu/amd: Remove outdated comment
  iommu/amd: Free domain ID after domain_flush_pages
  iommu/amd: Free domain id in error path
  iommu/virtio: Fix compile error with viommu_capable()
  ...
2022-10-10 13:20:53 -07:00
Linus Torvalds
adf4bfc4a9 cgroup changes for v6.1-rc1.
* cpuset now support isolated cpus.partition type, which will enable dynamic
   CPU isolation.
 * pids.peak added to remember the max number of pids used.
 * Holes in cgroup namespace plugged.
 * Internal cleanups.
 
 Note that for-6.1-fixes was pulled into for-6.1 twice. Both were for
 follow-up cleanups and each merge commit has details.
 
 Also, 8a693f7766 ("cgroup: Remove CFTYPE_PRESSURE") removes the flag used
 by PSI changes in the tip tree and the merged result won't compile due to
 the missing flag. Simply removing the struct init lines specifying the flag
 is the correct resolution. linux-next already contains the correct fix:
 
  https://lkml.kernel.org/r/20220912161812.072aaa3b@canb.auug.org.au
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCYzsl7w4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGYsxAP4kad4YPw+CueLyyEMiYgBHouqDt8cG0+FJWK3X
 svTC7wD/eCLfxZM8TjjSrMmvaMrml586mr3NoQaFeW0x3twptQQ=
 =LERu
 -----END PGP SIGNATURE-----

Merge tag 'cgroup-for-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup updates from Tejun Heo:

 - cpuset now support isolated cpus.partition type, which will enable
   dynamic CPU isolation

 - pids.peak added to remember the max number of pids used

 - holes in cgroup namespace plugged

 - internal cleanups

* tag 'cgroup-for-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (25 commits)
  cgroup: use strscpy() is more robust and safer
  iocost_monitor: reorder BlkgIterator
  cgroup: simplify code in cgroup_apply_control
  cgroup: Make cgroup_get_from_id() prettier
  cgroup/cpuset: remove unreachable code
  cgroup: Remove CFTYPE_PRESSURE
  cgroup: Improve cftype add/rm error handling
  kselftest/cgroup: Add cpuset v2 partition root state test
  cgroup/cpuset: Update description of cpuset.cpus.partition in cgroup-v2.rst
  cgroup/cpuset: Make partition invalid if cpumask change violates exclusivity rule
  cgroup/cpuset: Relocate a code block in validate_change()
  cgroup/cpuset: Show invalid partition reason string
  cgroup/cpuset: Add a new isolated cpus.partition type
  cgroup/cpuset: Relax constraints to partition & cpus changes
  cgroup/cpuset: Allow no-task partition to have empty cpuset.cpus.effective
  cgroup/cpuset: Miscellaneous cleanups & add helper functions
  cgroup/cpuset: Enable update_tasks_cpumask() on top_cpuset
  cgroup: add pids.peak interface for pids controller
  cgroup: Remove data-race around cgrp_dfl_visible
  cgroup: Fix build failure when CONFIG_SHRINKER_DEBUG
  ...
2022-10-10 11:12:25 -07:00
Linus Torvalds
4899a36f91 powerpc updates for 6.1
- Remove our now never-true definitions for pgd_huge() and p4d_leaf().
 
  - Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit.
 
  - Add support for syscall wrappers.
 
  - Add support for KFENCE on 64-bit.
 
  - Update 64-bit HV KVM to use the new guest state entry/exit accounting API.
 
  - Support execute-only memory when using the Radix MMU (P9 or later).
 
  - Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests.
 
  - Updates to our linker script to move more data into read-only sections.
 
  - Allow the VDSO to be randomised on 32-bit.
 
  - Many other small features and fixes.
 
 Thanks to: Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Athira Rajeev, Christophe
 Leroy, David Hildenbrand, Disha Goel, Fabiano Rosas, Gaosheng Cui, Gustavo A. R. Silva,
 Haren Myneni, Hari Bathini, Jilin Yuan, Joel Stanley, Kajol Jain, Kees Cook, Krzysztof
 Kozlowski, Laurent Dufour, Liang He, Li Huafei, Lukas Bulwahn, Madhavan Srinivasan, Nathan
 Chancellor, Nathan Lynch, Nicholas Miehlbradt, Nicholas Piggin, Pali Rohár, Rohan McLure,
 Russell Currey, Sachin Sant, Segher Boessenkool, Shrikanth Hegde, Tyrel Datwyler, Wolfram
 Sang, ye xingchen, Zheng Yongjun.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmNCpBMTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgDx3EACCf86iumFF3RyvENtDwoTRgH3H0z2E
 /ZC4LKrtxgaPFJzKUT4F0kLK85Hw5GzMEKK42NIhAB0o5vFwmEzxOtnlHOyEufAm
 EDIZDIfxV2J9Qx/cW2DSojPj/o9O6noXwhw9SBqMwiDWd8gXmNgOUEklAO7aR7Vq
 Ne2N2FLMNthZydCoHR6dAEjfe2ceFXP5cALwzQO+ILDdZQ0UcF2Yq4yw/gEDoCrB
 FH7mmE7UaQQHvYzo85VTZu7XfUys1P7kUcnhVurOg7/07ITnvnQR+itKZXC+bSft
 1K7ULtjd2QiCgxZA/apFc3lO46kqHVFsB3onRQw12/Ku5vfGFfY0L0iK97OgM4s0
 0u4r+J7A+MM5YBJVVjwZ6woYO5CWMHYKBZepxOpcvftPxj1LNkiHsryqKILGISEC
 aIY/lI0hpeNU4QshDMXzSTgeb/VF9O5cGPncTPkOFbXxD4RpVyz8tSngsG1+D8lj
 S6B2h3k4A14rnblLOxP22jcedBlTYQcRQS4vwr0a7+63QTjfSJ12xT3ucIAKU9f7
 65rVSS/igbrfxqHDmrd60WWZBMXeK0Zy7YIG6iYPTxpP31eFpSp9wtDlV7V2+EH2
 F2p+TJY8aTA8UW+2L5gigN3RsBeeEB8zxJkB14ivICM7+XzVu11PxPDqjDZYkfzC
 ueKKvCcHhHAYqQ==
 =TFBA
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Remove our now never-true definitions for pgd_huge() and p4d_leaf().

 - Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit.

 - Add support for syscall wrappers.

 - Add support for KFENCE on 64-bit.

 - Update 64-bit HV KVM to use the new guest state entry/exit accounting
   API.

 - Support execute-only memory when using the Radix MMU (P9 or later).

 - Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests.

 - Updates to our linker script to move more data into read-only
   sections.

 - Allow the VDSO to be randomised on 32-bit.

 - Many other small features and fixes.

Thanks to Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Athira
Rajeev, Christophe Leroy, David Hildenbrand, Disha Goel, Fabiano Rosas,
Gaosheng Cui, Gustavo A. R. Silva, Haren Myneni, Hari Bathini, Jilin
Yuan, Joel Stanley, Kajol Jain, Kees Cook, Krzysztof Kozlowski, Laurent
Dufour, Liang He, Li Huafei, Lukas Bulwahn, Madhavan Srinivasan, Nathan
Chancellor, Nathan Lynch, Nicholas Miehlbradt, Nicholas Piggin, Pali
Rohár, Rohan McLure, Russell Currey, Sachin Sant, Segher Boessenkool,
Shrikanth Hegde, Tyrel Datwyler, Wolfram Sang, ye xingchen, and Zheng
Yongjun.

* tag 'powerpc-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (214 commits)
  KVM: PPC: Book3S HV: Fix stack frame regs marker
  powerpc: Don't add __powerpc_ prefix to syscall entry points
  powerpc/64s/interrupt: Fix stack frame regs marker
  powerpc/64: Fix msr_check_and_set/clear MSR[EE] race
  powerpc/64s/interrupt: Change must-hard-mask interrupt check from BUG to WARN
  powerpc/pseries: Add firmware details to the hardware description
  powerpc/powernv: Add opal details to the hardware description
  powerpc: Add device-tree model to the hardware description
  powerpc/64: Add logical PVR to the hardware description
  powerpc: Add PVR & CPU name to hardware description
  powerpc: Add hardware description string
  powerpc/configs: Enable PPC_UV in powernv_defconfig
  powerpc/configs: Update config files for removed/renamed symbols
  powerpc/mm: Fix UBSAN warning reported on hugetlb
  powerpc/mm: Always update max/min_low_pfn in mem_topology_setup()
  powerpc/mm/book3s/hash: Rename flush_tlb_pmd_range
  powerpc: Drops STABS_DEBUG from linker scripts
  powerpc/64s: Remove lost/old comment
  powerpc/64s: Remove old STAB comment
  powerpc: remove orphan systbl_chk.sh
  ...
2022-10-09 14:05:15 -07:00
Linus Torvalds
ef688f8b8c The first batch of KVM patches, mostly covering x86, which I
am sending out early due to me travelling next week.  There is a
 lone mm patch for which Andrew gave an informal ack at
 https://lore.kernel.org/linux-mm/20220817102500.440c6d0a3fce296fdf91bea6@linux-foundation.org.
 
 I will send the bulk of ARM work, as well as other
 architectures, at the end of next week.
 
 ARM:
 
 * Account stage2 page table allocations in memory stats.
 
 x86:
 
 * Account EPT/NPT arm64 page table allocations in memory stats.
 
 * Tracepoint cleanups/fixes for nested VM-Enter and emulated MSR accesses.
 
 * Drop eVMCS controls filtering for KVM on Hyper-V, all known versions of
   Hyper-V now support eVMCS fields associated with features that are
   enumerated to the guest.
 
 * Use KVM's sanitized VMCS config as the basis for the values of nested VMX
   capabilities MSRs.
 
 * A myriad event/exception fixes and cleanups.  Most notably, pending
   exceptions morph into VM-Exits earlier, as soon as the exception is
   queued, instead of waiting until the next vmentry.  This fixed
   a longstanding issue where the exceptions would incorrecly become
   double-faults instead of triggering a vmexit; the common case of
   page-fault vmexits had a special workaround, but now it's fixed
   for good.
 
 * A handful of fixes for memory leaks in error paths.
 
 * Cleanups for VMREAD trampoline and VMX's VM-Exit assembly flow.
 
 * Never write to memory from non-sleepable kvm_vcpu_check_block()
 
 * Selftests refinements and cleanups.
 
 * Misc typo cleanups.
 
 Generic:
 
 * remove KVM_REQ_UNHALT
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmM2zwcUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNpbwf+MlVeOlzE5SBdrJ0TEnLmKUel1lSz
 QnZzP5+D65oD0zhCilUZHcg6G4mzZ5SdVVOvrGJvA0eXh25ruLNMF6jbaABkMLk/
 FfI1ybN7A82hwJn/aXMI/sUurWv4Jteaad20JC2DytBCnsW8jUqc49gtXHS2QWy4
 3uMsFdpdTAg4zdJKgEUfXBmQviweVpjjl3ziRyZZ7yaeo1oP7XZ8LaE1nR2l5m0J
 mfjzneNm5QAnueypOh5KhSwIvqf6WHIVm/rIHDJ1HIFbgfOU0dT27nhb1tmPwAcE
 +cJnnMUHjZqtCXteHkAxMClyRq0zsEoKk0OGvSOOMoq3Q0DavSXUNANOig==
 =/hqX
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "The first batch of KVM patches, mostly covering x86.

  ARM:

   - Account stage2 page table allocations in memory stats

  x86:

   - Account EPT/NPT arm64 page table allocations in memory stats

   - Tracepoint cleanups/fixes for nested VM-Enter and emulated MSR
     accesses

   - Drop eVMCS controls filtering for KVM on Hyper-V, all known
     versions of Hyper-V now support eVMCS fields associated with
     features that are enumerated to the guest

   - Use KVM's sanitized VMCS config as the basis for the values of
     nested VMX capabilities MSRs

   - A myriad event/exception fixes and cleanups. Most notably, pending
     exceptions morph into VM-Exits earlier, as soon as the exception is
     queued, instead of waiting until the next vmentry. This fixed a
     longstanding issue where the exceptions would incorrecly become
     double-faults instead of triggering a vmexit; the common case of
     page-fault vmexits had a special workaround, but now it's fixed for
     good

   - A handful of fixes for memory leaks in error paths

   - Cleanups for VMREAD trampoline and VMX's VM-Exit assembly flow

   - Never write to memory from non-sleepable kvm_vcpu_check_block()

   - Selftests refinements and cleanups

   - Misc typo cleanups

  Generic:

   - remove KVM_REQ_UNHALT"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (94 commits)
  KVM: remove KVM_REQ_UNHALT
  KVM: mips, x86: do not rely on KVM_REQ_UNHALT
  KVM: x86: never write to memory from kvm_vcpu_check_block()
  KVM: x86: Don't snapshot pending INIT/SIPI prior to checking nested events
  KVM: nVMX: Make event request on VMXOFF iff INIT/SIPI is pending
  KVM: nVMX: Make an event request if INIT or SIPI is pending on VM-Enter
  KVM: SVM: Make an event request if INIT or SIPI is pending when GIF is set
  KVM: x86: lapic does not have to process INIT if it is blocked
  KVM: x86: Rename kvm_apic_has_events() to make it INIT/SIPI specific
  KVM: x86: Rename and expose helper to detect if INIT/SIPI are allowed
  KVM: nVMX: Make an event request when pending an MTF nested VM-Exit
  KVM: x86: make vendor code check for all nested events
  mailmap: Update Oliver's email address
  KVM: x86: Allow force_emulation_prefix to be written without a reload
  KVM: selftests: Add an x86-only test to verify nested exception queueing
  KVM: selftests: Use uapi header to get VMX and SVM exit reasons/codes
  KVM: x86: Rename inject_pending_events() to kvm_check_and_inject_events()
  KVM: VMX: Update MTF and ICEBP comments to document KVM's subtle behavior
  KVM: x86: Treat pending TRIPLE_FAULT requests as pending exceptions
  KVM: x86: Morph pending exceptions to pending VM-Exits at queue time
  ...
2022-10-09 09:39:55 -07:00
Linus Torvalds
e8bc52cb8d Driver core changes for 6.1-rc1
Here is the big set of driver core and debug printk changes for 6.1-rc1.
 Included in here is:
 	- dynamic debug updates for the core and the drm subsystem.  The
 	  drm changes have all been acked by the relevant maintainers.
 	- kernfs fixes for syzbot reported problems
 	- kernfs refactors and updates for cgroup requirements
 	- magic number cleanups and removals from the kernel tree (they
 	  were not being used and they really did not actually do
 	  anything.)
 	- other tiny cleanups
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCY0BYUA8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ylozwCdFRlcghaf7XBUyNgRZRwMC+oQI8EAn1G/nEDE
 6aFd2er41uK0IGQnSmYO
 =OK0k
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the big set of driver core and debug printk changes for
  6.1-rc1. Included in here is:

   - dynamic debug updates for the core and the drm subsystem. The drm
     changes have all been acked by the relevant maintainers

   - kernfs fixes for syzbot reported problems

   - kernfs refactors and updates for cgroup requirements

   - magic number cleanups and removals from the kernel tree (they were
     not being used and they really did not actually do anything)

   - other tiny cleanups

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'driver-core-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (74 commits)
  docs: filesystems: sysfs: Make text and code for ->show() consistent
  Documentation: NBD_REQUEST_MAGIC isn't a magic number
  a.out: restore CMAGIC
  device property: Add const qualifier to device_get_match_data() parameter
  drm_print: add _ddebug descriptor to drm_*dbg prototypes
  drm_print: prefer bare printk KERN_DEBUG on generic fn
  drm_print: optimize drm_debug_enabled for jump-label
  drm-print: add drm_dbg_driver to improve namespace symmetry
  drm-print.h: include dyndbg header
  drm_print: wrap drm_*_dbg in dyndbg descriptor factory macro
  drm_print: interpose drm_*dbg with forwarding macros
  drm: POC drm on dyndbg - use in core, 2 helpers, 3 drivers.
  drm_print: condense enum drm_debug_category
  debugfs: use DEFINE_SHOW_ATTRIBUTE to define debugfs_regset32_fops
  driver core: use IS_ERR_OR_NULL() helper in device_create_groups_vargs()
  Documentation: ENI155_MAGIC isn't a magic number
  Documentation: NBD_REPLY_MAGIC isn't a magic number
  nbd: remove define-only NBD_MAGIC, previously magic number
  Documentation: FW_HEADER_MAGIC isn't a magic number
  Documentation: EEPROM_MAGIC_VALUE isn't a magic number
  ...
2022-10-07 17:04:10 -07:00
Linus Torvalds
ffb39098bf linux-kselftest-kunit-6.1-rc1
This KUnit update for Linux 6.1-rc1 consists of several documentation
 fixes, UML related cleanups, and a feature to enable/disable KUnit
 tests. This update includes the following change to
 
 - rename all_test_uml.config, use it for --alltests
 
 Note: if anyone was using all_tests_uml.config, this change breaks them.
 This change simplifies the usage and eliminates the need to type:
 --kunitconfig=tools/testing/kunit/configs/all_tests_uml.config.
 
 A simple workaround to create a symlink to the new name can solve the
 problem for anyone using all_tests_uml.config.
 
 all_tests_uml.config should work across ~all architectures.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmM97okACgkQCwJExA0N
 Qxwe9BAA4rUh/CdaDpGxSoknamlSlXbcsxyHQK32OoqtIQPJfPcqxZwZhqGzpZAE
 z7qqnfN7dGWACcJjaZYZtkjJ1tsJ7OADf6m81drM6lmHSDRldAhdXMK9qqJAA2pk
 OMciu7TcoSezaMw3eKZZ8jeQ2OIEUirQQXFIuMkb9r+E65tyIGCVo9njY19is6bk
 7TcrUNlzgTednzkJ7B5BY54vZjvET/kwwBhXUgygMoi0JkKIoksJcxF2y7x1UKUx
 Kfu/3bqiBAAEiOdMLxnxqwNH/Uhny1IWV6w+p0gtEsVGS47QvRew7gtIuKgCZWpf
 S/w7QL+zCaXGB3jM8KVk7p4L9qU6EIttqu7kwXSs5OZz8QVrw6oc1l0RDUI5NIWi
 TLRO4QJHWssKNrSyldtk2J9wFEHOF48Hc6HYqHxJLFYwfutt5hzsKotN6+H60Uth
 TVAE+oUWknQks1Imv7x3Avq5k6CyfH9dFqqH+IrM2hDbqkZN1hfHvm1RJgcXFaHf
 f4UpAYwCgkb3Maa7mTBq2/x98UUhh051EQuHDUO6H0tzccxJUbwkeeGK+WH51aZu
 d6RD4ZnH4BlB+EMbfXgkXdsGqcsQMneoLh5/V0y1L8aSyjJQ8FYlK5lQYwu9VBXQ
 6g6AH4nlhvVplXgCDeGnnTa7HfTD87Vx9Dws+uO+J/QbwqTatI4=
 =Q5W1
 -----END PGP SIGNATURE-----

Merge tag 'linux-kselftest-kunit-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull KUnit updates from Shuah Khan:
 "Several documentation fixes, UML related cleanups, and a feature to
  enable/disable KUnit tests

  This includes the change to rename all_test_uml.config, and use it for
  '--alltests'. Note: if anyone was using all_tests_uml.config, this
  change breaks them.

  This change simplifies the usage and eliminates the need to type:

     --kunitconfig=tools/testing/kunit/configs/all_tests_uml.config

  A simple workaround to create a symlink to the new name can solve the
  problem for anyone using all_tests_uml.config.

  all_tests_uml.config should work across ~all architectures"

* tag 'linux-kselftest-kunit-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  Documentation: Kunit: Use full path to .kunitconfig
  kunit: tool: rename all_test_uml.config, use it for --alltests
  kunit: tool: remove UML specific options from all_tests_uml.config
  lib: stackinit: update reference to kunit-tool
  lib: overflow: update reference to kunit-tool
  Documentation: KUnit: update links in the index page
  Documentation: KUnit: add intro to the getting-started page
  Documentation: KUnit: Reword start guide for selecting tests
  Documentation: KUnit: add note about mrproper in start.rst
  Documentation: KUnit: avoid repeating "kunit.py run" in start.rst
  Documentation: KUnit: remove duplicated docs for kunit_tool
  Documentation: Kunit: Add ref for other kinds of tests
  Documentation: KUnit: Fix non-uml anchor
  Documentation: Kunit: Fix inconsistent titles
  Documentation: kunit: fix trivial typo
  kunit: no longer call module_info(test, "Y") for kunit modules
  kunit: add kunit.enable to enable/disable KUnit test
  kunit: tool: make --raw_output=kunit (aka --raw_output) preserve leading spaces
2022-10-06 12:57:55 -07:00
Linus Torvalds
dd42d9c3f4 linux-kselftest-next-6.1-rc1
This Kselftest update for Linux 6.1-rc1 consists of fixes and new tests.
 
 - Adds a amd-pstate-ut test module, this module is used by kselftest
   to unit test amd-pstate functionality
 - Fixes and cleanups to to cpu-hotplug to delete the fault injection
   test code
 - Improvements to vm test to use top_srcdir for builds
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmM912gACgkQCwJExA0N
 QxzfAhAAspak4B2FVZdXrb7ulGxCV2CSd49qXbv++7rBy3Ey17XlIO8pUj2FMPpx
 nBrD2UD+ET/6wHjSzkcSzU47N2BnhetuUWqN2vPmuYo4c8xDME9VSWeoKQ9Yw5rc
 6r6TxNG/Jbxab4y9r8jibwkidVzcgfYO/Ecnj9HAZ6wDERkVG+LBxpX4HpxEnCsY
 c3frYFl0cdtLOSGbEA18qctG7ro2otgh4+cWVfSo9NDypL9UfF8bZ8DzqeMOwRLr
 z2S8L+kBPyQx+ZNdqUgNOfegVx2gKu/P43Xpg8WqRJG/42xxmRmftEOLoVKZ1cCe
 Rpjv+ZV+/u3QnbEejqJ7MU/QX9JzMG35O4mo7ojxpOJ+cZNXc1ycOm+IOvTCSUJE
 fiqM2idLvQNjKMVI+FHvkE+9us6K8fzquCYqgRLBKCnfnJJfgT6tWwZpw/G8OSZ6
 Px37bfQhALzDj79OkxxoV96GdkR9a2L8GRvZG+7W4l6Oa00OM+rK1oQnjKVgmjT5
 wtMwJAXGpwFYqnJagtXcg8dsxS/ZA+k4Eq/skd/yUlCmugmgeVltjk3UmEAYs6Nx
 ame3798/haptgR1u7ncqltqjx11YR2k9i0Sjf7iqQl8zJhjtDdePAlo2gB4mFUio
 mZ9BBxCD+oUnAZprA5AwN0ZxPKLWgvPS5rLqTtFt3Pf0agjHSpY=
 =O0Lh
 -----END PGP SIGNATURE-----

Merge tag 'linux-kselftest-next-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull Kselftest updates from Shuah Khan:
 "Fixes and new tests:

   - Add an amd-pstate-ut test module, used by kselftest to unit test
     amd-pstate functionality

   - Fixes and cleanups to to cpu-hotplug to delete the fault injection
     test code

   - Improvements to vm test to use top_srcdir for builds"

* tag 'linux-kselftest-next-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  docs:kselftest: fix kselftest_module.h path of example module
  cpufreq: amd-pstate: Add explanation for X86_AMD_PSTATE_UT
  selftests/cpu-hotplug: Add log info when test success
  selftests/cpu-hotplug: Reserve one cpu online at least
  selftests/cpu-hotplug: Delete fault injection related code
  selftests/cpu-hotplug: Use return instead of exit
  selftests/cpu-hotplug: Correct log info
  cpufreq: amd-pstate: modify type in argument 2 for filp_open
  Documentation: amd-pstate: Add unit test introduction
  selftests: amd-pstate: Add test trigger for amd-pstate driver
  cpufreq: amd-pstate: Add test module for amd-pstate driver
  cpufreq: amd-pstate: Expose struct amd_cpudata
  selftests/vm: use top_srcdir instead of recomputing relative paths
2022-10-06 12:53:15 -07:00
Linus Torvalds
18fd049731 arm64 updates for 6.1:
- arm64 perf: DDR PMU driver for Alibaba's T-Head Yitian 710 SoC, SVE
   vector granule register added to the user regs together with SVE perf
   extensions documentation.
 
 - SVE updates: add HWCAP for SVE EBF16, update the SVE ABI documentation
   to match the actual kernel behaviour (zeroing the registers on syscall
   rather than "zeroed or preserved" previously).
 
 - More conversions to automatic system registers generation.
 
 - vDSO: use self-synchronising virtual counter access in gettimeofday()
   if the architecture supports it.
 
 - arm64 stacktrace cleanups and improvements.
 
 - arm64 atomics improvements: always inline assembly, remove LL/SC
   trampolines.
 
 - Improve the reporting of EL1 exceptions: rework BTI and FPAC exception
   handling, better EL1 undefs reporting.
 
 - Cortex-A510 erratum 2658417: remove BF16 support due to incorrect
   result.
 
 - arm64 defconfig updates: build CoreSight as a module, enable options
   necessary for docker, memory hotplug/hotremove, enable all PMUs
   provided by Arm.
 
 - arm64 ptrace() support for TPIDR2_EL0 (register provided with the SME
   extensions).
 
 - arm64 ftraces updates/fixes: fix module PLTs with mcount, remove
   unused function.
 
 - kselftest updates for arm64: simple HWCAP validation, FP stress test
   improvements, validation of ZA regs in signal handlers, include larger
   SVE and SME vector lengths in signal tests, various cleanups.
 
 - arm64 alternatives (code patching) improvements to robustness and
   consistency: replace cpucap static branches with equivalent
   alternatives, associate callback alternatives with a cpucap.
 
 - Miscellaneous updates: optimise kprobe performance of patching
   single-step slots, simplify uaccess_mask_ptr(), move MTE registers
   initialisation to C, support huge vmalloc() mappings, run softirqs on
   the per-CPU IRQ stack, compat (arm32) misalignment fixups for
   multiword accesses.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmM9W4cACgkQa9axLQDI
 XvEy3w/+LJ3KCFowWiz5gTAWikjv+UVssHjLMJixn47V7hsEFQ26Xnam/438rTMI
 kE95u6DHUpw2SMIxKzFRO7oI5cQtP+cWGwTtOUnjVO+U1oN+HqDOIbO9DbylWDcU
 eeeqMMmawMfTPuZrYklpOhXscsorbrKIvYBg7wHYOcwBYV3EPhWr89lwMvTVRuyJ
 qpX628KlkGMaBcONNhv3nS3qZcAOs0oHQCAVS4C8czLDL+vtJlumXUS3xr1Mqm72
 xtFe7sje8Djr2kZ8mzh0GbFiZEBoBD3F/l7ayq8gVRaVpToUt8sk36Stjs4LojF1
 6imuAfji/5TItkScq5KhGqj6MIugwp/eUVbRN74OLNTYx7msF1ZADNFQ+Q0UuY0H
 SYK13KvmOji0xjS8qAfhqrwNB79sk3fb+zF9LjETbdz4ZJCgg9gcFbSUTY0DvMfS
 MXZk/jVeB07olA8xYbjh0BRt4UV9xU628FPQzK5k7e4Nzl4jSvgtJZCZanfuVtjy
 /ZS1vbN8o7tQLBAlVnw+Exi/VedkKxkkMgm8tPKsMgERTFDx0Pc4Gs72hRpDnPWT
 MRbeCCGleAf3JQ5vF0coBDNOCEVvweQgShHOyHTz0GyhWXLCFx3RJICo5I4EIpps
 LLUk4JK0fO3LVrf1AEpu5ZP4+Sact0zfsH3gB7qyLPYFDmjDXD8=
 =jl3Z
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:

 - arm64 perf: DDR PMU driver for Alibaba's T-Head Yitian 710 SoC, SVE
   vector granule register added to the user regs together with SVE perf
   extensions documentation.

 - SVE updates: add HWCAP for SVE EBF16, update the SVE ABI
   documentation to match the actual kernel behaviour (zeroing the
   registers on syscall rather than "zeroed or preserved" previously).

 - More conversions to automatic system registers generation.

 - vDSO: use self-synchronising virtual counter access in gettimeofday()
   if the architecture supports it.

 - arm64 stacktrace cleanups and improvements.

 - arm64 atomics improvements: always inline assembly, remove LL/SC
   trampolines.

 - Improve the reporting of EL1 exceptions: rework BTI and FPAC
   exception handling, better EL1 undefs reporting.

 - Cortex-A510 erratum 2658417: remove BF16 support due to incorrect
   result.

 - arm64 defconfig updates: build CoreSight as a module, enable options
   necessary for docker, memory hotplug/hotremove, enable all PMUs
   provided by Arm.

 - arm64 ptrace() support for TPIDR2_EL0 (register provided with the SME
   extensions).

 - arm64 ftraces updates/fixes: fix module PLTs with mcount, remove
   unused function.

 - kselftest updates for arm64: simple HWCAP validation, FP stress test
   improvements, validation of ZA regs in signal handlers, include
   larger SVE and SME vector lengths in signal tests, various cleanups.

 - arm64 alternatives (code patching) improvements to robustness and
   consistency: replace cpucap static branches with equivalent
   alternatives, associate callback alternatives with a cpucap.

 - Miscellaneous updates: optimise kprobe performance of patching
   single-step slots, simplify uaccess_mask_ptr(), move MTE registers
   initialisation to C, support huge vmalloc() mappings, run softirqs on
   the per-CPU IRQ stack, compat (arm32) misalignment fixups for
   multiword accesses.

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (126 commits)
  arm64: alternatives: Use vdso/bits.h instead of linux/bits.h
  arm64/kprobe: Optimize the performance of patching single-step slot
  arm64: defconfig: Add Coresight as module
  kselftest/arm64: Handle EINTR while reading data from children
  kselftest/arm64: Flag fp-stress as exiting when we begin finishing up
  kselftest/arm64: Don't repeat termination handler for fp-stress
  ARM64: reloc_test: add __init/__exit annotations to module init/exit funcs
  arm64/mm: fold check for KFENCE into can_set_direct_map()
  arm64: ftrace: fix module PLTs with mcount
  arm64: module: Remove unused plt_entry_is_initialized()
  arm64: module: Make plt_equals_entry() static
  arm64: fix the build with binutils 2.27
  kselftest/arm64: Don't enable v8.5 for MTE selftest builds
  arm64: uaccess: simplify uaccess_mask_ptr()
  arm64: asm/perf_regs.h: Avoid C++-style comment in UAPI header
  kselftest/arm64: Fix typo in hwcap check
  arm64: mte: move register initialization to C
  arm64: mm: handle ARM64_KERNEL_USES_PMD_MAPS in vmemmap_populate()
  arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()
  arm64/sve: Add Perf extensions documentation
  ...
2022-10-06 11:51:49 -07:00
Meng Li
7fe3629729 Documentation: amd-pstate: Add unit test introduction
Introduce the AMD P-State unit test module design and implementation.
It also talks about kselftest and how to use.

Signed-off-by: Meng Li <li.meng@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-10-05 11:05:18 -06:00
Linus Torvalds
0326074ff4 Networking changes for 6.1.
Core
 ----
 
  - Introduce and use a single page frag cache for allocating small skb
    heads, clawing back the 10-20% performance regression in UDP flood
    test from previous fixes.
 
  - Run packets which already went thru HW coalescing thru SW GRO.
    This significantly improves TCP segment coalescing and simplifies
    deployments as different workloads benefit from HW or SW GRO.
 
  - Shrink the size of the base zero-copy send structure.
 
  - Move TCP init under a new slow / sleepable version of DO_ONCE().
 
 BPF
 ---
 
  - Add BPF-specific, any-context-safe memory allocator.
 
  - Add helpers/kfuncs for PKCS#7 signature verification from BPF
    programs.
 
  - Define a new map type and related helpers for user space -> kernel
    communication over a ring buffer (BPF_MAP_TYPE_USER_RINGBUF).
 
  - Allow targeting BPF iterators to loop through resources of one
    task/thread.
 
  - Add ability to call selected destructive functions.
    Expose crash_kexec() to allow BPF to trigger a kernel dump.
    Use CAP_SYS_BOOT check on the loading process to judge permissions.
 
  - Enable BPF to collect custom hierarchical cgroup stats efficiently
    by integrating with the rstat framework.
 
  - Support struct arguments for trampoline based programs.
    Only structs with size <= 16B and x86 are supported.
 
  - Invoke cgroup/connect{4,6} programs for unprivileged ICMP ping
    sockets (instead of just TCP and UDP sockets).
 
  - Add a helper for accessing CLOCK_TAI for time sensitive network
    related programs.
 
  - Support accessing network tunnel metadata's flags.
 
  - Make TCP SYN ACK RTO tunable by BPF programs with TCP Fast Open.
 
  - Add support for writing to Netfilter's nf_conn:mark.
 
 Protocols
 ---------
 
  - WiFi: more Extremely High Throughput (EHT) and Multi-Link
    Operation (MLO) work (802.11be, WiFi 7).
 
  - vsock: improve support for SO_RCVLOWAT.
 
  - SMC: support SO_REUSEPORT.
 
  - Netlink: define and document how to use netlink in a "modern" way.
    Support reporting missing attributes via extended ACK.
 
  - IPSec: support collect metadata mode for xfrm interfaces.
 
  - TCPv6: send consistent autoflowlabel in SYN_RECV state
    and RST packets.
 
  - TCP: introduce optional per-netns connection hash table to allow
    better isolation between namespaces (opt-in, at the cost of memory
    and cache pressure).
 
  - MPTCP: support TCP_FASTOPEN_CONNECT.
 
  - Add NEXT-C-SID support in Segment Routing (SRv6) End behavior.
 
  - Adjust IP_UNICAST_IF sockopt behavior for connected UDP sockets.
 
  - Open vSwitch:
    - Allow specifying ifindex of new interfaces.
    - Allow conntrack and metering in non-initial user namespace.
 
  - TLS: support the Korean ARIA-GCM crypto algorithm.
 
  - Remove DECnet support.
 
 Driver API
 ----------
 
  - Allow selecting the conduit interface used by each port
    in DSA switches, at runtime.
 
  - Ethernet Power Sourcing Equipment and Power Device support.
 
  - Add tc-taprio support for queueMaxSDU parameter, i.e. setting
    per traffic class max frame size for time-based packet schedules.
 
  - Support PHY rate matching - adapting between differing host-side
    and link-side speeds.
 
  - Introduce QUSGMII PHY mode and 1000BASE-KX interface mode.
 
  - Validate OF (device tree) nodes for DSA shared ports; make
    phylink-related properties mandatory on DSA and CPU ports.
    Enforcing more uniformity should allow transitioning to phylink.
 
  - Require that flash component name used during update matches one
    of the components for which version is reported by info_get().
 
  - Remove "weight" argument from driver-facing NAPI API as much
    as possible. It's one of those magic knobs which seemed like
    a good idea at the time but is too indirect to use in practice.
 
  - Support offload of TLS connections with 256 bit keys.
 
 New hardware / drivers
 ----------------------
 
  - Ethernet:
    - Microchip KSZ9896 6-port Gigabit Ethernet Switch
    - Renesas Ethernet AVB (EtherAVB-IF) Gen4 SoCs
    - Analog Devices ADIN1110 and ADIN2111 industrial single pair
      Ethernet (10BASE-T1L) MAC+PHY.
    - Rockchip RV1126 Gigabit Ethernet (a version of stmmac IP).
 
  - Ethernet SFPs / modules:
    - RollBall / Hilink / Turris 10G copper SFPs
    - HALNy GPON module
 
  - WiFi:
    - CYW43439 SDIO chipset (brcmfmac)
    - CYW89459 PCIe chipset (brcmfmac)
    - BCM4378 on Apple platforms (brcmfmac)
 
 Drivers
 -------
 
  - CAN:
    - gs_usb: HW timestamp support
 
  - Ethernet PHYs:
    - lan8814: cable diagnostics
 
  - Ethernet NICs:
    - Intel (100G):
      - implement control of FCS/CRC stripping
      - port splitting via devlink
      - L2TPv3 filtering offload
    - nVidia/Mellanox:
      - tunnel offload for sub-functions
      - MACSec offload, w/ Extended packet number and replay
        window offload
      - significantly restructure, and optimize the AF_XDP support,
        align the behavior with other vendors
    - Huawei:
      - configuring DSCP map for traffic class selection
      - querying standard FEC statistics
      - querying SerDes lane number via ethtool
    - Marvell/Cavium:
      - egress priority flow control
      - MACSec offload
    - AMD/SolarFlare:
      - PTP over IPv6 and raw Ethernet
    - small / embedded:
      - ax88772: convert to phylink (to support SFP cages)
      - altera: tse: convert to phylink
      - ftgmac100: support fixed link
      - enetc: standard Ethtool counters
      - macb: ZynqMP SGMII dynamic configuration support
      - tsnep: support multi-queue and use page pool
      - lan743x: Rx IP & TCP checksum offload
      - igc: add xdp frags support to ndo_xdp_xmit
 
  - Ethernet high-speed switches:
    - Marvell (prestera):
      - support SPAN port features (traffic mirroring)
      - nexthop object offloading
    - Microchip (sparx5):
      - multicast forwarding offload
      - QoS queuing offload (tc-mqprio, tc-tbf, tc-ets)
 
  - Ethernet embedded switches:
    - Marvell (mv88e6xxx):
      - support RGMII cmode
    - NXP (felix):
      - standardized ethtool counters
    - Microchip (lan966x):
      - QoS queuing offload (tc-mqprio, tc-tbf, tc-cbs, tc-ets)
      - traffic policing and mirroring
      - link aggregation / bonding offload
      - QUSGMII PHY mode support
 
  - Qualcomm 802.11ax WiFi (ath11k):
    - cold boot calibration support on WCN6750
    - support to connect to a non-transmit MBSSID AP profile
    - enable remain-on-channel support on WCN6750
    - Wake-on-WLAN support for WCN6750
    - support to provide transmit power from firmware via nl80211
    - support to get power save duration for each client
    - spectral scan support for 160 MHz
 
  - MediaTek WiFi (mt76):
    - WiFi-to-Ethernet bridging offload for MT7986 chips
 
  - RealTek WiFi (rtw89):
    - P2P support
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmM7vtkACgkQMUZtbf5S
 Irvotg//dmh53rC+UMKO3OgOqPlSMnaqzbUdDEfN6mj4Mpox7Csb8zERVURHhBHY
 fvlXWsDgxmvgTebI5fvNC5+f1iW5xcqgJV2TWnNmDOKWwvQwb6qQfgixVmunvkpe
 IIukMXYt0dAf9bXeeEfbNXcCb85cPwB76stX0tMV6BX7osp3T0TL1fvFk0NJkL0j
 TeydLad/yAQtPb4TbeWYjNDoxPVDf0cVpUrevLGmWE88UMYmgTqPze+h1W5Wri52
 bzjdLklY/4cgcIZClHQ6F9CeRWqEBxvujA5Hj/cwOcn/ptVVJWUGi7sQo3sYkoSs
 HFu+F8XsTec14kGNC0Ab40eVdqs5l/w8+E+4jvgXeKGOtVns8DwoiUIzqXpyty89
 Ib04mffrwWNjFtHvo/kIsNwP05X2PGE9HUHfwsTUfisl/ASvMmQp7D7vUoqQC/4B
 AMVzT5qpjkmfBHYQQGuw8FxJhMeAOjC6aAo6censhXJyiUhIfleQsN0syHdaNb8q
 9RZlhAgQoVb6ZgvBV8r8unQh/WtNZ3AopwifwVJld2unsE/UNfQy2KyqOWBES/zf
 LP9sfuX0JnmHn8s1BQEUMPU1jF9ZVZCft7nufJDL6JhlAL+bwZeEN4yCiAHOPZqE
 ymSLHI9s8yWZoNpuMWKrI9kFexVnQFKmA3+quAJUcYHNMSsLkL8=
 =Gsio
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core:

   - Introduce and use a single page frag cache for allocating small skb
     heads, clawing back the 10-20% performance regression in UDP flood
     test from previous fixes.

   - Run packets which already went thru HW coalescing thru SW GRO. This
     significantly improves TCP segment coalescing and simplifies
     deployments as different workloads benefit from HW or SW GRO.

   - Shrink the size of the base zero-copy send structure.

   - Move TCP init under a new slow / sleepable version of DO_ONCE().

  BPF:

   - Add BPF-specific, any-context-safe memory allocator.

   - Add helpers/kfuncs for PKCS#7 signature verification from BPF
     programs.

   - Define a new map type and related helpers for user space -> kernel
     communication over a ring buffer (BPF_MAP_TYPE_USER_RINGBUF).

   - Allow targeting BPF iterators to loop through resources of one
     task/thread.

   - Add ability to call selected destructive functions. Expose
     crash_kexec() to allow BPF to trigger a kernel dump. Use
     CAP_SYS_BOOT check on the loading process to judge permissions.

   - Enable BPF to collect custom hierarchical cgroup stats efficiently
     by integrating with the rstat framework.

   - Support struct arguments for trampoline based programs. Only
     structs with size <= 16B and x86 are supported.

   - Invoke cgroup/connect{4,6} programs for unprivileged ICMP ping
     sockets (instead of just TCP and UDP sockets).

   - Add a helper for accessing CLOCK_TAI for time sensitive network
     related programs.

   - Support accessing network tunnel metadata's flags.

   - Make TCP SYN ACK RTO tunable by BPF programs with TCP Fast Open.

   - Add support for writing to Netfilter's nf_conn:mark.

  Protocols:

   - WiFi: more Extremely High Throughput (EHT) and Multi-Link Operation
     (MLO) work (802.11be, WiFi 7).

   - vsock: improve support for SO_RCVLOWAT.

   - SMC: support SO_REUSEPORT.

   - Netlink: define and document how to use netlink in a "modern" way.
     Support reporting missing attributes via extended ACK.

   - IPSec: support collect metadata mode for xfrm interfaces.

   - TCPv6: send consistent autoflowlabel in SYN_RECV state and RST
     packets.

   - TCP: introduce optional per-netns connection hash table to allow
     better isolation between namespaces (opt-in, at the cost of memory
     and cache pressure).

   - MPTCP: support TCP_FASTOPEN_CONNECT.

   - Add NEXT-C-SID support in Segment Routing (SRv6) End behavior.

   - Adjust IP_UNICAST_IF sockopt behavior for connected UDP sockets.

   - Open vSwitch:
      - Allow specifying ifindex of new interfaces.
      - Allow conntrack and metering in non-initial user namespace.

   - TLS: support the Korean ARIA-GCM crypto algorithm.

   - Remove DECnet support.

  Driver API:

   - Allow selecting the conduit interface used by each port in DSA
     switches, at runtime.

   - Ethernet Power Sourcing Equipment and Power Device support.

   - Add tc-taprio support for queueMaxSDU parameter, i.e. setting per
     traffic class max frame size for time-based packet schedules.

   - Support PHY rate matching - adapting between differing host-side
     and link-side speeds.

   - Introduce QUSGMII PHY mode and 1000BASE-KX interface mode.

   - Validate OF (device tree) nodes for DSA shared ports; make
     phylink-related properties mandatory on DSA and CPU ports.
     Enforcing more uniformity should allow transitioning to phylink.

   - Require that flash component name used during update matches one of
     the components for which version is reported by info_get().

   - Remove "weight" argument from driver-facing NAPI API as much as
     possible. It's one of those magic knobs which seemed like a good
     idea at the time but is too indirect to use in practice.

   - Support offload of TLS connections with 256 bit keys.

  New hardware / drivers:

   - Ethernet:
      - Microchip KSZ9896 6-port Gigabit Ethernet Switch
      - Renesas Ethernet AVB (EtherAVB-IF) Gen4 SoCs
      - Analog Devices ADIN1110 and ADIN2111 industrial single pair
        Ethernet (10BASE-T1L) MAC+PHY.
      - Rockchip RV1126 Gigabit Ethernet (a version of stmmac IP).

   - Ethernet SFPs / modules:
      - RollBall / Hilink / Turris 10G copper SFPs
      - HALNy GPON module

   - WiFi:
      - CYW43439 SDIO chipset (brcmfmac)
      - CYW89459 PCIe chipset (brcmfmac)
      - BCM4378 on Apple platforms (brcmfmac)

  Drivers:

   - CAN:
      - gs_usb: HW timestamp support

   - Ethernet PHYs:
      - lan8814: cable diagnostics

   - Ethernet NICs:
      - Intel (100G):
         - implement control of FCS/CRC stripping
         - port splitting via devlink
         - L2TPv3 filtering offload
      - nVidia/Mellanox:
         - tunnel offload for sub-functions
         - MACSec offload, w/ Extended packet number and replay window
           offload
         - significantly restructure, and optimize the AF_XDP support,
           align the behavior with other vendors
      - Huawei:
         - configuring DSCP map for traffic class selection
         - querying standard FEC statistics
         - querying SerDes lane number via ethtool
      - Marvell/Cavium:
         - egress priority flow control
         - MACSec offload
      - AMD/SolarFlare:
         - PTP over IPv6 and raw Ethernet
      - small / embedded:
         - ax88772: convert to phylink (to support SFP cages)
         - altera: tse: convert to phylink
         - ftgmac100: support fixed link
         - enetc: standard Ethtool counters
         - macb: ZynqMP SGMII dynamic configuration support
         - tsnep: support multi-queue and use page pool
         - lan743x: Rx IP & TCP checksum offload
         - igc: add xdp frags support to ndo_xdp_xmit

   - Ethernet high-speed switches:
      - Marvell (prestera):
         - support SPAN port features (traffic mirroring)
         - nexthop object offloading
      - Microchip (sparx5):
         - multicast forwarding offload
         - QoS queuing offload (tc-mqprio, tc-tbf, tc-ets)

   - Ethernet embedded switches:
      - Marvell (mv88e6xxx):
         - support RGMII cmode
      - NXP (felix):
         - standardized ethtool counters
      - Microchip (lan966x):
         - QoS queuing offload (tc-mqprio, tc-tbf, tc-cbs, tc-ets)
         - traffic policing and mirroring
         - link aggregation / bonding offload
         - QUSGMII PHY mode support

   - Qualcomm 802.11ax WiFi (ath11k):
      - cold boot calibration support on WCN6750
      - support to connect to a non-transmit MBSSID AP profile
      - enable remain-on-channel support on WCN6750
      - Wake-on-WLAN support for WCN6750
      - support to provide transmit power from firmware via nl80211
      - support to get power save duration for each client
      - spectral scan support for 160 MHz

   - MediaTek WiFi (mt76):
      - WiFi-to-Ethernet bridging offload for MT7986 chips

   - RealTek WiFi (rtw89):
      - P2P support"

* tag 'net-next-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1864 commits)
  eth: pse: add missing static inlines
  once: rename _SLOW to _SLEEPABLE
  net: pse-pd: add regulator based PSE driver
  dt-bindings: net: pse-dt: add bindings for regulator based PoDL PSE controller
  ethtool: add interface to interact with Ethernet Power Equipment
  net: mdiobus: search for PSE nodes by parsing PHY nodes.
  net: mdiobus: fwnode_mdiobus_register_phy() rework error handling
  net: add framework to support Ethernet PSE and PDs devices
  dt-bindings: net: phy: add PoDL PSE property
  net: marvell: prestera: Propagate nh state from hw to kernel
  net: marvell: prestera: Add neighbour cache accounting
  net: marvell: prestera: add stub handler neighbour events
  net: marvell: prestera: Add heplers to interact with fib_notifier_info
  net: marvell: prestera: Add length macros for prestera_ip_addr
  net: marvell: prestera: add delayed wq and flush wq on deinit
  net: marvell: prestera: Add strict cleanup of fib arbiter
  net: marvell: prestera: Add cleanup of allocated fib_nodes
  net: marvell: prestera: Add router nexthops ABI
  eth: octeon: fix build after netif_napi_add() changes
  net/mlx5: E-Switch, Return EBUSY if can't get mode lock
  ...
2022-10-04 13:38:03 -07:00
Linus Torvalds
b5f0b11353 - Get rid of a single ksize() usage
- By popular demand, print the previous microcode revision an update
   was done over
 
 - Remove more code related to the now gone MICROCODE_OLD_INTERFACE
 
 - Document the problems stemming from microcode late loading
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmM8QL8ACgkQEsHwGGHe
 VUoDxw/9FA3rOAZD7N0PI/vspMUxEDQVYV60tfuuynao72HZv+tfJbRTXe42p3ZO
 B+kRPFud4lAOE1ykDHJ2A2OZzvthGfYlUnMyvk1IvK/gOwkkiSH4c6sVSrOYWtl7
 uoIN/3J83BMZoWNOKqrg1OOzotzkTyeucPXdWF+sRkfVzBIgbDqtplbFFCP4abPK
 WxatY2hkTfBCiN92OSOLaMGg0POpmycy+6roR2Qr5rWrC7nfREVNbKdOyEykZsfV
 U2gPm0A953sZ3Ye6waFib+qjJdyR7zBQRCJVEGOB6g8BlNwqGv/TY7NIUWSVFT9Y
 qcAnD3hI0g0UTYdToBUvYEpfD8zC9Wg3tZEpZSBRKh3AR2+Xt44VKQFO4L9uIt6g
 hWFMBLsFiYnBmKW3arNLQcdamE34GRhwUfXm0OjHTvTWb3aFO1I9+NBCaHp19KVy
 HD13wGSyj5V9SAVD0ztRFut4ZESejDyYBw9joB2IsjkY2IJmAAsRFgV0KXqUvQLX
 TX13hnhm894UfQ+4KCXnA0UeEDoXhwAbYFxR89yGeOxoGe1oaPXr9C1/r88YLq0n
 ekjIVZ3G97PIxmayj3cv9YrRIrrJi4PWF1Raey6go3Ma+rNBRnya5UF6Noch1lHh
 HeF7t84BZ5Ub6GweWYaMHQZCA+wMCZMYYuCMNzN7b54yRtQuvCc=
 =lWDD
 -----END PGP SIGNATURE-----

Merge tag 'x86_microcode_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x75 microcode loader updates from Borislav Petkov:

 - Get rid of a single ksize() usage

 - By popular demand, print the previous microcode revision an update
   was done over

 - Remove more code related to the now gone MICROCODE_OLD_INTERFACE

 - Document the problems stemming from microcode late loading

* tag 'x86_microcode_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/AMD: Track patch allocation size explicitly
  x86/microcode: Print previous version of microcode after reload
  x86/microcode: Remove ->request_microcode_user()
  x86/microcode: Document the whole late loading problem
2022-10-04 10:12:08 -07:00
Linus Torvalds
5bb3a16dbe - Add support for locking the APIC in X2APIC mode to prevent SGX enclave leaks
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmM7/C0ACgkQEsHwGGHe
 VUp3jA//caeFnxfKU8F6B+D+sqXub+fRC88U/5j5TVgueiFdpgbHxbeb5isV2DW8
 /1as5LN6PCy0V9ax873PblaRdL+ir2KVjzTOQhv1fsBWtRJgRAte6ltHfncST3jM
 Qm9ELCSYlwX6wgPDFWrLd5/jCHLq/7ffvWATwt4ZS9/+ebwJJ3JRU82aydvSAe+M
 tYDLA0RYjrTKB7bfhlxft8n3/ttOwmaQ9sPw6J9KWL9VZ8d7uA0cZgHMFgclr9xb
 +cTuOQwJ+a+njS3tXYWH0lU4lLw6tRcWrl3/gHpiuiwU85xdoClyCcsmvZYMLy1s
 cv4CmAQqs0dSlcII/F+6pTh+E8gcpzm2+uiP8k56dCK3c9OhtdKcPRcxm169JID+
 hKZ9rcscx1e/Q3rzhRtC4w2LguDcARe2b0OSMsGH40uZ/UOHdnYVk8oD81O9LWrB
 T4DgjJCcSf4y0KGDXtjJ39POfe0zFHgx8nr5JOL9Las6gZtXZHRmCzBacIU6JeJo
 5VEl6Yx1ca7aiuNTTcsmmW3zbsR9J6VppFZ6ma9jTTM8HbINM6g4UDepkeubNUdv
 tAJVbb6MEDGtT1ybH4oalrYkPfhYDwRkahqPtMY0ROtlzxO65q2DI3rH2gtyboas
 oP4hfEDtzKiJj4FiIRMKbryKz7MBWImHMRovRGKjB/Y48mTxX3M=
 =e8H8
 -----END PGP SIGNATURE-----

Merge tag 'x86_apic_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 APIC update from Borislav Petkov:

 - Add support for locking the APIC in X2APIC mode to prevent SGX
   enclave leaks

* tag 'x86_apic_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic: Don't disable x2APIC if locked
2022-10-04 09:37:02 -07:00
Johannes Weiner
e55b9f9686 mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol
Since 2d1c498072 ("mm: memcontrol: make swap tracking an integral part
of memory control"), CONFIG_MEMCG_SWAP hasn't been a user-visible config
option anymore, it just means CONFIG_MEMCG && CONFIG_SWAP.

Update the sites accordingly and drop the symbol.

[ While touching the docs, remove two references to CONFIG_MEMCG_KMEM,
  which hasn't been a user-visible symbol for over half a decade. ]

Link: https://lkml.kernel.org/r/20220926135704.400818-5-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03 14:03:36 -07:00
Johannes Weiner
b25806dcd3 mm: memcontrol: deprecate swapaccounting=0 mode
The swapaccounting= commandline option already does very little today.  To
close a trivial containment failure case, the swap ownership tracking part
of the swap controller has recently become mandatory (see commit
2d1c498072 ("mm: memcontrol: make swap tracking an integral part of
memory control") for details), which makes up the majority of the work
during swapout, swapin, and the swap slot map.

The only thing left under this flag is the page_counter operations and the
visibility of the swap control files in the first place, which are rather
meager savings.  There also aren't many scenarios, if any, where
controlling the memory of a cgroup while allowing it unlimited access to a
global swap space is a workable resource isolation strategy.

On the other hand, there have been several bugs and confusion around the
many possible swap controller states (cgroup1 vs cgroup2 behavior, memory
accounting without swap accounting, memcg runtime disabled).

This puts the maintenance overhead of retaining the toggle above its
practical benefits.  Deprecate it.

Link: https://lkml.kernel.org/r/20220926135704.400818-3-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Suggested-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03 14:03:36 -07:00
Zach O'Keefe
58ac9a8993 mm/khugepaged: attempt to map file/shmem-backed pte-mapped THPs by pmds
The main benefit of THPs are that they can be mapped at the pmd level,
increasing the likelihood of TLB hit and spending less cycles in page
table walks.  pte-mapped hugepages - that is - hugepage-aligned compound
pages of order HPAGE_PMD_ORDER mapped by ptes - although being contiguous
in physical memory, don't have this advantage.  In fact, one could argue
they are detrimental to system performance overall since they occupy a
precious hugepage-aligned/sized region of physical memory that could
otherwise be used more effectively.  Additionally, pte-mapped hugepages
can be the cheapest memory to collapse for khugepaged since no new
hugepage allocation or copying of memory contents is necessary - we only
need to update the mapping page tables.

In the anonymous collapse path, we are able to collapse pte-mapped
hugepages (albeit, perhaps suboptimally), but the file/shmem path makes no
effort when compound pages (of any order) are encountered.

Identify pte-mapped hugepages in the file/shmem collapse path.  The
final step of which makes a racy check of the value of the pmd to
ensure it maps a pte table.  This should be fine, since races that
result in false-positive (i.e.  attempt collapse even though we
shouldn't) will fail later in collapse_pte_mapped_thp() once we
actually lock mmap_lock and reinspect the pmd value.  Races that result
in false-negatives (i.e.  where we decide to not attempt collapse, but
should have) shouldn't be an issue, since in the worst case, we do
nothing - which is what we've done up to this point.  We make a similar
check in retract_page_tables().  If we do think we've found a
pte-mapped hugepgae in khugepaged context, attempt to update page
tables mapping this hugepage.

Note that these collapses still count towards the
/sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed counter,
and if the pte-mapped hugepage was also mapped into multiple process'
address spaces, could be incremented for each page table update.  Since we
increment the counter when a pte-mapped hugepage is successfully added to
the list of to-collapse pte-mapped THPs, it's possible that we never
actually update the page table either.  This is different from how
file/shmem pages_collapsed accounting works today where only a successful
page cache update is counted (it's also possible here that no page tables
are actually changed).  Though it incurs some slop, this is preferred to
either not accounting for the event at all, or plumbing through data in
struct mm_slot on whether to account for the collapse or not.

Also note that work still needs to be done to support arbitrary compound
pages, and that this should all be converted to using folios.

[shy828301@gmail.com: Spelling mistake, update comment, and add Documentation]
  Link: https://lore.kernel.org/linux-mm/CAHbLzkpHwZxFzjfX9nxVoRhzup8WMjMfyL6Xiq8mZ9M-N3ombw@mail.gmail.com/
Link: https://lkml.kernel.org/r/20220907144521.3115321-3-zokeefe@google.com
Link: https://lkml.kernel.org/r/20220922224046.1143204-3-zokeefe@google.com
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Kennelly <ckennelly@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Houghton <jthoughton@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03 14:03:33 -07:00
Liu Shixin
f498150208 mm/huge_memory: prevent THP_ZERO_PAGE_ALLOC increased twice
A user who reads THP_ZERO_PAGE_ALLOC may be more concerned about the huge
zero pages that are really allocated for thp.  It is misleading to
increase THP_ZERO_PAGE_ALLOC twice if two threads call get_huge_zero_page
concurrently.  Don't increase the value if the huge page is not really
used.

Update Documentation/admin-guide/mm/transhuge.rst to suit.

Link: https://lkml.kernel.org/r/20220909021653.3371879-1-liushixin2@huawei.com
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03 14:03:08 -07:00
SeongJae Park
f1f3afd59d Docs/admin-guide/mm/damon/usage: note DAMON debugfs interface deprecation plan
Commit b18402726b ("Docs/admin-guide/mm/damon/usage: document DAMON
sysfs interface") announced the DAMON debugfs interface deprecation plan,
but it is not so aggressively announced.  As the deprecation time is
coming, this commit makes the announce more easy to be found by adding the
note at the beginning of the DAMON debugfs interface usage document.

Link: https://lkml.kernel.org/r/20220909202901.57977-8-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Yun Levi <ppbuk5246@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03 14:03:07 -07:00
SeongJae Park
04cc7e4bf7 Docs/admin-guide/mm/damon/start: mention the dependency as sysfs instead of debugfs
'Getting Started' document of DAMON says DAMON user-space tool, damo[1],
is using DAMON debugfs interface, and therefore it needs to ensure debugfs
is mounted.  However, the latest version of the tool is using DAMON sysfs
interface.  Moreover, DAMON debugfs interface is going to be deprecated as
announced by commit b18402726b ("Docs/admin-guide/mm/damon/usage:
document DAMON sysfs interface").

This commit therefore update the document to tell readers about DAMON
sysfs interface dependency instead and never mention about debugfs
interface, which will be deprecated.

[1] https://github.com/awslabs/damo

Link: https://lkml.kernel.org/r/20220909202901.57977-7-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Yun Levi <ppbuk5246@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03 14:03:07 -07:00
SeongJae Park
0ff11f103f Docs/admin-guide/mm/damon: rename the title of the document
The title of the DAMON document for admin-guide, 'Monitoring Data
Accesses', could confuse readers in some ways.  First of all, DAMON is not
the only single way for data access monitoring.  And the document is for
not only the data access monitoring but also data access pattern based
memory management optimizations (DAMOS).  This commit updates the title to
'DAMON: Data Access MONitor', which more explicitly explains what the
document describes.

Link: https://lkml.kernel.org/r/20220909202901.57977-5-sj@kernel.org
Fixes: c4ba6014ae ("Documentation: add documents for DAMON")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Yun Levi <ppbuk5246@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03 14:03:06 -07:00
Linus Torvalds
9388076b4c ACPI updates for 6.1-rc1
- Reimplement acpi_get_pci_dev() using the list of physical devices
    associated with the given ACPI device object (Rafael Wysocki).
 
  - Rename ACPI device object reference counting functions (Rafael
    Wysocki).
 
  - Rearrange ACPI device object initialization code (Rafael Wysocki).
 
  - Drop parent field from struct acpi_device (Rafael Wysocki).
 
  - Extend the the int3472-tps68470 driver to support multiple consumers
    of a single TPS68470 along with the requisite framework-level
    support (Daniel Scally).
 
  - Filter out non-memory resources in is_memory(), add a helper
    function to find all memory type resources of an ACPI device object
    and use that function in 3 places (Heikki Krogerus).
 
  - Add IRQ override quirks for Asus Vivobook K3402ZA/K3502ZA and ASUS
    model S5402ZA (Tamim Khan, Kellen Renshaw).
 
  - Fix acpi_dev_state_d0() kerneldoc (Sakari Ailus).
 
  - Fix up suspend-to-idle support on ASUS Rembrandt laptops (Mario
    Limonciello).
 
  - Clean up ACPI platform devices support code (Andy Shevchenko, John
    Garry).
 
  - Clean up ACPI bus management code (Andy Shevchenko, ye xingchen).
 
  - Add support for multiple DMA windows with different offsets to the
    ACPI device enumeration code and use it on LoongArch (Jianmin Lv).
 
  - Clean up the ACPI LPSS (Intel SoC) driver (Andy Shevchenko).
 
  - Add a quirk for Dell Inspiron 14 2-in-1 for StorageD3Enable (Mario
    Limonciello).
 
  - Drop unused dev_fmt() and redundant 'HMAT' prefix from the HMAT
    parsing code (Liu Shixin).
 
  - Make ACPI FPDT parsing code avoid calling acpi_os_map_memory() on
    invalid physical addresses (Hans de Goede).
 
  - Silence missing-declarations warning related to Apple device
    properties management (Lukas Wunner).
 
  - Disable frequency invariance in the CPPC library if registers used
    by cppc_get_perf_ctrs() are accessed via PCC (Jeremy Linton).
 
  - Add ACPI disabled check to acpi_cpc_valid() (Perry Yuan).
 
  - Fix Tx acknowledge in the PCC address space handler (Huisong Li).
 
  - Use wait_for_completion_timeout() for PCC mailbox operations (Huisong
    Li).
 
  - Release resources on PCC address space setup failure path (Rafael
    Mendonca).
 
  - Remove unneeded result variables from APEI code (ye xingchen).
 
  - Print total number of records found during BERT log parsing (Dmitry
    Monakhov).
 
  - Drop support for 3 _OSI strings that should not be necessary any
    more and update documentation on custom _OSI strings so that adding
    new ones is not encouraged any more (Mario Limonciello).
 
  - Drop unneeded result variable from ec_write() (ye xingchen).
 
  - Remove the leftover struct acpi_ac_bl from the ACPI AC driver (Hanjun
    Guo).
 
  - Reorder symbols to get rid of a few forward declarations in the ACPI
    fan driver (Uwe Kleine-König).
 
  - Add Toshiba Satellite/Portege Z830 ACPI backlight quirk (Arvid
    Norlander).
 
  - Add ARM DMA-330 controller to the supported list in the ACPI AMBA
    driver (Vijayenthiran Subramaniam).
 
  - Drop references to non-functional 01.org/linux-acpi web site from
    MAINTAINERS and Kconfig help texts (Rafael Wysocki).
 
  - Replace strlcpy() with unused retval with strscpy() in the ACPI
    support code (Wolfram Sang).
 
  - Do not initialize ret in main() in the pfrut utility (Shi junming).
 
  - Drop useless ACPI DSDT override documentation (Rafael Wysocki).
 
  - Fix a few typos and wording mistakes in the ACPI device enumeration
    documentation (Jean Delvare).
 
  - Introduce acpi_dev_uid_to_integer() to convert a _UID string into an
    integer value (Andy Shevchenko).
 
  - Use acpi_dev_uid_to_integer() in several places to unify _UID
    handling (Andy Shevchenko).
 
  - Drop unused pnpid32_to_pnpid() declaration from  PNP code (Gaosheng
    Cui).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmM7OhkSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRx/TkQALQ4TN451dPSj9jcYSNY6qZ/9b4P9Iym
 TmRf3wO3+IVZQ8JajeKKRuVKNsW3sC0RcFkJJVmgZkydJBr1Uui2L0ZLzi8axGNy
 RlbZm5NyBeFnlP0fA8Gb2iRMXVAUcRIx+RvZCulxxFmgQ8UhoU4wlVZWlEcko4TQ
 hGp++lJYcRHR1NbVLSXZhFvzopKLdhGL6vB1Awsjb/I7TVqn23+k4jVRV1DYkIQ7
 qgFM+Z7osRVZiVQbaPoOgdykeSa43qXu7Vgs7F/QeJuIiUYx59xDh0/WCJBxnuDM
 cHGiaNnvuJghKmCg43X8+joaHEH/jCFyvBVGfiSzRvjz03WOPRs1XztwdEiCi+py
 RcZGzrPaXmkCjNeytPRooiifyqm95HT7aMBN/aTvKBXDaGRrfPheXF+i2idl24HM
 NrHqMaa0+5qoDGHLUEaf5znlCHfS+3lwq6+lGVrq/UGf6B3cP+9HwOyevEW493JX
 4nuv69Y517moR9W3mBU8sAn5mUjshcka7pghRj7QnuoqRqWLbU3lIz8oUDHr84cI
 ixpIPvt2KlZ5UjnN9aqu/6k70JkJvy4SrKjnx4iqu03ePmMrRc0Hcpy7+VMlgumD
 tgN9aW+YDgy0/Z5QmO1MOvFodVmA5sX6+gnX1neAjuDdIo3LkJptlkO1fCx2jfQu
 cgPQk1CtPOos
 =xyUK
 -----END PGP SIGNATURE-----

Merge tag 'acpi-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI updates from Rafael Wysocki:
 "ACPI and PNP updates for 6.1-rc1.

  These rearrange the ACPI device object initialization code (to get rid
  of a redundant parent pointer from struct acpi_device among other
  things), unify the _UID handling, drop support for some _OSI strings
  that should not be necessary any more, add new IDs to support more
  hardware and some more quirks, fix a few issues and clean up code all
  over.

  Specifics:

   - Reimplement acpi_get_pci_dev() using the list of physical devices
     associated with the given ACPI device object (Rafael Wysocki)

   - Rename ACPI device object reference counting functions (Rafael
     Wysocki)

   - Rearrange ACPI device object initialization code (Rafael Wysocki)

   - Drop parent field from struct acpi_device (Rafael Wysocki)

   - Extend the the int3472-tps68470 driver to support multiple
     consumers of a single TPS68470 along with the requisite
     framework-level support (Daniel Scally)

   - Filter out non-memory resources in is_memory(), add a helper
     function to find all memory type resources of an ACPI device object
     and use that function in 3 places (Heikki Krogerus)

   - Add IRQ override quirks for Asus Vivobook K3402ZA/K3502ZA and ASUS
     model S5402ZA (Tamim Khan, Kellen Renshaw)

   - Fix acpi_dev_state_d0() kerneldoc (Sakari Ailus)

   - Fix up suspend-to-idle support on ASUS Rembrandt laptops (Mario
     Limonciello)

   - Clean up ACPI platform devices support code (Andy Shevchenko, John
     Garry)

   - Clean up ACPI bus management code (Andy Shevchenko, ye xingchen)

   - Add support for multiple DMA windows with different offsets to the
     ACPI device enumeration code and use it on LoongArch (Jianmin Lv)

   - Clean up the ACPI LPSS (Intel SoC) driver (Andy Shevchenko)

   - Add a quirk for Dell Inspiron 14 2-in-1 for StorageD3Enable (Mario
     Limonciello)

   - Drop unused dev_fmt() and redundant 'HMAT' prefix from the HMAT
     parsing code (Liu Shixin)

   - Make ACPI FPDT parsing code avoid calling acpi_os_map_memory() on
     invalid physical addresses (Hans de Goede)

   - Silence missing-declarations warning related to Apple device
     properties management (Lukas Wunner)

   - Disable frequency invariance in the CPPC library if registers used
     by cppc_get_perf_ctrs() are accessed via PCC (Jeremy Linton)

   - Add ACPI disabled check to acpi_cpc_valid() (Perry Yuan)

   - Fix Tx acknowledge in the PCC address space handler (Huisong Li)

   - Use wait_for_completion_timeout() for PCC mailbox operations
     (Huisong Li)

   - Release resources on PCC address space setup failure path (Rafael
     Mendonca)

   - Remove unneeded result variables from APEI code (ye xingchen)

   - Print total number of records found during BERT log parsing (Dmitry
     Monakhov)

   - Drop support for 3 _OSI strings that should not be necessary any
     more and update documentation on custom _OSI strings so that adding
     new ones is not encouraged any more (Mario Limonciello)

   - Drop unneeded result variable from ec_write() (ye xingchen)

   - Remove the leftover struct acpi_ac_bl from the ACPI AC driver
     (Hanjun Guo)

   - Reorder symbols to get rid of a few forward declarations in the
     ACPI fan driver (Uwe Kleine-König)

   - Add Toshiba Satellite/Portege Z830 ACPI backlight quirk (Arvid
     Norlander)

   - Add ARM DMA-330 controller to the supported list in the ACPI AMBA
     driver (Vijayenthiran Subramaniam)

   - Drop references to non-functional 01.org/linux-acpi web site from
     MAINTAINERS and Kconfig help texts (Rafael Wysocki)

   - Replace strlcpy() with unused retval with strscpy() in the ACPI
     support code (Wolfram Sang)

   - Do not initialize ret in main() in the pfrut utility (Shi junming)

   - Drop useless ACPI DSDT override documentation (Rafael Wysocki)

   - Fix a few typos and wording mistakes in the ACPI device enumeration
     documentation (Jean Delvare)

   - Introduce acpi_dev_uid_to_integer() to convert a _UID string into
     an integer value (Andy Shevchenko)

   - Use acpi_dev_uid_to_integer() in several places to unify _UID
     handling (Andy Shevchenko)

   - Drop unused pnpid32_to_pnpid() declaration from PNP code (Gaosheng
     Cui)"

* tag 'acpi-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (79 commits)
  ACPI: LPSS: Deduplicate skipping device in acpi_lpss_create_device()
  ACPI: LPSS: Replace loop with first entry retrieval
  ACPI: x86: s2idle: Add another ID to s2idle_dmi_table
  ACPI: x86: s2idle: Fix a NULL pointer dereference
  MAINTAINERS: Drop records pointing to 01.org/linux-acpi
  ACPI: Kconfig: Drop link to https://01.org/linux-acpi
  ACPI: docs: Drop useless DSDT override documentation
  ACPI: DPTF: Drop stale link from Kconfig help
  ACPI: x86: s2idle: Add a quirk for ASUSTeK COMPUTER INC. ROG Flow X13
  ACPI: x86: s2idle: Add a quirk for Lenovo Slim 7 Pro 14ARH7
  ACPI: x86: s2idle: Add a quirk for ASUS ROG Zephyrus G14
  ACPI: x86: s2idle: Add a quirk for ASUS TUF Gaming A17 FA707RE
  ACPI: x86: s2idle: Add module parameter to prefer Microsoft GUID
  ACPI: x86: s2idle: If a new AMD _HID is missing assume Rembrandt
  ACPI: x86: s2idle: Move _HID handling for AMD systems into structures
  platform/x86: int3472: Add board data for Surface Go2 IR camera
  platform/x86: int3472: Support multiple gpio lookups in board data
  platform/x86: int3472: Support multiple clock consumers
  ACPI: bus: Add iterator for dependent devices
  ACPI: scan: Add acpi_dev_get_next_consumer_dev()
  ...
2022-10-03 13:19:53 -07:00
Rafael J. Wysocki
a7ece531b9 Merge branches 'acpi-misc', 'acpi-tools' and 'acpi-docs'
Merge miscellaneous ACPI material, ACPI tools changes and ACPI
documentation updates for 6.1-rc1:

 - Drop references to non-functional 01.org/linux-acpi web site from
   MAINTAINERS and Kconfig help texts (Rafael Wysocki).

 - Replace strlcpy() with unused retval with strscpy() in the ACPI
   support code (Wolfram Sang).

 - Do not initialize ret in main() in the pfrut utility (Shi junming).

 - Drop useless ACPI DSDT override documentation (Rafael Wysocki).

 - Fix a few typos and wording mistakes in the ACPI device enumeration
   documentation (Jean Delvare).

* acpi-misc:
  MAINTAINERS: Drop records pointing to 01.org/linux-acpi
  ACPI: Kconfig: Drop link to https://01.org/linux-acpi
  ACPI: DPTF: Drop stale link from Kconfig help
  ACPI: move from strlcpy() with unused retval to strscpy()

* acpi-tools:
  ACPI: tools: pfrut: Do not initialize ret in main()

* acpi-docs:
  ACPI: docs: Drop useless DSDT override documentation
  ACPI: docs: enumeration: Fix a few typos and wording mistakes
2022-10-03 20:03:49 +02:00
Linus Torvalds
f3dfe925f9 There's not a huge amount of activity in the docs tree this time around,
but a few significant changes even so:
 
 - A complete rewriting of the top-level index.rst file, which mostly
   reflects itself in a redone top page in the HTML-rendered docs.  The hope
   is that the new organization will be a friendlier starting point for
   both users and developers.
 
 - Some math-rendering improvements.
 
 - A coding-style.rst update on the use of BUG() and WARN()
 
 - A big maintainer-PHP guide update.
 
 - Some code-of-conduct updates
 
 - More Chinese translation work
 
 Plus the usual pile of typo fixes, corrections, and updates.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmM7BksPHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5Y8i4H/ihd1ppgVYy1yvFL3L1KkcsNyt3bFUa6hide
 qmkhqpzjsNmbTOaW19Y6epCzRzvxG7M9hzztIewt1BhRDvgRC8GaQNNRw/IBs0B6
 kprisINC2/ap4JjCroYWepfd+H8NSiVxqtd8hVSMWDSh2cK9vw0qVqQq59I+gght
 64pA4F2nPO6bamZzAELTdWRj0ITL1A/V/jYj+T074B094arc4HyekIQ5Jn9GTCmt
 jFBH9yxAb3l8K7KgzH7FgxKY/an0HxKDh4Cnx2Jv+dcocgCwy1iXCuyEZbFd9GEB
 UyhPcCyrIe/I2B9U9LrqLvXA8LW7jwE+MZMqZpaRkxcIdE2gEFQ=
 =M7tR
 -----END PGP SIGNATURE-----

Merge tag 'docs-6.1' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "There's not a huge amount of activity in the docs tree this time
  around, but a few significant changes even so:

   - A complete rewriting of the top-level index.rst file, which mostly
     reflects itself in a redone top page in the HTML-rendered docs. The
     hope is that the new organization will be a friendlier starting
     point for both users and developers.

   - Some math-rendering improvements.

   - A coding-style.rst update on the use of BUG() and WARN()

   - A big maintainer-PHP guide update.

   - Some code-of-conduct updates

   - More Chinese translation work

  Plus the usual pile of typo fixes, corrections, and updates"

* tag 'docs-6.1' of git://git.lwn.net/linux: (66 commits)
  checkpatch: warn on usage of VM_BUG_ON() and other BUG variants
  coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel")
  Documentation: devres: add missing IO helper
  Documentation: devres: update IRQ helper
  Documentation/mm: modify page_referenced to folio_referenced
  Documentation/CoC: Reflect current CoC interpretation and practices
  docs/doc-guide: Add documentation on SPHINX_IMGMATH
  docs: process/5.Posting.rst: clarify use of Reported-by: tag
  docs, kprobes: Fix the wrong location of Kprobes
  docs: add a man-pages link to the front page
  docs: put atomic*.txt and memory-barriers.txt into the core-api book
  docs: move asm-annotations.rst into core-api
  docs: remove some index.rst cruft
  docs: reconfigure the HTML left column
  docs: Rewrite the front page
  docs: promote the title of process/index.rst
  Documentation: devres: add missing SPI helper
  Documentation: devres: add missing PINCTRL helpers
  docs: hugetlbpage.rst: fix a typo of hugepage size
  docs/zh_CN: Add new translation of admin-guide/bootconfig.rst
  ...
2022-10-03 10:23:32 -07:00
Joe Fradley
d20a6ba5e3 kunit: add kunit.enable to enable/disable KUnit test
This patch adds the kunit.enable module parameter that will need to be
set to true in addition to KUNIT being enabled for KUnit tests to run.
The default value is true giving backwards compatibility. However, for
the production+testing use case the new config option
KUNIT_DEFAULT_ENABLED can be set to N requiring the tester to opt-in
by passing kunit.enable=1 to the kernel.

Signed-off-by: Joe Fradley <joefradley@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-09-30 13:17:39 -06:00
Catalin Marinas
53630a1f61 Merge branch 'for-next/misc' into for-next/core
* for-next/misc:
  : Miscellaneous patches
  arm64/kprobe: Optimize the performance of patching single-step slot
  ARM64: reloc_test: add __init/__exit annotations to module init/exit funcs
  arm64/mm: fold check for KFENCE into can_set_direct_map()
  arm64: uaccess: simplify uaccess_mask_ptr()
  arm64: mte: move register initialization to C
  arm64: mm: handle ARM64_KERNEL_USES_PMD_MAPS in vmemmap_populate()
  arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()
  arm64: support huge vmalloc mappings
  arm64: spectre: increase parameters that can be used to turn off bhb mitigation individually
  arm64: run softirqs on the per-CPU IRQ stack
  arm64: compat: Implement misalignment fixups for multiword loads
2022-09-30 09:18:26 +01:00
Rafael J. Wysocki
d206cef03c ACPI: docs: Drop useless DSDT override documentation
Because https://01.org/linux-acpi web site has become permanently
inaccessible, the "Overriding DSDT" document in the kernel tree
pointing to it as the main source of information is useless (and
the config option name mentioned by it is incorrect), so drop it
and drop the pointer to it from the ACPI Kconfig.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-09-28 17:31:23 +02:00
Hoi Pok Wu
16461c66de docs: hugetlbpage.rst: fix a typo of hugepage size
should be kB instead of Kb

Signed-off-by: Hoi Pok Wu <wuhoipok@gmail.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Link: https://lore.kernel.org/r/20220922030645.9719-1-wuhoipok@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-09-27 13:21:45 -06:00
Lin Yujun
06cb31cc76 Documentation/hw-vuln: Update spectre doc
commit 7c693f54c8 ("x86/speculation: Add spectre_v2=ibrs option to support Kernel IBRS")

adds the "ibrs " option  in
Documentation/admin-guide/kernel-parameters.txt but omits it to
Documentation/admin-guide/hw-vuln/spectre.rst, add it.

Signed-off-by: Lin Yujun <linyujun809@huawei.com>
Link: https://lore.kernel.org/r/20220830123614.23007-1-linyujun809@huawei.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-09-27 13:21:44 -06:00
Lukas Bulwahn
32a3a9db16 docs: admin-guide: for kernel bugs refer to other kernel documentation
The current section 'If something goes wrong' makes a number of suggestions
for debugging, bug hunting and reporting issues, which are quite briefly
described in that section.

However, the suggestions are also well covered in other kernel
documentation or sometimes simply outdated. Here, each suggestion in that
section is summarized, and then followed with its assessment, and the
derived action for each suggestion:

  - use MAINTAINERS and mailing list: covered in 'Reporting issues',
    summarized in the short guide, detailed in its further section.
    Reporting issues even provides some specific examples that guides
    readers well through the needed steps. Refer to 'Reporting issues'.

  - contact Linus Torvalds: probably outdated as currently described.
    nevertheless covered in 'Reporting issues'. Reporting issues points out
    to contact the relevant kernel maintainers first, and after some
    patience and failed attempts with those maintainers, contacting Linus
    Torvalds might be okay. Refer to 'Reporting issues'.

  - tell what kernel, how to duplicate, the setup, if the problem is new
    or old and when did you notice: covered in 'Reporting issues',
    especially in Step-by-step guide how to report issues to the kernel
    maintainers. Refer to 'Reporting issues'.

  - duplicate kernel bug reports exactly: covered in 'Reporting issues',
    especially in Write and send the report. Refer to 'Reporting issues'.

  - read 'Bug hunting': keep this reference. Refer to 'Bug hunting'.

  - compile the kernel with CONFIG_KALLSYMS: covered in 'Reporting issues',
    especially in Decode failure messages. Refer to 'Reporting issues'.

  - alternatively, use ksymoops: ksymoops at the mentioned URL seems not to
    be maintained anymore. It was released roughly once a year until
    version 2.4.11 in 2005, but has not seen a new release since then. The
    information in ./scripts/ksymoops/README is from 1999, and does not
    give more insight on its actual maintenance state either. Ksymoops is
    mentioned as system utility in changes.rst, but also not recommended
    there. Drop the explanation on using ksymoops.

  - alternatively, lookup dump manually with the EIP and nm to determine
    the function in which the kernel crashes: this method seems already a
    quite advanced and low-level debugging method. Even all the further
    references on bug hunting and debugging do not mention it. Drop this
    alternative method and limit mentioning methods explained in the other
    existing kernel documentation.

  - read 'Reporting issues': keep this reference.
    Refer to 'Reporting issues'.

  - use gdb for debugging: some specific details, e.g., edit
    arch/x86/Makefile, are probably outdated or limited to one (historic
    important) setup. Using gdb is covered in 'Bug hunting', 'Debugging
    kernel and modules via gdb' and 'Using kgdb, kdb and the kernel
    debugger internals'. Refer to those three documents.

Overall, it is sufficient to refer to reporting-issues.rst,
bug-hunting.rst, gdb-kernel-debugging.rst and kgdb.rst and this way cover
the existing suggestions.

'Reporting issues' is quite new and probably up to date. 'Bug hunting',
'Debugging kernel and modules via gdb' and 'Using kgdb, kdb and the kernel
debugger internals' might need some revisit and update, but they are
generally in an acceptable state for referring to them.

Replace the existing suggestions by reference to other existing kernel
documentation covering those suggestions---partly even nicely summarized
and then explained in greater detail.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Link: https://lore.kernel.org/r/20220720041325.15693-3-lukas.bulwahn@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-09-27 13:21:43 -06:00
Lukas Bulwahn
3f10b50829 docs: admin-guide: do not mention the 'run a.out user programs' feature
Running a.out user programs with the latest kernel release is a very rare
and uncommon use case nowadays. The support of a.out user programs is only
remaining for the alpha architecture and is not defined and activated in
the architecture's Kconfig (so even the activation of this support requires
to modify the Kconfig file and not just kernel build configuration).

The discussion on a.out support in 2019 (see Link) shows that the support
of a.out user programs is just remaining for a special corner case from
some (alpha architecture) users.

There is no need to point out and mention this special feature to the
general audience of kernel users. Delete the reference to this historic and
special feature.

Link: https://lore.kernel.org/all/CAHk-=wgt7M6yA5BJCJo0nF22WgPJnN8CvViL9CAJmd+S+Civ6w@mail.gmail.com/

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Link: https://lore.kernel.org/r/20220720041325.15693-2-lukas.bulwahn@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-09-27 13:21:43 -06:00
Akhil Raj
d2bef8e103 Remove duplicate words inside documentation
I have removed repeated `the` inside the documentation

Signed-off-by: Akhil Raj <lf32.dev@gmail.com>
Link: https://lore.kernel.org/r/20220827145359.32599-1-lf32.dev@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-09-27 13:21:43 -06:00
xu xin
21b7bdb504 ksm: add profit monitoring documentation
Add the description of KSM profit and how to determine it separately in
system-wide range and inner a single process.

Link: https://lkml.kernel.org/r/20220830144003.299870-1-xu.xin16@zte.com.cn
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Reviewed-by: Xiaokai Ran <ran.xiaokai@zte.com.cn>
Reviewed-by: Yang Yang <yang.yang29@zte.com.cn>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26 19:46:29 -07:00
Yu Zhao
07017acb06 mm: multi-gen LRU: admin guide
Add an admin guide.

Link: https://lkml.kernel.org/r/20220918080010.2920238-14-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26 19:46:10 -07:00
Christophe Leroy
404a5e72f4 Documentation: Rename PPC_FSL_BOOK3E to PPC_E500
CONFIG_PPC_FSL_BOOK3E is redundant with CONFIG_PPC_E500.

Rename it so that CONFIG_PPC_FSL_BOOK3E can be removed later.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d3d42b395c09e66b0705fda1e51779f33e13ac38.1663606876.git.christophe.leroy@csgroup.eu
2022-09-26 23:00:13 +10:00
Tejun Heo
026e14a276 Merge branch 'for-6.0-fixes' into for-6.1
for-6.0 has the following fix for cgroup_get_from_id().

  836ac87d ("cgroup: fix cgroup_get_from_id")

which conflicts with the following two commits in for-6.1.

  4534dee9 ("cgroup: cgroup: Honor caller's cgroup NS when resolving cgroup id")
  fa7e439c ("cgroup: Homogenize cgroup_get_from_id() return value")

While the resolution is straightforward, the code ends up pretty ugly
afterwards. Let's pull for-6.0-fixes into for-6.1 so that the code can be
fixed up there.

Signed-off-by: Tejun Heo <tj@kernel.org>
2022-09-23 07:19:38 -10:00
Shuai Xue
a6f92909d6 docs: perf: Add description for Alibaba's T-Head PMU driver
Alibaba's T-Head SoC implements uncore PMU for performance and functional
debugging to facilitate system maintenance. Document it to provide guidance
on how to use it.

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220914022326.88550-2-xueshuai@linux.alibaba.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-09-22 14:09:10 +01:00
Yauheni Kaliuta
bfeb7e399b bpf: Use bpf_capable() instead of CAP_SYS_ADMIN for blinding decision
The full CAP_SYS_ADMIN requirement for blinding looks too strict nowadays.
These days given unprivileged BPF is disabled by default, the main users
for constant blinding coming from unprivileged in particular via cBPF -> eBPF
migration (e.g. old-style socket filters).

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220831090655.156434-1-ykaliuta@redhat.com
Link: https://lore.kernel.org/bpf/20220905090149.61221-1-ykaliuta@redhat.com
2022-09-16 22:11:57 +02:00
Kefeng Wang
e92072237e arm64: support huge vmalloc mappings
As commit 559089e0a9 ("vmalloc: replace VM_NO_HUGE_VMAP with
VM_ALLOW_HUGE_VMAP"), the use of hugepage mappings for vmalloc
is an opt-in strategy, so it is saftly to support huge vmalloc
mappings on arm64, for now, it is used in kvmalloc() and
alloc_large_system_hash().

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Link: https://lore.kernel.org/r/20220911044423.139229-1-wangkefeng.wang@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-16 09:51:28 +01:00
Greg Kroah-Hartman
a791dc1353 Linux 6.0-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmMeQ2keHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGYRMH+gLNHiGirGZlm2GQ
 tKaZQUy7MiXuIP0hGDonDIIIAmIVhnjm9MDG8KT4W8AvEd7ukncyYqJfwWeWQPhP
 4mZcf6l3Z8Ke+qiaFpXpMPCxTyWcln1ox0EoNx2g9gdPxZntaRuuaTQVljUfTiey
 aVPHxve8ip3G7jDoJnuLSxESOqWxkb8v/SshBP1E5bF5BZ+cgZRqq7FNigFqxjbk
 wF29K09BVOPjdgkSvY/b0/SnL5KlSdMAv+FrPcJNGivcdIPgf/qJks5cI2HRUo7o
 CpKgbcLorCVyD+d+zLonJBwIy3arbmKD8JqYnfdTSIqVOUqHXWUDfeydsH32u1Gu
 lPSI2Hw=
 =7LTL
 -----END PGP SIGNATURE-----

Merge 6.0-rc5 into driver-core-next

We need the driver core and debugfs changes in this branch.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-12 16:51:22 +02:00
Petr Vorel
bfca3dd3d0 kernel/utsname_sysctl.c: print kernel arch
Print the machine hardware name (UTS_MACHINE) in /proc/sys/kernel/arch.

This helps people who debug kernel with initramfs with minimal environment
(i.e.  without coreutils or even busybox) or allow to open sysfs file
instead of run 'uname -m' in high level languages.

Link: https://lkml.kernel.org/r/20220901194403.3819-1-pvorel@suse.cz
Signed-off-by: Petr Vorel <pvorel@suse.cz>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: David Sterba <dsterba@suse.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-11 21:55:12 -07:00
Li Zhe
c4f20f1479 page_ext: introduce boot parameter 'early_page_ext'
In commit 2f1ee0913c ("Revert "mm: use early_pfn_to_nid in
page_ext_init""), we call page_ext_init() after page_alloc_init_late() to
avoid some panic problem.  It seems that we cannot track early page
allocations in current kernel even if page structure has been initialized
early.

This patch introduces a new boot parameter 'early_page_ext' to resolve
this problem.  If we pass it to the kernel, page_ext_init() will be moved
up and the feature 'deferred initialization of struct pages' will be
disabled to initialize the page allocator early and prevent the panic
problem above.  It can help us to catch early page allocations.  This is
useful especially when we find that the free memory value is not the same
right after different kernel booting.

[akpm@linux-foundation.org: fix section issue by removing __meminitdata]
Link: https://lkml.kernel.org/r/20220825102714.669-1-lizhe.67@bytedance.com
Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-11 20:26:02 -07:00
Huang Ying
c6833e1000 memory tiering: rate limit NUMA migration throughput
In NUMA balancing memory tiering mode, if there are hot pages in slow
memory node and cold pages in fast memory node, we need to promote/demote
hot/cold pages between the fast and cold memory nodes.

A choice is to promote/demote as fast as possible.  But the CPU cycles and
memory bandwidth consumed by the high promoting/demoting throughput will
hurt the latency of some workload because of accessing inflating and slow
memory bandwidth contention.

A way to resolve this issue is to restrict the max promoting/demoting
throughput.  It will take longer to finish the promoting/demoting.  But
the workload latency will be better.  This is implemented in this patch as
the page promotion rate limit mechanism.

The number of the candidate pages to be promoted to the fast memory node
via NUMA balancing is counted, if the count exceeds the limit specified by
the users, the NUMA balancing promotion will be stopped until the next
second.

A new sysctl knob kernel.numa_balancing_promote_rate_limit_MBps is added
for the users to specify the limit.

Link: https://lkml.kernel.org/r/20220713083954.34196-3-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: osalvador <osalvador@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zhong Jiang <zhongjiang-ali@linux.alibaba.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-11 20:25:54 -07:00
Charan Teja Kalla
9a79443ddc mm/cma_debug: show complete cma name in debugfs directories
Currently only 12 characters of the cma name is being used as the debug
directories where as the cma name can be of length CMA_MAX_NAME(=64)
characters.  One side problem with this is having 2 cma's with first
common 12 characters would end up in trying to create directories with
same name and fails with -EEXIST thus can limit cma debug functionality.

The 'cma-' prefix is used initially where cma areas don't have any names
and are represented by simple integer values.  Since now each cma would be
having its own name, drop 'cma-' prefix for the cma debug directories as
they are clearly evident that they are for cma debug through creating them
in /sys/kernel/debug/cma/ path.

Link: https://lkml.kernel.org/r/1660223729-22461-1-git-send-email-quic_charante@quicinc.com
Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Pavan Kondeti <quic_pkondeti@quicinc.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-11 20:25:50 -07:00
Axel Rasmussen
816284a3d0 userfaultfd: update documentation to describe /dev/userfaultfd
Explain the different ways to create a new userfaultfd, and how access
control works for each way.

[axelrasmussen@google.com: improve wording in documentation, per Mike]
  Link: https://lkml.kernel.org/r/20220819205201.658693-5-axelrasmussen@google.com
Link: https://lkml.kernel.org/r/20220808175614.3885028-5-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dmitry V. Levin <ldv@altlinux.org>
Cc: Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zhang Yi <yi.zhang@huawei.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-11 20:25:49 -07:00
Liu Song
877ace9eab arm64: spectre: increase parameters that can be used to turn off bhb mitigation individually
In our environment, it was found that the mitigation BHB has a great
impact on the benchmark performance. For example, in the lmbench test,
the "process fork && exit" test performance drops by 20%.
So it is necessary to have the ability to turn off the mitigation
individually through cmdline, thus avoiding having to compile the
kernel by adjusting the config.

Signed-off-by: Liu Song <liusong@linux.alibaba.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/1661514050-22263-1-git-send-email-liusong@linux.alibaba.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-09 19:02:22 +01:00
Chengming Zhou
34f26a1561 sched/psi: Per-cgroup PSI accounting disable/re-enable interface
PSI accounts stalls for each cgroup separately and aggregates it
at each level of the hierarchy. This may cause non-negligible overhead
for some workloads when under deep level of the hierarchy.

commit 3958e2d0c3 ("cgroup: make per-cgroup pressure stall tracking configurable")
make PSI to skip per-cgroup stall accounting, only account system-wide
to avoid this each level overhead.

But for our use case, we also want leaf cgroup PSI stats accounted for
userspace adjustment on that cgroup, apart from only system-wide adjustment.

So this patch introduce a per-cgroup PSI accounting disable/re-enable
interface "cgroup.pressure", which is a read-write single value file that
allowed values are "0" and "1", the defaults is "1" so per-cgroup
PSI stats is enabled by default.

Implementation details:

It should be relatively straight-forward to disable and re-enable
state aggregation, time tracking, averaging on a per-cgroup level,
if we can live with losing history from while it was disabled.
I.e. the avgs will restart from 0, total= will have gaps.

But it's hard or complex to stop/restart groupc->tasks[] updates,
which is not implemented in this patch. So we always update
groupc->tasks[] and PSI_ONCPU bit in psi_group_change() even when
the cgroup PSI stats is disabled.

Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lkml.kernel.org/r/20220907090332.2078-1-zhouchengming@bytedance.com
2022-09-09 11:08:33 +02:00
Chengming Zhou
52b1364ba0 sched/psi: Add PSI_IRQ to track IRQ/SOFTIRQ pressure
Now PSI already tracked workload pressure stall information for
CPU, memory and IO. Apart from these, IRQ/SOFTIRQ could have
obvious impact on some workload productivity, such as web service
workload.

When CONFIG_IRQ_TIME_ACCOUNTING, we can get IRQ/SOFTIRQ delta time
from update_rq_clock_task(), in which we can record that delta
to CPU curr task's cgroups as PSI_IRQ_FULL status.

Note we don't use PSI_IRQ_SOME since IRQ/SOFTIRQ always happen in
the current task on the CPU, make nothing productive could run
even if it were runnable, so we only use PSI_IRQ_FULL.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lore.kernel.org/r/20220825164111.29534-8-zhouchengming@bytedance.com
2022-09-09 11:08:32 +02:00
Jim Cromie
ace7c4bbb2 doc-dyndbg: edit dynamic-debug-howto for brevity, audience
Rework/modernize docs:

 - use /proc/dynamic_debug/control in examples
   its *always* there (when dyndbg is config'd), even when <debugfs> is not.
   drop <debugfs> talk, its a distraction here.

 - alias ddcmd='echo $* > /proc/dynamic_debug/control
   focus on args: declutter, hide boilerplate, make pwd independent.

 - swap sections: Viewing before Controlling. control file as Catalog.

 - focus on use by a system administrator
   add an alias to make examples more readable
   drop grep-101 lessons, admins know this.

 - use init/main.c as 1st example, thread it thru doc where useful.
   everybodys kernel boots, runs these.

 - add *prdbg* api section
   to the bottom of the file, its for developers more than admins.
   move list of api functions there.

 - simplify - drop extra words, phrases, sentences.

 - add "decorator" flags line to unify "prefix", trim fmlt descriptions

CC: linux-doc@vger.kernel.org
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Link: https://lore.kernel.org/r/20220904214134.408619-20-jim.cromie@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-07 17:04:49 +02:00
Jim Cromie
753914ed85 doc-dyndbg: describe "class CLASS_NAME" query support
Add an explanation of the new "class CLASS_NAME" syntax and meaning,
noting that the module determines if CLASS_NAME applies to it.

Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Link: https://lore.kernel.org/r/20220904214134.408619-19-jim.cromie@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-07 17:04:49 +02:00
Vasant Hegde
d799a183da iommu/amd: Add command-line option to enable different page table
Enhance amd_iommu command line option to specify v1 or v2 page table.
By default system will boot in V1 page table mode.

Co-developed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20220825063939.8360-10-vasant.hegde@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-07 16:12:37 +02:00
Nicholas Piggin
0e8a631328 powerpc/pseries: Implement CONFIG_PARAVIRT_TIME_ACCOUNTING
CONFIG_VIRT_CPU_ACCOUNTING_GEN under pseries does not provide stolen
time accounting unless CONFIG_PARAVIRT_TIME_ACCOUNTING is enabled.
Implement this using the VPA accumulated wait counters.

Note this will not work on current KVM hosts because KVM does not
implement the VPA dispatch counters (yet). It could be implemented
with the dispatch trace log as it is for VIRT_CPU_ACCOUNTING_NATIVE,
but that is not necessary for the more limited accounting provided
by PARAVIRT_TIME_ACCOUNTING, and it is more expensive, complex, and
has downsides like potential log wrap.

From Shrikanth:

  [...] it was tested on Power10 [PowerVM] Shared LPAR. system has two
  LPAR. we will call first one LPAR1 and second one as LPAR2. Test was
  carried out in SMT=1. Similar observation was seen in SMT=8 as well.

  LPAR config header from each LPAR is below. LPAR1 is twice as big as
  LPAR2. Since Both are sharing the same underlying hardware, work
  stealing will happen when both the LPAR's are contending for the same
  resource.

  LPAR1:
  type=Shared mode=Uncapped smt=Off lcpu=40 cpus=40 ent=20.00
  LPAR2:
  type=Shared mode=Uncapped smt=Off lcpu=20 cpus=40 ent=10.00

  mpstat was used to check for the utilization. stress-ng has been used
  as the workload. Few cases are tested. when the both LPAR are idle
  there is no steal time. when LPAR1 starts running at 100% which
  consumes all of the physical resource, steal time starts to get
  accounted.  With LPAR1 running at 100% and LPAR2 starts running, steal
  time starts increasing. This is as expected. When the LPAR2 Load is
  increased further, steal time increases further.

  Case 1: 0% LPAR1; 0% LPAR2
   %usr  %nice   %sys %iowait  %irq  %soft %steal %guest %gnice  %idle
   0.00   0.00   0.05   0.00   0.00   0.00   0.00   0.00   0.00  99.95

  Case 2: 100% LPAR1; 0% LPAR2
   %usr  %nice   %sys %iowait  %irq  %soft %steal %guest %gnice  %idle
  97.68   0.00   0.00   0.00   0.00   0.00   2.32   0.00   0.00   0.00

  Case 3: 100% LPAR1; 50% LPAR2
   %usr  %nice   %sys %iowait  %irq  %soft %steal %guest %gnice  %idle
  86.34   0.00   0.10   0.00   0.00   0.03  13.54   0.00   0.00   0.00

  Case 4: 100% LPAR1; 100% LPAR2
   %usr  %nice   %sys %iowait  %irq  %soft %steal %guest %gnice  %idle
  78.54   0.00   0.07   0.00   0.00   0.02  21.36   0.00   0.00   0.00

  Case 5: 50% LPAR1; 100% LPAR2
   %usr  %nice   %sys %iowait  %irq  %soft %steal %guest %gnice  %idle
  49.37   0.00   0.00   0.00   0.00   0.00   1.17   0.00   0.00  49.47

  Patch is accounting for the steal time and basic tests are holding
  good.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
[mpe: Add SPDX tag to new paravirt_api_clock.h]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220902085316.2071519-3-npiggin@gmail.com
2022-09-05 14:14:02 +10:00
Waiman Long
8cbfdc24fc cgroup/cpuset: Update description of cpuset.cpus.partition in cgroup-v2.rst
Update Documentation/admin-guide/cgroup-v2.rst on the newly introduced
"isolated" cpuset partition type as well as other changes made in other
cpuset patches.

Signed-off-by: Waiman Long <longman@redhat.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2022-09-04 10:47:28 -10:00
Jakub Kicinski
60ad1100d5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
tools/testing/selftests/net/.gitignore
  sort the net-next version and use it

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-01 12:58:02 -07:00
Daniel Sneddon
b8d1d16360 x86/apic: Don't disable x2APIC if locked
The APIC supports two modes, legacy APIC (or xAPIC), and Extended APIC
(or x2APIC).  X2APIC mode is mostly compatible with legacy APIC, but
it disables the memory-mapped APIC interface in favor of one that uses
MSRs.  The APIC mode is controlled by the EXT bit in the APIC MSR.

The MMIO/xAPIC interface has some problems, most notably the APIC LEAK
[1].  This bug allows an attacker to use the APIC MMIO interface to
extract data from the SGX enclave.

Introduce support for a new feature that will allow the BIOS to lock
the APIC in x2APIC mode.  If the APIC is locked in x2APIC mode and the
kernel tries to disable the APIC or revert to legacy APIC mode a GP
fault will occur.

Introduce support for a new MSR (IA32_XAPIC_DISABLE_STATUS) and handle
the new locked mode when the LEGACY_XAPIC_DISABLED bit is set by
preventing the kernel from trying to disable the x2APIC.

On platforms with the IA32_XAPIC_DISABLE_STATUS MSR, if SGX or TDX are
enabled the LEGACY_XAPIC_DISABLED will be set by the BIOS.  If
legacy APIC is required, then it SGX and TDX need to be disabled in the
BIOS.

[1]: https://aepicleak.com/aepicleak.pdf

Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
Link: https://lkml.kernel.org/r/20220816231943.1152579-1-daniel.sneddon@linux.intel.com
2022-08-31 14:34:11 -07:00
Linus Torvalds
d68d289fbe A handful of fixes for documentation and the docs build system.
-----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmMM3aoPHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5YsdsH/09lM6s1trLFE8AVRGcSVJUelbZNZfm2WI7J
 7ElIVVVsBvRfLMVpjCkPrKjRV2jyJ+wDtIs1g54ra/J9/xyWSmlp8jDLvB+PgqGR
 zxhZX1/dbGEoRwqY3S5Vm5i3DsdBKMcGGc1nIM2xzZXdRoptvbzbZOHFGU/mqdC5
 QZeVe6ycCzsIgGz6aKlhHEszp/Jd2T4Zj/kJ9n95IxNB3l8uKTnSn/s0cQfGRZU/
 dtednWNs+1nUMGMerfgq7rstE0GfQFbfWL+2bYlDuSS3RgsaMXD+jwhxx7RV4oM2
 XCbgDoYW+rg4jArWOvBLWW+EFK5ywYYBjhGXDm041jaKi/vPRbo=
 =aQJl
 -----END PGP SIGNATURE-----

Merge tag 'docs-6.0-fixes' of git://git.lwn.net/linux

Pull documentation fixes from Jonathan Corbet:
 "A handful of fixes for documentation and the docs build system"

* tag 'docs-6.0-fixes' of git://git.lwn.net/linux:
  docs/conf.py: add function attribute '__fix_address' to conf.py
  Docs/admin-guide/mm/damon/usage: fix the example code snip
  docs: Update version number from 5.x to 6.x in README.rst
  docs/ja_JP/SubmittingPatches: Remove reference to submitting-drivers.rst
  docs: kerneldoc-preamble: Test xeCJK.sty before loading
2022-08-29 09:49:48 -07:00
Linus Torvalds
2f23a7c914 Misc fixes:
- Fix PAT on Xen, which caused i915 driver failures
  - Fix compat INT 80 entry crash on Xen PV guests
  - Fix 'MMIO Stale Data' mitigation status reporting on older Intel CPUs
  - Fix RSB stuffing regressions
  - Fix ORC unwinding on ftrace trampolines
  - Add Intel Raptor Lake CPU model number
  - Fix (work around) a SEV-SNP bootloader bug providing bogus values in
    boot_params->cc_blob_address, by ignoring the value on !SEV-SNP bootups.
  - Fix SEV-SNP early boot failure
  - Fix the objtool list of noreturn functions and annotate snp_abort(),
    which bug confused objtool on gcc-12.
  - Fix the documentation for retbleed
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmMLgUERHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hU1hAAj9bKO+D07gROpkRhXpLvbqm+mUqk6It8
 qEuyLkC/xvD9N//Pya0ZPPE+l23EHQxM/L0tXCYwsc8Zlx3fr678FQktHZuxULaP
 qlGmwub8PvsyMesXBGtgHJ4RT8L07FelBR+E/TpqCSbv/geM7mcc0ojyOKo+OmbD
 vbsUTDNqsMYndzePUBrv/vJRqVLGlbd1a22DKE/YiYBbZu5hughNUB0bHYhYdsml
 mutcRsVTKRbZOi4wiuY/pnveMf/z9wAwKOWd9EBaDWILygVSHkBJJQyhHigBh3F8
 H1hFn1q4ybULdrAUT4YND9Tjb6U8laU0fxg8Z43bay7bFXXElqoj3W2qka9NjAim
 QvWSFvTYQRTIWt6sCshVsUBWj6fBZdHxcJ8Fh+ucJJp+JWl9/aD0A41vK2Jx0LIt
 p8YzBRKEqd1Q/7QD855BArN7HNtQGgShNt4oPVZ7nPnjuQt0+lngu8NR+6X3RKpX
 r8rordyPvzgPL6W+1uV+8hnz1w+YU2xplAbK+zTijwgJVgyf8khSlZQNpFKULrou
 zjtzo/2nB+4C4bvfetNnaOGhi1/AdCHZHyZE35rotpd73SLHvdOrH0Ll9oCVfVrC
 UWbC1E67cHQw97Ni/4CrCsJRBULK01uyszVCxlEkSYInf0UsKlnmy+TqxizwsCVy
 reYQ0ePyWg0=
 =ZpCJ
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2022-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 fixes from Ingo Molnar:

 - Fix PAT on Xen, which caused i915 driver failures

 - Fix compat INT 80 entry crash on Xen PV guests

 - Fix 'MMIO Stale Data' mitigation status reporting on older Intel CPUs

 - Fix RSB stuffing regressions

 - Fix ORC unwinding on ftrace trampolines

 - Add Intel Raptor Lake CPU model number

 - Fix (work around) a SEV-SNP bootloader bug providing bogus values in
   boot_params->cc_blob_address, by ignoring the value on !SEV-SNP
   bootups.

 - Fix SEV-SNP early boot failure

 - Fix the objtool list of noreturn functions and annotate snp_abort(),
   which bug confused objtool on gcc-12.

 - Fix the documentation for retbleed

* tag 'x86-urgent-2022-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Documentation/ABI: Mention retbleed vulnerability info file for sysfs
  x86/sev: Mark snp_abort() noreturn
  x86/sev: Don't use cc_platform_has() for early SEV-SNP calls
  x86/boot: Don't propagate uninitialized boot_params->cc_blob_address
  x86/cpu: Add new Raptor Lake CPU model number
  x86/unwind/orc: Unwind ftrace trampolines with correct ORC entry
  x86/nospec: Fix i386 RSB stuffing
  x86/nospec: Unwreck the RSB stuffing
  x86/bugs: Add "unknown" reporting for MMIO Stale Data
  x86/entry: Fix entry_INT80_compat for Xen PV guests
  x86/PAT: Have pat_enabled() properly reflect state when running on Xen
2022-08-28 10:10:23 -07:00
Linus Torvalds
e022620b5d arm64 fixes for -rc3
- Fix workaround for Cortex-A76 erratum #1286807
 
 - Add workaround for AMU erratum #2457168 on Cortex-A510
 
 - Drop reference to removed CONFIG_ARCH_RANDOM #define
 
 - Fix parsing of the "rodata=full" cmdline option
 
 - Fix a bunch of issues in the SME register state switching and sigframe code
 
 - Fix incorrect extraction of the CTR_EL0.CWG register field
 
 - Fix ACPI cache topology probing when the PPTT is not present
 
 - Trivial comment and whitespace fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmMIr4YQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNOpcB/9PQM3DtbFcCFJeetFxJYVSPHikHfEj4SHY
 H6NRNLA9JwwCqA1/RKbJQDpFS34JdL5UgtNuUFbg4zkH2aaeML7WKuRg396jv2ke
 GmHpOkNeIaae0NQes3MLZQf+Heh2oimRYwlPXJc23vhAIZagwby3yR/9POrxKm6p
 kamLIrmwU1mU7pwr7HEkRr1HOd/eOJ+q6P3UYrFlAgAp5PEfVGgcdjbHJ1gY1KWw
 HwR6s2yWQ71+Aug9wMS56TQszBedyuIOeKezAI+IbThF6t/kQlfJna+hqvPp0zEE
 mz1hss7ACpfDURdga35B1bE146aZ1SKEsXXaB853JI76K6D534Y9
 =3oRd
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
 "A bumper crop of arm64 fixes for -rc3.

  The largest change is fixing our parsing of the 'rodata=full' command
  line option, which kstrtobool() started treating as 'rodata=false'.
  The fix actually makes the parsing of that option much less fragile
  and updates the documentation at the same time.

  We still have a boot issue pending when KASLR is disabled at compile
  time, but there's a fresh fix on the list which I'll send next week if
  it holds up to testing.

  Summary:

   - Fix workaround for Cortex-A76 erratum #1286807

   - Add workaround for AMU erratum #2457168 on Cortex-A510

   - Drop reference to removed CONFIG_ARCH_RANDOM #define

   - Fix parsing of the "rodata=full" cmdline option

   - Fix a bunch of issues in the SME register state switching and sigframe code

   - Fix incorrect extraction of the CTR_EL0.CWG register field

   - Fix ACPI cache topology probing when the PPTT is not present

   - Trivial comment and whitespace fixes"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64/sme: Don't flush SVE register state when handling SME traps
  arm64/sme: Don't flush SVE register state when allocating SME storage
  arm64/signal: Flush FPSIMD register state when disabling streaming mode
  arm64/signal: Raise limit on stack frames
  arm64/cache: Fix cache_type_cwg() for register generation
  arm64/sysreg: Guard SYS_FIELD_ macros for asm
  arm64/sysreg: Directly include bitfield.h
  arm64: cacheinfo: Fix incorrect assignment of signed error value to unsigned fw_level
  arm64: errata: add detection for AMEVCNTR01 incrementing incorrectly
  arm64: fix rodata=full
  arm64: Fix comment typo
  docs/arm64: elf_hwcaps: unify newlines in HWCAP lists
  arm64: adjust KASLR relocation after ARCH_RANDOM removal
  arm64: Fix match_list for erratum 1286807 on Arm Cortex-A76
2022-08-26 11:32:53 -07:00
Jakub Kicinski
880b0dd94f Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
  21234e3a84 ("net/mlx5e: Fix use after free in mlx5e_fs_init()")
  c7eafc5ed0 ("net/mlx5e: Convert ethtool_steering member of flow_steering struct to pointer")
https://lore.kernel.org/all/20220825104410.67d4709c@canb.auug.org.au/
https://lore.kernel.org/all/20220823055533.334471-1-saeed@kernel.org/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-25 16:07:42 -07:00
Kairui Song
465d0eb0dc Docs/admin-guide/mm/damon/usage: fix the example code snip
The workflow example code is not working since it got the file names
wrong. So fix this.

Fixes: b18402726b ("Docs/admin-guide/mm/damon/usage: document DAMON sysfs interface")
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Kairui Song <kasong@tencent.com>
Link: https://lore.kernel.org/r/20220823114053.53305-1-ryncsn@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-08-25 12:50:13 -06:00
Lukas Bulwahn
602684adb4 docs: Update version number from 5.x to 6.x in README.rst
A quick 'grep "5\.x" . -R' on Documentation shows that README.rst,
2.Process.rst and applying-patches.rst all mention the version number "5.x"
for kernel releases.

As the next release will be version 6.0, updating the version number to 6.x
in README.rst seems reasonable.

The description in 2.Process.rst is just a description of recent kernel
releases, it was last updated in the beginning of 2020, and can be
revisited at any time on a regular basis, independent of changing the
version number from 5 to 6. So, there is no need to update this document
now when transitioning from 5.x to 6.x numbering.

The document applying-patches.rst is probably obsolete for most users
anyway, a reader will sufficiently well understand the steps, even it
mentions version 5 rather than version 6. So, do not update that to a
version 6.x numbering scheme.

Update version number from 5.x to 6.x in README.rst only.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Link: https://lore.kernel.org/r/20220824080836.23087-1-lukas.bulwahn@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-08-25 12:45:10 -06:00
Bagas Sanjaya
1faa34672f Documentation: sysctl: align cells in second content column
Stephen Rothwell reported htmldocs warning when merging net-next tree:

Documentation/admin-guide/sysctl/net.rst:37: WARNING: Malformed table.
Text in column margin in table line 4.

========= =================== = ========== ==================
Directory Content               Directory  Content
========= =================== = ========== ==================
802       E802 protocol         mptcp     Multipath TCP
appletalk Appletalk protocol    netfilter Network Filter
ax25      AX25                  netrom     NET/ROM
bridge    Bridging              rose      X.25 PLP layer
core      General parameter     tipc      TIPC
ethernet  Ethernet protocol     unix      Unix domain sockets
ipv4      IP version 4          x25       X.25 protocol
ipv6      IP version 6
========= =================== = ========== ==================

The warning above is caused by cells in second "Content" column of
/proc/sys/net subdirectory table which are in column margin.

Align these cells against the column header to fix the warning.

Link: https://lore.kernel.org/linux-next/20220823134905.57ed08d5@canb.auug.org.au/
Fixes: 1202cdd665 ("Remove DECnet support from kernel")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/r/20220824035804.204322-1-bagasdotme@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-24 18:42:51 -07:00
Yosry Ahmed
ebc97a52b5 mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.
We keep track of several kernel memory stats (total kernel memory, page
tables, stack, vmalloc, etc) on multiple levels (global, per-node,
per-memcg, etc). These stats give insights to users to how much memory
is used by the kernel and for what purposes.

Currently, memory used by KVM mmu is not accounted in any of those
kernel memory stats. This patch series accounts the memory pages
used by KVM for page tables in those stats in a new
NR_SECONDARY_PAGETABLE stat. This stat can be later extended to account
for other types of secondary pages tables (e.g. iommu page tables).

KVM has a decent number of large allocations that aren't for page
tables, but for most of them, the number/size of those allocations
scales linearly with either the number of vCPUs or the amount of memory
assigned to the VM. KVM's secondary page table allocations do not scale
linearly, especially when nested virtualization is in use.

From a KVM perspective, NR_SECONDARY_PAGETABLE will scale with KVM's
per-VM pages_{4k,2m,1g} stats unless the guest is doing something
bizarre (e.g. accessing only 4kb chunks of 2mb pages so that KVM is
forced to allocate a large number of page tables even though the guest
isn't accessing that much memory). However, someone would need to either
understand how KVM works to make that connection, or know (or be told) to
go look at KVM's stats if they're running VMs to better decipher the stats.

Furthermore, having NR_PAGETABLE side-by-side with NR_SECONDARY_PAGETABLE
is informative. For example, when backing a VM with THP vs. HugeTLB,
NR_SECONDARY_PAGETABLE is roughly the same, but NR_PAGETABLE is an order
of magnitude higher with THP. So having this stat will at the very least
prove to be useful for understanding tradeoffs between VM backing types,
and likely even steer folks towards potential optimizations.

The original discussion with more details about the rationale:
https://lore.kernel.org/all/87ilqoi77b.wl-maz@kernel.org

This stat will be used by subsequent patches to count KVM mmu
memory usage.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220823004639.2387269-2-yosryahmed@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-08-24 13:51:42 -07:00
Kuniyuki Iwashima
5dcd08cd19 net: Fix data-races around netdev_max_backlog.
While reading netdev_max_backlog, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

While at it, we remove the unnecessary spaces in the doc.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24 13:46:57 +01:00
Mark Rutland
2e8cff0a0e arm64: fix rodata=full
On arm64, "rodata=full" has been suppored (but not documented) since
commit:

  c55191e96c ("arm64: mm: apply r/o permissions of VM areas to its linear alias as well")

As it's necessary to determine the rodata configuration early during
boot, arm64 has an early_param() handler for this, whereas init/main.c
has a __setup() handler which is run later.

Unfortunately, this split meant that since commit:

  f9a40b0890 ("init/main.c: return 1 from handled __setup() functions")

... passing "rodata=full" would result in a spurious warning from the
__setup() handler (though RO permissions would be configured
appropriately).

Further, "rodata=full" has been broken since commit:

  0d6ea3ac94 ("lib/kstrtox.c: add "false"/"true" support to kstrtobool()")

... which caused strtobool() to parse "full" as false (in addition to
many other values not documented for the "rodata=" kernel parameter.

This patch fixes this breakage by:

* Moving the core parameter parser to an __early_param(), such that it
  is available early.

* Adding an (optional) arch hook which arm64 can use to parse "full".

* Updating the documentation to mention that "full" is valid for arm64.

* Having the core parameter parser handle "on" and "off" explicitly,
  such that any undocumented values (e.g. typos such as "ful") are
  reported as errors rather than being silently accepted.

Note that __setup() and early_param() have opposite conventions for
their return values, where __setup() uses 1 to indicate a parameter was
handled and early_param() uses 0 to indicate a parameter was handled.

Fixes: f9a40b0890 ("init/main.c: return 1 from handled __setup() functions")
Fixes: 0d6ea3ac94 ("lib/kstrtox.c: add "false"/"true" support to kstrtobool()")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jagdish Gediya <jvgediya@linux.ibm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220817154022.3974645-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-08-23 11:02:02 +01:00
Stephen Hemminger
1202cdd665 Remove DECnet support from kernel
DECnet is an obsolete network protocol that receives more attention
from kernel janitors than users. It belongs in computer protocol
history museum not in Linux kernel.

It has been "Orphaned" in kernel since 2010. The iproute2 support
for DECnet was dropped in 5.0 release. The documentation link on
Sourceforge says it is abandoned there as well.

Leave the UAPI alone to keep userspace programs compiling.
This means that there is still an empty neighbour table
for AF_DECNET.

The table of /proc/sys/net entries was updated to match
current directories and reformatted to be alphabetical.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: David Ahern <dsahern@kernel.org>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-22 14:26:30 +01:00
Ashok Raj
3ecf671f1d x86/microcode: Document the whole late loading problem
Commit

  d23d33ea0f ("x86/microcode: Taint and warn on late loading")

started tainting the kernel after microcode late loading.

There is some history behind why x86 microcode started doing the late
loading stop_machine() rendezvous. Document the whole situation.

No functional changes.

  [ bp: Fix typos, heavily massage. ]

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220813223825.3164861-2-ashok.raj@intel.com
2022-08-18 15:57:53 +02:00
Pawan Gupta
7df548840c x86/bugs: Add "unknown" reporting for MMIO Stale Data
Older Intel CPUs that are not in the affected processor list for MMIO
Stale Data vulnerabilities currently report "Not affected" in sysfs,
which may not be correct. Vulnerability status for these older CPUs is
unknown.

Add known-not-affected CPUs to the whitelist. Report "unknown"
mitigation status for CPUs that are not in blacklist, whitelist and also
don't enumerate MSR ARCH_CAPABILITIES bits that reflect hardware
immunity to MMIO Stale Data vulnerabilities.

Mitigation is not deployed when the status is unknown.

  [ bp: Massage, fixup. ]

Fixes: 8d50cdf8b8 ("x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data")
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Suggested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/a932c154772f2121794a5f2eded1a11013114711.1657846269.git.pawan.kumar.gupta@linux.intel.com
2022-08-18 15:35:22 +02:00
Linus Torvalds
c5f1e32e32 Fix the "IBPB mitigated RETBleed" mode of operation on AMD CPUs
(not turned on by default), which also need STIBP enabled (if
 available) to be '100% safe' on even the shortest speculation
 windows.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmL3fqcRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gnuw/6AighFp+Gp4qXP1DIVU+acVnZsxbdt7GA
 WGs/JJfKYsKpWvDGFxnwtF2V1Imq8XVRPVPyFKvLQiBs2h8vNcVkgIvJsdeTFsqQ
 uUwUaYgDXuhLYaFpnMGouoeA3iw2zf/CY5ZJX79Nl/CwNwT7FxiLbu+JF/I2Yc0V
 yddiQ8xgT0VJhaBcUTsD2qFl8wjpxer7gNBFR4ujiYWXHag3qKyZuaySmqCz4xhd
 4nyhJCp34548MsTVXDys2gnYpgLWweB9zOPvH4+GgtiFF3UJxRMhkB9NzfZq1l5W
 tCjgGupb3vVoXOVb/xnXyZlPbdFNqSAja7iOXYdmNUSURd7LC0PYHpVxN0rkbFcd
 V6noyU3JCCp86ceGTC0u3Iu6LLER6RBGB0gatVlzomWLjTEiC806eo23CVE22cnk
 poy7FO3RWa+q1AqWsEzc3wr14ZgSKCBZwwpn6ispT/kjx9fhAFyKtH2/Sznx26GH
 yKOF7pPCIXjCpcMnNoUu8cVyzfk0g3kOWQtKjaL9WfeyMtBaHhctngR0s1eCxZNJ
 rBlTs+YO7fO42unZEExgvYekBzI70aThIkvxahKEWW48owWph+i/sn5gzdVF+ynR
 R4PGeylfd8ZXr21cG2rG9250JLwqzhsxnAGvNjYg1p/hdyrzLTGWHIc9r9BU9000
 mmOP9uY6Cjc=
 =Ac6x
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fix from Ingo Molnar:
 "Fix the 'IBPB mitigated RETBleed' mode of operation on AMD CPUs (not
  turned on by default), which also need STIBP enabled (if available) to
  be '100% safe' on even the shortest speculation windows"

* tag 'x86-urgent-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/bugs: Enable STIBP for IBPB mitigated RETBleed
2022-08-13 14:24:12 -07:00
Linus Torvalds
b1701d5e29 - hugetlb_vmemmap cleanups from Muchun Song
- hardware poisoning support for 1GB hugepages, from Naoya Horiguchi
 
 - highmem documentation fixups from Fabio De Francesco
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYvMFaQAKCRDdBJ7gKXxA
 jrM7AQCsaVsrRJgVMNqjDvLAsWFEm48s6/smQT4UZ9n/rPQUsAEA0iRpOfbjcxwK
 npAml1mp5uP2AkimTVLB6Bokt6N+wAg=
 =9Wta
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2022-08-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull remaining MM updates from Andrew Morton:
 "Three patch series - two that perform cleanups and one feature:

   - hugetlb_vmemmap cleanups from Muchun Song

   - hardware poisoning support for 1GB hugepages, from Naoya Horiguchi

   - highmem documentation fixups from Fabio De Francesco"

* tag 'mm-stable-2022-08-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (23 commits)
  Documentation/mm: add details about kmap_local_page() and preemption
  highmem: delete a sentence from kmap_local_page() kdocs
  Documentation/mm: rrefer kmap_local_page() and avoid kmap()
  Documentation/mm: avoid invalid use of addresses from kmap_local_page()
  Documentation/mm: don't kmap*() pages which can't come from HIGHMEM
  highmem: specify that kmap_local_page() is callable from interrupts
  highmem: remove unneeded spaces in kmap_local_page() kdocs
  mm, hwpoison: enable memory error handling on 1GB hugepage
  mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage
  mm, hwpoison: make __page_handle_poison returns int
  mm, hwpoison: set PG_hwpoison for busy hugetlb pages
  mm, hwpoison: make unpoison aware of raw error info in hwpoisoned hugepage
  mm, hwpoison, hugetlb: support saving mechanism of raw error pages
  mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry
  mm/hugetlb: check gigantic_page_runtime_supported() in return_unused_surplus_pages()
  mm: hugetlb_vmemmap: use PTRS_PER_PTE instead of PMD_SIZE / PAGE_SIZE
  mm: hugetlb_vmemmap: move code comments to vmemmap_dedup.rst
  mm: hugetlb_vmemmap: improve hugetlb_vmemmap code readability
  mm: hugetlb_vmemmap: replace early_param() with core_param()
  mm: hugetlb_vmemmap: move vmemmap code related to HugeTLB to hugetlb_vmemmap.c
  ...
2022-08-10 11:18:00 -07:00
Linus Torvalds
5318b987fe More from the CPU vulnerability nightmares front:
Intel eIBRS machines do not sufficiently mitigate against RET
 mispredictions when doing a VM Exit therefore an additional RSB,
 one-entry stuffing is needed.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmLqsGsACgkQEsHwGGHe
 VUpXGg//ZEkxhf3Ri7X9PknAWNG6eIEqigKqWcdnOw+Oq/GMVb6q7JQsqowK7KBZ
 AKcY5c/KkljTJNohditnfSOePyCG5nDTPgfkjzIawnaVdyJWMRCz/L4X2cv6ykDl
 2l2EvQm4Ro8XAogYhE7GzDg/osaVfx93OkLCQj278VrEMWgM/dN2RZLpn+qiIkNt
 DyFlQ7cr5UASh/svtKLko268oT4JwhQSbDHVFLMJ52VaLXX36yx4rValZHUKFdox
 ZDyj+kiszFHYGsI94KAD0dYx76p6mHnwRc4y/HkVcO8vTacQ2b9yFYBGTiQatITf
 0Nk1RIm9m3rzoJ82r/U0xSIDwbIhZlOVNm2QtCPkXqJZZFhopYsZUnq2TXhSWk4x
 GQg/2dDY6gb/5MSdyLJmvrTUtzResVyb/hYL6SevOsIRnkwe35P6vDDyp15F3TYK
 YvidZSfEyjtdLISBknqYRQD964dgNZu9ewrj+WuJNJr+A2fUvBzUebXjxHREsugN
 jWp5GyuagEKTtneVCvjwnii+ptCm6yfzgZYLbHmmV+zhinyE9H1xiwVDvo5T7DDS
 ZJCBgoioqMhp5qR59pkWz/S5SNGui2rzEHbAh4grANy8R/X5ASRv7UHT9uAo6ve1
 xpw6qnE37CLzuLhj8IOdrnzWwLiq7qZ/lYN7m+mCMVlwRWobbOo=
 =a8em
 -----END PGP SIGNATURE-----

Merge tag 'x86_bugs_pbrsb' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 eIBRS fixes from Borislav Petkov:
 "More from the CPU vulnerability nightmares front:

  Intel eIBRS machines do not sufficiently mitigate against RET
  mispredictions when doing a VM Exit therefore an additional RSB,
  one-entry stuffing is needed"

* tag 'x86_bugs_pbrsb' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/speculation: Add LFENCE to RSB fill sequence
  x86/speculation: Add RSB VM Exit protections
2022-08-09 09:29:07 -07:00
Muchun Song
dff033818a mm: hugetlb_vmemmap: introduce the name HVO
It it inconvenient to mention the feature of optimizing vmemmap pages
associated with HugeTLB pages when communicating with others since there
is no specific or abbreviated name for it when it is first introduced. 
Let us give it a name HVO (HugeTLB Vmemmap Optimization) from now.

This commit also updates the document about "hugetlb_free_vmemmap" by the
way discussed in thread [1].

Link: https://lore.kernel.org/all/21aae898-d54d-cc4b-a11f-1bb7fddcfffa@redhat.com/ [1]
Link: https://lkml.kernel.org/r/20220628092235.91270-4-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Will Deacon <will@kernel.org>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-08 18:06:42 -07:00
Linus Torvalds
e74acdf55d Modules updates for 6.0
For the 6.0 merge window the modules code shifts to cleanup and minor fixes
 effort. This is becomes much easier to do and review now due to the code
 split to its own directory from effort on the last kernel release. I expect
 to see more of this with time and as we expand on test coverage in the future.
 The cleanups and fixes come from usual suspects such as Christophe Leroy and
 Aaron Tomlin but there are also some other contributors.
 
 One particular minor fix worth mentioning is from Helge Deller, where he spotted
 a *forever* incorrect natural alignment on both ELF section header tables:
 
   * .altinstructions
   * __bug_table sections
 
 A lot of back and forth went on in trying to determine the ill effects of this
 misalignment being present for years and it has been determined there should
 be no real ill effects unless you have a buggy exception handler. Helge actually
 hit one of these buggy exception handlers on parisc which is how he ended up
 spotting this issue. When implemented correctly these paths with incorrect
 misalignment would just mean a performance penalty, but given that we are
 dealing with alternatives on modules and with the __bug_table (where info
 regardign BUG()/WARN() file/line information associated with it is stored)
 this really shouldn't be a big deal.
 
 The only other change with mentioning is the kmap() with kmap_local_page()
 and my only concern with that was on what is done after preemption, but the
 virtual addresses are restored after preemption. This is only used on module
 decompression.
 
 This all has sit on linux-next for a while except the kmap stuff which has
 been there for 3 weeks.
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmLxL4gSHG1jZ3JvZkBr
 ZXJuZWwub3JnAAoJEM4jHQowkoin8AYP/iv/Oh/Zzh4UvZzkkOSzhf1qDgGhjFb0
 aFIODZzpEfZ5ix5GcLapB8/QIwQgxiIRa3WkTMc0uyv+mddlbKuILFnI9A1I+TQe
 N4gmKeYXwWRyxLa6y7/B3lVzuLxf4DpcxfS2c3A65MkYi09XPA9oXCy7JjzsmEiZ
 z2Lu8lTe6hg8VarBTogHBxiEU7ybfDCnHWj7/Oe6zz8tS/R0i0ndNBu9xmaCqSh7
 QC8++eqCaS+zfW0uTmnGDo1/zWLBblCZ5HAHG8bLlPHezUbekNz6G1D4CVwFyNQ8
 wy1Gjy8nFWc+rwUl1CTgJ+A7wodGrMCyt5SmcNUVBOWdlSmli5vFJp61ET6UdrV+
 +8owATwwIm8hbkIAI4037j7pMgrO27d130GRxFwgG9GNoqew2AM7y/9HrlmW49PE
 IqJA4Pm3zg26IhLIRcH7jLg3oKGuFf0nkMTDoooI5a9DlcsCXPuGd0FBw2WbR71D
 Px6dlVoAW0NrP2tm8YzkTKIT+aN+UId4Vdi2oFs1t8Sye/U+LCjvwrXPk13pZKdR
 VxfM1oVxeRwiAUq0VuIrnj7windF5Mpy2hDLHeWjzQmLcEGAtCYEGyxKTBkNTtPt
 gm9XBzT6Rbzi+Sc++ZoHYHe1g4T66sjYOp4N90sRRMD3FR97ZyW8eD01gwf6p1Uy
 aCOrA+sRHK3F
 =hPvl
 -----END PGP SIGNATURE-----

Merge tag 'modules-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux

Pull module updates from Luis Chamberlain:
 "For the 6.0 merge window the modules code shifts to cleanup and minor
  fixes effort. This becomes much easier to do and review now due to the
  code split to its own directory from effort on the last kernel
  release. I expect to see more of this with time and as we expand on
  test coverage in the future. The cleanups and fixes come from usual
  suspects such as Christophe Leroy and Aaron Tomlin but there are also
  some other contributors.

  One particular minor fix worth mentioning is from Helge Deller, where
  he spotted a *forever* incorrect natural alignment on both ELF section
  header tables:

    * .altinstructions
    * __bug_table sections

  A lot of back and forth went on in trying to determine the ill effects
  of this misalignment being present for years and it has been
  determined there should be no real ill effects unless you have a buggy
  exception handler. Helge actually hit one of these buggy exception
  handlers on parisc which is how he ended up spotting this issue. When
  implemented correctly these paths with incorrect misalignment would
  just mean a performance penalty, but given that we are dealing with
  alternatives on modules and with the __bug_table (where info regardign
  BUG()/WARN() file/line information associated with it is stored) this
  really shouldn't be a big deal.

  The only other change with mentioning is the kmap() with
  kmap_local_page() and my only concern with that was on what is done
  after preemption, but the virtual addresses are restored after
  preemption. This is only used on module decompression.

  This all has sit on linux-next for a while except the kmap stuff which
  has been there for 3 weeks"

* tag 'modules-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
  module: Replace kmap() with kmap_local_page()
  module: Show the last unloaded module's taint flag(s)
  module: Use strscpy() for last_unloaded_module
  module: Modify module_flags() to accept show_state argument
  module: Move module's Kconfig items in kernel/module/
  MAINTAINERS: Update file list for module maintainers
  module: Use vzalloc() instead of vmalloc()/memset(0)
  modules: Ensure natural alignment for .altinstructions and __bug_table sections
  module: Increase readability of module_kallsyms_lookup_name()
  module: Fix ERRORs reported by checkpatch.pl
  module: Add support for default value for module async_probe
2022-08-08 14:12:19 -07:00
Kim Phillips
e6cfcdda8c x86/bugs: Enable STIBP for IBPB mitigated RETBleed
AMD's "Technical Guidance for Mitigating Branch Type Confusion,
Rev. 1.0 2022-07-12" whitepaper, under section 6.1.2 "IBPB On
Privileged Mode Entry / SMT Safety" says:

  Similar to the Jmp2Ret mitigation, if the code on the sibling thread
  cannot be trusted, software should set STIBP to 1 or disable SMT to
  ensure SMT safety when using this mitigation.

So, like already being done for retbleed=unret, and now also for
retbleed=ibpb, force STIBP on machines that have it, and report its SMT
vulnerability status accordingly.

 [ bp: Remove the "we" and remove "[AMD]" applicability parameter which
   doesn't work here. ]

Fixes: 3ebc170068 ("x86/bugs: Add retbleed=ibpb")
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org # 5.10, 5.15, 5.19
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Link: https://lore.kernel.org/r/20220804192201.439596-1-kim.phillips@amd.com
2022-08-08 19:12:17 +02:00
Linus Torvalds
eb5699ba31 Updates to various subsystems which I help look after. lib, ocfs2,
fatfs, autofs, squashfs, procfs, etc.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYu9BeQAKCRDdBJ7gKXxA
 jp1DAP4mjCSvAwYzXklrIt+Knv3CEY5oVVdS+pWOAOGiJpldTAD9E5/0NV+VmlD9
 kwS/13j38guulSlXRzDLmitbg81zAAI=
 =Zfum
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2022-08-06-2' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc updates from Andrew Morton:
 "Updates to various subsystems which I help look after. lib, ocfs2,
  fatfs, autofs, squashfs, procfs, etc. A relatively small amount of
  material this time"

* tag 'mm-nonmm-stable-2022-08-06-2' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (72 commits)
  scripts/gdb: ensure the absolute path is generated on initial source
  MAINTAINERS: kunit: add David Gow as a maintainer of KUnit
  mailmap: add linux.dev alias for Brendan Higgins
  mailmap: update Kirill's email
  profile: setup_profiling_timer() is moslty not implemented
  ocfs2: fix a typo in a comment
  ocfs2: use the bitmap API to simplify code
  ocfs2: remove some useless functions
  lib/mpi: fix typo 'the the' in comment
  proc: add some (hopefully) insightful comments
  bdi: remove enum wb_congested_state
  kernel/hung_task: fix address space of proc_dohung_task_timeout_secs
  lib/lzo/lzo1x_compress.c: replace ternary operator with min() and min_t()
  squashfs: support reading fragments in readahead call
  squashfs: implement readahead
  squashfs: always build "file direct" version of page actor
  Revert "squashfs: provide backing_dev_info in order to disable read-ahead"
  fs/ocfs2: Fix spelling typo in comment
  ia64: old_rr4 added under CONFIG_HUGETLB_PAGE
  proc: fix test for "vsyscall=xonly" boot option
  ...
2022-08-07 10:03:24 -07:00
Linus Torvalds
cae4199f93 powerpc updates for 6.0
- Add support for syscall stack randomization.
 
  - Add support for atomic operations to the 32 & 64-bit BPF JIT.
 
  - Full support for KASAN on 64-bit Book3E.
 
  - Add a watchdog driver for the new PowerVM hypervisor watchdog.
 
  - Add a number of new selftests for the Power10 PMU support.
 
  - Add a driver for the PowerVM Platform KeyStore.
 
  - Increase the NMI watchdog timeout during live partition migration, to avoid timeouts
    due to increased memory access latency.
 
  - Add support for using the 'linux,pci-domain' device tree property for PCI domain
    assignment.
 
  - Many other small features and fixes.
 
 Thanks to: Alexey Kardashevskiy, Andy Shevchenko, Arnd Bergmann, Athira Rajeev, Bagas
 Sanjaya, Christophe Leroy, Erhard Furtner, Fabiano Rosas, Greg Kroah-Hartman, Greg Kurz,
 Haowen Bai, Hari Bathini, Jason A. Donenfeld, Jason Wang, Jiang Jian, Joel Stanley, Juerg
 Haefliger, Kajol Jain, Kees Cook, Laurent Dufour, Madhavan Srinivasan, Masahiro Yamada,
 Maxime Bizon, Miaoqian Lin, Murilo Opsfelder Araújo, Nathan Lynch, Naveen N. Rao, Nayna
 Jain, Nicholas Piggin, Ning Qiang, Pali Rohár, Petr Mladek, Rashmica Gupta, Sachin Sant,
 Scott Cheloha, Segher Boessenkool, Stephen Rothwell, Uwe Kleine-König, Wolfram Sang, Xiu
 Jianfeng, Zhouyi Zhou.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmLuAPgTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgBPpD/9kY/T0qlOXABxlZCgtqeAjPX+2xpnY
 BF+TlsN1TS1auFcEZL2BapmVacsvOeGEFDVuZHZvZJc69Hx+gSjnjFCnZjp6n+Yz
 wt6y9w9Pu0t/sjD5vNQ46O15/dXqm6RoVI7um12j/WLMN8Ko5+x3gKAyQONjQd2/
 1kPcxVH6FUosAdnCuvIcqCX4e4IIHl2ZkitHOTXoQUvUy9oAK/mOBnwqZ6zLGUKC
 E5M+Zyt4RFGxhPs48FkX6Nq6crDGU/P0VJpDKkR/t7GHnE67Bm70gZougAPrzrgP
 nx8zoTWgDKpqDeuqK7pFcyKgNS3dKbxsN3sAfKHOWu/YnV4wMyy+7fmwagMauki7
 lXccKN6F/r+8JcMNx80Jp/dAw3ZdLceP38M3Ryf8IL6lTfkNySumUvrKJn6r1Cu1
 wvzhgyEuDawss9KHdEmXcA2i3+XVZvitaipO7JWUC8pblrP1SJMoPfIIe9zh3y3M
 pyZj0TcGJ8XaK+badvI+PW/K/KeRgXEY8HpC3wDHSoIkli3OE4jDwXn6TiZgvm3n
 k0sKL8YSmQZ8hP8QAkR+r8NQKYqLlfyPxdslK5omDPxfub5Uzk9ZV2Ep7svkaiQn
 Wqjq27Dpz8+w0XPjsQ0Tkv+ByTkOhrawOH7x9SpFLHpv9g5otcYmS79NkO/htx8C
 6LyPNx1VYn5IRA==
 =tRkm
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Add support for syscall stack randomization

 - Add support for atomic operations to the 32 & 64-bit BPF JIT

 - Full support for KASAN on 64-bit Book3E

 - Add a watchdog driver for the new PowerVM hypervisor watchdog

 - Add a number of new selftests for the Power10 PMU support

 - Add a driver for the PowerVM Platform KeyStore

 - Increase the NMI watchdog timeout during live partition migration, to
   avoid timeouts due to increased memory access latency

 - Add support for using the 'linux,pci-domain' device tree property for
   PCI domain assignment

 - Many other small features and fixes

Thanks to Alexey Kardashevskiy, Andy Shevchenko, Arnd Bergmann, Athira
Rajeev, Bagas Sanjaya, Christophe Leroy, Erhard Furtner, Fabiano Rosas,
Greg Kroah-Hartman, Greg Kurz, Haowen Bai, Hari Bathini, Jason A.
Donenfeld, Jason Wang, Jiang Jian, Joel Stanley, Juerg Haefliger, Kajol
Jain, Kees Cook, Laurent Dufour, Madhavan Srinivasan, Masahiro Yamada,
Maxime Bizon, Miaoqian Lin, Murilo Opsfelder Araújo, Nathan Lynch,
Naveen N.  Rao, Nayna Jain, Nicholas Piggin, Ning Qiang, Pali Rohár,
Petr Mladek, Rashmica Gupta, Sachin Sant, Scott Cheloha, Segher
Boessenkool, Stephen Rothwell, Uwe Kleine-König, Wolfram Sang, Xiu
Jianfeng, and Zhouyi Zhou.

* tag 'powerpc-6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (191 commits)
  powerpc/64e: Fix kexec build error
  EDAC/ppc_4xx: Include required of_irq header directly
  powerpc/pci: Fix PHB numbering when using opal-phbid
  powerpc/64: Init jump labels before parse_early_param()
  selftests/powerpc: Avoid GCC 12 uninitialised variable warning
  powerpc/cell/axon_msi: Fix refcount leak in setup_msi_msg_address
  powerpc/xive: Fix refcount leak in xive_get_max_prio
  powerpc/spufs: Fix refcount leak in spufs_init_isolated_loader
  powerpc/perf: Include caps feature for power10 DD1 version
  powerpc: add support for syscall stack randomization
  powerpc: Move system_call_exception() to syscall.c
  powerpc/powernv: rename remaining rng powernv_ functions to pnv_
  powerpc/powernv/kvm: Use darn for H_RANDOM on Power9
  powerpc/powernv: Avoid crashing if rng is NULL
  selftests/powerpc: Fix matrix multiply assist test
  powerpc/signal: Update comment for clarity
  powerpc: make facility_unavailable_exception 64s
  powerpc/platforms/83xx/suspend: Remove write-only global variable
  powerpc/platforms/83xx/suspend: Prevent unloading the driver
  powerpc/platforms/83xx/suspend: Reorder to get rid of a forward declaration
  ...
2022-08-06 16:38:17 -07:00
Linus Torvalds
c993e07be0 dma-mapping updates
- convert arm32 to the common dma-direct code (Arnd Bergmann, Robin Murphy,
    Christoph Hellwig)
  - restructure the PCIe peer to peer mapping support (Logan Gunthorpe)
  - allow the IOMMU code to communicate an optional DMA mapping length
    and use that in scsi and libata (John Garry)
  - split the global swiotlb lock (Tianyu Lan)
  - various fixes and cleanup (Chao Gao, Dan Carpenter, Dongli Zhang,
    Lukas Bulwahn, Robin Murphy)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmLuIYULHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYPS5A//Ty1ZNyXExmwZ6J6g7/oIvQlpAHilDr22mCd8tR8Y
 Ne7TgLa/X+usFvJTxJfkvg/LNMDjD7qx0J/mhDGm4reOFcEL4/PBy0rDSOgnmntV
 k/fPhgwnpuztiAQ+s+WkJ3pkrmG1HaEId7GGj2JaoYdas6RX2mGX7vL8uvUFepjw
 lYPAqWMtJHkOfsDK0PqqyQsr7dcC6lyFLqnn/wqvHtTJeKCfGs6W/SIrlWme2SZY
 3dNx84ZR1uPjaazAmtf2IWfjh/TBmd0ETRYycgUUKRP9iwsCkBQDBwsBGSIYXiWj
 BUKQ5oMvjAlUGRF0jYz9e77KuedE6GxWiXNQstitBmid142M37DHA5tvZRf65MPS
 THHcjTDmmoaO4YfFhhXOcFOrjG4/V8bF7fgHB6XkHDjhVVTcnIx8zuOAXIVBZvIV
 VAALmamBqEfIZZrCqgr7hzFssK2bip+TIMkdoD46Wcr+D7bAlujhuzWxubn9+ulT
 23v/pAvC80ut6LvKj6EA+GpRm/pejfOtEbjXPoO2hguNxvuUKvPQqNh9hy0q+v1e
 8n2Y/4lhy5bv02S7wKooNkfCoV753jBY1TIru45UmEYc3EkTQPii6okYe0DvW4QX
 VCnKgo156wSBfE+9eWdxCROv2SZqJFMV/wL3vw54dpJQMbDy7VkNsh4mGREdUkU1
 uek=
 =Bv19
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-5.20-2022-08-06' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping updates from Christoph Hellwig:

 - convert arm32 to the common dma-direct code (Arnd Bergmann, Robin
   Murphy, Christoph Hellwig)

 - restructure the PCIe peer to peer mapping support (Logan Gunthorpe)

 - allow the IOMMU code to communicate an optional DMA mapping length
   and use that in scsi and libata (John Garry)

 - split the global swiotlb lock (Tianyu Lan)

 - various fixes and cleanup (Chao Gao, Dan Carpenter, Dongli Zhang,
   Lukas Bulwahn, Robin Murphy)

* tag 'dma-mapping-5.20-2022-08-06' of git://git.infradead.org/users/hch/dma-mapping: (45 commits)
  swiotlb: fix passing local variable to debugfs_create_ulong()
  dma-mapping: reformat comment to suppress htmldoc warning
  PCI/P2PDMA: Remove pci_p2pdma_[un]map_sg()
  RDMA/rw: drop pci_p2pdma_[un]map_sg()
  RDMA/core: introduce ib_dma_pci_p2p_dma_supported()
  nvme-pci: convert to using dma_map_sgtable()
  nvme-pci: check DMA ops when indicating support for PCI P2PDMA
  iommu/dma: support PCI P2PDMA pages in dma-iommu map_sg
  iommu: Explicitly skip bus address marked segments in __iommu_map_sg()
  dma-mapping: add flags to dma_map_ops to indicate PCI P2PDMA support
  dma-direct: support PCI P2PDMA pages in dma-direct map_sg
  dma-mapping: allow EREMOTEIO return code for P2PDMA transfers
  PCI/P2PDMA: Introduce helpers for dma_map_sg implementations
  PCI/P2PDMA: Attempt to set map_type if it has not been set
  lib/scatterlist: add flag for indicating P2PDMA segments in an SGL
  swiotlb: clean up some coding style and minor issues
  dma-mapping: update comment after dmabounce removal
  scsi: sd: Add a comment about limiting max_sectors to shost optimal limit
  ata: libata-scsi: cap ata_device->max_sectors according to shost->max_sectors
  scsi: scsi_transport_sas: cap shost opt_sectors according to DMA optimal limit
  ...
2022-08-06 10:56:45 -07:00
Linus Torvalds
1d239c1eb8 IOMMU Updates for Linux v5.20/v6.0:
Including:
 
 	- Most intrusive patch is small and changes the default
 	  allocation policy for DMA addresses. Before the change the
 	  allocator tried its best to find an address in the first 4GB.
 	  But that lead to performance problems when that space gets
 	  exhaused, and since most devices are capable of 64-bit DMA
 	  these days, we changed it to search in the full DMA-mask
 	  range from the beginning.  This change has the potential to
 	  uncover bugs elsewhere, in the kernel or the hardware. There
 	  is a Kconfig option and a command line option to restore the
 	  old behavior, but none of them is enabled by default.
 
 	- Add Robin Murphy as reviewer of IOMMU code and maintainer for
 	  the dma-iommu and iova code
 
 	- Chaning IOVA magazine size from 1032 to 1024 bytes to save
 	  memory
 
 	- Some core code cleanups and dead-code removal
 
 	- Support for ACPI IORT RMR node
 
 	- Support for multiple PCI domains in the AMD-Vi driver
 
 	- ARM SMMU changes from Will Deacon:
 
 	  - Add even more Qualcomm device-tree compatible strings
 
 	  - Support dumping of IMP DEF Qualcomm registers on TLB sync
 	    timeout
 
 	  - Fix reference count leak on device tree node in Qualcomm
 	    driver
 
 	- Intel VT-d driver updates from Lu Baolu:
 
 	  - Make intel-iommu.h private
 
 	  - Optimize the use of two locks
 
 	  - Extend the driver to support large-scale platforms
 
 	  - Cleanup some dead code
 
 	- MediaTek IOMMU refactoring and support for TTBR up to 35bit
 
 	- Basic support for Exynos SysMMU v7
 
 	- VirtIO IOMMU driver gets a map/unmap_pages() implementation
 
 	- Other smaller cleanups and fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmLs3DIACgkQK/BELZcB
 GuMizhAAguAnLLOkOLlR9/MhrTZfNXCUX+bfrEIevjFXMw4iPNfCCr4ydQ7EdVK6
 ZA/3Z89huYl0d0x/FELolnQi+HOeqYrfTDe4rB7TgNgwZnWa+fdHcyYkgBGyfPaV
 ilgjNcx8o//9o4NasyB6kU395jVmFxb735gMTTb+tcO9fr+/qIB6hxrHuCklxrNr
 C7wK6kkoDPi5n0QuXCSjXEx2Hk245pAWKPLwqxsUYzHGlLfl7ULOxw65BUBGvn/H
 uCsTfJFu7u+ErwQYf0qPuOwRBnRdsx9g5EAnfab8p074SoKWvbNnftIxgIRp8ZEM
 YgCbhYa1GOFI4r+XzqRzEbc0/vPSttims4Jqz0KxYs7pr5EoVifrWLJFjJdCdc2h
 Tio1gTvOq8HbH63kwYNKJhg4iSC6zVd37ihEhvfFO6LcgFl4iCfd2o9zK7oY40J4
 XoOxofVnJ2e3tzdhZ/n5quCXiudHixm6WuVa7QYKscF7Ud0tY1wWKuibdlMQTeNM
 68MvtlteKcfs1BrWzZyrFMrFeAfIY8LI82y6jdJuoNMU5LE9+5yelXBdJhnVygZ+
 Jglv1TIt6W/z1H5JgXtNVZ1wWgBm7rurOqNyfN8XCd8eP1z321CLfX8ujkhKrIWP
 ApG15cwvpnh1JX630+UFiEikTGU0fb2orMdPwYmwuu8DAsoLVHE=
 =hI2K
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v5.20-or-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu updates from Joerg Roedel:

 - The most intrusive patch is small and changes the default allocation
   policy for DMA addresses.

   Before the change the allocator tried its best to find an address in
   the first 4GB. But that lead to performance problems when that space
   gets exhaused, and since most devices are capable of 64-bit DMA these
   days, we changed it to search in the full DMA-mask range from the
   beginning.

   This change has the potential to uncover bugs elsewhere, in the
   kernel or the hardware. There is a Kconfig option and a command line
   option to restore the old behavior, but none of them is enabled by
   default.

 - Add Robin Murphy as reviewer of IOMMU code and maintainer for the
   dma-iommu and iova code

 - Chaning IOVA magazine size from 1032 to 1024 bytes to save memory

 - Some core code cleanups and dead-code removal

 - Support for ACPI IORT RMR node

 - Support for multiple PCI domains in the AMD-Vi driver

 - ARM SMMU changes from Will Deacon:
      - Add even more Qualcomm device-tree compatible strings
      - Support dumping of IMP DEF Qualcomm registers on TLB sync
        timeout
      - Fix reference count leak on device tree node in Qualcomm driver

 - Intel VT-d driver updates from Lu Baolu:
      - Make intel-iommu.h private
      - Optimize the use of two locks
      - Extend the driver to support large-scale platforms
      - Cleanup some dead code

 - MediaTek IOMMU refactoring and support for TTBR up to 35bit

 - Basic support for Exynos SysMMU v7

 - VirtIO IOMMU driver gets a map/unmap_pages() implementation

 - Other smaller cleanups and fixes

* tag 'iommu-updates-v5.20-or-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (116 commits)
  iommu/amd: Fix compile warning in init code
  iommu/amd: Add support for AVIC when SNP is enabled
  iommu/amd: Simplify and Consolidate Virtual APIC (AVIC) Enablement
  ACPI/IORT: Fix build error implicit-function-declaration
  drivers: iommu: fix clang -wformat warning
  iommu/arm-smmu: qcom_iommu: Add of_node_put() when breaking out of loop
  iommu/arm-smmu-qcom: Add SM6375 SMMU compatible
  dt-bindings: arm-smmu: Add compatible for Qualcomm SM6375
  MAINTAINERS: Add Robin Murphy as IOMMU SUBSYTEM reviewer
  iommu/amd: Do not support IOMMUv2 APIs when SNP is enabled
  iommu/amd: Do not support IOMMU_DOMAIN_IDENTITY after SNP is enabled
  iommu/amd: Set translation valid bit only when IO page tables are in use
  iommu/amd: Introduce function to check and enable SNP
  iommu/amd: Globally detect SNP support
  iommu/amd: Process all IVHDs before enabling IOMMU features
  iommu/amd: Introduce global variable for storing common EFR and EFR2
  iommu/amd: Introduce Support for Extended Feature 2 Register
  iommu/amd: Change macro for IOMMU control register bit shift to decimal value
  iommu/exynos: Enable default VM instance on SysMMU v7
  iommu/exynos: Add SysMMU v7 register set
  ...
2022-08-06 10:42:38 -07:00
Linus Torvalds
6614a3c316 - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe
Lin, Yang Shi, Anshuman Khandual and Mike Rapoport
 
 - Some kmemleak fixes from Patrick Wang and Waiman Long
 
 - DAMON updates from SeongJae Park
 
 - memcg debug/visibility work from Roman Gushchin
 
 - vmalloc speedup from Uladzislau Rezki
 
 - more folio conversion work from Matthew Wilcox
 
 - enhancements for coherent device memory mapping from Alex Sierra
 
 - addition of shared pages tracking and CoW support for fsdax, from
   Shiyang Ruan
 
 - hugetlb optimizations from Mike Kravetz
 
 - Mel Gorman has contributed some pagealloc changes to improve latency
   and realtime behaviour.
 
 - mprotect soft-dirty checking has been improved by Peter Xu
 
 - Many other singleton patches all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYuravgAKCRDdBJ7gKXxA
 jpqSAQDrXSdII+ht9kSHlaCVYjqRFQz/rRvURQrWQV74f6aeiAD+NHHeDPwZn11/
 SPktqEUrF1pxnGQxqLh1kUFUhsVZQgE=
 =w/UH
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
 "Most of the MM queue. A few things are still pending.

  Liam's maple tree rework didn't make it. This has resulted in a few
  other minor patch series being held over for next time.

  Multi-gen LRU still isn't merged as we were waiting for mapletree to
  stabilize. The current plan is to merge MGLRU into -mm soon and to
  later reintroduce mapletree, with a view to hopefully getting both
  into 6.1-rc1.

  Summary:

   - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe
     Lin, Yang Shi, Anshuman Khandual and Mike Rapoport

   - Some kmemleak fixes from Patrick Wang and Waiman Long

   - DAMON updates from SeongJae Park

   - memcg debug/visibility work from Roman Gushchin

   - vmalloc speedup from Uladzislau Rezki

   - more folio conversion work from Matthew Wilcox

   - enhancements for coherent device memory mapping from Alex Sierra

   - addition of shared pages tracking and CoW support for fsdax, from
     Shiyang Ruan

   - hugetlb optimizations from Mike Kravetz

   - Mel Gorman has contributed some pagealloc changes to improve
     latency and realtime behaviour.

   - mprotect soft-dirty checking has been improved by Peter Xu

   - Many other singleton patches all over the place"

 [ XFS merge from hell as per Darrick Wong in

   https://lore.kernel.org/all/YshKnxb4VwXycPO8@magnolia/ ]

* tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (282 commits)
  tools/testing/selftests/vm/hmm-tests.c: fix build
  mm: Kconfig: fix typo
  mm: memory-failure: convert to pr_fmt()
  mm: use is_zone_movable_page() helper
  hugetlbfs: fix inaccurate comment in hugetlbfs_statfs()
  hugetlbfs: cleanup some comments in inode.c
  hugetlbfs: remove unneeded header file
  hugetlbfs: remove unneeded hugetlbfs_ops forward declaration
  hugetlbfs: use helper macro SZ_1{K,M}
  mm: cleanup is_highmem()
  mm/hmm: add a test for cross device private faults
  selftests: add soft-dirty into run_vmtests.sh
  selftests: soft-dirty: add test for mprotect
  mm/mprotect: fix soft-dirty check in can_change_pte_writable()
  mm: memcontrol: fix potential oom_lock recursion deadlock
  mm/gup.c: fix formatting in check_and_migrate_movable_page()
  xfs: fail dax mount if reflink is enabled on a partition
  mm/memcontrol.c: remove the redundant updating of stats_flush_threshold
  userfaultfd: don't fail on unrecognized features
  hugetlb_cgroup: fix wrong hugetlb cgroup numa stat
  ...
2022-08-05 16:32:45 -07:00
Linus Torvalds
7c5c3a6177 ARM:
* Unwinder implementations for both nVHE modes (classic and
   protected), complete with an overflow stack
 
 * Rework of the sysreg access from userspace, with a complete
   rewrite of the vgic-v3 view to allign with the rest of the
   infrastructure
 
 * Disagregation of the vcpu flags in separate sets to better track
   their use model.
 
 * A fix for the GICv2-on-v3 selftest
 
 * A small set of cosmetic fixes
 
 RISC-V:
 
 * Track ISA extensions used by Guest using bitmap
 
 * Added system instruction emulation framework
 
 * Added CSR emulation framework
 
 * Added gfp_custom flag in struct kvm_mmu_memory_cache
 
 * Added G-stage ioremap() and iounmap() functions
 
 * Added support for Svpbmt inside Guest
 
 s390:
 
 * add an interface to provide a hypervisor dump for secure guests
 
 * improve selftests to use TAP interface
 
 * enable interpretive execution of zPCI instructions (for PCI passthrough)
 
 * First part of deferred teardown
 
 * CPU Topology
 
 * PV attestation
 
 * Minor fixes
 
 x86:
 
 * Permit guests to ignore single-bit ECC errors
 
 * Intel IPI virtualization
 
 * Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS
 
 * PEBS virtualization
 
 * Simplify PMU emulation by just using PERF_TYPE_RAW events
 
 * More accurate event reinjection on SVM (avoid retrying instructions)
 
 * Allow getting/setting the state of the speaker port data bit
 
 * Refuse starting the kvm-intel module if VM-Entry/VM-Exit controls are inconsistent
 
 * "Notify" VM exit (detect microarchitectural hangs) for Intel
 
 * Use try_cmpxchg64 instead of cmpxchg64
 
 * Ignore benign host accesses to PMU MSRs when PMU is disabled
 
 * Allow disabling KVM's "MONITOR/MWAIT are NOPs!" behavior
 
 * Allow NX huge page mitigation to be disabled on a per-vm basis
 
 * Port eager page splitting to shadow MMU as well
 
 * Enable CMCI capability by default and handle injected UCNA errors
 
 * Expose pid of vcpu threads in debugfs
 
 * x2AVIC support for AMD
 
 * cleanup PIO emulation
 
 * Fixes for LLDT/LTR emulation
 
 * Don't require refcounted "struct page" to create huge SPTEs
 
 * Miscellaneous cleanups:
 ** MCE MSR emulation
 ** Use separate namespaces for guest PTEs and shadow PTEs bitmasks
 ** PIO emulation
 ** Reorganize rmap API, mostly around rmap destruction
 ** Do not workaround very old KVM bugs for L0 that runs with nesting enabled
 ** new selftests API for CPUID
 
 Generic:
 
 * Fix races in gfn->pfn cache refresh; do not pin pages tracked by the cache
 
 * new selftests API using struct kvm_vcpu instead of a (vm, id) tuple
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmLnyo4UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMtQQf/XjVWiRcWLPR9dqzRM/vvRXpiG+UL
 jU93R7m6ma99aqTtrxV/AE+kHgamBlma3Cwo+AcWk9uCVNbIhFjv2YKg6HptKU0e
 oJT3zRYp+XIjEo7Kfw+TwroZbTlG6gN83l1oBLFMqiFmHsMLnXSI2mm8MXyi3dNB
 vR2uIcTAl58KIprqNNsYJ2dNn74ogOMiXYx9XzoA9/5Xb6c0h4rreHJa5t+0s9RO
 Gz7Io3PxumgsbJngjyL1Ve5oxhlIAcZA8DU0PQmjxo3eS+k6BcmavGFd45gNL5zg
 iLpCh4k86spmzh8CWkAAwWPQE4dZknK6jTctJc0OFVad3Z7+X7n0E8TFrA==
 =PM8o
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "Quite a large pull request due to a selftest API overhaul and some
  patches that had come in too late for 5.19.

  ARM:

   - Unwinder implementations for both nVHE modes (classic and
     protected), complete with an overflow stack

   - Rework of the sysreg access from userspace, with a complete rewrite
     of the vgic-v3 view to allign with the rest of the infrastructure

   - Disagregation of the vcpu flags in separate sets to better track
     their use model.

   - A fix for the GICv2-on-v3 selftest

   - A small set of cosmetic fixes

  RISC-V:

   - Track ISA extensions used by Guest using bitmap

   - Added system instruction emulation framework

   - Added CSR emulation framework

   - Added gfp_custom flag in struct kvm_mmu_memory_cache

   - Added G-stage ioremap() and iounmap() functions

   - Added support for Svpbmt inside Guest

  s390:

   - add an interface to provide a hypervisor dump for secure guests

   - improve selftests to use TAP interface

   - enable interpretive execution of zPCI instructions (for PCI
     passthrough)

   - First part of deferred teardown

   - CPU Topology

   - PV attestation

   - Minor fixes

  x86:

   - Permit guests to ignore single-bit ECC errors

   - Intel IPI virtualization

   - Allow getting/setting pending triple fault with
     KVM_GET/SET_VCPU_EVENTS

   - PEBS virtualization

   - Simplify PMU emulation by just using PERF_TYPE_RAW events

   - More accurate event reinjection on SVM (avoid retrying
     instructions)

   - Allow getting/setting the state of the speaker port data bit

   - Refuse starting the kvm-intel module if VM-Entry/VM-Exit controls
     are inconsistent

   - "Notify" VM exit (detect microarchitectural hangs) for Intel

   - Use try_cmpxchg64 instead of cmpxchg64

   - Ignore benign host accesses to PMU MSRs when PMU is disabled

   - Allow disabling KVM's "MONITOR/MWAIT are NOPs!" behavior

   - Allow NX huge page mitigation to be disabled on a per-vm basis

   - Port eager page splitting to shadow MMU as well

   - Enable CMCI capability by default and handle injected UCNA errors

   - Expose pid of vcpu threads in debugfs

   - x2AVIC support for AMD

   - cleanup PIO emulation

   - Fixes for LLDT/LTR emulation

   - Don't require refcounted "struct page" to create huge SPTEs

   - Miscellaneous cleanups:
      - MCE MSR emulation
      - Use separate namespaces for guest PTEs and shadow PTEs bitmasks
      - PIO emulation
      - Reorganize rmap API, mostly around rmap destruction
      - Do not workaround very old KVM bugs for L0 that runs with nesting enabled
      - new selftests API for CPUID

  Generic:

   - Fix races in gfn->pfn cache refresh; do not pin pages tracked by
     the cache

   - new selftests API using struct kvm_vcpu instead of a (vm, id)
     tuple"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (606 commits)
  selftests: kvm: set rax before vmcall
  selftests: KVM: Add exponent check for boolean stats
  selftests: KVM: Provide descriptive assertions in kvm_binary_stats_test
  selftests: KVM: Check stat name before other fields
  KVM: x86/mmu: remove unused variable
  RISC-V: KVM: Add support for Svpbmt inside Guest/VM
  RISC-V: KVM: Use PAGE_KERNEL_IO in kvm_riscv_gstage_ioremap()
  RISC-V: KVM: Add G-stage ioremap() and iounmap() functions
  KVM: Add gfp_custom flag in struct kvm_mmu_memory_cache
  RISC-V: KVM: Add extensible CSR emulation framework
  RISC-V: KVM: Add extensible system instruction emulation framework
  RISC-V: KVM: Factor-out instruction emulation into separate sources
  RISC-V: KVM: move preempt_disable() call in kvm_arch_vcpu_ioctl_run
  RISC-V: KVM: Make kvm_riscv_guest_timer_init a void function
  RISC-V: KVM: Fix variable spelling mistake
  RISC-V: KVM: Improve ISA extension by using a bitmap
  KVM, x86/mmu: Fix the comment around kvm_tdp_mmu_zap_leafs()
  KVM: SVM: Dump Virtual Machine Save Area (VMSA) to klog
  KVM: x86/mmu: Treat NX as a valid SPTE bit for NPT
  KVM: x86: Do not block APIC write for non ICR registers
  ...
2022-08-04 14:59:54 -07:00
Linus Torvalds
12b68040a5 media updates for v5.20-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+QmuaPwR3wnBdVwACF8+vY7k4RUFAmLpNF4ACgkQCF8+vY7k
 4RX6eQ//cYDxEbyeCbjfQH7ePDYCK8vW5EbjqBIcdFWV8rv4QdSwQbM7q4ySnxd/
 nOWulm1lgpGrV3bPXE1nQ+RH7T2dQDeJobWcyZd2F2gGlnf515GYIOOnAGWnmM2Z
 qgDhYKVVf3cleMS44n3T2kRVIFO39MmheuucboR8PCBPaT4BoRVIEee0xZIrbsBx
 /ggFXSM/ywA456g09cQbhGLz48YBGshwvbN9vhrdMNcJ74cPQ+sBPaT6lRwxEBYT
 soaDwohH7jLGqIHxccd9qljss31q1glQrK/Szwr/kXe2d4Es0L1ctDK3oP0B9Bcv
 9P6xgXIbMM3cVK57ITXsTKfBFDrxLILy+8bMHWGgRTuZVjoiybRvoDKOKqckyTfu
 CVsHgxkasSPr/M+WoRUX2zFTOB0Q4CTTS6A6Na+sZUGgX5dBe7Y0VDRCzskL3Q+8
 KvlAzA6sF1qyI/Lm4+CprodJ8Ae4qbe0HzkWkaHzsSJfRsazzvzX391T4wCFpBOC
 QW3Ac7r27ZVqrIraDxGuOrEf/NRLTLpDskjnAdSERlrxtU6oQqwgRajBXU46cEOx
 GGV/fgwAbFXGHjN9tmRAgsJZlKBTd1wLGKWU3YUG5Emwo7ycHoLDj8+NilxyzC26
 rnzK5oOaxBm9rnjoxhsz6HWN36mlY0h3bMwS8byclv6USF+ZQZk=
 =AUYy
 -----END PGP SIGNATURE-----

Merge tag 'media/v5.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media updates from Mauro Carvalho Chehab:

 - New driver for Semi AR0521 sensor

 - rkisp1 CSI code was split into a separate file

 - sun6i has gained support for the A31 MIPI CSI-2 controller

 - sun8i has gained support for the A83T MIPI CSI-2 controller

 - vimc driver got support for virtual lens

 - HEVC uAPI has gained its final form and got added to public headers

 - hantro and cedrus got updates on H-265 support

 - stkwebcam was promoted from staging

 - atomisp staging driver got cleanups on its hmm and kmap related logic

 - ov5640 gained support for more modes and got some rework

 - imx7-media-csi staging driver got several improvements related to mc
   API support

 - uvcvideo now handles better power line control

 - mediatec vcodec gained support for new hardware and got some codec
   updates

 - Lots of other bug fixes, improvements and cleanups.

* tag 'media/v5.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (446 commits)
  media: hantro: Remove dedicated control documentation
  hantro: Remove incorrect HEVC SPS validation
  media: cedrus: hevc: Add check for invalid timestamp
  media: sunxi: sun6i_mipi_csi2.c/sun8i_a83t_mipi_csi2.c: clarify error handling
  media: uvcvideo: Fix invalid pointer in uvc_ctrl_init_ctrl()
  media: Documentation: mc-core: Fix typo
  media: videodev2.h.rst.exceptions: add missing exceptions
  media: vimc: wrong pointer is used with PTR_ERR
  media: rkisp1: debug: Add dump file in debugfs for MI main path registers
  media: rkisp1: Make the internal CSI-2 receiver optional
  media: rkisp1: Add infrastructure to support ISP features
  media: rkisp1: Support the ISP parallel input
  media: dt-bindings: media: rkisp1: Add port for parallel interface
  media: rkisp1: Use fwnode_graph_for_each_endpoint
  media: rkisp1: csi: Plumb the CSI RX subdev
  media: rkisp1: csi: Implement a V4L2 subdev for the CSI receiver
  media: rkisp1: isp: Disallow multiple active sources
  media: rkisp1: isp: Rename rkisp1_get_remote_source()
  media: rkisp1: isp: Constify various local variables
  media: rkisp1: isp: Fix whitespace issues
  ...
2022-08-03 19:29:28 -07:00
Linus Torvalds
f86d1fbbe7 Networking changes for 6.0.
Core
 ----
 
  - Refactor the forward memory allocation to better cope with memory
    pressure with many open sockets, moving from a per socket cache to
    a per-CPU one
 
  - Replace rwlocks with RCU for better fairness in ping, raw sockets
    and IP multicast router.
 
  - Network-side support for IO uring zero-copy send.
 
  - A few skb drop reason improvements, including codegen the source file
    with string mapping instead of using macro magic.
 
  - Rename reference tracking helpers to a more consistent
    netdev_* schema.
 
  - Adapt u64_stats_t type to address load/store tearing issues.
 
  - Refine debug helper usage to reduce the log noise caused by bots.
 
 BPF
 ---
  - Improve socket map performance, avoiding skb cloning on read
    operation.
 
  - Add support for 64 bits enum, to match types exposed by kernel.
 
  - Introduce support for sleepable uprobes program.
 
  - Introduce support for enum textual representation in libbpf.
 
  - New helpers to implement synproxy with eBPF/XDP.
 
  - Improve loop performances, inlining indirect calls when
    possible.
 
  - Removed all the deprecated libbpf APIs.
 
  - Implement new eBPF-based LSM flavor.
 
  - Add type match support, which allow accurate queries to the
    eBPF used types.
 
  - A few TCP congetsion control framework usability improvements.
 
  - Add new infrastructure to manipulate CT entries via eBPF programs.
 
  - Allow for livepatch (KLP) and BPF trampolines to attach to the same
    kernel function.
 
 Protocols
 ---------
 
  - Introduce per network namespace lookup tables for unix sockets,
    increasing scalability and reducing contention.
 
  - Preparation work for Wi-Fi 7 Multi-Link Operation (MLO) support.
 
  - Add support to forciby close TIME_WAIT TCP sockets via user-space
    tools.
 
  - Significant performance improvement for the TLS 1.3 receive path,
    both for zero-copy and not-zero-copy.
 
  - Support for changing the initial MTPCP subflow priority/backup
    status
 
  - Introduce virtually contingus buffers for sockets over RDMA,
    to cope better with memory pressure.
 
  - Extend CAN ethtool support with timestamping capabilities
 
  - Refactor CAN build infrastructure to allow building only the needed
    features.
 
 Driver API
 ----------
 
  - Remove devlink mutex to allow parallel commands on multiple links.
 
  - Add support for pause stats in distributed switch.
 
  - Implement devlink helpers to query and flash line cards.
 
  - New helper for phy mode to register conversion.
 
 New hardware / drivers
 ----------------------
 
  - Ethernet DSA driver for the rockchip mt7531 on BPI-R2 Pro.
 
  - Ethernet DSA driver for the Renesas RZ/N1 A5PSW switch.
 
  - Ethernet DSA driver for the Microchip LAN937x switch.
 
  - Ethernet PHY driver for the Aquantia AQR113C EPHY.
 
  - CAN driver for the OBD-II ELM327 interface.
 
  - CAN driver for RZ/N1 SJA1000 CAN controller.
 
  - Bluetooth: Infineon CYW55572 Wi-Fi plus Bluetooth combo device.
 
 Drivers
 -------
 
  - Intel Ethernet NICs:
    - i40e: add support for vlan pruning
    - i40e: add support for XDP framented packets
    - ice: improved vlan offload support
    - ice: add support for PPPoE offload
 
  - Mellanox Ethernet (mlx5)
    - refactor packet steering offload for performance and scalability
    - extend support for TC offload
    - refactor devlink code to clean-up the locking schema
    - support stacked vlans for bridge offloads
    - use TLS objects pool to improve connection rate
 
  - Netronome Ethernet NICs (nfp):
    - extend support for IPv6 fields mangling offload
    - add support for vepa mode in HW bridge
    - better support for virtio data path acceleration (VDPA)
    - enable TSO by default
 
  - Microsoft vNIC driver (mana)
    - add support for XDP redirect
 
  - Others Ethernet drivers:
    - bonding: add per-port priority support
    - microchip lan743x: extend phy support
    - Fungible funeth: support UDP segmentation offload and XDP xmit
    - Solarflare EF100: add support for virtual function representors
    - MediaTek SoC: add XDP support
 
  - Mellanox Ethernet/IB switch (mlxsw):
    - dropped support for unreleased H/W (XM router).
    - improved stats accuracy
    - unified bridge model coversion improving scalability
      (parts 1-6)
    - support for PTP in Spectrum-2 asics
 
  - Broadcom PHYs
    - add PTP support for BCM54210E
    - add support for the BCM53128 internal PHY
 
  - Marvell Ethernet switches (prestera):
    - implement support for multicast forwarding offload
 
  - Embedded Ethernet switches:
    - refactor OcteonTx MAC filter for better scalability
    - improve TC H/W offload for the Felix driver
    - refactor the Microchip ksz8 and ksz9477 drivers to share
      the probe code (parts 1, 2), add support for phylink
      mac configuration
 
  - Other WiFi:
    - Microchip wilc1000: diable WEP support and enable WPA3
    - Atheros ath10k: encapsulation offload support
 
 Old code removal:
 
  - Neterion vxge ethernet driver: this is untouched since more than
    10 years.
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmLqN+oSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkB9kQAI9VqW0c3SfiTJnkVBEIovZ6Tnh5stD2
 UYFkh1BdchLsYxi7W4XMpVPSzRztiTP87mIx5c/KvIzj+QNeWL1XWRJSPdI9HhTD
 pTAA/tM2OG7bqrbyQiKDNfpQdNl7+kk1RwnYd+f9RFl1QVuIJaYhmjVwrsN5xF/+
 jUsotpROarM2dGFWiFwJbKhP2zMDT+6qEEahM8pEPggKhv8wRLYjany2cZVEe4e0
 WGUpbINAS8gEKm0Ob922WaDfDrcK/N1Z0jNz/kMaENkK18Vvc7F6bCO0DzAawKX9
 QZMMwm6mHp3EThflJAMAzCGIYiIcwLhykgdyj8rrjPhFrWbMD2Sdsbo21HOXU/8j
 u4aAhVl+d+h7emmbgBoJ8sycVJ7BQlXz7lX20sTgADv9xI4/dPhQ17CMRuwX6fXX
 JSrn6P6e1LTV5CEg6vrlSPnKPY6uhFn/cPw47FxCjRwJ9phVnp+8uZWQmf9Pz3yf
 Ok/tcj+juFbsmuOshHy2cbRkuNZNS0oRWlSTBo5795ZwOLSakMonR3L+ev2aOvzz
 DVrFp2Y/iIVwMSFdCbouYdYnhArPRhOAtCmZc2afY8aBN7aaMgrdTy3+mzUoHy3I
 FG3K+VuKpfi0vY4zn6ZoLZDIpyXIoJJ93RcSGltD32t3Dp1RaQMVEI4s45k05PVm
 1nYpXKHA8qML
 =hxEG
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking changes from Paolo Abeni:
 "Core:

   - Refactor the forward memory allocation to better cope with memory
     pressure with many open sockets, moving from a per socket cache to
     a per-CPU one

   - Replace rwlocks with RCU for better fairness in ping, raw sockets
     and IP multicast router.

   - Network-side support for IO uring zero-copy send.

   - A few skb drop reason improvements, including codegen the source
     file with string mapping instead of using macro magic.

   - Rename reference tracking helpers to a more consistent netdev_*
     schema.

   - Adapt u64_stats_t type to address load/store tearing issues.

   - Refine debug helper usage to reduce the log noise caused by bots.

  BPF:

   - Improve socket map performance, avoiding skb cloning on read
     operation.

   - Add support for 64 bits enum, to match types exposed by kernel.

   - Introduce support for sleepable uprobes program.

   - Introduce support for enum textual representation in libbpf.

   - New helpers to implement synproxy with eBPF/XDP.

   - Improve loop performances, inlining indirect calls when possible.

   - Removed all the deprecated libbpf APIs.

   - Implement new eBPF-based LSM flavor.

   - Add type match support, which allow accurate queries to the eBPF
     used types.

   - A few TCP congetsion control framework usability improvements.

   - Add new infrastructure to manipulate CT entries via eBPF programs.

   - Allow for livepatch (KLP) and BPF trampolines to attach to the same
     kernel function.

  Protocols:

   - Introduce per network namespace lookup tables for unix sockets,
     increasing scalability and reducing contention.

   - Preparation work for Wi-Fi 7 Multi-Link Operation (MLO) support.

   - Add support to forciby close TIME_WAIT TCP sockets via user-space
     tools.

   - Significant performance improvement for the TLS 1.3 receive path,
     both for zero-copy and not-zero-copy.

   - Support for changing the initial MTPCP subflow priority/backup
     status

   - Introduce virtually contingus buffers for sockets over RDMA, to
     cope better with memory pressure.

   - Extend CAN ethtool support with timestamping capabilities

   - Refactor CAN build infrastructure to allow building only the needed
     features.

  Driver API:

   - Remove devlink mutex to allow parallel commands on multiple links.

   - Add support for pause stats in distributed switch.

   - Implement devlink helpers to query and flash line cards.

   - New helper for phy mode to register conversion.

  New hardware / drivers:

   - Ethernet DSA driver for the rockchip mt7531 on BPI-R2 Pro.

   - Ethernet DSA driver for the Renesas RZ/N1 A5PSW switch.

   - Ethernet DSA driver for the Microchip LAN937x switch.

   - Ethernet PHY driver for the Aquantia AQR113C EPHY.

   - CAN driver for the OBD-II ELM327 interface.

   - CAN driver for RZ/N1 SJA1000 CAN controller.

   - Bluetooth: Infineon CYW55572 Wi-Fi plus Bluetooth combo device.

  Drivers:

   - Intel Ethernet NICs:
      - i40e: add support for vlan pruning
      - i40e: add support for XDP framented packets
      - ice: improved vlan offload support
      - ice: add support for PPPoE offload

   - Mellanox Ethernet (mlx5)
      - refactor packet steering offload for performance and scalability
      - extend support for TC offload
      - refactor devlink code to clean-up the locking schema
      - support stacked vlans for bridge offloads
      - use TLS objects pool to improve connection rate

   - Netronome Ethernet NICs (nfp):
      - extend support for IPv6 fields mangling offload
      - add support for vepa mode in HW bridge
      - better support for virtio data path acceleration (VDPA)
      - enable TSO by default

   - Microsoft vNIC driver (mana)
      - add support for XDP redirect

   - Others Ethernet drivers:
      - bonding: add per-port priority support
      - microchip lan743x: extend phy support
      - Fungible funeth: support UDP segmentation offload and XDP xmit
      - Solarflare EF100: add support for virtual function representors
      - MediaTek SoC: add XDP support

   - Mellanox Ethernet/IB switch (mlxsw):
      - dropped support for unreleased H/W (XM router).
      - improved stats accuracy
      - unified bridge model coversion improving scalability (parts 1-6)
      - support for PTP in Spectrum-2 asics

   - Broadcom PHYs
      - add PTP support for BCM54210E
      - add support for the BCM53128 internal PHY

   - Marvell Ethernet switches (prestera):
      - implement support for multicast forwarding offload

   - Embedded Ethernet switches:
      - refactor OcteonTx MAC filter for better scalability
      - improve TC H/W offload for the Felix driver
      - refactor the Microchip ksz8 and ksz9477 drivers to share the
        probe code (parts 1, 2), add support for phylink mac
        configuration

   - Other WiFi:
      - Microchip wilc1000: diable WEP support and enable WPA3
      - Atheros ath10k: encapsulation offload support

  Old code removal:

   - Neterion vxge ethernet driver: this is untouched since more than 10 years"

* tag 'net-next-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1890 commits)
  doc: sfp-phylink: Fix a broken reference
  wireguard: selftests: support UML
  wireguard: allowedips: don't corrupt stack when detecting overflow
  wireguard: selftests: update config fragments
  wireguard: ratelimiter: use hrtimer in selftest
  net/mlx5e: xsk: Discard unaligned XSK frames on striding RQ
  net: usb: ax88179_178a: Bind only to vendor-specific interface
  selftests: net: fix IOAM test skip return code
  net: usb: make USB_RTL8153_ECM non user configurable
  net: marvell: prestera: remove reduntant code
  octeontx2-pf: Reduce minimum mtu size to 60
  net: devlink: Fix missing mutex_unlock() call
  net/tls: Remove redundant workqueue flush before destroy
  net: txgbe: Fix an error handling path in txgbe_probe()
  net: dsa: Fix spelling mistakes and cleanup code
  Documentation: devlink: add add devlink-selftests to the table of contents
  dccp: put dccp_qpolicy_full() and dccp_qpolicy_push() in the same lock
  net: ionic: fix error check for vlan flags in ionic_set_nic_features()
  net: ice: fix error NETIF_F_HW_VLAN_CTAG_FILTER check in ice_vsi_sync_fltr()
  nfp: flower: add support for tunnel offload without key ID
  ...
2022-08-03 16:29:08 -07:00
Linus Torvalds
f00654007f Folio changes for 6.0
- Fix an accounting bug that made NR_FILE_DIRTY grow without limit
    when running xfstests
 
  - Convert more of mpage to use folios
 
  - Remove add_to_page_cache() and add_to_page_cache_locked()
 
  - Convert find_get_pages_range() to filemap_get_folios()
 
  - Improvements to the read_cache_page() family of functions
 
  - Remove a few unnecessary checks of PageError
 
  - Some straightforward filesystem conversions to use folios
 
  - Split PageMovable users out from address_space_operations into their
    own movable_operations
 
  - Convert aops->migratepage to aops->migrate_folio
 
  - Remove nobh support (Christoph Hellwig)
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmLpViQACgkQDpNsjXcp
 gj5pBgf/f3+K7Hi3qw7aYQCYJQ7IA/bLyE/DLWI59kuiao6wDSve40B9YH9X++Ha
 mRLp55bkQS+bwS2xa4jlqrIDJzAfNoWlXaXZHUXGL1C/52ChTF6jaH2cvO9PVlDS
 7fLv1hy2LwiIdzpKJkUW7T+kcQGj3QLKqtQ4x8zD0LGMg055yvt/qndHSUi41nWT
 /58+6W8Sk4vvRgkpeChFzF1lGLy00+FGT8y5V2kM9uRliFQ7XPCwqB2a3e5jbW6z
 C1NXQmRnopCrnOT1TFIhK3DyX6MDIWV5qcikNAmCKFb9fQFPmjDLPt9iSoMGjw2M
 Z+UVhJCaU3ISccd0DG5Ra/vzs9/O9Q==
 =DgUi
 -----END PGP SIGNATURE-----

Merge tag 'folio-6.0' of git://git.infradead.org/users/willy/pagecache

Pull folio updates from Matthew Wilcox:

 - Fix an accounting bug that made NR_FILE_DIRTY grow without limit
   when running xfstests

 - Convert more of mpage to use folios

 - Remove add_to_page_cache() and add_to_page_cache_locked()

 - Convert find_get_pages_range() to filemap_get_folios()

 - Improvements to the read_cache_page() family of functions

 - Remove a few unnecessary checks of PageError

 - Some straightforward filesystem conversions to use folios

 - Split PageMovable users out from address_space_operations into
   their own movable_operations

 - Convert aops->migratepage to aops->migrate_folio

 - Remove nobh support (Christoph Hellwig)

* tag 'folio-6.0' of git://git.infradead.org/users/willy/pagecache: (78 commits)
  fs: remove the NULL get_block case in mpage_writepages
  fs: don't call ->writepage from __mpage_writepage
  fs: remove the nobh helpers
  jfs: stop using the nobh helper
  ext2: remove nobh support
  ntfs3: refactor ntfs_writepages
  mm/folio-compat: Remove migration compatibility functions
  fs: Remove aops->migratepage()
  secretmem: Convert to migrate_folio
  hugetlb: Convert to migrate_folio
  aio: Convert to migrate_folio
  f2fs: Convert to filemap_migrate_folio()
  ubifs: Convert to filemap_migrate_folio()
  btrfs: Convert btrfs_migratepage to migrate_folio
  mm/migrate: Add filemap_migrate_folio()
  mm/migrate: Convert migrate_page() to migrate_folio()
  nfs: Convert to migrate_folio
  btrfs: Convert btree_migratepage to migrate_folio
  mm/migrate: Convert expected_page_refs() to folio_expected_refs()
  mm/migrate: Convert buffer_migrate_page() to buffer_migrate_folio()
  ...
2022-08-03 10:35:43 -07:00
Linus Torvalds
b6bb70f9ab Several core optimizations:
* threadgroup_rwsem write locking is skipped when configuring controllers in
   empty subtrees. Combined with CLONE_INTO_CGROUP, this allows the common
   static usage pattern to not grab threadgroup_rwsem at all (glibc still
   doesn't seem ready for CLONE_INTO_CGROUP unfortunately).
 
 * threadgroup_rwsem used to be put into non-percpu mode by default due to
   latency concerns in specific use cases. There's no reason for everyone
   else to pay for it. Make the behavior optional.
 
 * psi no longer allocates memory when disabled.
 
 along with some code cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCYugHIQ4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGd+oAP9lfD3fTRdNo4qWV2VsZsYzoOxzNIuJSwN/dnYx
 IEbQOwD/cd2YMfeo6zcb427U/VfTFqjJjFK04OeljYtJU8fFywo=
 =sucy
 -----END PGP SIGNATURE-----

Merge tag 'cgroup-for-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup updates from Tejun Heo:
 "Several core optimizations:

   - threadgroup_rwsem write locking is skipped when configuring
     controllers in empty subtrees.

     Combined with CLONE_INTO_CGROUP, this allows the common static
     usage pattern to not grab threadgroup_rwsem at all (glibc still
     doesn't seem ready for CLONE_INTO_CGROUP unfortunately).

   - threadgroup_rwsem used to be put into non-percpu mode by default
     due to latency concerns in specific use cases. There's no reason
     for everyone else to pay for it. Make the behavior optional.

   - psi no longer allocates memory when disabled.

  ... along with some code cleanups"

* tag 'cgroup-for-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: Skip subtree root in cgroup_update_dfl_csses()
  cgroup: remove "no" prefixed mount options
  cgroup: Make !percpu threadgroup_rwsem operations optional
  cgroup: Add "no" prefixed mount options
  cgroup: Elide write-locking threadgroup_rwsem when updating csses on an empty subtree
  cgroup.c: remove redundant check for mixable cgroup in cgroup_migrate_vet_dst
  cgroup.c: add helper __cset_cgroup_from_root to cleanup duplicated codes
  psi: dont alloc memory for psi by default
2022-08-03 09:45:08 -07:00
Daniel Sneddon
2b12993220 x86/speculation: Add RSB VM Exit protections
tl;dr: The Enhanced IBRS mitigation for Spectre v2 does not work as
documented for RET instructions after VM exits. Mitigate it with a new
one-entry RSB stuffing mechanism and a new LFENCE.

== Background ==

Indirect Branch Restricted Speculation (IBRS) was designed to help
mitigate Branch Target Injection and Speculative Store Bypass, i.e.
Spectre, attacks. IBRS prevents software run in less privileged modes
from affecting branch prediction in more privileged modes. IBRS requires
the MSR to be written on every privilege level change.

To overcome some of the performance issues of IBRS, Enhanced IBRS was
introduced.  eIBRS is an "always on" IBRS, in other words, just turn
it on once instead of writing the MSR on every privilege level change.
When eIBRS is enabled, more privileged modes should be protected from
less privileged modes, including protecting VMMs from guests.

== Problem ==

Here's a simplification of how guests are run on Linux' KVM:

void run_kvm_guest(void)
{
	// Prepare to run guest
	VMRESUME();
	// Clean up after guest runs
}

The execution flow for that would look something like this to the
processor:

1. Host-side: call run_kvm_guest()
2. Host-side: VMRESUME
3. Guest runs, does "CALL guest_function"
4. VM exit, host runs again
5. Host might make some "cleanup" function calls
6. Host-side: RET from run_kvm_guest()

Now, when back on the host, there are a couple of possible scenarios of
post-guest activity the host needs to do before executing host code:

* on pre-eIBRS hardware (legacy IBRS, or nothing at all), the RSB is not
touched and Linux has to do a 32-entry stuffing.

* on eIBRS hardware, VM exit with IBRS enabled, or restoring the host
IBRS=1 shortly after VM exit, has a documented side effect of flushing
the RSB except in this PBRSB situation where the software needs to stuff
the last RSB entry "by hand".

IOW, with eIBRS supported, host RET instructions should no longer be
influenced by guest behavior after the host retires a single CALL
instruction.

However, if the RET instructions are "unbalanced" with CALLs after a VM
exit as is the RET in #6, it might speculatively use the address for the
instruction after the CALL in #3 as an RSB prediction. This is a problem
since the (untrusted) guest controls this address.

Balanced CALL/RET instruction pairs such as in step #5 are not affected.

== Solution ==

The PBRSB issue affects a wide variety of Intel processors which
support eIBRS. But not all of them need mitigation. Today,
X86_FEATURE_RSB_VMEXIT triggers an RSB filling sequence that mitigates
PBRSB. Systems setting RSB_VMEXIT need no further mitigation - i.e.,
eIBRS systems which enable legacy IBRS explicitly.

However, such systems (X86_FEATURE_IBRS_ENHANCED) do not set RSB_VMEXIT
and most of them need a new mitigation.

Therefore, introduce a new feature flag X86_FEATURE_RSB_VMEXIT_LITE
which triggers a lighter-weight PBRSB mitigation versus RSB_VMEXIT.

The lighter-weight mitigation performs a CALL instruction which is
immediately followed by a speculative execution barrier (INT3). This
steers speculative execution to the barrier -- just like a retpoline
-- which ensures that speculation can never reach an unbalanced RET.
Then, ensure this CALL is retired before continuing execution with an
LFENCE.

In other words, the window of exposure is opened at VM exit where RET
behavior is troublesome. While the window is open, force RSB predictions
sampling for RET targets to a dead end at the INT3. Close the window
with the LFENCE.

There is a subset of eIBRS systems which are not vulnerable to PBRSB.
Add these systems to the cpu_vuln_whitelist[] as NO_EIBRS_PBRSB.
Future systems that aren't vulnerable will set ARCH_CAP_PBRSB_NO.

  [ bp: Massage, incorporate review comments from Andy Cooper. ]

Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Co-developed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-08-03 11:23:52 +02:00
Linus Torvalds
665fe72a7d linux-kselftest-kunit-5.20-rc1
This KUnit update for Linux 5.20-rc1 consists of several fixes and an
 important feature to discourage running KUnit tests on production
 systems. Running tests on a production system could leave the system
 in a bad state. This new feature adds:
 
 - adds a new taint type, TAINT_TEST to signal that a test has been run.
   This should discourage people from running these tests on production
   systems, and to make it easier to tell if tests have been run
   accidentally (by loading the wrong configuration, etc.)
 
 - several documentation and tool enhancements and fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmLoOXcACgkQCwJExA0N
 Qxy5HQ//QehcBsN0rvNM5enP0HyJjDFxoF9HI7RxhHbwAE3LEkMQTNnFJOViJ7cY
 XZgvPipySkekPkvbm9uAnJw160hUSTCM3Oikf7JaxSTKS9Zvfaq9k78miQNrU2rT
 C9ljhLBF9y2eXxj9348jwlIHmjBwV5iMn6ncSvUkdUpDAkll2qIvtmmdiSgl33Et
 CRhdc07XBwhlz/hBDwj8oK2ZYGPsqjxf2CyrhRMJAOEJtY0wt971COzPj8cDGtmi
 nmQXiUhGejXPlzL/7hPYNr83YmYa/xGjecgDPKR3hOf5dVEVRUE2lKQ00F4GrwdZ
 KC6CWyXCzhhbtH7tfpWBU4ZoBdmyxhVOMDPFNJdHzuAHVAI3WbHmGjnptgV9jT7o
 KqgPVDW2n0fggMMUjmxR4fV2VrKoVy8EvLfhsanx961KhnPmQ6MXxL1cWoMT5BwA
 JtwPlNomwaee2lH9534Qgt1brybYZRGx1RDbWn2CW3kJabODptL80sZ62X5XxxRi
 I/keCbSjDO1mL3eEeGg/n7AsAhWrZFsxCThxSXH6u6d6jrrvCF3X2Ki5m27D1eGD
 Yh40Fy+FhwHSXNyVOav6XHYKhyRzJvPxM/mTGe5DtQ6YnP7G7SnfPchX4irZQOkv
 T2soJdtAcshnpG6z38Yd3uWM/8ARtSMaBU891ZAkFD9foniIYWE=
 =WzBX
 -----END PGP SIGNATURE-----

Merge tag 'linux-kselftest-kunit-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull KUnit updates from Shuah Khan:
 "This consists of several fixes and an important feature to discourage
  running KUnit tests on production systems. Running tests on a
  production system could leave the system in a bad state.

  Summary:

   - Add a new taint type, TAINT_TEST to signal that a test has been
     run.

     This should discourage people from running these tests on
     production systems, and to make it easier to tell if tests have
     been run accidentally (by loading the wrong configuration, etc)

   - Several documentation and tool enhancements and fixes"

* tag 'linux-kselftest-kunit-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (29 commits)
  Documentation: KUnit: Fix example with compilation error
  Documentation: kunit: Add CLI args for kunit_tool
  kcsan: test: Add a .kunitconfig to run KCSAN tests
  kunit: executor: Fix a memory leak on failure in kunit_filter_tests
  clk: explicitly disable CONFIG_UML_PCI_OVER_VIRTIO in .kunitconfig
  mmc: sdhci-of-aspeed: test: Use kunit_test_suite() macro
  nitro_enclaves: test: Use kunit_test_suite() macro
  thunderbolt: test: Use kunit_test_suite() macro
  kunit: flatten kunit_suite*** to kunit_suite** in .kunit_test_suites
  kunit: unify module and builtin suite definitions
  selftest: Taint kernel when test module loaded
  module: panic: Taint the kernel when selftest modules load
  Documentation: kunit: fix example run_kunit func to allow spaces in args
  Documentation: kunit: Cleanup run_wrapper, fix x-ref
  kunit: test.h: fix a kernel-doc markup
  kunit: tool: Enable virtio/PCI by default on UML
  kunit: tool: make --kunitconfig repeatable, blindly concat
  kunit: add coverage_uml.config to enable GCOV on UML
  kunit: tool: refactor internal kconfig handling, allow overriding
  kunit: tool: introduce --qemu_args
  ...
2022-08-02 19:34:45 -07:00
Linus Torvalds
aad26f55f4 This was a moderately busy cycle for documentation, but nothing all that
earth-shaking:
 
 - More Chinese translations, and an update to the Italian translations.
   The Japanese, Korean, and traditional Chinese translations are
   more-or-less unmaintained at this point, instead.
 
 - Some build-system performance improvements.
 
 - The removal of the archaic submitting-drivers.rst document, with the
   movement of what useful material that remained into other docs.
 
 - Improvements to sphinx-pre-install to, hopefully, give more useful
   suggestions.
 
 - A number of build-warning fixes
 
 Plus the usual collection of typo fixes, updates, and more.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmLn9OwPHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5YtrwIAJNZoDYJJIRuVHnFkAn5EJ4b/chnR1dSTBtn
 WdE/1zdAlMBWVlEGO48VZybph9Sk0v+cUGf+yviDgASQrfOhRRTkg/0u6XaBAYO0
 +C2D1QDd9DggGgajxsfJfTdD3IuB78mGmCQvP17XIJW+NK1CK9rXZBnj6WC5/HJw
 PCHzeeVreBxOS3W9GelMYa6vjVl7dv81x4DPllnsgU2AMk0/Ce0MVjeIZ695sOeP
 Ki6jZgC2GsgFSK5kBC35OiDe5q+fDzlLfek34EUCn4SIbMALSUYWO1db122w5Pme
 Ej0+UTBhD19WH1uB/rcVKnVWugi7UEUJexZsao+nC7UrdIVtYq0=
 =83BG
 -----END PGP SIGNATURE-----

Merge tag 'docs-6.0' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "This was a moderately busy cycle for documentation, but nothing
  all that earth-shaking:

   - More Chinese translations, and an update to the Italian
     translations.

     The Japanese, Korean, and traditional Chinese translations
     are more-or-less unmaintained at this point, instead.

   - Some build-system performance improvements.

   - The removal of the archaic submitting-drivers.rst document,
     with the movement of what useful material that remained into
     other docs.

   - Improvements to sphinx-pre-install to, hopefully, give more
     useful suggestions.

   - A number of build-warning fixes

  Plus the usual collection of typo fixes, updates, and more"

* tag 'docs-6.0' of git://git.lwn.net/linux: (92 commits)
  docs: efi-stub: Fix paths for x86 / arm stubs
  Docs/zh_CN: Update the translation of sched-stats to 5.19-rc8
  Docs/zh_CN: Update the translation of pci to 5.19-rc8
  Docs/zh_CN: Update the translation of pci-iov-howto to 5.19-rc8
  Docs/zh_CN: Update the translation of usage to 5.19-rc8
  Docs/zh_CN: Update the translation of testing-overview to 5.19-rc8
  Docs/zh_CN: Update the translation of sparse to 5.19-rc8
  Docs/zh_CN: Update the translation of kasan to 5.19-rc8
  Docs/zh_CN: Update the translation of iio_configfs to 5.19-rc8
  doc:it_IT: align Italian documentation
  docs: Remove spurious tag from admin-guide/mm/overcommit-accounting.rst
  Documentation: process: Update email client instructions for Thunderbird
  docs: ABI: correct QEMU fw_cfg spec path
  doc/zh_CN: remove submitting-driver reference from docs
  docs: zh_TW: align to submitting-drivers removal
  docs: zh_CN: align to submitting-drivers removal
  docs: ko_KR: howto: remove reference to removed submitting-drivers
  docs: ja_JP: howto: remove reference to removed submitting-drivers
  docs: it_IT: align to submitting-drivers removal
  docs: process: remove outdated submitting-drivers.rst
  ...
2022-08-02 19:24:24 -07:00
Linus Torvalds
7d9d077c78 RCU pull request for v5.20 (or whatever)
This pull request contains the following branches:
 
 doc.2022.06.21a: Documentation updates.
 
 fixes.2022.07.19a: Miscellaneous fixes.
 
 nocb.2022.07.19a: Callback-offload updates, perhaps most notably a new
 	RCU_NOCB_CPU_DEFAULT_ALL Kconfig option that causes all CPUs to
 	be offloaded at boot time, regardless of kernel boot parameters.
 	This is useful to battery-powered systems such as ChromeOS
 	and Android.  In addition, a new RCU_NOCB_CPU_CB_BOOST kernel
 	boot parameter prevents offloaded callbacks from interfering
 	with real-time workloads and with energy-efficiency mechanisms.
 
 poll.2022.07.21a: Polled grace-period updates, perhaps most notably
 	making these APIs account for both normal and expedited grace
 	periods.
 
 rcu-tasks.2022.06.21a: Tasks RCU updates, perhaps most notably reducing
 	the CPU overhead of RCU tasks trace grace periods by more than
 	a factor of two on a system with 15,000 tasks.	The reduction
 	is expected to increase with the number of tasks, so it seems
 	reasonable to hypothesize that a system with 150,000 tasks might
 	see a 20-fold reduction in CPU overhead.
 
 torture.2022.06.21a: Torture-test updates.
 
 ctxt.2022.07.05a: Updates that merge RCU's dyntick-idle tracking into
 	context tracking, thus reducing the overhead of transitioning to
 	kernel mode from either idle or nohz_full userspace execution
 	for kernels that track context independently of RCU.  This is
 	expected to be helpful primarily for kernels built with
 	CONFIG_NO_HZ_FULL=y.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmLgMcgTHHBhdWxtY2tA
 a2VybmVsLm9yZwAKCRCevxLzctn7jArXD/0fjbCwqpRjHVTzjMY8jN4zDkqZZD6m
 g8Fx27hZ4ToNFwRptyHwNezrNj14skjAJEXfdjaVw32W62ivXvf0HINvSzsTLCSq
 k2kWyBdXLc9CwY5p5W4smnpn5VoAScjg5PoPL59INoZ/Zziji323C7Zepl/1DYJt
 0T6bPCQjo1ZQoDUCyVpSjDmAqxnderWG0MeJVt74GkLqmnYLANg0GH8c7mH4+9LL
 kVGlLp5nlPgNJ4FEoFdMwNU8T/ETmaVld/m2dkiawjkXjJzB2XKtBigU91DDmXz5
 7DIdV4ABrxiy4kGNqtIe/jFgnKyVD7xiDpyfjd6KTeDr/rDS8u2ZH7+1iHsyz3g0
 Np/tS3vcd0KR+gI/d0eXxPbgm5sKlCmKw/nU2eArpW/+4LmVXBUfHTG9Jg+LJmBc
 JrUh6aEdIZJZHgv/nOQBNig7GJW43IG50rjuJxAuzcxiZNEG5lUSS23ysaA9CPCL
 PxRWKSxIEfK3kdmvVO5IIbKTQmIBGWlcWMTcYictFSVfBgcCXpPAksGvqA5JiUkc
 egW+xLFo/7K+E158vSKsVqlWZcEeUbsNJ88QOlpqnRgH++I2Yv/LhK41XfJfpH+Y
 ALxVaDd+mAq6v+qSHNVq9wT3ozXIPy/zK1hDlMIqx40h2YvaEsH4je+521oSoN9r
 vX60+QNxvUBLwA==
 =vUNm
 -----END PGP SIGNATURE-----

Merge tag 'rcu.2022.07.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu

Pull RCU updates from Paul McKenney:

 - Documentation updates

 - Miscellaneous fixes

 - Callback-offload updates, perhaps most notably a new
   RCU_NOCB_CPU_DEFAULT_ALL Kconfig option that causes all CPUs to be
   offloaded at boot time, regardless of kernel boot parameters.

   This is useful to battery-powered systems such as ChromeOS and
   Android. In addition, a new RCU_NOCB_CPU_CB_BOOST kernel boot
   parameter prevents offloaded callbacks from interfering with
   real-time workloads and with energy-efficiency mechanisms

 - Polled grace-period updates, perhaps most notably making these APIs
   account for both normal and expedited grace periods

 - Tasks RCU updates, perhaps most notably reducing the CPU overhead of
   RCU tasks trace grace periods by more than a factor of two on a
   system with 15,000 tasks.

   The reduction is expected to increase with the number of tasks, so it
   seems reasonable to hypothesize that a system with 150,000 tasks
   might see a 20-fold reduction in CPU overhead

 - Torture-test updates

 - Updates that merge RCU's dyntick-idle tracking into context tracking,
   thus reducing the overhead of transitioning to kernel mode from
   either idle or nohz_full userspace execution for kernels that track
   context independently of RCU.

   This is expected to be helpful primarily for kernels built with
   CONFIG_NO_HZ_FULL=y

* tag 'rcu.2022.07.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (98 commits)
  rcu: Add irqs-disabled indicator to expedited RCU CPU stall warnings
  rcu: Diagnose extended sync_rcu_do_polled_gp() loops
  rcu: Put panic_on_rcu_stall() after expedited RCU CPU stall warnings
  rcutorture: Test polled expedited grace-period primitives
  rcu: Add polled expedited grace-period primitives
  rcutorture: Verify that polled GP API sees synchronous grace periods
  rcu: Make Tiny RCU grace periods visible to polled APIs
  rcu: Make polled grace-period API account for expedited grace periods
  rcu: Switch polled grace-period APIs to ->gp_seq_polled
  rcu/nocb: Avoid polling when my_rdp->nocb_head_rdp list is empty
  rcu/nocb: Add option to opt rcuo kthreads out of RT priority
  rcu: Add nocb_cb_kthread check to rcu_is_callbacks_kthread()
  rcu/nocb: Add an option to offload all CPUs on boot
  rcu/nocb: Fix NOCB kthreads spawn failure with rcu_nocb_rdp_deoffload() direct call
  rcu/nocb: Invert rcu_state.barrier_mutex VS hotplug lock locking order
  rcu/nocb: Add/del rdp to iterate from rcuog itself
  rcu/tree: Add comment to describe GP-done condition in fqs loop
  rcu: Initialize first_gp_fqs at declaration in rcu_gp_fqs()
  rcu/kvfree: Remove useless monitor_todo flag
  rcu: Cleanup RCU urgency state for offline CPU
  ...
2022-08-02 19:12:45 -07:00
Linus Torvalds
a0b09f2d6f Random number generator updates for Linux 6.0-rc1.
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmLnDOwACgkQSfxwEqXe
 A65Fiw//Z0YaPejSslQIGitQ1b0XzdWBhyJArYDieaaiQRXMqlaSKlIUqHz38xb7
 +FykUY51/SJLjHV2riPxq1OK3/MPmk6VlTd0HHihcHVmg77oZcFcv2tPnDpZoqND
 TsBOujLbXKwxP8tNFedRY/4+K7w+ue9BTfDjuH7aCtz7uWd+4cNJmPg3x9FCfkMA
 +hbcRluwE9W3Pg4OCKwv+qxL0JF3qQtNKEOp1wpnjGAZZW/I9gFNgFBEkykvcAsj
 TkIRDc3agPFj6QgDeRIgLdnf9KCsLubKAg5oJneeCvQztJJUCSkn8nQXxpx+4sLo
 GsRgvCdfL/GyJqfSAzQJVYDHKtKMkJiCiWCC/oOALR8dzHJfSlULDAjbY1m/DAr9
 at+vi4678Or7TNx2ZSaUlCXXKZ+UT7yWMlQWax9JuxGk1hGYP5/eT1AH5SGjqUwF
 w1q8oyzxt1vUcnOzEddFXPFirnqqhAk4dQFtu83+xKM4ZssMVyeB4NZdEhAdW0ng
 MX+RjrVj4l5gWWuoS0Cx3LUxDCgV6WT0dN+Vl9axAZkoJJbcXLEmXwQ6NbzTLPWg
 1/MT7qFTxNcTCeAArMdZvvFbeh7pOBXO42pafrK/7vDRnTMUIw9tqXNLQUfvdFQp
 F5flPgiVRHDU2vSzKIFtnPTyXU0RBBGvNb4n0ss2ehH2DSsCxYE=
 =Zy3d
 -----END PGP SIGNATURE-----

Merge tag 'random-6.0-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random

Pull random number generator updates from Jason Donenfeld:
 "Though there's been a decent amount of RNG-related development during
  this last cycle, not all of it is coming through this tree, as this
  cycle saw a shift toward tackling early boot time seeding issues,
  which took place in other trees as well.

  Here's a summary of the various patches:

   - The CONFIG_ARCH_RANDOM .config option and the "nordrand" boot
     option have been removed, as they overlapped with the more widely
     supported and more sensible options, CONFIG_RANDOM_TRUST_CPU and
     "random.trust_cpu". This change allowed simplifying a bit of arch
     code.

   - x86's RDRAND boot time test has been made a bit more robust, with
     RDRAND disabled if it's clearly producing bogus results. This would
     be a tip.git commit, technically, but I took it through random.git
     to avoid a large merge conflict.

   - The RNG has long since mixed in a timestamp very early in boot, on
     the premise that a computer that does the same things, but does so
     starting at different points in wall time, could be made to still
     produce a different RNG state. Unfortunately, the clock isn't set
     early in boot on all systems, so now we mix in that timestamp when
     the time is actually set.

   - User Mode Linux now uses the host OS's getrandom() syscall to
     generate a bootloader RNG seed and later on treats getrandom() as
     the platform's RDRAND-like faculty.

   - The arch_get_random_{seed_,}_long() family of functions is now
     arch_get_random_{seed_,}_longs(), which enables certain platforms,
     such as s390, to exploit considerable performance advantages from
     requesting multiple CPU random numbers at once, while at the same
     time compiling down to the same code as before on platforms like
     x86.

   - A small cleanup changing a cmpxchg() into a try_cmpxchg(), from
     Uros.

   - A comment spelling fix"

More info about other random number changes that come in through various
architecture trees in the full commentary in the pull request:

  https://lore.kernel.org/all/20220731232428.2219258-1-Jason@zx2c4.com/

* tag 'random-6.0-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
  random: correct spelling of "overwrites"
  random: handle archrandom with multiple longs
  um: seed rng using host OS rng
  random: use try_cmpxchg in _credit_init_bits
  timekeeping: contribute wall clock to rng on time change
  x86/rdrand: Remove "nordrand" flag in favor of "random.trust_cpu"
  random: remove CONFIG_ARCH_RANDOM
2022-08-02 17:31:35 -07:00
Linus Torvalds
79802ada87 selinux/stable-6.0 PR 20220801
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmLoEeIUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNSOhAAwWwRcmcHnk+k2agT9QjKrLo26NCO
 MQLE89o4y2ChEFHxC7F7SKoQRxtfYa323p1vmlGzKrlB+IZ6oqERVp4QNQQbXsfn
 n9VvVpxjRNHAetcRhCM9ZOchWjUdw6AMaJ8e3fdRNRESadAUUFDxifw1wpjgG9+i
 LmtDbfZ7vLs2grTf9OZy3JIl1VF3lVRUTI7ZBQggfJncMa+LXNWdVNmEe3yfyboA
 1MwpSao7K2si0hBGAQo/UGQz4b19Tm4xMg8bSy7oTsP5Lae5ciPkeI3qazvs9usp
 WScZYhQ8NugqLbDbjs7dm6QCpj4x3dUs6ei48LKe3GF2mcGesFfOPo9sNHao4kKv
 C9t0f9qw+EhGvnNL7uQIDDf8OuTjuLWDvZSrMLID/IJKFF5NJ3y+XzaS9aPM3VEY
 qyOsX+cEzheXGhD6xE1sCo+AyPUDYqNDMIKBj2wlIGCKlzDGa8RT6VsQuvgf3c3K
 43CnRCQeWDWOHCq3MnRe/fmYtW+JB7tsXiKAq4OJADacwPP36bsP3bqU8AlWYwDt
 tnuMa+LKusHnMEQpMPI8FW8qGdxwGSen+mymfLFIMgtwNGkV7WGRJ6Lbyn0SaR6v
 HyXgZASIOQRnamK3yZCDpxo0K81IVxPWJIjHyg53znqT5TCpXccPyV4HwbJKI/KG
 8PtHrXOdPOGCZ2g=
 =WWq1
 -----END PGP SIGNATURE-----

Merge tag 'selinux-pr-20220801' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux updates from Paul Moore:
 "A relatively small set of patches for SELinux this time, eight patches
  in total with really only one significant change.

  The highlights are:

   - Add support for proper labeling of memfd_secret anonymous inodes.

     This will allow LSMs that implement the anonymous inode hooks to
     apply security policy to memfd_secret() fds.

   - Various small improvements to memory management: fixed leaks, freed
     memory when needed, boundary checks.

   - Hardened the selinux_audit_data struct with __randomize_layout.

   - A minor documentation tweak to fix a formatting/style issue"

* tag 'selinux-pr-20220801' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: selinux_add_opt() callers free memory
  selinux: Add boundary check in put_entry()
  selinux: fix memleak in security_read_state_kernel()
  docs: selinux: add '=' signs to kernel boot options
  mm: create security context for memfd_secret inodes
  selinux: fix typos in comments
  selinux: drop unnecessary NULL check
  selinux: add __randomize_layout to selinux_audit_data
2022-08-02 14:51:47 -07:00
Linus Torvalds
8374cfe647 - Refactor DM core's mempool allocation so that it clearer by not
being split acorss files.
 
 - Improve DM core's BLK_STS_DM_REQUEUE and BLK_STS_AGAIN handling.
 
 - Optimize DM core's more common bio splitting by eliminating the use
   of bio cloning with bio_split+bio_chain. Shift that cloning cost to
   the relatively unlikely dm_io requeue case that only occurs during
   error handling. Introduces dm_io_rewind() that will clone a bio that
   reflects the subset of the original bio that must be requeued.
 
 - Remove DM core's dm_table_get_num_targets() wrapper and audit all
   dm_table_get_target() callers.
 
 - Fix potential for OOM with DM writecache target by setting a default
   MAX_WRITEBACK_JOBS (set to 256MiB or 1/16 of total system memory,
   whichever is smaller).
 
 - Fix DM writecache target's stats that are reported through
   DM-specific table info.
 
 - Fix use-after-free crash in dm_sm_register_threshold_callback().
 
 - Refine DM core's Persistent Reservation handling in preparation for
   broader work Mike Christie is doing to add compatibility with
   Microsoft Windows Failover Cluster.
 
 - Fix various KASAN reported bugs in the DM raid target.
 
 - Fix DM raid target crash due to md_handle_request() bio splitting
   that recurses to block core without properly initializing the bio's
   bi_dev.
 
 - Fix some code comment typos and fix some Documentation formatting.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAmLoAAUACgkQxSPxCi2d
 A1rFUAf/RnLkzNPS1QJ1uVbiiW64zQUD2o5U8kAliOZxoXQ45U+CgL22a5VaT2R+
 rI9+YSg/VX9YkdJwB6I+y7VRWjUCO3NfYOfQUX5NS8GfcPQQn2Zp+jy3t8VnjjXN
 5+8Ylu5tX1+QSQiBB9JvIgiK71rSorT6H+KF6bZypL7te5kj39hDaRbmh40VOMOY
 QWfs8DmyJft9IHPG7Mku/rFJbdVJph3f31IoqUOoGTOKoDPDUDezhCVHi2vpFcV+
 S+iCKkeXmP6eUdCnNzBreYPTcP9OtCvkZEPhFg4lh+jyhjyFq1o0B8G5cRo5jvgr
 GcN1GUT040sSqNXFquA4UU6RGvaxlg==
 =gBRJ
 -----END PGP SIGNATURE-----

Merge tag 'for-6.0/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - Refactor DM core's mempool allocation so that it clearer by not being
   split acorss files.

 - Improve DM core's BLK_STS_DM_REQUEUE and BLK_STS_AGAIN handling.

 - Optimize DM core's more common bio splitting by eliminating the use
   of bio cloning with bio_split+bio_chain. Shift that cloning cost to
   the relatively unlikely dm_io requeue case that only occurs during
   error handling. Introduces dm_io_rewind() that will clone a bio that
   reflects the subset of the original bio that must be requeued.

 - Remove DM core's dm_table_get_num_targets() wrapper and audit all
   dm_table_get_target() callers.

 - Fix potential for OOM with DM writecache target by setting a default
   MAX_WRITEBACK_JOBS (set to 256MiB or 1/16 of total system memory,
   whichever is smaller).

 - Fix DM writecache target's stats that are reported through
   DM-specific table info.

 - Fix use-after-free crash in dm_sm_register_threshold_callback().

 - Refine DM core's Persistent Reservation handling in preparation for
   broader work Mike Christie is doing to add compatibility with
   Microsoft Windows Failover Cluster.

 - Fix various KASAN reported bugs in the DM raid target.

 - Fix DM raid target crash due to md_handle_request() bio splitting
   that recurses to block core without properly initializing the bio's
   bi_dev.

 - Fix some code comment typos and fix some Documentation formatting.

* tag 'for-6.0/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (29 commits)
  dm: fix dm-raid crash if md_handle_request() splits bio
  dm raid: fix address sanitizer warning in raid_resume
  dm raid: fix address sanitizer warning in raid_status
  dm: Start pr_preempt from the same starting path
  dm: Fix PR release handling for non All Registrants
  dm: Start pr_reserve from the same starting path
  dm: Allow dm_call_pr to be used for path searches
  dm: return early from dm_pr_call() if DM device is suspended
  dm thin: fix use-after-free crash in dm_sm_register_threshold_callback
  dm writecache: count number of blocks discarded, not number of discard bios
  dm writecache: count number of blocks written, not number of write bios
  dm writecache: count number of blocks read, not number of read bios
  dm writecache: return void from functions
  dm kcopyd: use __GFP_HIGHMEM when allocating pages
  dm writecache: set a default MAX_WRITEBACK_JOBS
  Documentation: dm writecache: Render status list as list
  Documentation: dm writecache: add blank line before optional parameters
  dm snapshot: fix typo in snapshot_map() comment
  dm raid: remove redundant "the" in parse_raid_params() comment
  dm cache: fix typo in 2 comment blocks
  ...
2022-08-02 14:21:25 -07:00
Linus Torvalds
0cec3f24a7 arm64 updates for 5.20
- Remove unused generic cpuidle support (replaced by PSCI version)
 
 - Fix documentation describing the kernel virtual address space
 
 - Handling of some new CPU errata in Arm implementations
 
 - Rework of our exception table code in preparation for handling
   machine checks (i.e. RAS errors) more gracefully
 
 - Switch over to the generic implementation of ioremap()
 
 - Fix lockdep tracking in NMI context
 
 - Instrument our memory barrier macros for KCSAN
 
 - Rework of the kPTI G->nG page-table repainting so that the MMU remains
   enabled and the boot time is no longer slowed to a crawl for systems
   which require the late remapping
 
 - Enable support for direct swapping of 2MiB transparent huge-pages on
   systems without MTE
 
 - Fix handling of MTE tags with allocating new pages with HW KASAN
 
 - Expose the SMIDR register to userspace via sysfs
 
 - Continued rework of the stack unwinder, particularly improving the
   behaviour under KASAN
 
 - More repainting of our system register definitions to match the
   architectural terminology
 
 - Improvements to the layout of the vDSO objects
 
 - Support for allocating additional bits of HWCAP2 and exposing
   FEAT_EBF16 to userspace on CPUs that support it
 
 - Considerable rework and optimisation of our early boot code to reduce
   the need for cache maintenance and avoid jumping in and out of the
   kernel when handling relocation under KASLR
 
 - Support for disabling SVE and SME support on the kernel command-line
 
 - Support for the Hisilicon HNS3 PMU
 
 - Miscellanous cleanups, trivial updates and minor fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmLeccUQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNCysB/4ml92RJLhVwRAofbtFfVgVz3JLTSsvob9x
 Z7FhNDxfM/G32wKtOHU9tHkGJ+PMVWOPajukzxkMhxmilfTyHBbiisNWVRjKQxj4
 wrd07DNXPIv3bi8SWzS1y2y8ZqujZWjNJlX8SUCzEoxCVtuNKwrh96kU1jUjrkFZ
 kBo4E4wBWK/qW29nClGSCgIHRQNJaB/jvITlQhkqIb0pwNf3sAUzW7QoF1iTZWhs
 UswcLh/zC4q79k9poegdCt8chV5OBDLtLPnMxkyQFvsLYRp3qhyCSQQY/BxvO5JS
 jT9QR6d+1ewET9BFhqHlIIuOTYBCk3xn/PR9AucUl+ZBQd2tO4B1
 =LVH0
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:
 "Highlights include a major rework of our kPTI page-table rewriting
  code (which makes it both more maintainable and considerably faster in
  the cases where it is required) as well as significant changes to our
  early boot code to reduce the need for data cache maintenance and
  greatly simplify the KASLR relocation dance.

  Summary:

   - Remove unused generic cpuidle support (replaced by PSCI version)

   - Fix documentation describing the kernel virtual address space

   - Handling of some new CPU errata in Arm implementations

   - Rework of our exception table code in preparation for handling
     machine checks (i.e. RAS errors) more gracefully

   - Switch over to the generic implementation of ioremap()

   - Fix lockdep tracking in NMI context

   - Instrument our memory barrier macros for KCSAN

   - Rework of the kPTI G->nG page-table repainting so that the MMU
     remains enabled and the boot time is no longer slowed to a crawl
     for systems which require the late remapping

   - Enable support for direct swapping of 2MiB transparent huge-pages
     on systems without MTE

   - Fix handling of MTE tags with allocating new pages with HW KASAN

   - Expose the SMIDR register to userspace via sysfs

   - Continued rework of the stack unwinder, particularly improving the
     behaviour under KASAN

   - More repainting of our system register definitions to match the
     architectural terminology

   - Improvements to the layout of the vDSO objects

   - Support for allocating additional bits of HWCAP2 and exposing
     FEAT_EBF16 to userspace on CPUs that support it

   - Considerable rework and optimisation of our early boot code to
     reduce the need for cache maintenance and avoid jumping in and out
     of the kernel when handling relocation under KASLR

   - Support for disabling SVE and SME support on the kernel
     command-line

   - Support for the Hisilicon HNS3 PMU

   - Miscellanous cleanups, trivial updates and minor fixes"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (136 commits)
  arm64: Delay initialisation of cpuinfo_arm64::reg_{zcr,smcr}
  arm64: fix KASAN_INLINE
  arm64/hwcap: Support FEAT_EBF16
  arm64/cpufeature: Store elf_hwcaps as a bitmap rather than unsigned long
  arm64/hwcap: Document allocation of upper bits of AT_HWCAP
  arm64: enable THP_SWAP for arm64
  arm64/mm: use GENMASK_ULL for TTBR_BADDR_MASK_52
  arm64: errata: Remove AES hwcap for COMPAT tasks
  arm64: numa: Don't check node against MAX_NUMNODES
  drivers/perf: arm_spe: Fix consistency of SYS_PMSCR_EL1.CX
  perf: RISC-V: Add of_node_put() when breaking out of for_each_of_cpu_node()
  docs: perf: Include hns3-pmu.rst in toctree to fix 'htmldocs' WARNING
  arm64: kasan: Revert "arm64: mte: reset the page tag in page->flags"
  mm: kasan: Skip page unpoisoning only if __GFP_SKIP_KASAN_UNPOISON
  mm: kasan: Skip unpoisoning of user pages
  mm: kasan: Ensure the tags are visible before the tag in page->flags
  drivers/perf: hisi: add driver for HNS3 PMU
  drivers/perf: hisi: Add description for HNS3 PMU driver
  drivers/perf: riscv_pmu_sbi: perf format
  perf/arm-cci: Use the bitmap API to allocate bitmaps
  ...
2022-08-01 10:37:00 -07:00
Linus Torvalds
42efa5e3a8 - Remove the vendor check when selecting MWAIT as the default idle state
- Respect idle=nomwait when supplied on the kernel cmdline
 
 - Two small cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmLntx0ACgkQEsHwGGHe
 VUqlRxAAkULobsk6Dx3wrQcYlpA8Mt/ctttTQXWiIQwhK1j7uP0zlGWBqImr5Wsk
 T04g1s29azulnPs3PydCF2QlLqSyF4v2PyyUwnpKfTP6CPM+MLtz98Gm6Xcbkt+s
 f28ISYgNP+15tskWdNqB5XIVGkuyBdNne9TiFwtnVrJYF47FSwqEWRyqMH+bIOGT
 wSZUCfjcw7PtKwfIAmYq4beS2+wbY9bsfVyIz+H0ks2EVFQdjYWb/kH9PgUYEQFe
 VEOBsPvTHDOJt0QXEXSJjmoSRUS77Wduw56Y3L2T4jWdXXQFWJ79rqNYDBvXGAdh
 Y8BKM5IYFZpzrmfw2RB6jbDY/JWO5PPFvHTXogQf9+wttSerZEffVQdOeTwjT8VD
 wc9/ZnNkT7915033VI90V+hdFkwarq8FXuFH8TkzcxP9DQNYG8CRTZBceq0UWBl0
 5RpIDwNX9JxGrR+frJi0D24qxz//wLe56UqW9hLp73NP8QtEYEW1nb1q30Q2eM3N
 iQblgmh63qQ/dy6JV1GFb3aePiWMUNQwcTrj1pd8YDfNlp4IsFsSswnsdAZWtr1A
 l9qewHkBZbbzyTQkBjExUsaIdiaMywFwnUmcQNL+fHqznZIvMhJC/oCJeS0Pe/RH
 alTUrYsk6Y87HFpxoXpd85a9+20m8yrA64uY8cSQguGZ9i5Lm8g=
 =jkpj
 -----END PGP SIGNATURE-----

Merge tag 'x86_cpu_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cpu updates from Borislav Petkov:

 - Remove the vendor check when selecting MWAIT as the default idle
   state

 - Respect idle=nomwait when supplied on the kernel cmdline

 - Two small cleanups

* tag 'x86_cpu_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Use MSR_IA32_MISC_ENABLE constants
  x86: Fix comment for X86_FEATURE_ZEN
  x86: Remove vendor checks from prefer_mwait_c1_over_halt
  x86: Handle idle=nomwait cmdline properly for x86_idle
2022-08-01 09:49:29 -07:00
Paolo Bonzini
63f4b21041 Merge remote-tracking branch 'kvm/next' into kvm-next-5.20
KVM/s390, KVM/x86 and common infrastructure changes for 5.20

x86:

* Permit guests to ignore single-bit ECC errors

* Fix races in gfn->pfn cache refresh; do not pin pages tracked by the cache

* Intel IPI virtualization

* Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS

* PEBS virtualization

* Simplify PMU emulation by just using PERF_TYPE_RAW events

* More accurate event reinjection on SVM (avoid retrying instructions)

* Allow getting/setting the state of the speaker port data bit

* Refuse starting the kvm-intel module if VM-Entry/VM-Exit controls are inconsistent

* "Notify" VM exit (detect microarchitectural hangs) for Intel

* Cleanups for MCE MSR emulation

s390:

* add an interface to provide a hypervisor dump for secure guests

* improve selftests to use TAP interface

* enable interpretive execution of zPCI instructions (for PCI passthrough)

* First part of deferred teardown

* CPU Topology

* PV attestation

* Minor fixes

Generic:

* new selftests API using struct kvm_vcpu instead of a (vm, id) tuple

x86:

* Use try_cmpxchg64 instead of cmpxchg64

* Bugfixes

* Ignore benign host accesses to PMU MSRs when PMU is disabled

* Allow disabling KVM's "MONITOR/MWAIT are NOPs!" behavior

* x86/MMU: Allow NX huge pages to be disabled on a per-vm basis

* Port eager page splitting to shadow MMU as well

* Enable CMCI capability by default and handle injected UCNA errors

* Expose pid of vcpu threads in debugfs

* x2AVIC support for AMD

* cleanup PIO emulation

* Fixes for LLDT/LTR emulation

* Don't require refcounted "struct page" to create huge SPTEs

x86 cleanups:

* Use separate namespaces for guest PTEs and shadow PTEs bitmasks

* PIO emulation

* Reorganize rmap API, mostly around rmap destruction

* Do not workaround very old KVM bugs for L0 that runs with nesting enabled

* new selftests API for CPUID
2022-08-01 03:21:00 -04:00
Yosry Ahmed
73b73bac90 mm: vmpressure: don't count proactive reclaim in vmpressure
memory.reclaim is a cgroup v2 interface that allows users to proactively
reclaim memory from a memcg, without real memory pressure.  Reclaim
operations invoke vmpressure, which is used: (a) To notify userspace of
reclaim efficiency in cgroup v1, and (b) As a signal for a memcg being
under memory pressure for networking (see
mem_cgroup_under_socket_pressure()).

For (a), vmpressure notifications in v1 are not affected by this change
since memory.reclaim is a v2 feature.

For (b), the effects of the vmpressure signal (according to Shakeel [1])
are as follows:
1. Reducing send and receive buffers of the current socket.
2. May drop packets on the rx path.
3. May throttle current thread on the tx path.

Since proactive reclaim is invoked directly by userspace, not by memory
pressure, it makes sense not to throttle networking.  Hence, this change
makes sure that proactive reclaim caused by memory.reclaim does not
trigger vmpressure.

[1] https://lore.kernel.org/lkml/CALvZod68WdrXEmBpOkadhB5GPYmCXaDZzXH=yyGOCAjFRn4NDQ@mail.gmail.com/

[yosryahmed@google.com: update documentation]
  Link: https://lkml.kernel.org/r/20220721173015.2643248-1-yosryahmed@google.com
Link: https://lkml.kernel.org/r/20220714064918.2576464-1-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: NeilBrown <neilb@suse.de>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-29 18:07:15 -07:00
Eiichi Tsukata
ea304a8b89 docs/kernel-parameters: Update descriptions for "mitigations=" param with retbleed
Updates descriptions for "mitigations=off" and "mitigations=auto,nosmt"
with the respective retbleed= settings.

Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: corbet@lwn.net
Link: https://lore.kernel.org/r/20220728043907.165688-1-eiichi.tsukata@nutanix.com
2022-07-29 20:47:07 +02:00
Joerg Roedel
c10100a416 Merge branches 'arm/exynos', 'arm/mediatek', 'arm/msm', 'arm/smmu', 'virtio', 'x86/vt-d', 'x86/amd' and 'core' into next 2022-07-29 12:06:56 +02:00
Jakub Kicinski
272ac32f56 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 18:21:16 -07:00
João Paulo Rechi Vita
339170d8d3 docs: efi-stub: Fix paths for x86 / arm stubs
This fixes the paths of x86 / arm efi-stub source files.

Signed-off-by: João Paulo Rechi Vita <jprvita@endlessos.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220727140539.10021-1-jprvita@endlessos.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-07-28 09:41:56 -06:00
Tejun Heo
c808f46323 cgroup: remove "no" prefixed mount options
30312730bd ("cgroup: Add "no" prefixed mount options") added "no" prefixed
mount options to allow turning them off and 6a010a49b6 ("cgroup: Make
!percpu threadgroup_rwsem operations optional") added one more "no" prefixed
mount option. However, Michal pointed out that the "no" prefixed options
aren't necessary in allowing mount options to be turned off:

  # grep group /proc/mounts
  cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,relatime,nsdelegate,memory_recursiveprot 0 0
  # mount -o remount,nsdelegate,memory_recursiveprot none /sys/fs/cgroup
  # grep cgroup /proc/mounts
  cgroup2 /sys/fs/cgroup cgroup2 rw,relatime,nsdelegate,memory_recursiveprot 0 0

Note that this is different from the remount behavior when the mount(1) is
invoked without the device argument - "none":

 # grep cgroup /proc/mounts
 cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot 0 0
 # mount -o remount,nsdelegate,memory_recursiveprot /sys/fs/cgroup
 # grep cgroup /proc/mounts
 cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot 0 0

While a bit confusing, given that there is a way to turn off the options,
there's no reason to have the explicit "no" prefixed options. Let's remove
them.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2022-07-27 07:54:55 -10:00
Laurent Dufour
118b136693 powerpc/pseries/mobility: set NMI watchdog factor during an LPM
During an LPM, while the memory transfer is in progress on the arrival
side, some latencies are generated when accessing not yet transferred
pages on the arrival side. Thus, the NMI watchdog may be triggered too
frequently, which increases the risk to hit an NMI interrupt in a bad
place in the kernel, leading to a kernel panic.

Disabling the Hard Lockup Watchdog until the memory transfer could be a
too strong work around, some users would want this timeout to be
eventually triggered if the system is hanging even during an LPM.

Introduce a new sysctl variable nmi_watchdog_factor. It allows to apply
a factor to the NMI watchdog timeout during an LPM. Just before the CPUs
are stopped for the switchover sequence, the NMI watchdog timer is set
to watchdog_thresh + factor%

A value of 0 has no effect. The default value is 200, meaning that the
NMI watchdog is set to 30s during LPM (based on a 10s watchdog_thresh
value). Once the memory transfer is achieved, the factor is reset to 0.

Setting this value to a high number is like disabling the NMI watchdog
during an LPM.

Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220713154729.80789-5-ldufour@linux.ibm.com
2022-07-27 21:36:03 +10:00
Will Deacon
f96d67a8af Merge branch 'for-next/boot' into for-next/core
* for-next/boot: (34 commits)
  arm64: fix KASAN_INLINE
  arm64: Add an override for ID_AA64SMFR0_EL1.FA64
  arm64: Add the arm64.nosve command line option
  arm64: Add the arm64.nosme command line option
  arm64: Expose a __check_override primitive for oddball features
  arm64: Allow the idreg override to deal with variable field width
  arm64: Factor out checking of a feature against the override into a macro
  arm64: Allow sticky E2H when entering EL1
  arm64: Save state of HCR_EL2.E2H before switch to EL1
  arm64: Rename the VHE switch to "finalise_el2"
  arm64: mm: fix booting with 52-bit address space
  arm64: head: remove __PHYS_OFFSET
  arm64: lds: use PROVIDE instead of conditional definitions
  arm64: setup: drop early FDT pointer helpers
  arm64: head: avoid relocating the kernel twice for KASLR
  arm64: kaslr: defer initialization to initcall where permitted
  arm64: head: record CPU boot mode after enabling the MMU
  arm64: head: populate kernel page tables with MMU and caches on
  arm64: head: factor out TTBR1 assignment into a macro
  arm64: idreg-override: use early FDT mapping in ID map
  ...
2022-07-25 10:59:15 +01:00
Will Deacon
288e21b6b2 Merge branch 'for-next/perf' into for-next/core
* for-next/perf:
  drivers/perf: arm_spe: Fix consistency of SYS_PMSCR_EL1.CX
  perf: RISC-V: Add of_node_put() when breaking out of for_each_of_cpu_node()
  docs: perf: Include hns3-pmu.rst in toctree to fix 'htmldocs' WARNING
  drivers/perf: hisi: add driver for HNS3 PMU
  drivers/perf: hisi: Add description for HNS3 PMU driver
  drivers/perf: riscv_pmu_sbi: perf format
  perf/arm-cci: Use the bitmap API to allocate bitmaps
  drivers/perf: riscv_pmu: Add riscv pmu pm notifier
  perf: hisi: Extract hisi_pmu_init
  perf/marvell_cn10k: Fix TAD PMU register offset
  perf/marvell_cn10k: Remove useless license text when SPDX-License-Identifier is already used
  arm64: cpufeature: Allow different PMU versions in ID_DFR0_EL1
  perf/arm-cci: fix typo in comment
  drivers/perf:Directly use ida_alloc()/free()
  drivers/perf: Directly use ida_alloc()/free()
2022-07-25 10:57:14 +01:00
Tejun Heo
6a010a49b6 cgroup: Make !percpu threadgroup_rwsem operations optional
3942a9bd7b ("locking, rcu, cgroup: Avoid synchronize_sched() in
__cgroup_procs_write()") disabled percpu operations on threadgroup_rwsem
because the impiled synchronize_rcu() on write locking was pushing up the
latencies too much for android which constantly moves processes between
cgroups.

This makes the hotter paths - fork and exit - slower as they're always
forced into the slow path. There is no reason to force this on everyone
especially given that more common static usage pattern can now completely
avoid write-locking the rwsem. Write-locking is elided when turning on and
off controllers on empty sub-trees and CLONE_INTO_CGROUP enables seeding a
cgroup without grabbing the rwsem.

Restore the default percpu operations and introduce the mount option
"favordynmods" and config option CGROUP_FAVOR_DYNMODS for users who need
lower latencies for the dynamic operations.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Michal Koutn� <mkoutny@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Dmitry Shmidt <dimitrysh@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
2022-07-23 04:29:02 -10:00
Tejun Heo
30312730bd cgroup: Add "no" prefixed mount options
We allow modifying these mount options via remount. Let's add "no" prefixed
variants so that they can be turned off too.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
2022-07-22 19:12:52 -10:00
Linus Torvalds
4ba1329cbb Urgent RCU pull request for v5.19
This pull request contains a pair of commits that fix 282d8998e9 ("srcu:
 Prevent expedited GPs and blocking readers from consuming CPU"), which
 was itself a fix to an SRCU expedited grace-period problem that could
 prevent kernel live patching (KLP) from completing.  That SRCU fix for
 KLP introduced large (as in minutes) boot-time delays to embedded Linux
 kernels running on qemu/KVM.  These delays were due to the emulation of
 certain MMIO operations controlling memory layout, which were emulated
 with one expedited grace period per access.  Common configurations
 required thousands of boot-time MMIO accesses, and thus thousands of
 boot-time expedited SRCU grace periods.
 
 In these configurations, the occasional sleeps that allowed KLP to proceed
 caused excessive boot delays.  These commits preserve enough sleeps to
 permit KLP to proceed, but few enough that the virtual embedded kernels
 still boot reasonably quickly.
 
 This represents a regression introduced in the v5.19 merge window,
 and the bug is causing significant inconvenience, hence this pull request.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmLZ6LoTHHBhdWxtY2tA
 a2VybmVsLm9yZwAKCRCevxLzctn7jNHgD/4tb8Un6vZlrEaYbyA/ztUITX/2DisS
 kiqbQz1BH8V3B3PxSo4ldEiw+z3fC3SMyIPymuu9bhwm6SFdjEsarFkIqySxkYnX
 jnuk0JbWxs4Kk64rIkHHzAxzvM2Iw1EjSzjP1M+DC7iymSJpsgp+0zFJJtcJ8Y87
 67hbQRQYk+1T7ZT+vq77NiyAAFEzSd8UydgBVxlsOOdkXQ91NYTyB8D6ldUJAnLU
 opwCEpgpu74Sp4Te5q6f9uAt8xZmXsyrm8zJgzTz0KSgivcpt4GmIoyEFYUQczj0
 Hewr6+qM9AWfvfQxNvRCS25yeox18kbdp1qdp9rl0BZMtYN2Zsk1Ec4c79s7NBLc
 G3TIvJkGLHuZO1dO4BhLkYczgRYlaPxOR/0GKNn4m69/TbVmseUL1WeZS0pswB0q
 cH1AKKEg9KdPoaX0hTLoOrlv/vwbgjhKKuoqEv7yEUhJJdACy50rmnhWhSxeuQDb
 aIITVKkjkwpDtRX5QTdG1f5uIMoGz9BbUDv7VeodB0mrYHluXEfyNTwlqcISKAgm
 T9kLmsdfvMrQ4fLR5S3i3dwnL3b52OB8h5NyfW3YRkXEnA7//ef/XpPiW2HY8BMT
 7QwPqOoUSr/IraAcI8j0QxRpioUk1oaNi+UJ3FSHni8re6rZ0kaxatRCT20h6Djq
 C9RVLaevw3bGXQ==
 =ndhB
 -----END PGP SIGNATURE-----

Merge tag 'rcu-urgent.2022.07.21a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu

Pull RCU fix from Paul McKenney:
 "This contains a pair of commits that fix 282d8998e9 ("srcu: Prevent
  expedited GPs and blocking readers from consuming CPU"), which was
  itself a fix to an SRCU expedited grace-period problem that could
  prevent kernel live patching (KLP) from completing.

  That SRCU fix for KLP introduced large (as in minutes) boot-time
  delays to embedded Linux kernels running on qemu/KVM. These delays
  were due to the emulation of certain MMIO operations controlling
  memory layout, which were emulated with one expedited grace period per
  access. Common configurations required thousands of boot-time MMIO
  accesses, and thus thousands of boot-time expedited SRCU grace
  periods.

  In these configurations, the occasional sleeps that allowed KLP to
  proceed caused excessive boot delays. These commits preserve enough
  sleeps to permit KLP to proceed, but few enough that the virtual
  embedded kernels still boot reasonably quickly.

  This represents a regression introduced in the v5.19 merge window, and
  the bug is causing significant inconvenience"

* tag 'rcu-urgent.2022.07.21a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  srcu: Make expedited RCU grace periods block even less frequently
  srcu: Block less aggressively for expedited grace periods
2022-07-22 10:01:20 -07:00
Tianyu Lan
7231180903 swiotlb: clean up some coding style and minor issues
- Fix the used field of struct io_tlb_area wasn't initialized
- Set area number to be 0 if input area number parameter is 0
- Use array_size() to calculate io_tlb_area array size
- Make parameters of swiotlb_do_find_slots() more reasonable

Fixes: 26ffb91fa5e0 ("swiotlb: split up the global swiotlb lock")
Signed-off-by: Tianyu Lan <tiala@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-07-22 17:20:56 +02:00
Paul E. McKenney
d38c8fe483 Merge branches 'doc.2022.06.21a', 'fixes.2022.07.19a', 'nocb.2022.07.19a', 'poll.2022.07.21a', 'rcu-tasks.2022.06.21a' and 'torture.2022.06.21a' into HEAD
doc.2022.06.21a: Documentation updates.
fixes.2022.07.19a: Miscellaneous fixes.
nocb.2022.07.19a: Callback-offload updates.
poll.2022.07.21a: Polled grace-period updates.
rcu-tasks.2022.06.21a: Tasks RCU updates.
torture.2022.06.21a: Torture-test updates.
2022-07-21 17:43:16 -07:00
Joel Fernandes
b37a667c62 rcu/nocb: Add an option to offload all CPUs on boot
Systems built with CONFIG_RCU_NOCB_CPU=y but booted without either
the rcu_nocbs= or rcu_nohz_full= kernel-boot parameters will not have
callback offloading on any of the CPUs, nor can any of the CPUs be
switched to enable callback offloading at runtime.  Although this is
intentional, it would be nice to have a way to offload all the CPUs
without having to make random bootloaders specify either the rcu_nocbs=
or the rcu_nohz_full= kernel-boot parameters.

This commit therefore provides a new CONFIG_RCU_NOCB_CPU_DEFAULT_ALL
Kconfig option that switches the default so as to offload callback
processing on all of the CPUs.  This default can still be overridden
using the rcu_nocbs= and rcu_nohz_full= kernel-boot parameters.

Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Uladzislau Rezki <urezki@gmail.com>
(In v4.1, fixed issues with CONFIG maze reported by kernel test robot).
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
2022-07-19 11:43:34 -07:00
Neeraj Upadhyay
4f2bfd9494 srcu: Make expedited RCU grace periods block even less frequently
The purpose of commit 282d8998e9 ("srcu: Prevent expedited GPs
and blocking readers from consuming CPU") was to prevent a long
series of never-blocking expedited SRCU grace periods from blocking
kernel-live-patching (KLP) progress.  Although it was successful, it also
resulted in excessive boot times on certain embedded workloads running
under qemu with the "-bios QEMU_EFI.fd" command line.  Here "excessive"
means increasing the boot time up into the three-to-four minute range.
This increase in boot time was due to the more than 6000 back-to-back
invocations of synchronize_rcu_expedited() within the KVM host OS, which
in turn resulted from qemu's emulation of a long series of MMIO accesses.

Commit 640a7d37c3f4 ("srcu: Block less aggressively for expedited grace
periods") did not significantly help this particular use case.

Zhangfei Gao and Shameerali Kolothum Thodi did experiments varying the
value of SRCU_MAX_NODELAY_PHASE with HZ=250 and with various values
of non-sleeping per phase counts on a system with preemption enabled,
and observed the following boot times:

+──────────────────────────+────────────────+
| SRCU_MAX_NODELAY_PHASE   | Boot time (s)  |
+──────────────────────────+────────────────+
| 100                      | 30.053         |
| 150                      | 25.151         |
| 200                      | 20.704         |
| 250                      | 15.748         |
| 500                      | 11.401         |
| 1000                     | 11.443         |
| 10000                    | 11.258         |
| 1000000                  | 11.154         |
+──────────────────────────+────────────────+

Analysis on the experiment results show additional improvements with
CPU-bound delays approaching one jiffy in duration. This improvement was
also seen when number of per-phase iterations were scaled to one jiffy.

This commit therefore scales per-grace-period phase number of non-sleeping
polls so that non-sleeping polls extend for about one jiffy. In addition,
the delay-calculation call to srcu_get_delay() in srcu_gp_end() is
replaced with a simple check for an expedited grace period.  This change
schedules callback invocation immediately after expedited grace periods
complete, which results in greatly improved boot times.  Testing done
by Marc and Zhangfei confirms that this change recovers most of the
performance degradation in boottime; for CONFIG_HZ_250 configuration,
specifically, boot times improve from 3m50s to 41s on Marc's setup;
and from 2m40s to ~9.7s on Zhangfei's setup.

In addition to the changes to default per phase delays, this
change adds 3 new kernel parameters - srcutree.srcu_max_nodelay,
srcutree.srcu_max_nodelay_phase, and srcutree.srcu_retry_check_delay.
This allows users to configure the srcu grace period scanning delays in
order to more quickly react to additional use cases.

Fixes: 640a7d37c3f4 ("srcu: Block less aggressively for expedited grace periods")
Fixes: 282d8998e9 ("srcu: Prevent expedited GPs and blocking readers from consuming CPU")
Reported-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Reported-by: yueluck <yueluck@163.com>
Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Link: https://lore.kernel.org/all/20615615-0013-5adc-584f-2b1d5c03ebfc@linaro.org/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-07-19 11:39:59 -07:00
Jason A. Donenfeld
049f9ae93d x86/rdrand: Remove "nordrand" flag in favor of "random.trust_cpu"
The decision of whether or not to trust RDRAND is controlled by the
"random.trust_cpu" boot time parameter or the CONFIG_RANDOM_TRUST_CPU
compile time default. The "nordrand" flag was added during the early
days of RDRAND, when there were worries that merely using its values
could compromise the RNG. However, these days, RDRAND values are not
used directly but always go through the RNG's hash function, making
"nordrand" no longer useful.

Rather, the correct switch is "random.trust_cpu", which not only handles
the relevant trust issue directly, but also is general to multiple CPU
types, not just x86.

However, x86 RDRAND does have a history of being occasionally
problematic. Prior, when the kernel would notice something strange, it'd
warn in dmesg and suggest enabling "nordrand". We can improve on that by
making the test a little bit better and then taking the step of
automatically disabling RDRAND if we detect it's problematic.

Also disable RDSEED if the RDRAND test fails.

Cc: x86@kernel.org
Cc: Theodore Ts'o <tytso@mit.edu>
Suggested-by: H. Peter Anvin <hpa@zytor.com>
Suggested-by: Borislav Petkov <bp@suse.de>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-07-18 15:04:04 +02:00
Dan Moulding
5a704629f2 init: add "hostname" kernel parameter
The gethostname system call returns the hostname for the current machine. 
However, the kernel has no mechanism to initially set the current
machine's name in such a way as to guarantee that the first userspace
process to call gethostname will receive a meaningful result.  It relies
on some unspecified userspace process to first call sethostname before
gethostname can produce a meaningful name.

Traditionally the machine's hostname is set from userspace by the init
system.  The init system, in turn, often relies on a configuration file
(say, /etc/hostname) to provide the value that it will supply in the call
to sethostname.  Consequently, the file system containing /etc/hostname
usually must be available before the hostname will be set.  There may,
however, be earlier userspace processes that could call gethostname before
the file system containing /etc/hostname is mounted.  Such a process will
get some other, likely meaningless, name from gethostname (such as
"(none)", "localhost", or "darkstar").

A real-world example where this can happen, and lead to undesirable
results, is with mdadm.  When assembling arrays, mdadm distinguishes
between "local" arrays and "foreign" arrays.  A local array is one that
properly belongs to the current machine, and a foreign array is one that
is (possibly temporarily) attached to the current machine, but properly
belongs to some other machine.  To determine if an array is local or
foreign, mdadm may compare the "homehost" recorded on the array with the
current hostname.  If mdadm is run before the root file system is mounted,
perhaps because the root file system itself resides on an md-raid array,
then /etc/hostname isn't yet available and the init system will not yet
have called sethostname, causing mdadm to incorrectly conclude that all of
the local arrays are foreign.

Solving this problem *could* be delegated to the init system.  It could be
left up to the init system (including any init system that starts within
an initramfs, if one is in use) to ensure that sethostname is called
before any other userspace process could possibly call gethostname. 
However, it may not always be obvious which processes could call
gethostname (for example, udev itself might not call gethostname, but it
could via udev rules invoke processes that do).  Additionally, the init
system has to ensure that the hostname configuration value is stored in
some place where it will be readily accessible during early boot. 
Unfortunately, every init system will attempt to (or has already attempted
to) solve this problem in a different, possibly incorrect, way.  This
makes getting consistently working configurations harder for users.

I believe it is better for the kernel to provide the means by which the
hostname may be set early, rather than making this a problem for the init
system to solve.  The option to set the hostname during early startup, via
a kernel parameter, provides a simple, reliable way to solve this problem.
It also could make system configuration easier for some embedded systems.

[dmoulding@me.com: v2]
  Link: https://lkml.kernel.org/r/20220506060310.7495-2-dmoulding@me.com
Link: https://lkml.kernel.org/r/20220505180651.22849-2-dmoulding@me.com
Signed-off-by: Dan Moulding <dmoulding@me.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17 17:31:37 -07:00
Yunke Cao
39146d1141 media: vimc: documentation for lens
Add documentation for vimc-lens.
Add a lens into the vimc topology graph.

Signed-off-by: Yunke Cao <yunkec@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-07-15 14:44:51 +01:00
Jakub Kicinski
816cd16883 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
include/net/sock.h
  310731e2f1 ("net: Fix data-races around sysctl_mem.")
  e70f3c7012 ("Revert "net: set SK_MEM_QUANTUM to 4096"")
https://lore.kernel.org/all/20220711120211.7c8b7cba@canb.auug.org.au/

net/ipv4/fib_semantics.c
  747c143072 ("ip: fix dflt addr selection for connected nexthop")
  d62607c3fe ("net: rename reference+tracking helpers")

net/tls/tls.h
include/net/tls.h
  3d8c51b25a ("net/tls: Check for errors in tls_device_init")
  5879031423 ("tls: create an internal header")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-14 15:27:35 -07:00
Lukas Bulwahn
3cb5e51686 docs: admin: devices: drop confusing outdated statement on Latex
The statement that the Latex version of Linux Device List is unmaintained,
must go back to some historic time where there were actually two separate
document sources, one in a txt format and another in a tex format (maybe
even just on lanana.org).

Nowadays, html and tex are generated from the ReST document file and the
statement might be confused, believing the actual generated LaTeX document
from the maintained ReST document could be an unmaintained one.

Remove this statement on the LaTeX version and only keep pointing out that
the version on lanana.org is no longer maintained.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Link: https://lore.kernel.org/r/20220704122537.3407-6-lukas.bulwahn@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-07-14 15:03:56 -06:00
Mikulas Patocka
2ee73ef60d dm writecache: count number of blocks discarded, not number of discard bios
Change dm-writecache, so that it counts the number of blocks discarded
instead of the number of discard bios. Make it consistent with the
read and write statistics counters that were changed to count the
number of blocks instead of bios.

Fixes: e3a35d0340 ("dm writecache: add event counters")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-07-14 15:54:46 -04:00
Mikulas Patocka
b2676e1482 dm writecache: count number of blocks written, not number of write bios
Change dm-writecache, so that it counts the number of blocks written
instead of the number of write bios. Bios can be split and requeued
using the dm_accept_partial_bio function, so counting bios caused
inaccurate results.

Fixes: e3a35d0340 ("dm writecache: add event counters")
Reported-by: Yu Kuai <yukuai1@huaweicloud.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-07-14 15:53:25 -04:00
Mikulas Patocka
2c6e755b49 dm writecache: count number of blocks read, not number of read bios
Change dm-writecache, so that it counts the number of blocks read
instead of the number of read bios. Bios can be split and requeued
using the dm_accept_partial_bio function, so counting bios caused
inaccurate results.

Fixes: e3a35d0340 ("dm writecache: add event counters")
Reported-by: Yu Kuai <yukuai1@huaweicloud.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-07-14 15:52:32 -04:00
Tianyu Lan
20347fca71 swiotlb: split up the global swiotlb lock
Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs this can lead to
significat lock contention on the swiotlb lock.

This patch splits the swiotlb bounce buffer pool into individual areas
which have their own lock. Each CPU tries to allocate in its own area
first. Only if that fails does it search other areas. On freeing the
allocation is freed into the area where the memory was originally
allocated from.

Area number can be set via swiotlb kernel parameter and is default
to be possible cpu number. If possible cpu number is not power of
2, area number will be round up to the next power of 2.

This idea from Andi Kleen patch(https://github.com/intel/tdx/commit/
4529b5784c141782c72ec9bd9a92df2b68cb7d45).

Based-on-idea-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-07-13 13:23:10 +02:00
Saravana Kannan
ae39e9ed96 module: Add support for default value for module async_probe
Add a module.async_probe kernel command line option that allows enabling
async probing for all modules. When this command line option is used,
there might still be some modules for which we want to explicitly force
synchronous probing, so extend <modulename>.async_probe to take an
optional bool input so that async probing can be disabled for a specific
module.

Signed-off-by: Saravana Kannan <saravanak@google.com>
Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2022-07-11 10:49:14 -07:00
Antoine Tenart
40ad0a52ef Documentation: add a description for net.core.high_order_alloc_disable
A description is missing for the net.core.high_order_alloc_disable
option in admin-guide/sysctl/net.rst ; add it. The above sysctl option
was introduced by commit ce27ec6064 ("net: add high_order_alloc_disable
sysctl/static key").

Thanks to Eric for running again the benchmark cited in the above
commit, showing this knob is now mostly of historical importance.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220707080245.180525-1-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08 20:15:59 -07:00
Mauro Carvalho Chehab
7ac3945d8e Documentation: KVM: update amd-memory-encryption.rst references
Changeset daec8d4083 ("Documentation: KVM: add separate directories for architecture-specific documentation")
renamed: Documentation/virt/kvm/amd-memory-encryption.rst
to: Documentation/virt/kvm/x86/amd-memory-encryption.rst.

Update the cross-references accordingly.

Fixes: daec8d4083 ("Documentation: KVM: add separate directories for architecture-specific documentation")
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Link: https://lore.kernel.org/r/fd80db889e34aae87a4ca88cad94f650723668f4.1656234456.git.mchehab@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-07-07 13:09:59 -06:00
Bagas Sanjaya
11093e6f0d Documentation: dm writecache: Render status list as list
The status list isn't rendered as list, but rather as normal paragraph,
because there is missing blank line between "Status:" line and the list.

Fix the issue by adding the blank line separator.

Fixes: 48debafe4f ("dm: add writecache target")
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-07-07 14:40:32 -04:00
Mauro Carvalho Chehab
8b301af4c6 Documentation: dm writecache: add blank line before optional parameters
Otherwise this warning occurs:
  Documentation/admin-guide/device-mapper/writecache.rst:23: WARNING: Unexpected indentation.

Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-07-07 14:37:58 -04:00
Will Deacon
aaaee7b55c docs: perf: Include hns3-pmu.rst in toctree to fix 'htmldocs' WARNING
After commit 39915b6b5f ("drivers/perf: hisi: Add description for HNS3
PMU driver"),building the 'htmldocs' target results in the following
warning:

 | Documentation/admin-guide/perf/hns3-pmu.rst: WARNING: document isn't included in any toctree

Add 'hns3-pmu' to the perf toctree to silence the warning.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Will Deacon <will@kernel.org>
2022-07-07 11:25:40 +01:00
Suravee Suthikulpanit
bbe3a10658 iommu/amd: Add PCI segment support for ivrs_[ioapic/hpet/acpihid] commands
By default, PCI segment is zero and can be omitted. To support system
with non-zero PCI segment ID, modify the parsing functions to allow
PCI segment ID.

Co-developed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/20220706113825.25582-33-vasant.hegde@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-07-07 09:37:53 +02:00
Guangbin Huang
39915b6b5f drivers/perf: hisi: Add description for HNS3 PMU driver
HNS3 PMU End Point device is supported on HiSilicon HIP09 platform, so
add document hns3-pmu.rst to provide guidance on how to use it.

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Reviewed-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Link: https://lore.kernel.org/r/20220628063419.38514-2-huangguangbin2@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-07-06 11:25:53 +01:00
Muchun Song
6636109512 mm: memory_hotplug: make hugetlb_optimize_vmemmap compatible with memmap_on_memory
For now, the feature of hugetlb_free_vmemmap is not compatible with the
feature of memory_hotplug.memmap_on_memory, and hugetlb_free_vmemmap takes
precedence over memory_hotplug.memmap_on_memory.  However, someone wants
to make memory_hotplug.memmap_on_memory takes precedence over
hugetlb_free_vmemmap since memmap_on_memory makes it more likely to
succeed memory hotplug in close-to-OOM situations.  So the decision of
making hugetlb_free_vmemmap take precedence is not wise and elegant.

The proper approach is to have hugetlb_vmemmap.c do the check whether the
section which the HugeTLB pages belong to can be optimized.  If the
section's vmemmap pages are allocated from the added memory block itself,
hugetlb_free_vmemmap should refuse to optimize the vmemmap, otherwise, do
the optimization.  Then both kernel parameters are compatible.  So this
patch introduces VmemmapSelfHosted to mask any non-optimizable vmemmap
pages.  The hugetlb_vmemmap can use this flag to detect if a vmemmap page
can be optimized.

[songmuchun@bytedance.com: walk vmemmap page tables to avoid false-positive]
  Link: https://lkml.kernel.org/r/20220620110616.12056-3-songmuchun@bytedance.com
Link: https://lkml.kernel.org/r/20220617135650.74901-3-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Co-developed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03 18:08:49 -07:00
SeongJae Park
6acfcd0d75 Docs/admin-guide/damon: add a document for DAMON_LRU_SORT
This commit documents the usage of DAMON_LRU_SORT for admins.

Link: https://lkml.kernel.org/r/20220613192301.8817-10-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03 18:08:43 -07:00
SeongJae Park
b57e39a743 Docs/admin-guide/damon/sysfs: document 'LRU_DEPRIO' scheme action
This commit documents the 'LRU_DEPRIO' scheme action for DAMON sysfs
interface.`

Link: https://lkml.kernel.org/r/20220613192301.8817-8-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03 18:08:43 -07:00
SeongJae Park
0bcba960b1 Docs/admin-guide/damon/sysfs: document 'LRU_PRIO' scheme action
This commit documents the 'lru_prio' scheme action for DAMON sysfs
interface.

Link: https://lkml.kernel.org/r/20220613192301.8817-6-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03 18:08:43 -07:00
Roman Gushchin
bbf535fd6f mm: shrinkers: add scan interface for shrinker debugfs
Add a scan interface which allows to trigger scanning of a particular
shrinker and specify memcg and numa node.  It's useful for testing,
debugging and profiling of a specific scan_objects() callback.  Unlike
alternatives (creating a real memory pressure and dropping caches via
/proc/sys/vm/drop_caches) this interface allows to interact with only one
shrinker at once.  Also, if a shrinker is misreporting the number of
objects (as some do), it doesn't affect scanning.

[roman.gushchin@linux.dev: improve typing, fix arg count checking]
  Link: https://lkml.kernel.org/r/YpgKttTowT22mKPQ@carbon
[akpm@linux-foundation.org: fix arg count checking]
Link: https://lkml.kernel.org/r/20220601032227.4076670-7-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Muchun Song <songmuchun@bytedance.com>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03 18:08:41 -07:00
Roman Gushchin
7507f0991d mm: docs: document shrinker debugfs
Add a document describing the shrinker debugfs interface.

Link: https://lkml.kernel.org/r/20220601032227.4076670-5-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03 18:08:40 -07:00
SeongJae Park
2054980125 Docs/admin-guide/damon/reclaim: remove a paragraph that been obsolete due to online tuning support
Patch series "mm/damon: trivial cleanups".

This patchset contains trivial cleansups for DAMON code.


This patch (of 6):

Commit 81a84182c3 ("Docs/admin-guide/mm/damon/reclaim: document
'commit_inputs' parameter") has documented the 'commit_inputs' parameter
which allows online parameter update, but it didn't remove a paragraph
saying the online parameter update is impossible.  This commit removes the
obsolete paragraph.

Link: https://lkml.kernel.org/r/20220606182310.48781-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20220606182310.48781-2-sj@kernel.org
Fixes: 81a84182c3 ("Docs/admin-guide/mm/damon/reclaim: document 'commit_inputs' parameter")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03 18:08:37 -07:00
David Gow
2852ca7fba panic: Taint kernel if tests are run
Most in-kernel tests (such as KUnit tests) are not supposed to run on
production systems: they may do deliberately illegal things to trigger
errors, and have security implications (for example, KUnit assertions
will often deliberately leak kernel addresses).

Add a new taint type, TAINT_TEST to signal that a test has been run.
This will be printed as 'N' (originally for kuNit, as every other
sensible letter was taken.)

This should discourage people from running these tests on production
systems, and to make it easier to tell if tests have been run
accidentally (by loading the wrong configuration, etc.)

Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-07-01 16:38:35 -06:00
Marc Zyngier
504ee23611 arm64: Add the arm64.nosve command line option
In order to be able to completely disable SVE even if the HW
seems to support it (most likely because the FW is broken),
move the SVE setup into the EL2 finalisation block, and
use a new idreg override to deal with it.

Note that we also nuke id_aa64zfr0_el1 as a byproduct, and
that SME also gets disabled, due to the dependency between the
two features.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220630160500.1536744-9-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-07-01 15:22:52 +01:00
Marc Zyngier
b3000e2133 arm64: Add the arm64.nosme command line option
In order to be able to completely disable SME even if the HW
seems to support it (most likely because the FW is broken),
move the SME setup into the EL2 finalisation block, and
use a new idreg override to deal with it.

Note that we also nuke id_aa64smfr0_el1 as a byproduct.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220630160500.1536744-8-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-07-01 15:22:52 +01:00
Matthew Wilcox (Oracle)
2bb876b58d filemap: Remove add_to_page_cache() and add_to_page_cache_locked()
These functions have no more users, so delete them.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
2022-06-29 08:51:05 -04:00
Christophe Leroy
56e54b4e6c powerpc/32: Remove 'noltlbs' kernel parameter
Mapping without large TLBs has no added value on the 8xx.

Mapping without large TLBs is still necessary on 40x when
selecting CONFIG_KFENCE or CONFIG_DEBUG_PAGEALLOC or
CONFIG_STRICT_KERNEL_RWX, but this is done automatically
and doesn't require user selection.

Remove 'noltlbs' kernel parameter, the user has no reason
to use it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/80ca17bd39cf608a8ebd0764d7064a498e131199.1655202721.git.christophe.leroy@csgroup.eu
2022-06-29 16:59:06 +10:00
Christophe Leroy
1ce844973b powerpc/32: Remove the 'nobats' kernel parameter
Mapping without BATs doesn't bring any added value to the user.

Remove that option.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6977314c823cfb728bc0273cea634b41807bfb64.1655202721.git.christophe.leroy@csgroup.eu
2022-06-29 16:59:06 +10:00
Liu Song
e92b25731e arm64: correct the effect of mitigations off on kpti
If KASLR is enabled, then kpti will be forced to be enabled even if
mitigations off, so we need to adjust the description of this parameter.

Signed-off-by: Liu Song <liusong@linux.alibaba.com>
Link: https://lore.kernel.org/r/1656033648-84181-1-git-send-email-liusong@linux.alibaba.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-06-28 15:08:01 +01:00
Mike Rapoport
ee65728e10 docs: rename Documentation/vm to Documentation/mm
so it will be consistent with code mm directory and with
Documentation/admin-guide/mm and won't be confused with virtual machines.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Jonathan Corbet <corbet@lwn.net>
Acked-by: Wu XiangCheng <bobwxc@email.cn>
2022-06-27 12:52:53 -07:00
akpm
46a3b11253 Merge branch 'master' into mm-stable 2022-06-27 10:31:34 -07:00
Peter Zijlstra
3ebc170068 x86/bugs: Add retbleed=ibpb
jmp2ret mitigates the easy-to-attack case at relatively low overhead.
It mitigates the long speculation windows after a mispredicted RET, but
it does not mitigate the short speculation window from arbitrary
instruction boundaries.

On Zen2, there is a chicken bit which needs setting, which mitigates
"arbitrary instruction boundaries" down to just "basic block boundaries".

But there is no fix for the short speculation window on basic block
boundaries, other than to flush the entire BTB to evict all attacker
predictions.

On the spectrum of "fast & blurry" -> "safe", there is (on top of STIBP
or no-SMT):

  1) Nothing		System wide open
  2) jmp2ret		May stop a script kiddy
  3) jmp2ret+chickenbit  Raises the bar rather further
  4) IBPB		Only thing which can count as "safe".

Tentative numbers put IBPB-on-entry at a 2.5x hit on Zen2, and a 10x hit
on Zen1 according to lmbench.

  [ bp: Fixup feature bit comments, document option, 32-bit build fix. ]

Suggested-by: Andrew Cooper <Andrew.Cooper3@citrix.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27 10:34:00 +02:00
Pawan Gupta
7c693f54c8 x86/speculation: Add spectre_v2=ibrs option to support Kernel IBRS
Extend spectre_v2= boot option with Kernel IBRS.

  [jpoimboe: no STIBP with IBRS]

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27 10:33:59 +02:00
Kim Phillips
e8ec1b6e08 x86/bugs: Enable STIBP for JMP2RET
For untrained return thunks to be fully effective, STIBP must be enabled
or SMT disabled.

Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27 10:33:59 +02:00
Alexandre Chartre
7fbf47c7ce x86/bugs: Add AMD retbleed= boot parameter
Add the "retbleed=<value>" boot parameter to select a mitigation for
RETBleed. Possible values are "off", "auto" and "unret"
(JMP2RET mitigation). The default value is "auto".

Currently, "retbleed=auto" will select the unret mitigation on
AMD and Hygon and no mitigation on Intel (JMP2RET is not effective on
Intel).

  [peterz: rebase; add hygon]
  [jpoimboe: cleanups]

Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27 10:33:59 +02:00
Stephen Kitt
30fb876141 docs: admin-guide/sysctl: Fix rendering error
Text in ``literal`` markup must be separated by word separators, so text
like ``lowwater``% renders incorrectly.  Add the suggested "\ " after two
problematic occurrences.

Signed-off-by: Stephen Kitt <steve@sk2.org>
Link: https://lore.kernel.org/r/20220624110230.595740-1-steve@sk2.org
[jc: tweaked to use "\ "]
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-06-24 13:07:37 -06:00
David Matlack
ada51a9de7 KVM: x86/mmu: Extend Eager Page Splitting to nested MMUs
Add support for Eager Page Splitting pages that are mapped by nested
MMUs. Walk through the rmap first splitting all 1GiB pages to 2MiB
pages, and then splitting all 2MiB pages to 4KiB pages.

Note, Eager Page Splitting is limited to nested MMUs as a policy rather
than due to any technical reason (the sp->role.guest_mode check could
just be deleted and Eager Page Splitting would work correctly for all
shadow MMU pages). There is really no reason to support Eager Page
Splitting for tdp_mmu=N, since such support will eventually be phased
out, and there is no current use case supporting Eager Page Splitting on
hosts where TDP is either disabled or unavailable in hardware.
Furthermore, future improvements to nested MMU scalability may diverge
the code from the legacy shadow paging implementation. These
improvements will be simpler to make if Eager Page Splitting does not
have to worry about legacy shadow paging.

Splitting huge pages mapped by nested MMUs requires dealing with some
extra complexity beyond that of the TDP MMU:

(1) The shadow MMU has a limit on the number of shadow pages that are
    allowed to be allocated. So, as a policy, Eager Page Splitting
    refuses to split if there are KVM_MIN_FREE_MMU_PAGES or fewer
    pages available.

(2) Splitting a huge page may end up re-using an existing lower level
    shadow page tables. This is unlike the TDP MMU which always allocates
    new shadow page tables when splitting.

(3) When installing the lower level SPTEs, they must be added to the
    rmap which may require allocating additional pte_list_desc structs.

Case (2) is especially interesting since it may require a TLB flush,
unlike the TDP MMU which can fully split huge pages without any TLB
flushes. Specifically, an existing lower level page table may point to
even lower level page tables that are not fully populated, effectively
unmapping a portion of the huge page, which requires a flush.  As of
this commit, a flush is always done always after dropping the huge page
and before installing the lower level page table.

This TLB flush could instead be delayed until the MMU lock is about to be
dropped, which would batch flushes for multiple splits.  However these
flushes should be rare in practice (a huge page must be aliased in
multiple SPTEs and have been split for NX Huge Pages in only some of
them). Flushing immediately is simpler to plumb and also reduces the
chances of tripping over a CPU bug (e.g. see iTLB multihit).

[ This commit is based off of the original implementation of Eager Page
  Splitting from Peter in Google's kernel from 2016. ]

Suggested-by: Peter Feiner <pfeiner@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220516232138.1783324-23-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24 04:52:00 -04:00
Paul E. McKenney
89f7f29140 doc: Document rcutree.nocb_nobypass_lim_per_jiffy kernel parameter
This commit provides documentation for the kernel parameter controlling
RCU's handling of callback floods on offloaded (rcu_nocbs) CPUs.
This parameter might be obscure, but it is always there when you need it.

Reported-by: Frederic Weisbecker <frederic@kernel.org>
Reported-by: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
2022-06-21 11:57:53 -07:00
Paul E. McKenney
71de1e34f1 doc: Document the rcutree.rcu_divisor kernel boot parameter
This commit adds kernel-parameters.txt documentation for the
rcutree.rcu_divisor kernel boot parameter, which controls the softirq
callback-invocation batch limit.

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
2022-06-21 11:57:09 -07:00
Hans Verkuil
d7365ae8ea media: vivid.rst: document HDMI Video Guard Band control
Document this new vivid test control.

Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-06-20 10:30:31 +01:00
Qi Zheng
673520f8da mm: memcontrol: add {pgscan,pgsteal}_{kswapd,direct} items in memory.stat of cgroup v2
There are already statistics of {pgscan,pgsteal}_kswapd and
{pgscan,pgsteal}_direct of memcg event here, but now only the sum of the
two is displayed in memory.stat of cgroup v2.

In order to obtain more accurate information during monitoring and
debugging, and to align with the display in /proc/vmstat, it better to
display {pgscan,pgsteal}_kswapd and {pgscan,pgsteal}_direct separately.

Also, for forward compatibility, we still display pgscan and pgsteal items
so that it won't break existing applications.

[zhengqi.arch@bytedance.com: add comment for memcg_vm_event_stat (suggested by Michal)]
  Link: https://lkml.kernel.org/r/20220606154028.55030-1-zhengqi.arch@bytedance.com
[zhengqi.arch@bytedance.com: fix the doc, thanks to Johannes]
  Link: https://lkml.kernel.org/r/20220607064803.79363-1-zhengqi.arch@bytedance.com
Link: https://lkml.kernel.org/r/20220604082209.55174-1-zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16 19:48:29 -07:00
Linus Torvalds
24625f7d91 ARM64:
* Properly reset the SVE/SME flags on vcpu load
 
 * Fix a vgic-v2 regression regarding accessing the pending
 state of a HW interrupt from userspace (and make the code
 common with vgic-v3)
 
 * Fix access to the idreg range for protected guests
 
 * Ignore 'kvm-arm.mode=protected' when using VHE
 
 * Return an error from kvm_arch_init_vm() on allocation failure
 
 * A bunch of small cleanups (comments, annotations, indentation)
 
 RISC-V:
 
 * Typo fix in arch/riscv/kvm/vmid.c
 
 * Remove broken reference pattern from MAINTAINERS entry
 
 x86-64:
 
 * Fix error in page tables with MKTME enabled
 
 * Dirty page tracking performance test extended to running a nested
   guest
 
 * Disable APICv/AVIC in cases that it cannot implement correctly
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmKjTIAUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNhPQgAiIVtp8aepujUM/NhkNyK3SIdLzlS
 oZCZiS6bvaecKXi/QvhBU0EBxAEyrovk3lmVuYNd41xI+PDjyaA4SDIl5DnToGUw
 bVPNFSYqjpF939vUUKjc0RCdZR4o5g3Od3tvWoHTHviS1a8aAe5o9pcpHpD0D6Mp
 Gc/o58nKAOPl3htcFKmjymqo3Y6yvkJU9NB7DCbL8T5mp5pJ959Mw1/LlmBaAzJC
 OofrynUm4NjMyAj/mAB1FhHKFyQfjBXLhiVlS0SLiiEA/tn9/OXyVFMKG+n5VkAZ
 Q337GMFe2RikEIuMEr3Rc4qbZK3PpxHhaj+6MPRuM0ho/P4yzl2Nyb/OhA==
 =h81Q
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "While last week's pull request contained miscellaneous fixes for x86,
  this one covers other architectures, selftests changes, and a bigger
  series for APIC virtualization bugs that were discovered during 5.20
  development. The idea is to base 5.20 development for KVM on top of
  this tag.

  ARM64:

   - Properly reset the SVE/SME flags on vcpu load

   - Fix a vgic-v2 regression regarding accessing the pending state of a
     HW interrupt from userspace (and make the code common with vgic-v3)

   - Fix access to the idreg range for protected guests

   - Ignore 'kvm-arm.mode=protected' when using VHE

   - Return an error from kvm_arch_init_vm() on allocation failure

   - A bunch of small cleanups (comments, annotations, indentation)

  RISC-V:

   - Typo fix in arch/riscv/kvm/vmid.c

   - Remove broken reference pattern from MAINTAINERS entry

  x86-64:

   - Fix error in page tables with MKTME enabled

   - Dirty page tracking performance test extended to running a nested
     guest

   - Disable APICv/AVIC in cases that it cannot implement correctly"

[ This merge also fixes a misplaced end parenthesis bug introduced in
  commit 3743c2f025 ("KVM: x86: inhibit APICv/AVIC on changes to APIC
  ID or APIC base") pointed out by Sean Christopherson ]

Link: https://lore.kernel.org/all/20220610191813.371682-1-seanjc@google.com/

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (34 commits)
  KVM: selftests: Restrict test region to 48-bit physical addresses when using nested
  KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2
  KVM: selftests: Clean up LIBKVM files in Makefile
  KVM: selftests: Link selftests directly with lib object files
  KVM: selftests: Drop unnecessary rule for STATIC_LIBS
  KVM: selftests: Add a helper to check EPT/VPID capabilities
  KVM: selftests: Move VMX_EPT_VPID_CAP_AD_BITS to vmx.h
  KVM: selftests: Refactor nested_map() to specify target level
  KVM: selftests: Drop stale function parameter comment for nested_map()
  KVM: selftests: Add option to create 2M and 1G EPT mappings
  KVM: selftests: Replace x86_page_size with PG_LEVEL_XX
  KVM: x86: SVM: fix nested PAUSE filtering when L0 intercepts PAUSE
  KVM: x86: SVM: drop preempt-safe wrappers for avic_vcpu_load/put
  KVM: x86: disable preemption around the call to kvm_arch_vcpu_{un|}blocking
  KVM: x86: disable preemption while updating apicv inhibition
  KVM: x86: SVM: fix avic_kick_target_vcpus_fast
  KVM: x86: SVM: remove avic's broken code that updated APIC ID
  KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base
  KVM: x86: document AVIC/APICv inhibit reasons
  KVM: x86/mmu: Set memory encryption "value", not "mask", in shadow PDPTRs
  ...
2022-06-14 07:57:18 -07:00
Linus Torvalds
8e8afafb0b Yet another hw vulnerability with a software mitigation: Processor MMIO
Stale Data.
 
 They are a class of MMIO-related weaknesses which can expose stale data
 by propagating it into core fill buffers. Data which can then be leaked
 using the usual speculative execution methods.
 
 Mitigations include this set along with microcode updates and are
 similar to MDS and TAA vulnerabilities: VERW now clears those buffers
 too.
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmKXMkMTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoWGPD/idalLIhhV5F2+hZIKm0WSnsBxAOh9K
 7y8xBxpQQ5FUfW3vm7Pg3ro6VJp7w2CzKoD4lGXzGHriusn3qst3vkza9Ay8xu8g
 RDwKe6hI+p+Il9BV9op3f8FiRLP9bcPMMReW/mRyYsOnJe59hVNwRAL8OG40PY4k
 hZgg4Psfvfx8bwiye5efjMSe4fXV7BUCkr601+8kVJoiaoszkux9mqP+cnnB5P3H
 zW1d1jx7d6eV1Y063h7WgiNqQRYv0bROZP5BJkufIoOHUXDpd65IRF3bDnCIvSEz
 KkMYJNXb3qh7EQeHS53NL+gz2EBQt+Tq1VH256qn6i3mcHs85HvC68gVrAkfVHJE
 QLJE3MoXWOqw+mhwzCRrEXN9O1lT/PqDWw8I4M/5KtGG/KnJs+bygmfKBbKjIVg4
 2yQWfMmOgQsw3GWCRjgEli7aYbDJQjany0K/qZTq54I41gu+TV8YMccaWcXgDKrm
 cXFGUfOg4gBm4IRjJ/RJn+mUv6u+/3sLVqsaFTs9aiib1dpBSSUuMGBh548Ft7g2
 5VbFVSDaLjB2BdlcG7enlsmtzw0ltNssmqg7jTK/L7XNVnvxwUoXw+zP7RmCLEYt
 UV4FHXraMKNt2ZketlomC8ui2hg73ylUp4pPdMXCp7PIXp9sVamRTbpz12h689VJ
 /s55bWxHkR6S
 =LBxT
 -----END PGP SIGNATURE-----

Merge tag 'x86-bugs-2022-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 MMIO stale data fixes from Thomas Gleixner:
 "Yet another hw vulnerability with a software mitigation: Processor
  MMIO Stale Data.

  They are a class of MMIO-related weaknesses which can expose stale
  data by propagating it into core fill buffers. Data which can then be
  leaked using the usual speculative execution methods.

  Mitigations include this set along with microcode updates and are
  similar to MDS and TAA vulnerabilities: VERW now clears those buffers
  too"

* tag 'x86-bugs-2022-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/speculation/mmio: Print SMT warning
  KVM: x86/speculation: Disable Fill buffer clear within guests
  x86/speculation/mmio: Reuse SRBDS mitigation for SBDS
  x86/speculation/srbds: Update SRBDS mitigation selection
  x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data
  x86/speculation/mmio: Enable CPU Fill buffer clearing on idle
  x86/bugs: Group MDS, TAA & Processor MMIO Stale Data mitigations
  x86/speculation/mmio: Add mitigation for Processor MMIO Stale Data
  x86/speculation: Add a common function for MD_CLEAR mitigation update
  x86/speculation/mmio: Enumerate Processor MMIO Stale Data bug
  Documentation: Add documentation for Processor MMIO Stale Data
2022-06-14 07:43:15 -07:00
Randy Dunlap
8d6d51edcb docs: selinux: add '=' signs to kernel boot options
Provide the full kernel boot option string (with ending '=' sign).
They won't work without that and that is how other boot options are
listed.

If used without an '=' sign (as listed here), they cause an "Unknown
parameters" message and are added to init's argument strings,
polluting them.

  Unknown kernel command line parameters "enforcing checkreqprot
    BOOT_IMAGE=/boot/bzImage-517rc6", will be passed to user space.

 Run /sbin/init as init process
   with arguments:
     /sbin/init
     enforcing
     checkreqprot
   with environment:
     HOME=/
     TERM=linux
     BOOT_IMAGE=/boot/bzImage-517rc6

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Eric Paris <eparis@parisplace.org>
Cc: selinux@vger.kernel.org
Cc: Jonathan Corbet <corbet@lwn.net>
[PM: removed bogus 'Fixes' line]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-06-13 14:59:53 -04:00
Will Deacon
cde5042adf KVM: arm64: Ignore 'kvm-arm.mode=protected' when using VHE
Ignore 'kvm-arm.mode=protected' when using VHE so that kvm_get_mode()
only returns KVM_MODE_PROTECTED on systems where the feature is available.

Cc: David Brazdil <dbrazdil@google.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220609121223.2551-4-will@kernel.org
2022-06-09 13:24:02 +01:00
Wyes Karny
8bcedb4ce0 x86: Handle idle=nomwait cmdline properly for x86_idle
When kernel is booted with idle=nomwait do not use MWAIT as the
default idle state.

If the user boots the kernel with idle=nomwait, it is a clear
direction to not use mwait as the default idle state.
However, the current code does not take this into consideration
while selecting the default idle state on x86.

Fix it by checking for the idle=nomwait boot option in
prefer_mwait_c1_over_halt().

Also update the documentation around idle=nomwait appropriately.

[ dhansen: tweak commit message ]

Signed-off-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Link: https://lkml.kernel.org/r/fdc2dc2d0a1bc21c2f53d989ea2d2ee3ccbc0dbe.1654538381.git-series.wyes.karny@amd.com
2022-06-08 12:58:58 -07:00
Linus Torvalds
500a434fc5 Driver core changes for 5.19-rc1
Here is the set of driver core changes for 5.19-rc1.
 
 Note, I'm not really happy with this pull request as-is, see below for
 details, but overall this is all good for everything but a small set of
 systems, which we have a fix for already.
 
 Lots of tiny driver core changes and cleanups happened this cycle,
 but the two major things were:
 
 	- firmware_loader reorganization and additions including the
 	  ability to have XZ compressed firmware images and the ability
 	  for userspace to initiate the firmware load when it needs to,
 	  instead of being always initiated by the kernel. FPGA devices
 	  specifically want this ability to have their firmware changed
 	  over the lifetime of the system boot, and this allows them to
 	  work without having to come up with yet-another-custom-uapi
 	  interface for loading firmware for them.
 	- physical location support added to sysfs so that devices that
 	  know this information, can tell userspace where they are
 	  located in a common way.  Some ACPI devices already support
 	  this today, and more bus types should support this in the
 	  future.
 
 Smaller changes included:
 	- driver_override api cleanups and fixes
 	- error path cleanups and fixes
 	- get_abi script fixes
 	- deferred probe timeout changes.
 
 It's that last change that I'm the most worried about.  It has been
 reported to cause boot problems for a number of systems, and I have a
 tested patch series that resolves this issue.  But I didn't get it
 merged into my tree before 5.18-final came out, so it has not gotten any
 linux-next testing.
 
 I'll send the fixup patches (there are 2) as a follow-on series to this
 pull request if you want to take them directly, _OR_ I can just revert
 the probe timeout changes and they can wait for the next -rc1 merge
 cycle.  Given that the fixes are tested, and pretty simple, I'm leaning
 toward that choice.  Sorry this all came at the end of the merge window,
 I should have resolved this all 2 weeks ago, that's my fault as it was
 in the middle of some travel for me.
 
 All have been tested in linux-next for weeks, with no reported issues
 other than the above-mentioned boot time outs.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYpnv/A8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+yk/fACgvmenbo5HipqyHnOmTQlT50xQ9EYAn2eTq6ai
 GkjLXBGNWOPBa5cU52qf
 =yEi/
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the set of driver core changes for 5.19-rc1.

  Lots of tiny driver core changes and cleanups happened this cycle, but
  the two major things are:

   - firmware_loader reorganization and additions including the ability
     to have XZ compressed firmware images and the ability for userspace
     to initiate the firmware load when it needs to, instead of being
     always initiated by the kernel. FPGA devices specifically want this
     ability to have their firmware changed over the lifetime of the
     system boot, and this allows them to work without having to come up
     with yet-another-custom-uapi interface for loading firmware for
     them.

   - physical location support added to sysfs so that devices that know
     this information, can tell userspace where they are located in a
     common way. Some ACPI devices already support this today, and more
     bus types should support this in the future.

  Smaller changes include:

   - driver_override api cleanups and fixes

   - error path cleanups and fixes

   - get_abi script fixes

   - deferred probe timeout changes.

  It's that last change that I'm the most worried about. It has been
  reported to cause boot problems for a number of systems, and I have a
  tested patch series that resolves this issue. But I didn't get it
  merged into my tree before 5.18-final came out, so it has not gotten
  any linux-next testing.

  I'll send the fixup patches (there are 2) as a follow-on series to this
  pull request.

  All have been tested in linux-next for weeks, with no reported issues
  other than the above-mentioned boot time-outs"

* tag 'driver-core-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (55 commits)
  driver core: fix deadlock in __device_attach
  kernfs: Separate kernfs_pr_cont_buf and rename_lock.
  topology: Remove unused cpu_cluster_mask()
  driver core: Extend deferred probe timeout on driver registration
  MAINTAINERS: add Russ Weight as a firmware loader maintainer
  driver: base: fix UAF when driver_attach failed
  test_firmware: fix end of loop test in upload_read_show()
  driver core: location: Add "back" as a possible output for panel
  driver core: location: Free struct acpi_pld_info *pld
  driver core: Add "*" wildcard support to driver_async_probe cmdline param
  driver core: location: Check for allocations failure
  arch_topology: Trace the update thermal pressure
  kernfs: Rename kernfs_put_open_node to kernfs_unlink_open_file.
  export: fix string handling of namespace in EXPORT_SYMBOL_NS
  rpmsg: use local 'dev' variable
  rpmsg: Fix calling device_lock() on non-initialized device
  firmware_loader: describe 'module' parameter of firmware_upload_register()
  firmware_loader: Move definitions from sysfs_upload.h to sysfs.h
  firmware_loader: Fix configs for sysfs split
  selftests: firmware: Add firmware upload selftests
  ...
2022-06-03 11:48:47 -07:00
Linus Torvalds
50fd82b3a9 A handful of late-arriving documentation fixes and the addition of an SVG
tux logo which, I'm assured, we're going to want.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmKZLSEPHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5YvDgH/RKxfYbTZmNinh8Sk0hvQwfhJdp92J6UjElc
 csheqKyK7Qvm8Gv4OtpXABhmrE7LBNQK9CU4EzqgW1t3vtA/syiYfpAQfjzSyEyG
 /RVFwd9XHJTBW+ZraYtDNrgmMPgDwQ6GFfDUzHKc2++Xn0Hj4jzeKgk9KiZu0JN3
 MJOPY4mCnRWWQd0jceQPp3ZqBiZuHKC5rIwVfWQXuCP10NsQxc3dpXOqKePfFC5A
 SUDJN5S8GLPj25qqjHBe4+MyDf+lhXBbTSu2aAjYJkbFWTEY1TrA6DHX9LHCaQng
 s3HDbQwrk2DOx6Uaz2foBbDM6tIpzyF3iWVbSoSbqhpurRG94xM=
 =hhkL
 -----END PGP SIGNATURE-----

Merge tag 'docs-5.19-2' of git://git.lwn.net/linux

Pull documentation fixes from Jonathan Corbet:
 "A handful of late-arriving documentation fixes and the addition of an
  SVG tux logo which, I'm assured, we're going to want"

* tag 'docs-5.19-2' of git://git.lwn.net/linux:
  documentation: Format button_dev as a pointer.
  docs: add SVG version of the Linux logo
  docs: move Linux logo into a new `images` folder
  docs: blockdev: change title to match section content
  docs/conf.py: Cope with removal of language=None in Sphinx 5.0.0
2022-06-02 15:36:06 -07:00
Joel Colledge
938285a130 docs: blockdev: change title to match section content
This index.rst was added in commit
39443104c7 docs: blockdev: convert to ReST

It appears that the title from the RapidIO index page was copied. This
title does not match the content of this directory. Change it to match.

Signed-off-by: Joel Colledge <joel.colledge@linbit.com>
Link: https://lore.kernel.org/r/20220530142849.717-1-joel.colledge@linbit.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-06-01 09:30:53 -06:00
Linus Torvalds
700170bf6b NFS Client Updates for Linux 5.18
- New Features:
   - Add support for 'dacl' and 'sacl' attributes
 
 - Bugfixes and Cleanups:
   - Fixes for reporting mapping errors
   - Fixes for memory allocation errors
   - Improve warning message when locks are lost
   - Update documentation for the nfs4_unique_id parameter
   - Add an explanation of NFSv4 client identifiers
   - Ensure the i_size attribute is written to the fscache storage
   - Fix freeing uninitialized nfs4_labels
   - Better handling when xprtrdma bc_serv is NULL
   - Marke qualified async operations as MOVEABLE tasks
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmKWhFQACgkQ18tUv7Cl
 QOszjRAAllmtKLbzOkwQwcT3e5ljh9NEJ8NL+Nv1SXjozpFY+1fuXc0ivT4rniU6
 68ZHz+faK2UtLytwOO94M0jo2RCAlYS5rfnts89CpdfP3bqmGPAj0Ytw/c/vg+Qf
 4eQbAzz++T35DgU7cdeKKZKg9Wtwbq7g0kYv1W8QCiCbxakSjnc/V9Ll5XhS/CAC
 1WqKD90TRKUkX0Y1NNsNdXB1dJn/6QAq9B6JTjan+2Rhn7/NCTU8p98mEZGcVD7r
 cPHyXTqkPF4IH7lgjEMIRf6eXEzDDZNIs98QLdHJ2Gk0LxW7p7IL7VW8TKzYunvl
 coA1bZfYhUZBUJ+eDrrKZ5hHMSn/+eNR5iiIcfqtyU8o3J0NXAXGlLh/iJSGsxIH
 PjyjWSfpCgoZVPc4dG3lxR9Iu7UZeAuuB2ZoiNakUkd+UNKK5U5PpaPnYT6adaIp
 TegivZclCmgyLQiAdPRifDzhaL5J2pp6kVb5iMY6oX+ObyclW/UcqzKMqIKSt3R8
 6+JAmZ6633ojS4r3xFsw/dlEUWuuVq7kYwXK209LqiBn5vvjWNa/WgH4MaSfnJe9
 rlw+fs8Aky0w59IhzRJMMVCJ/Q2EYDKmtQLQgYVw80RBFiFgBpMW0wDqMGiddTcu
 1IZ2c5+t1GxfASpyu8miexQjRJW6A2MTp0gfHGiHarxdCpaAycA=
 =0ccI
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.19-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
 "New Features:
   - Add support for 'dacl' and 'sacl' attributes

  Bugfixes and Cleanups:
   - Fixes for reporting mapping errors
   - Fixes for memory allocation errors
   - Improve warning message when locks are lost
   - Update documentation for the nfs4_unique_id parameter
   - Add an explanation of NFSv4 client identifiers
   - Ensure the i_size attribute is written to the fscache storage
   - Fix freeing uninitialized nfs4_labels
   - Better handling when xprtrdma bc_serv is NULL
   - Mark qualified async operations as MOVEABLE tasks"

* tag 'nfs-for-5.19-1' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFSv4.1 mark qualified async operations as MOVEABLE tasks
  xprtrdma: treat all calls not a bcall when bc_serv is NULL
  NFSv4: Fix free of uninitialized nfs4_label on referral lookup.
  NFS: Pass i_size to fscache_unuse_cookie() when a file is released
  Documentation: Add an explanation of NFSv4 client identifiers
  NFS: update documentation for the nfs4_unique_id parameter
  NFS: Improve warning message when locks are lost.
  NFSv4.1: Enable access to the NFSv4.1 'dacl' and 'sacl' attributes
  NFSv4: Add encoders/decoders for the NFSv4.1 dacl and sacl attributes
  NFSv4: Specify the type of ACL to cache
  NFSv4: Don't hold the layoutget locks across multiple RPC calls
  pNFS/files: Fall back to I/O through the MDS on non-fatal layout errors
  NFS: Further fixes to the writeback error handling
  NFSv4/pNFS: Do not fail I/O when we fail to allocate the pNFS layout
  NFS: Memory allocation failures are not server fatal errors
  NFS: Don't report errors from nfs_pageio_complete() more than once
  NFS: Do not report flush errors in nfs_write_end()
  NFS: Don't report ENOSPC write errors twice
  NFS: fsync() should report filesystem errors over EINTR/ERESTARTSYS
  NFS: Do not report EINTR/ERESTARTSYS as mapping errors
2022-05-31 16:58:24 -07:00
Linus Torvalds
1ff7bc3ba7 More power management updates for 5.19-rc1
- Add Tegra234 cpufreq support (Sumit Gupta).
 
  - Clean up and enhance the Mediatek cpufreq driver (Wan Jiabing,
    Rex-BC Chen, and Jia-Wei Chang).
 
  - Fix up the CPPC cpufreq driver after recent changes (Zheng Bin,
    Pierre Gondois).
 
  - Minor update to dt-binding for Qcom's opp-v2-kryo-cpu (Yassine
    Oudjana).
 
  - Use list iterator only inside the list_for_each_entry loop (Xiaomeng
    Tong, and Jakob Koschel).
 
  - New APIs related to finding OPP based on interconnect bandwidth
    (Krzysztof Kozlowski).
 
  - Fix the missing of_node_put() in _bandwidth_supported() (Dan
    Carpenter).
 
  - Cleanups (Krzysztof Kozlowski, and Viresh Kumar).
 
  - Add Out of Band mode description to the intel-speed-select utility
    documentation (Srinivas Pandruvada).
 
  - Add power sequences support to the system reboot and power off
    code and make related platform-specific changes for multiple
    platforms (Dmitry Osipenko, Geert Uytterhoeven).
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmKU8lESHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxVz0P91LNCbkDSt60jzNkXdEjsvUnI/YjJ+QJ
 /+ta7iCwf90obb6s9soBkTyU8Ia7hJ/IWDJW/5xhdG0ySYF17hGNIGKK9xKGsJFK
 tzzWtjFsvT3PeUZQERekqWp8OYskHYmQMj8o4jqqFF7DZD/AswTgkVLALUd7YhVL
 UvLmcKsUA7eXy3ZrhtrGSzVSEbKOGXBLFyjy3IuWjfz6Uk/nGQRNKGf7byRWLM44
 y7zb75/5+p4MPyyJP8M/uiXzEYDKuubRtfx9PdmLgBUSMbtho6eB1x47dZWooaxe
 YKmcFjF80AmnwxHb+Te2rZHPeIYr+5hLBaEq7xaLQf/nAS3y5z1PIfI2wVQ5mXPz
 D599jHHda/6oSAKCVTq2fKfnlR6fetm5j66xOQINpD+G5b5tNSpllXJDamFZxFgP
 DiQAOFzdnRYnK7yTiLWVl1q76SVRxqsGz7/5Ak+NRj2OQK2wRkLzHuZfiV/8r0pk
 ksi6Ew9TerXkstoTQsSToPQxB2VvosSajNU3Oy27pmM0oal1XxP0LIPz9sMor5/g
 tfk5f6Yz/+FFIfXj3cZffZNdhsJgejmcqPdrSdCOV3sBrblnIMQNpHiYg4jGztoj
 IjYKYPVpSaWiSZLQOaK2moTEvm9CfQz1TQCF+/Kz88LX6/7ZaDJFxHG2FDEob0sg
 6KVbrZWweLI=
 =PAh+
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.19-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more power management updates from Rafael Wysocki:
 "These update the ARM cpufreq drivers and fix up the CPPC cpufreq
  driver after recent changes, update the OPP code and PM documentation
  and add power sequences support to the system reboot and power off
  code.

  Specifics:

   - Add Tegra234 cpufreq support (Sumit Gupta)

   - Clean up and enhance the Mediatek cpufreq driver (Wan Jiabing,
     Rex-BC Chen, and Jia-Wei Chang)

   - Fix up the CPPC cpufreq driver after recent changes (Zheng Bin,
     Pierre Gondois)

   - Minor update to dt-binding for Qcom's opp-v2-kryo-cpu (Yassine
     Oudjana)

   - Use list iterator only inside the list_for_each_entry loop
     (Xiaomeng Tong, and Jakob Koschel)

   - New APIs related to finding OPP based on interconnect bandwidth
     (Krzysztof Kozlowski)

   - Fix the missing of_node_put() in _bandwidth_supported() (Dan
     Carpenter)

   - Cleanups (Krzysztof Kozlowski, and Viresh Kumar)

   - Add Out of Band mode description to the intel-speed-select utility
     documentation (Srinivas Pandruvada)

   - Add power sequences support to the system reboot and power off code
     and make related platform-specific changes for multiple platforms
     (Dmitry Osipenko, Geert Uytterhoeven)"

* tag 'pm-5.19-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (60 commits)
  cpufreq: CPPC: Fix unused-function warning
  cpufreq: CPPC: Fix build error without CONFIG_ACPI_CPPC_CPUFREQ_FIE
  Documentation: admin-guide: PM: Add Out of Band mode
  kernel/reboot: Change registration order of legacy power-off handler
  m68k: virt: Switch to new sys-off handler API
  kernel/reboot: Add devm_register_restart_handler()
  kernel/reboot: Add devm_register_power_off_handler()
  soc/tegra: pmc: Use sys-off handler API to power off Nexus 7 properly
  reboot: Remove pm_power_off_prepare()
  regulator: pfuze100: Use devm_register_sys_off_handler()
  ACPI: power: Switch to sys-off handler API
  memory: emif: Use kernel_can_power_off()
  mips: Use do_kernel_power_off()
  ia64: Use do_kernel_power_off()
  x86: Use do_kernel_power_off()
  sh: Use do_kernel_power_off()
  m68k: Switch to new sys-off handler API
  powerpc: Use do_kernel_power_off()
  xen/x86: Use do_kernel_power_off()
  parisc: Use do_kernel_power_off()
  ...
2022-05-30 11:37:26 -07:00
Linus Torvalds
76bfd3de34 tracing updates for 5.19:
- The majority of the changes are for fixes and clean ups.
 
 Noticeable changes:
 
 - Rework trace event triggers code to be easier to interact with.
 
 - Support for embedding bootconfig with the kernel (as suppose to having it
   embedded in initram). This is useful for embedded boards without initram
   disks.
 
 - Speed up boot by parallelizing the creation of tracefs files.
 
 - Allow absolute ring buffer timestamps handle timestamps that use more than
   59 bits.
 
 - Added new tracing clock "TAI" (International Atomic Time)
 
 - Have weak functions show up in available_filter_function list as:
    __ftrace_invalid_address___<invalid-offset>
   instead of using the name of the function before it.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYpOgXRQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qjkKAQDbpemxvpFyJlZqT8KgEIXubu+ag2/q
 p0XDHaPS0zF9OQEAjTxg6GMEbnFYl6fzxZtOoEbiaQ7ppfdhRI8t6sSMVA8=
 =+nDD
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "The majority of the changes are for fixes and clean ups.

  Notable changes:

   - Rework trace event triggers code to be easier to interact with.

   - Support for embedding bootconfig with the kernel (as suppose to
     having it embedded in initram). This is useful for embedded boards
     without initram disks.

   - Speed up boot by parallelizing the creation of tracefs files.

   - Allow absolute ring buffer timestamps handle timestamps that use
     more than 59 bits.

   - Added new tracing clock "TAI" (International Atomic Time)

   - Have weak functions show up in available_filter_function list as:
     __ftrace_invalid_address___<invalid-offset> instead of using the
     name of the function before it"

* tag 'trace-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (52 commits)
  ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function
  tracing: Fix comments for event_trigger_separate_filter()
  x86/traceponit: Fix comment about irq vector tracepoints
  x86,tracing: Remove unused headers
  ftrace: Clean up hash direct_functions on register failures
  tracing: Fix comments of create_filter()
  tracing: Disable kcov on trace_preemptirq.c
  tracing: Initialize integer variable to prevent garbage return value
  ftrace: Fix typo in comment
  ftrace: Remove return value of ftrace_arch_modify_*()
  tracing: Cleanup code by removing init "char *name"
  tracing: Change "char *" string form to "char []"
  tracing/timerlat: Do not wakeup the thread if the trace stops at the IRQ
  tracing/timerlat: Print stacktrace in the IRQ handler if needed
  tracing/timerlat: Notify IRQ new max latency only if stop tracing is set
  kprobes: Fix build errors with CONFIG_KRETPROBES=n
  tracing: Fix return value of trace_pid_write()
  tracing: Fix potential double free in create_var_ref()
  tracing: Use strim() to remove whitespace instead of doing it manually
  ftrace: Deal with error return code of the ftrace_process_locs() function
  ...
2022-05-29 10:31:36 -07:00
Linus Torvalds
3cc30140db pci-v5.19-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmKP/tQUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vzH0xAAojQowrSWzZ5FKTqI+L/L9ZXoAb+e
 9IvQljKc9taJldmXp+EB9wkS/5B+VtQcC2qUQuWEQXUoECF8qHlcB4l+XQyd1tWO
 O0vZxETH22xjLLrjG2F3l5rrfkJZAf2nEugwbDk97YEgiimeOiRcv3bx6AUCtj6I
 rPJ13Fop3Jke7sQMcXYJe3gQLT1o1AKiQGghiCFNi/gzx2lXI6mmHBgLxFoiqcby
 WpfXbvbJti95HRaahUR3HaDFfHj4HVkQNLlTtIykJ3Tl2/rOhWEJjI8JOIQpAA+M
 WBrWw9rfgbScTiGV+dZ3h7hKiPnHKl9YETIX7L0oA2sj0jZcIs0d6mSBZx0kYuI9
 eAlx+qSK9xpbQQr/fdYaUdF1q4QdtU0BYOvOWOzWsqYCECMRJ1PUHFSMbmR/+PNB
 P5lHnAbggRSoxdAtwFYv1HTr+VpGH9S+5oxHCz3ohpMjYy6mkCZwHpZn3doaU3ci
 KG6yIoVKftm3fZdtFvL03qHl/I8+X24ZhT/T/278PRGjkhSyr56hZo8hg0gqqTct
 ngip8qNABmSbqpr73/W6Vl42zAbYtNk1BykYahbKupgW8FbT7hqaZTB05V87pVu+
 Ko1aJM6VoOP9rMlKHI9ba8eYCzDrZbLZUFn7ljNPDpzutf0tAwtgwzvZBXN3za6+
 Z9+D5dxmvrZEIbA=
 =hEti
 -----END PGP SIGNATURE-----

Merge tag 'pci-v5.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull pci updates from Bjorn Helgaas:
 "Resource management:

   - Restrict E820 clipping to PCI host bridge windows (Bjorn Helgaas)

   - Log E820 clipping better (Bjorn Helgaas)

   - Add kernel cmdline options to enable/disable E820 clipping (Hans de
     Goede)

   - Disable E820 reserved region clipping for IdeaPads, Yoga, Yoga
     Slip, Acer Spin 5, Clevo Barebone systems where clipping leaves no
     usable address space for touchpads, Thunderbolt devices, etc (Hans
     de Goede)

   - Disable E820 clipping by default starting in 2023 (Hans de Goede)

  PCI device hotplug:

   - Include files to remove implicit dependencies (Christophe Leroy)

   - Only put Root Ports in D3 if they can signal and wake from D3 so
     AMD Yellow Carp doesn't miss hotplug events (Mario Limonciello)

  Power management:

   - Define pci_restore_standard_config() only for CONFIG_PM_SLEEP since
     it's unused otherwise (Krzysztof Kozlowski)

   - Power up devices completely, including anything platform firmware
     needs to do, during runtime resume (Rafael J. Wysocki)

   - Move pci_resume_bus() to PM callbacks so we observe the required
     bridge power-up delays (Rafael J. Wysocki)

   - Drop unneeded runtime_d3cold device flag (Rafael J. Wysocki)

   - Split pci_raw_set_power_state() between pci_power_up() and a new
     pci_set_low_power_state() (Rafael J. Wysocki)

   - Set current_state to D3cold if config read returns ~0, indicating
     the device is not accessible (Rafael J. Wysocki)

   - Do not call pci_update_current_state() from pci_power_up() so BARs
     and ASPM config are restored correctly (Rafael J. Wysocki)

   - Write 0 to PMCSR in pci_power_up() in all cases (Rafael J. Wysocki)

   - Split pci_power_up() to pci_set_full_power_state() to avoid some
     redundant operations (Rafael J. Wysocki)

   - Skip restoring BARs if device is not in D0 (Rafael J. Wysocki)

   - Rearrange and clarify pci_set_power_state() (Rafael J. Wysocki)

   - Remove redundant BAR restores from pci_pm_thaw_noirq() (Rafael J.
     Wysocki)

  Virtualization:

   - Acquire device lock before config space access lock to avoid AB/BA
     deadlock with sriov_numvfs_store() (Yicong Yang)

  Error handling:

   - Clear MULTI_ERR_COR/UNCOR_RCV bits, which a race could previously
     leave permanently set (Kuppuswamy Sathyanarayanan)

  Peer-to-peer DMA:

   - Whitelist Intel Skylake-E Root Ports regardless of which devfn they
     are (Shlomo Pongratz)

  ASPM:

   - Override L1 acceptable latency advertised by Intel DG2 so ASPM L1
     can be enabled (Mika Westerberg)

  Cadence PCIe controller driver:

   - Set up device-specific register to allow PTM Responder to be
     enabled by the normal architected bit (Christian Gmeiner)

   - Override advertised FLR support since the controller doesn't
     implement FLR correctly (Parshuram Thombare)

  Cadence PCIe endpoint driver:

   - Correct bitmap size for the ob_region_map of outbound window usage
     (Dan Carpenter)

  Freescale i.MX6 PCIe controller driver:

   - Fix PERST# assertion/deassertion so we observe the required delays
     before accessing device (Francesco Dolcini)

  Freescale Layerscape PCIe controller driver:

   - Add "big-endian" DT property (Hou Zhiqiang)

   - Update SCFG DT property (Hou Zhiqiang)

   - Add "aer", "pme", "intr" DT properties (Li Yang)

   - Add DT compatible strings for ls1028a (Xiaowei Bao)

  Intel VMD host bridge driver:

   - Assign VMD IRQ domain before enumeration to avoid IOMMU interrupt
     remapping errors when MSI-X remapping is disabled (Nirmal Patel)

   - Revert VMD workaround that kept MSI-X remapping enabled when IOMMU
     remapping was enabled (Nirmal Patel)

  Marvell MVEBU PCIe controller driver:

   - Add of_pci_get_slot_power_limit() to parse the
     'slot-power-limit-milliwatt' DT property (Pali Rohár)

   - Add mvebu support for sending Set_Slot_Power_Limit message (Pali
     Rohár)

  MediaTek PCIe controller driver:

   - Fix refcount leak in mtk_pcie_subsys_powerup() (Miaoqian Lin)

  MediaTek PCIe Gen3 controller driver:

   - Reset PHY and MAC at probe time (AngeloGioacchino Del Regno)

  Microchip PolarFlare PCIe controller driver:

   - Add chained_irq_enter()/chained_irq_exit() calls to mc_handle_msi()
     and mc_handle_intx() to avoid lost interrupts (Conor Dooley)

   - Fix interrupt handling race (Daire McNamara)

  NVIDIA Tegra194 PCIe controller driver:

   - Drop tegra194 MSI register save/restore, which is unnecessary since
     the DWC core does it (Jisheng Zhang)

  Qualcomm PCIe controller driver:

   - Add SM8150 SoC DT binding and support (Bhupesh Sharma)

   - Fix pipe clock imbalance (Johan Hovold)

   - Fix runtime PM imbalance on probe errors (Johan Hovold)

   - Fix PHY init imbalance on probe errors (Johan Hovold)

   - Convert DT binding to YAML (Dmitry Baryshkov)

   - Update DT binding to show that resets aren't required for
     MSM8996/APQ8096 platforms (Dmitry Baryshkov)

   - Add explicit register names per chipset in DT binding (Dmitry
     Baryshkov)

   - Add sc7280-specific clock and reset definitions to DT binding
     (Dmitry Baryshkov)

  Rockchip PCIe controller driver:

   - Fix bitmap size when searching for free outbound region (Dan
     Carpenter)

  Rockchip DesignWare PCIe controller driver:

   - Remove "snps,dw-pcie" from rockchip-dwc DT "compatible" property
     because it's not fully compatible with rockchip (Peter Geis)

   - Reset rockchip-dwc controller at probe (Peter Geis)

   - Add rockchip-dwc INTx support (Peter Geis)

  Synopsys DesignWare PCIe controller driver:

   - Return error instead of success if DMA mapping of MSI area fails
     (Jiantao Zhang)

  Miscellaneous:

   - Change pci_set_dma_mask() documentation references to
     dma_set_mask() (Alex Williamson)"

* tag 'pci-v5.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (64 commits)
  dt-bindings: PCI: qcom: Add schema for sc7280 chipset
  dt-bindings: PCI: qcom: Specify reg-names explicitly
  dt-bindings: PCI: qcom: Do not require resets on msm8996 platforms
  dt-bindings: PCI: qcom: Convert to YAML
  PCI: qcom: Fix unbalanced PHY init on probe errors
  PCI: qcom: Fix runtime PM imbalance on probe errors
  PCI: qcom: Fix pipe clock imbalance
  PCI: qcom: Add SM8150 SoC support
  dt-bindings: pci: qcom: Document PCIe bindings for SM8150 SoC
  x86/PCI: Disable E820 reserved region clipping starting in 2023
  x86/PCI: Disable E820 reserved region clipping via quirks
  x86/PCI: Add kernel cmdline options to use/ignore E820 reserved regions
  PCI: microchip: Fix potential race in interrupt handling
  PCI/AER: Clear MULTI_ERR_COR/UNCOR_RCV bits
  PCI: cadence: Clear FLR in device capabilities register
  PCI: cadence: Allow PTM Responder to be enabled
  PCI: vmd: Revert 2565e5b69c ("PCI: vmd: Do not disable MSI-X remapping if interrupt remapping is enabled by IOMMU.")
  PCI: vmd: Assign VMD IRQ domain before enumeration
  PCI: Avoid pci_dev_lock() AB/BA deadlock with sriov_numvfs_store()
  PCI: rockchip-dwc: Add legacy interrupt support
  ...
2022-05-27 15:25:10 -07:00
Linus Torvalds
98931dd95f Yang Shi has improved the behaviour of khugepaged collapsing of readonly
file-backed transparent hugepages.
 
 Johannes Weiner has arranged for zswap memory use to be tracked and
 managed on a per-cgroup basis.
 
 Munchun Song adds a /proc knob ("hugetlb_optimize_vmemmap") for runtime
 enablement of the recent huge page vmemmap optimization feature.
 
 Baolin Wang contributes a series to fix some issues around hugetlb
 pagetable invalidation.
 
 Zhenwei Pi has fixed some interactions between hwpoisoned pages and
 virtualization.
 
 Tong Tiangen has enabled the use of the presently x86-only
 page_table_check debugging feature on arm64 and riscv.
 
 David Vernet has done some fixup work on the memcg selftests.
 
 Peter Xu has taught userfaultfd to handle write protection faults against
 shmem- and hugetlbfs-backed files.
 
 More DAMON development from SeongJae Park - adding online tuning of the
 feature and support for monitoring of fixed virtual address ranges.  Also
 easier discovery of which monitoring operations are available.
 
 Nadav Amit has done some optimization of TLB flushing during mprotect().
 
 Neil Brown continues to labor away at improving our swap-over-NFS support.
 
 David Hildenbrand has some fixes to anon page COWing versus
 get_user_pages().
 
 Peng Liu fixed some errors in the core hugetlb code.
 
 Joao Martins has reduced the amount of memory consumed by device-dax's
 compound devmaps.
 
 Some cleanups of the arch-specific pagemap code from Anshuman Khandual.
 
 Muchun Song has found and fixed some errors in the TLB flushing of
 transparent hugepages.
 
 Roman Gushchin has done more work on the memcg selftests.
 
 And, of course, many smaller fixes and cleanups.  Notably, the customary
 million cleanup serieses from Miaohe Lin.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYo52xQAKCRDdBJ7gKXxA
 jtJFAQD238KoeI9z5SkPMaeBRYSRQmNll85mxs25KapcEgWgGQD9FAb7DJkqsIVk
 PzE+d9hEfirUGdL6cujatwJ6ejYR8Q8=
 =nFe6
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2022-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
 "Almost all of MM here. A few things are still getting finished off,
  reviewed, etc.

   - Yang Shi has improved the behaviour of khugepaged collapsing of
     readonly file-backed transparent hugepages.

   - Johannes Weiner has arranged for zswap memory use to be tracked and
     managed on a per-cgroup basis.

   - Munchun Song adds a /proc knob ("hugetlb_optimize_vmemmap") for
     runtime enablement of the recent huge page vmemmap optimization
     feature.

   - Baolin Wang contributes a series to fix some issues around hugetlb
     pagetable invalidation.

   - Zhenwei Pi has fixed some interactions between hwpoisoned pages and
     virtualization.

   - Tong Tiangen has enabled the use of the presently x86-only
     page_table_check debugging feature on arm64 and riscv.

   - David Vernet has done some fixup work on the memcg selftests.

   - Peter Xu has taught userfaultfd to handle write protection faults
     against shmem- and hugetlbfs-backed files.

   - More DAMON development from SeongJae Park - adding online tuning of
     the feature and support for monitoring of fixed virtual address
     ranges. Also easier discovery of which monitoring operations are
     available.

   - Nadav Amit has done some optimization of TLB flushing during
     mprotect().

   - Neil Brown continues to labor away at improving our swap-over-NFS
     support.

   - David Hildenbrand has some fixes to anon page COWing versus
     get_user_pages().

   - Peng Liu fixed some errors in the core hugetlb code.

   - Joao Martins has reduced the amount of memory consumed by
     device-dax's compound devmaps.

   - Some cleanups of the arch-specific pagemap code from Anshuman
     Khandual.

   - Muchun Song has found and fixed some errors in the TLB flushing of
     transparent hugepages.

   - Roman Gushchin has done more work on the memcg selftests.

  ... and, of course, many smaller fixes and cleanups. Notably, the
  customary million cleanup serieses from Miaohe Lin"

* tag 'mm-stable-2022-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (381 commits)
  mm: kfence: use PAGE_ALIGNED helper
  selftests: vm: add the "settings" file with timeout variable
  selftests: vm: add "test_hmm.sh" to TEST_FILES
  selftests: vm: check numa_available() before operating "merge_across_nodes" in ksm_tests
  selftests: vm: add migration to the .gitignore
  selftests/vm/pkeys: fix typo in comment
  ksm: fix typo in comment
  selftests: vm: add process_mrelease tests
  Revert "mm/vmscan: never demote for memcg reclaim"
  mm/kfence: print disabling or re-enabling message
  include/trace/events/percpu.h: cleanup for "percpu: improve percpu_alloc_percpu event trace"
  include/trace/events/mmflags.h: cleanup for "tracing: incorrect gfp_t conversion"
  mm: fix a potential infinite loop in start_isolate_page_range()
  MAINTAINERS: add Muchun as co-maintainer for HugeTLB
  zram: fix Kconfig dependency warning
  mm/shmem: fix shmem folio swapoff hang
  cgroup: fix an error handling path in alloc_pagecache_max_30M()
  mm: damon: use HPAGE_PMD_SIZE
  tracing: incorrect isolate_mote_t cast in mm_vmscan_lru_isolate
  nodemask.h: fix compilation error with GCC12
  ...
2022-05-26 12:32:41 -07:00
Linus Torvalds
7e062cda7d Networking changes for 5.19.
Core
 ----
 
  - Support TCPv6 segmentation offload with super-segments larger than
    64k bytes using the IPv6 Jumbogram extension header (AKA BIG TCP).
 
  - Generalize skb freeing deferral to per-cpu lists, instead of
    per-socket lists.
 
  - Add a netdev statistic for packets dropped due to L2 address
    mismatch (rx_otherhost_dropped).
 
  - Continue work annotating skb drop reasons.
 
  - Accept alternative netdev names (ALT_IFNAME) in more netlink
    requests.
 
  - Add VLAN support for AF_PACKET SOCK_RAW GSO.
 
  - Allow receiving skb mark from the socket as a cmsg.
 
  - Enable memcg accounting for veth queues, sysctl tables and IPv6.
 
 BPF
 ---
 
  - Add libbpf support for User Statically-Defined Tracing (USDTs).
 
  - Speed up symbol resolution for kprobes multi-link attachments.
 
  - Support storing typed pointers to referenced and unreferenced
    objects in BPF maps.
 
  - Add support for BPF link iterator.
 
  - Introduce access to remote CPU map elements in BPF per-cpu map.
 
  - Allow middle-of-the-road settings for the
    kernel.unprivileged_bpf_disabled sysctl.
 
  - Implement basic types of dynamic pointers e.g. to allow for
    dynamically sized ringbuf reservations without extra memory copies.
 
 Protocols
 ---------
 
  - Retire port only listening_hash table, add a second bind table
    hashed by port and address. Avoid linear list walk when binding
    to very popular ports (e.g. 443).
 
  - Add bridge FDB bulk flush filtering support allowing user space
    to remove all FDB entries matching a condition.
 
  - Introduce accept_unsolicited_na sysctl for IPv6 to implement
    router-side changes for RFC9131.
 
  - Support for MPTCP path manager in user space.
 
  - Add MPTCP support for fallback to regular TCP for connections
    that have never connected additional subflows or transmitted
    out-of-sequence data (partial support for RFC8684 fallback).
 
  - Avoid races in MPTCP-level window tracking, stabilize and improve
    throughput.
 
  - Support lockless operation of GRE tunnels with seq numbers enabled.
 
  - WiFi support for host based BSS color collision detection.
 
  - Add support for SO_TXTIME/SCM_TXTIME on CAN sockets.
 
  - Support transmission w/o flow control in CAN ISOTP (ISO 15765-2).
 
  - Support zero-copy Tx with TLS 1.2 crypto offload (sendfile).
 
  - Allow matching on the number of VLAN tags via tc-flower.
 
  - Add tracepoint for tcp_set_ca_state().
 
 Driver API
 ----------
 
  - Improve error reporting from classifier and action offload.
 
  - Add support for listing line cards in switches (devlink).
 
  - Add helpers for reporting page pool statistics with ethtool -S.
 
  - Add support for reading clock cycles when using PTP virtual clocks,
    instead of having the driver convert to time before reporting.
    This makes it possible to report time from different vclocks.
 
  - Support configuring low-latency Tx descriptor push via ethtool.
 
  - Separate Clause 22 and Clause 45 MDIO accesses more explicitly.
 
 New hardware / drivers
 ----------------------
 
  - Ethernet:
    - Marvell's Octeon NIC PCI Endpoint support (octeon_ep)
    - Sunplus SP7021 SoC (sp7021_emac)
    - Add support for Renesas RZ/V2M (in ravb)
    - Add support for MediaTek mt7986 switches (in mtk_eth_soc)
 
  - Ethernet PHYs:
    - ADIN1100 industrial PHYs (w/ 10BASE-T1L and SQI reporting)
    - TI DP83TD510 PHY
    - Microchip LAN8742/LAN88xx PHYs
 
  - WiFi:
    - Driver for pureLiFi X, XL, XC devices (plfxlc)
    - Driver for Silicon Labs devices (wfx)
    - Support for WCN6750 (in ath11k)
    - Support Realtek 8852ce devices (in rtw89)
 
  - Mobile:
    - MediaTek T700 modems (Intel 5G 5000 M.2 cards)
 
  - CAN:
   - ctucanfd: add support for CTU CAN FD open-source IP core
     from Czech Technical University in Prague
 
 Drivers
 -------
 
  - Delete a number of old drivers still using virt_to_bus().
 
  - Ethernet NICs:
    - intel: support TSO on tunnels MPLS
    - broadcom: support multi-buffer XDP
    - nfp: support VF rate limiting
    - sfc: use hardware tx timestamps for more than PTP
    - mlx5: multi-port eswitch support
    - hyper-v: add support for XDP_REDIRECT
    - atlantic: XDP support (including multi-buffer)
    - macb: improve real-time perf by deferring Tx processing to NAPI
 
  - High-speed Ethernet switches:
    - mlxsw: implement basic line card information querying
    - prestera: add support for traffic policing on ingress and egress
 
  - Embedded Ethernet switches:
    - lan966x: add support for packet DMA (FDMA)
    - lan966x: add support for PTP programmable pins
    - ti: cpsw_new: enable bc/mc storm prevention
 
  - Qualcomm 802.11ax WiFi (ath11k):
    - Wake-on-WLAN support for QCA6390 and WCN6855
    - device recovery (firmware restart) support
    - support setting Specific Absorption Rate (SAR) for WCN6855
    - read country code from SMBIOS for WCN6855/QCA6390
    - enable keep-alive during WoWLAN suspend
    - implement remain-on-channel support
 
  - MediaTek WiFi (mt76):
    - support Wireless Ethernet Dispatch offloading packet movement
      between the Ethernet switch and WiFi interfaces
    - non-standard VHT MCS10-11 support
    - mt7921 AP mode support
    - mt7921 IPv6 NS offload support
 
  - Ethernet PHYs:
    - micrel: ksz9031/ksz9131: cabletest support
    - lan87xx: SQI support for T1 PHYs
    - lan937x: add interrupt support for link detection
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmKNMPQACgkQMUZtbf5S
 IrsRARAAuDyYs6jFYB3p+xazZdOnbF4iAgVv71+DQGvmsCl6CB9OrsNZMlvE85OL
 Q3gjcRbgjrkN4lhgI8DmiGYbsUJnAvVjFdNjccz1Z/vTLYvuIM0ol54MUp5S+9WY
 StncOJkOGJxxR/Gi5gzVmejPDsysU3Jik+hm/fpIcz8pybXxAsFKU5waY5qfl+/T
 TZepfV0VCfqRDjqcF1qA5+jJZNU8pdodQlZ1+mh8bwu6Jk1ZkWkj6Ov8MWdwQldr
 LnPeK/9hIGzkdJYHZfajxA3t8D0K5CHzSuih2bJ9ry8ZXgVBkXEThew778/R5izW
 uB0YZs9COFlrIP7XHjtRTy/2xHOdYIPlj2nWhVdfuQDX8Crvt4VRN6EZ1rjko1ZJ
 WanfG6WHF8NH5pXBRQbh3kIMKBnYn6OIzuCfCQSqd+niHcxFIM4vRiggeXI5C5TW
 vJgEWfK6X+NfDiFVa3xyCrEmp5ieA/pNecpwd8rVkql+MtFAAw4vfsotLKOJEAru
 J/XL6UE+YuLqIJV9ACZ9x1AFXXAo661jOxBunOo4VXhXVzWS9lYYz5r5ryIkgT/8
 /Fr0zjANJWgfIuNdIBtYfQ4qG+LozGq038VA06RhFUAZ5tF9DzhqJs2Q2AFuWWBC
 ewCePJVqo1j2Ceq2mGonXRt47OEnlePoOxTk9W+cKZb7ZWE+zEo=
 =Wjii
 -----END PGP SIGNATURE-----

Merge tag 'net-next-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core
  ----

   - Support TCPv6 segmentation offload with super-segments larger than
     64k bytes using the IPv6 Jumbogram extension header (AKA BIG TCP).

   - Generalize skb freeing deferral to per-cpu lists, instead of
     per-socket lists.

   - Add a netdev statistic for packets dropped due to L2 address
     mismatch (rx_otherhost_dropped).

   - Continue work annotating skb drop reasons.

   - Accept alternative netdev names (ALT_IFNAME) in more netlink
     requests.

   - Add VLAN support for AF_PACKET SOCK_RAW GSO.

   - Allow receiving skb mark from the socket as a cmsg.

   - Enable memcg accounting for veth queues, sysctl tables and IPv6.

  BPF
  ---

   - Add libbpf support for User Statically-Defined Tracing (USDTs).

   - Speed up symbol resolution for kprobes multi-link attachments.

   - Support storing typed pointers to referenced and unreferenced
     objects in BPF maps.

   - Add support for BPF link iterator.

   - Introduce access to remote CPU map elements in BPF per-cpu map.

   - Allow middle-of-the-road settings for the
     kernel.unprivileged_bpf_disabled sysctl.

   - Implement basic types of dynamic pointers e.g. to allow for
     dynamically sized ringbuf reservations without extra memory copies.

  Protocols
  ---------

   - Retire port only listening_hash table, add a second bind table
     hashed by port and address. Avoid linear list walk when binding to
     very popular ports (e.g. 443).

   - Add bridge FDB bulk flush filtering support allowing user space to
     remove all FDB entries matching a condition.

   - Introduce accept_unsolicited_na sysctl for IPv6 to implement
     router-side changes for RFC9131.

   - Support for MPTCP path manager in user space.

   - Add MPTCP support for fallback to regular TCP for connections that
     have never connected additional subflows or transmitted
     out-of-sequence data (partial support for RFC8684 fallback).

   - Avoid races in MPTCP-level window tracking, stabilize and improve
     throughput.

   - Support lockless operation of GRE tunnels with seq numbers enabled.

   - WiFi support for host based BSS color collision detection.

   - Add support for SO_TXTIME/SCM_TXTIME on CAN sockets.

   - Support transmission w/o flow control in CAN ISOTP (ISO 15765-2).

   - Support zero-copy Tx with TLS 1.2 crypto offload (sendfile).

   - Allow matching on the number of VLAN tags via tc-flower.

   - Add tracepoint for tcp_set_ca_state().

  Driver API
  ----------

   - Improve error reporting from classifier and action offload.

   - Add support for listing line cards in switches (devlink).

   - Add helpers for reporting page pool statistics with ethtool -S.

   - Add support for reading clock cycles when using PTP virtual clocks,
     instead of having the driver convert to time before reporting. This
     makes it possible to report time from different vclocks.

   - Support configuring low-latency Tx descriptor push via ethtool.

   - Separate Clause 22 and Clause 45 MDIO accesses more explicitly.

  New hardware / drivers
  ----------------------

   - Ethernet:
      - Marvell's Octeon NIC PCI Endpoint support (octeon_ep)
      - Sunplus SP7021 SoC (sp7021_emac)
      - Add support for Renesas RZ/V2M (in ravb)
      - Add support for MediaTek mt7986 switches (in mtk_eth_soc)

   - Ethernet PHYs:
      - ADIN1100 industrial PHYs (w/ 10BASE-T1L and SQI reporting)
      - TI DP83TD510 PHY
      - Microchip LAN8742/LAN88xx PHYs

   - WiFi:
      - Driver for pureLiFi X, XL, XC devices (plfxlc)
      - Driver for Silicon Labs devices (wfx)
      - Support for WCN6750 (in ath11k)
      - Support Realtek 8852ce devices (in rtw89)

   - Mobile:
      - MediaTek T700 modems (Intel 5G 5000 M.2 cards)

   - CAN:
      - ctucanfd: add support for CTU CAN FD open-source IP core from
        Czech Technical University in Prague

  Drivers
  -------

   - Delete a number of old drivers still using virt_to_bus().

   - Ethernet NICs:
      - intel: support TSO on tunnels MPLS
      - broadcom: support multi-buffer XDP
      - nfp: support VF rate limiting
      - sfc: use hardware tx timestamps for more than PTP
      - mlx5: multi-port eswitch support
      - hyper-v: add support for XDP_REDIRECT
      - atlantic: XDP support (including multi-buffer)
      - macb: improve real-time perf by deferring Tx processing to NAPI

   - High-speed Ethernet switches:
      - mlxsw: implement basic line card information querying
      - prestera: add support for traffic policing on ingress and egress

   - Embedded Ethernet switches:
      - lan966x: add support for packet DMA (FDMA)
      - lan966x: add support for PTP programmable pins
      - ti: cpsw_new: enable bc/mc storm prevention

   - Qualcomm 802.11ax WiFi (ath11k):
      - Wake-on-WLAN support for QCA6390 and WCN6855
      - device recovery (firmware restart) support
      - support setting Specific Absorption Rate (SAR) for WCN6855
      - read country code from SMBIOS for WCN6855/QCA6390
      - enable keep-alive during WoWLAN suspend
      - implement remain-on-channel support

   - MediaTek WiFi (mt76):
      - support Wireless Ethernet Dispatch offloading packet movement
        between the Ethernet switch and WiFi interfaces
      - non-standard VHT MCS10-11 support
      - mt7921 AP mode support
      - mt7921 IPv6 NS offload support

   - Ethernet PHYs:
      - micrel: ksz9031/ksz9131: cabletest support
      - lan87xx: SQI support for T1 PHYs
      - lan937x: add interrupt support for link detection"

* tag 'net-next-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1809 commits)
  ptp: ocp: Add firmware header checks
  ptp: ocp: fix PPS source selector debugfs reporting
  ptp: ocp: add .init function for sma_op vector
  ptp: ocp: vectorize the sma accessor functions
  ptp: ocp: constify selectors
  ptp: ocp: parameterize input/output sma selectors
  ptp: ocp: revise firmware display
  ptp: ocp: add Celestica timecard PCI ids
  ptp: ocp: Remove #ifdefs around PCI IDs
  ptp: ocp: 32-bit fixups for pci start address
  Revert "net/smc: fix listen processing for SMC-Rv2"
  ath6kl: Use cc-disable-warning to disable -Wdangling-pointer
  selftests/bpf: Dynptr tests
  bpf: Add dynptr data slices
  bpf: Add bpf_dynptr_read and bpf_dynptr_write
  bpf: Dynptr support for ring buffers
  bpf: Add bpf_dynptr_from_mem for local dynptrs
  bpf: Add verifier support for dynptrs
  bpf: Suppress 'passing zero to PTR_ERR' warning
  bpf: Introduce bpf_arch_text_invalidate for bpf_prog_pack
  ...
2022-05-25 12:22:58 -07:00
Linus Torvalds
88a618920e It was a moderately busy cycle for documentation; highlights include:
- After a long period of inactivity, the Japanese translations are seeing
    some much-needed maintenance and updating.
 
  - Reworked IOMMU documentation
 
  - Some new documentation for static-analysis tools
 
  - A new overall structure for the memory-management documentation.  This
    is an LSFMM outcome that, it is hoped, will help encourage developers to
    fill in the many gaps.  Optimism is eternal...but hopefully it will
    work.
 
  - More Chinese translations.
 
 Plus the usual typo fixes, updates, etc.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmKLqZQPHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5YdgQH/2/9+EgQDes93f/+iKtbO23EV67392dwrmXS
 kYg8lR4948/Q3jzgMloUo6hNOoxXeV/sqmdHu0LjUhFN+BGsp9fFjd/jp0XhWcqA
 nnc9foGbpmeFPxHeAg2aqV84eeasLoO5lUUm2rNoPBLd6HFV+IYC5R4VZ+w42StB
 5bYEOYwHXMvQZXkivZDse82YmvQK3/2rRGTUoFhME/Aap6rFgWJJ+XQcSKA7WmwW
 OpJqq+FOsjsxHe6IFVy6onzlqgGJM8zM2bLtqedid6yaE3uACcHMb/OyAjp0rdKF
 BQvaG+d3f7DugABqM6Y1oU75iBtJWWYgGeAm36JtX+3mz2uR/f0=
 =3UoR
 -----END PGP SIGNATURE-----

Merge tag 'docs-5.19' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "It was a moderately busy cycle for documentation; highlights include:

   - After a long period of inactivity, the Japanese translations are
     seeing some much-needed maintenance and updating.

   - Reworked IOMMU documentation

   - Some new documentation for static-analysis tools

   - A new overall structure for the memory-management documentation.
     This is an LSFMM outcome that, it is hoped, will help encourage
     developers to fill in the many gaps. Optimism is eternal...but
     hopefully it will work.

   - More Chinese translations.

  Plus the usual typo fixes, updates, etc"

* tag 'docs-5.19' of git://git.lwn.net/linux: (70 commits)
  docs: pdfdocs: Add space for chapter counts >= 100 in TOC
  docs/zh_CN: Add dev-tools/gdb-kernel-debugging.rst Chinese translation
  input: Docs: correct ntrig.rst typo
  input: Docs: correct atarikbd.rst typos
  MAINTAINERS: Become the docs/zh_CN maintainer
  docs/zh_CN: fix devicetree usage-model translation
  mm,doc: Add new documentation structure
  Documentation: drop more IDE boot options and ide-cd.rst
  Documentation/process: use scripts/get_maintainer.pl on patches
  MAINTAINERS: Add entry for DOCUMENTATION/JAPANESE
  docs/trans/ja_JP/howto: Don't mention specific kernel versions
  docs/ja_JP/SubmittingPatches: Request summaries for commit references
  docs/ja_JP/SubmittingPatches: Add Suggested-by as a standard signature
  docs/ja_JP/SubmittingPatches: Randy has moved
  docs/ja_JP/SubmittingPatches: Suggest the use of scripts/get_maintainer.pl
  docs/ja_JP/SubmittingPatches: Update GregKH links
  Documentation/sysctl: document max_rcu_stall_to_panic
  Documentation: add missing angle bracket in cgroup-v2 doc
  Documentation: dev-tools: use literal block instead of code-block
  docs/zh_CN: add vm numa translation
  ...
2022-05-25 11:17:41 -07:00
Srinivas Pandruvada
4fe4f15523 Documentation: admin-guide: PM: Add Out of Band mode
Update documentation for using the tool to support performance level
change via OOB (Out of Band) interface.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-25 15:48:26 +02:00
Linus Torvalds
827060261c media updates for v5.19-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+QmuaPwR3wnBdVwACF8+vY7k4RUFAmKLMPAACgkQCF8+vY7k
 4RVupA//Q4U2BV3xKn9H3g/+dbSIbm0YEeWc2xMd3wbDnqokXsKxOuIfG1omIxCf
 hfhfnEfhRo0WHc5AGaHtbodCU3HcCrNdjH59r6nwedQg3qgFwMT4FYDNW33E8PN1
 dj293GEUZ+BZGDoCatGOxlHi/I2/HcS3I6BiU/GlglALgM/8KaBvdMhY8S6lEXIK
 SqdiDDOsXVT+v1BB2PSSaRWU3KIccLzS2QlXQRTs2xKszXuQy6wT6IK//YqvFxQp
 PS9uziJhJNk2nIlZqE8U46iYv9v+HnTW/5cLrUDTj0VIxjWHgMSnpbubOABgBRYj
 gX1wXRqe/bx9frm1ksfBUyNmEmxF3LL2h6fnkc2RQh/T/DKwMFpE9u3P8aEIPtmk
 djACoQbdjJXKtez1Uw5KT1nH8EKfgeA5z/PkFw+zYMiXtNdhIgxFBg38veI+/jsB
 w8nUPydFWy/EpmgTbQTmvbF9c+LczqUgaMSbXKzLByVfxie/SsE/JRyLXQITByVT
 QmmZ3WTKjTJYa1gEz2lvIafhTxXWu7mnBo7ma1wSXVPSkwOr2m73aIfxf1h1vr/N
 +/sCOEeQiVT67arKbdlmf7IDvNzcRmnBTdSE//VkJGonaBAjrpUuRD4Yq61YCVon
 DqVh29LGeFDmkRkcvfZVT2Si/UwDAYhHa8yb0lPZpVXhKbCxLUQ=
 =HllR
 -----END PGP SIGNATURE-----

Merge tag 'media/v5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media updates from Mauro Carvalho Chehab:

 - dvb-usb drivers entries got reworked to avoid usage of magic numbers
   to refer to data position inside tables

 - vcodec driver has gained support for MT8186 and for vp8 and vp9
   stateless codecs

 - hantro has gained support for Hantro G1 on RK366x

 - Added more h264 levels on coda960

 - ccs gained support for MIPI CSI-2 28 bits per pixel raw data type

 - venus driver gained support for Qualcomm custom compressed pixel
   formats

 - lots of driver fixes and updates

* tag 'media/v5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (308 commits)
  media: hantro: Enable HOLD_CAPTURE_BUF for H.264
  media: hantro: Add H.264 field decoding support
  media: hantro: h264: Make dpb entry management more robust
  media: hantro: Stop using H.264 parameter pic_num
  media: rkvdec: Enable capture buffer holding for H264
  media: rkvdec-h264: Add field decoding support
  media: rkvdec: Ensure decoded resolution fit coded resolution
  media: rkvdec: h264: Fix reference frame_num wrap for second field
  media: rkvdec: h264: Validate and use pic width and height in mbs
  media: rkvdec: Move H264 SPS validation in rkvdec-h264
  media: rkvdec: h264: Fix bit depth wrap in pps packet
  media: rkvdec: h264: Fix dpb_valid implementation
  media: rkvdec: Stop overclocking the decoder
  media: v4l2: Reorder field reflist
  media: h264: Sort p/b reflist using frame_num
  media: v4l2: Trace calculated p/b0/b1 initial reflist
  media: h264: Store all fields into the unordered list
  media: h264: Store current picture fields
  media: h264: Increase reference lists size to 32
  media: h264: Use v4l2_h264_reference for reflist
  ...
2022-05-24 18:09:16 -07:00
Linus Torvalds
0350785b0a integrity-v5.19
-----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQQdXVVFGN5XqKr1Hj7LwZzRsCrn5QUCYo0tOhQcem9oYXJAbGlu
 dXguaWJtLmNvbQAKCRDLwZzRsCrn5QJfAP47Ym9vacLc1m8/MUaRA/QjbJ/8t3TX
 h/4McK8kiRudxgD/RiPHII6gJ8q+qpBrYWJZ4ZZaHE8v0oA1viuZfbuN2wc=
 =KQYi
 -----END PGP SIGNATURE-----

Merge tag 'integrity-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity

Pull IMA updates from Mimi Zohar:
 "New is IMA support for including fs-verity file digests and signatures
  in the IMA measurement list as well as verifying the fs-verity file
  digest based signatures, both based on policy.

  In addition, are two bug fixes:

   - avoid reading UEFI variables, which cause a page fault, on Apple
     Macs with T2 chips.

   - remove the original "ima" template Kconfig option to address a boot
     command line ordering issue.

  The rest is a mixture of code/documentation cleanup"

* tag 'integrity-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
  integrity: Fix sparse warnings in keyring_handler
  evm: Clean up some variables
  evm: Return INTEGRITY_PASS for enum integrity_status value '0'
  efi: Do not import certificates from UEFI Secure Boot for T2 Macs
  fsverity: update the documentation
  ima: support fs-verity file digest based version 3 signatures
  ima: permit fsverity's file digests in the IMA measurement list
  ima: define a new template field named 'd-ngv2' and templates
  fs-verity: define a function to return the integrity protected file digest
  ima: use IMA default hash algorithm for integrity violations
  ima: fix 'd-ng' comments and documentation
  ima: remove the IMA_TEMPLATE Kconfig option
  ima: remove redundant initialization of pointer 'file'.
2022-05-24 13:50:39 -07:00
Linus Torvalds
7cf6a8a17f tpmdd updates for v5.19-rc1
- Strictened validation of key hashes for SYSTEM_BLACKLIST_HASH_LIST.  An
   invalid hash format causes a compilation error.  Previously, they got
   included to the kernel binary but were silently ignored at run-time.
 - Allow root user to append new hashes to the blacklist keyring.
 - Trusted keys backed with Cryptographic Acceleration and Assurance Module
   (CAAM), which part of some of the new NXP's SoC's.  Now there is total
   three hardware backends for trusted keys: TPM, ARM TEE and CAAM.
 - A scattered set of fixes and small improvements for the TPM driver.
 
 Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iIgEABYIADAWIQRE6pSOnaBC00OEHEIaerohdGur0gUCYoux6xIcamFya2tvQGtl
 cm5lbC5vcmcACgkQGnq6IXRrq9LTQgEA4zRrlmLPjhZ1iZpPZiyBBv5eOx20/c+y
 R7tCfJFB2+ABAOT1E885vt+GgKTY4mYloHJ+ZtnTIf1QRMP6EoSX+TwP
 =oBOO
 -----END PGP SIGNATURE-----

Merge tag 'tpmdd-next-v5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd

Pull tpm updates from Jarkko Sakkinen:

 - Tightened validation of key hashes for SYSTEM_BLACKLIST_HASH_LIST. An
   invalid hash format causes a compilation error. Previously, they got
   included to the kernel binary but were silently ignored at run-time.

 - Allow root user to append new hashes to the blacklist keyring.

 - Trusted keys backed with Cryptographic Acceleration and Assurance
   Module (CAAM), which part of some of the new NXP's SoC's. Now there
   is total three hardware backends for trusted keys: TPM, ARM TEE and
   CAAM.

 - A scattered set of fixes and small improvements for the TPM driver.

* tag 'tpmdd-next-v5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  MAINTAINERS: add KEYS-TRUSTED-CAAM
  doc: trusted-encrypted: describe new CAAM trust source
  KEYS: trusted: Introduce support for NXP CAAM-based trusted keys
  crypto: caam - add in-kernel interface for blob generator
  crypto: caam - determine whether CAAM supports blob encap/decap
  KEYS: trusted: allow use of kernel RNG for key material
  KEYS: trusted: allow use of TEE as backend without TCG_TPM support
  tpm: Add field upgrade mode support for Infineon TPM2 modules
  tpm: Fix buffer access in tpm2_get_tpm_pt()
  char: tpm: cr50_i2c: Suppress duplicated error message in .remove()
  tpm: cr50: Add new device/vendor ID 0x504a6666
  tpm: Remove read16/read32/write32 calls from tpm_tis_phy_ops
  tpm: ibmvtpm: Correct the return value in tpm_ibmvtpm_probe()
  tpm/tpm_ftpm_tee: Return true/false (not 1/0) from bool functions
  certs: Explain the rationale to call panic()
  certs: Allow root user to append signed hashes to the blacklist keyring
  certs: Check that builtin blacklist hashes are valid
  certs: Make blacklist_vet_description() more strict
  certs: Factor out the blacklist hash creation
  tools/certs: Add print-cert-tbs-hash.sh
2022-05-24 13:16:50 -07:00
Linus Torvalds
ac2ab99072 Random number generator updates for Linux 5.19-rc1.
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmKKpM8ACgkQSfxwEqXe
 A6726w/+OJimGd4arvpSmdn+vxepSyDLgKfwM0x5zprRVd16xg8CjJr4eMonTesq
 YvtJRqpetb53MB+sMhutlvQqQzrjtf2MBkgPwF4I2gUrk7vLD45Q+AGdGhi/rUwz
 wHGA7xg1FHLHia2M/9idSqi8QlZmUP4u4l5ZnMyTUHiwvRD6XOrWKfqvUSawNzyh
 hCWlTUxDrjizsW5YpsJX/MkRadSC8loJEk5ByZebow6nRPfurJvqfrcOMgHyNrbY
 pOZ/CGPxcetMqotL2TuuJt5wKmenqYhIWGAp3YM2SWWgU2ueBZekW8AYeMfgUcvh
 LWV93RpSuAnE5wsdjIULvjFnEDJBf8ihfMnMrd9G5QjQu44tuKWfY2MghLSpYzaR
 V6UFbRmhrqhqiStHQXOvk1oqxtpbHlc9zzJLmvPmDJcbvzXQ9Opk5GVXAmdtnHnj
 M/ty3wGWxucY6mHqT8MkCShSSslbgEtc1pEIWHdrUgnaiSVoCVBEO+9LqLbjvOTm
 XA/6YtoiCE5FasK51pir1zVb2GORQn0v8HnuAOsusD/iPAlRQ/G5jZkaXbwRQI6j
 atYL1svqvSKn5POnzqAlMUXfMUr19K5xqJdp7i6qmlO1Vq6Z+tWbCQgD1JV+Wjkb
 CMyvXomFCFu4aYKGRE2SBRnWLRghG3kYHqEQ15yTPMQerxbUDNg=
 =SUr3
 -----END PGP SIGNATURE-----

Merge tag 'random-5.19-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random

Pull random number generator updates from Jason Donenfeld:
 "These updates continue to refine the work began in 5.17 and 5.18 of
  modernizing the RNG's crypto and streamlining and documenting its
  code.

  New for 5.19, the updates aim to improve entropy collection methods
  and make some initial decisions regarding the "premature next" problem
  and our threat model. The cloc utility now reports that random.c is
  931 lines of code and 466 lines of comments, not that basic metrics
  like that mean all that much, but at the very least it tells you that
  this is very much a manageable driver now.

  Here's a summary of the various updates:

   - The random_get_entropy() function now always returns something at
     least minimally useful. This is the primary entropy source in most
     collectors, which in the best case expands to something like RDTSC,
     but prior to this change, in the worst case it would just return 0,
     contributing nothing. For 5.19, additional architectures are wired
     up, and architectures that are entirely missing a cycle counter now
     have a generic fallback path, which uses the highest resolution
     clock available from the timekeeping subsystem.

     Some of those clocks can actually be quite good, despite the CPU
     not having a cycle counter of its own, and going off-core for a
     stamp is generally thought to increase jitter, something positive
     from the perspective of entropy gathering. Done very early on in
     the development cycle, this has been sitting in next getting some
     testing for a while now and has relevant acks from the archs, so it
     should be pretty well tested and fine, but is nonetheless the thing
     I'll be keeping my eye on most closely.

   - Of particular note with the random_get_entropy() improvements is
     MIPS, which, on CPUs that lack the c0 count register, will now
     combine the high-speed but short-cycle c0 random register with the
     lower-speed but long-cycle generic fallback path.

   - With random_get_entropy() now always returning something useful,
     the interrupt handler now collects entropy in a consistent
     construction.

   - Rather than comparing two samples of random_get_entropy() for the
     jitter dance, the algorithm now tests many samples, and uses the
     amount of differing ones to determine whether or not jitter entropy
     is usable and how laborious it must be. The problem with comparing
     only two samples was that if the cycle counter was extremely slow,
     but just so happened to be on the cusp of a change, the slowness
     wouldn't be detected. Taking many samples fixes that to some
     degree.

     This, combined with the other improvements to random_get_entropy(),
     should make future unification of /dev/random and /dev/urandom
     maybe more possible. At the very least, were we to attempt it again
     today (we're not), it wouldn't break any of Guenter's test rigs
     that broke when we tried it with 5.18. So, not today, but perhaps
     down the road, that's something we can revisit.

   - We attempt to reseed the RNG immediately upon waking up from system
     suspend or hibernation, making use of the various timestamps about
     suspend time and such available, as well as the usual inputs such
     as RDRAND when available.

   - Batched randomness now falls back to ordinary randomness before the
     RNG is initialized. This provides more consistent guarantees to the
     types of random numbers being returned by the various accessors.

   - The "pre-init injection" code is now gone for good. I suspect you
     in particular will be happy to read that, as I recall you
     expressing your distaste for it a few months ago. Instead, to avoid
     a "premature first" issue, while still allowing for maximal amount
     of entropy availability during system boot, the first 128 bits of
     estimated entropy are used immediately as it arrives, with the next
     128 bits being buffered. And, as before, after the RNG has been
     fully initialized, it winds up reseeding anyway a few seconds later
     in most cases. This resulted in a pretty big simplification of the
     initialization code and let us remove various ad-hoc mechanisms
     like the ugly crng_pre_init_inject().

   - The RNG no longer pretends to handle the "premature next" security
     model, something that various academics and other RNG designs have
     tried to care about in the past. After an interesting mailing list
     thread, these issues are thought to be a) mainly academic and not
     practical at all, and b) actively harming the real security of the
     RNG by delaying new entropy additions after a potential compromise,
     making a potentially bad situation even worse. As well, in the
     first place, our RNG never even properly handled the premature next
     issue, so removing an incomplete solution to a fake problem was
     particularly nice.

     This allowed for numerous other simplifications in the code, which
     is a lot cleaner as a consequence. If you didn't see it before,
     https://lore.kernel.org/lkml/YmlMGx6+uigkGiZ0@zx2c4.com/ may be a
     thread worth skimming through.

   - While the interrupt handler received a separate code path years ago
     that avoids locks by using per-cpu data structures and a faster
     mixing algorithm, in order to reduce interrupt latency, input and
     disk events that are triggered in hardirq handlers were still
     hitting locks and more expensive algorithms. Those are now
     redirected to use the faster per-cpu data structures.

   - Rather than having the fake-crypto almost-siphash-based random32
     implementation be used right and left, and in many places where
     cryptographically secure randomness is desirable, the batched
     entropy code is now fast enough to replace that.

   - As usual, numerous code quality and documentation cleanups. For
     example, the initialization state machine now uses enum symbolic
     constants instead of just hard coding numbers everywhere.

   - Since the RNG initializes once, and then is always initialized
     thereafter, a pretty heavy amount of code used during that
     initialization is never used again. It is now completely cordoned
     off using static branches and it winds up in the .text.unlikely
     section so that it doesn't reduce cache compactness after the RNG
     is ready.

   - A variety of functions meant for waiting on the RNG to be
     initialized were only used by vsprintf, and in not a particularly
     optimal way. Replacing that usage with a more ordinary setup made
     it possible to remove those functions.

   - A cleanup of how we warn userspace about the use of uninitialized
     /dev/urandom and uninitialized get_random_bytes() usage.
     Interestingly, with the change you merged for 5.18 that attempts to
     use jitter (but does not block if it can't), the majority of users
     should never see those warnings for /dev/urandom at all now, and
     the one for in-kernel usage is mainly a debug thing.

   - The file_operations struct for /dev/[u]random now implements
     .read_iter and .write_iter instead of .read and .write, allowing it
     to also implement .splice_read and .splice_write, which makes
     splice(2) work again after it was broken here (and in many other
     places in the tree) during the set_fs() removal. This was a bit of
     a last minute arrival from Jens that hasn't had as much time to
     bake, so I'll be keeping my eye on this as well, but it seems
     fairly ordinary. Unfortunately, read_iter() is around 3% slower
     than read() in my tests, which I'm not thrilled about. But Jens and
     Al, spurred by this observation, seem to be making progress in
     removing the bottlenecks on the iter paths in the VFS layer in
     general, which should remove the performance gap for all drivers.

   - Assorted other bug fixes, cleanups, and optimizations.

   - A small SipHash cleanup"

* tag 'random-5.19-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (49 commits)
  random: check for signals after page of pool writes
  random: wire up fops->splice_{read,write}_iter()
  random: convert to using fops->write_iter()
  random: convert to using fops->read_iter()
  random: unify batched entropy implementations
  random: move randomize_page() into mm where it belongs
  random: remove mostly unused async readiness notifier
  random: remove get_random_bytes_arch() and add rng_has_arch_random()
  random: move initialization functions out of hot pages
  random: make consistent use of buf and len
  random: use proper return types on get_random_{int,long}_wait()
  random: remove extern from functions in header
  random: use static branch for crng_ready()
  random: credit architectural init the exact amount
  random: handle latent entropy and command line from random_init()
  random: use proper jiffies comparison macro
  random: remove ratelimiting for in-kernel unseeded randomness
  random: move initialization out of reseeding hot path
  random: avoid initializing twice in credit race
  random: use symbolic constants for crng_init states
  ...
2022-05-24 11:58:10 -07:00
Jakub Kicinski
677fb75253 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/ethernet/cadence/macb_main.c
  5cebb40bc9 ("net: macb: Fix PTP one step sync support")
  138badbc21 ("net: macb: use NAPI for TX completion path")
https://lore.kernel.org/all/20220523111021.31489367@canb.auug.org.au/

net/smc/af_smc.c
  75c1edf23b ("net/smc: postpone sk_refcnt increment in connect()")
  3aba103006 ("net/smc: align the connect behaviour with TCP")
https://lore.kernel.org/all/20220524114408.4bf1af38@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-23 21:19:17 -07:00
Linus Torvalds
143a6252e1 arm64 updates for 5.19:
- Initial support for the ARMv9 Scalable Matrix Extension (SME). SME
   takes the approach used for vectors in SVE and extends this to provide
   architectural support for matrix operations. No KVM support yet, SME
   is disabled in guests.
 
 - Support for crashkernel reservations above ZONE_DMA via the
   'crashkernel=X,high' command line option.
 
 - btrfs search_ioctl() fix for live-lock with sub-page faults.
 
 - arm64 perf updates: support for the Hisilicon "CPA" PMU for monitoring
   coherent I/O traffic, support for Arm's CMN-650 and CMN-700
   interconnect PMUs, minor driver fixes, kerneldoc cleanup.
 
 - Kselftest updates for SME, BTI, MTE.
 
 - Automatic generation of the system register macros from a 'sysreg'
   file describing the register bitfields.
 
 - Update the type of the function argument holding the ESR_ELx register
   value to unsigned long to match the architecture register size
   (originally 32-bit but extended since ARMv8.0).
 
 - stacktrace cleanups.
 
 - ftrace cleanups.
 
 - Miscellaneous updates, most notably: arm64-specific huge_ptep_get(),
   avoid executable mappings in kexec/hibernate code, drop TLB flushing
   from get_clear_flush() (and rename it to get_clear_contig()),
   ARCH_NR_GPIO bumped to 2048 for ARCH_APPLE.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmKH19IACgkQa9axLQDI
 XvEFWg//bf0p6zjeNaOJmBbyVFsXsVyYiEaLUpFPUs3oB+81s2YZ+9i1rgMrNCft
 EIDQ9+/HgScKxJxnzWf68heMdcBDbk76VJtLALExbge6owFsjByQDyfb/b3v/bLd
 ezAcGzc6G5/FlI1IP7ct4Z9MnQry4v5AG8lMNAHjnf6GlBS/tYNAqpmj8HpQfgRQ
 ZbhfZ8Ayu3TRSLWL39NHVevpmxQm/bGcpP3Q9TtjUqg0r1FQ5sK/LCqOksueIAzT
 UOgUVYWSFwTpLEqbYitVqgERQp9LiLoK5RmNYCIEydfGM7+qmgoxofSq5e2hQtH2
 SZM1XilzsZctRbBbhMit1qDBqMlr/XAy/R5FO0GauETVKTaBhgtj6mZGyeC9nU/+
 RGDljaArbrOzRwMtSuXF+Fp6uVo5spyRn1m8UT/k19lUTdrV9z6EX5Fzuc4Mnhed
 oz4iokbl/n8pDObXKauQspPA46QpxUYhrAs10B/ELc3yyp/Qj3jOfzYHKDNFCUOq
 HC9mU+YiO9g2TbYgCrrFM6Dah2E8fU6/cR0ZPMeMgWK4tKa+6JMEINYEwak9e7M+
 8lZnvu3ntxiJLN+PrPkiPyG+XBh2sux1UfvNQ+nw4Oi9xaydeX7PCbQVWmzTFmHD
 q7UPQ8220e2JNCha9pULS8cxDLxiSksce06DQrGXwnHc1Ir7T04=
 =0DjE
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:

 - Initial support for the ARMv9 Scalable Matrix Extension (SME).

   SME takes the approach used for vectors in SVE and extends this to
   provide architectural support for matrix operations. No KVM support
   yet, SME is disabled in guests.

 - Support for crashkernel reservations above ZONE_DMA via the
   'crashkernel=X,high' command line option.

 - btrfs search_ioctl() fix for live-lock with sub-page faults.

 - arm64 perf updates: support for the Hisilicon "CPA" PMU for
   monitoring coherent I/O traffic, support for Arm's CMN-650 and
   CMN-700 interconnect PMUs, minor driver fixes, kerneldoc cleanup.

 - Kselftest updates for SME, BTI, MTE.

 - Automatic generation of the system register macros from a 'sysreg'
   file describing the register bitfields.

 - Update the type of the function argument holding the ESR_ELx register
   value to unsigned long to match the architecture register size
   (originally 32-bit but extended since ARMv8.0).

 - stacktrace cleanups.

 - ftrace cleanups.

 - Miscellaneous updates, most notably: arm64-specific huge_ptep_get(),
   avoid executable mappings in kexec/hibernate code, drop TLB flushing
   from get_clear_flush() (and rename it to get_clear_contig()),
   ARCH_NR_GPIO bumped to 2048 for ARCH_APPLE.

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (145 commits)
  arm64/sysreg: Generate definitions for FAR_ELx
  arm64/sysreg: Generate definitions for DACR32_EL2
  arm64/sysreg: Generate definitions for CSSELR_EL1
  arm64/sysreg: Generate definitions for CPACR_ELx
  arm64/sysreg: Generate definitions for CONTEXTIDR_ELx
  arm64/sysreg: Generate definitions for CLIDR_EL1
  arm64/sve: Move sve_free() into SVE code section
  arm64: Kconfig.platforms: Add comments
  arm64: Kconfig: Fix indentation and add comments
  arm64: mm: avoid writable executable mappings in kexec/hibernate code
  arm64: lds: move special code sections out of kernel exec segment
  arm64/hugetlb: Implement arm64 specific huge_ptep_get()
  arm64/hugetlb: Use ptep_get() to get the pte value of a huge page
  arm64: kdump: Do not allocate crash low memory if not needed
  arm64/sve: Generate ZCR definitions
  arm64/sme: Generate defintions for SVCR
  arm64/sme: Generate SMPRI_EL1 definitions
  arm64/sme: Automatically generate SMPRIMAP_EL2 definitions
  arm64/sme: Automatically generate SMIDR_EL1 defines
  arm64/sme: Automatically generate defines for SMCR
  ...
2022-05-23 21:06:11 -07:00
Linus Torvalds
a13dc4d409 - Serious sanitization and cleanup of the whole APERF/MPERF and
frequency invariance code along with removing the need for unnecessary IPIs
 
 - Finally remove a.out support
 
 - The usual trivial cleanups and fixes all over x86
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmKLn48ACgkQEsHwGGHe
 VUpbkg/+PELrc0y/qxLM/+dyftKYY16Rhk6ZVAXfwqlh5ldyVQcLMUgKwDqYyTn2
 XmgdI3cTcFlH2K7j6ANWLu0I9NPaviimUcEdMVcXt7aY5mGWk/q4hIyCYM8d41sV
 qKx4OjNSdyoofG6MtwFLJDuoeVg99Bqgvm4nP9BuxL0dZJ2hfcUZ7MTxYCx9ZYjK
 /3trx0NV287Yg/wm91EU0nLQzy9xbGS7WCmMnse6uxiUdm2vXbBt8oNFF4f747Dj
 0cArfNrMgYq4Cv5bgt/Ki0NU/n4EOGDpJUSyQwlnjDKeN81ESPy7IWtTQ6cE/rJK
 BZeUIPiGiYHwtqXv0UTAPGLG8cAqKeab8u0xAOyrFVDkTc0+WlPJRsUAOmRRGIGE
 M8ZjoxrLeuFgxw6vKpVjaA+mDRj3qEpSH+IrTcekS98PN7gmVzvq03GobgGbT7YB
 xmtbThJa+514FfUVckkyC0+A56BknUIgVxwFPqrthE2atzYTbH67hW4U0yVWXXr7
 2VI7ttozBrYVgHCWhD9eoT0uhyD74Vl6pqHnqzY9ShIfKVUGvMgKHHg04nLLtF7W
 hm87xV3Q5UEmXhTmDzT1rUZ99mBUxGbWxk227I9raMugIh7pp9wIr57+7O0LRYfX
 TdnE2+tL8RMi7+XzRH5iLhnwkrvahBESeHSQ7GVI1Y2zMmmFN+0=
 =Dks/
 -----END PGP SIGNATURE-----

Merge tag 'x86_cleanups_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cleanups from Borislav Petkov:

 - Serious sanitization and cleanup of the whole APERF/MPERF and
   frequency invariance code along with removing the need for
   unnecessary IPIs

 - Finally remove a.out support

 - The usual trivial cleanups and fixes all over x86

* tag 'x86_cleanups_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
  x86: Remove empty files
  x86/speculation: Add missing srbds=off to the mitigations= help text
  x86/prctl: Remove pointless task argument
  x86/aperfperf: Make it correct on 32bit and UP kernels
  x86/aperfmperf: Integrate the fallback code from show_cpuinfo()
  x86/aperfmperf: Replace arch_freq_get_on_cpu()
  x86/aperfmperf: Replace aperfmperf_get_khz()
  x86/aperfmperf: Store aperf/mperf data for cpu frequency reads
  x86/aperfmperf: Make parts of the frequency invariance code unconditional
  x86/aperfmperf: Restructure arch_scale_freq_tick()
  x86/aperfmperf: Put frequency invariance aperf/mperf data into a struct
  x86/aperfmperf: Untangle Intel and AMD frequency invariance init
  x86/aperfmperf: Separate AP/BP frequency invariance init
  x86/smp: Move APERF/MPERF code where it belongs
  x86/aperfmperf: Dont wake idle CPUs in arch_freq_get_on_cpu()
  x86/process: Fix kernel-doc warning due to a changed function name
  x86: Remove a.out support
  x86/mm: Replace nodes_weight() with nodes_empty() where appropriate
  x86: Replace cpumask_weight() with cpumask_empty() where appropriate
  x86/pkeys: Remove __arch_set_user_pkey_access() declaration
  ...
2022-05-23 18:17:09 -07:00
Linus Torvalds
c5a3d3c01e - Remove a bunch of chicken bit options to turn off CPU features which
are not really needed anymore
 
 - Misc fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmKLdfgACgkQEsHwGGHe
 VUpB5Q//TIGVgmnSd0YYxY2cIe047lfcd34D+3oEGk0d2FidtirP/tjgBqIXRuY5
 UncoveqBuI/6/7bodP/ANg9DNVXv2489eFYyZtEOLSGnfzV2AU10aw95cuQQG+BW
 YIc6bGSsgfiNo8Vtj4L3xkVqxOrqaCYnh74GTSNNANht3i8KH8Qq9n3qZTuMiF6R
 fH9xWak3TZB2nMzHdYrXh0sSR6eBHN3KYSiT0DsdlU9PUlavlSPFYQRiAlr6FL6J
 BuYQdlUaCQbINvaviGW4SG7fhX32RfF/GUNaBajB40TO6H98KZLpBBvstWQ841xd
 /o44o5wbghoGP1ne8OKwP+SaAV2bE6twd5eO1lpwcpXnQfATvjQ2imxvOiRhy5LY
 pFPt/hko9gKWJ6SI0SQ4tiKJALFPLWD6561scHU6PoriFhv0SRIaPmJyEsDYynMz
 bCXaPPsoovRwwwBfAxxQjljIlhQSBVt3gWZ8NWD1tYbNaqM+WK7xKBaONGh3OCw3
 iK7lsbbljtM0zmANImYyeo7+Hr1NVOmMiK2WZYbxhxgzH3l8v/6EbDt3I70WU57V
 9apCU3/nk/HFpX65SdW5qmuiWLVdH9NXrEqbvaUB4ApT18MdUUugewBhcGnf3Umu
 wEtltzziqcIkxzDoXXpBGWpX31S7PsM2XVDqYC7dwuNttgEw2Fc=
 =7AUX
 -----END PGP SIGNATURE-----

Merge tag 'x86_cpu_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 CPU feature updates from Borislav Petkov:

 - Remove a bunch of chicken bit options to turn off CPU features which
   are not really needed anymore

 - Misc fixes and cleanups

* tag 'x86_cpu_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/speculation: Add missing prototype for unpriv_ebpf_notify()
  x86/pm: Fix false positive kmemleak report in msr_build_context()
  x86/speculation/srbds: Do not try to turn mitigation off when not supported
  x86/cpu: Remove "noclflush"
  x86/cpu: Remove "noexec"
  x86/cpu: Remove "nosmep"
  x86/cpu: Remove CONFIG_X86_SMAP and "nosmap"
  x86/cpu: Remove "nosep"
  x86/cpu: Allow feature bit names from /proc/cpuinfo in clearcpuid=
2022-05-23 18:01:31 -07:00
Linus Torvalds
eb39e37d5c AMD SEV-SNP support
Add to confidential guests the necessary memory integrity protection
 against malicious hypervisor-based attacks like data replay, memory
 remapping and others, thus achieving a stronger isolation from the
 hypervisor.
 
 At the core of the functionality is a new structure called a reverse
 map table (RMP) with which the guest has a say in which pages get
 assigned to it and gets notified when a page which it owns, gets
 accessed/modified under the covers so that the guest can take an
 appropriate action.
 
 In addition, add support for the whole machinery needed to launch a SNP
 guest, details of which is properly explained in each patch.
 
 And last but not least, the series refactors and improves parts of the
 previous SEV support so that the new code is accomodated properly and
 not just bolted on.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmKLU2AACgkQEsHwGGHe
 VUpb/Q//f4LGiJf4nw1flzpe90uIsHNwAafng3NOjeXmhI/EcOlqPf23WHPCgg3Z
 2umfa4sRZyj4aZubDd7tYAoq4qWrQ7pO7viWCNTh0InxBAILOoMPMuq2jSAbq0zV
 ASUJXeQ2bqjYxX4JV4N5f3HT2l+k68M0mpGLN0H+O+LV9pFS7dz7Jnsg+gW4ZP25
 PMPLf6FNzO/1tU1aoYu80YDP1ne4eReLrNzA7Y/rx+S2NAetNwPn21AALVgoD4Nu
 vFdKh4MHgtVbwaQuh0csb/+4vD+tDXAhc8lbIl+Abl9ZxJaDWtAJW5D9e2CnsHk1
 NOkHwnrzizzhtGK1g56YPUVRFAWhZYMOI1hR0zGPLQaVqBnN4b+iahPeRiV0XnGE
 PSbIHSfJdeiCkvLMCdIAmpE5mRshhRSUfl1CXTCdetMn8xV/qz/vG6bXssf8yhTV
 cfLGPHU7gfVmsbR9nk5a8KZ78PaytxOxfIDXvCy8JfQwlIWtieaCcjncrj+sdMJy
 0fdOuwvi4jma0cyYuPolKiS1Hn4ldeibvxXT7CZQlIx6jZShMbpfpTTJs11XdtHm
 PdDAc1TY3AqI33mpy9DhDQmx/+EhOGxY3HNLT7evRhv4CfdQeK3cPVUWgo4bGNVv
 ZnFz7nvmwpyufltW9K8mhEZV267174jXGl6/idxybnlVE7ESr2Y=
 =Y8kW
 -----END PGP SIGNATURE-----

Merge tag 'x86_sev_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull AMD SEV-SNP support from Borislav Petkov:
 "The third AMD confidential computing feature called Secure Nested
  Paging.

  Add to confidential guests the necessary memory integrity protection
  against malicious hypervisor-based attacks like data replay, memory
  remapping and others, thus achieving a stronger isolation from the
  hypervisor.

  At the core of the functionality is a new structure called a reverse
  map table (RMP) with which the guest has a say in which pages get
  assigned to it and gets notified when a page which it owns, gets
  accessed/modified under the covers so that the guest can take an
  appropriate action.

  In addition, add support for the whole machinery needed to launch a
  SNP guest, details of which is properly explained in each patch.

  And last but not least, the series refactors and improves parts of the
  previous SEV support so that the new code is accomodated properly and
  not just bolted on"

* tag 'x86_sev_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
  x86/entry: Fixup objtool/ibt validation
  x86/sev: Mark the code returning to user space as syscall gap
  x86/sev: Annotate stack change in the #VC handler
  x86/sev: Remove duplicated assignment to variable info
  x86/sev: Fix address space sparse warning
  x86/sev: Get the AP jump table address from secrets page
  x86/sev: Add missing __init annotations to SEV init routines
  virt: sevguest: Rename the sevguest dir and files to sev-guest
  virt: sevguest: Change driver name to reflect generic SEV support
  x86/boot: Put globals that are accessed early into the .data section
  x86/boot: Add an efi.h header for the decompressor
  virt: sevguest: Fix bool function returning negative value
  virt: sevguest: Fix return value check in alloc_shared_pages()
  x86/sev-es: Replace open-coded hlt-loop with sev_es_terminate()
  virt: sevguest: Add documentation for SEV-SNP CPUID Enforcement
  virt: sevguest: Add support to get extended report
  virt: sevguest: Add support to derive key
  virt: Add SEV-SNP guest driver
  x86/sev: Register SEV-SNP guest request platform device
  x86/sev: Provide support for SNP guest request NAEs
  ...
2022-05-23 17:38:01 -07:00
Linus Torvalds
8a32f81a89 ata changes for 5.19-rc1
For this cycle, the libata.force kernel parameter changes stand out.
 Beside that, some small cleanups in various drivers. In more details:
 
 * Changes to the pata_mpc52xx driver in preparation for powerpc's
   asm/prom.h cleanup, from Christophe.
 
 * Improved ATA command allocation, from John.
 
 * Various small cleanups to the pata_via, pata_sil680, pata_ftide010,
   sata_gemini, ahci_brcm drivers and to libata-core, from Sergey, Diego,
   Ruyi, Mighao and Jiabing.
 
 * Add support for the RZ/G2H SoC to the rcar-sata driver, from Lad.
 
 * AHCI RAID ID cleanup, from Dan.
 
 * Improvement to the libata.force kernel parameter to allow most horkage
   flags to be manually forced for debugging drive issues in the field
   without needing recompiling a kernel, from me.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSRPv8tYSvhwAzJdzjdoc3SxdoYdgUCYosMtQAKCRDdoc3SxdoY
 dhi6APsGXkkiaTheBeshjhPZiet80iEh4gJknp5QwgJ6QovjDwEAzjApUC0S1sq2
 atD4Y7T6HnKQBp66lJHvvgbFuHlxMgg=
 =YuEq
 -----END PGP SIGNATURE-----

Merge tag 'ata-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata

Pull ata updates from Damien Le Moal:
 "For this cycle, the libata.force kernel parameter changes stand out.
  Beside that, some small cleanups in various drivers. In more detail:

   - Changes to the pata_mpc52xx driver in preparation for powerpc's
     asm/prom.h cleanup, from Christophe.

   - Improved ATA command allocation, from John.

   - Various small cleanups to the pata_via, pata_sil680, pata_ftide010,
     sata_gemini, ahci_brcm drivers and to libata-core, from Sergey,
     Diego, Ruyi, Mighao and Jiabing.

   - Add support for the RZ/G2H SoC to the rcar-sata driver, from Lad.

   - AHCI RAID ID cleanup, from Dan.

   - Improvement to the libata.force kernel parameter to allow most
     horkage flags to be manually forced for debugging drive issues in
     the field without needing recompiling a kernel, from me"

* tag 'ata-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
  ata: pata_ftide010: Remove unneeded ERROR check before clk_disable_unprepare
  doc: admin-guide: Update libata kernel parameters
  ata: libata-core: Allow forcing most horkage flags
  ata: libata-core: Improve link flags forced settings
  ata: libata-core: Refactor force_tbl definition
  ata: libata-core: cleanup ata_device_blacklist
  ata: simplify the return expression of brcm_ahci_remove
  ata: Make use of the helper function devm_platform_ioremap_resource()
  ata: libata-core: replace "its" with "it is"
  ahci: Add a generic 'controller2' RAID id
  dt-bindings: ata: renesas,rcar-sata: Add r8a774e1 support
  ata: pata_via: fix sloppy typing in via_do_set_mode()
  ata: pata_sil680: fix result type of sil680_sel{dev|reg}()
  ata: libata-core: fix parameter type in ata_xfer_mode2shift()
  libata: Improve ATA queued command allocation
  ata: pata_mpc52xx: Prepare cleanup of powerpc's asm/prom.h
2022-05-23 14:14:50 -07:00
Ahmad Fatoum
e9c5048c2d KEYS: trusted: Introduce support for NXP CAAM-based trusted keys
The Cryptographic Acceleration and Assurance Module (CAAM) is an IP core
built into many newer i.MX and QorIQ SoCs by NXP.

The CAAM does crypto acceleration, hardware number generation and
has a blob mechanism for encapsulation/decapsulation of sensitive material.

This blob mechanism depends on a device specific random 256-bit One Time
Programmable Master Key that is fused in each SoC at manufacturing
time. This key is unreadable and can only be used by the CAAM for AES
encryption/decryption of user data.

This makes it a suitable backend (source) for kernel trusted keys.

Previous commits generalized trusted keys to support multiple backends
and added an API to access the CAAM blob mechanism. Based on these,
provide the necessary glue to use the CAAM for trusted keys.

Reviewed-by: David Gstir <david@sigma-star.at>
Reviewed-by: Pankaj Gupta <pankaj.gupta@nxp.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Tim Harvey <tharvey@gateworks.com>
Tested-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Tested-by: Pankaj Gupta <pankaj.gupta@nxp.com>
Tested-by: Michael Walle <michael@walle.cc> # on ls1028a (non-E and E)
Tested-by: John Ernberg <john.ernberg@actia.se> # iMX8QXP
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2022-05-23 18:47:50 +03:00
Ahmad Fatoum
fcd7c26901 KEYS: trusted: allow use of kernel RNG for key material
The two existing trusted key sources don't make use of the kernel RNG,
but instead let the hardware doing the sealing/unsealing also
generate the random key material. However, both users and future
backends may want to place less trust into the quality of the trust
source's random number generator and instead reuse the kernel entropy
pool, which can be seeded from multiple entropy sources.

Make this possible by adding a new trusted.rng parameter,
that will force use of the kernel RNG. In its absence, it's up
to the trust source to decide, which random numbers to use,
maintaining the existing behavior.

Suggested-by: Jarkko Sakkinen <jarkko@kernel.org>
Acked-by: Sumit Garg <sumit.garg@linaro.org>
Acked-by: Pankaj Gupta <pankaj.gupta@nxp.com>
Reviewed-by: David Gstir <david@sigma-star.at>
Reviewed-by: Pankaj Gupta <pankaj.gupta@nxp.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Pankaj Gupta <pankaj.gupta@nxp.com>
Tested-by: Michael Walle <michael@walle.cc> # on ls1028a (non-E and E)
Tested-by: John Ernberg <john.ernberg@actia.se> # iMX8QXP
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2022-05-23 18:47:50 +03:00
Pawan Gupta
8cb861e9e3 x86/speculation/mmio: Add mitigation for Processor MMIO Stale Data
Processor MMIO Stale Data is a class of vulnerabilities that may
expose data after an MMIO operation. For details please refer to
Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst.

These vulnerabilities are broadly categorized as:

Device Register Partial Write (DRPW):
  Some endpoint MMIO registers incorrectly handle writes that are
  smaller than the register size. Instead of aborting the write or only
  copying the correct subset of bytes (for example, 2 bytes for a 2-byte
  write), more bytes than specified by the write transaction may be
  written to the register. On some processors, this may expose stale
  data from the fill buffers of the core that created the write
  transaction.

Shared Buffers Data Sampling (SBDS):
  After propagators may have moved data around the uncore and copied
  stale data into client core fill buffers, processors affected by MFBDS
  can leak data from the fill buffer.

Shared Buffers Data Read (SBDR):
  It is similar to Shared Buffer Data Sampling (SBDS) except that the
  data is directly read into the architectural software-visible state.

An attacker can use these vulnerabilities to extract data from CPU fill
buffers using MDS and TAA methods. Mitigate it by clearing the CPU fill
buffers using the VERW instruction before returning to a user or a
guest.

On CPUs not affected by MDS and TAA, user application cannot sample data
from CPU fill buffers using MDS or TAA. A guest with MMIO access can
still use DRPW or SBDR to extract data architecturally. Mitigate it with
VERW instruction to clear fill buffers before VMENTER for MMIO capable
guests.

Add a kernel parameter mmio_stale_data={off|full|full,nosmt} to control
the mitigation.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-05-21 12:14:52 +02:00
Pawan Gupta
4419470191 Documentation: Add documentation for Processor MMIO Stale Data
Add the admin guide for Processor MMIO stale data vulnerabilities.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-05-21 12:14:26 +02:00
Xin Long
582a2dbc72 Documentation: add description for net.core.gro_normal_batch
Describe it in admin-guide/sysctl/net.rst like other Network core options.
Users need to know gro_normal_batch for performance tuning.

Fixes: 323ebb61e3 ("net: use listified RX for handling GRO_NORMAL skbs")
Reported-by: Prijesh Patel <prpatel@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://lore.kernel.org/r/acf8a2c03b91bcde11f67ff89b6050089c0712a3.1652888963.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-19 17:46:23 -07:00
Johannes Weiner
f4840ccfca zswap: memcg accounting
Applications can currently escape their cgroup memory containment when
zswap is enabled.  This patch adds per-cgroup tracking and limiting of
zswap backend memory to rectify this.

The existing cgroup2 memory.stat file is extended to show zswap statistics
analogous to what's in meminfo and vmstat.  Furthermore, two new control
files, memory.zswap.current and memory.zswap.max, are added to allow
tuning zswap usage on a per-workload basis.  This is important since not
all workloads benefit from zswap equally; some even suffer compared to
disk swap when memory contents don't compress well.  The optimal size of
the zswap pool, and the threshold for writeback, also depends on the size
of the workload's warm set.

The implementation doesn't use a traditional page_counter transaction. 
zswap is unconventional as a memory consumer in that we only know the
amount of memory to charge once expensive compression has occurred.  If
zwap is disabled or the limit is already exceeded we obviously don't want
to compress page upon page only to reject them all.  Instead, the limit is
checked against current usage, then we compress and charge.  This allows
some limit overrun, but not enough to matter in practice.

[hannes@cmpxchg.org: fix for CONFIG_SLOB builds]
  Link: https://lkml.kernel.org/r/YnwD14zxYjUJPc2w@cmpxchg.org
[hannes@cmpxchg.org: opt out of cgroups v1]
  Link: https://lkml.kernel.org/r/Yn6it9mBYFA+/lTb@cmpxchg.org
Link: https://lkml.kernel.org/r/20220510152847.230957-7-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-19 14:08:53 -07:00
Hans de Goede
fa6dae5d82 x86/PCI: Add kernel cmdline options to use/ignore E820 reserved regions
Some firmware supplies PCI host bridge _CRS that includes address space
unusable by PCI devices, e.g., space occupied by host bridge registers or
used by hidden PCI devices.

To avoid this unusable space, Linux currently excludes E820 reserved
regions from _CRS windows; see 4dc2287c18 ("x86: avoid E820 regions when
allocating address space").

However, this use of E820 reserved regions to clip things out of _CRS is
not supported by ACPI, UEFI, or PCI Firmware specs, and some systems have
E820 reserved regions that cover the entire memory window from _CRS.
4dc2287c18 clips the entire window, leaving no space for hot-added or
uninitialized PCI devices.

For example, from a Lenovo IdeaPad 3 15IIL 81WE:

  BIOS-e820: [mem 0x4bc50000-0xcfffffff] reserved
  pci_bus 0000:00: root bus resource [mem 0x65400000-0xbfffffff window]
  pci 0000:00:15.0: BAR 0: [mem 0x00000000-0x00000fff 64bit]
  pci 0000:00:15.0: BAR 0: no space for [mem size 0x00001000 64bit]

Future patches will add quirks to enable/disable E820 clipping
automatically.

Add a "pci=no_e820" kernel command line option to disable clipping with
E820 reserved regions.  Also add a matching "pci=use_e820" option to enable
clipping with E820 reserved regions if that has been disabled by default by
further patches in this patch-set.

Both options taint the kernel because they are intended for debugging and
workaround purposes until a quirk can set them automatically.

[bhelgaas: commit log, add printk]
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1868899 Lenovo IdeaPad 3
Link: https://lore.kernel.org/r/20220519152150.6135-2-hdegoede@redhat.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Benoit Grégoire <benoitg@coeus.ca>
Cc: Hui Wang <hui.wang@canonical.com>
2022-05-19 14:26:55 -05:00
Saravana Kannan
2b28a1a84a driver core: Extend deferred probe timeout on driver registration
The deferred probe timer that's used for this currently starts at
late_initcall and runs for driver_deferred_probe_timeout seconds. The
assumption being that all available drivers would be loaded and
registered before the timer expires. This means, the
driver_deferred_probe_timeout has to be pretty large for it to cover the
worst case. But if we set the default value for it to cover the worst
case, it would significantly slow down the average case. For this
reason, the default value is set to 0.

Also, with CONFIG_MODULES=y and the current default values of
driver_deferred_probe_timeout=0 and fw_devlink=on, devices with missing
drivers will cause their consumer devices to always defer their probes.
This is because device links created by fw_devlink defer the probe even
before the consumer driver's probe() is called.

Instead of a fixed timeout, if we extend an unexpired deferred probe
timer on every successful driver registration, with the expectation more
modules would be loaded in the near future, then the default value of
driver_deferred_probe_timeout only needs to be as long as the worst case
time difference between two consecutive module loads.

So let's implement that and set the default value to 10 seconds when
CONFIG_MODULES=y.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Rob Herring <robh@kernel.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Will Deacon <will@kernel.org>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Kevin Hilman <khilman@kernel.org>
Cc: Thierry Reding <treding@nvidia.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Cc: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Cc: linux-gpio@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: iommu@lists.linux-foundation.org
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20220429220933.1350374-1-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-19 19:32:33 +02:00
Saravana Kannan
f79f662e4c driver core: Add "*" wildcard support to driver_async_probe cmdline param
There's currently no way to use driver_async_probe kernel cmdline param
to enable default async probe for all drivers.  So, add support for "*"
to match with all driver names.  When "*" is used, all other drivers
listed in driver_async_probe are drivers that will NOT match the "*".

For example:
* driver_async_probe=drvA,drvB,drvC
  drvA, drvB and drvC do asynchronous probing.

* driver_async_probe=*
  All drivers do asynchronous probing except those that have set
  PROBE_FORCE_SYNCHRONOUS flag.

* driver_async_probe=*,drvA,drvB,drvC
  All drivers do asynchronous probing except drvA, drvB, drvC and those
  that have set PROBE_FORCE_SYNCHRONOUS flag.

Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Feng Tang <feng.tang@intel.com>
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20220504005344.117803-1-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-19 19:28:24 +02:00
NeilBrown
5e12f172db NFS: update documentation for the nfs4_unique_id parameter
The documentation for nfs4_unique_id is out-of-date.  In particular it
claim that when nfs4_unique_id is set, the host name is not used.  since
Commit 55b592933b ("NFSv4: Fix nfs4_init_uniform_client_string for net
namespaces") both the unique_id AND the host name are used.

Update the documentation to match the code.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-05-17 15:30:03 -04:00
Zhen Lei
8f0f104e2a arm64: kdump: Do not allocate crash low memory if not needed
When "crashkernel=X,high" is specified, the specified "crashkernel=Y,low"
memory is not required in the following corner cases:
1. If both CONFIG_ZONE_DMA and CONFIG_ZONE_DMA32 are disabled, it means
   that the devices can access any memory.
2. If the system memory is small, the crash high memory may be allocated
   from the DMA zones. If that happens, there's no need to allocate
   another crash low memory because there's already one.

Add condition '(crash_base >= CRASH_ADDR_LOW_MAX)' to determine whether
the 'high' memory is allocated above DMA zones. Note: when both
CONFIG_ZONE_DMA and CONFIG_ZONE_DMA32 are disabled, the entire physical
memory is DMA accessible, CRASH_ADDR_LOW_MAX equals 'PHYS_MASK + 1'.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: Baoquan He <bhe@redhat.com>
Link: https://lore.kernel.org/r/20220511032033.426-1-thunder.leizhen@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-16 20:01:38 +01:00
Eric Dumazet
39564c3fdc net: add skb_defer_max sysctl
commit 68822bdf76 ("net: generalize skb freeing
deferral to per-cpu lists") added another per-cpu
cache of skbs. It was expected to be small,
and an IPI was forced whenever the list reached 128
skbs.

We might need to be able to control more precisely
queue capacity and added latency.

An IPI is generated whenever queue reaches half capacity.

Default value of the new limit is 64.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-16 11:33:59 +01:00
Ganesan Rajagopal
8e20d4b332 mm/memcontrol: export memcg->watermark via sysfs for v2 memcg
We run a lot of automated tests when building our software and run into
OOM scenarios when the tests run unbounded.  v1 memcg exports
memcg->watermark as "memory.max_usage_in_bytes" in sysfs.  We use this
metric to heuristically limit the number of tests that can run in parallel
based on per test historical data.

This metric is currently not exported for v2 memcg and there is no other
easy way of getting this information.  getrusage() syscall returns
"ru_maxrss" which can be used as an approximation but that's the max RSS
of a single child process across all children instead of the aggregated
max for all child processes.  The only work around is to periodically poll
"memory.current" but that's not practical for short-lived one-off cgroups.

Hence, expose memcg->watermark as "memory.peak" for v2 memcg.

Link: https://lkml.kernel.org/r/20220507050916.GA13577@us192.sjc.aristanetworks.com
Signed-off-by: Ganesan Rajagopal <rganesan@arista.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 16:48:57 -07:00
Muchun Song
78f39084b4 mm: hugetlb_vmemmap: add hugetlb_optimize_vmemmap sysctl
We must add hugetlb_free_vmemmap=on (or "off") to the boot cmdline and
reboot the server to enable or disable the feature of optimizing vmemmap
pages associated with HugeTLB pages.  However, rebooting usually takes a
long time.  So add a sysctl to enable or disable the feature at runtime
without rebooting.  Why we need this?  There are 3 use cases.

1) The feature of minimizing overhead of struct page associated with
   each HugeTLB is disabled by default without passing
   "hugetlb_free_vmemmap=on" to the boot cmdline.  When we (ByteDance)
   deliver the servers to the users who want to enable this feature, they
   have to configure the grub (change boot cmdline) and reboot the
   servers, whereas rebooting usually takes a long time (we have thousands
   of servers).  It's a very bad experience for the users.  So we need a
   approach to enable this feature after rebooting.  This is a use case in
   our practical environment.

2) Some use cases are that HugeTLB pages are allocated 'on the fly'
   instead of being pulled from the HugeTLB pool, those workloads would be
   affected with this feature enabled.  Those workloads could be
   identified by the characteristics of they never explicitly allocating
   huge pages with 'nr_hugepages' but only set 'nr_overcommit_hugepages'
   and then let the pages be allocated from the buddy allocator at fault
   time.  We can confirm it is a real use case from the commit
   099730d674.  For those workloads, the page fault time could be ~2x
   slower than before.  We suspect those users want to disable this
   feature if the system has enabled this before and they don't think the
   memory savings benefit is enough to make up for the performance drop.

3) If the workload which wants vmemmap pages to be optimized and the
   workload which wants to set 'nr_overcommit_hugepages' and does not want
   the extera overhead at fault time when the overcommitted pages be
   allocated from the buddy allocator are deployed in the same server. 
   The user could enable this feature and set 'nr_hugepages' and
   'nr_overcommit_hugepages', then disable the feature.  In this case, the
   overcommited HugeTLB pages will not encounter the extra overhead at
   fault time.

Link: https://lkml.kernel.org/r/20220512041142.39501-5-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 16:48:56 -07:00
Muchun Song
9c54c522bb mm: hugetlb_vmemmap: use kstrtobool for hugetlb_vmemmap param parsing
Use kstrtobool rather than open coding "on" and "off" parsing in
mm/hugetlb_vmemmap.c, which is more powerful to handle all kinds of
parameters like 'Yy1Nn0' or [oO][NnFf] for "on" and "off".

Link: https://lkml.kernel.org/r/20220512041142.39501-4-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 16:48:56 -07:00
Jason A. Donenfeld
069c4ea687 random: fix sysctl documentation nits
A semicolon was missing, and the almost-alphabetical-but-not ordering
was confusing, so regroup these by category instead.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-13 23:59:12 +02:00
SeongJae Park
81a84182c3 Docs/admin-guide/mm/damon/reclaim: document 'commit_inputs' parameter
This commit documents the new DAMON_RECLAIM parameter, 'commit_inputs' in
its usage document.

Link: https://lkml.kernel.org/r/20220429160606.127307-15-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 07:20:09 -07:00