Commit graph

1216127 commits

Author SHA1 Message Date
Steven Rostedt (Google)
1e0cb399c7 ring-buffer: Update "shortest_full" in polling
It was discovered that the ring buffer polling was incorrectly stating
that read would not block, but that's because polling did not take into
account that reads will block if the "buffer-percent" was set. Instead,
the ring buffer polling would say reads would not block if there was any
data in the ring buffer. This was incorrect behavior from a user space
point of view. This was fixed by commit 42fb0a1e84 by having the polling
code check if the ring buffer had more data than what the user specified
"buffer percent" had.

The problem now is that the polling code did not register itself to the
writer that it wanted to wait for a specific "full" value of the ring
buffer. The result was that the writer would wake the polling waiter
whenever there was a new event. The polling waiter would then wake up, see
that there's not enough data in the ring buffer to notify user space and
then go back to sleep. The next event would wake it up again.

Before the polling fix was added, the code would wake up around 100 times
for a hackbench 30 benchmark. After the "fix", due to the constant waking
of the writer, it would wake up over 11,0000 times! It would never leave
the kernel, so the user space behavior was still "correct", but this
definitely is not the desired effect.

To fix this, have the polling code add what it's waiting for to the
"shortest_full" variable, to tell the writer not to wake it up if the
buffer is not as full as it expects to be.

Note, after this fix, it appears that the waiter is now woken up around 2x
the times it was before (~200). This is a tremendous improvement from the
11,000 times, but I will need to spend some time to see why polling is
more aggressive in its wakeups than the read blocking code.

Link: https://lore.kernel.org/linux-trace-kernel/20230929180113.01c2cae3@rorschach.local.home

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Fixes: 42fb0a1e84 ("tracing/ring-buffer: Have polling block on watermark")
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Tested-by: Julia Lawall <julia.lawall@inria.fr>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-09-30 16:17:34 -04:00
Linus Torvalds
3b517966c5 dma-mapping fixes for Linux 6.6
- fix the narea calculation in swiotlb initialization (Ross Lagerwall)
  - fix the check whether a device has used swiotlb (Petr Tesarik)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmUYWTULHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYMilA/8DomLRCrDy792MoRvBCThWaY6auW4bCUoY7S6VZe7
 LVECKAJRwFH7b+dk1mNjPhWTyq8/wB7kr/OLuU9HDcVIeiP9zxks4BMQ4RGc/ZuK
 rSZ5ZHPVVCC4EOI3ncjQXrODwgGkGUvtdriByCtX2r4NLBjO1T0vUQB4bLyBTZf+
 GnTCLxCkSrIxaqniRvM0K34yO/0rq0ci5840MNneR7MKQkVqPUDY83sHwL1KcQPf
 s16lwclQdjZdOVpFMPxFin5NpvPIrjdrvhoaxdnz+8ZuwSACqRUZDQuNlZ3+Zep6
 iaynNR04o0c2p0PTT5l3ZRD5vsyCjvc+/3kB3KlM33XbBArWi6XV+694QQn59JnZ
 5MmHoIulwZGLsIlTG188QreZBlLrmxylUX311Kot5ood/HW8DsYbTo/krbiiUgEk
 MXKWq9k6cQOdhgriS4zxvUl+xkjby12jvSFxv9tN3HHvFsFB8+veVrTuLZzEDXpX
 a5PrmI/dcQmlVpCZllzVzeTgL2KeE1Jo0uRZ1vXhuoX8IBys4/TstIXOB4jnyVb4
 kzrHbLIoVqLSN42eVMRKBrqGXGlZSWETBpkdSQ41St6t/3MurhKAWZ/1SFPXlI06
 SnatIdOU7nSZRofK8/Xe1CnWia5NUpyUQpb+tLUHTgo4kZzGV330bf34iJ+BvC1h
 aks=
 =bhgc
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-6.6-2023-09-30' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fixes from Christoph Hellwig:

 - fix the narea calculation in swiotlb initialization (Ross Lagerwall)

 - fix the check whether a device has used swiotlb (Petr Tesarik)

* tag 'dma-mapping-6.6-2023-09-30' of git://git.infradead.org/users/hch/dma-mapping:
  swiotlb: fix the check whether a device has used software IO TLB
  swiotlb: use the calculated number of areas
2023-09-30 11:07:26 -07:00
Linus Torvalds
25d48d570e Bug fixes for 6.6-rc4:
* Handle a race between writing and shrinking block devices by
    returning EIO.
  * Fix a typo in a comment.
 
 Signed-off-by: Darrick J. Wong <djwong@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZRWpqAAKCRBKO3ySh0YR
 puinAP46EI8AvxQOid2ukGIEP09ZdhYNcJkWsigZ8k7Z/wcqagD/UVliaDPGC2kk
 rlrPN6jmNyqDzAP6muBmqu2v44GVWwY=
 =7nql
 -----END PGP SIGNATURE-----

Merge tag 'iomap-6.6-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull iomap fixes from Darrick Wong:

 - Handle a race between writing and shrinking block devices by
   returning EIO

 - Fix a typo in a comment

* tag 'iomap-6.6-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  iomap: Spelling s/preceeding/preceding/g
  iomap: add a workaround for racy i_size updates on block devices
2023-09-30 11:01:38 -07:00
Linus Torvalds
cefc06e4de Usual buisness: a driver fix, a DT fix, a minor core fix
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEOZGx6rniZ1Gk92RdFA3kzBSgKbYFAmUX8OkACgkQFA3kzBSg
 KbYasA/+KhTMt8iMduKHTjjHgB8hv3AYw5QY9gYxIFMMjM6UpSHqM+LI9bwAQTfq
 dpTykOXNHlROVs9iZeU8g7oYDX6H3ku5yYBJPptYDdcG7U3WLFt7ndUUOReQVU8L
 aPQ8yyFMuyDF1s5ItIE3DF0rSVX6OngIOOLcSGPALEcbUSuyfbjf0yw1bL2pW7dF
 2bX8RLKaeuvo1tt9g4AUzg+7gOQs9dxCE44Lpiqy1R2gJC/2WBxRdkYVdCYOR23s
 CZX7ds1TKUQJiFpyZsgTQt0Qc9K7253TpqxemQAjsfN982VPmIjVeX4OvjjJlgul
 AFISVN58WW4V28drw6kN9oCOwTal8Use8nBzVtVKqa7ebJ46UrUM6oJRHr3Z/xDP
 nBPnS48MGwyEym7QKy97ZV10u14OSj9J5sHn+125EHPo6TnW2SvQ/qfr51ejs619
 4lorzyBBwSgFoGk+jaeG85E46GkedmJmBm6iDlz+ADQIamMIlRgolNltClZusB3x
 DMsCI9zpx96tkHL7cG8OVynD+392kKSsSkYhzQbBQjZ10dbNtmBY92jkq2Q3zd2m
 w99pWhjREn82e4MYyAVwAMiNpCwDM4kbYVXr608FWryrctp0RKNNSS3Hu7w9AcT4
 6QcKhQulmSBvstfLXHqxu1Nuey25j91sNfVnTUulCgdmaBK7c00=
 =Fla+
 -----END PGP SIGNATURE-----

Merge tag 'i2c-for-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
 "Usual business: a driver fix, a DT fix, a minor core fix"

* tag 'i2c-for-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: npcm7xx: Fix callback completion ordering
  i2c: mux: Avoid potential false error message in i2c_mux_add_adapter
  dt-bindings: i2c: mxs: Pass ref and 'unevaluatedProperties: false'
2023-09-30 10:07:33 -07:00
Linus Torvalds
830380e317 ACPI fix for 6.6-rc4
Fix a possible NULL pointer dereference in the error path of
 acpi_video_bus_add() resulting from recent changes (Dinghao Liu).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmUXBBwSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxFxQP/3WiDVfRlcDpKJd/221dSdtttBziH8SX
 1r5n0eHgf5czl59DzzkCF57C+Quw9konBXj3hVYsP+YkgcgvgknctIlFMtw1inSr
 91ybwDDb6qeL2S99GxXkltx61o2DO/Xkv8AQ2HVhNKUtfCY6ihcaoS+Yfnm5ZYCX
 dEQ4BgWjfmYyfWXQKRA8NoSO+c1r8CtXGFpq7bA/og+z+YWS9MjZZb/3b03EhyIG
 vMuGbJzoxoWAUhmAVYwezv0r8iGCcdyv1z+nAwYq/XZnPZJPv5IKNjLz7Ou0hP4K
 MyorDD/wCDCZCV6hTTqy+ezS6xGnmNE6+xexe9fUv72+F2NTBRpj2sEeaWsxhC8Z
 l8vxWTJB0gZbuvDQIu/iNXA+edun+h2Inyg+AS2nYnPZWQqZLVrzcXNfeT6Q+uoM
 G3P+qsiJJxfDgbieoOsA1AoY0ITZzFlZzm5DuoXSHn8VAdGbtYiidpq6wCqo8Mhl
 sbFF4W01iE6EUosv2mzfA7DxHbq9zG4ydMeY/xb3hC22hDOdYXrnmormpIFjWVda
 VRhpRKBQvcH3isPA8WcaZNQIvArpLkc7cPbYWepQF7nsxLZqiA8ksacX2itFTF7j
 Hb5fHvvY+XUquEOqsm+73OBipiYYq79qlUVEs7QnShoUENBI8VpV4mojelQ+okwS
 pzDGDXt28H+R
 =+sV7
 -----END PGP SIGNATURE-----

Merge tag 'acpi-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fix from Rafael Wysocki:
 "Fix a possible NULL pointer dereference in the error path of
  acpi_video_bus_add() resulting from recent changes (Dinghao Liu)"

* tag 'acpi-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: video: Fix NULL pointer dereference in acpi_video_bus_add()
2023-09-30 09:59:37 -07:00
Linus Torvalds
1c9d831221 powerpc fixes for 6.6 #3
- Fix arch_stack_walk_reliable(), used by live patching.
 
  - Fix powerpc selftests to work with run_kselftest.sh
 
 Thanks to: Joe Lawrence, Petr Mladek.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmUYGLITHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgMJ3D/9UqgUm8f0vIGYu2jhvTYoR+b2Dqwls
 PFQKqiN78gWi/8njduY1j1U4EB8fJuFS972mDV04+mvUBM1Qj/jcCDKyO0lpYb9C
 ns+UUYFXF5QNBNrnSUSaItPRpXPceHEcIsVMolhRMBZHvKO1sl8D+kmpmnUML3Do
 qf/u508Xz3oB/+hPPPF/rOew9UHTroosD5/BPmaYzvjLx+8NTtJGuQI5spqyBI80
 nerphmJoE7TjA08QgbtKd15xyYd+wciADJVPRzCnOJ2SNka8D+marmFG53c1LGLc
 j/5gbmogy7PtKYpCDOg8XAjqs1VMggZVGECC6VRAF2+IykdQP+6eV9rGluUafbQr
 51bRClOe4dUIBqr0RIbNZGA80+hb9tqVEe5J2jyO4/f9XLd49prY2OSxAIgteCt8
 Ak3W8TcSKcmzx45yPyg0Xac66HmvFzvCkj8oZTLT4m+11oPjWoxNSy3040anxJ8Y
 htYhl5T/tsEU8ZC96y4RfP1T20HQZW4PmXsJKLW/MADy5sN7YCzpMf9G2uj4id5J
 fizB7tRm2CmIfHsw6e8KGJ6NrD2uJcOwLR6Y7zb7QJ0PzwgYQWfjTCMXo2D1zEak
 RzyaKcY/ySkFZMb1sb867nc+VFCAG1UNJiNiZKNifZ0RElx+6U2KgZTUwYYs2aj2
 jdudr2tl5BwhGQ==
 =gh/A
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix arch_stack_walk_reliable(), used by live patching

 - Fix powerpc selftests to work with run_kselftest.sh

Thanks to Joe Lawrence and Petr Mladek.

* tag 'powerpc-6.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  selftests/powerpc: Fix emit_tests to work with run_kselftest.sh
  powerpc/stacktrace: Fix arch_stack_walk_reliable()
2023-09-30 09:53:09 -07:00
Linus Torvalds
ae21363998 nfsd-6.6 fixes:
- Fix NFSv4 READ corner case
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmUYSnEACgkQM2qzM29m
 f5dtSxAAk/n3CzeIJFALIcP+JmHW6Zh2KbpygcqhxFoO1FAyRBJgOIEhtdD/c65Q
 fzfCQDZUzdTSirklBS5m0lIMhqaYMe1LUC8OIq/hh309vQ6WnlLltIU3fsaZFva5
 kAaDeI/oGq2YnbjQENWCGHvKC7RR1jOTWASI/+52oYg/fwRkt14so/wDl54mxhNP
 q6Gt0Xw/mXkslAkQNQ+7a5eAa4TtepzkjkNWL4hSqboHQ4QFT2tmXqDV3E+V9tzX
 F/tlEN3EJo3nDNcUYAh6ec/YXo0tBh4DsnJxA9bRYloU40EUGP19ewXLEQgXPw9M
 IgSNluxTmDTQGz6kSUkQpepevWDUN0+fDt1kVqACIRXuvuMcGuDUCbD4VMa8HNnc
 bCMg8R/SkrBnzTamHJpPhbV/F/C0Iwp80RpqDMAnt4splrQUCJeB+Wl1qJAnYUW5
 TdJxSJKCvypMIcNCd1ZMSVbeByUgZ/qXKKsS2YNS4l8+DGdnEJQ2mnFKw3Nk5bEF
 byWoyW6RMXCIwaqQa8hNsO5kW4MTxURxLXcgqzN7+5L2ht7pAYpNjnOc0vIBPe5t
 9z+/dSk8a9QXBwRTxPL4QOgludXPqvokGBtCIV1LD5xQA9OZuXivnBz2DWmCt+IR
 GOYfjs3gSlvVrv7+EGffYlOh9f7GN/wlUfUxERKcB6HPweGOtqs=
 =aD10
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fix from Chuck Lever:

 - Fix NFSv4 READ corner case

* tag 'nfsd-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  NFSD: Fix zero NFSv4 READ results when RQ_SPLICE_OK is not set
2023-09-30 09:44:48 -07:00
Linus Torvalds
ba77f7a63f smb3 client fix for password freeing potential oops
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmUXmlQACgkQiiy9cAdy
 T1Gh8Av8DprSP5ARIljIzuzPL60R/TF8WvWTuQZ93+KzGkM0eMXSvZ6VH0T3WXGs
 amfgwZFw1TOiQ6j3OqxX7ppd2PeooY65mX2tKVEQ27POQXB3VyUy2gBSPCTsDckD
 uOFX55GoPzkxxRgTWO7UHAbBzrZDJ8geH12Z0z9EXgJJjnJhkQi7dPkA+OusE3y0
 wo4AjKOk+7HOFiuvG3p1XZKYHeB36PD+xATqRBxJAaUGkgV76stFEK/7lNWzxg/t
 NdiHQG5ILZP6L2RZrCVX88Et3HggGV5AF3GpDb8ZNfD7xQcu8lgVzHlw2E6CA5DS
 WFR6zrBiBsz9mPrS9cWY83C+aL8MHeumalBjLKQ5krL+IpC46H7a12yAWt5Yk6Q7
 zdRQ9TtE8CMozqEJiaCFgL1Hz1CuTCvK974q+p5RG8OejRLzgmzFKYp8k7+L0qrc
 tZXQ16McYkBRcRIgaOk0i8GpJ44qNdhu/KTI5iKNqH0ScfHn56CIBiwSmZHiX75t
 zcCdF0bz
 =KBcN
 -----END PGP SIGNATURE-----

Merge tag '6.6-rc3-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fix from Steve French:
 "Fix for password freeing potential oops (also for stable)"

* tag '6.6-rc3-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6:
  fs/smb/client: Reset password pointer to NULL
2023-09-30 09:39:23 -07:00
Baoquan He
e2a8f20dd8 Crash: add lock to serialize crash hotplug handling
Eric reported that handling corresponding crash hotplug event can be
failed easily when many memory hotplug event are notified in a short
period.  They failed because failing to take __kexec_lock.

=======
[   78.714569] Fallback order for Node 0: 0
[   78.714575] Built 1 zonelists, mobility grouping on.  Total pages: 1817886
[   78.717133] Policy zone: Normal
[   78.724423] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
[   78.727207] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
[   80.056643] PEFILE: Unsigned PE binary
=======

The memory hotplug events are notified very quickly and very many, while
the handling of crash hotplug is much slower relatively.  So the atomic
variable __kexec_lock and kexec_trylock() can't guarantee the
serialization of crash hotplug handling.

Here, add a new mutex lock __crash_hotplug_lock to serialize crash hotplug
handling specifically.  This doesn't impact the usage of __kexec_lock.

Link: https://lkml.kernel.org/r/20230926120905.392903-1-bhe@redhat.com
Fixes: 2472627561 ("crash: add generic infrastructure for crash hotplug support")
Signed-off-by: Baoquan He <bhe@redhat.com>
Tested-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29 17:20:48 -07:00
Juntong Deng
bbe246f875 selftests/mm: fix awk usage in charge_reserved_hugetlb.sh and hugetlb_reparenting_test.sh that may cause error
According to the awk manual, the -e option does not need to be specified
in front of 'program' (unless you need to mix program-file).

The redundant -e option can cause error when users use awk tools other
than gawk (for example, mawk does not support the -e option).

Error Example:
awk: not an option: -e

Link: https://lkml.kernel.org/r/VI1P193MB075228810591AF2FDD7D42C599C3A@VI1P193MB0752.EURP193.PROD.OUTLOOK.COM
Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29 17:20:48 -07:00
Yang Shi
24526268f4 mm: mempolicy: keep VMA walk if both MPOL_MF_STRICT and MPOL_MF_MOVE are specified
When calling mbind() with MPOL_MF_{MOVE|MOVEALL} | MPOL_MF_STRICT, kernel
should attempt to migrate all existing pages, and return -EIO if there is
misplaced or unmovable page.  Then commit 6f4576e368 ("mempolicy: apply
page table walker on queue_pages_range()") messed up the return value and
didn't break VMA scan early ianymore when MPOL_MF_STRICT alone.  The
return value problem was fixed by commit a7f40cfe3b ("mm: mempolicy:
make mbind() return -EIO when MPOL_MF_STRICT is specified"), but it broke
the VMA walk early if unmovable page is met, it may cause some pages are
not migrated as expected.

The code should conceptually do:

 if (MPOL_MF_MOVE|MOVEALL)
     scan all vmas
     try to migrate the existing pages
     return success
 else if (MPOL_MF_MOVE* | MPOL_MF_STRICT)
     scan all vmas
     try to migrate the existing pages
     return -EIO if unmovable or migration failed
 else /* MPOL_MF_STRICT alone */
     break early if meets unmovable and don't call mbind_range() at all
 else /* none of those flags */
     check the ranges in test_walk, EFAULT without mbind_range() if discontig.

Fixed the behavior.

Link: https://lkml.kernel.org/r/20230920223242.3425775-1-yang@os.amperecomputing.com
Fixes: a7f40cfe3b ("mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified")
Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org>	[4.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29 17:20:48 -07:00
Jinjie Ruan
45120b1574 mm/damon/vaddr-test: fix memory leak in damon_do_test_apply_three_regions()
When CONFIG_DAMON_VADDR_KUNIT_TEST=y and making CONFIG_DEBUG_KMEMLEAK=y
and CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN=y, the below memory leak is detected.

Since commit 9f86d62429 ("mm/damon/vaddr-test: remove unnecessary
variables"), the damon_destroy_ctx() is removed, but still call
damon_new_target() and damon_new_region(), the damon_region which is
allocated by kmem_cache_alloc() in damon_new_region() and the damon_target
which is allocated by kmalloc in damon_new_target() are not freed.  And
the damon_region which is allocated in damon_new_region() in
damon_set_regions() is also not freed.

So use damon_destroy_target to free all the damon_regions and damon_target.

    unreferenced object 0xffff888107c9a940 (size 64):
      comm "kunit_try_catch", pid 1069, jiffies 4294670592 (age 732.761s)
      hex dump (first 32 bytes):
        00 00 00 00 00 00 00 00 06 00 00 00 6b 6b 6b 6b  ............kkkk
        60 c7 9c 07 81 88 ff ff f8 cb 9c 07 81 88 ff ff  `...............
      backtrace:
        [<ffffffff817e0167>] kmalloc_trace+0x27/0xa0
        [<ffffffff819c11cf>] damon_new_target+0x3f/0x1b0
        [<ffffffff819c7d55>] damon_do_test_apply_three_regions.constprop.0+0x95/0x3e0
        [<ffffffff819c82be>] damon_test_apply_three_regions1+0x21e/0x260
        [<ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
        [<ffffffff81237cf6>] kthread+0x2b6/0x380
        [<ffffffff81097add>] ret_from_fork+0x2d/0x70
        [<ffffffff81003791>] ret_from_fork_asm+0x11/0x20
    unreferenced object 0xffff8881079cc740 (size 56):
      comm "kunit_try_catch", pid 1069, jiffies 4294670592 (age 732.761s)
      hex dump (first 32 bytes):
        05 00 00 00 00 00 00 00 14 00 00 00 00 00 00 00  ................
        6b 6b 6b 6b 6b 6b 6b 6b 00 00 00 00 6b 6b 6b 6b  kkkkkkkk....kkkk
      backtrace:
        [<ffffffff819bc492>] damon_new_region+0x22/0x1c0
        [<ffffffff819c7d91>] damon_do_test_apply_three_regions.constprop.0+0xd1/0x3e0
        [<ffffffff819c82be>] damon_test_apply_three_regions1+0x21e/0x260
        [<ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
        [<ffffffff81237cf6>] kthread+0x2b6/0x380
        [<ffffffff81097add>] ret_from_fork+0x2d/0x70
        [<ffffffff81003791>] ret_from_fork_asm+0x11/0x20
    unreferenced object 0xffff888107c9ac40 (size 64):
      comm "kunit_try_catch", pid 1071, jiffies 4294670595 (age 732.843s)
      hex dump (first 32 bytes):
        00 00 00 00 00 00 00 00 06 00 00 00 6b 6b 6b 6b  ............kkkk
        a0 cc 9c 07 81 88 ff ff 78 a1 76 07 81 88 ff ff  ........x.v.....
      backtrace:
        [<ffffffff817e0167>] kmalloc_trace+0x27/0xa0
        [<ffffffff819c11cf>] damon_new_target+0x3f/0x1b0
        [<ffffffff819c7d55>] damon_do_test_apply_three_regions.constprop.0+0x95/0x3e0
        [<ffffffff819c851e>] damon_test_apply_three_regions2+0x21e/0x260
        [<ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
        [<ffffffff81237cf6>] kthread+0x2b6/0x380
        [<ffffffff81097add>] ret_from_fork+0x2d/0x70
        [<ffffffff81003791>] ret_from_fork_asm+0x11/0x20
    unreferenced object 0xffff8881079ccc80 (size 56):
      comm "kunit_try_catch", pid 1071, jiffies 4294670595 (age 732.843s)
      hex dump (first 32 bytes):
        05 00 00 00 00 00 00 00 14 00 00 00 00 00 00 00  ................
        6b 6b 6b 6b 6b 6b 6b 6b 00 00 00 00 6b 6b 6b 6b  kkkkkkkk....kkkk
      backtrace:
        [<ffffffff819bc492>] damon_new_region+0x22/0x1c0
        [<ffffffff819c7d91>] damon_do_test_apply_three_regions.constprop.0+0xd1/0x3e0
        [<ffffffff819c851e>] damon_test_apply_three_regions2+0x21e/0x260
        [<ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
        [<ffffffff81237cf6>] kthread+0x2b6/0x380
        [<ffffffff81097add>] ret_from_fork+0x2d/0x70
        [<ffffffff81003791>] ret_from_fork_asm+0x11/0x20
    unreferenced object 0xffff888107c9af40 (size 64):
      comm "kunit_try_catch", pid 1073, jiffies 4294670597 (age 733.011s)
      hex dump (first 32 bytes):
        00 00 00 00 00 00 00 00 06 00 00 00 6b 6b 6b 6b  ............kkkk
        20 a2 76 07 81 88 ff ff b8 a6 76 07 81 88 ff ff   .v.......v.....
      backtrace:
        [<ffffffff817e0167>] kmalloc_trace+0x27/0xa0
        [<ffffffff819c11cf>] damon_new_target+0x3f/0x1b0
        [<ffffffff819c7d55>] damon_do_test_apply_three_regions.constprop.0+0x95/0x3e0
        [<ffffffff819c877e>] damon_test_apply_three_regions3+0x21e/0x260
        [<ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
        [<ffffffff81237cf6>] kthread+0x2b6/0x380
        [<ffffffff81097add>] ret_from_fork+0x2d/0x70
        [<ffffffff81003791>] ret_from_fork_asm+0x11/0x20
    unreferenced object 0xffff88810776a200 (size 56):
      comm "kunit_try_catch", pid 1073, jiffies 4294670597 (age 733.011s)
      hex dump (first 32 bytes):
        05 00 00 00 00 00 00 00 14 00 00 00 00 00 00 00  ................
        6b 6b 6b 6b 6b 6b 6b 6b 00 00 00 00 6b 6b 6b 6b  kkkkkkkk....kkkk
      backtrace:
        [<ffffffff819bc492>] damon_new_region+0x22/0x1c0
        [<ffffffff819c7d91>] damon_do_test_apply_three_regions.constprop.0+0xd1/0x3e0
        [<ffffffff819c877e>] damon_test_apply_three_regions3+0x21e/0x260
        [<ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
        [<ffffffff81237cf6>] kthread+0x2b6/0x380
        [<ffffffff81097add>] ret_from_fork+0x2d/0x70
        [<ffffffff81003791>] ret_from_fork_asm+0x11/0x20
    unreferenced object 0xffff88810776a740 (size 56):
      comm "kunit_try_catch", pid 1073, jiffies 4294670597 (age 733.025s)
      hex dump (first 32 bytes):
        3d 00 00 00 00 00 00 00 3f 00 00 00 00 00 00 00  =.......?.......
        6b 6b 6b 6b 6b 6b 6b 6b 00 00 00 00 6b 6b 6b 6b  kkkkkkkk....kkkk
      backtrace:
        [<ffffffff819bc492>] damon_new_region+0x22/0x1c0
        [<ffffffff819bfcc2>] damon_set_regions+0x4c2/0x8e0
        [<ffffffff819c7dbb>] damon_do_test_apply_three_regions.constprop.0+0xfb/0x3e0
        [<ffffffff819c877e>] damon_test_apply_three_regions3+0x21e/0x260
        [<ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
        [<ffffffff81237cf6>] kthread+0x2b6/0x380
        [<ffffffff81097add>] ret_from_fork+0x2d/0x70
        [<ffffffff81003791>] ret_from_fork_asm+0x11/0x20
    unreferenced object 0xffff888108038240 (size 64):
      comm "kunit_try_catch", pid 1075, jiffies 4294670600 (age 733.022s)
      hex dump (first 32 bytes):
        00 00 00 00 00 00 00 00 03 00 00 00 6b 6b 6b 6b  ............kkkk
        48 ad 76 07 81 88 ff ff 98 ae 76 07 81 88 ff ff  H.v.......v.....
      backtrace:
        [<ffffffff817e0167>] kmalloc_trace+0x27/0xa0
        [<ffffffff819c11cf>] damon_new_target+0x3f/0x1b0
        [<ffffffff819c7d55>] damon_do_test_apply_three_regions.constprop.0+0x95/0x3e0
        [<ffffffff819c898d>] damon_test_apply_three_regions4+0x1cd/0x210
        [<ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
        [<ffffffff81237cf6>] kthread+0x2b6/0x380
        [<ffffffff81097add>] ret_from_fork+0x2d/0x70
        [<ffffffff81003791>] ret_from_fork_asm+0x11/0x20
    unreferenced object 0xffff88810776ad28 (size 56):
      comm "kunit_try_catch", pid 1075, jiffies 4294670600 (age 733.022s)
      hex dump (first 32 bytes):
        05 00 00 00 00 00 00 00 07 00 00 00 00 00 00 00  ................
        6b 6b 6b 6b 6b 6b 6b 6b 00 00 00 00 6b 6b 6b 6b  kkkkkkkk....kkkk
      backtrace:
        [<ffffffff819bc492>] damon_new_region+0x22/0x1c0
        [<ffffffff819bfcc2>] damon_set_regions+0x4c2/0x8e0
        [<ffffffff819c7dbb>] damon_do_test_apply_three_regions.constprop.0+0xfb/0x3e0
        [<ffffffff819c898d>] damon_test_apply_three_regions4+0x1cd/0x210
        [<ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
        [<ffffffff81237cf6>] kthread+0x2b6/0x380
        [<ffffffff81097add>] ret_from_fork+0x2d/0x70
        [<ffffffff81003791>] ret_from_fork_asm+0x11/0x20

Link: https://lkml.kernel.org/r/20230925072100.3725620-1-ruanjinjie@huawei.com
Fixes: 9f86d62429 ("mm/damon/vaddr-test: remove unnecessary variables")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29 17:20:47 -07:00
Michal Hocko
4597648fdd mm, memcg: reconsider kmem.limit_in_bytes deprecation
This reverts commits 86327e8eb9 ("memcg: drop kmem.limit_in_bytes") and
partially reverts 58056f7750 ("memcg, kmem: further deprecate
kmem.limit_in_bytes") which have incrementally removed support for the
kernel memory accounting hard limit.  Unfortunately it has turned out that
there is still userspace depending on the existence of
memory.kmem.limit_in_bytes [1].  The underlying functionality is not
really required but the non-existent file just confuses the userspace
which fails in the result.  The patch to fix this on the userspace side
has been submitted but it is hard to predict how it will propagate through
the maze of 3rd party consumers of the software.

Now, reverting alone 86327e8eb9 is not an option because there is
another set of userspace which cannot cope with ENOTSUPP returned when
writing to the file.  Therefore we have to go and revisit 58056f7750 as
well.  There are two ways to go ahead.  Either we give up on the
deprecation and fully revert 58056f7750 as well or we can keep
kmem.limit_in_bytes but make the write a noop and warn about the fact. 
This should work for both known breaking workloads which depend on the
existence but do not depend on the hard limit enforcement.

Note to backporters to stable trees.  a8c49af3be ("memcg: add per-memcg
total kernel memory stat") introduced in 4.18 has added memcg_account_kmem
so the accounting is not done by obj_cgroup_charge_pages directly for v1
anymore.  Prior kernels need to add it explicitly (thanks to Johannes for
pointing this out).

[akpm@linux-foundation.org: fix build - remove unused local]
Link: http://lkml.kernel.org/r/20230920081101.GA12096@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net [1]
Link: https://lkml.kernel.org/r/ZRE5VJozPZt9bRPy@dhcp22.suse.cz
Fixes: 86327e8eb9 ("memcg: drop kmem.limit_in_bytes")
Fixes: 58056f7750 ("memcg, kmem: further deprecate kmem.limit_in_bytes")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Tejun heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29 17:20:47 -07:00
Domenico Cerasuolo
ca56489c2f mm: zswap: fix potential memory corruption on duplicate store
While stress-testing zswap a memory corruption was happening when writing
back pages.  __frontswap_store used to check for duplicate entries before
attempting to store a page in zswap, this was because if the store fails
the old entry isn't removed from the tree.  This change removes duplicate
entries in zswap_store before the actual attempt.

[cerasuolodomenico@gmail.com: add a warning and a comment, per Johannes]
  Link: https://lkml.kernel.org/r/20230925130002.1929369-1-cerasuolodomenico@gmail.com
Link: https://lkml.kernel.org/r/20230922172211.1704917-1-cerasuolodomenico@gmail.com
Fixes: 42c06a0e8e ("mm: kill frontswap")
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29 17:20:47 -07:00
Ryan Roberts
6f1bace9a9 arm64: hugetlb: fix set_huge_pte_at() to work with all swap entries
When called with a swap entry that does not embed a PFN (e.g. 
PTE_MARKER_POISONED or PTE_MARKER_UFFD_WP), the previous implementation of
set_huge_pte_at() would either cause a BUG() to fire (if CONFIG_DEBUG_VM
is enabled) or cause a dereference of an invalid address and subsequent
panic.

arm64's huge pte implementation supports multiple huge page sizes, some of
which are implemented in the page table with multiple contiguous entries. 
So set_huge_pte_at() needs to work out how big the logical pte is, so that
it can also work out how many physical ptes (or pmds) need to be written. 
It previously did this by grabbing the folio out of the pte and querying
its size.

However, there are cases when the pte being set is actually a swap entry. 
But this also used to work fine, because for huge ptes, we only ever saw
migration entries and hwpoison entries.  And both of these types of swap
entries have a PFN embedded, so the code would grab that and everything
still worked out.

But over time, more calls to set_huge_pte_at() have been added that set
swap entry types that do not embed a PFN.  And this causes the code to go
bang.  The triggering case is for the uffd poison test, commit
99aa77215a ("selftests/mm: add uffd unit test for UFFDIO_POISON"), which
causes a PTE_MARKER_POISONED swap entry to be set, coutesey of commit
8a13897fb0 ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs") -
added in v6.5-rc7.  Although review shows that there are other call sites
that set PTE_MARKER_UFFD_WP (which also has no PFN), these don't trigger
on arm64 because arm64 doesn't support UFFD WP.

Arguably, the root cause is really due to commit 18f3962953 ("mm:
hugetlb: kill set_huge_swap_pte_at()"), which aimed to simplify the
interface to the core code by removing set_huge_swap_pte_at() (which took
a page size parameter) and replacing it with calls to set_huge_pte_at()
where the size was inferred from the folio, as descibed above.  While that
commit didn't break anything at the time, it did break the interface
because it couldn't handle swap entries without PFNs.  And since then new
callers have come along which rely on this working.  But given the
brokeness is only observable after commit 8a13897fb0 ("mm: userfaultfd:
support UFFDIO_POISON for hugetlbfs"), that one gets the Fixes tag.

Now that we have modified the set_huge_pte_at() interface to pass the huge
page size in the previous patch, we can trivially fix this issue.

Link: https://lkml.kernel.org/r/20230922115804.2043771-3-ryan.roberts@arm.com
Fixes: 8a13897fb0 ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs")
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>	[6.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29 17:20:47 -07:00
Ryan Roberts
935d4f0c6d mm: hugetlb: add huge page size param to set_huge_pte_at()
Patch series "Fix set_huge_pte_at() panic on arm64", v2.

This series fixes a bug in arm64's implementation of set_huge_pte_at(),
which can result in an unprivileged user causing a kernel panic.  The
problem was triggered when running the new uffd poison mm selftest for
HUGETLB memory.  This test (and the uffd poison feature) was merged for
v6.5-rc7.

Ideally, I'd like to get this fix in for v6.6 and I've cc'ed stable
(correctly this time) to get it backported to v6.5, where the issue first
showed up.


Description of Bug
==================

arm64's huge pte implementation supports multiple huge page sizes, some of
which are implemented in the page table with multiple contiguous entries. 
So set_huge_pte_at() needs to work out how big the logical pte is, so that
it can also work out how many physical ptes (or pmds) need to be written. 
It previously did this by grabbing the folio out of the pte and querying
its size.

However, there are cases when the pte being set is actually a swap entry. 
But this also used to work fine, because for huge ptes, we only ever saw
migration entries and hwpoison entries.  And both of these types of swap
entries have a PFN embedded, so the code would grab that and everything
still worked out.

But over time, more calls to set_huge_pte_at() have been added that set
swap entry types that do not embed a PFN.  And this causes the code to go
bang.  The triggering case is for the uffd poison test, commit
99aa77215a ("selftests/mm: add uffd unit test for UFFDIO_POISON"), which
causes a PTE_MARKER_POISONED swap entry to be set, coutesey of commit
8a13897fb0 ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs") -
added in v6.5-rc7.  Although review shows that there are other call sites
that set PTE_MARKER_UFFD_WP (which also has no PFN), these don't trigger
on arm64 because arm64 doesn't support UFFD WP.

If CONFIG_DEBUG_VM is enabled, we do at least get a BUG(), but otherwise,
it will dereference a bad pointer in page_folio():

    static inline struct folio *hugetlb_swap_entry_to_folio(swp_entry_t entry)
    {
        VM_BUG_ON(!is_migration_entry(entry) && !is_hwpoison_entry(entry));

        return page_folio(pfn_to_page(swp_offset_pfn(entry)));
    }


Fix
===

The simplest fix would have been to revert the dodgy cleanup commit
18f3962953 ("mm: hugetlb: kill set_huge_swap_pte_at()"), but since
things have moved on, this would have required an audit of all the new
set_huge_pte_at() call sites to see if they should be converted to
set_huge_swap_pte_at().  As per the original intent of the change, it
would also leave us open to future bugs when people invariably get it
wrong and call the wrong helper.

So instead, I've added a huge page size parameter to set_huge_pte_at(). 
This means that the arm64 code has the size in all cases.  It's a bigger
change, due to needing to touch the arches that implement the function,
but it is entirely mechanical, so in my view, low risk.

I've compile-tested all touched arches; arm64, parisc, powerpc, riscv,
s390, sparc (and additionally x86_64).  I've additionally booted and run
mm selftests against arm64, where I observe the uffd poison test is fixed,
and there are no other regressions.


This patch (of 2):

In order to fix a bug, arm64 needs to be told the size of the huge page
for which the pte is being set in set_huge_pte_at().  Provide for this by
adding an `unsigned long sz` parameter to the function.  This follows the
same pattern as huge_pte_clear().

This commit makes the required interface modifications to the core mm as
well as all arches that implement this function (arm64, parisc, powerpc,
riscv, s390, sparc).  The actual arm64 bug will be fixed in a separate
commit.

No behavioral changes intended.

Link: https://lkml.kernel.org/r/20230922115804.2043771-1-ryan.roberts@arm.com
Link: https://lkml.kernel.org/r/20230922115804.2043771-2-ryan.roberts@arm.com
Fixes: 8a13897fb0 ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs")
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>	[powerpc 8xx]
Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>	[vmalloc change]
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>	[6.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29 17:20:47 -07:00
Liam R. Howlett
a8091f039c maple_tree: add MAS_UNDERFLOW and MAS_OVERFLOW states
When updating the maple tree iterator to avoid rewalks, an issue was
introduced when shifting beyond the limits.  This can be seen by trying to
go to the previous address of 0, which would set the maple node to
MAS_NONE and keep the range as the last entry.

Subsequent calls to mas_find() would then search upwards from mas->last
and skip the value at mas->index/mas->last.  This showed up as a bug in
mprotect which skips the actual VMA at the current range after attempting
to go to the previous VMA from 0.

Since MAS_NONE may already be set when searching for a value that isn't
contained within a node, changing the handling of MAS_NONE in mas_find()
would make the code more complicated and error prone.  Furthermore, there
was no way to tell which limit was hit, and thus which action to take
(next or the entry at the current range).

This solution is to add two states to track what happened with the
previous iterator action.  This allows for the expected behaviour of the
next command to return the correct item (either the item at the range
requested, or the next/previous).

Tests are also added and updated accordingly.

Link: https://lkml.kernel.org/r/20230921181236.509072-3-Liam.Howlett@oracle.com
Link: https://gist.github.com/heatd/85d2971fae1501b55b6ea401fbbe485b
Link: https://lore.kernel.org/linux-mm/20230921181236.509072-1-Liam.Howlett@oracle.com/
Fixes: 39193685d5 ("maple_tree: try harder to keep active node with mas_prev()")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: Pedro Falcato <pedro.falcato@gmail.com>
Closes: https://gist.github.com/heatd/85d2971fae1501b55b6ea401fbbe485b
Closes: https://bugs.archlinux.org/task/79656
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29 17:20:46 -07:00
Liam R. Howlett
5c590804b6 maple_tree: add mas_is_active() to detect in-tree walks
Patch series "maple_tree: Fix mas_prev() state regression".

Pedro Falcato retported an mprotect regression [1] which was bisected back
to the iterator changes for maple tree.  Root cause analysis showed the
mas_prev() running off the end of the VMA space (previous from 0) followed
by mas_find(), would skip the first value.

This patchset introduces maple state underflow/overflow so the sequence of
calls on the maple state will return what the user expects.

Users who encounter this bug may see mprotect(), userfaultfd_register(),
and mlock() fail on VMAs mapped with address 0.


This patch (of 2):

Instead of constantly checking each possibility of the maple state,
create a fast path that will skip over checking unlikely states.

Link: https://lkml.kernel.org/r/20230921181236.509072-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20230921181236.509072-2-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29 17:20:46 -07:00
Pan Bian
7ee29facd8 nilfs2: fix potential use after free in nilfs_gccache_submit_read_data()
In nilfs_gccache_submit_read_data(), brelse(bh) is called to drop the
reference count of bh when the call to nilfs_dat_translate() fails.  If
the reference count hits 0 and its owner page gets unlocked, bh may be
freed.  However, bh->b_page is dereferenced to put the page after that,
which may result in a use-after-free bug.  This patch moves the release
operation after unlocking and putting the page.

NOTE: The function in question is only called in GC, and in combination
with current userland tools, address translation using DAT does not occur
in that function, so the code path that causes this issue will not be
executed.  However, it is possible to run that code path by intentionally
modifying the userland GC library or by calling the GC ioctl directly.

[konishi.ryusuke@gmail.com: NOTE added to the commit log]
Link: https://lkml.kernel.org/r/1543201709-53191-1-git-send-email-bianpan2016@163.com
Link: https://lkml.kernel.org/r/20230921141731.10073-1-konishi.ryusuke@gmail.com
Fixes: a3d93f709e ("nilfs2: block cache for garbage collection")
Signed-off-by: Pan Bian <bianpan2016@163.com>
Reported-by: Ferry Meng <mengferry@linux.alibaba.com>
Closes: https://lkml.kernel.org/r/20230818092022.111054-1-mengferry@linux.alibaba.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29 17:20:46 -07:00
Matthew Wilcox (Oracle)
ce60f27bb6 mm: abstract moving to the next PFN
In order to fix the L1TF vulnerability, x86 can invert the PTE bits for
PROT_NONE VMAs, which means we cannot move from one PTE to the next by
adding 1 to the PFN field of the PTE.  This results in the BUG reported at
[1].

Abstract advancing the PTE to the next PFN through a pte_next_pfn()
function/macro.

Link: https://lkml.kernel.org/r/20230920040958.866520-1-willy@infradead.org
Fixes: bcc6cc8325 ("mm: add default definition of set_ptes()")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: syzbot+55cc72f8cc3a549119df@syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/000000000000d099fa0604f03351@google.com [1]
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29 17:20:46 -07:00
Matthew Wilcox (Oracle)
a501a07030 mm: report success more often from filemap_map_folio_range()
Even though we had successfully mapped the relevant page, we would rarely
return success from filemap_map_folio_range().  That leads to falling back
from the VMA lock path to the mmap_lock path, which is a speed &
scalability issue.  Found by inspection.

Link: https://lkml.kernel.org/r/20230920035336.854212-1-willy@infradead.org
Fixes: 617c28ecab ("filemap: batch PTE mappings")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29 17:20:45 -07:00
Greg Ungerer
7c31515857 fs: binfmt_elf_efpic: fix personality for ELF-FDPIC
The elf-fdpic loader hard sets the process personality to either
PER_LINUX_FDPIC for true elf-fdpic binaries or to PER_LINUX for normal ELF
binaries (in this case they would be constant displacement compiled with
-pie for example).  The problem with that is that it will lose any other
bits that may be in the ELF header personality (such as the "bug
emulation" bits).

On the ARM architecture the ADDR_LIMIT_32BIT flag is used to signify a
normal 32bit binary - as opposed to a legacy 26bit address binary.  This
matters since start_thread() will set the ARM CPSR register as required
based on this flag.  If the elf-fdpic loader loses this bit the process
will be mis-configured and crash out pretty quickly.

Modify elf-fdpic loader personality setting so that it preserves the upper
three bytes by using the SET_PERSONALITY macro to set it.  This macro in
the generic case sets PER_LINUX and preserves the upper bytes. 
Architectures can override this for their specific use case, and ARM does
exactly this.

The problem shows up quite easily running under qemu using the ARM
architecture, but not necessarily on all types of real ARM hardware.  If
the underlying ARM processor does not support the legacy 26-bit addressing
mode then everything will work as expected.

Link: https://lkml.kernel.org/r/20230907011808.2985083-1-gerg@kernel.org
Fixes: 1bde925d23 ("fs/binfmt_elf_fdpic.c: provide NOMMU loader for regular ELF binaries")
Signed-off-by: Greg Ungerer <gerg@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg Ungerer <gerg@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29 17:20:45 -07:00
Linus Torvalds
9f3ebbef74 Two SMB3 server fixes
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmUXOMIACgkQiiy9cAdy
 T1G+XQv9Fj1kWJRPHih1wTRFHAgysHRtFw04KvW9SLmpGkPLBslJm3Fg1yPiXytc
 nfqZNCPS/tuWIdqc9YRJqEWKfCO6X/+0IESnN6Wl4jIqMSviL/Hg3DzXZr5YCsAy
 1vJ+DYmQkmWNgZ8grnFjKCSezTrAb+b+VLZqsx7dzT8NhTRxdKoTBDS31jTGMswV
 OIQ1b/aLv9hgUS08wqzSKveMq8DkX66UbkSakM+tVImu32eh7u1HG89P7y3e/dr3
 lGwd/Pq+IIiyAgZ0uoPdI9hQ2+Md6JfhVFMiMTLUfwh1LCDgrYnYrOkuZWYvDr9z
 t2Y0+IwEkljk7HcFaL0NKPJW2beG4eNh/t2t6ff6vK8MhzlXp3KM5yVBlXNYc7hA
 JfsMxVIFhUobeRKbQY9S6BstHyo19pdfeHDm/+RicIhRfOo++7kWYzwqKsD0pvLC
 wcr3CBLqqsPXamRwUBbxnMASjYVmoz4nSXusXLDxmSWK39NCjEIz3YeFZfcAdoou
 jnvMikMA
 =jim3
 -----END PGP SIGNATURE-----

Merge tag '6.6-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:
 "Two SMB3 server fixes for null pointer dereferences:

   - invalid SMB3 request case (fixes issue found in testing the read
     compound patch)

   - iovec error case in response processing"

* tag '6.6-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: check iov vector index in ksmbd_conn_write()
  ksmbd: return invalid parameter error response if smb2 request is invalid
2023-09-29 16:51:38 -07:00
Linus Torvalds
14c06b913d A series that fixes an involved "double watch error" deadlock in RBD
marked for stable and two cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmUW5QwTHGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHziw7+B/99D/BRIJUaCz8hm2xMZC3Yu6Cvi2de
 YlgHZBuHm5lmihzITEdoHTmWlIpgGchqjTaikCvVooKEe1w4sNr7nYFiXUFVw9sf
 W/I06dtlJlj2f4oyK91i4sIzpQKbXZznFDpTHThjRJt+uyUp3RYVbrCDMmGAnJv3
 foppstycm5fe2Y2e/RgNyYOHY+EAjvS5UrpvT3lAX+iw5KXR1pyMBrFo8iICLPlZ
 TIPP4mwlVwb/WB1rGnxaK65RJxGuXLwuMWdLF9kq1ZeCld6owPH3x2RDax2+vMvA
 bZxI6gQymU2SquKiNseYF3kQ+2KdC5mfjmkoncOH79Je4JPHf7xa4LYZ
 =7/Ez
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-6.6-rc4' of https://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
 "A series that fixes an involved 'double watch error' deadlock in RBD
  marked for stable and two cleanups"

* tag 'ceph-for-6.6-rc4' of https://github.com/ceph/ceph-client:
  rbd: take header_rwsem in rbd_dev_refresh() only when updating
  rbd: decouple parent info read-in from updating rbd_dev
  rbd: decouple header read-in from updating rbd_dev->header
  rbd: move rbd_dev_refresh() definition
  Revert "ceph: make members in struct ceph_mds_request_args_ext a union"
  ceph: remove unnecessary check for NULL in parse_longname()
2023-09-29 16:46:24 -07:00
Linus Torvalds
10c0b6ba25 Bug fixes for 6.6-rc4:
* Include modifications made to commit "xfs: load uncached unlinked inodes
   into memory on demand" (Commit ID: 68b957f64f)
   which address review comments provided by Dave Chinner.
 
 Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQjMC4mbgVeU7MxEIYH7y4RirJu9AUCZRPtowAKCRAH7y4RirJu
 9EcNAQDnuVtf89FL0Qqqtho5TeK2UO4JhEcTWI4Wj1d9w7h4lAEA5ZTYu8oJDg0k
 zoTXgr9sbpzcf53fgY0hwqPVjdV8dwU=
 =WkWe
 -----END PGP SIGNATURE-----

Merge tag 'xfs-6.6-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fix from Chandan Babu:

 - fix for commit 68b957f64f ("xfs: load uncached unlinked inodes into
   memory on demand") which address review comments provided by Dave
   Chinner

* tag 'xfs-6.6-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: fix reloading entire unlinked bucket lists
2023-09-29 16:41:25 -07:00
Uwe Kleine-König
b0b88a585c MAINTAINERS: Fix Florian Fainelli's email address
Commit 31345a0f59 ("MAINTAINERS: Replace my email address") added 13
instances of ...@broadcom.com and one of only ...@broadcom. I didn't
double check if Broadcom really owns that TLD, but git send-email
doesn't accept it, so add ".com" to that one bogous(?) instance.

Fixes: 31345a0f59 ("MAINTAINERS: Replace my email address")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-09-29 17:49:18 -04:00
Linus Torvalds
95289e49f0 ATA fixes for 6.6-rc4
A larger than usual set of fixes for 6.6-rc4 due to the unexpected
 number of fixes needed to address ATA disks suspend/resume issues.
 In more details:
 
  - Add missing additionalProperties on child nodes to the pata-common DT
    bindings (Rob).
 
  - Fix handling of the REPORT SUPPORTED OPERATION CODES command to
    ignore reserved bits (Niklas).
 
  - Increase port multiplier soft reset timeout to accomodate slow
    devices and avoid issues on wakeup (Matthias).
 
  - A couple of minor code fixes to avoid compilation warnings in
    libata-core and libata-eh (me).
 
  - Many patches from me to address suspend/resume issues, and in
    particular a potential deadlock on resume due to the SCSI disk driver
    resume operation not being synchronized with libata EH port resume
    handling.  This is addressed by changing the scsi disk driver disk
    start/stop control to allow libata to execute disk suspend (spin
    down) and resume (spin up) on its own during system suspend/resume.
    Runtime suspend/resume control remains with the SCSI disk driver.
    Other fixes include:
     - Fix libata power management request issuing to avoid races.
     - Establish a link between ATA ports and SCSI devices to order PM
       operations.
     - Fix device removal to avoid issues with driver rmmod removal.
     - Fix synchronization of libata device rescan and SCSI disk resume
       operation.
     - Remove libsas PM operations as suspend/resume is handled directly
       by the sas controller resume.
     - Fix the SCSI disk driver to not issue commands to suspended disks,
       thus avoiding potential system lock-up on resume.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSRPv8tYSvhwAzJdzjdoc3SxdoYdgUCZRbR0gAKCRDdoc3SxdoY
 dkArAP9PFTRgsXEwfE7arBXCwQqXj/W0R2KgKug7Fno+SoQLnAD/ZKe2TR50uwxr
 9mwYROdMgi50T9ax1RX1jWA0npGXmQg=
 =cFzG
 -----END PGP SIGNATURE-----

Merge tag 'ata-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata

Pull ATA fixes from Damien Le Moal:
 "A larger than usual set of fixes for 6.6-rc4 due to the unexpected
  number of fixes needed to address ATA disks suspend/resume issues.

  In more detail:

   - Add missing additionalProperties on child nodes to the pata-common
     DT bindings (Rob)

   - Fix handling of the REPORT SUPPORTED OPERATION CODES command to
     ignore reserved bits (Niklas)

   - Increase port multiplier soft reset timeout to accomodate slow
     devices and avoid issues on wakeup (Matthias)

   - A couple of minor code fixes to avoid compilation warnings in
     libata-core and libata-eh (me)

   - Many patches from me to address suspend/resume issues, and in
     particular a potential deadlock on resume due to the SCSI disk
     driver resume operation not being synchronized with libata EH port
     resume handling.

     This is addressed by changing the scsi disk driver disk start/stop
     control to allow libata to execute disk suspend (spin down) and
     resume (spin up) on its own during system suspend/resume. Runtime
     suspend/resume control remains with the SCSI disk driver.

     Other fixes include:
      - Fix libata power management request issuing to avoid races
      - Establish a link between ATA ports and SCSI devices to order PM
        operations
      - Fix device removal to avoid issues with driver rmmod removal
      - Fix synchronization of libata device rescan and SCSI disk resume
        operation
      - Remove libsas PM operations as suspend/resume is handled
        directly by the sas controller resume
      - Fix the SCSI disk driver to not issue commands to suspended
        disks, thus avoiding potential system lock-up on resume"

* tag 'ata-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
  ata: libata-eh: Fix compilation warning in ata_eh_link_report()
  ata: libata-core: Fix compilation warning in ata_dev_config_ncq()
  scsi: sd: Do not issue commands to suspended disks on shutdown
  ata: libata-core: Do not register PM operations for SAS ports
  ata: libata-scsi: Fix delayed scsi_rescan_device() execution
  scsi: Do not attempt to rescan suspended devices
  ata: libata-scsi: Disable scsi device manage_system_start_stop
  scsi: sd: Differentiate system and runtime start/stop management
  ata: libata-scsi: link ata port and scsi device
  ata: libata-core: Fix port and device removal
  ata: libata-core: Fix ata_port_request_pm() locking
  ata: libata-sata: increase PMP SRST timeout to 10s
  ata: libata-scsi: ignore reserved bits for REPORT SUPPORTED OPERATION CODES
  dt-bindings: ata: pata-common: Add missing additionalProperties on child nodes
2023-09-29 13:38:34 -07:00
Linus Torvalds
eafdc50713 block-6.6-2023-09-28
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmUWZsIQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgprWHEACArb3g5WR/KhSspTETNfEM4TB96kwhx8co
 srFatZYbOh/DxA4eVOJH1EJK4G34roCtugg5yRhFVQicMAkhcUqPRIAkmO5YGxDW
 tcjbppzmg8b8n3dL6hevgw+EJbg6l9fjmi5GU2yjmWIeZ4H11zieMCcfStidh9iD
 efdtFAfTBf6kvo5E4f0VFzJ11a8gI4gz7HGGD2dRPaGgZtWo2K6xJxKoc0x5b8zz
 9Y2oN1+iAEoQCgoXfximDVgLo45laLVYaD1yyRc6bnkZVJEhN1iiOCob3cNyGUz+
 30daI5VUzcZBI8HWNBBeR/YmxW2pohPWz/UCwLlE3CoA+n4FRZkY2FzOaApgmKuw
 2iGDRaGj5Nuq2ODf6jtjxwMqbp3noyJv768Cg78KDE3VSWCYtftsXqB+XURM1zCK
 +HeYa4MN+2fzC4B7X37vlnA5mviztl1tAzWN2vuia2CStBr+R2QTW/YwOA06bG7m
 rTQZQIw2+OHfdHkbpF2t6+7CbmLviNbrlm2qNUa2WyCbkNdJGghXOnlBTi+CPHUo
 uaj1BSP3mAYBPnj0uImyLIUxlpYH9r8/V9ZrY3ZYV+wT6vl7UxyHQhRBN5Au9rMG
 qJpxScceIKeQjr+gYeqlX3mgQJHKdlt5xdXcCT0PKt9sCjzd/ZSIEoMULzflFK6A
 t6oyb1NWgw==
 =tdkz
 -----END PGP SIGNATURE-----

Merge tag 'block-6.6-2023-09-28' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:
 "Just two minor comment / documentation fixes for the block side"

* tag 'block-6.6-2023-09-28' of git://git.kernel.dk/linux:
  block: fix kernel-doc for disk_force_media_change()
  block: correct stale comment in rq_qos_wait
2023-09-29 13:28:49 -07:00
Linus Torvalds
a98b95959b io_uring-6.6-2023-09-28
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmUWk/gQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpiq5D/0ZVXb6LjbpX3FTmKdWk/Z8C6XGUxGN2XA3
 5QEjhGIfM8Cu35ZHPW2av6ffZk/9Nmpdu00Y+s22mirJ5qUX57AsAyk3E6eweDha
 yiWgo9lEC+EX7f6xDbnFgqxXD2y/pJBJ/ZjUifn/MklAR1j2mthaU7fKpXAb+unB
 yecbfk8zUNd6OUpXvAqWhclmaserR94j2MBNGheTTEEkjApOY6RcdjCcDPJOjyYq
 TDe5BpqsqryCWBSE7605wVnscxpCUBbnYxctXQej8WaBZVSGoJWTFdYeH4DMZdO5
 EHnM9sMw5zRv/2mAh/jukI41SZHI/YqJ0WBLrzbt0e3KNOiWx4QmWwmm/SGVyqy8
 0mB5zqWZRVtD2Cv9V9lxb7HKxOBs8vEPtAHz2JGvbjBQgHDDc3nVEToHysKrWOhO
 wfIXmk+/ZTNkuNL1dKSFL47f1BuwmdWXimq6l1ux1v9Lhe4TnPuIwrtUcKWIhuLf
 ZawJWQwL6NQBa7cGNQOm4Y6LrIHpFQtx9CnmcRNr1CQ+IercRllf5xAqNu4y44aj
 M5p/EjdNgAiaW9dufusmWiySLE5eeGlVu0IXUa0M6nqaGV2KvBqbx21CUg2QOYXJ
 LKrISnxCiEEGxDQaS/qTMCn2seyF9cAExENfpaNYzejO9CzoFQ3KpY0+lmr5zcMN
 NQS+itqbNw==
 =fAm6
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-6.6-2023-09-28' of git://git.kernel.dk/linux

Pull io_uring fix from Jens Axboe:
 "A single fix going to stable for the IORING_OP_LINKAT flag handling"

* tag 'io_uring-6.6-2023-09-28' of git://git.kernel.dk/linux:
  io_uring/fs: remove sqe->rw_flags checking from LINKAT
2023-09-29 12:56:34 -07:00
Linus Torvalds
1c84724ccb slab fixes for 6.6-rc4
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEe7vIQRWZI0iWSE3xu+CwddJFiJoFAmUWfrsACgkQu+CwddJF
 iJo6+QgAnn3klZX5wOfH93tdlOz2TNy8QVSmNuITDKThLJg9r8YkQJdp6NYHR0Rc
 vrbZ2pMqF/LQ/LW49uZahQwVi7811psfU3PqbSC3CRtUYq0RUMu5PaeItvRp4S5n
 2zYiWVSNGfSmG4jQm2L2nMjDRK8m3oLKwuxKejv3UQLDZ5U1Fh36k75lZK1PERmu
 +cBQATtncj4N1rF0eY8mif3ctqqkVqz79t/nU/FCBx0+v3s4wTzYB1y8l5FEH2cM
 iU4A4jsZe147DxHadUQF2ahnj6oaOacgtg846WN5P73BjiRhdrJaTS8HSeAS/RIo
 e/PpbLzOFp4Rz+2u1Me7nFK64qFjyw==
 =+WB7
 -----END PGP SIGNATURE-----

Merge tag 'slab-fixes-for-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab

Pull slab fixes from Vlastimil Babka:

 - stable fix to prevent list corruption when destroying caches with
   leftover objects (Rafael Aquini)

 - fix for a gotcha in kmalloc_size_roundup() when calling it with too
   high size, discovered when recently a networking call site had to be
   fixed for a different issue (David Laight)

* tag 'slab-fixes-for-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  slab: kmalloc_size_roundup() must not return 0 for non-zero size
  mm/slab_common: fix slab_caches list corruption after kmem_cache_destroy()
2023-09-29 12:10:12 -07:00
Linus Torvalds
6edc84bc3f drm fixes for 6.6-rc4
MAINTAINERS:
 - add Danilo for nouveau
 
 ivpu:
 - Add PCI ids for Arrow Lake
 - Fix memory corruption during IPC
 - Avoid dmesg flooding
 - 40xx: Wait for clock resource
 - 40xx: Fix interrupt usage
 - 40xx: Support caching when loading firmware
 
 i915:
 - Fix a panic regression on gen8_ggtt_insert_entries
 - Fix load issue due to reservation address in ggtt_reserve_guc_top
 - Fix a possible deadlock with guc busyness worker
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmUWK6IACgkQDHTzWXnE
 hr5P2g/+PmspwKY7dqEy+SP8U3RCk4qD4+r0L8q//hytq8Ur1MIwXwNTRavOXR2M
 KPSCTHzYqQwxVi6J23BgLhW731UK3sLsxdTlxYL+dANEt5R4EXWkh4Ca55wQ4yVE
 I504J6uwIwd9mkFjqC5Xb1U4OYXuTK345HS1vcybMp2ryrM3F8r59ThXwYF5aoWt
 3QePmshb7QLwIdtV97nlsyssqzsDWkoWyqwPySfxtx5aA3i5NBUW8NVC623+iEw7
 FFFtfV8TJ3vOLHcDAfG/y/fhHh/osU7gF8Ra8g1Pcvp1cBALXy5dn2XAavckuBpZ
 wdoijySix11c+Gp6j3YJkWtB55hJnCJs3xoM/x/X9TtSbiqVFcZdbzV15HWdvcUN
 4shaVualmTcbG/TdiGtswWSmHgUhNKo1KGOjL7+RYjntb8EPfypsa3w+blx6u1Bn
 mzTNwQ2iI5BbaOEXkR+pnoT34GU+VZ6+Lh2U1dhVJv2YzZG+MaxZXOLg2Yeff1B0
 9Cf6oq77xM5b0eAv7b9St7MRUpCZZztnN8p+qiRu5kisqttApqzbewCuPQsY/eHb
 wPZR2kKP3YEuU/or8RNEqXu5lChna0MjcDS7vvY/npJVwTrb+IGwupvrMHjIg5/N
 I5TbvyJ5Xvt/sL2UO/90llIY39G1ziHFuqrdL1j3/zbqtG8OQKE=
 =7M9M
 -----END PGP SIGNATURE-----

Merge tag 'drm-fixes-2023-09-29' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "Regular pull, this feel suspiciously light so I expect next week might
  be a bit heavier? Let's see how we go. This is from a code point of
  view ivpu and i915 fixes.

  The only other patch is adding Danilo Krummrich to the nouveau
  maintainers, he's agreed to take on more of the roll after Ben
  retired.

  MAINTAINERS:
   - add Danilo for nouveau

  ivpu:
   - Add PCI ids for Arrow Lake
   - Fix memory corruption during IPC
   - Avoid dmesg flooding
   - 40xx: Wait for clock resource
   - 40xx: Fix interrupt usage
   - 40xx: Support caching when loading firmware

  i915:
   - Fix a panic regression on gen8_ggtt_insert_entries
   - Fix load issue due to reservation address in ggtt_reserve_guc_top
   - Fix a possible deadlock with guc busyness worker"

* tag 'drm-fixes-2023-09-29' of git://anongit.freedesktop.org/drm/drm:
  accel/ivpu: Use cached buffers for FW loading
  accel/ivpu/40xx: Fix missing VPUIP interrupts
  accel/ivpu/40xx: Disable frequency change interrupt
  accel/ivpu/40xx: Ensure clock resource ownership Ack before Power-Up
  accel/ivpu: Don't flood dmesg with VPU ready message
  accel/ivpu: Do not use wait event interruptible
  MAINTAINERS: update nouveau maintainers
  i915/guc: Get runtime pm in busyness worker only if already active
  drm/i915/gt: Fix reservation address in ggtt_reserve_guc_top
  i915: Limit the length of an sg list to the requested length
  accel/ivpu: Add Arrow Lake pci id
2023-09-29 10:24:49 -07:00
Linus Torvalds
71e58659bf gpio fixes for v6.6-rc4
- fix a potential spinlock deadlock in gpio-timberdale
 - mark the gpio-pmic-eic-sprd driver as one that can sleep
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEFp3rbAvDxGAT0sefEacuoBRx13IFAmUWwucACgkQEacuoBRx
 13KGrQ//Upnx6vEZ1STRwSjffItbwLPYxMXSx+7FSYg35cEPaL/zdbTXQ3QSfz16
 IS2zQfYH896Npt3MUQGhw/cJLB83/BUhqkyB3ayvu/TxNSgGAIdeeSXBWRQEDpSW
 B1llHt0ZCX/ppoIdZcUyBTVVEX5Q6Hc/wA3tRJUQ5R65W62O/uy73ZLxBUC2tP0Z
 f1Fjg5x/r89Osl06bwa1LgCoBcj286X6+SekoaICw+8yilj2cC1t9PniRHiU/loC
 l3sBYPsDj4Is8sGM+cPRNsSF8ilQ7WSy6MwROvCLsW+je0/APAWeA4v2AJxs62Lp
 JHhMryfl0VvY4YgV4WVonZxfAcPCgzYfstbDJuwn4xW1RJKP6Dk/7YWgnWOWsEsu
 yo8OYKGxbpekqx5U2rcbEK7KjAsB4oyNXuzNDQtoxPeZgmCZOawcXL3F9ezMUwY2
 TOuxU5RYVcCqWnJbu+n3D3TrxJSuFcw7zSovPL0A5gqj2eOdm7r7aeNeZ97USOe7
 ZhubzWV/20HYX/+z0pg4woRK02lANbFzPU7BTSw3/JVDlOZRChV1oo/RNSGNLv4k
 ccuMVz0sQ/1Jof06R7WLjcOe5AsOjWxTYh5zIJ+t7wHlwh1Lli7ojkE1fMp5ZhrF
 ZDY3rkf7xJeVlG7eWL/O9jmvmajYgzZ+nzJbrBv8k0ulcM9IB2I=
 =kFxF
 -----END PGP SIGNATURE-----

Merge tag 'gpio-fixes-for-v6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:

 - fix a potential spinlock deadlock in gpio-timberdale

 - mark the gpio-pmic-eic-sprd driver as one that can sleep

* tag 'gpio-fixes-for-v6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpio: pmic-eic-sprd: Add can_sleep flag for PMIC EIC chip
  gpio: timberdale: Fix potential deadlock on &tgpio->lock
2023-09-29 09:25:23 -07:00
Linus Torvalds
acfdcaeed6 A bunch of clk driver fixes for issues found recently.
- Fix the binding for versaclock3 that was introduced this merge window
    so we know what the values are for clk consumers
  - Fix a 64-bit division issue in the versaclock3 driver
  - Avoid breakage in the versaclock3 driver by rejiggering the enums
    used to layout clks
  - Fix the parent name of a clk in the Spreadtrum ums512 clk driver
  - Fix a suspend/resume issue in Skyworks Si521xx clk driver where
    regmap restoration fails because writes are wedged
  - Return zero from Tegra bpmp recalc_rate() implementation when an
    error occurs so we don't consider an error as a large rate
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEE9L57QeeUxqYDyoaDrQKIl8bklSUFAmUWFIYRHHNib3lkQGtl
 cm5lbC5vcmcACgkQrQKIl8bklSWadBAAtcYh03OyGV8qnD6Uqx0SZuFYclXVTk6L
 muRUqQ7DwDNO3R5El8lHxpMF4fclBUwiZ8eFlsdabK5xyhVZOnPJYt+DF2n97cMk
 XxBg8gnc1JL/mETLUZ2uJ1aXOruDbFhb7QE3cYEkiCZXM9X5zw9yGXd6xG4U7f05
 MA81NoidaKPoakA2P9uewrrFEBpLGzSDj8AgQGUiKUX8NKDeaxkm9sDzW1HHlNOz
 n3MRqiR46cmVXZhud+rpyqtYXQZY7ullWkvJs+P6qmVTCgtHsgKT4a4l8kkiQk4Q
 afJFM2Z2ygH93LSxRjjL7RF7SyeeO4x6fVkTAa7/IBZSznH1uefZcq+kfT+qP226
 AMCQ2NhYspucuzP1p/i9ZSwbbaD+u1AmBllENiG2XEBbFErxdcJw36rnmOcDJELm
 f9TbX+K8mBvKuQyqX/0CQP8FESSg+7XgTRYuAVuM6aa206o+DhMBhm6S7vaW+SBR
 uQaR1b69Kc8ti+qG6f2pyuMpJNgOVbtHnGigELP6MH8NdjfpbOQI1/6PZWORoGfI
 VX7M3uG6uS5fp8DFDnMLw3nPEiidZ6KHMIrk71MbZ3eYCedfe3/11tMUxF30+85m
 FXE9jF5I6yN+fMdVurwn3qJk5LZrSeRMiSeCFBOTzLHrFL0RGXoa/kGPzvEDxchL
 R8MmTwtJfD4=
 =RdLn
 -----END PGP SIGNATURE-----

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk fixes from Stephen Boyd:
 "A bunch of clk driver fixes for issues found recently:

   - Fix the binding for versaclock3 that was introduced this merge
     window so we know what the values are for clk consumers

   - Fix a 64-bit division issue in the versaclock3 driver

   - Avoid breakage in the versaclock3 driver by rejiggering the enums
     used to layout clks

   - Fix the parent name of a clk in the Spreadtrum ums512 clk driver

   - Fix a suspend/resume issue in Skyworks Si521xx clk driver where
     regmap restoration fails because writes are wedged

   - Return zero from Tegra bpmp recalc_rate() implementation when an
     error occurs so we don't consider an error as a large rate"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: tegra: fix error return case for recalc_rate
  clk: si521xx: Fix regmap write accessor
  clk: si521xx: Use REGCACHE_FLAT instead of NONE
  clk: sprd: Fix thm_parents incorrect configuration
  clk: vc3: Make vc3_clk_mux enum values based on vc3_clk enum values
  clk: vc3: Fix output clock mapping
  clk: vc3: Fix 64 by 64 division
  dt-bindings: clock: versaclock3: Add description for #clock-cells property
2023-09-29 08:55:21 -07:00
Linus Torvalds
94b7ed384f Power Supply Fixes for 6.6 cycle
* core: fix use after free during device release
  * ab8500: avoid reporting multiple batteries to userspace
  * rk817: fix DT node resource leak
  * misc. small fixes, mostly for compiler warnings/errors
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE72YNB0Y/i3JqeVQT2O7X88g7+poFAmUWCd4ACgkQ2O7X88g7
 +prYdQ/9HyOx8cTOXEBtS9KaIXWSx1L14QMpNoToVPR9Gleg/EzSqW/DIh+m2U1a
 ZvIvvi86Yt+IY4JqQHEbdLgySahEwBLGqxPJaFjbWs5NVeQEIsSYfjwML+IrIrW+
 pDYQ+SUtJEGWxFdzaoLFlkaV0YZG6rzf0L6/Nk2gPdu9ODqKIF5kOFjmD5N43w2v
 spO/VZ+5ndYc6jxQZ8xHiO4NFxyLKVZ//CnVi+Axe/xZdw1V6+bC0FtLYT3V8y/S
 hUambJecvVR2qAIRg9nUdN6Z1YR8vjCaUmWpKS3NM6y0sq4XhW5VMLv8ezGvwd3M
 duhfHN6gZbL2vByIZ//E38qLmDGKYvWzFW12vkQf35/bttEh1ft7h70PgQg4iHl3
 lQu8iKsdR/VxrWGG+Z4eJ1YuZ55fKu+yvLkTltADJI73Wns4qhDWziLHoRa/RwCt
 0jKbeuPDVrVMhhTfVtBVdxCONkJJupQMCIp1Bl9JAadHR88TiUpYSuxQ8bIyicV/
 a1N8oeQTMCsV2G/wmu57/zyEzwX3azPtYPoX8KD3FLV3O8gQdMIn4FhjR6LxD/kB
 +3RqhHlQTlgpTuBtTKmpW6rKEjXNq+5/IFaDh1FKEYh6/sB9Rw1MmzqTM0+0CfHm
 cRLVYajws8VuI7MIU98kreu6X4P7SODHFyALpJ9mL6ywOlDWcrQ=
 =TAvZ
 -----END PGP SIGNATURE-----

Merge tag 'for-v6.6-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply

Pull power supply fixes from Sebastian Reichel:

 - core: fix use after free during device release

 - ab8500: avoid reporting multiple batteries to userspace

 - rk817: fix DT node resource leak

 - misc. small fixes, mostly for compiler warnings/errors

* tag 'for-v6.6-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
  power: supply: rk817: Fix node refcount leak
  power: supply: core: fix use after free in uevent
  power: supply: rt9467: Fix rt9467_run_aicl()
  power: supply: rk817: Add missing module alias
  power: supply: ucs1002: fix error code in ucs1002_get_property()
  power: vexpress: fix -Wvoid-pointer-to-enum-cast warning
  power: reset: use capital "OR" for multiple licenses in SPDX
  pwr-mlxbf: extend Kconfig to include gpio-mlxbf3 dependency
  power: supply: rt5033_charger: recognize EXTCON setting
  power: supply: mt6370: Fix missing error code in mt6370_chg_toggle_cfo()
  power: supply: ab8500: Set typing and props
2023-09-29 08:51:57 -07:00
Linus Torvalds
b02afe1df5 Xtensa fixes for v6.6:
- fix build warnings from builds performed with W=1
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEK2eFS5jlMn3N6xfYUfnMkfg/oEQFAmUVydMTHGpjbXZia2Jj
 QGdtYWlsLmNvbQAKCRBR+cyR+D+gRDfvEACQNZReIdcMLCJidM35z2E+Fc/CsuKI
 sn1KtFmHm3Sq4JxskDc2f1zlK4xpt4bZfBe0YZEaboeBqxSuqmYWsBmrrRq5TOoA
 4MitI/yVK6A3WgRVLWLm6OAYVOuyr3WITiLs0fpOoMJP5DZP/r3vzRYMLoXuWe46
 ajIrBEYfR5Ul3F16veAwGWhGob3ZO2uqyTGdvVWY8GwKp1pYbATwH6VaYWbCNb4N
 mzOJNmy5ABcWPocj/owmu6Cp5UcMgxKf1v/1qMU3oF9biORHZM0okVmPOdsyKQiH
 4azeHJp0Z9sbr8NqNYJ5f5XLH9+T1q1AB+IgUpMLMlpwhUq9Fc92bVylx2rPuzOH
 Pzq3ccOHTBjpfpq7Jewtl1pwW25CatugzAUDYlYHx2o6gpSMgDEUlUUIuLKN4wiO
 /FyLRkdNnvjqsdH30In8f0TpfsWUv+XqS+eWt4KPjOVBEYOkGXHZWIxmMr9/0nwm
 sIgktKLsC7miT+BRdMrlnI9OdR5oKOPBLWkzlJSCVNrUzWYtMj/Bo+7Bt8hKbpoP
 0yAoUIjqL1/MqT8BkBDr1ovFurAK+42f5rHhRQjWDxrB4jcIzUdqm/94gwAnXdZt
 XCGM0H7dQVd/hEP+Y2YpBc8k5U58J9CAtSTeVT/mgt2+hBckm9WGrvn0vnAbmM/Q
 YgChoK+Mjz3G9g==
 =+MGs
 -----END PGP SIGNATURE-----

Merge tag 'xtensa-20230928' of https://github.com/jcmvbkbc/linux-xtensa

Pull Xtensa fixes from Max Filippov:

 - fix build warnings from builds performed with W=1

* tag 'xtensa-20230928' of https://github.com/jcmvbkbc/linux-xtensa:
  xtensa: boot/lib: fix function prototypes
  xtensa: umulsidi3: fix conditional expression
  xtensa: boot: don't add include-dirs
  xtensa: iss/network: make functions static
  xtensa: tlb: include <asm/tlb.h> for missing prototype
  xtensa: hw_breakpoint: include header for missing prototype
  xtensa: smp: add headers for missing function prototypes
  irqchip: irq-xtensa-mx: include header for missing prototype
  xtensa: traps: add <linux/cpu.h> for function prototype
  xtensa: stacktrace: include <asm/ftrace.h> for prototype
  xtensa: signal: include headers for function prototypes
  xtensa: processor.h: add init_arch() prototype
  xtensa: ptrace: add prototypes to <asm/ptrace.h>
  xtensa: irq: include <asm/traps.h>
  xtensa: fault: include <asm/traps.h>
  xtensa: add default definition for XCHAL_HAVE_DIV32
2023-09-29 08:41:56 -07:00
Jens Axboe
a52d4f6575 io_uring/fs: remove sqe->rw_flags checking from LINKAT
This is unionized with the actual link flags, so they can of course be
set and they will be evaluated further down. If not we fail any LINKAT
that has to set option flags.

Fixes: cf30da90bc ("io_uring: add support for IORING_OP_LINKAT")
Cc: stable@vger.kernel.org
Reported-by: Thomas Leonard <talex5@gmail.com>
Link: https://github.com/axboe/liburing/issues/955
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-09-29 03:07:09 -06:00
Qais Yousef
15874a3d27 sched/debug: Add new tracepoint to track compute energy computation
It was useful to track feec() placement decision and debug the spare
capacity and optimization issues vs uclamp_max.

Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230916232955.2099394-4-qyousef@layalina.io
2023-09-29 10:29:18 +02:00
Qais Yousef
23c9519def sched/uclamp: Ignore (util == 0) optimization in feec() when p_util_max = 0
find_energy_efficient_cpu() bails out early if effective util of the
task is 0 as the delta at this point will be zero and there's nothing
for EAS to do. When uclamp is being used, this could lead to wrong
decisions when uclamp_max is set to 0. In this case the task is capped
to performance point 0, but it is actually running and consuming energy
and we can benefit from EAS energy calculations.

Rework the condition so that it bails out when both util and uclamp_min
are 0.

We can do that without needing to use uclamp_task_util(); remove it.

Fixes: d81304bc61 ("sched/uclamp: Cater for uclamp in find_energy_efficient_cpu()'s early exit condition")
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230916232955.2099394-3-qyousef@layalina.io
2023-09-29 10:29:14 +02:00
Qais Yousef
6b00a40147 sched/uclamp: Set max_spare_cap_cpu even if max_spare_cap is 0
When uclamp_max is being used, the util of the task could be higher than
the spare capacity of the CPU, but due to uclamp_max value we force-fit
it there.

The way the condition for checking for max_spare_cap in
find_energy_efficient_cpu() was constructed; it ignored any CPU that has
its spare_cap less than or _equal_ to max_spare_cap. Since we initialize
max_spare_cap to 0; this lead to never setting max_spare_cap_cpu and
hence ending up never performing compute_energy() for this cluster and
missing an opportunity for a better energy efficient placement to honour
uclamp_max setting.

	max_spare_cap = 0;
	cpu_cap = capacity_of(cpu) - cpu_util(p);  // 0 if cpu_util(p) is high

	...

	util_fits_cpu(...);		// will return true if uclamp_max forces it to fit

	...

	// this logic will fail to update max_spare_cap_cpu if cpu_cap is 0
	if (cpu_cap > max_spare_cap) {
		max_spare_cap = cpu_cap;
		max_spare_cap_cpu = cpu;
	}

prev_spare_cap suffers from a similar problem.

Fix the logic by converting the variables into long and treating -1
value as 'not populated' instead of 0 which is a viable and correct
spare capacity value. We need to be careful signed comparison is used
when comparing with cpu_cap in one of the conditions.

Fixes: 1d42509e47 ("sched/fair: Make EAS wakeup placement consider uclamp restrictions")
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230916232955.2099394-2-qyousef@layalina.io
2023-09-29 10:29:14 +02:00
Valentin Schneider
5fe7765997 sched/deadline: Make dl_rq->pushable_dl_tasks update drive dl_rq->overloaded
dl_rq->dl_nr_migratory is increased whenever a DL entity is enqueued and it has
nr_cpus_allowed > 1. Unlike the pushable_dl_tasks tree, dl_rq->dl_nr_migratory
includes a dl_rq's current task. This means a dl_rq can have a migratable
current, N non-migratable queued tasks, and be flagged as overloaded and have
its CPU set in the dlo_mask, despite having an empty pushable_tasks tree.

Make an dl_rq's overload logic be driven by {enqueue,dequeue}_pushable_dl_task(),
in other words make DL RQs only be flagged as overloaded if they have at
least one runnable-but-not-current migratable task.

 o push_dl_task() is unaffected, as it is a no-op if there are no pushable
   tasks.

 o pull_dl_task() now no longer scans runqueues whose sole migratable task is
   their current one, which it can't do anything about anyway.
   It may also now pull tasks to a DL RQ with dl_nr_running > 1 if only its
   current task is migratable.

Since dl_rq->dl_nr_migratory becomes unused, remove it.

RT had the exact same mechanism (rt_rq->rt_nr_migratory) which was dropped
in favour of relying on rt_rq->pushable_tasks, see:

  612f769edd ("sched/rt: Make rt_rq->pushable_tasks updates drive rto_mask")

Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lore.kernel.org/r/20230928150251.463109-1-vschneid@redhat.com
2023-09-29 10:20:21 +02:00
Dave Airlie
06365a04fd - Fix a panic regression on gen8_ggtt_insert_entries (Matthew Wilcox)
- Fix load issue due to reservation address in ggtt_reserve_guc_top (Javier Pello)
 - Fix a possible deadlock with guc busyness worker (Umesh)
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEbSBwaO7dZQkcLOKj+mJfZA7rE8oFAmUVjBMACgkQ+mJfZA7r
 E8plzQf/dSxOvhyDvh/WDqT+Vk3aIxoypo7bBHrLyOzbYhdALCSBR70FijRS8OuK
 th15AYqUpa+Dqhl8RCTSGX+4aAeudN6pHzwEYMZtF8hwb7DnomcB+ztB853DdUMu
 U0NLi5Rc3d138oepTlHuwtKUJzpEqbjZKXUOfLx+GvFKdiM2p8js3LTF1XhAX9Sf
 u05xzi2tKjLJbnoxGbUOe9Np3Bqg2ril/HDVm0c7Yc+t98Sly6ZJljV0IA0fexqF
 ZFX+Bb4f8YFfqSuih3z1K75GpzEQiT6g2z9qOGWB64qFtAyV/ehpSjKP6fPf10FY
 G5I/iWRRf3dW6P9/4x/3W5wF+eZWoA==
 =xIyA
 -----END PGP SIGNATURE-----

Merge tag 'drm-intel-fixes-2023-09-28' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

- Fix a panic regression on gen8_ggtt_insert_entries (Matthew Wilcox)
- Fix load issue due to reservation address in ggtt_reserve_guc_top (Javier Pello)
- Fix a possible deadlock with guc busyness worker (Umesh)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZRWMI1HmUYPGGylp@intel.com
2023-09-29 10:28:21 +10:00
Haitao Huang
c6c2adcba5 x86/sgx: Resolves SECS reclaim vs. page fault for EAUG race
The SGX EPC reclaimer (ksgxd) may reclaim the SECS EPC page for an
enclave and set secs.epc_page to NULL. The SECS page is used for EAUG
and ELDU in the SGX page fault handler. However, the NULL check for
secs.epc_page is only done for ELDU, not EAUG before being used.

Fix this by doing the same NULL check and reloading of the SECS page as
needed for both EAUG and ELDU.

The SECS page holds global enclave metadata. It can only be reclaimed
when there are no other enclave pages remaining. At that point,
virtually nothing can be done with the enclave until the SECS page is
paged back in.

An enclave can not run nor generate page faults without a resident SECS
page. But it is still possible for a #PF for a non-SECS page to race
with paging out the SECS page: when the last resident non-SECS page A
triggers a #PF in a non-resident page B, and then page A and the SECS
both are paged out before the #PF on B is handled.

Hitting this bug requires that race triggered with a #PF for EAUG.
Following is a trace when it happens.

BUG: kernel NULL pointer dereference, address: 0000000000000000
RIP: 0010:sgx_encl_eaug_page+0xc7/0x210
Call Trace:
 ? __kmem_cache_alloc_node+0x16a/0x440
 ? xa_load+0x6e/0xa0
 sgx_vma_fault+0x119/0x230
 __do_fault+0x36/0x140
 do_fault+0x12f/0x400
 __handle_mm_fault+0x728/0x1110
 handle_mm_fault+0x105/0x310
 do_user_addr_fault+0x1ee/0x750
 ? __this_cpu_preempt_check+0x13/0x20
 exc_page_fault+0x76/0x180
 asm_exc_page_fault+0x27/0x30

Fixes: 5a90d2c3f5 ("x86/sgx: Support adding of pages to an initialized enclave")
Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/20230728051024.33063-1-haitao.huang%40linux.intel.com
2023-09-28 16:16:40 -07:00
Dave Airlie
8c4a5e8936 Short summary of fixes pull:
* ivpu:
    * Add PCI ids for Arrow Lake
    * Fix memory corruption during IPC
    * Avoid dmesg flooding
    * 40xx: Wait for clock resource
    * 40xx: Fix interrupt usage
    * 40xx: Support caching when loading firmware
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAmUVNKwACgkQaA3BHVML
 eiPl1wgAs8Bhk7B0DOZyQdwbpNcfesvXtp/8CDpByXp95GwDinrmeV5+PQBD8rip
 kiozkp8OKXdVwI2c3RlrEzGrGE7kUMqzLsKjTEYRZHWWec9yphn/aFtFC+cE/5oL
 Ty6oh/1umphiTD549arv8EE/fpJY5ilJrpFl13u83qtvka3W2YyU///e1bSplT6X
 s4s1K3ZF9dw6HJD3mPatZdKcSD7xNHKfUrQGQEL/5Ow1d6cPylwE/x40bsII2+x/
 EMMcwm9nZJp8V68jVAtHeeaU6DuDz4V+6Lf2ZA0mV8Qdxu22qCG9+fypZnFXIdRi
 R8SrytgMmXDBWLpt+zhFVwozH5hyjw==
 =1hC6
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-fixes-2023-09-28' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

Short summary of fixes pull:

 * ivpu:
   * Add PCI ids for Arrow Lake
   * Fix memory corruption during IPC
   * Avoid dmesg flooding
   * 40xx: Wait for clock resource
   * 40xx: Fix interrupt usage
   * 40xx: Support caching when loading firmware

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230928081208.GA7881@linux-uq9g
2023-09-29 07:50:16 +10:00
Joel Fernandes (Google)
fc09027786 sched/rt: Fix live lock between select_fallback_rq() and RT push
During RCU-boost testing with the TREE03 rcutorture config, I found that
after a few hours, the machine locks up.

On tracing, I found that there is a live lock happening between 2 CPUs.
One CPU has an RT task running, while another CPU is being offlined
which also has an RT task running.  During this offlining, all threads
are migrated. The migration thread is repeatedly scheduled to migrate
actively running tasks on the CPU being offlined. This results in a live
lock because select_fallback_rq() keeps picking the CPU that an RT task
is already running on only to get pushed back to the CPU being offlined.

It is anyway pointless to pick CPUs for pushing tasks to if they are
being offlined only to get migrated away to somewhere else. This could
also add unwanted latency to this task.

Fix these issues by not selecting CPUs in RT if they are not 'active'
for scheduling, using the cpu_active_mask. Other parts in core.c already
use cpu_active_mask to prevent tasks from being put on CPUs going
offline.

With this fix I ran the tests for days and could not reproduce the
hang. Without the patch, I hit it in a few hours.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230923011409.3522762-1-joel@joelfernandes.org
2023-09-28 22:58:13 +02:00
Quang Le
e6e43b8aa7 fs/smb/client: Reset password pointer to NULL
Forget to reset ctx->password to NULL will lead to bug like double free

Cc: stable@vger.kernel.org
Cc: Willy Tarreau <w@1wt.eu>
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Quang Le <quanglex97@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-09-28 14:49:51 -05:00
Linus Torvalds
9ed22ae6be spi: Fixes for v6.6
A small set of device specific fixes, the most major one is for the GXP
 driver which would probably have been confusing some callers with
 returning the length rather than 0 on successful writes.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmUVfbcACgkQJNaLcl1U
 h9CrRAf+PmW5OJYtAsOo987ro01DvIWv5XDhEQjljQB5Dd+qbPL5XvR9cKpk4KUc
 kyvsWFS517Z298XItVGp+xDXrllvRjLn1ShUEoisEG8M+j84GN5M/eKw7W1Q7GxX
 iPUTdsgKJRhnKxtvuKtCpmBT24Ari54AiPHtS2tyZGJeo9ehdaUG+LIsJS/zzUuh
 KqfBBS7Gc9KE9alWqTeIx8Q2/ecVk3LEzdITPnNIkcXRHs2sUWGbs8O/u8ZNoD0F
 4+vibLweDQ+SNlYVGjSJAG6Qq9dikIPDZlW+WSv/aGuf4MR1gE6sl97Okwb1q26r
 sg6wbt/pDrYHOA33ZzoV5IMvjzsBRQ==
 =SQJb
 -----END PGP SIGNATURE-----

Merge tag 'spi-fix-v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
 "A small set of device specific fixes, the most major one is for the
  GXP driver which would probably have been confusing some callers with
  returning the length rather than 0 on successful writes"

* tag 'spi-fix-v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: spi-gxp: BUG: Correct spi write return value
  dt-bindings: spi: fsl-imx-cspi: Document missing entries
  spi: cs42l43: Remove spurious pm_runtime_disable
2023-09-28 11:12:42 -07:00
Linus Torvalds
5d959343ae LoongArch fixes for v6.6-rc4
-----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmUT9IEWHGNoZW5odWFj
 YWlAa2VybmVsLm9yZwAKCRAChivD8uImesv0D/0YfPF+lZ5riqJRuDOa+GLT2lir
 6LD1poWr1GTsRAM0qn0q7LLb64LmDOG4Y5WoU+I0/JZVMPjcLGKDOi9YUDD4YSgV
 6QAiWeHLMHrnLyTGr5N7gYStyqh28O0//Z7upVjXeDuJ20Z1tXk+oUpg8f/98q3j
 ANyGlEOObExghBbhrS7pWXN4BT3t/PyN8/GJTPvcaAfcjmyk0NVUk25758WYKiX1
 niPd2LSiTUplMYkXnxYnqheEo73tNRkK8HdYQnUEMOwYALwz1XyPMCIZGNhTrxd1
 krphBce1nCt1Q7M8A1vUqH8XeK6mdwMIsHrWRzg0CtU0BVUnfBJgCiM4lsPBGtZF
 p9z0YADMujM696SjVExFwam3r1gARketu+idz5Rtt7FUumOWXeXYGBF4yCZ+y5S8
 oZQ2E5hLX3z/WoMaYESiW5Gm6PLDJvJbX9UhaZINStl77CS088L5fU3leuhQM6jP
 e1fLBR68N5EHNJqjaWsp/7+ap3v7O6lwSO1QsRDjXiCWmZ96ybIBYNawPyEoKOSU
 1B9Qc7Er7aNUM8SDbIuzOmrWYhVCbq+5tyJAKbPifsl9jowA5P7hqFBWJ5l/ijsy
 njMg+bbl8/NBU/+PZAaRmnXZJGMnwy/W/zfNQjm/cGXRB1vXa7/symkFIIY8UI5c
 P3pJC4n1fedzOi4P5Q==
 =g9gX
 -----END PGP SIGNATURE-----

Merge tag 'loongarch-fixes-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch fixes from Huacai Chen:
 "Fix high_memory calculation and module loader errors with latest
  binutils"

* tag 'loongarch-fixes-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: Add support for 64_PCREL relocation type
  LoongArch: Add support for 32_PCREL relocation type
  LoongArch: Define relocation types for ABI v2.10
  LoongArch: numa: Fix high_memory calculation
2023-09-28 10:27:45 -07:00
Linus Torvalds
52a6d9b53e - fix Alchemy build with MMC support disabled
-----BEGIN PGP SIGNATURE-----
 
 iQJOBAABCAA4FiEEbt46xwy6kEcDOXoUeZbBVTGwZHAFAmUVMw0aHHRzYm9nZW5k
 QGFscGhhLmZyYW5rZW4uZGUACgkQeZbBVTGwZHCRuA//a/mpqqNgINLX7RPgP+5e
 Fd8QWY3XdUVwVmL1KCU991igwLZePTzEqm92CGrystykJRCWvriDYkZlGTHRwj9I
 mNqR7bM0cCdlZ5J+/9qr1c6stWoPJQ37NIeO6xkfiBPfjIFi7BvilhzvlgYJ2ww8
 C5ADQOiUQ4vuqULZOrUhi6vAAXZozrtiBnhY7Woim6aMk3woznnaD0m6kmlggyi/
 Tr/7tg4Eba8RUjN6TQuRBJVTIn2ZVaZ9KUyZC3ehSCSvTZmiC1oNO1aSzKE0vdz4
 oUt7ug1yICAZU7+g03VZ+iTQnVhXUZGBqYzYnQXg9CpHXasiaWTejqynT4ugWKrU
 vBJ/h7sbjNU9T1UgcuT6vsEqwWBOdPqJFEAj4tJSlDP9lY08MKlTn8l9qIEN5cMA
 7kl8J5O3pwtdQAUwJmwppWynW8+jiAuvzN6vDkhNoGN1zgpn4bTLsKCsKy9ZzznN
 I62ApFRwkRo8RHViDjA7WR4l+8e/7UYJqf0vl3tBGni7q5tL83ns7wpy0uzqk7W/
 loHGJIqAEI/yRCNPFQfyeWBZXJs2faMUbLCpkB0McbIpESWae2vqCZnulgH+ecJ3
 BOgvlA1ut8uAljXoI4vWLziZsuO2aMqtUFIhqZjzlYr8YDaz1xodPcN99pQ7mkVq
 Arw07BQK6Ur7ThJDbQnFz9I=
 =O/EB
 -----END PGP SIGNATURE-----

Merge tag 'mips-fixes_6.6_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS fix from Thomas Bogendoerfer:

 - fix Alchemy build with MMC support disabled

* tag 'mips-fixes_6.6_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: Alchemy: only build mmc support helpers if au1xmmc is enabled
2023-09-28 10:24:26 -07:00
Geert Uytterhoeven
684f7e6d28 iomap: Spelling s/preceeding/preceding/g
Fix a misspelling of "preceding".

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-09-28 09:26:58 -07:00
Chuck Lever
0d32a6bbb8 NFSD: Fix zero NFSv4 READ results when RQ_SPLICE_OK is not set
nfsd4_encode_readv() uses xdr->buf->page_len as a starting point for
the nfsd_iter_read() sink buffer -- page_len is going to be offset
by the parts of the COMPOUND that have already been encoded into
xdr->buf->pages.

However, that value must be captured /before/
xdr_reserve_space_vec() advances page_len by the expected size of
the read payload. Otherwise, the whole front part of the first
page of the payload in the reply will be uninitialized.

Mantas hit this because sec=krb5i forces RQ_SPLICE_OK off, which
invokes the readv part of the nfsd4_encode_read() path. Also,
older Linux NFS clients appear to send shorter READ requests
for files smaller than a page, whereas newer clients just send
page-sized requests and let the server send as many bytes as
are in the file.

Reported-by: Mantas Mikulėnas <grawity@gmail.com>
Closes: https://lore.kernel.org/linux-nfs/f1d0b234-e650-0f6e-0f5d-126b3d51d1eb@gmail.com/
Fixes: 703d752155 ("NFSD: Hoist rq_vec preparation into nfsd_read() [step two]")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-09-28 10:34:28 -04:00