Commit Graph

1264735 Commits

Author SHA1 Message Date
Johannes Weiner d5d39c707a mm: cachestat: fix two shmem bugs
When cachestat on shmem races with swapping and invalidation, there
are two possible bugs:

1) A swapin error can have resulted in a poisoned swap entry in the
   shmem inode's xarray. Calling get_shadow_from_swap_cache() on it
   will result in an out-of-bounds access to swapper_spaces[].

   Validate the entry with non_swap_entry() before going further.

2) When we find a valid swap entry in the shmem's inode, the shadow
   entry in the swapcache might not exist yet (swap IO is still in
   progress and we're before __remove_mapping) or might no longer
   exist (swapin, invalidation, or swapoff have removed the shadow
   from swapcache after we saw the shmem swap entry).

   This will send a NULL to workingset_test_recent(). The latter
   purely operates on pointer bits, so it won't crash - node 0, memcg
   ID 0, eviction timestamp 0, etc. are all valid inputs - but it's a
   bogus test. In theory that could result in a false "recently
   evicted" count.

   Such a false positive wouldn't be the end of the world. But for
   code clarity and (future) robustness, be explicit about this case.

   Bail on get_shadow_from_swap_cache() returning NULL.

Link: https://lkml.kernel.org/r/20240315095556.GC581298@cmpxchg.org
Fixes: cf264e1329 ("cachestat: implement cachestat syscall")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Chengming Zhou <chengming.zhou@linux.dev>	[Bug #1]
Reported-by: Jann Horn <jannh@google.com>		[Bug #2]
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Cc: <stable@vger.kernel.org>				[v6.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-26 11:07:20 -07:00
Matthew Wilcox (Oracle) 9cecde80aa mm: increase folio batch size
On a 104 thread, 2 socket Skylake system, Intel report a 4.7% performance
reduction with will-it-scale page_fault2.  This was due to reducing the
size of the batch from 32 to 15.  Increasing the folio batch size from 15
to 31 gives a performance increase of 12.5% relative to the original, or
17.2% relative to the reduced performance commit.

The penalty of this commit is an additional 128 bytes of stack usage.  Six
folio_batches are also allocated from percpu memory in cpu_fbatches so
that will be an additional 768 bytes of percpu memory (per CPU).  Tim Chen
originally submitted a patch like this in 2020:
https://lore.kernel.org/linux-mm/d1cc9f12a8ad6c2a52cb600d93b06b064f2bbc57.1593205965.git.tim.c.chen@linux.intel.com/
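
The memory figures above follow from simple arithmetic. The sketch below reproduces them, assuming 8-byte pointers (a 64-bit build); the constant and function names are illustrative and are not kernel identifiers.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64-bit build, so each folio pointer occupies 8 bytes. */
enum { PTR_BYTES = 8, OLD_BATCH = 15, NEW_BATCH = 31, PERCPU_BATCHES = 6 };

/* Extra bytes one batch needs after growing from 15 to 31 slots. */
static size_t extra_bytes_per_batch(void)
{
    assert(NEW_BATCH > OLD_BATCH);
    return (size_t)(NEW_BATCH - OLD_BATCH) * PTR_BYTES;
}

/* Extra percpu bytes for the six batches held in cpu_fbatches. */
static size_t extra_percpu_bytes(void)
{
    return PERCPU_BATCHES * extra_bytes_per_batch();
}
```

Sixteen additional pointers account for the 128 bytes of stack, and six such batches account for the 768 bytes of percpu memory.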

Link: https://lkml.kernel.org/r/20240315140823.2478146-1-willy@infradead.org
Fixes: 99fbb6bfc1 ("mm: make folios_put() the basis of release_pages()")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Yujie Liu <yujie.liu@intel.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202403151058.7048f6a8-oliver.sang@intel.com
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-26 11:07:20 -07:00
Oscar Salvador 7844c01472 mm,page_owner: fix recursion
Prior to 217b2119b9 ("mm,page_owner: implement the tracking of the
stacks count") the only place where page_owner could potentially go into
recursion due to its need of allocating more memory was in save_stack(),
which ends up calling into stackdepot code with the possibility of
allocating memory.

We made sure to guard against that by signaling that the current task was
already in page_owner code, so in case a recursion attempt was made, we
could catch that and return dummy_handle.

After the above commit, a new place in page_owner code was introduced where
we could allocate memory, meaning we could go into recursion should we take
that path.

Make sure to signal that we are in page_owner in that codepath as well. 
Move the guard code into two helpers {un}set_current_in_page_owner() and
use them prior to calling in the two functions that might allocate memory.
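
The guard pattern described here can be modeled in user space roughly as follows. This is only a sketch: the flag is a plain static variable (the kernel tracks this state per task), and save_stack_model(), allocate_model(), and the handle values are hypothetical stand-ins rather than the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in: the kernel keeps this state per task. */
static bool in_page_owner;
enum { DUMMY_HANDLE = 0, REAL_HANDLE = 1 };

static void set_current_in_page_owner(void)   { in_page_owner = true; }
static void unset_current_in_page_owner(void) { in_page_owner = false; }

static int nested_handle = -1;   /* what the re-entrant call returned */

static int save_stack_model(void);

/* Models an allocation that re-enters page_owner tracking. */
static int allocate_model(void)
{
    return save_stack_model();
}

static int save_stack_model(void)
{
    if (in_page_owner)
        return DUMMY_HANDLE;         /* recursion detected: bail out */

    set_current_in_page_owner();
    nested_handle = allocate_model();
    unset_current_in_page_owner();
    assert(!in_page_owner);
    return REAL_HANDLE;
}
```

The outer call returns a real handle, while the nested call triggered by the allocation sees the flag and returns the dummy handle instead of recursing.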

Link: https://lkml.kernel.org/r/20240315222610.6870-1-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Fixes: 217b2119b9 ("mm,page_owner: implement the tracking of the stacks count")
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Marco Elver <elver@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-26 11:07:20 -07:00
Leonard Crestez 3290032466 mailmap: update entry for Leonard Crestez
Put my personal email first because NXP employment ended some time ago.
Also add my old intel email address.

Link: https://lkml.kernel.org/r/f568faa0-2380-4e93-a312-b80c1e367645@gmail.com
Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-26 11:07:20 -07:00
John Sperbeck 4624b346cf init: open /initrd.image with O_LARGEFILE
If initrd data is larger than 2GB, we'll eventually fail to write to the
/initrd.image file when we hit that limit, unless O_LARGEFILE is set.
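
A user-space analogue of the flag, as a hedged sketch: open_large() is a hypothetical helper, and the fallback define only exists so the example builds where the headers do not expose O_LARGEFILE. The commit itself changes an in-kernel open, which this does not reproduce.

```c
#define _GNU_SOURCE            /* asks glibc to expose O_LARGEFILE */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

#ifndef O_LARGEFILE
#define O_LARGEFILE 0          /* fallback where the flag is not exposed */
#endif

/* Open (or create) a file for writing and request large-file support. */
static int open_large(const char *path)
{
    assert(path != NULL);
    return open(path, O_WRONLY | O_CREAT | O_LARGEFILE, 0700);
}
```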

Link: https://lkml.kernel.org/r/20240317221522.896040-1-jsperbeck@google.com
Signed-off-by: John Sperbeck <jsperbeck@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-26 11:07:19 -07:00
Vitaly Chikunov 8b65ef5ad4 selftests/mm: Fix build with _FORTIFY_SOURCE
Add the missing mode argument to open(2) calls with O_CREAT.

Some tests fail to compile if _FORTIFY_SOURCE is defined (to any valid
value) together with -O, resulting in error messages such as:

  In file included from /usr/include/fcntl.h:342,
                   from gup_test.c:1:
  In function 'open',
      inlined from 'main' at gup_test.c:206:10:
  /usr/include/bits/fcntl2.h:50:11: error: call to '__open_missing_mode' declared with attribute error: open with O_CREAT or O_TMPFILE in second argument needs 3 arguments
     50 |           __open_missing_mode ();
        |           ^~~~~~~~~~~~~~~~~~~~~~

_FORTIFY_SOURCE is enabled by default in some distributions, so there the
tests fail to build and are skipped.

The open(2) man page warns about a missing mode argument: "if it is not
supplied, some arbitrary bytes from the stack will be applied as the
file mode."
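
For illustration, a call that satisfies the fortified open(2) wrapper passes the file mode as a third argument whenever O_CREAT is used. This is a minimal sketch; create_file() and the 0644 mode are assumptions made for the example, not taken from the selftests.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <unistd.h>

/* With O_CREAT, open(2) takes a third argument giving the file mode. */
static int create_file(const char *path)
{
    assert(path != NULL);
    return open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
}
```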

Link: https://lkml.kernel.org/r/20240318023445.3192922-1-vt@altlinux.org
Fixes: aeb85ed4f4 ("tools/testing/selftests/vm/gup_benchmark.c: allow user specified file")
Fixes: fbe37501b2 ("mm: huge_memory: debugfs for file-backed THP split")
Fixes: c942f5bd17 ("selftests: soft-dirty: add test for mprotect")
Signed-off-by: Vitaly Chikunov <vt@altlinux.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-26 11:07:19 -07:00
Peter Xu f8572367ea mm/memory: fix missing pte marker for !page on pte zaps
Commit 0cf18e839f of large folio zap work broke uffd-wp.  Now mm's uffd
unit test "wp-unpopulated" will trigger this WARN_ON_ONCE().

The WARN_ON_ONCE() asserts that a VMA cannot be registered with
userfaultfd-wp if it contains a !normal page, but that is actually possible.
One example is an anonymous VMA registered with uffd-wp: reading anything
installs a zero page, and zapping it then triggers the warning.

What's more, removing that WARN_ON_ONCE may not be enough either, because
we should also not rely on "whether it's a normal page" to decide whether
pte marker is needed.  For example, one can register wr-protect over some
DAX regions to track writes when UFFD_FEATURE_WP_ASYNC is enabled, in which
case it can have page==NULL for a devmap but we may want to keep the
marker around.

Link: https://lkml.kernel.org/r/20240313213107.235067-1-peterx@redhat.com
Fixes: 0cf18e839f ("mm/memory: handle !page case in zap_present_pte() separately")
Signed-off-by: Peter Xu <peterx@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-26 11:07:19 -07:00
Linus Torvalds 7033999ecd printk changes for 6.9-rc2

Merge tag 'printk-for-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull printk fix from Petr Mladek:

 - Prevent scheduling in an atomic context when printk() takes over the
   console flushing duty

* tag 'printk-for-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  printk: Update @console_may_schedule in console_trylock_spinning()
2024-03-26 09:25:57 -07:00
Linus Torvalds 576bb2d8e3 pwm: A fix targeting v6.9-rc2
This contains a single fix for a regression introduced in v5.18-rc1
 which made the img pwm driver fail to bind.

Merge tag 'pwm/for-6.9-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux

Pull pwm fix from Uwe Kleine-König:
 "This contains a single fix for a regression introduced in v5.18-rc1
  which made the img pwm driver fail to bind"

* tag 'pwm/for-6.9-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux:
  pwm: img: fix pwm clock lookup
2024-03-26 09:20:56 -07:00
Aleksandr Loktionov f37c4eac99 i40e: fix vf may be used uninitialized in this function warning
Fix the regression introduced by commit 52424f974b, which causes servers
to hang in very hard to reproduce conditions involving reset races.
Using two sources for the same information is the root cause.
Before the fix, bumping v in this function did not bump the vf
pointer, but the code used these variables interchangeably, so a stale vf
could point to a different, unintended VF.

Remove the redundant "v" variable and iterate via a single VF pointer across
the whole function instead, to guarantee VF pointer validity.

Fixes: 52424f974b ("i40e: Fix VF hang when reset is triggered on another VF")
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-03-26 09:02:03 -07:00
Aleksandr Loktionov eb58c598ce i40e: fix i40e_count_filters() to count only active/new filters
The bug usually affects untrusted VFs: because they are limited to 18 MACs,
it affects them badly, preventing them from creating all MAC filters.
It does not reproduce reliably; it happens when a VF user creates MAC filters
while other MACVLAN operations are happening in parallel.
The consequence is that the VF can't receive the desired traffic.

Fix counter to be bumped only for new or active filters.
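
The counting rule can be sketched as follows. The state names and the count_filters() signature here are hypothetical and simplified; they are not the driver's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical filter states, loosely modeled on the commit text. */
enum filter_state { FILTER_NEW, FILTER_ACTIVE, FILTER_FAILED, FILTER_REMOVE };

/* Count only new or active filters, so that filters in any other
 * state do not eat into the per-VF limit. */
static size_t count_filters(const enum filter_state *filters, size_t n)
{
    size_t count = 0;

    assert(filters != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (filters[i] == FILTER_NEW || filters[i] == FILTER_ACTIVE)
            count++;
    }
    return count;
}
```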

Fixes: 621650cabe ("i40e: Refactoring VF MAC filters counting to make more reliable")
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-03-26 08:56:06 -07:00
Tavian Barnes ef1e68236b btrfs: fix race in read_extent_buffer_pages()
There are reports of tree-checker detecting corrupted nodes
without any obvious pattern, so possibly caused by an overwrite in memory.
After some debugging it turns out there's a race when reading an extent
buffer, in which the uptodate status can be missed.

To prevent concurrent reads for the same extent buffer,
read_extent_buffer_pages() performs these checks:

    /* (1) */
    if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags))
        return 0;

    /* (2) */
    if (test_and_set_bit(EXTENT_BUFFER_READING, &eb->bflags))
        goto done;

At this point, it seems safe to start the actual read operation. Once
that completes, end_bbio_meta_read() does

    /* (3) */
    set_extent_buffer_uptodate(eb);

    /* (4) */
    clear_bit(EXTENT_BUFFER_READING, &eb->bflags);

Normally, this is enough to ensure only one read happens, and all other
callers wait for it to finish before returning.  Unfortunately, there is
a racy interleaving:

    Thread A | Thread B | Thread C
    ---------+----------+---------
       (1)   |          |
             |    (1)   |
       (2)   |          |
       (3)   |          |
       (4)   |          |
             |    (2)   |
             |          |    (1)

When this happens, thread B kicks off an unnecessary read. Worse, thread
C will see UPTODATE set and return immediately, while the read from
thread B is still in progress.  This race could result in tree-checker
errors like this as the extent buffer is concurrently modified:

    BTRFS critical (device dm-0): corrupted node, root=256
    block=8550954455682405139 owner mismatch, have 11858205567642294356
    expect [256, 18446744073709551360]

Fix it by testing UPTODATE again after setting the READING bit, and if
it's been set, skip the unnecessary read.

Fixes: d7172f52e9 ("btrfs: use per-buffer locking for extent_buffer reading")
Link: https://lore.kernel.org/linux-btrfs/CAHk-=whNdMaN9ntZ47XRKP6DBes2E5w7fi-0U3H2+PS18p+Pzw@mail.gmail.com/
Link: https://lore.kernel.org/linux-btrfs/f51a6d5d7432455a6a858d51b49ecac183e0bbc9.1706312914.git.wqu@suse.com/
Link: https://lore.kernel.org/linux-btrfs/c7241ea4-fcc6-48d2-98c8-b5ea790d6c89@gmx.com/
CC: stable@vger.kernel.org # 6.5+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Tavian Barnes <tavianator@tavianator.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ minor update of changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-26 16:42:39 +01:00
Anand Jain 2f1aeab9fc btrfs: return accurate error code on open failure in open_fs_devices()
When attempting to exclusively open a device which has no exclusive open
permission, such as a physical device associated with the flakey dm
device, the open operation will fail, resulting in a mount failure.

In this particular scenario, we erroneously return -EINVAL instead of the
correct error code provided by the bdev_open_by_path() function, which is
-EBUSY.

Fix this by returning the error code from the bdev_open_by_path() function.
With this correction, the mount error message will align with that of
ext4 and xfs.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-26 16:42:39 +01:00
Johannes Thumshirn a8b70c7f86 btrfs: zoned: don't skip block groups with 100% zone unusable
Commit f4a9f21941 ("btrfs: do not delete unused block group if it may be
used soon") changed the behaviour of deleting unused block-groups on zoned
filesystems. Starting with this commit, we're using
btrfs_space_info_used() to calculate the number of used bytes in a
space_info. But btrfs_space_info_used() also accounts
btrfs_space_info::bytes_zone_unusable as used bytes.

So if a block group is 100% zone_unusable it is skipped from the deletion
step.

In order not to skip fully zone_unusable block-groups, also check if the
block-group has bytes left that can be used on a zoned filesystem.

Fixes: f4a9f21941 ("btrfs: do not delete unused block group if it may be used soon")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-26 16:42:39 +01:00
Filipe Manana 2133460061 btrfs: use btrfs_warn() to log message at btrfs_add_extent_mapping()
At btrfs_add_extent_mapping(), if we failed to merge the extent map, which
is unexpected and theoretically should never happen, we use WARN_ONCE() to
log a message which is not great because we don't get information about
which filesystem it relates to in case we have multiple btrfs filesystems
mounted. So change this to use btrfs_warn() and surround the error check
with WARN_ON() so we always get a useful stack trace and the condition is
flagged as "unlikely" since it's not expected to ever happen.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-26 16:42:39 +01:00
Filipe Manana 379c872393 btrfs: fix message not properly printing interval when adding extent map
At btrfs_add_extent_mapping(), if we are unable to merge the existing
extent map, we print a warning message that suggests interval ranges in
the form "[X, Y)", where the first element is the inclusive start offset
of a range and the second element is the exclusive end offset. However
we end up printing the length of the ranges instead of the exclusive end
offsets. So fix this by printing the range end offsets.
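
As an illustration of the "[X, Y)" convention, the exclusive end offset is the start plus the length. The helper below is a hypothetical user-space sketch, not the btrfs message format.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format a half-open range "[start, end)" where end = start + len. */
static int format_range(char *buf, size_t size, uint64_t start, uint64_t len)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "[%" PRIu64 ", %" PRIu64 ")",
                    start, start + len);
}
```

Printing the length in the second position, as the old message did, would show a start of 4096 with a length of 8192 as "[4096, 8192)" rather than the intended "[4096, 12288)".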

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-26 16:42:39 +01:00
Filipe Manana 4dc1d69c2b btrfs: fix warning messages not printing interval at unpin_extent_range()
At unpin_extent_range() we print warning messages that are supposed to
print an interval in the form "[X, Y)", with the first element being an
inclusive start offset and the second element being the exclusive end
offset of a range. However we end up printing the range's length instead
of the range's exclusive end offset, so fix that to avoid having confusing
and nonsensical messages in case we hit one of these unexpected scenarios.

Fixes: 00deaf04df ("btrfs: log messages at unpin_extent_range() during unexpected cases")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-26 16:42:38 +01:00
Filipe Manana 8a565ec04d btrfs: fix extent map leak in unexpected scenario at unpin_extent_cache()
At unpin_extent_cache() if we happen to find an extent map with an
unexpected start offset, we jump to the 'out' label and never release the
reference we added to the extent map through the call to
lookup_extent_mapping(), therefore resulting in a leak. So fix this by
moving the free_extent_map() under the 'out' label.
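
The shape of the fix, releasing the reference under a shared exit label, can be sketched with a toy reference count. struct em_model and the helper names are hypothetical and only model the control flow.

```c
#include <assert.h>
#include <stddef.h>

struct em_model { int refs; unsigned long start; };   /* hypothetical */

/* A lookup hands back the object with one additional reference. */
static struct em_model *lookup_model(struct em_model *em)
{
    em->refs++;
    return em;
}

static void free_model(struct em_model *em)
{
    if (em)
        em->refs--;
}

/* Returns 0 on success, -1 on an unexpected start offset. */
static int unpin_model(struct em_model *tree_em, unsigned long start)
{
    struct em_model *em;
    int ret = 0;

    assert(tree_em != NULL);
    em = lookup_model(tree_em);
    if (em->start != start) {
        ret = -1;
        goto out;
    }
    /* ... normal processing would happen here ... */
out:
    free_model(em);   /* runs on both the success and the error path */
    return ret;
}
```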

Fixes: c03c89f821 ("btrfs: handle errors returned from unpin_extent_cache()")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-26 16:42:38 +01:00
Anand Jain 9f7eb8405d btrfs: validate device maj:min during open
Boris managed to create a device capable of changing its maj:min without
altering its device path.

Only multi-devices can be scanned. A device that gets scanned and remains
in the btrfs kernel cache might end up with an incorrect maj:min.

Although the temp-fsid feature patch did not introduce this bug, it could
lead to issues if the above multi-device is converted to a single device
with a stale maj:min. Subsequently, attempting to mount the same device
with the correct maj:min might mistake it for another device with the same
fsid, potentially resulting in wrongly auto-enabling the temp-fsid feature.

To address this, this patch validates the device's maj:min at the time of
device open and updates it if it has changed since the last scan.

CC: stable@vger.kernel.org # 6.7+
Fixes: a5b8a5f9f8 ("btrfs: support cloned-device mount capability")
Reported-by: Boris Burkov <boris@bur.io>
Co-developed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-26 16:42:38 +01:00
Johannes Thumshirn 1ec17ef591 btrfs: zoned: fix use-after-free in do_zone_finish()
Shinichiro reported the following use-after-free triggered by the device
replace operation in fstests btrfs/070.

 BTRFS info (device nullb1): scrub: finished on devid 1 with status: 0
 ==================================================================
 BUG: KASAN: slab-use-after-free in do_zone_finish+0x91a/0xb90 [btrfs]
 Read of size 8 at addr ffff8881543c8060 by task btrfs-cleaner/3494007

 CPU: 0 PID: 3494007 Comm: btrfs-cleaner Tainted: G        W          6.8.0-rc5-kts #1
 Hardware name: Supermicro Super Server/X11SPi-TF, BIOS 3.3 02/21/2020
 Call Trace:
  <TASK>
  dump_stack_lvl+0x5b/0x90
  print_report+0xcf/0x670
  ? __virt_addr_valid+0x200/0x3e0
  kasan_report+0xd8/0x110
  ? do_zone_finish+0x91a/0xb90 [btrfs]
  ? do_zone_finish+0x91a/0xb90 [btrfs]
  do_zone_finish+0x91a/0xb90 [btrfs]
  btrfs_delete_unused_bgs+0x5e1/0x1750 [btrfs]
  ? __pfx_btrfs_delete_unused_bgs+0x10/0x10 [btrfs]
  ? btrfs_put_root+0x2d/0x220 [btrfs]
  ? btrfs_clean_one_deleted_snapshot+0x299/0x430 [btrfs]
  cleaner_kthread+0x21e/0x380 [btrfs]
  ? __pfx_cleaner_kthread+0x10/0x10 [btrfs]
  kthread+0x2e3/0x3c0
  ? __pfx_kthread+0x10/0x10
  ret_from_fork+0x31/0x70
  ? __pfx_kthread+0x10/0x10
  ret_from_fork_asm+0x1b/0x30
  </TASK>

 Allocated by task 3493983:
  kasan_save_stack+0x33/0x60
  kasan_save_track+0x14/0x30
  __kasan_kmalloc+0xaa/0xb0
  btrfs_alloc_device+0xb3/0x4e0 [btrfs]
  device_list_add.constprop.0+0x993/0x1630 [btrfs]
  btrfs_scan_one_device+0x219/0x3d0 [btrfs]
  btrfs_control_ioctl+0x26e/0x310 [btrfs]
  __x64_sys_ioctl+0x134/0x1b0
  do_syscall_64+0x99/0x190
  entry_SYSCALL_64_after_hwframe+0x6e/0x76

 Freed by task 3494056:
  kasan_save_stack+0x33/0x60
  kasan_save_track+0x14/0x30
  kasan_save_free_info+0x3f/0x60
  poison_slab_object+0x102/0x170
  __kasan_slab_free+0x32/0x70
  kfree+0x11b/0x320
  btrfs_rm_dev_replace_free_srcdev+0xca/0x280 [btrfs]
  btrfs_dev_replace_finishing+0xd7e/0x14f0 [btrfs]
  btrfs_dev_replace_by_ioctl+0x1286/0x25a0 [btrfs]
  btrfs_ioctl+0xb27/0x57d0 [btrfs]
  __x64_sys_ioctl+0x134/0x1b0
  do_syscall_64+0x99/0x190
  entry_SYSCALL_64_after_hwframe+0x6e/0x76

 The buggy address belongs to the object at ffff8881543c8000
  which belongs to the cache kmalloc-1k of size 1024
 The buggy address is located 96 bytes inside of
  freed 1024-byte region [ffff8881543c8000, ffff8881543c8400)

 The buggy address belongs to the physical page:
 page:00000000fe2c1285 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1543c8
 head:00000000fe2c1285 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
 flags: 0x17ffffc0000840(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
 page_type: 0xffffffff()
 raw: 0017ffffc0000840 ffff888100042dc0 ffffea0019e8f200 dead000000000002
 raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
 page dumped because: kasan: bad access detected

 Memory state around the buggy address:
  ffff8881543c7f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  ffff8881543c7f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 >ffff8881543c8000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                        ^
  ffff8881543c8080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8881543c8100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

This UAF happens because we're accessing stale zone information of an
already removed btrfs_device in do_zone_finish().

The sequence of events is as follows:

btrfs_dev_replace_start
  btrfs_scrub_dev
   btrfs_dev_replace_finishing
    btrfs_dev_replace_update_device_in_mapping_tree <-- devices replaced
    btrfs_rm_dev_replace_free_srcdev
     btrfs_free_device                              <-- device freed

cleaner_kthread
 btrfs_delete_unused_bgs
  btrfs_zone_finish
   do_zone_finish              <-- refers to the freed device

The reason for this is that we're using a cached pointer to the chunk_map
from the block group, but on device replace this cached pointer can
contain stale device entries.

The staleness comes from the fact that btrfs_block_group::physical_map is
not a pointer to a btrfs_chunk_map but a memory copy of it.

Also take the fs_info::dev_replace::rwsem to prevent
btrfs_dev_replace_update_device_in_mapping_tree() from changing the device
underneath us again.

Note: btrfs_dev_replace_update_device_in_mapping_tree() is holding
fs_info::mapping_tree_lock, but as this is a spinning read/write lock we
cannot take it, because the call to blkdev_zone_mgmt() requires a memory
allocation, which must not sleep under a spinning lock.
But btrfs_dev_replace_update_device_in_mapping_tree() is always called with
the fs_info::dev_replace::rwsem held in write mode.

Many thanks to Shinichiro for analyzing the bug.

Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
CC: stable@vger.kernel.org # 6.8
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-26 16:41:01 +01:00
Ivan Vecera ea558de723 i40e: Enforce software interrupt during busy-poll exit
As with the ice bug fixed by commit b7306b42be ("ice: manage interrupts
during poll exit") followed by commit 23be7075b3 ("ice: fix software
generating extra interrupts"), I'm seeing a similar issue with the
i40e driver.

In certain situations, when busy-loop is enabled together with adaptive
coalescing, the driver occasionally misses that there are outstanding
descriptors to clean when exiting busy poll.

Try to catch the remaining work by triggering a software interrupt
when exiting busy poll. No extra interrupts will be generated when
busy polling is not used.

The issue was found when running the sockperf ping-pong tcp test with
adaptive coalescing and busy poll enabled (50 as the value of the busy_poll
and busy_read sysctl knobs) and results in huge latency spikes
of more than 100000us.

The fix is inspired by the ice driver and does the following:
1) During napi poll exit in case of busy-poll (napi_complete_done()
   returns false), records in the q_vector that we were in busy
   loop.
2) Extends i40e_buildreg_itr() to be able to add an enforced software
   interrupt into the built value
3) In i40e_update_enable_itr(), enforces a software interrupt trigger
   if we are exiting busy poll to catch any pending clean-ups
4) Reuses the unused 3rd ITR (interrupt throttle) index and sets it to
   20K interrupts per second to limit the number of these sw interrupts.

Test results
============
Prior:
[root@dell-per640-07 net]# sockperf ping-pong -i 10.9.9.1 --tcp -m 1000 --mps=max -t 120
sockperf: == version #3.10-no.git ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)

[ 0] IP = 10.9.9.1        PORT = 11111 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=119.999 sec; Warm up time=400 msec; SentMessages=2438563; ReceivedMessages=2438562
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=119.549 sec; SentMessages=2429473; ReceivedMessages=2429473
sockperf: ====> avg-latency=24.571 (std-dev=93.297, mean-ad=4.904, median-ad=1.510, siqr=1.063, cv=3.797, std-error=0.060, 99.0% ci=[24.417, 24.725])
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 24.571 usec
sockperf: Total 2429473 observations; each percentile contains 24294.73 observations
sockperf: ---> <MAX> observation = 103294.331
sockperf: ---> percentile 99.999 =   45.633
sockperf: ---> percentile 99.990 =   37.013
sockperf: ---> percentile 99.900 =   35.910
sockperf: ---> percentile 99.000 =   33.390
sockperf: ---> percentile 90.000 =   28.626
sockperf: ---> percentile 75.000 =   27.741
sockperf: ---> percentile 50.000 =   26.743
sockperf: ---> percentile 25.000 =   25.614
sockperf: ---> <MIN> observation =   12.220

After:
[root@dell-per640-07 net]# sockperf ping-pong -i 10.9.9.1 --tcp -m 1000 --mps=max -t 120
sockperf: == version #3.10-no.git ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)

[ 0] IP = 10.9.9.1        PORT = 11111 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=119.999 sec; Warm up time=400 msec; SentMessages=2400055; ReceivedMessages=2400054
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=119.549 sec; SentMessages=2391186; ReceivedMessages=2391186
sockperf: ====> avg-latency=24.965 (std-dev=5.934, mean-ad=4.642, median-ad=1.485, siqr=1.067, cv=0.238, std-error=0.004, 99.0% ci=[24.955, 24.975])
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 24.965 usec
sockperf: Total 2391186 observations; each percentile contains 23911.86 observations
sockperf: ---> <MAX> observation =  195.841
sockperf: ---> percentile 99.999 =   45.026
sockperf: ---> percentile 99.990 =   39.009
sockperf: ---> percentile 99.900 =   35.922
sockperf: ---> percentile 99.000 =   33.482
sockperf: ---> percentile 90.000 =   28.902
sockperf: ---> percentile 75.000 =   27.821
sockperf: ---> percentile 50.000 =   26.860
sockperf: ---> percentile 25.000 =   25.685
sockperf: ---> <MIN> observation =   12.277

Fixes: 0bcd952fee ("ethernet/intel: consolidate NAPI and NAPI exit")
Reported-by: Hugo Ferreira <hferreir@redhat.com>
Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-03-26 08:34:40 -07:00
Paolo Abeni c1fd3a9433 Merge branch 'there-are-some-bugfix-for-the-hns3-ethernet-driver'
Jijie Shao says:

====================
There are some bugfix for the HNS3 ethernet driver
====================

Link: https://lore.kernel.org/r/20240325124311.1866197-1-shaojijie@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-26 15:32:42 +01:00
Jian Shen 5bd088d6c2 net: hns3: mark unexcuted loopback test result as UNEXECUTED
Currently, the loopback test may be skipped during a reset, but the test
result will still show as 'PASS', because the driver doesn't set the
ETH_TEST_FL_FAILED flag. Fix it by setting the flag and
initializing the value to UNEXECUTED.

Fixes: 4c8dab1c70 ("net: hns3: reconstruct function hns3_self_test")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-26 15:32:39 +01:00
Yonglong Liu 93305b77ff net: hns3: fix kernel crash when devlink reload during pf initialization
The devlink reload process will access the hardware resources,
but the register operation is done before the hardware is initialized.
So, processing the devlink reload during initialization may lead to a kernel
crash. This patch fixes this by taking devl_lock during initialization.

Fixes: b741269b27 ("net: hns3: add support for registering devlink for PF")
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-26 15:32:39 +01:00
Jie Wang 47e39d213e net: hns3: fix index limit to support all queue stats
Currently, hns hardware supports more than 512 queues and the index limit
in hclge_comm_tqps_update_stats is wrong. So this patch removes it.

Fixes: 287db5c40d ("net: hns3: create new set of common tqp stats APIs for PF and VF reuse")
Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-26 15:32:39 +01:00
Francesco Dolcini 8ea3f4f1a1 MAINTAINERS: wifi: mwifiex: add Francesco as reviewer
As discussed on the mailing list, add myself as mwifiex driver reviewer.

Link: https://lore.kernel.org/all/20240318112830.GA9565@francesco-nb/
Signed-off-by: Francesco Dolcini <francesco@dolcini.it>
Acked-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://msgid.link/20240321163420.11158-1-francesco@dolcini.it
2024-03-26 14:31:09 +02:00
Paolo Abeni 37ccdf7f11 bpf-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZgHmTAAKCRDbK58LschI
 g1gWAP9HjAWE/Sy0B2t9opIiTqRzdMJLYs2B4OFeHRI6+qQg0gD6A4jsKEh/xmtG
 Hhjw+AElJRFZ3SUIT4mZlljzUHIYYAA=
 =T0lM
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2024-03-25

The following pull-request contains BPF updates for your *net* tree.

We've added 17 non-merge commits during the last 12 day(s) which contain
a total of 19 files changed, 184 insertions(+), 61 deletions(-).

The main changes are:

1) Fix an arm64 BPF JIT bug in BPF_LDX_MEMSX implementation's offset handling
   found via test_bpf module, from Puranjay Mohan.

2) Various fixups to the BPF arena code in particular in the BPF verifier and
   around BPF selftests to match latest corresponding LLVM implementation,
   from Puranjay Mohan and Alexei Starovoitov.

3) Fix xsk to not assume that metadata is always requested in TX completion,
   from Stanislav Fomichev.

4) Fix riscv BPF JIT's kfunc parameter incompatibility between BPF and the riscv
   ABI which requires sign-extension on int/uint, from Pu Lehui.

5) Fix s390x BPF JIT's bpf_plt pointer arithmetic which triggered a crash when
   testing struct_ops, from Ilya Leoshkevich.

6) Fix libbpf's arena mmap handling which had incorrect u64-to-pointer cast on
   32-bit architectures, from Andrii Nakryiko.

7) Fix libbpf to define MFD_CLOEXEC when not available, from Arnaldo Carvalho de Melo.

8) Fix arm64 BPF JIT implementation for 32bit unconditional bswap which
   resulted in an incorrect swap as indicated by test_bpf, from Artem Savkov.

9) Fix BPF man page build script to use silent mode, from Hangbin Liu.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  riscv, bpf: Fix kfunc parameters incompatibility between bpf and riscv abi
  bpf: verifier: reject addr_space_cast insn without arena
  selftests/bpf: verifier_arena: fix mmap address for arm64
  bpf: verifier: fix addr_space_cast from as(1) to as(0)
  libbpf: Define MFD_CLOEXEC if not available
  arm64: bpf: fix 32bit unconditional bswap
  bpf, arm64: fix bug in BPF_LDX_MEMSX
  libbpf: fix u64-to-pointer cast on 32-bit arches
  s390/bpf: Fix bpf_plt pointer arithmetic
  xsk: Don't assume metadata is always requested in TX completion
  selftests/bpf: Add arena test case for 4Gbyte corner case
  selftests/bpf: Remove hard coded PAGE_SIZE macro.
  libbpf, selftests/bpf: Adjust libbpf, bpftool, selftests to match LLVM
  bpf: Clarify bpf_arena comments.
  MAINTAINERS: Update email address for Quentin Monnet
  scripts/bpf_doc: Use silent mode when exec make cmd
  bpf: Temporarily disable atomic operations in BPF arena
====================

Link: https://lore.kernel.org/r/20240325213520.26688-1-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-26 12:55:18 +01:00
Ido Schimmel f1425529c3 selftests: vxlan_mdb: Fix failures with old libnet
Locally generated IP multicast packets (such as the ones used in the
test) do not perform routing and simply egress the bound device.

However, as explained in commit 8bcfb4ae4d ("selftests: forwarding:
Fix failing tests with old libnet"), old versions of libnet (used by
mausezahn) do not use the "SO_BINDTODEVICE" socket option. Specifically,
the library started using the option for IPv6 sockets in version 1.1.6
and for IPv4 sockets in version 1.2. This explains why on Ubuntu - which
uses version 1.1.6 - the IPv4 overlay tests are failing whereas the IPv6
ones are passing.

Fix by specifying the source and destination MAC of the packets which
will cause mausezahn to use a packet socket instead of an IP socket.

Fixes: 62199e3f16 ("selftests: net: Add VXLAN MDB test")
Reported-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Closes: https://lore.kernel.org/netdev/5bb50349-196d-4892-8ed2-f37543aa863f@alu.unizg.hr/
Tested-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20240325075030.2379513-1-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-26 12:29:25 +01:00
Sergey Shtylyov 8c05813df2 MAINTAINERS: split Renesas Ethernet drivers entry
Since the Renesas Ethernet Switch driver was added by Yoshihiro Shimoda,
I started receiving the patches to review for it -- which I was unable to
do, as I don't know this hardware and don't even have the manuals for it.
Fortunately, Shimoda-san has volunteered to be a reviewer for this new
driver, thus let's now split the single entry into 3 per-driver entries,
each with its own reviewer...

Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://lore.kernel.org/r/de0ccc1d-6fc0-583f-4f80-f70e6461d62d@omp.ru
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-26 11:46:34 +01:00
Arınç ÜNAL 5f563c31ff net: dsa: mt7530: fix improper frames on all 25MHz and 40MHz XTAL MT7530
The MT7530 switch after reset initialises with a core clock frequency that
works with a 25MHz XTAL connected to it. For 40MHz XTAL, the core clock
frequency must be set to 500MHz.

The mt7530_pll_setup() function is responsible for setting the core clock
frequency. Currently, it runs on MT7530 with 25MHz and 40MHz XTAL. This
causes MT7530 switch with 25MHz XTAL to egress and ingress frames
improperly.

Introduce a check to run it only on MT7530 with 40MHz XTAL.

The core clock frequency is set by writing to a switch PHY's register.
Access to the PHY's register is done via the MDIO bus the switch is also
on. Therefore, it works only when the switch makes switch PHYs listen on
the MDIO bus the switch is on. This is controlled either by the state of
the ESW_P1_LED_1 pin after reset deassertion or modifying bit 5 of the
modifiable trap register.

When ESW_P1_LED_1 is pulled high, PHY indirect access is used. That means
accessing PHY registers via the PHY indirect access control register of the
switch.

When ESW_P1_LED_1 is pulled low, PHY direct access is used. That means
accessing PHY registers via the MDIO bus the switch is on.

For MT7530 switch with 40MHz XTAL on a board with ESW_P1_LED_1 pulled high,
the core clock frequency won't be set to 500MHz, causing the switch to
egress and ingress frames improperly.

Run mt7530_pll_setup() after PHY direct access is set on the modifiable
trap register.

With these two changes, all MT7530 switches with 25MHz and 40MHz, and
P1_LED_1 pulled high or low, will egress and ingress frames properly.

Link: 4a5dd143f2/linux-mt/drivers/net/ethernet/mediatek/gsw_mt7623.c (L1039)
Fixes: b8f126a8d5 ("net-next: dsa: add dsa support for Mediatek MT7530 switch")
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Link: https://lore.kernel.org/r/20240320-for-net-mt7530-fix-25mhz-xtal-with-direct-phy-access-v1-1-d92f605f1160@arinc9.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-26 09:17:33 +01:00
Bjørn Mork 7d5a7dd5a3 net: wwan: t7xx: Split 64bit accesses to fix alignment issues
Some of the registers are aligned only on a 32bit boundary, so 64bit
accesses to them cause alignment faults on 64bit platforms.

 Unable to handle kernel paging request at virtual address ffffffc084a1d004
 Mem abort info:
 ESR = 0x0000000096000061
 EC = 0x25: DABT (current EL), IL = 32 bits
 SET = 0, FnV = 0
 EA = 0, S1PTW = 0
 FSC = 0x21: alignment fault
 Data abort info:
 ISV = 0, ISS = 0x00000061, ISS2 = 0x00000000
 CM = 0, WnR = 1, TnD = 0, TagAccess = 0
 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
 swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000046ad6000
 [ffffffc084a1d004] pgd=100000013ffff003, p4d=100000013ffff003, pud=100000013ffff003, pmd=0068000020a00711
 Internal error: Oops: 0000000096000061 [#1] SMP
 Modules linked in: mtk_t7xx(+) qcserial pppoe ppp_async option nft_fib_inet nf_flow_table_inet mt7921u(O) mt7921s(O) mt7921e(O) mt7921_common(O) iwlmvm(O) iwldvm(O) usb_wwan rndis_host qmi_wwan pppox ppp_generic nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack mt7996e(O) mt792x_usb(O) mt792x_lib(O) mt7915e(O) mt76_usb(O) mt76_sdio(O) mt76_connac_lib(O) mt76(O) mac80211(O) iwlwifi(O) huawei_cdc_ncm cfg80211(O) cdc_ncm cdc_ether wwan usbserial usbnet slhc sfp rtc_pcf8563 nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 mt6577_auxadc mdio_i2c libcrc32c compat(O) cdc_wdm cdc_acm at24 crypto_safexcel pwm_fan i2c_gpio i2c_smbus industrialio i2c_algo_bit i2c_mux_reg i2c_mux_pca954x i2c_mux_pca9541 i2c_mux_gpio i2c_mux dummy oid_registry tun sha512_arm64 sha1_ce sha1_generic seqiv
 md5 geniv des_generic libdes cbc authencesn authenc leds_gpio xhci_plat_hcd xhci_pci xhci_mtk_hcd xhci_hcd nvme nvme_core gpio_button_hotplug(O) dm_mirror dm_region_hash dm_log dm_crypt dm_mod dax usbcore usb_common ptp aquantia pps_core mii tpm encrypted_keys trusted
 CPU: 3 PID: 5266 Comm: kworker/u9:1 Tainted: G O 6.6.22 #0
 Hardware name: Bananapi BPI-R4 (DT)
 Workqueue: md_hk_wq t7xx_fsm_uninit [mtk_t7xx]
 pstate: 804000c5 (Nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : t7xx_cldma_hw_set_start_addr+0x1c/0x3c [mtk_t7xx]
 lr : t7xx_cldma_start+0xac/0x13c [mtk_t7xx]
 sp : ffffffc085d63d30
 x29: ffffffc085d63d30 x28: 0000000000000000 x27: 0000000000000000
 x26: 0000000000000000 x25: ffffff80c804f2c0 x24: ffffff80ca196c05
 x23: 0000000000000000 x22: ffffff80c814b9b8 x21: ffffff80c814b128
 x20: 0000000000000001 x19: ffffff80c814b080 x18: 0000000000000014
 x17: 0000000055c9806b x16: 000000007c5296d0 x15: 000000000f6bca68
 x14: 00000000dbdbdce4 x13: 000000001aeaf72a x12: 0000000000000001
 x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
 x8 : ffffff80ca1ef6b4 x7 : ffffff80c814b818 x6 : 0000000000000018
 x5 : 0000000000000870 x4 : 0000000000000000 x3 : 0000000000000000
 x2 : 000000010a947000 x1 : ffffffc084a1d004 x0 : ffffffc084a1d004
 Call trace:
 t7xx_cldma_hw_set_start_addr+0x1c/0x3c [mtk_t7xx]
 t7xx_fsm_uninit+0x578/0x5ec [mtk_t7xx]
 process_one_work+0x154/0x2a0
 worker_thread+0x2ac/0x488
 kthread+0xe0/0xec
 ret_from_fork+0x10/0x20
 Code: f9400800 91001000 8b214001 d50332bf (f9000022)
 ---[ end trace 0000000000000000 ]---

The inclusion of io-64-nonatomic-lo-hi.h indicates that all 64bit
accesses can be replaced by pairs of nonatomic 32bit accesses.  Fix
alignment by forcing all accesses to be 32bit on 64bit platforms.
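
As a rough userspace illustration of the lo-hi split described above (not
the driver code itself), the following C sketch stores a 64bit value as two
32bit stores, low word first, so that only 4-byte alignment is needed. The
helper names and the array standing in for the register block are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Single 32-bit store; stands in for a 32-bit register write. */
void write32(volatile uint32_t *reg, uint32_t val)
{
	*reg = val;
}

/*
 * Store a 64-bit value as two 32-bit stores, low word first.
 * 'reg' only has to be 4-byte aligned. The two stores are not atomic
 * as a pair, which is the trade-off the "nonatomic" name refers to.
 */
void write64_lo_hi(volatile uint32_t *reg, uint64_t val)
{
	write32(reg, (uint32_t)val);             /* low 32 bits at reg[0]  */
	write32(reg + 1, (uint32_t)(val >> 32)); /* high 32 bits at reg[1] */
}

/* Read the value back with two 32-bit loads, low word first. */
uint64_t read64_lo_hi(const volatile uint32_t *reg)
{
	uint32_t lo = reg[0];
	uint32_t hi = reg[1];

	return ((uint64_t)hi << 32) | lo;
}

/* Self-check: write at an offset that is 4-byte but not 8-byte aligned. */
void lo_hi_demo(void)
{
	uint32_t regs[4] = { 0, 0, 0, 0 };

	write64_lo_hi(&regs[1], 0x000000010a947000ULL);
	assert(regs[1] == 0x0a947000u);
	assert(regs[2] == 0x00000001u);
	assert(read64_lo_hi(&regs[1]) == 0x000000010a947000ULL);
}
```

The sketch assumes the low word lives at the lower address, which is the
ordering implied by the lo-hi header name.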

Link: https://forum.openwrt.org/t/fibocom-fm350-gl-support/142682/72
Fixes: 39d439047f ("net: wwan: t7xx: Add control DMA interface")
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Tested-by: Liviu Dudau <liviu@dudau.co.uk>
Link: https://lore.kernel.org/r/20240322144000.1683822-1-bjorn@mork.no
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-25 19:51:57 -07:00
Eric Dumazet 151c9c724d tcp: properly terminate timers for kernel sockets
We had various syzbot reports about tcp timers firing after
the corresponding netns has been dismantled.

Fortunately Josef Bacik could trigger the issue more often,
and could test a patch I wrote two years ago.

When TCP sockets are closed, we call inet_csk_clear_xmit_timers()
to 'stop' the timers.

inet_csk_clear_xmit_timers() can be called from any context,
including when socket lock is held.
This is the reason it uses sk_stop_timer(), aka del_timer().
This means that ongoing timers might finish much later.

For user sockets, this is fine because each running timer
holds a reference on the socket, and the user socket holds
a reference on the netns.

For kernel sockets, we risk that the netns is freed before
timer can complete, because kernel sockets do not hold
reference on the netns.

This patch adds the inet_csk_clear_xmit_timers_sync() function,
which uses sk_stop_timer_sync() to make sure all timers
are terminated before the kernel socket is released.
Modules using kernel sockets close them in their netns exit()
handler.

Also add sock_not_owned_by_me() helper to get LOCKDEP
support : inet_csk_clear_xmit_timers_sync() must not be called
while socket lock is held.

It is very possible we can revert in the future commit
3a58f13a88 ("net: rds: acquire refcount on TCP sockets")
which attempted to solve the issue in rds only.
(net/smc/af_smc.c and net/mptcp/subflow.c have similar code)

We probably can remove the check_net() tests from
tcp_out_of_resources() and __tcp_close() in the future.

Reported-by: Josef Bacik <josef@toxicpanda.com>
Closes: https://lore.kernel.org/netdev/20240314210740.GA2823176@perftesting/
Fixes: 26abe14379 ("net: Modify sk_alloc to not reference count the netns of kernel sockets.")
Fixes: 8a68173691 ("net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket")
Link: https://lore.kernel.org/bpf/CANn89i+484ffqb93aQm1N-tjxxvb3WDKX0EbD7318RwRgsatjw@mail.gmail.com/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Josef Bacik <josef@toxicpanda.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Link: https://lore.kernel.org/r/20240322135732.1535772-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-25 19:51:57 -07:00
Ravi Gunasekaran b11c81731c net: hsr: hsr_slave: Fix the promiscuous mode in offload mode
commit e748d0fd66 ("net: hsr: Disable promiscuous mode in
offload mode") disables promiscuous mode of slave devices
while creating an HSR interface. But while deleting the
HSR interface, it does not take care of it. It decreases the
promiscuous mode count, which eventually enables promiscuous
mode on the slave devices when creating HSR interface again.

Fix this by not decrementing the promiscuous mode count while
deleting the HSR interface when offload is enabled.

Fixes: e748d0fd66 ("net: hsr: Disable promiscuous mode in offload mode")
Signed-off-by: Ravi Gunasekaran <r-gunasekaran@ti.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240322100447.27615-1-r-gunasekaran@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-25 19:51:56 -07:00
Claus Hansen Ries 3a38a829c8 net: ll_temac: platform_get_resource replaced by wrong function
The function platform_get_resource was replaced with
devm_platform_ioremap_resource_byname and is called using 0 as name.

This eventually ends up in platform_get_resource_byname in the call
stack, where it causes a null pointer dereference in strcmp.

	if (type == resource_type(r) && !strcmp(r->name, name))

It should have been replaced with devm_platform_ioremap_resource.
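
To make the failure mode concrete, here is a small userspace C sketch
(hypothetical types and helper names, not the platform bus code) that
contrasts a lookup by name with a lookup by index. Passing a null name to
strcmp() is undefined behaviour, so the by-name variant in this sketch
rejects it up front, whereas the by-index variant never touches a name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a resource table entry. */
struct res {
	const char *name;
	int id;
};

/*
 * Look up an entry by name. A NULL 'name' is rejected before it can
 * reach strcmp(); this guard is an addition made for the sketch.
 */
const struct res *find_by_name(const struct res *tbl, size_t n,
			       const char *name)
{
	size_t i;

	if (!name)
		return NULL;

	for (i = 0; i < n; i++)
		if (tbl[i].name && strcmp(tbl[i].name, name) == 0)
			return &tbl[i];

	return NULL;
}

/* Look up an entry by position; no string comparison is involved. */
const struct res *find_by_index(const struct res *tbl, size_t n, size_t idx)
{
	return idx < n ? &tbl[idx] : NULL;
}

/* Self-check of both lookup styles. */
void lookup_demo(void)
{
	const struct res tbl[] = { { "regs", 1 }, { NULL, 2 } };

	assert(find_by_name(tbl, 2, NULL) == NULL);
	assert(find_by_name(tbl, 2, "regs") == &tbl[0]);
	assert(find_by_index(tbl, 2, 0) == &tbl[0]);
	assert(find_by_index(tbl, 2, 5) == NULL);
}
```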

Fixes: bd69058f50 ("net: ll_temac: Use devm_platform_ioremap_resource_byname()")
Signed-off-by: Claus Hansen Ries <chr@terma.com>
Cc: stable@vger.kernel.org
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/cca18f9c630a41c18487729770b492bb@terma.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-25 19:46:56 -07:00
Alexandra Winter afb373ff3f s390/qeth: handle deferred cc1
The IO subsystem expects a driver to retry a ccw_device_start, when the
subsequent interrupt response block (irb) contains a deferred
condition code 1.

Symptoms before this commit:
On the read channel we always trigger the next read anyhow, so no
different behaviour here.
On the write channel we may experience timeout errors, because the
expected reply will never be received without the retry.
Other callers of qeth_send_control_data() may wrongly assume that the ccw
was successful, which may cause problems later.

Note that since
commit 2297791c92 ("s390/cio: dont unregister subchannel from child-drivers")
and
commit 5ef1dc40ff ("s390/cio: fix invalid -EBUSY on ccw_device_start")
deferred CC1s are much more likely to occur. See the commit message of the
latter for more background information.

Fixes: 2297791c92 ("s390/cio: dont unregister subchannel from child-drivers")
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Co-developed-by: Thorsten Winkler <twinkler@linux.ibm.com>
Signed-off-by: Thorsten Winkler <twinkler@linux.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Link: https://lore.kernel.org/r/20240321115337.3564694-1-wintera@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-25 19:46:49 -07:00
Prasad Pandit cc26992681 dpll: indent DPLL option type by a tab
Indent the config option type by a tab. This helps Kconfig parsers
read the file without errors.

Fixes: 9431063ad3 ("dpll: core: Add DPLL framework base functions")
Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240322114819.1801795-1-ppandit@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-25 19:40:23 -07:00
Pu Lehui 443574b033 riscv, bpf: Fix kfunc parameters incompatibility between bpf and riscv abi
We encountered a failing case when running selftest in no_alu32 mode:

The failing case is `kfunc_call/kfunc_call_test4`, and its source code
looks like this:
```
long bpf_kfunc_call_test4(signed char a, short b, int c, long d) __ksym;
int kfunc_call_test4(struct __sk_buff *skb)
{
	...
	tmp = bpf_kfunc_call_test4(-3, -30, -200, -1000);
	...
}
```

And its corresponding asm code is:
```
0: r1 = -3
1: r2 = -30
2: r3 = 0xffffff38 # opcode: 18 03 00 00 38 ff ff ff 00 00 00 00 00 00 00 00
4: r4 = -1000
5: call bpf_kfunc_call_test4
```

Insn 2 is parsed as an ld_imm64 insn that emits the immediate
0x00000000ffffff38, which is converted to int type and then sent to
bpf_kfunc_call_test4. But since it is zero-extended in the bpf calling
convention, the riscv jit directly treats it as an unsigned 32-bit int
value, and the test then fails with the message
"actual 4294966063 != expected -1234".

The reason is the incompatibility between bpf and riscv abi, that is,
bpf will do zero-extension on uint, but riscv64 requires sign-extension
on int or uint. We can solve this problem by sign extending the 32-bit
parameters in kfunc.
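
As a userspace illustration of the mismatch described above (not the JIT
change itself), the following C sketch shows how the 32-bit pattern
0xffffff38 reads as a 64-bit value under zero-extension versus
sign-extension. The helper names are hypothetical, and the narrowing cast
assumes the two's-complement behaviour of mainstream compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 32 bits cleared: the zero-extended form described for bpf. */
int64_t zero_extend_32(uint32_t low)
{
	return (int64_t)(uint64_t)low;
}

/*
 * Upper 32 bits copied from bit 31: the sign-extended form that the
 * commit message says the riscv64 ABI requires for int and uint.
 */
int64_t sign_extend_32(uint32_t low)
{
	return (int64_t)(int32_t)low;
}

/* Self-check using the immediate from insn 2 in the listing above. */
void extension_demo(void)
{
	/* Zero-extended, the callee sees a large positive number. */
	assert(zero_extend_32(0xffffff38u) == 4294967096LL);
	/* Sign-extended, the callee sees the intended -200. */
	assert(sign_extend_32(0xffffff38u) == -200);
}
```

For non-negative 32-bit values the two forms agree, which is why only
negative arguments such as -200 expose the problem.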

The issue is related to [0], and thanks to Yonghong and Alexei.

Link: https://github.com/llvm/llvm-project/pull/84874 [0]
Fixes: d40c3847b4 ("riscv, bpf: Add kfunc support for RV64")
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Tested-by: Puranjay Mohan <puranjay12@gmail.com>
Reviewed-by: Puranjay Mohan <puranjay12@gmail.com>
Link: https://lore.kernel.org/r/20240324103306.2202954-1-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-25 11:39:31 -07:00
Linus Torvalds 928a87efa4 gfs2 fix
- Fix boundary check in punch_hole
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmYBaM0UHGFncnVlbmJh
 QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTrzqw/9GpK71h1dIA8vYqInumdrUabksLKy
 jRMR2ZxfzBKLdAfgn9AS3nrWNos72vjAxbjYCi/fbY9uvIK1/zzq7Ef7601kCetM
 NzxShY8AwLJa9mO8O5yReLL7O/61gjlcdD6rSjkYwphWuobd5vpudKkibgpdJyH8
 bn6U1/2K5ASFtWyTRbudOIsz4AqPUE6ZB4KxSuCDx7uFiQjnuh6sk8wfg48pdig7
 GAsNPmBFfWAQXClPnI/WFG0hpkuRIK1hk9ITWx1ybu2JqaNeVXRBqGoRZbEkPYju
 qEkp4oT3j/1siBz1sMOjC5tfmAzhLvAeL61pD2EOcm5Bpd3iKJibYt/uCIpYFHM0
 WfRcUmqEduN1zhDuSR4KSe49JQ5dFXVf83YqUgbtrHFiHHXNBYYqFNUVfcDAB1p7
 IH9AlNd82zyxJ3fsBX7VpEbGC2qNa3K8hYO7px8DNVrPGzW7AhPF1Lsh0OE9GlZU
 H5f70Nryi98iwadbePBUchTrx0S3iYjk2TQgLGf5L/lAl6J/MRNG31kittDtehri
 cct/JBr8sUAK014TS5NxPbpxqDnVot3UsYk7h6s7WdmM1svfs7j5f1mo3ovMEGqX
 io5Z6pFEE7n1ce5hbieDKr3JFh6LxP1ArUSY8oz5rR0shE2XHMcIdq3J26Vfi0Q0
 4VjdBic/7rUUBXI=
 =QXEK
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-v6.8-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 fix from Andreas Gruenbacher:

 - Fix boundary check in punch_hole

* tag 'gfs2-v6.8-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Fix invalid metadata access in punch_hole
2024-03-25 10:53:39 -07:00
Linus Torvalds 174fdc93a2 This push fixes a regression that broke iwd as well as a divide by
zero in iaa.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmX9buQACgkQxycdCkmx
 i6dw1w/+IiwZX9A62NLDEBJbaXavQEHqoI1mhXq4Zwqeor5DhUlUGbmAhdZ5bg88
 oA60zjroLgt+Wke3phXFMjavSdwyLMd8zkDU3BjhqHAD6ASZz4ml51MCi48ROKB8
 lKSSAvjJX3VAHlYSxctUB/KWfzZZaGGUWQdtkC/90Tqp6OGeVyBbQCiQPyF4I55P
 e1B8ADY5Ey2d+Rgos1whMuLkKa077yoWdiDgX5PFjfGalRdNh4jNsojrCSx6AEiI
 KijWUtaGBNj9Exc04NeCa0JQNff4vynhn21ygMbgPMEMTde0SlHHdf6FWKrZcm6h
 JlNYYVGXjcobMC62dQdTosTLxuAqSd4kr5UaiCjO52QdM8txfBJCm/PDARS2z9Gl
 xROvBpOfRSn0Z8GpuHaBMNot4DlKv6y/puwmef6o3qYlUJH+CnuHUFaLOHRYVt6d
 B0tWi7PWyx2jorj0VHUCpqiGGYlbUq6lWhGYt8XzPHeQ2ZewzH6EbbF5qKpEWGiB
 3X8Dxl1PpO8sSwOQRabBpoWoXxJLL6G9uvyJCjJeftL9ZDEeTBDvdA9qis7t7kR5
 Ckv4q/5rOq4MK3NsXsL6lJZY7ckQavLhCoZlRU9kcQt3MG06i6KUDoo83L/49rUL
 zq5adU8sI3lP/2VplfA0XEeYFuLlBWH4TwNAmwJBRoWFiG8j8XM=
 =Mau0
 -----END PGP SIGNATURE-----

Merge tag 'v6.9-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:
 "This fixes a regression that broke iwd as well as a divide by zero in
  iaa"

* tag 'v6.9-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: iaa - Fix nr_cpus < nr_iaa case
  Revert "crypto: pkcs7 - remove sha1 support"
2024-03-25 10:48:23 -07:00
Kurt Kanzenbach 47ce2956c7 igc: Remove stale comment about Tx timestamping
The initial igc Tx timestamping implementation used only one register for
retrieving Tx timestamps. Commit 3ed247e789 ("igc: Add support for
multiple in-flight TX timestamps") added support for utilizing all four of
them e.g., for multiple domain support. Remove the stale comment/FIXME.

Fixes: 3ed247e789 ("igc: Add support for multiple in-flight TX timestamps")
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-03-25 09:57:21 -07:00
Przemek Kitszel aec806fb4a ixgbe: avoid sleeping allocation in ixgbe_ipsec_vf_add_sa()
Change kzalloc() flags used in ixgbe_ipsec_vf_add_sa() to GFP_ATOMIC, to
avoid sleeping in IRQ context.

Dan Carpenter, with the help of Smatch, found the following issue:
The patch eda0333ac293: "ixgbe: add VF IPsec management" from Aug 13,
2018 (linux-next), leads to the following Smatch static checker
warning: drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c:917 ixgbe_ipsec_vf_add_sa()
	warn: sleeping in IRQ context

The call tree that Smatch is worried about is:
ixgbe_msix_other() <- IRQ handler
-> ixgbe_msg_task()
   -> ixgbe_rcv_msg_from_vf()
      -> ixgbe_ipsec_vf_add_sa()

Fixes: eda0333ac2 ("ixgbe: add VF IPsec management")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/intel-wired-lan/db31a0b0-4d9f-4e6b-aed8-88266eb5665c@moroto.mountain
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-03-25 09:57:21 -07:00
Jesse Brandeburg 1cb7fdb1df ice: fix memory corruption bug with suspend and rebuild
The ice driver would previously panic after suspend. This is caused
by the driver *only* calling the ice_vsi_free_q_vectors() function by
itself, when it is suspending. Since commit b3e7b3a6ee ("ice: prevent
NULL pointer deref during reload") the driver has zeroed out
num_q_vectors, and only restored it in ice_vsi_cfg_def().

This further causes the ice_rebuild() function to allocate a zero length
buffer, after which num_q_vectors is updated, and then the new value of
num_q_vectors is used to index into the zero length buffer, which
corrupts memory.

The fix entails making sure all the code referencing num_q_vectors only
does so after it has been reset via ice_vsi_cfg_def().

I didn't perform a full bisect, but I was able to test against 6.1.77
kernel and that ice driver works fine for suspend/resume with no panic,
so sometime since then, this problem was introduced.

Also clean up an un-needed init of a local variable in the function
being modified.

PANIC from 6.8.0-rc1:

[1026674.915596] PM: suspend exit
[1026675.664697] ice 0000:17:00.1: PTP reset successful
[1026675.664707] ice 0000:17:00.1: 2755 msecs passed between update to cached PHC time
[1026675.667660] ice 0000:b1:00.0: PTP reset successful
[1026675.675944] ice 0000:b1:00.0: 2832 msecs passed between update to cached PHC time
[1026677.137733] ixgbe 0000:31:00.0 ens787: NIC Link is Up 1 Gbps, Flow Control: None
[1026677.190201] BUG: kernel NULL pointer dereference, address: 0000000000000010
[1026677.192753] ice 0000:17:00.0: PTP reset successful
[1026677.192764] ice 0000:17:00.0: 4548 msecs passed between update to cached PHC time
[1026677.197928] #PF: supervisor read access in kernel mode
[1026677.197933] #PF: error_code(0x0000) - not-present page
[1026677.197937] PGD 1557a7067 P4D 0
[1026677.212133] ice 0000:b1:00.1: PTP reset successful
[1026677.212143] ice 0000:b1:00.1: 4344 msecs passed between update to cached PHC time
[1026677.212575]
[1026677.243142] Oops: 0000 [#1] PREEMPT SMP NOPTI
[1026677.247918] CPU: 23 PID: 42790 Comm: kworker/23:0 Kdump: loaded Tainted: G        W          6.8.0-rc1+ #1
[1026677.257989] Hardware name: Intel Corporation M50CYP2SBSTD/M50CYP2SBSTD, BIOS SE5C620.86B.01.01.0005.2202160810 02/16/2022
[1026677.269367] Workqueue: ice ice_service_task [ice]
[1026677.274592] RIP: 0010:ice_vsi_rebuild_set_coalesce+0x130/0x1e0 [ice]
[1026677.281421] Code: 0f 84 3a ff ff ff 41 0f b7 74 ec 02 66 89 b0 22 02 00 00 81 e6 ff 1f 00 00 e8 ec fd ff ff e9 35 ff ff ff 48 8b 43 30 49 63 ed <41> 0f b7 34 24 41 83 c5 01 48 8b 3c e8 66 89 b7 aa 02 00 00 81 e6
[1026677.300877] RSP: 0018:ff3be62a6399bcc0 EFLAGS: 00010202
[1026677.306556] RAX: ff28691e28980828 RBX: ff28691e41099828 RCX: 0000000000188000
[1026677.314148] RDX: 0000000000000000 RSI: 0000000000000010 RDI: ff28691e41099828
[1026677.321730] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[1026677.329311] R10: 0000000000000007 R11: ffffffffffffffc0 R12: 0000000000000010
[1026677.336896] R13: 0000000000000000 R14: 0000000000000000 R15: ff28691e0eaa81a0
[1026677.344472] FS:  0000000000000000(0000) GS:ff28693cbffc0000(0000) knlGS:0000000000000000
[1026677.353000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1026677.359195] CR2: 0000000000000010 CR3: 0000000128df4001 CR4: 0000000000771ef0
[1026677.366779] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[1026677.374369] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[1026677.381952] PKRU: 55555554
[1026677.385116] Call Trace:
[1026677.388023]  <TASK>
[1026677.390589]  ? __die+0x20/0x70
[1026677.394105]  ? page_fault_oops+0x82/0x160
[1026677.398576]  ? do_user_addr_fault+0x65/0x6a0
[1026677.403307]  ? exc_page_fault+0x6a/0x150
[1026677.407694]  ? asm_exc_page_fault+0x22/0x30
[1026677.412349]  ? ice_vsi_rebuild_set_coalesce+0x130/0x1e0 [ice]
[1026677.418614]  ice_vsi_rebuild+0x34b/0x3c0 [ice]
[1026677.423583]  ice_vsi_rebuild_by_type+0x76/0x180 [ice]
[1026677.429147]  ice_rebuild+0x18b/0x520 [ice]
[1026677.433746]  ? delay_tsc+0x8f/0xc0
[1026677.437630]  ice_do_reset+0xa3/0x190 [ice]
[1026677.442231]  ice_service_task+0x26/0x440 [ice]
[1026677.447180]  process_one_work+0x174/0x340
[1026677.451669]  worker_thread+0x27e/0x390
[1026677.455890]  ? __pfx_worker_thread+0x10/0x10
[1026677.460627]  kthread+0xee/0x120
[1026677.464235]  ? __pfx_kthread+0x10/0x10
[1026677.468445]  ret_from_fork+0x2d/0x50
[1026677.472476]  ? __pfx_kthread+0x10/0x10
[1026677.476671]  ret_from_fork_asm+0x1b/0x30
[1026677.481050]  </TASK>

Fixes: b3e7b3a6ee ("ice: prevent NULL pointer deref during reload")
Reported-by: Robert Elliott <elliott@hpe.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-03-25 09:57:21 -07:00
Steven Zou 817b18965b ice: Refactor FW data type and fix bitmap casting issue
According to the datasheet, the recipe association data is an 8-byte
little-endian value. It is described as 'Bitmap of the recipe indexes
associated with this profile' and occupies bytes 24 to 31 of the FW
data. Therefore, define it as '__le64 recipe_assoc' in struct
ice_aqc_recipe_to_profile, and fix the bitmap casting issue, since
casts must never be used for the bitmap type.

Fixes: 1e0f9881ef ("ice: Flesh out implementation of support for SRIOV on bonded interface")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Andrii Staikov <andrii.staikov@intel.com>
Reviewed-by: Jan Sokolowski <jan.sokolowski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Steven Zou <steven.zou@intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-03-25 09:57:03 -07:00
Johannes Berg dbde9fd49a kunit: fix wireless test dependencies
For the wireless tests, CONFIG_WLAN and CONFIG_NETDEVICES are
needed, though seem to be available by default on ARCH=um, so
we didn't notice this before. Add them to fix kunit running
on other architectures.

Fixes: 28b3df1fe6 ("kunit: add wireless unit tests")
Reported-by: Mark Brown <broonie@kernel.org>
Closes: https://lore.kernel.org/r/b743a5ec-3d07-4747-85e0-2fb2ef69db7c@sirena.org.uk/
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25 17:45:18 +01:00
linke li c2deb2e971 net: mark racy access on sk->sk_rcvbuf
sk->sk_rcvbuf in __sock_queue_rcv_skb() and __sk_receive_skb() can be
changed by other threads. Mark this as benign using READ_ONCE().

This patch is aimed at reducing the number of benign races reported by
KCSAN in order to focus future debugging effort on harmful races.
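
The marking can be sketched outside the kernel. This is a simplified illustration, not the code in __sock_queue_rcv_skb(): READ_ONCE_INT() is a user-space stand-in for the kernel's READ_ONCE(), and struct fake_sock and rcv_queue_full() are hypothetical. The volatile access makes the compiler load the limit exactly once, which documents that a concurrent writer is expected.

```c
/* User-space stand-in for the kernel's READ_ONCE(); an assumption of this sketch. */
#define READ_ONCE_INT(x) (*(const volatile int *)&(x))

/* Hypothetical, simplified socket holding only the fields the sketch needs. */
struct fake_sock {
	int rcvbuf;     /* limit, may be changed concurrently by another thread */
	int rmem_alloc; /* bytes currently queued */
};

/* Return 1 if the receive queue is at or above the limit, 0 otherwise. */
static int rcv_queue_full(struct fake_sock *sk)
{
	/* Load the limit once so the compiler cannot re-read it later. */
	int limit = READ_ONCE_INT(sk->rcvbuf);

	return sk->rmem_alloc >= limit;
}
```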

Signed-off-by: linke li <lilinke99@qq.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-25 14:46:59 +00:00
Benjamin Berg e78d787730 wifi: iwlwifi: mvm: include link ID when releasing frames
When releasing frames from the reorder buffer, the link ID was not
included in the RX status information. This subsequently led mac80211 to
drop the frame. Change it so that the link information is set
immediately when possible, so that it no longer needs to be filled in
when submitting the frame to mac80211.

Fixes: b8a85a1d42 ("wifi: iwlwifi: mvm: rxmq: report link ID to mac80211")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Tested-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240320232419.bbbd5e9bfe80.Iec1bf5c884e371f7bc5ea2534ed9ea8d3f2c0bf6@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25 15:23:26 +01:00
Johannes Berg 19d82bdeda wifi: iwlwifi: mvm: handle debugfs names more carefully
With debugfs=off, we can get here with the dbgfs_dir being
an ERR_PTR(). Instead of checking for all this, which is
often flagged as a mistake, simply handle the names here
more carefully by printing them, then we don't need extra
checks.

Also, while checking, I noticed theoretically 'buf' is too
small, so fix that size as well.

Cc: stable@vger.kernel.org
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218422
Fixes: c36235acb3 ("wifi: iwlwifi: mvm: rework debugfs handling")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240320232419.4dc1eb3dd015.I32f308b0356ef5bcf8d188dd98ce9b210e3ab9fd@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25 15:23:07 +01:00
Benjamin Berg 17f64517bf wifi: iwlwifi: mvm: guard against invalid STA ID on removal
Guard against invalid station IDs in iwl_mvm_mld_rm_sta_id as that would
result in out-of-bounds array accesses. This prevents issues should the
driver get into a bad state during error handling.
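
The kind of guard described above can be sketched as follows. This is an illustration under stated assumptions, not the body of iwl_mvm_mld_rm_sta_id(): struct fake_mvm, FAKE_STATION_COUNT and fake_rm_sta_id() are hypothetical names, and the real driver's table size and error handling may differ.

```c
#include <stddef.h>

#define FAKE_STATION_COUNT 16 /* hypothetical size of the station table */

/* Hypothetical driver state: one pointer slot per station ID. */
struct fake_mvm {
	void *sta[FAKE_STATION_COUNT];
};

/* Clear the slot for sta_id; return 0 on success, -1 for an invalid ID. */
static int fake_rm_sta_id(struct fake_mvm *mvm, unsigned int sta_id)
{
	/* Reject out-of-range IDs before they are used as an array index. */
	if (sta_id >= FAKE_STATION_COUNT)
		return -1;

	mvm->sta[sta_id] = NULL;
	return 0;
}
```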

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240320232419.d523167bda9c.I1cffd86363805bf86a95d8bdfd4b438bb54baddc@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25 15:23:07 +01:00
Johannes Berg c2ace63006 wifi: iwlwifi: read txq->read_ptr under lock
If we read txq->read_ptr without lock, we can read the same
value twice, then obtain the lock, and reclaim from there
to two different places, but crucially reclaim the same
entry twice, resulting in the WARN_ONCE() a little later.
Fix that by reading txq->read_ptr under lock.
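
The pattern can be sketched in user space with a pthread mutex. This is a simplified model, not the iwlwifi reclaim path: struct fake_txq, FAKE_TXQ_ENTRIES and fake_reclaim() are hypothetical, and the queue index does not wrap as a real ring would. Because read_ptr is only read after the lock is taken, two callers can never start from the same stale value, so each entry is reclaimed at most once.

```c
#include <pthread.h>

#define FAKE_TXQ_ENTRIES 8 /* hypothetical queue depth */

/* Hypothetical TX queue: a lock, a read pointer, and per-entry reclaim counts. */
struct fake_txq {
	pthread_mutex_t lock;
	int read_ptr;
	int reclaim_count[FAKE_TXQ_ENTRIES];
};

/* Reclaim entries [read_ptr, target); return how many entries were freed. */
static int fake_reclaim(struct fake_txq *q, int target)
{
	int freed = 0;

	if (target > FAKE_TXQ_ENTRIES)
		target = FAKE_TXQ_ENTRIES;

	pthread_mutex_lock(&q->lock);
	/* read_ptr is read only while holding the lock. */
	while (q->read_ptr < target) {
		q->reclaim_count[q->read_ptr]++;
		q->read_ptr++;
		freed++;
	}
	pthread_mutex_unlock(&q->lock);

	return freed;
}
```

A caller that arrives with an older target simply finds nothing left to reclaim, instead of freeing an entry a second time.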

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240319100755.bf4c62196504.I978a7ca56c6bd6f1bf42c15aa923ba03366a840b@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25 15:23:07 +01:00
Johannes Berg 045a5b645d wifi: iwlwifi: fw: don't always use FW dump trig
Since the dump_data (struct iwl_fwrt_dump_data) is a union,
it's not safe to unconditionally access and use the 'trig'
member, it might be 'desc' instead. Access it only if it's
known to be 'trig' rather than 'desc', i.e. if ini-debug
is present.
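
The rule can be illustrated with a small discriminated union. This is a sketch under stated assumptions, not the layout of struct iwl_fwrt_dump_data: struct fake_dump_data, its ini_debug flag and dump_time_point() are hypothetical, with the flag standing in for "ini-debug is present".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical payload types for the two union members. */
struct fake_desc { int type; };
struct fake_trig { int time_point; };

/* Hypothetical dump data: ini_debug tells which union member is valid. */
struct fake_dump_data {
	bool ini_debug;
	union {
		const struct fake_desc *desc; /* valid when ini_debug is false */
		const struct fake_trig *trig; /* valid when ini_debug is true */
	} u;
};

/* Return the trigger time point, or -1 if 'trig' is not the active member. */
static int dump_time_point(const struct fake_dump_data *d)
{
	/* Only touch 'trig' when the discriminator says it is the live member. */
	if (!d->ini_debug || d->u.trig == NULL)
		return -1;

	return d->u.trig->time_point;
}
```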

Cc: stable@vger.kernel.org
Fixes: 0eb50c674a ("iwlwifi: yoyo: send hcmd to fw after dump collection completes.")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240319100755.e2976bc58b29.I72fbd6135b3623227de53d8a2bb82776066cb72b@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25 15:23:07 +01:00