commit 9fd6dad126 upstream.
Currently, the follow_pfn function is exported for modules but
follow_pte is not. However, follow_pfn is very easy to misuse,
because it does not provide protections (so most of its callers
assume the page is writable!) and because it returns after having
already unlocked the page table lock.
Provide instead a simplified version of follow_pte that does
not have the pmdpp and range arguments. The older version
survives as follow_invalidate_pte() for use by fs/dax.c.
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bd2fae8da7 upstream.
In order to convert an HVA to a PFN, KVM usually tries to use
the get_user_pages family of functinso. This however is not
possible for VM_IO vmas; in that case, KVM instead uses follow_pfn.
In doing this however KVM loses the information on whether the
PFN is writable. That is usually not a problem because the main
use of VM_IO vmas with KVM is for BARs in PCI device assignment,
however it is a bug. To fix it, use follow_pte and check pte_write
while under the protection of the PTE lock. The information can
be used to fail hva_to_pfn_remapped or passed back to the
caller via *writable.
Usage of follow_pfn was introduced in commit add6a0cd1c ("KVM: MMU: try to fix
up page faults before giving up", 2016-07-05); however, even older version
have the same issue, all the way back to commit 2e2e3738af ("KVM:
Handle vma regions with no backing page", 2008-07-20), as they also did
not check whether the PFN was writable.
Fixes: 2e2e3738af ("KVM: Handle vma regions with no backing page")
Reported-by: David Stevens <stevensd@google.com>
Cc: 3pvd@google.com
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ff5c19ed4b upstream.
Merge __follow_pte_pmd, follow_pte_pmd and follow_pte into a single
follow_pte function and just pass two additional NULL arguments for the
two previous follow_pte callers.
[sfr@canb.auug.org.au: merge fix for "s390/pci: remove races against pte updates"]
Link: https://lkml.kernel.org/r/20201111221254.7f6a3658@canb.auug.org.au
Link: https://lkml.kernel.org/r/20201029101432.47011-3-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7336375734 upstream.
Patch series "simplify follow_pte a bit".
This small series drops the not needed follow_pte_pmd exports, and
simplifies the follow_pte family of functions a bit.
This patch (of 2):
follow_pte_pmd() is only used by the DAX code, which can't be modular.
Link: https://lkml.kernel.org/r/20201029101432.47011-2-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8fc517267f upstream.
Walk the list of MMU pages in reverse in kvm_mmu_zap_oldest_mmu_pages().
The list is FIFO, meaning new pages are inserted at the head and thus
the oldest pages are at the tail. Using a "forward" iterator causes KVM
to zap MMU pages that were just added, which obliterates guest
performance once the max number of shadow MMU pages is reached.
Fixes: 6b82ef2c9c ("KVM: x86/mmu: Batch zap MMU pages when recycling oldest pages")
Reported-by: Zdenek Kaspar <zkaspar82@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210113205030.3481307-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1e0ca54674 upstream.
HDA initialization is failing occasionally on Tegra210 and following
print is observed in the boot log. Because of this probe() fails and
no sound card is registered.
[16.800802] tegra-hda 70030000.hda: no codecs found!
Codecs request a state change and enumeration by the controller. In
failure cases this does not seem to happen as STATETS register reads 0.
The problem seems to be related to the HDA codec dependency on SOR
power domain. If it is gated during HDA probe then the failure is
observed. Building Tegra HDA driver into kernel image avoids this
failure but does not completely address the dependency part. Fix this
problem by adding 'power-domains' DT property for Tegra210 HDA. Note
that Tegra186 and Tegra194 HDA do this already.
Fixes: 742af7e7a0 ("arm64: tegra: Add Tegra210 support")
Depends-on: 96d1f078ff ("arm64: tegra: Add SOR power-domain for Tegra210")
Cc: <stable@vger.kernel.org>
Signed-off-by: Sameer Pujar <spujar@nvidia.com>
Acked-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 234f414efd upstream.
This issue starts from linux-5.10-rc1, I reproduced this issue on my
Dell Inspiron 7447 with BT adapter 0cf3:e005, the kernel will print
out: "Bluetooth: hci0: don't support firmware rome 0x31010000", and
someone else also reported the similar issue to bugzilla #211571.
I found this is a regression introduced by 'commit b40f58b973
("Bluetooth: btusb: Add Qualcomm Bluetooth SoC WCN6855 support"), the
patch assumed that if high ROM version is not zero, it is an adapter
on WCN6855, but many old adapters don't need to load rampatch or nvm,
and they have non-zero high ROM version.
To fix it, let the driver match the rom_version in the
qca_devices_table first, if there is no entry matched, check the
high ROM version, if it is not zero, we assume this adapter is ready
to work and no need to load rampatch and nvm like previously.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=211571
Fixes: b40f58b973 ("Bluetooth: btusb: Add Qualcomm Bluetooth SoC WCN6855 support")
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: Salvatore Bonaccorso <carnil@debian.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ccd1acdf1c upstream.
While the MDS cluster is unstable and changing state the client may get
mdsmap updates that will trigger warnings:
[144692.478400] ceph: mdsmap_decode got incorrect state(up:standby-replay)
[144697.489552] ceph: mdsmap_decode got incorrect state(up:standby-replay)
[144697.489580] ceph: mdsmap_decode got incorrect state(up:standby-replay)
This patch downgrades these warnings to debug, as they may flood the logs
if the cluster is unstable for a while.
Signed-off-by: Luis Henriques <lhenriques@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1ebe718bb4 upstream.
Without this quirk starting a video capture from the device often fails with
kernel: uvcvideo: Failed to set UVC probe control : -110 (exp. 34).
Signed-off-by: Stefan Ursella <stefan.ursella@wolfvision.net>
Link: https://lore.kernel.org/r/20210210140713.18711-1-stefan.ursella@wolfvision.net
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 22dd4c7076 upstream.
->dma_device is a private implementation detail of the RDMA core. Use the
ibdev_to_node helper to get the NUMA node for a ib_device instead of
poking into ->dma_device.
Link: https://lore.kernel.org/r/20201106181941.1878556-5-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: Krishnamraju Eraparaju <krishna2@chelsio.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8ecfca68dc upstream.
Lift the ibdev_to_node from rds to common code and document it.
Link: https://lore.kernel.org/r/20201106181941.1878556-4-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: Krishnamraju Eraparaju <krishna2@chelsio.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ed9be64eef upstream.
The HID subsystem allows an "HID report field" to have a different
number of "values" and "usages" when it is allocated. When a field
struct is created, the size of the usage array is guaranteed to be at
least as large as the values array, but it may be larger. This leads to
a potential out-of-bounds write in
__hidinput_change_resolution_multipliers() and an out-of-bounds read in
hidinput_count_leds().
To fix this, let's make sure that both the usage and value arrays are
the same size.
Cc: stable@vger.kernel.org
Signed-off-by: Will McVicker <willmcvicker@google.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9b00f1b788 upstream.
Recently noticed that when mod32 with a known src reg of 0 is performed,
then the dst register is 32-bit truncated in verifier:
0: R1=ctx(id=0,off=0,imm=0) R10=fp0
0: (b7) r0 = 0
1: R0_w=inv0 R1=ctx(id=0,off=0,imm=0) R10=fp0
1: (b7) r1 = -1
2: R0_w=inv0 R1_w=inv-1 R10=fp0
2: (b4) w2 = -1
3: R0_w=inv0 R1_w=inv-1 R2_w=inv4294967295 R10=fp0
3: (9c) w1 %= w0
4: R0_w=inv0 R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2_w=inv4294967295 R10=fp0
4: (b7) r0 = 1
5: R0_w=inv1 R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2_w=inv4294967295 R10=fp0
5: (1d) if r1 == r2 goto pc+1
R0_w=inv1 R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2_w=inv4294967295 R10=fp0
6: R0_w=inv1 R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2_w=inv4294967295 R10=fp0
6: (b7) r0 = 2
7: R0_w=inv2 R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2_w=inv4294967295 R10=fp0
7: (95) exit
7: R0=inv1 R1=inv(id=0,umin_value=4294967295,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2=inv4294967295 R10=fp0
7: (95) exit
However, as a runtime result, we get 2 instead of 1, meaning the dst
register does not contain (u32)-1 in this case. The reason is fairly
straight forward given the 0 test leaves the dst register as-is:
# ./bpftool p d x i 23
0: (b7) r0 = 0
1: (b7) r1 = -1
2: (b4) w2 = -1
3: (16) if w0 == 0x0 goto pc+1
4: (9c) w1 %= w0
5: (b7) r0 = 1
6: (1d) if r1 == r2 goto pc+1
7: (b7) r0 = 2
8: (95) exit
This was originally not an issue given the dst register was marked as
completely unknown (aka 64 bit unknown). However, after 468f6eafa6
("bpf: fix 32-bit ALU op verification") the verifier casts the register
output to 32 bit, and hence it becomes 32 bit unknown. Note that for
the case where the src register is unknown, the dst register is marked
64 bit unknown. After the fix, the register is truncated by the runtime
and the test passes:
# ./bpftool p d x i 23
0: (b7) r0 = 0
1: (b7) r1 = -1
2: (b4) w2 = -1
3: (16) if w0 == 0x0 goto pc+2
4: (9c) w1 %= w0
5: (05) goto pc+1
6: (bc) w1 = w1
7: (b7) r0 = 1
8: (1d) if r1 == r2 goto pc+1
9: (b7) r0 = 2
10: (95) exit
Semantics also match with {R,W}x mod{64,32} 0 -> {R,W}x. Invalid div
has always been {R,W}x div{64,32} 0 -> 0. Rewrites are as follows:
mod32: mod64:
(16) if w0 == 0x0 goto pc+2 (15) if r0 == 0x0 goto pc+1
(9c) w1 %= w0 (9f) r1 %= r0
(05) goto pc+1
(bc) w1 = w1
Fixes: 468f6eafa6 ("bpf: fix 32-bit ALU op verification")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Whenever we attempt to do a non-aligned direct IO write with O_DSYNC, we
end up triggering an assertion and crashing. Example reproducer:
$ cat test.sh
#!/bin/bash
DEV=/dev/sdj
MNT=/mnt/sdj
mkfs.btrfs -f $DEV > /dev/null
mount $DEV $MNT
# Do a direct IO write with O_DSYNC into a non-aligned range...
xfs_io -f -d -s -c "pwrite -S 0xab -b 64K 1111 64K" $MNT/foobar
umount $MNT
When running the reproducer an assertion fails and produces the following
trace:
[ 2418.403134] assertion failed: !current->journal_info || flush != BTRFS_RESERVE_FLUSH_DATA, in fs/btrfs/space-info.c:1467
[ 2418.403745] ------------[ cut here ]------------
[ 2418.404306] kernel BUG at fs/btrfs/ctree.h:3286!
[ 2418.404862] invalid opcode: 0000 [#2] PREEMPT SMP DEBUG_PAGEALLOC PTI
[ 2418.405451] CPU: 1 PID: 64705 Comm: xfs_io Tainted: G D 5.10.15-btrfs-next-87 #1
[ 2418.406026] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 2418.407228] RIP: 0010:assertfail.constprop.0+0x18/0x26 [btrfs]
[ 2418.407835] Code: e6 48 c7 (...)
[ 2418.409078] RSP: 0018:ffffb06080d13c98 EFLAGS: 00010246
[ 2418.409696] RAX: 000000000000006c RBX: ffff994c1debbf08 RCX: 0000000000000000
[ 2418.410302] RDX: 0000000000000000 RSI: 0000000000000027 RDI: 00000000ffffffff
[ 2418.410904] RBP: ffff994c21770000 R08: 0000000000000000 R09: 0000000000000000
[ 2418.411504] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000010000
[ 2418.412111] R13: ffff994c22198400 R14: ffff994c21770000 R15: 0000000000000000
[ 2418.412713] FS: 00007f54fd7aff00(0000) GS:ffff994d35200000(0000) knlGS:0000000000000000
[ 2418.413326] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2418.413933] CR2: 000056549596d000 CR3: 000000010b928003 CR4: 0000000000370ee0
[ 2418.414528] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2418.415109] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2418.415669] Call Trace:
[ 2418.416254] btrfs_reserve_data_bytes.cold+0x22/0x22 [btrfs]
[ 2418.416812] btrfs_check_data_free_space+0x4c/0xa0 [btrfs]
[ 2418.417380] btrfs_buffered_write+0x1b0/0x7f0 [btrfs]
[ 2418.418315] btrfs_file_write_iter+0x2a9/0x770 [btrfs]
[ 2418.418920] new_sync_write+0x11f/0x1c0
[ 2418.419430] vfs_write+0x2bb/0x3b0
[ 2418.419972] __x64_sys_pwrite64+0x90/0xc0
[ 2418.420486] do_syscall_64+0x33/0x80
[ 2418.420979] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 2418.421486] RIP: 0033:0x7f54fda0b986
[ 2418.421981] Code: 48 c7 c0 (...)
[ 2418.423019] RSP: 002b:00007ffc40569c38 EFLAGS: 00000246 ORIG_RAX: 0000000000000012
[ 2418.423547] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f54fda0b986
[ 2418.424075] RDX: 0000000000010000 RSI: 000056549595e000 RDI: 0000000000000003
[ 2418.424596] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000400
[ 2418.425119] R10: 0000000000000400 R11: 0000000000000246 R12: 00000000ffffffff
[ 2418.425644] R13: 0000000000000400 R14: 0000000000010000 R15: 0000000000000000
[ 2418.426148] Modules linked in: btrfs blake2b_generic (...)
[ 2418.429540] ---[ end trace ef2aeb44dc0afa34 ]---
1) At btrfs_file_write_iter() we set current->journal_info to
BTRFS_DIO_SYNC_STUB;
2) We then call __btrfs_direct_write(), which calls btrfs_direct_IO();
3) We can't do the direct IO write because it starts at a non-aligned
offset (1111). So at btrfs_direct_IO() we return -EINVAL (coming from
check_direct_IO() which does the alignment check), but we leave
current->journal_info set to BTRFS_DIO_SYNC_STUB - we only clear it
at btrfs_dio_iomap_begin(), because we assume we always get there;
4) Then at __btrfs_direct_write() we see that the attempt to do the
direct IO write was not successful, 0 bytes written, so we fallback
to a buffered write by calling btrfs_buffered_write();
5) There we call btrfs_check_data_free_space() which in turn calls
btrfs_alloc_data_chunk_ondemand() and that calls
btrfs_reserve_data_bytes() with flush == BTRFS_RESERVE_FLUSH_DATA;
6) Then at btrfs_reserve_data_bytes() we have current->journal_info set to
BTRFS_DIO_SYNC_STUB, therefore not NULL, and flush has the value
BTRFS_RESERVE_FLUSH_DATA, triggering the second assertion:
int btrfs_reserve_data_bytes(struct btrfs_fs_info *fs_info, u64 bytes,
enum btrfs_reserve_flush_enum flush)
{
struct btrfs_space_info *data_sinfo = fs_info->data_sinfo;
int ret;
ASSERT(flush == BTRFS_RESERVE_FLUSH_DATA ||
flush == BTRFS_RESERVE_FLUSH_FREE_SPACE_INODE);
ASSERT(!current->journal_info || flush != BTRFS_RESERVE_FLUSH_DATA);
(...)
So fix that by setting the journal to NULL whenever check_direct_IO()
returns a failure.
This bug only affects 5.10 kernels, and the regression was introduced in
5.10-rc1 by commit 0eb79294db ("btrfs: dio iomap DSYNC workaround").
The bug does not exist in 5.11 kernels due to commit ecfdc08b8c
("btrfs: remove dio iomap DSYNC workaround"), which depends on a large
patchset that went into the merge window for 5.11. So this is a fix only
for 5.10.x stable kernels, as there are people hitting this bug.
Fixes: 0eb79294db ("btrfs: dio iomap DSYNC workaround")
CC: stable@vger.kernel.org # 5.10 (and only 5.10)
Acked-by: David Sterba <dsterba@suse.com>
Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1181605
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There's a mistake in backport of upstream commit 2175bf57dc ("btrfs:
fix possible free space tree corruption with online conversion") as
5.10.13 commit 2175bf57dc.
The enum value BTRFS_FS_FREE_SPACE_TREE_UNTRUSTED has been added to the
wrong enum set, colliding with value of BTRFS_FS_QUOTA_ENABLE. This
could cause problems during the tree conversion, where the quotas
wouldn't be set up properly but the related code executed anyway due to
the bit set.
Link: https://lore.kernel.org/linux-btrfs/20210219111741.95DD.409509F4@e16-tech.com
Reported-by: Wang Yugui <wangyugui@e16-tech.com>
CC: stable@vger.kernel.org # 5.10.13+
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 517b693351 upstream.
When alt mode 6 is not available, fallback to the kernel <= 5.7 behavior
of always using alt mode 1.
Prior to kernel 5.8, btusb would always use alt mode 1 for WBS (Wide
Band Speech aka mSBC aka transparent SCO). In commit baac6276c0
("Bluetooth: btusb: handle mSBC audio over USB Endpoints") this
was changed to use alt mode 6, which is the recommended mode in the
Bluetooth spec (Specifications of the Bluetooth System, v5.0, Vol 4.B
§2.2.1). However, many if not most BT USB adapters do not support alt
mode 6. In fact, I have been unable to find any which do.
In kernel 5.8, this was changed to use alt mode 6, and if not available,
use alt mode 0. But mode 0 has a zero byte max packet length and can
not possibly work. It is just there as a zero-bandwidth dummy mode to
work around a USB flaw that would prevent device enumeration if
insufficient bandwidth were available for the lowest isoc mode
supported.
In effect, WBS was broken for all USB-BT adapters that do not support
alt 6, which appears to nearly all of them.
Then in commit 461f95f04f ("Bluetooth: btusb: USB alternate setting 1 for
WBS") the 5.7 behavior was restored, but only for Realtek adapters.
I've tested a Broadcom BRCM20702A and CSR 8510 adapter, both work with
the 5.7 behavior and do not with the 5.8.
So get rid of the Realtek specific flag and use the 5.7 behavior for all
adapters as a fallback when alt 6 is not available. This was the
kernel's behavior prior to 5.8 and I can find no adapters for which it
is not correct. And even if there is an adapter for which this does not
work, the current behavior would be to fall back to alt 0, which can not
possibly work either, and so is no better.
Signed-off-by: Trent Piepho <tpiepho@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: Salvatore Bonaccorso <carnil@debian.org>
Cc: Sjoerd Simons <sjoerd@luon.net>
Cc: Sebastian Reichel <sre@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3342ff2698 upstream.
Al root-caused a new warning from syzbot to the ttyprintk tty driver
returning a write count larger than the data the tty layer actually gave
it. Which confused the tty write code mightily, and with the new
iov_iter based code, caused a WARNING in iov_iter_revert().
syzbot correctly bisected the source of the new warning to commit
9bb48c82ac ("tty: implement write_iter"), but the oddity goes back
much further, it just didn't get caught by anything before.
Reported-by: syzbot+3d2c27c2b7dc2a94814d@syzkaller.appspotmail.com
Fixes: 9bb48c82ac ("tty: implement write_iter")
Debugged-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 871997bc9e upstream.
The function uses a goto-based loop, which may lead to an earlier error
getting discarded by a later iteration. Exit this ad-hoc loop when an
error was encountered.
The out-of-memory error path additionally fails to fill a structure
field looked at by xen_blkbk_unmap_prepare() before inspecting the
handle which does get properly set (to BLKBACK_INVALID_HANDLE).
Since the earlier exiting from the ad-hoc loop requires the same field
filling (invalidation) as that on the out-of-memory path, fold both
paths. While doing so, drop the pr_alert(), as extra log messages aren't
going to help the situation (the kernel will log oom conditions already
anyway).
This is XSA-365.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <julien@xen.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7c77474b2d upstream.
In particular -ENOMEM may come back here, from set_foreign_p2m_mapping().
Don't make problems worse, the more that handling elsewhere (together
with map's status fields now indicating whether a mapping wasn't even
attempted, and hence has to be considered failed) doesn't require this
odd way of dealing with errors.
This is part of XSA-362.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: stable@vger.kernel.org
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3194a1746e upstream.
In particular -ENOMEM may come back here, from set_foreign_p2m_mapping().
Don't make problems worse, the more that handling elsewhere (together
with map's status fields now indicating whether a mapping wasn't even
attempted, and hence has to be considered failed) doesn't require this
odd way of dealing with errors.
This is part of XSA-362.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: stable@vger.kernel.org
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5a264285ed upstream.
In particular -ENOMEM may come back here, from set_foreign_p2m_mapping().
Don't make problems worse, the more that handling elsewhere (together
with map's status fields now indicating whether a mapping wasn't even
attempted, and hence has to be considered failed) doesn't require this
odd way of dealing with errors.
This is part of XSA-362.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: stable@vger.kernel.org
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 36bf1dfb8b upstream.
set_phys_to_machine can fail due to lack of memory, see the kzalloc call
in arch/arm/xen/p2m.c:__set_phys_to_machine_multi.
Don't ignore the potential return error in set_foreign_p2m_mapping,
returning it to the caller instead.
This is part of XSA-361.
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Cc: stable@vger.kernel.org
Reviewed-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ebee0eab08 upstream.
Failure of the kernel part of the mapping operation should also be
indicated as an error to the caller, or else it may assume the
respective kernel VA is okay to access.
Furthermore gnttab_map_refs() failing still requires recording
successfully mapped handles, so they can be unmapped subsequently. This
in turn requires there to be a way to tell full hypercall failure from
partial success - preset map_op status fields such that they won't
"happen" to look as if the operation succeeded.
Also again use GNTST_okay instead of implying its value (zero).
This is part of XSA-361.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: stable@vger.kernel.org
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dbe5283605 upstream.
We may not skip setting the field in the unmap structure when
GNTMAP_device_map is in use - such an unmap would fail to release the
respective resources (a page ref in the hypervisor). Otoh the field
doesn't need setting at all when GNTMAP_device_map is not in use.
To record the value for unmapping, we also better don't use our local
p2m: In particular after a subsequent change it may not have got updated
for all the batch elements. Instead it can simply be taken from the
respective map's results.
We can additionally avoid playing this game altogether for the kernel
part of the mappings in (x86) PV mode.
This is part of XSA-361.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: stable@vger.kernel.org
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b512e1b077 upstream.
We should not set up further state if either mapping failed; paying
attention to just the user mapping's status isn't enough.
Also use GNTST_okay instead of implying its value (zero).
This is part of XSA-361.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: stable@vger.kernel.org
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a35f2ef3b7 upstream.
Its sibling (set_foreign_p2m_mapping()) as well as the sibling of its
only caller (gnttab_map_refs()) don't clean up after themselves in case
of error. Higher level callers are expected to do so. However, in order
for that to really clean up any partially set up state, the operation
should not terminate upon encountering an entry in unexpected state. It
is particularly relevant to notice here that set_foreign_p2m_mapping()
would skip setting up a p2m entry if its grant mapping failed, but it
would continue to set up further p2m entries as long as their mappings
succeeded.
Arguably down the road set_foreign_p2m_mapping() may want its page state
related WARN_ON() also converted to an error return.
This is part of XSA-361.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: stable@vger.kernel.org
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a268e0f245 ]
proc_fs was used, in af_packet, without a surrounding #ifdef,
although there is no hard dependency on proc_fs.
That caused the initialization of the af_packet module to fail
when CONFIG_PROC_FS=n.
Specifically, proc_create_net() was used in af_packet.c,
and when it fails, packet_net_init() returns -ENOMEM.
It will always fail when the kernel is compiled without proc_fs,
because, proc_create_net() for example always returns NULL.
The calling order that starts in af_packet.c is as follows:
packet_init()
register_pernet_subsys()
register_pernet_operations()
__register_pernet_operations()
ops_init()
ops->init() (packet_net_ops.init=packet_net_init())
proc_create_net()
It worked in the past because register_pernet_subsys()'s return value
wasn't checked before this Commit 36096f2f4f ("packet: Fix error path in
packet_init.").
It always returned an error, but was not checked before, so everything
was working even when CONFIG_PROC_FS=n.
The fix here is simply to add the necessary #ifdef.
This also fixes a similar error in tls_proc.c, that was found by Jakub
Kicinski.
Fixes: d26b698dd3 ("net/tls: add skeleton of MIB statistics")
Fixes: 36096f2f4f ("packet: Fix error path in packet_init")
Signed-off-by: Yonatan Linik <yonatanlinik@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 09d6217254 ]
Currently, the exception actions are not processed correctly as the wrong
dataset is passed. This change fixes this, including the misleading
comment.
In addition, a check was added to make sure we work on an IPv4 packet,
and not just assume if it's not IPv6 it's IPv4.
This was all tested using OVS with patch,
https://patchwork.ozlabs.org/project/openvswitch/list/?series=21639,
applied and sending packets with a TTL of 1 (and 0), both with IPv4
and IPv6.
Fixes: 69929d4c49 ("net: openvswitch: fix TTL decrement action netlink message format")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Link: https://lore.kernel.org/r/160733569860.3007.12938188180387116741.stgit@wsfd-netdev64.ntdv.lab.eng.bos.redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit aadaca9e7c ]
The mru in the qdisc_skb_cb should be init as 0. Only defrag packets in the
act_ct will set the value.
Fixes: 038ebb1a71 ("net/sched: act_ct: fix miss set mru for ovs after defrag in act_ct")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 860975c6f8 ]
In case a subflow path is blocked, MPTCP-level retransmit may not take
place anymore because such subflow is likely to have unacked data left
in its write queue.
Ignore subflows that have experienced loss and test next candidate.
Fixes: 3b1d6210a9 ("mptcp: implement and use MPTCP-level retransmission")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ae068f561b ]
The port ID for control messages was uncorrectly set with broadcast
node ID value, causing message to be dropped on remote side since
not passing packet filtering (cb->dst_port != QRTR_PORT_CTRL).
Fixes: d27e77a3de ("net: qrtr: Reset the node and port ID of broadcast messages")
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dae7a75f1f ]
Currently, iser target support max IO size of 16MiB by default. For some
adapters, allocating this amount of resources might reduce the total
number of possible connections that can be created. For those adapters,
it's preferred to reduce the max IO size to be able to create more
connections. Since there is no handshake procedure for max IO size in iser
protocol, set the default max IO size to 1MiB and add a module parameter
for enabling the option to control it for suitable adapters.
Fixes: 317000b926 ("IB/isert: allocate RW ctxs according to max IO size")
Link: https://lore.kernel.org/r/20201019094628.17202-1-mgurtovoy@nvidia.com
Reported-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 65b709586e upstream.
The get_config callback can be used by the device to fill the
config structure.
The callback will be invoked in vdpasim_get_config() before copying
bytes into caller buffer.
Move vDPA-net config updates from vdpasim_set_features() in the
new vdpasim_net_get_config() callback.
This is safe since in vdpa_get_config() we already check that
.set_features() callback is called before .get_config().
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20201215144256.155342-13-sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f37cbbc651 upstream.
Add new 'config_size' attribute in 'vdpasim_dev_attr' and allocates
'config' dynamically to support any device types.
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20201215144256.155342-12-sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cf1a3b3538 upstream.
As preparation for the next patches, we store the MAC address,
parsed during the vdpasim_create(), in a buffer that will be used
to fill 'config' together with other configurations.
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20201215144256.155342-11-sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6c6e28fe45 upstream.
vdpasim_dev_attr will contain device specific attributes. We starting
moving the number of virtqueues (i.e. nvqs) to vdpasim_dev_attr.
vdpasim_create() creates a new vDPA simulator following the device
attributes defined in the vdpasim_dev_attr parameter.
Co-developed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20201215144256.155342-7-sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 423248d60d upstream.
Add a new attribute that will define the number of virt queues to be
created for the vdpasim device.
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
[sgarzare: replace kmalloc_array() with kcalloc()]
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20201215144256.155342-4-sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit aee9ddb1d3 upstream.
Currently there's a KCOV remote coverage collection section in
__usb_hcd_giveback_urb(). Initially that section was added based on the
assumption that usb_hcd_giveback_urb() can only be called in interrupt
context as indicated by a comment before it. This is what happens when
syzkaller is fuzzing the USB stack via the dummy_hcd driver.
As it turns out, it's actually valid to call usb_hcd_giveback_urb() in task
context, provided that the caller turned off the interrupts; USB/IP does
exactly that. This can lead to a nested KCOV remote coverage collection
sections both trying to collect coverage in task context. This isn't
supported by KCOV, and leads to a WARNING.
Change __usb_hcd_giveback_urb() to only call kcov_remote_*() callbacks
when it's being executed in a softirq. As the result, the coverage from
USB/IP related usb_hcd_giveback_urb() calls won't be collected, but the
WARNING is fixed.
A potential future improvement would be to support nested remote coverage
collection sections, but this patch doesn't address that.
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Marco Elver <elver@google.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Link: https://lore.kernel.org/r/f3a7a153f0719cb53ec385b16e912798bd3e4cf9.1602856358.git.andreyknvl@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cef4cbff06 upstream.
There was a syzbot report with this warning but insufficient information...
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>