Commit graph

858023 commits

Author SHA1 Message Date
Eugene Syromiatnikov
a0eb9abd8a
fork: block invalid exit signals with clone3()
Previously, higher 32 bits of exit_signal fields were lost when copied
to the kernel args structure (that uses int as a type for the respective
field). Moreover, as Oleg has noted, exit_signal is used unchecked, so
it has to be checked for sanity before use; for the legacy syscalls,
applying CSIGNAL mask guarantees that it is at least non-negative;
however, there's no such thing is done in clone3() code path, and that
can break at least thread_group_leader.

This commit adds a check to copy_clone_args_from_user() to verify that
the exit signal is limited by CSIGNAL as with legacy clone() and that
the signal is valid. With this we don't get the legacy clone behavior
were an invalid signal could be handed down and would only be detected
and ignored in do_notify_parent(). Users of clone3() will now get a
proper error when they pass an invalid exit signal. Note, that this is
not user-visible behavior since no kernel with clone3() has been
released yet.

The following program will cause a splat on a non-fixed clone3() version
and will fail correctly on a fixed version:

 #define _GNU_SOURCE
 #include <linux/sched.h>
 #include <linux/types.h>
 #include <sched.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <sys/syscall.h>
 #include <sys/wait.h>
 #include <unistd.h>

 int main(int argc, char *argv[])
 {
        pid_t pid = -1;
        struct clone_args args = {0};
        args.exit_signal = -1;

        pid = syscall(__NR_clone3, &args, sizeof(struct clone_args));
        if (pid < 0)
                exit(EXIT_FAILURE);

        if (pid == 0)
                exit(EXIT_SUCCESS);

        wait(NULL);

        exit(EXIT_SUCCESS);
 }

Fixes: 7f192e3cd3 ("fork: add clone3")
Reported-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Link: https://lore.kernel.org/r/4b38fa4ce420b119a4c6345f42fe3cec2de9b0b5.1568223594.git.esyr@redhat.com
[christian.brauner@ubuntu.com: simplify check and rework commit message]
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2019-09-12 14:56:33 +02:00
Thomas Huth
53936b5bf3 KVM: s390: Do not leak kernel stack data in the KVM_S390_INTERRUPT ioctl
When the userspace program runs the KVM_S390_INTERRUPT ioctl to inject
an interrupt, we convert them from the legacy struct kvm_s390_interrupt
to the new struct kvm_s390_irq via the s390int_to_s390irq() function.
However, this function does not take care of all types of interrupts
that we can inject into the guest later (see do_inject_vcpu()). Since we
do not clear out the s390irq values before calling s390int_to_s390irq(),
there is a chance that we copy random data from the kernel stack which
could be leaked to the userspace later.

Specifically, the problem exists with the KVM_S390_INT_PFAULT_INIT
interrupt: s390int_to_s390irq() does not handle it, and the function
__inject_pfault_init() later copies irq->u.ext which contains the
random kernel stack data. This data can then be leaked either to
the guest memory in __deliver_pfault_init(), or the userspace might
retrieve it directly with the KVM_S390_GET_IRQ_STATE ioctl.

Fix it by handling that interrupt type in s390int_to_s390irq(), too,
and by making sure that the s390irq struct is properly pre-initialized.
And while we're at it, make sure that s390int_to_s390irq() now
directly returns -EINVAL for unknown interrupt types, so that we
immediately get a proper error code in case we add more interrupt
types to do_inject_vcpu() without updating s390int_to_s390irq()
sometime in the future.

Cc: stable@vger.kernel.org
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Link: https://lore.kernel.org/kvm/20190912115438.25761-1-thuth@redhat.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2019-09-12 14:12:21 +02:00
Christophe JAILLET
b456d72412 sctp: Fix the link time qualifier of 'sctp_ctrlsock_exit()'
The '.exit' functions from 'pernet_operations' structure should be marked
as __net_exit, not __net_init.

Fixes: 8e2d61e0ae ("sctp: fix race on protocol/netns initialization")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-12 12:55:28 +01:00
Steffen Klassert
f39b683d35 ixgbe: Fix secpath usage for IPsec TX offload.
The ixgbe driver currently does IPsec TX offloading
based on an existing secpath. However, the secpath
can also come from the RX side, in this case it is
misinterpreted for TX offload and the packets are
dropped with a "bad sa_idx" error. Fix this by using
the xfrm_offload() function to test for TX offload.

Fixes: 5925947047 ("ixgbe: process the Tx ipsec offload")
Reported-by: Michael Marley <michael@michaelmarley.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-12 12:43:14 +01:00
Filipe Manana
18dfa7117a Btrfs: fix unwritten extent buffers and hangs on future writeback attempts
The lock_extent_buffer_io() returns 1 to the caller to tell it everything
went fine and the callers needs to start writeback for the extent buffer
(submit a bio, etc), 0 to tell the caller everything went fine but it does
not need to start writeback for the extent buffer, and a negative value if
some error happened.

When it's about to return 1 it tries to lock all pages, and if a try lock
on a page fails, and we didn't flush any existing bio in our "epd", it
calls flush_write_bio(epd) and overwrites the return value of 1 to 0 or
an error. The page might have been locked elsewhere, not with the goal
of starting writeback of the extent buffer, and even by some code other
than btrfs, like page migration for example, so it does not mean the
writeback of the extent buffer was already started by some other task,
so returning a 0 tells the caller (btree_write_cache_pages()) to not
start writeback for the extent buffer. Note that epd might currently have
either no bio, so flush_write_bio() returns 0 (success) or it might have
a bio for another extent buffer with a lower index (logical address).

Since we return 0 with the EXTENT_BUFFER_WRITEBACK bit set on the
extent buffer and writeback is never started for the extent buffer,
future attempts to writeback the extent buffer will hang forever waiting
on that bit to be cleared, since it can only be cleared after writeback
completes. Such hang is reported with a trace like the following:

  [49887.347053] INFO: task btrfs-transacti:1752 blocked for more than 122 seconds.
  [49887.347059]       Not tainted 5.2.13-gentoo #2
  [49887.347060] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [49887.347062] btrfs-transacti D    0  1752      2 0x80004000
  [49887.347064] Call Trace:
  [49887.347069]  ? __schedule+0x265/0x830
  [49887.347071]  ? bit_wait+0x50/0x50
  [49887.347072]  ? bit_wait+0x50/0x50
  [49887.347074]  schedule+0x24/0x90
  [49887.347075]  io_schedule+0x3c/0x60
  [49887.347077]  bit_wait_io+0x8/0x50
  [49887.347079]  __wait_on_bit+0x6c/0x80
  [49887.347081]  ? __lock_release.isra.29+0x155/0x2d0
  [49887.347083]  out_of_line_wait_on_bit+0x7b/0x80
  [49887.347084]  ? var_wake_function+0x20/0x20
  [49887.347087]  lock_extent_buffer_for_io+0x28c/0x390
  [49887.347089]  btree_write_cache_pages+0x18e/0x340
  [49887.347091]  do_writepages+0x29/0xb0
  [49887.347093]  ? kmem_cache_free+0x132/0x160
  [49887.347095]  ? convert_extent_bit+0x544/0x680
  [49887.347097]  filemap_fdatawrite_range+0x70/0x90
  [49887.347099]  btrfs_write_marked_extents+0x53/0x120
  [49887.347100]  btrfs_write_and_wait_transaction.isra.4+0x38/0xa0
  [49887.347102]  btrfs_commit_transaction+0x6bb/0x990
  [49887.347103]  ? start_transaction+0x33e/0x500
  [49887.347105]  transaction_kthread+0x139/0x15c

So fix this by not overwriting the return value (ret) with the result
from flush_write_bio(). We also need to clear the EXTENT_BUFFER_WRITEBACK
bit in case flush_write_bio() returns an error, otherwise it will hang
any future attempts to writeback the extent buffer, and undo all work
done before (set back EXTENT_BUFFER_DIRTY, etc).

This is a regression introduced in the 5.2 kernel.

Fixes: 2e3c25136a ("btrfs: extent_io: add proper error handling to lock_extent_buffer_for_io()")
Fixes: f4340622e0 ("btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up")
Reported-by: Zdenek Sojka <zsojka@seznam.cz>
Link: https://lore.kernel.org/linux-btrfs/GpO.2yos.3WGDOLpx6t%7D.1TUDYM@seznam.cz/T/#u
Reported-by: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
Link: https://lore.kernel.org/linux-btrfs/5c4688ac-10a7-fb07-70e8-c5d31a3fbb38@profihost.ag/T/#t
Reported-by: Drazen Kacar <drazen.kacar@oradian.com>
Link: https://lore.kernel.org/linux-btrfs/DB8PR03MB562876ECE2319B3E579590F799C80@DB8PR03MB5628.eurprd03.prod.outlook.com/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204377
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-12 13:37:25 +02:00
Filipe Manana
410f954cb1 Btrfs: fix assertion failure during fsync and use of stale transaction
Sometimes when fsync'ing a file we need to log that other inodes exist and
when we need to do that we acquire a reference on the inodes and then drop
that reference using iput() after logging them.

That generally is not a problem except if we end up doing the final iput()
(dropping the last reference) on the inode and that inode has a link count
of 0, which can happen in a very short time window if the logging path
gets a reference on the inode while it's being unlinked.

In that case we end up getting the eviction callback, btrfs_evict_inode(),
invoked through the iput() call chain which needs to drop all of the
inode's items from its subvolume btree, and in order to do that, it needs
to join a transaction at the helper function evict_refill_and_join().
However because the task previously started a transaction at the fsync
handler, btrfs_sync_file(), it has current->journal_info already pointing
to a transaction handle and therefore evict_refill_and_join() will get
that transaction handle from btrfs_join_transaction(). From this point on,
two different problems can happen:

1) evict_refill_and_join() will often change the transaction handle's
   block reserve (->block_rsv) and set its ->bytes_reserved field to a
   value greater than 0. If evict_refill_and_join() never commits the
   transaction, the eviction handler ends up decreasing the reference
   count (->use_count) of the transaction handle through the call to
   btrfs_end_transaction(), and after that point we have a transaction
   handle with a NULL ->block_rsv (which is the value prior to the
   transaction join from evict_refill_and_join()) and a ->bytes_reserved
   value greater than 0. If after the eviction/iput completes the inode
   logging path hits an error or it decides that it must fallback to a
   transaction commit, the btrfs fsync handle, btrfs_sync_file(), gets a
   non-zero value from btrfs_log_dentry_safe(), and because of that
   non-zero value it tries to commit the transaction using a handle with
   a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes
   the transaction commit hit an assertion failure at
   btrfs_trans_release_metadata() because ->bytes_reserved is not zero but
   the ->block_rsv is NULL. The produced stack trace for that is like the
   following:

   [192922.917158] assertion failed: !trans->bytes_reserved, file: fs/btrfs/transaction.c, line: 816
   [192922.917553] ------------[ cut here ]------------
   [192922.917922] kernel BUG at fs/btrfs/ctree.h:3532!
   [192922.918310] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
   [192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: G        W         5.1.4-btrfs-next-47 #1
   [192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
   [192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs]
   (...)
   [192922.920925] RSP: 0018:ffffaebdc8a27da8 EFLAGS: 00010286
   [192922.921315] RAX: 0000000000000051 RBX: ffff95c9c16a41c0 RCX: 0000000000000000
   [192922.921692] RDX: 0000000000000000 RSI: ffff95cab6b16838 RDI: ffff95cab6b16838
   [192922.922066] RBP: ffff95c9c16a41c0 R08: 0000000000000000 R09: 0000000000000000
   [192922.922442] R10: ffffaebdc8a27e70 R11: 0000000000000000 R12: ffff95ca731a0980
   [192922.922820] R13: 0000000000000000 R14: ffff95ca84c73338 R15: ffff95ca731a0ea8
   [192922.923200] FS:  00007f337eda4e80(0000) GS:ffff95cab6b00000(0000) knlGS:0000000000000000
   [192922.923579] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   [192922.923948] CR2: 00007f337edad000 CR3: 00000001e00f6002 CR4: 00000000003606e0
   [192922.924329] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
   [192922.924711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
   [192922.925105] Call Trace:
   [192922.925505]  btrfs_trans_release_metadata+0x10c/0x170 [btrfs]
   [192922.925911]  btrfs_commit_transaction+0x3e/0xaf0 [btrfs]
   [192922.926324]  btrfs_sync_file+0x44c/0x490 [btrfs]
   [192922.926731]  do_fsync+0x38/0x60
   [192922.927138]  __x64_sys_fdatasync+0x13/0x20
   [192922.927543]  do_syscall_64+0x60/0x1c0
   [192922.927939]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
   (...)
   [192922.934077] ---[ end trace f00808b12068168f ]---

2) If evict_refill_and_join() decides to commit the transaction, it will
   be able to do it, since the nested transaction join only increments the
   transaction handle's ->use_count reference counter and it does not
   prevent the transaction from getting committed. This means that after
   eviction completes, the fsync logging path will be using a transaction
   handle that refers to an already committed transaction. What happens
   when using such a stale transaction can be unpredictable, we are at
   least having a use-after-free on the transaction handle itself, since
   the transaction commit will call kmem_cache_free() against the handle
   regardless of its ->use_count value, or we can end up silently losing
   all the updates to the log tree after that iput() in the logging path,
   or using a transaction handle that in the meanwhile was allocated to
   another task for a new transaction, etc, pretty much unpredictable
   what can happen.

In order to fix both of them, instead of using iput() during logging, use
btrfs_add_delayed_iput(), so that the logging path of fsync never drops
the last reference on an inode, that step is offloaded to a safe context
(usually the cleaner kthread).

The assertion failure issue was sporadically triggered by the test case
generic/475 from fstests, which loads the dm error target while fsstress
is running, which lead to fsync failing while logging inodes with -EIO
errors and then trying later to commit the transaction, triggering the
assertion failure.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-12 13:37:19 +02:00
Igor Mammedov
13a17cc052 KVM: s390: kvm_s390_vm_start_migration: check dirty_bitmap before using it as target for memset()
If userspace doesn't set KVM_MEM_LOG_DIRTY_PAGES on memslot before calling
kvm_s390_vm_start_migration(), kernel will oops with:

  Unable to handle kernel pointer dereference in virtual kernel address space
  Failing address: 0000000000000000 TEID: 0000000000000483
  Fault in home space mode while using kernel ASCE.
  AS:0000000002a2000b R2:00000001bff8c00b R3:00000001bff88007 S:00000001bff91000 P:000000000000003d
  Oops: 0004 ilc:2 [#1] SMP
  ...
  Call Trace:
  ([<001fffff804ec552>] kvm_s390_vm_set_attr+0x347a/0x3828 [kvm])
   [<001fffff804ecfc0>] kvm_arch_vm_ioctl+0x6c0/0x1998 [kvm]
   [<001fffff804b67e4>] kvm_vm_ioctl+0x51c/0x11a8 [kvm]
   [<00000000008ba572>] do_vfs_ioctl+0x1d2/0xe58
   [<00000000008bb284>] ksys_ioctl+0x8c/0xb8
   [<00000000008bb2e2>] sys_ioctl+0x32/0x40
   [<000000000175552c>] system_call+0x2b8/0x2d8
  INFO: lockdep is turned off.
  Last Breaking-Event-Address:
   [<0000000000dbaf60>] __memset+0xc/0xa0

due to ms->dirty_bitmap being NULL, which might crash the host.

Make sure that ms->dirty_bitmap is set before using it or
return -EINVAL otherwise.

Cc: <stable@vger.kernel.org>
Fixes: afdad61615 ("KVM: s390: Fix storage attributes migration with memory slots")
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Link: https://lore.kernel.org/kvm/20190911075218.29153-1-imammedo@redhat.com/
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2019-09-12 13:09:17 +02:00
Navid Emamdoost
a21b7f0cff net: qrtr: fix memort leak in qrtr_tun_write_iter
In qrtr_tun_write_iter the allocated kbuf should be release in case of
error or success return.

v2 Update: Thanks to David Miller for pointing out the release on success
path as well.

Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-12 11:58:44 +01:00
Subash Abhinov Kasiviswanathan
10cc514f45 net: Fix null de-reference of device refcount
In event of failure during register_netdevice, free_netdev is
invoked immediately. free_netdev assumes that all the netdevice
refcounts have been dropped prior to it being called and as a
result frees and clears out the refcount pointer.

However, this is not necessarily true as some of the operations
in the NETDEV_UNREGISTER notifier handlers queue RCU callbacks for
invocation after a grace period. The IPv4 callback in_dev_rcu_put
tries to access the refcount after free_netdev is called which
leads to a null de-reference-

44837.761523:   <6> Unable to handle kernel paging request at
                    virtual address 0000004a88287000
44837.761651:   <2> pc : in_dev_finish_destroy+0x4c/0xc8
44837.761654:   <2> lr : in_dev_finish_destroy+0x2c/0xc8
44837.762393:   <2> Call trace:
44837.762398:   <2>  in_dev_finish_destroy+0x4c/0xc8
44837.762404:   <2>  in_dev_rcu_put+0x24/0x30
44837.762412:   <2>  rcu_nocb_kthread+0x43c/0x468
44837.762418:   <2>  kthread+0x118/0x128
44837.762424:   <2>  ret_from_fork+0x10/0x1c

Fix this by waiting for the completion of the call_rcu() in
case of register_netdevice errors.

Fixes: 93ee31f14f ("[NET]: Fix free_netdev on register_netdev failure.")
Cc: Sean Tranchetti <stranche@codeaurora.org>
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-12 11:55:34 +01:00
Christophe JAILLET
d23dbc479a ipv6: Fix the link time qualifier of 'ping_v6_proc_exit_net()'
The '.exit' functions from 'pernet_operations' structure should be marked
as __net_exit, not __net_init.

Fixes: d862e54614 ("net: ipv6: Implement /proc/net/icmp6.")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-12 11:20:33 +01:00
Yang Yingliang
77f22f92df tun: fix use-after-free when register netdev failed
I got a UAF repport in tun driver when doing fuzzy test:

[  466.269490] ==================================================================
[  466.271792] BUG: KASAN: use-after-free in tun_chr_read_iter+0x2ca/0x2d0
[  466.271806] Read of size 8 at addr ffff888372139250 by task tun-test/2699
[  466.271810]
[  466.271824] CPU: 1 PID: 2699 Comm: tun-test Not tainted 5.3.0-rc1-00001-g5a9433db2614-dirty #427
[  466.271833] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[  466.271838] Call Trace:
[  466.271858]  dump_stack+0xca/0x13e
[  466.271871]  ? tun_chr_read_iter+0x2ca/0x2d0
[  466.271890]  print_address_description+0x79/0x440
[  466.271906]  ? vprintk_func+0x5e/0xf0
[  466.271920]  ? tun_chr_read_iter+0x2ca/0x2d0
[  466.271935]  __kasan_report+0x15c/0x1df
[  466.271958]  ? tun_chr_read_iter+0x2ca/0x2d0
[  466.271976]  kasan_report+0xe/0x20
[  466.271987]  tun_chr_read_iter+0x2ca/0x2d0
[  466.272013]  do_iter_readv_writev+0x4b7/0x740
[  466.272032]  ? default_llseek+0x2d0/0x2d0
[  466.272072]  do_iter_read+0x1c5/0x5e0
[  466.272110]  vfs_readv+0x108/0x180
[  466.299007]  ? compat_rw_copy_check_uvector+0x440/0x440
[  466.299020]  ? fsnotify+0x888/0xd50
[  466.299040]  ? __fsnotify_parent+0xd0/0x350
[  466.299064]  ? fsnotify_first_mark+0x1e0/0x1e0
[  466.304548]  ? vfs_write+0x264/0x510
[  466.304569]  ? ksys_write+0x101/0x210
[  466.304591]  ? do_preadv+0x116/0x1a0
[  466.304609]  do_preadv+0x116/0x1a0
[  466.309829]  do_syscall_64+0xc8/0x600
[  466.309849]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  466.309861] RIP: 0033:0x4560f9
[  466.309875] Code: 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
[  466.309889] RSP: 002b:00007ffffa5166e8 EFLAGS: 00000206 ORIG_RAX: 0000000000000127
[  466.322992] RAX: ffffffffffffffda RBX: 0000000000400460 RCX: 00000000004560f9
[  466.322999] RDX: 0000000000000003 RSI: 00000000200008c0 RDI: 0000000000000003
[  466.323007] RBP: 00007ffffa516700 R08: 0000000000000004 R09: 0000000000000000
[  466.323014] R10: 0000000000000000 R11: 0000000000000206 R12: 000000000040cb10
[  466.323021] R13: 0000000000000000 R14: 00000000006d7018 R15: 0000000000000000
[  466.323057]
[  466.323064] Allocated by task 2605:
[  466.335165]  save_stack+0x19/0x80
[  466.336240]  __kasan_kmalloc.constprop.8+0xa0/0xd0
[  466.337755]  kmem_cache_alloc+0xe8/0x320
[  466.339050]  getname_flags+0xca/0x560
[  466.340229]  user_path_at_empty+0x2c/0x50
[  466.341508]  vfs_statx+0xe6/0x190
[  466.342619]  __do_sys_newstat+0x81/0x100
[  466.343908]  do_syscall_64+0xc8/0x600
[  466.345303]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  466.347034]
[  466.347517] Freed by task 2605:
[  466.348471]  save_stack+0x19/0x80
[  466.349476]  __kasan_slab_free+0x12e/0x180
[  466.350726]  kmem_cache_free+0xc8/0x430
[  466.351874]  putname+0xe2/0x120
[  466.352921]  filename_lookup+0x257/0x3e0
[  466.354319]  vfs_statx+0xe6/0x190
[  466.355498]  __do_sys_newstat+0x81/0x100
[  466.356889]  do_syscall_64+0xc8/0x600
[  466.358037]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  466.359567]
[  466.360050] The buggy address belongs to the object at ffff888372139100
[  466.360050]  which belongs to the cache names_cache of size 4096
[  466.363735] The buggy address is located 336 bytes inside of
[  466.363735]  4096-byte region [ffff888372139100, ffff88837213a100)
[  466.367179] The buggy address belongs to the page:
[  466.368604] page:ffffea000dc84e00 refcount:1 mapcount:0 mapping:ffff8883df1b4f00 index:0x0 compound_mapcount: 0
[  466.371582] flags: 0x2fffff80010200(slab|head)
[  466.372910] raw: 002fffff80010200 dead000000000100 dead000000000122 ffff8883df1b4f00
[  466.375209] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[  466.377778] page dumped because: kasan: bad access detected
[  466.379730]
[  466.380288] Memory state around the buggy address:
[  466.381844]  ffff888372139100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  466.384009]  ffff888372139180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  466.386131] >ffff888372139200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  466.388257]                                                  ^
[  466.390234]  ffff888372139280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  466.392512]  ffff888372139300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  466.394667] ==================================================================

tun_chr_read_iter() accessed the memory which freed by free_netdev()
called by tun_set_iff():

        CPUA                                           CPUB
  tun_set_iff()
    alloc_netdev_mqs()
    tun_attach()
                                                  tun_chr_read_iter()
                                                    tun_get()
                                                    tun_do_read()
                                                      tun_ring_recv()
    register_netdevice() <-- inject error
    goto err_detach
    tun_detach_all() <-- set RCV_SHUTDOWN
    free_netdev() <-- called from
                     err_free_dev path
      netdev_freemem() <-- free the memory
                        without check refcount
      (In this path, the refcount cannot prevent
       freeing the memory of dev, and the memory
       will be used by dev_put() called by
       tun_chr_read_iter() on CPUB.)
                                                     (Break from tun_ring_recv(),
                                                     because RCV_SHUTDOWN is set)
                                                   tun_put()
                                                     dev_put() <-- use the memory
                                                                   freed by netdev_freemem()

Put the publishing of tfile->tun after register_netdevice(),
so tun_get() won't get the tun pointer that freed by
err_detach path if register_netdevice() failed.

Fixes: eb0fb363f9 ("tuntap: attach queue 0 before registering netdevice")
Reported-by: Hulk Robot <hulkci@huawei.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-12 11:17:26 +01:00
Linus Torvalds
ad32b4800c virtio: last minute bugfixes
A couple of security things.
 
 And an error handling bugfix that is never encountered by most people,
 but that also makes it kind of safe to push at the last minute, and it
 helps push the fix to stable a bit sooner.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJdeUfXAAoJECgfDbjSjVRpfFoH/2B58dJasWK9CiHlu1Pm9sGZ
 44JOA3M9uqNCev0sXEYXB/ldRW0BK8BgOKv1UJ6Za2bvO0mEz6Go8jr8EUId5kaO
 YSwSscn7Fp1XJgRzJunhJNo/t70zptsDHpeVU68ObP4ubQgSQWLlVMvA9EQOz8M+
 Fq1FCmnLdPpu/8u2dN4bstt5uaUQSCOLOB9Sq0U8qNRuuVnNgwLtDqCw8chZRrfn
 Wl1XGzDp22wtf/Beey8YFB+IGP2BnPNbxNeuPI33SkLM6ZVjR6511lpihfZyN2AF
 Fb+yajOfLGnAftff5Lpn0dXxOOfwe+D20ymKa1rGCf6iVI5ZnZYZavrYmVy7X9k=
 =P20D
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio fixes from Michael Tsirkin:
 "Last minute bugfixes.

  A couple of security things.

  And an error handling bugfix that is never encountered by most people,
  but that also makes it kind of safe to push at the last minute, and it
  helps push the fix to stable a bit sooner"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vhost: make sure log_num < in_num
  vhost: block speculation of translated descriptors
  virtio_ring: fix unmap of indirect descriptors
2019-09-12 11:07:31 +01:00
Linus Torvalds
6dcf6a4eb9 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fix from Ingo Molnar:
 "Fix an initialization bug in the hw-breakpoints, which triggered on
  the ARM platform"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/hw_breakpoint: Fix arch_hw_breakpoint use-before-initialization
2019-09-12 11:04:50 +01:00
Linus Torvalds
95779fe850 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Ingo Molnar:
 "Fix a race in the IRQ resend mechanism, which can result in a NULL
  dereference crash"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Prevent NULL pointer dereference in resend_irqs()
2019-09-12 11:02:00 +01:00
Linus Torvalds
840ce8f807 Pin control fixes for v5.3:
- A single patch for some Aspeed problems. The BMCs are much
   happier now.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEElDRnuGcz/wPCXQWMQRCzN7AZXXMFAl16AN0ACgkQQRCzN7AZ
 XXN5EA/+LSMeYpoLPEi2NbEikj00hl71zyLRCb4X2KZKqqc6Wa7gbp0tTNLAmfX/
 zCPiSF0bsUlKjRrwHsMsPwq9mju4vbgsUILwddEy++CXZQeVRs8eOTuWF8a9a2Ga
 1D3ZLrQHwBl7CiboWnnXsPW7vPvOq+nV4hVMCrernYpYk5FM4SqDs+pqMtpZZP5J
 LW83RhBpjatdoqsQ2ZvUlOOADYALXtbCh9gCCZ9q5DejYTMMkr0EpyUMyC2quDYV
 RB9qpISkprwApjjnLSRe38KoErtrHTyr8mP96Y9b7pR8lQc/4mc+L5NGtiL0Xnk0
 /oC6eolIrDJQvV2lZjAXmfV6KVTS2c6ts9Vpusc8iQfaqnfsQC53/W8XcqpjKfmY
 NpK9rL41veoVWJmnm2wrvRcCpqFrVhqHb875ZhksdbAaXbLVoXo4Mwi8SY7GSAEv
 V4WVcevNqgEk7ip3RAPAMSx36bnx9aFXEWRXRThbh77yuBcsKM7taQIp9KtQPPtQ
 /ZR37N0pFxmOPdaSqX9tjsXw9QWKqy+FPmDlKfR4mh1JCzav5YszgOc5S1NDAsqK
 E3nyLrubyTZiGwpzqtF3sOVVoW9DccRQEpxE3YnwkCSA0OdaOCMpC0b7zlftYQpK
 CDjzgEgUpN1wgu6a1xAkkaVKc7FalmuLYkG4XbIgd0NjsBNfzr4=
 =JzRC
 -----END PGP SIGNATURE-----

Merge tag 'pinctrl-v5.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control fix from Linus Walleij:
 "Hopefully last pin control fix: a single patch for some Aspeed
  problems. The BMCs are much happier now"

* tag 'pinctrl-v5.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: aspeed: Fix spurious mux failures on the AST2500
2019-09-12 10:58:47 +01:00
Dmitry Torokhov
11c43bb022 gpiolib: of: add a fallback for wlf,reset GPIO name
The old Arizona binding did not use -gpio or -gpios suffix, so
devm_gpiod_get() does not work for it. As it is the one of a few users
of devm_gpiod_get_from_of_node() API that I want to remove, I'd rather
have a small quirk in the gpiolib OF handler, and switch Arizona
driver to devm_gpiod_get().

Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Link: https://lore.kernel.org/r/20190911075215.78047-2-dmitry.torokhov@gmail.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2019-09-12 10:29:17 +01:00
Geert Uytterhoeven
c34a024e4e gpio: htc-egpio: Remove unused exported htc_egpio_get_wakeup_irq()
This function was never used upstream, and is a relic of the original
handhelds.org code the htc-egpio driver was based on.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20190910141529.21030-1-geert+renesas@glider.be
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2019-09-12 10:07:44 +01:00
Linus Torvalds
9c09f62348 GPIO fixes for v5.3:
- An ACPI DSDT error fixup of the type we always see and
   Hans invariably gets to fix.
 - A OF quirk fix for the current release (v5.3)
 - Some consistency checks on the userspace ABI.
 - A memory leak.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEElDRnuGcz/wPCXQWMQRCzN7AZXXMFAl15/G8ACgkQQRCzN7AZ
 XXOpKA/8D1yRVDehYXgL5vH9AweXtUN82t2ZKeN5Mu+XiTJ2DMI1tS3lNrgwlnew
 3nJ/o8hgR7O2ZTmNXcQ/tyw6+svCi9bGQjYnhmqVx5u2dyhxjFg5gLJJNT+Mrzw/
 7AnD05oBV4yGtRyscgfr9ODJR8Qy2a9el/5SSm1rAOgwRgtKviKFnv3crasWrS/m
 TavpO/BtQOH8gUByR8sIaOFSMzKYc66Km9CQLdmcpfCOksz3E1lx4MlPghA7Fs7G
 p2M3883mYQrK6Ya2ZUEOwbPG48JPmGHK0rGwTH2GASd0F1tfLxaxmBMdDOQ4KAjx
 V3sWiCyR5RscehB/TWRcDl63o2HyZv6DYggYwDeNPx3Lck3WO+BLhv+RGjhXC46p
 QWU+c0WoZeAKfd3S63N/68mVh/7EpK9ef9Nq/vv2IKxI0THn3ZI6xg70fsgdl7wW
 s/KMOpqmztxmaQIlthzMmQSRnKO9sbv5Zy58SWOA992BNbgbRVjkHjTkdcu6UXJ7
 aURX/k/3/6alMxTBLITWdax+ihjuwpdhX0VVf5qPGh1/WdMq5DAI79bUQAdvrPW6
 Q0YFhQqZNTBzCOWXSXEtUJiSI7a0Wrnf0bKIFXTu7qQ7Fw+ohqliY8qOs+m1hm8f
 6nYrS5JQKoGHbTVxL0/0m83f8OyQGnZiy7FEdBGYNKcSGec5rao=
 =QEAT
 -----END PGP SIGNATURE-----

Merge tag 'gpio-v5.3-6' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull GPIO fixes from Linus Walleij:
 "I don't really like to send so many fixes at the very last minute, but
  the bug-sport activity is unpredictable.

  Four fixes, three are -stable material that will go everywhere, one is
  for the current cycle:

   - An ACPI DSDT error fixup of the type we always see and Hans
     invariably gets to fix.

   - A OF quirk fix for the current release (v5.3)

   - Some consistency checks on the userspace ABI.

   - A memory leak"

* tag 'gpio-v5.3-6' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpiolib: acpi: Add gpiolib_acpi_run_edge_events_on_boot option and blacklist
  gpiolib: of: fix fallback quirks handling
  gpio: fix line flag validation in lineevent_create
  gpio: fix line flag validation in linehandle_create
  gpio: mockup: add missing single_release()
2019-09-12 09:53:38 +01:00
Andrew Jeffery
c1432423a1 pinctrl: aspeed: Fix spurious mux failures on the AST2500
Commit 674fa8daa8 ("pinctrl: aspeed-g5: Delay acquisition of regmaps")
was determined to be a partial fix to the problem of acquiring the LPC
Host Controller and GFX regmaps: The AST2500 pin controller may need to
fetch syscon regmaps during expression evaluation as well as when
setting mux state. For example, this case is hit by attempting to export
pins exposing the LPC Host Controller as GPIOs.

An optional eval() hook is added to the Aspeed pinmux operation struct
and called from aspeed_sig_expr_eval() if the pointer is set by the
SoC-specific driver. This enables the AST2500 to perform the custom
action of acquiring its regmap dependencies as required.

John Wang tested the fix on an Inspur FP5280G2 machine (AST2500-based)
where the issue was found, and I've booted the fix on Witherspoon
(AST2500) and Palmetto (AST2400) machines, and poked at relevant pins
under QEMU by forcing mux configurations via devmem before exporting
GPIOs to exercise the driver.

Fixes: 7d29ed88ac ("pinctrl: aspeed: Read and write bits in LPC and GFX controllers")
Fixes: 674fa8daa8 ("pinctrl: aspeed-g5: Delay acquisition of regmaps")
Reported-by: John Wang <wangzqbj@inspur.com>
Tested-by: John Wang <wangzqbj@inspur.com>
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>

Link: https://lore.kernel.org/r/20190829071738.2523-1-andrew@aj.id.au
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2019-09-12 00:08:27 +01:00
David S. Miller
13d5231cc0 Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue
Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2019-09-11

This series contains fixes to ixgbe.

Alex fixes up the adaptive ITR scheme for ixgbe which could result in a
value that was either 0 or something less than 10 which was causing
issues with hardware features, like RSC, that do not function well with
ITR values that low.

Ilya Maximets fixes the ixgbe driver to limit the number of transmit
descriptors to clean by the number of transmit descriptors used in the
transmit ring, so that the driver does not try to "double" clean the
same descriptors.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-12 00:05:52 +01:00
Saiyam Doshi
2efc6bfadb gpio: remove explicit comparison with 0
No need to compare return value with 0. In case of non-zero
return value, the if condition will be true.

This makes intent a bit more clear to the reader.
"if (x) then", compared to "if (x is not zero) then".

Signed-off-by: Saiyam Doshi <saiyamdoshi.in@gmail.com>
Link: https://lore.kernel.org/r/20190907173910.GA9547@SD
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2019-09-12 00:05:29 +01:00
Neal Cardwell
af38d07ed3 tcp: fix tcp_ecn_withdraw_cwr() to clear TCP_ECN_QUEUE_CWR
Fix tcp_ecn_withdraw_cwr() to clear the correct bit:
TCP_ECN_QUEUE_CWR.

Rationale: basically, TCP_ECN_DEMAND_CWR is a bit that is purely about
the behavior of data receivers, and deciding whether to reflect
incoming IP ECN CE marks as outgoing TCP th->ece marks. The
TCP_ECN_QUEUE_CWR bit is purely about the behavior of data senders,
and deciding whether to send CWR. The tcp_ecn_withdraw_cwr() function
is only called from tcp_undo_cwnd_reduction() by data senders during
an undo, so it should zero the sender-side state,
TCP_ECN_QUEUE_CWR. It does not make sense to stop the reflection of
incoming CE bits on incoming data packets just because outgoing
packets were spuriously retransmitted.

The bug has been reproduced with packetdrill to manifest in a scenario
with RFC3168 ECN, with an incoming data packet with CE bit set and
carrying a TCP timestamp value that causes cwnd undo. Before this fix,
the IP CE bit was ignored and not reflected in the TCP ECE header bit,
and sender sent a TCP CWR ('W') bit on the next outgoing data packet,
even though the cwnd reduction had been undone.  After this fix, the
sender properly reflects the CE bit and does not set the W bit.

Note: the bug actually predates 2005 git history; this Fixes footer is
chosen to be the oldest SHA1 I have tested (from Sep 2007) for which
the patch applies cleanly (since before this commit the code was in a
.h file).

Fixes: bdf1ee5d3b ("[TCP]: Move code from tcp_ecn.h to tcp*.c and tcp.h & remove it")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-11 23:53:18 +01:00
yongduan
060423bfde vhost: make sure log_num < in_num
The code assumes log_num < in_num everywhere, and that is true as long as
in_num is incremented by descriptor iov count, and log_num by 1. However
this breaks if there's a zero sized descriptor.

As a result, if a malicious guest creates a vring desc with desc.len = 0,
it may cause the host kernel to crash by overflowing the log array. This
bug can be triggered during the VM migration.

There's no need to log when desc.len = 0, so just don't increment log_num
in this case.

Fixes: 3a4d5c94e9 ("vhost_net: a kernel-level virtio server")
Cc: stable@vger.kernel.org
Reviewed-by: Lidong Chen <lidongchen@tencent.com>
Signed-off-by: ruippan <ruippan@tencent.com>
Signed-off-by: yongduan <yongduan@tencent.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-09-11 15:15:26 -04:00
Michael S. Tsirkin
a89db445fb vhost: block speculation of translated descriptors
iovec addresses coming from vhost are assumed to be
pre-validated, but in fact can be speculated to a value
out of range.

Userspace address are later validated with array_index_nospec so we can
be sure kernel info does not leak through these addresses, but vhost
must also not leak userspace info outside the allowed memory table to
guests.

Following the defence in depth principle, make sure
the address is not validated out of node range.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: stable@vger.kernel.org
Acked-by: Jason Wang <jasowang@redhat.com>
Tested-by: Jason Wang <jasowang@redhat.com>
2019-09-11 15:15:07 -04:00
Ilya Maximets
bf280c0387 ixgbe: fix double clean of Tx descriptors with xdp
Tx code doesn't clear the descriptors' status after cleaning.
So, if the budget is larger than number of used elems in a ring, some
descriptors will be accounted twice and xsk_umem_complete_tx will move
prod_tail far beyond the prod_head breaking the completion queue ring.

Fix that by limiting the number of descriptors to clean by the number
of used descriptors in the Tx ring.

'ixgbe_clean_xdp_tx_irq()' function refactored to look more like
'ixgbe_xsk_clean_tx_ring()' since we're allowed to directly use
'next_to_clean' and 'next_to_use' indexes.

CC: stable@vger.kernel.org
Fixes: 8221c5eba8 ("ixgbe: add AF_XDP zero-copy Tx support")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Tested-by: William Tu <u9012063@gmail.com>
Tested-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-09-11 09:42:18 -07:00
Alexander Duyck
377228accb ixgbe: Prevent u8 wrapping of ITR value to something less than 10us
There were a couple cases where the ITR value generated via the adaptive
ITR scheme could exceed 126. This resulted in the value becoming either 0
or something less than 10. Switching back and forth between a value less
than 10 and a value greater than 10 can cause issues as certain hardware
features such as RSC to not function well when the ITR value has dropped
that low.

CC: stable@vger.kernel.org
Fixes: b4ded8327f ("ixgbe: Update adaptive ITR algorithm")
Reported-by: Gregg Leventhal <gleventhal@janestreet.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-09-11 09:39:35 -07:00
Mark Brown
c4ad85026d
Merge branch 'regulator-5.4' into regulator-next 2019-09-11 16:00:19 +01:00
Mark Brown
d440c4efe4
Merge branch 'regulator-5.3' into regulator-linus 2019-09-11 16:00:17 +01:00
Lukas Wunner
2b8279aec1
spi: bcm2835: Speed up RX-only DMA transfers by zero-filling TX FIFO
The BCM2835 SPI driver currently sets the SPI_CONTROLLER_MUST_TX flag.
When performing an RX-only transfer, this flag causes the SPI core to
allocate and DMA-map a dummy buffer which is copied to the TX FIFO.
The dummy buffer is necessary because the chip is not capable of
automatically clocking out null bytes.

Avoid the overhead induced by the dummy buffer by preallocating a
reusable DMA transaction which fills the TX FIFO by cyclically copying
from the zero page.  The transaction requires very little CPU time to
submit and generates no interrupts while running.  Specifics are
provided in kerneldoc comments.

[Nathan Chancellor contributed a DMA mapping fixup for an early version
of this commit, hence his Signed-off-by.]

Tested-by: Nuno Sá <nuno.sa@analog.com>
Tested-by: Noralf Trønnes <noralf@tronnes.org>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Stefan Wahren <wahrenst@gmx.net>
Acked-by: Martin Sperl <kernel@martin.sperl.org>
Cc: Robert Jarzmik <robert.jarzmik@free.fr>
Link: https://lore.kernel.org/r/f45920af18dbf06e34129bbc406f53dc9c5d1075.1568187525.git.lukas@wunner.de
Signed-off-by: Mark Brown <broonie@kernel.org>
2019-09-11 15:57:46 +01:00
Lukas Wunner
8259bf667a
spi: bcm2835: Speed up TX-only DMA transfers by clearing RX FIFO
The BCM2835 SPI driver currently sets the SPI_CONTROLLER_MUST_RX flag.
When performing a TX-only transfer, this flag causes the SPI core to
allocate and DMA-map a dummy buffer into which the RX FIFO contents are
copied.  The dummy buffer is necessary because the chip is not capable
of disabling the receiver or automatically throwing away received data.
Not reading the RX FIFO isn't an option either since transmission is
halted once it's full.

Avoid the overhead induced by the dummy buffer by preallocating a
reusable DMA transaction which cyclically clears the RX FIFO.  The
transaction requires very little CPU time to submit and generates no
interrupts while running.  Specifics are provided in kerneldoc comments.

With a ks8851 Ethernet chip attached to the SPI controller, I am seeing
a 30 us reduction in ping time with this commit (1.819 ms vs. 1.849 ms,
average of 100,000 packets) as well as a 2% reduction in CPU time
(75:08 vs. 76:39 for transmission of 5 GByte over the SPI bus).

The commit uses the TX DMA interrupt to signal completion of a transfer.
This interrupt is raised once all bytes have been written to the
TX FIFO and it is then necessary to busy-wait for the TX FIFO to become
empty before the transfer can be finalized.  As an alternative approach,
I have explored using the SPI controller's DONE interrupt to detect
completion.  This interrupt is signaled when the TX FIFO becomes empty,
avoiding the need to busy-wait.  However latency deteriorates compared
to the present commit and surprisingly, CPU time is slightly higher as
well:

It turns out that in 45% of the cases, no busy-waiting is needed at all
and in 76% of the cases, less than 10 busy-wait iterations are
sufficient for the TX FIFO to drain.  This was measured on an RT kernel.
On a vanilla kernel, wakeup latency is worse and thus fewer iterations
are needed.  The measurements were made with an SPI clock of 20 MHz,
they may differ slightly for slower or faster clock speeds.

Previously we always used the RX DMA interrupt to signal completion of a
transfer.  Using the TX DMA interrupt now introduces a race condition:
TX DMA is always started before RX DMA so that bytes are already clocked
out while RX DMA is still being set up.  But if a TX-only transfer is
very short, then the TX DMA interrupt may occur before RX DMA is set up.
If the interrupt happens to occur on the same CPU, setup of RX DMA may
even be delayed until after the interrupt was handled.

I've solved this by having the TX DMA callback clear the RX FIFO while
busy-waiting for the TX FIFO to drain, thus avoiding a dependency on
setup of RX DMA.  Additionally, I am using a lock-free mechanism with
two flags, tx_dma_active and rx_dma_active plus memory barriers to
terminate RX DMA either by the TX DMA callback or immediately after
setting it up, whichever wins the race.  I've explored an alternative
approach which temporarily disables the TX DMA callback until RX DMA
has been set up (using tasklet_disable(), local_bh_disable() or
local_irq_save()), but the performance was minimally worse.

[Nathan Chancellor contributed a DMA mapping fixup for an early version
of this commit, hence his Signed-off-by.]

Tested-by: Nuno Sá <nuno.sa@analog.com>
Tested-by: Noralf Trønnes <noralf@tronnes.org>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Stefan Wahren <wahrenst@gmx.net>
Acked-by: Martin Sperl <kernel@martin.sperl.org>
Cc: Robert Jarzmik <robert.jarzmik@free.fr>
Link: https://lore.kernel.org/r/874949385f28251e2dcaa9494e39a27b50e9f9e4.1568187525.git.lukas@wunner.de
Signed-off-by: Mark Brown <broonie@kernel.org>
2019-09-11 15:57:30 +01:00
Lukas Wunner
bf75703d09
dmaengine: bcm2835: Avoid accessing memory when copying zeroes
The BCM2835 DMA controller is capable of synthesizing zeroes instead of
copying them from a source address. The feature is enabled by setting
the SRC_IGNORE bit in the Transfer Information field of a Control Block:

"Do not perform source reads.
 In addition, destination writes will zero all the write strobes.
 This is used for fast cache fill operations."
https://www.raspberrypi.org/app/uploads/2012/02/BCM2835-ARM-Peripherals.pdf

The feature is only available on 8 of the 16 channels. The others are
so-called "lite" channels with a limited feature set and performance.

Enable the feature if a cyclic transaction copies from the zero page.
This reduces traffic on the memory bus.

A forthcoming use case is the BCM2835 SPI driver, which will cyclically
copy from the zero page to the TX FIFO. The idea to use SRC_IGNORE was
taken from an ancient GitHub conversation between Martin and Noralf:
https://github.com/msperl/spi-bcm2835/issues/13#issuecomment-98180451

Tested-by: Nuno Sá <nuno.sa@analog.com>
Tested-by: Noralf Trønnes <noralf@tronnes.org>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Vinod Koul <vkoul@kernel.org>
Acked-by: Stefan Wahren <wahrenst@gmx.net>
Acked-by: Martin Sperl <kernel@martin.sperl.org>
Cc: Florian Kauer <florian.kauer@koalo.de>
Link: https://lore.kernel.org/r/b2286c904408745192e4beb3de3c88f73e4a7210.1568187525.git.lukas@wunner.de
Signed-off-by: Mark Brown <broonie@kernel.org>
2019-09-11 15:56:46 +01:00
Lukas Wunner
571e31fa60
spi: bcm2835: Cache CS register value for ->prepare_message()
The BCM2835 SPI driver needs to set up the clock polarity in its
->prepare_message() hook before spi_transfer_one_message() asserts chip
select to avoid a gratuitous clock signal edge (cf. commit acace73df2
("spi: bcm2835: set up spi-mode before asserting cs-gpio")).

Precalculate the CS register value (which selects the clock polarity)
once in ->setup() and use that cached value in ->prepare_message() and
->transfer_one().  This avoids one MMIO read per message and one per
transfer, yielding a small latency improvement.  Additionally, a
forthcoming commit will use the precalculated value to derive the
register value for clearing the RX FIFO, which will eliminate the need
for an RX dummy buffer when performing TX-only DMA transfers.

Tested-by: Nuno Sá <nuno.sa@analog.com>
Tested-by: Noralf Trønnes <noralf@tronnes.org>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Stefan Wahren <wahrenst@gmx.net>
Acked-by: Martin Sperl <kernel@martin.sperl.org>
Link: https://lore.kernel.org/r/d17c1d7fcdc97fffa961b8737cfd80eeb14f9416.1568187525.git.lukas@wunner.de
Signed-off-by: Mark Brown <broonie@kernel.org>
2019-09-11 15:56:30 +01:00
Lukas Wunner
c3ef820783
dmaengine: bcm2835: Document struct bcm2835_dmadev
Document the BCM2835 DMA driver's device data structure so that upcoming
commits may add further members with proper kerneldoc.

Tested-by: Nuno Sá <nuno.sa@analog.com>
Tested-by: Noralf Trønnes <noralf@tronnes.org>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Vinod Koul <vkoul@kernel.org>
Acked-by: Stefan Wahren <wahrenst@gmx.net>
Acked-by: Martin Sperl <kernel@martin.sperl.org>
Cc: Florian Kauer <florian.kauer@koalo.de>
Link: https://lore.kernel.org/r/78648f80f67d97bb7beecc1b9be6b6e4a45bc1d8.1568187525.git.lukas@wunner.de
Signed-off-by: Mark Brown <broonie@kernel.org>
2019-09-11 15:53:27 +01:00
Lukas Wunner
229e6af102
spi: Guarantee cacheline alignment of driver-private data
__spi_alloc_controller() uses a single allocation to accommodate struct
spi_controller and the driver-private data, but places the latter behind
the former.  This order does not guarantee cacheline alignment of the
driver-private data.  (It does guarantee cacheline alignment of struct
spi_controller but the structure doesn't make any use of that property.)

Round up struct spi_controller to cacheline size.  A forthcoming commit
leverages this to grant DMA access to driver-private data of the BCM2835
SPI master.

An alternative, less economical approach would be to use two allocations.

A third approach consists of reversing the order to conserve memory.
But Mark Brown is concerned that it may result in a performance penalty
on architectures that don't like unaligned accesses.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Link: https://lore.kernel.org/r/01625b9b26b93417fb09d2c15ad02dfe9cdbbbe5.1568187525.git.lukas@wunner.de
Signed-off-by: Mark Brown <broonie@kernel.org>
2019-09-11 15:53:11 +01:00
Lukas Wunner
6f6869dc97
dmaengine: bcm2835: Allow reusable descriptors
The DMA engine API requires DMA drivers to explicitly allow that
descriptors are prepared once and reused multiple times. Only a
single driver makes use of this functionality so far (pxa_dma.c,
to speed up pxa_camera.c).

We're about to add another use case for reusable descriptors in
the BCM2835 SPI driver, so allow that in the BCM2835 DMA driver.

Tested-by: Nuno Sá <nuno.sa@analog.com>
Tested-by: Noralf Trønnes <noralf@tronnes.org>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Vinod Koul <vkoul@kernel.org>
Acked-by: Stefan Wahren <wahrenst@gmx.net>
Acked-by: Martin Sperl <kernel@martin.sperl.org>
Cc: Florian Kauer <florian.kauer@koalo.de>
Cc: Robert Jarzmik <robert.jarzmik@free.fr>
Link: https://lore.kernel.org/r/bfc98a38225bbec4158440ad06cb9eee675e3e6f.1568187525.git.lukas@wunner.de
Signed-off-by: Mark Brown <broonie@kernel.org>
2019-09-11 15:53:02 +01:00
Lukas Wunner
4f2228cce2
dmaengine: bcm2835: Allow cyclic transactions without interrupt
The BCM2835 DMA driver currently requests an interrupt from the
controller regardless whether or not the client has passed in the
DMA_PREP_INTERRUPT flag. This causes unnecessary overhead for cyclic
transactions which do not need an interrupt after each period.

We're about to add such a use case, namely cyclic clearing of the SPI
controller's RX FIFO, so amend the DMA driver to request an interrupt
only if DMA_PREP_INTERRUPT was passed in. Ignore the period_len for
such transactions and set it to the buffer length to make the driver's
calculations work.

Tested-by: Nuno Sá <nuno.sa@analog.com>
Tested-by: Noralf Trønnes <noralf@tronnes.org>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Vinod Koul <vkoul@kernel.org>
Acked-by: Stefan Wahren <wahrenst@gmx.net>
Acked-by: Martin Sperl <kernel@martin.sperl.org>
Cc: Florian Kauer <florian.kauer@koalo.de>
Link: https://lore.kernel.org/r/73cf37be56eb4cbe6f696057c719f3a38cbaf26e.1568187525.git.lukas@wunner.de
Signed-off-by: Mark Brown <broonie@kernel.org>
2019-09-11 15:52:53 +01:00
Lukas Wunner
1513ceee70
spi: bcm2835: Drop dma_pending flag
The BCM2835 SPI driver uses a flag to keep track of whether a DMA
transfer is in progress.

The flag is used to avoid terminating DMA channels multiple times if a
transfer finishes orderly while simultaneously the SPI core invokes the
->handle_err() callback because the transfer took too long.  However
terminating DMA channels multiple times is perfectly fine, so the flag
is unnecessary for this particular purpose.

The flag is also used to avoid invoking bcm2835_spi_undo_prologue()
multiple times under this race condition.  However multiple *concurrent*
invocations can no longer happen since commit 2527704d84 ("spi:
bcm2835: Synchronize with callback on DMA termination") because the
->handle_err() callback now uses the _sync() variant when terminating
DMA channels.

The only raison d'être of the flag is therefore that
bcm2835_spi_undo_prologue() cannot cope with multiple *sequential*
invocations.  Achieve that by setting tx_prologue to 0 at the end of
the function.  Subsequent invocations thus become no-ops.

With that, the dma_pending flag becomes unnecessary, so drop it.

Tested-by: Nuno Sá <nuno.sa@analog.com>
Tested-by: Noralf Trønnes <noralf@tronnes.org>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Stefan Wahren <wahrenst@gmx.net>
Acked-by: Martin Sperl <kernel@martin.sperl.org>
Link: https://lore.kernel.org/r/062b03b7f86af77a13ce0ec3b22e0bdbfcfba10d.1568187525.git.lukas@wunner.de
Signed-off-by: Mark Brown <broonie@kernel.org>
2019-09-11 15:52:33 +01:00
Colin Ian King
f4b752a6b2 mlx4: fix spelling mistake "veify" -> "verify"
There is a spelling mistake in a mlx4_err error message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-11 15:20:04 +01:00
Colin Ian King
c3dc1fa722 net: hns3: fix spelling mistake "undeflow" -> "underflow"
There is a spelling mistake in a .msg literal string. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-11 15:17:00 +01:00
Colin Ian King
b93fb20f01 net: lmc: fix spelling mistake "runnin" -> "running"
There is a spelling mistake in the lmc_trace message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-11 15:11:59 +01:00
Colin Ian King
90aa11f1bc NFC: st95hf: fix spelling mistake "receieve" -> "receive"
There is a spelling mistake in a dev_err message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-11 15:07:07 +01:00
Ka-Cheong Poon
c5c1a030a7 net/rds: An rds_sock is added too early to the hash table
In rds_bind(), an rds_sock is added to the RDS bind hash table before
rs_transport is set.  This means that the socket can be found by the
receive code path when rs_transport is NULL.  And the receive code
path de-references rs_transport for congestion update check.  This can
cause a panic.  An rds_sock should not be added to the bind hash table
before all the needed fields are set.

Reported-by: syzbot+4b4f8163c2e246df3c4c@syzkaller.appspotmail.com
Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-11 15:05:40 +01:00
Jouni Malinen
3e493173b7 mac80211: Do not send Layer 2 Update frame before authorization
The Layer 2 Update frame is used to update bridges when a station roams
to another AP even if that STA does not transmit any frames after the
reassociation. This behavior was described in IEEE Std 802.11F-2003 as
something that would happen based on MLME-ASSOCIATE.indication, i.e.,
before completing 4-way handshake. However, this IEEE trial-use
recommended practice document was published before RSN (IEEE Std
802.11i-2004) and as such, did not consider RSN use cases. Furthermore,
IEEE Std 802.11F-2003 was withdrawn in 2006 and as such, has not been
maintained amd should not be used anymore.

Sending out the Layer 2 Update frame immediately after association is
fine for open networks (and also when using SAE, FT protocol, or FILS
authentication when the station is actually authenticated by the time
association completes). However, it is not appropriate for cases where
RSN is used with PSK or EAP authentication since the station is actually
fully authenticated only once the 4-way handshake completes after
authentication and attackers might be able to use the unauthenticated
triggering of Layer 2 Update frame transmission to disrupt bridge
behavior.

Fix this by postponing transmission of the Layer 2 Update frame from
station entry addition to the point when the station entry is marked
authorized. Similarly, send out the VLAN binding update only if the STA
entry has already been authorized.

Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-11 14:59:26 +01:00
Daniel Drake
49baa01c8b Revert "mmc: sdhci: Remove unneeded quirk2 flag of O2 SD host controller"
This reverts commit 414126f9e5.

This commit broke eMMC storage access on a new consumer MiniPC based on
AMD SoC, which has eMMC connected to:

02:00.0 SD Host controller: O2 Micro, Inc. Device 8620 (rev 01) (prog-if 01)
	Subsystem: O2 Micro, Inc. Device 0002

During probe, several errors are seen including:

  mmc1: Got data interrupt 0x02000000 even though no data operation was in progress.
  mmc1: Timeout waiting for hardware interrupt.
  mmc1: error -110 whilst initialising MMC card

Reverting this commit allows the eMMC storage to be detected & usable
again.

Signed-off-by: Daniel Drake <drake@endlessm.com>
Fixes: 414126f9e5 ("mmc: sdhci: Remove unneeded quirk2 flag of O2 SD host
controller")
Cc: stable@vger.kernel.org # v5.1+
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2019-09-11 15:57:21 +02:00
Stefan Wahren
aea64b5836 Revert "mmc: bcm2835: Terminate timeout work synchronously"
The commit 37fefadee8 ("mmc: bcm2835: Terminate timeout work
synchronously") causes lockups in case of hardware timeouts due the
timeout work also calling cancel_delayed_work_sync() on its own.
So revert it.

Fixes: 37fefadee8 ("mmc: bcm2835: Terminate timeout work synchronously")
Cc: stable@vger.kernel.org
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2019-09-11 15:57:21 +02:00
YueHaibing
aba30f6f31 gpio: creg-snps: use devm_platform_ioremap_resource() to simplify code
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20190906131032.22148-1-yuehaibing@huawei.com
Acked-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2019-09-11 14:50:02 +01:00
Geert Uytterhoeven
ac57199180 gpio: devres: Switch to EXPORT_SYMBOL_GPL()
Change all exported symbols for managed GPIO functions from
EXPORT_SYMBOL() to EXPORT_SYMBOL_GPL(), like is used for their
non-managed counterparts.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20190906084539.21838-5-geert+renesas@glider.be
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2019-09-11 14:47:39 +01:00
Geert Uytterhoeven
6d6624554d gpio: of: Switch to EXPORT_SYMBOL_GPL()
All exported functions provide genuine Linux-specific functionality.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20190906084539.21838-4-geert+renesas@glider.be
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2019-09-11 14:46:52 +01:00
Geert Uytterhoeven
b0c7e73b51 gpio: of: Make of_gpio_simple_xlate() private
Since commit 9a95e8d25a ("gpio: remove etraxfs driver"), there are
no more users of of_gpio_simple_xlate() outside gpiolib-of.c.
All GPIO drivers that need it now rely on of_gpiochip_add() setting it
up as the default translate function.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20190906084539.21838-3-geert+renesas@glider.be
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2019-09-11 14:46:02 +01:00
Geert Uytterhoeven
c83d3c7733 gpio: of: Make of_get_named_gpiod_flags() private
Since commit f626d6dfb7 ("gpio: of: Break out OF-only code"),
there are no more users of of_get_named_gpiod_flags() outside
gpiolib-of.c.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20190906084539.21838-2-geert+renesas@glider.be
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2019-09-11 14:45:01 +01:00