Go to file
John Fastabend a454d84ee2 bpf, sockmap: Fix skb refcnt race after locking changes
There is a race where skb's from the sk_psock_backlog can be referenced
after userspace side has already skb_consumed() the sk_buff and its refcnt
dropped to zer0 causing use after free.

The flow is the following:

  while ((skb = skb_peek(&psock->ingress_skb))
    sk_psock_handle_Skb(psock, skb, ..., ingress)
    if (!ingress) ...
    sk_psock_skb_ingress
       sk_psock_skb_ingress_enqueue(skb)
          msg->skb = skb
          sk_psock_queue_msg(psock, msg)
    skb_dequeue(&psock->ingress_skb)

The sk_psock_queue_msg() puts the msg on the ingress_msg queue. This is
what the application reads when recvmsg() is called. An application can
read this anytime after the msg is placed on the queue. The recvmsg hook
will also read msg->skb and then after user space reads the msg will call
consume_skb(skb) on it effectively free'ing it.

But, the race is in above where backlog queue still has a reference to
the skb and calls skb_dequeue(). If the skb_dequeue happens after the
user reads and free's the skb we have a use after free.

The !ingress case does not suffer from this problem because it uses
sendmsg_*(sk, msg) which does not pass the sk_buff further down the
stack.

The following splat was observed with 'test_progs -t sockmap_listen':

  [ 1022.710250][ T2556] general protection fault, ...
  [...]
  [ 1022.712830][ T2556] Workqueue: events sk_psock_backlog
  [ 1022.713262][ T2556] RIP: 0010:skb_dequeue+0x4c/0x80
  [ 1022.713653][ T2556] Code: ...
  [...]
  [ 1022.720699][ T2556] Call Trace:
  [ 1022.720984][ T2556]  <TASK>
  [ 1022.721254][ T2556]  ? die_addr+0x32/0x80^M
  [ 1022.721589][ T2556]  ? exc_general_protection+0x25a/0x4b0
  [ 1022.722026][ T2556]  ? asm_exc_general_protection+0x22/0x30
  [ 1022.722489][ T2556]  ? skb_dequeue+0x4c/0x80
  [ 1022.722854][ T2556]  sk_psock_backlog+0x27a/0x300
  [ 1022.723243][ T2556]  process_one_work+0x2a7/0x5b0
  [ 1022.723633][ T2556]  worker_thread+0x4f/0x3a0
  [ 1022.723998][ T2556]  ? __pfx_worker_thread+0x10/0x10
  [ 1022.724386][ T2556]  kthread+0xfd/0x130
  [ 1022.724709][ T2556]  ? __pfx_kthread+0x10/0x10
  [ 1022.725066][ T2556]  ret_from_fork+0x2d/0x50
  [ 1022.725409][ T2556]  ? __pfx_kthread+0x10/0x10
  [ 1022.725799][ T2556]  ret_from_fork_asm+0x1b/0x30
  [ 1022.726201][ T2556]  </TASK>

To fix we add an skb_get() before passing the skb to be enqueued in the
engress queue. This bumps the skb->users refcnt so that consume_skb()
and kfree_skb will not immediately free the sk_buff. With this we can
be sure the skb is still around when we do the dequeue. Then we just
need to decrement the refcnt or free the skb in the backlog case which
we do by calling kfree_skb() on the ingress case as well as the sendmsg
case.

Before locking change from fixes tag we had the sock locked so we
couldn't race with user and there was no issue here.

Fixes: 799aa7f98d ("skmsg: Avoid lock_sock() in sk_psock_backlog()")
Reported-by: Jiri Olsa  <jolsa@kernel.org>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Xu Kuohai <xukuohai@huawei.com>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20230901202137.214666-1-john.fastabend@gmail.com
2023-09-04 09:53:35 +02:00
Documentation docs/bpf: Fix "file doesn't exist" warnings in {llvm_reloc,btf}.rst 2023-09-01 15:56:48 +02:00
LICENSES LICENSES: Add the copyleft-next-0.3.1 license 2022-11-08 15:44:01 +01:00
arch Networking changes for 6.6. 2023-08-29 11:33:01 -07:00
block v6.6-vfs.super 2023-08-28 11:04:18 -07:00
certs certs: Reference revocation list for all keyrings 2023-08-17 20:12:41 +00:00
crypto This update includes the following changes: 2023-08-29 11:23:29 -07:00
drivers sfc: check for zero length in EF10 RX prefix 2023-09-01 08:14:57 +01:00
fs pstore updates for v6.6-rc1 2023-08-28 12:36:04 -07:00
include ipv6: ignore dst hint for multipath routes 2023-09-01 08:11:51 +01:00
init Scheduler changes for v6.6: 2023-08-28 16:43:39 -07:00
io_uring New code for 6.6: 2023-08-28 11:59:52 -07:00
ipc mqueue: convert to ctime accessor functions 2023-07-24 10:30:07 +02:00
kernel netfilter: nf_tables: Audit log rule reset 2023-08-31 01:29:28 +02:00
lib Networking changes for 6.6. 2023-08-29 11:33:01 -07:00
mm Networking changes for 6.6. 2023-08-29 11:33:01 -07:00
net bpf, sockmap: Fix skb refcnt race after locking changes 2023-09-04 09:53:35 +02:00
rust Rust changes for v6.6 2023-08-29 08:19:46 -07:00
samples Networking changes for 6.6. 2023-08-29 11:33:01 -07:00
scripts bpf, docs: Fix invalid escape sequence warnings in bpf_doc.py 2023-08-31 13:56:31 +02:00
security Networking changes for 6.6. 2023-08-29 11:33:01 -07:00
sound regmap: Updates for v6.6 2023-08-29 09:26:04 -07:00
tools selftests/bpf: Fix a CI failure caused by vsock write 2023-09-01 10:56:00 +02:00
usr initramfs: Encode dependency on KBUILD_BUILD_TIMESTAMP 2023-06-06 17:54:49 +09:00
virt KVM: Grab a reference to KVM for VM and vCPU stats file descriptors 2023-07-29 11:05:28 -04:00
.clang-format iommu: Add for_each_group_device() 2023-05-23 08:15:51 +02:00
.cocciconfig
.get_maintainer.ignore get_maintainer: add Alan to .get_maintainer.ignore 2022-08-20 15:17:44 -07:00
.gitattributes .gitattributes: set diff driver for Rust source code files 2023-05-31 17:48:25 +02:00
.gitignore Revert ".gitignore: ignore *.cover and *.mbx" 2023-07-04 15:05:12 -07:00
.mailmap mailmap: add entries for Simon Horman 2023-08-16 09:53:10 +01:00
.rustfmt.toml rust: add `.rustfmt.toml` 2022-09-28 09:02:20 +02:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS - Address -Wmissing-prototype warnings 2023-06-26 16:43:54 -07:00
Kbuild Kbuild updates for v6.1 2022-10-10 12:00:45 -07:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS Networking changes for 6.6. 2023-08-29 11:33:01 -07:00
Makefile Rust changes for v6.6 2023-08-29 08:19:46 -07:00
README

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.