No description
Find a file
Shakeel Butt 944f720534 cgroup: memcg: net: do not associate sock with unrelated cgroup
[ Upstream commit e876ecc67d ]

We are testing network memory accounting in our setup and noticed
inconsistent network memory usage and often unrelated cgroups network
usage correlates with testing workload. On further inspection, it
seems like mem_cgroup_sk_alloc() and cgroup_sk_alloc() are broken in
irq context specially for cgroup v1.

mem_cgroup_sk_alloc() and cgroup_sk_alloc() can be called in irq context
and kind of assumes that this can only happen from sk_clone_lock()
and the source sock object has already associated cgroup. However in
cgroup v1, where network memory accounting is opt-in, the source sock
can be unassociated with any cgroup and the new cloned sock can get
associated with unrelated interrupted cgroup.

Cgroup v2 can also suffer if the source sock object was created by
process in the root cgroup or if sk_alloc() is called in irq context.
The fix is to just do nothing in interrupt.

WARNING: Please note that about half of the TCP sockets are allocated
from the IRQ context, so, memory used by such sockets will not be
accouted by the memcg.

The stack trace of mem_cgroup_sk_alloc() from IRQ-context:

CPU: 70 PID: 12720 Comm: ssh Tainted:  5.6.0-smp-DEV #1
Hardware name: ...
Call Trace:
 <IRQ>
 dump_stack+0x57/0x75
 mem_cgroup_sk_alloc+0xe9/0xf0
 sk_clone_lock+0x2a7/0x420
 inet_csk_clone_lock+0x1b/0x110
 tcp_create_openreq_child+0x23/0x3b0
 tcp_v6_syn_recv_sock+0x88/0x730
 tcp_check_req+0x429/0x560
 tcp_v6_rcv+0x72d/0xa40
 ip6_protocol_deliver_rcu+0xc9/0x400
 ip6_input+0x44/0xd0
 ? ip6_protocol_deliver_rcu+0x400/0x400
 ip6_rcv_finish+0x71/0x80
 ipv6_rcv+0x5b/0xe0
 ? ip6_sublist_rcv+0x2e0/0x2e0
 process_backlog+0x108/0x1e0
 net_rx_action+0x26b/0x460
 __do_softirq+0x104/0x2a6
 do_softirq_own_stack+0x2a/0x40
 </IRQ>
 do_softirq.part.19+0x40/0x50
 __local_bh_enable_ip+0x51/0x60
 ip6_finish_output2+0x23d/0x520
 ? ip6table_mangle_hook+0x55/0x160
 __ip6_finish_output+0xa1/0x100
 ip6_finish_output+0x30/0xd0
 ip6_output+0x73/0x120
 ? __ip6_finish_output+0x100/0x100
 ip6_xmit+0x2e3/0x600
 ? ipv6_anycast_cleanup+0x50/0x50
 ? inet6_csk_route_socket+0x136/0x1e0
 ? skb_free_head+0x1e/0x30
 inet6_csk_xmit+0x95/0xf0
 __tcp_transmit_skb+0x5b4/0xb20
 __tcp_send_ack.part.60+0xa3/0x110
 tcp_send_ack+0x1d/0x20
 tcp_rcv_state_process+0xe64/0xe80
 ? tcp_v6_connect+0x5d1/0x5f0
 tcp_v6_do_rcv+0x1b1/0x3f0
 ? tcp_v6_do_rcv+0x1b1/0x3f0
 __release_sock+0x7f/0xd0
 release_sock+0x30/0xa0
 __inet_stream_connect+0x1c3/0x3b0
 ? prepare_to_wait+0xb0/0xb0
 inet_stream_connect+0x3b/0x60
 __sys_connect+0x101/0x120
 ? __sys_getsockopt+0x11b/0x140
 __x64_sys_connect+0x1a/0x20
 do_syscall_64+0x51/0x200
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

The stack trace of mem_cgroup_sk_alloc() from IRQ-context:
Fixes: 2d75807383 ("mm: memcontrol: consolidate cgroup socket tracking")
Fixes: d979a39d72 ("cgroup: duplicate cgroup reference when cloning sockets")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-20 10:54:09 +01:00
arch powerpc: fix hardware PMU exception bug on PowerVM compatibility mode systems 2020-03-11 18:03:08 +01:00
block block: don't use bio->bi_vcnt to figure out segment number 2020-01-27 14:46:19 +01:00
certs Replace magic for trusting the secondary keyring with #define 2018-09-09 19:55:54 +02:00
crypto crypto: api - Fix race condition in crypto_spawn_alg 2020-02-14 16:32:14 -05:00
Documentation PM / devfreq: Add new name attribute for sysfs 2020-02-05 14:18:12 +00:00
drivers bnxt_en: reinitialize IRQs when MTU is modified 2020-03-20 10:54:09 +01:00
firmware Fix built-in early-load Intel microcode alignment 2020-01-23 08:20:30 +01:00
fs fat: fix uninit-memory access for partial initialized inode 2020-03-11 18:03:03 +01:00
include include/linux/bitops.h: introduce BITS_PER_TYPE 2020-03-11 18:02:52 +01:00
init fork: fix some -Wmissing-prototypes warnings 2019-12-05 15:37:52 +01:00
ipc Revert "ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()" 2020-02-28 16:36:12 +01:00
kernel cgroup: memcg: net: do not associate sock with unrelated cgroup 2020-03-20 10:54:09 +01:00
lib lib/stackdepot.c: fix global out-of-bounds in stack_slabs 2020-02-28 16:36:13 +01:00
mm cgroup: memcg: net: do not associate sock with unrelated cgroup 2020-03-20 10:54:09 +01:00
net net/packet: tpacket_rcv: do not increment ring index on drop 2020-03-20 10:54:08 +01:00
samples samples/bpf: Don't try to remove user's homedir on clean 2020-02-14 16:32:13 -05:00
scripts kconfig: fix broken dependency in randconfig-generated .config 2020-02-28 16:35:59 +01:00
security selinux: ensure we cleanup the internal AVC counters on error in avc_update() 2020-02-28 16:36:09 +01:00
sound ASoC: topology: Fix memleak in soc_tplg_manifest_load() 2020-03-11 18:03:09 +01:00
tools selftests: fix too long argument 2020-03-11 18:02:58 +01:00
usr kbuild: clean compressed initramfs image 2019-10-07 18:55:14 +02:00
virt KVM: Check for a bad hva before dropping into the ghc slow path 2020-03-11 18:02:53 +01:00
.cocciconfig
.get_maintainer.ignore
.gitattributes .gitattributes: set git diff driver for C source code files 2016-10-07 18:46:30 -07:00
.gitignore kbuild: rpm-pkg: keep spec file until make mrproper 2018-02-13 10:19:46 +01:00
.mailmap .mailmap: Add Maciej W. Rozycki's Imagination e-mail address 2017-11-10 12:16:15 -08:00
COPYING
CREDITS MAINTAINERS: update TPM driver infrastructure changes 2017-11-09 17:58:40 -08:00
Kbuild License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
Kconfig License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
MAINTAINERS MAINTAINERS: Update drm/i915 bug filing URL 2020-02-28 16:36:12 +01:00
Makefile Linux 4.14.173 2020-03-11 18:03:09 +01:00
README README: add a new README file, pointing to the Documentation/ 2016-10-24 08:12:35 -02:00

Linux kernel
============

This file was moved to Documentation/admin-guide/README.rst

Please notice that there are several guides for kernel developers and users.
These guides can be rendered in a number of formats, like HTML and PDF.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
See Documentation/00-INDEX for a list of what is contained in each file.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.