linux-stable/include
Shakeel Butt f1aedd2ffe Revert "memcg: cleanup racy sum avoidance code"
commit dbb16df644 upstream.

This reverts commit 96e51ccf1a.

Recently we started running the kernel with rstat infrastructure on
production traffic and began to see negative memcg stat values.
In particular, the 'sock' stat is the one we observed going negative.

$ grep "sock " /mnt/memory/job/memory.stat
sock 253952
total_sock 18446744073708724224

Re-run after a couple of seconds:

$ grep "sock " /mnt/memory/job/memory.stat
sock 253952
total_sock 53248
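
The first total_sock reading is easier to interpret once it is viewed as a signed 64-bit quantity. The following is a minimal sketch, not kernel code; the helper name as_signed64 is hypothetical and the 4 KiB page size is an assumption about the machine in question.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as two's-complement signed."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >= (1 << 63) else value


# Value copied from the first memory.stat reading above.
total_sock = 18446744073708724224

signed_bytes = as_signed64(total_sock)
signed_pages = signed_bytes // PAGE_SIZE

print(signed_bytes)  # -827392
print(signed_pages)  # -202
```

Under those assumptions the reported value corresponds to a small negative counter printed as an unsigned number, which matches the "negative stat" description in this commit message.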

For now we are only seeing this issue on large machines (256 CPUs) and
only with the 'sock' stat.  I think the networking stack increases the
stat on one CPU and decreases it on another CPU much more often.  So this
negative sock value is due to the rstat flusher flushing the stats on the
CPU that has seen the decrement of sock but missing the CPU that has the
increments.  A typical race condition.
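
The interleaving described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the Counter class, its methods, and the two-CPU scenario are hypothetical and do not mirror the kernel's rstat data structures or locking.

```python
class Counter:
    """Toy per-CPU counter: updates go to per-CPU deltas, a flusher
    folds them into a shared total one CPU at a time."""

    def __init__(self, ncpus: int) -> None:
        self.pending = [0] * ncpus  # per-CPU deltas not yet flushed
        self.total = 0              # value a reader would see

    def add(self, cpu: int, delta: int) -> None:
        self.pending[cpu] += delta

    def flush_cpu(self, cpu: int) -> None:
        self.total += self.pending[cpu]
        self.pending[cpu] = 0


c = Counter(ncpus=2)

c.flush_cpu(0)    # flusher visits CPU 0 first: nothing pending yet
c.add(0, +202)    # charge lands on CPU 0 after the flusher has passed it
c.add(1, -202)    # matching uncharge happens on CPU 1
c.flush_cpu(1)    # flusher reaches CPU 1 and picks up only the decrement

after_partial = c.total   # -202: the reader sees a negative stat

c.flush_cpu(0)    # a later flush pass picks up the missed increment
after_full = c.total      # 0: the stat is consistent again

print(after_partial, after_full)
```

In this model the negative value is transient and disappears once the missed CPU is flushed, which is consistent with the second reading above returning a sane total.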

For easy stable backport, a revert is the simplest solution.  For a
long-term solution, I am thinking of two directions.  The first is to
reduce the race window by optimizing the rstat flusher.  The second is,
if the reader sees a negative stat value, to force a flush and restart
the stat collection.  Basically a retry, but limited.
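
The second direction, a bounded retry, might look roughly like the sketch below. Everything in it is hypothetical: read_stat, its read and flush callbacks, and the retry limit are illustrative names, and clamping to zero once the retries are exhausted is an assumption of this sketch rather than something stated in this message.

```python
from typing import Callable


def read_stat(read: Callable[[], int],
              flush: Callable[[], None],
              max_retries: int = 3) -> int:
    """Read a counter; on a negative value force a flush and re-read,
    at most max_retries times."""
    value = read()
    for _ in range(max_retries):
        if value >= 0:
            break
        flush()
        value = read()
    # Assumption: fall back to zero if the value is still negative.
    return max(value, 0)


# Usage with canned readings: two negative values, then a sane one.
readings = iter([-202, -10, 13])
flush_calls = []

result = read_stat(lambda: next(readings), lambda: flush_calls.append(1))
print(result, len(flush_calls))  # 13 2
```

Bounding the retries keeps the reader from spinning if updates keep racing with the flusher, at the cost of occasionally reporting a clamped value.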

Link: https://lkml.kernel.org/r/20220817172139.3141101-1-shakeelb@google.com
Fixes: 96e51ccf1a ("memcg: cleanup racy sum avoidance code")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Cc: "Michal Koutný" <mkoutny@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: <stable@vger.kernel.org>	[5.15]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-31 17:16:48 +02:00
acpi ACPI: CPPC: Do not prevent CPPC from working in the future 2022-08-17 14:24:25 +02:00
asm-generic asm-generic: sections: refactor memory_intersects 2022-08-31 17:16:47 +02:00
clocksource
crypto crypto: blake2s - remove shash module 2022-08-17 14:24:19 +02:00
drm drm/bridge: Add a function to abstract away panels 2022-08-17 14:23:24 +02:00
dt-bindings clk: qcom: gcc-msm8939: Add missing SYSTEM_MM_NOC_BFDCD_CLK_SRC 2022-08-17 14:23:45 +02:00
keys
kunit
kvm
linux Revert "memcg: cleanup racy sum avoidance code" 2022-08-31 17:16:48 +02:00
math-emu
media media: cec: fix a deadlock situation 2022-01-27 11:02:53 +01:00
memory memory: renesas-rpc-if: Fix HF/OSPI data transfer in Manual Mode 2022-05-09 09:14:34 +02:00
misc
net tcp: expose the tcp_mark_push() and tcp_skb_entail() helpers 2022-08-31 17:16:44 +02:00
pcmcia
ras Revert "mm/memory-failure.c: fix race with changing page compound again" 2022-07-12 16:35:17 +02:00
rdma
scsi scsi: iscsi: Fix session removal on shutdown 2022-08-17 14:23:45 +02:00
soc
sound ALSA: control: Use deferred fasync helper 2022-08-25 11:40:44 +02:00
target
trace tracing/perf: Avoid -Warray-bounds warning for __rel_loc macro 2022-08-17 14:24:30 +02:00
uapi netfilter: xtables: Bring SPDX identifier back 2022-08-17 14:23:42 +02:00
vdso
video video: of_display_timing.h: include errno.h 2022-07-12 16:35:10 +02:00
xen xen/gnttab: fix gnttab_end_foreign_access() without page specified 2022-03-11 12:22:37 +01:00