No description
Find a file
Feng Tang 103c4a08ba mm: relocate 'write_protect_seq' in struct mm_struct
[ Upstream commit 2e3025434a ]

0day robot reported a 9.2% regression for will-it-scale mmap1 test
case[1], caused by commit 57efa1fe59 ("mm/gup: prevent gup_fast from
racing with COW during fork").

Further debug shows the regression is due to that commit changes the
offset of hot fields 'mmap_lock' inside structure 'mm_struct', thus some
cache alignment changes.

From the perf data, the contention for 'mmap_lock' is very severe and
takes around 95% cpu cycles, and it is a rw_semaphore

        struct rw_semaphore {
                atomic_long_t count;	/* 8 bytes */
                atomic_long_t owner;	/* 8 bytes */
                struct optimistic_spin_queue osq; /* spinner MCS lock */
                ...

Before commit 57efa1fe59 adds the 'write_protect_seq', it happens to
have a very optimal cache alignment layout, as Linus explained:

 "and before the addition of the 'write_protect_seq' field, the
  mmap_sem was at offset 120 in 'struct mm_struct'.

  Which meant that count and owner were in two different cachelines,
  and then when you have contention and spend time in
  rwsem_down_write_slowpath(), this is probably *exactly* the kind
  of layout you want.

  Because first the rwsem_write_trylock() will do a cmpxchg on the
  first cacheline (for the optimistic fast-path), and then in the
  case of contention, rwsem_down_write_slowpath() will just access
  the second cacheline.

  Which is probably just optimal for a load that spends a lot of
  time contended - new waiters touch that first cacheline, and then
  they queue themselves up on the second cacheline."

After the commit, the rw_semaphore is at offset 128, which means the
'count' and 'owner' fields are now in the same cacheline, and causes
more cache bouncing.

Currently there are 3 "#ifdef CONFIG_XXX" before 'mmap_lock' which will
affect its offset:

  CONFIG_MMU
  CONFIG_MEMBARRIER
  CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES

The layout above is on 64 bits system with 0day's default kernel config
(similar to RHEL-8.3's config), in which all these 3 options are 'y'.
And the layout can vary with different kernel configs.

Relayouting a structure is usually a double-edged sword, as sometimes it
can helps one case, but hurt other cases.  For this case, one solution
is, as the newly added 'write_protect_seq' is a 4 bytes long seqcount_t
(when CONFIG_DEBUG_LOCK_ALLOC=n), placing it into an existing 4 bytes
hole in 'mm_struct' will not change other fields' alignment, while
restoring the regression.

Link: https://lore.kernel.org/lkml/20210525031636.GB7744@xsang-OptiPlex-9020/ [1]
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-23 14:42:49 +02:00
arch kvm: LAPIC: Restore guard to prevent illegal APIC register access 2021-06-23 14:42:41 +02:00
block blk-mq: Swap two calls in blk_mq_exit_queue() 2021-05-19 10:13:14 +02:00
certs certs: Fix blacklist flag type confusion 2021-03-04 11:37:59 +01:00
crypto async_xor: check src_offs is not NULL before updating it 2021-06-16 12:01:40 +02:00
Documentation ASoC: meson: gx-card: fix sound-dai dt schema 2021-06-16 12:01:45 +02:00
drivers hwmon: (scpi-hwmon) shows the negative temperature properly 2021-06-23 14:42:49 +02:00
fs fanotify: fix copy_event_to_user() fid error clean up 2021-06-23 14:42:41 +02:00
include mm: relocate 'write_protect_seq' in struct mm_struct 2021-06-23 14:42:49 +02:00
init pid: take a reference when initializing cad_pid 2021-06-10 13:39:26 +02:00
ipc ipc/mqueue, msg, sem: avoid relying on a stack reference past its expiry 2021-05-26 12:06:54 +02:00
kernel sched/pelt: Ensure that *_sum is always synced with *_avg 2021-06-23 14:42:48 +02:00
lib lib/lz4: explicitly support in-place decompression 2021-06-10 13:39:29 +02:00
LICENSES LICENSES/deprecated: add Zlib license text 2020-09-16 14:33:49 +02:00
mm mm/memory-failure: make sure wait for page writeback in memory_failure 2021-06-23 14:42:40 +02:00
net icmp: don't send out ICMP messages with a source address of 0.0.0.0 2021-06-23 14:42:47 +02:00
samples samples: vfio-mdev: fix error handing in mdpy_fb_probe() 2021-06-10 13:39:15 +02:00
scripts scripts/clang-tools: switch explicitly to Python 3 2021-06-03 09:00:52 +02:00
security KEYS: trusted: Fix memory leak on object td 2021-05-19 10:12:50 +02:00
sound ASoC: qcom: lpass-cpu: Fix pop noise during audio capture begin 2021-06-23 14:42:49 +02:00
tools ipv4: Fix device used for dst_alloc with local routes 2021-06-23 14:42:45 +02:00
usr Merge branch 'work.fdpic' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-08-07 13:29:39 -07:00
virt Revert "irqbypass: do not start cons/prod when failed connect" 2021-06-03 09:00:34 +02:00
.clang-format RDMA 5.10 pull request 2020-10-17 11:18:18 -07:00
.cocciconfig
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore kbuild: generate Module.symvers only when vmlinux exists 2021-05-19 10:12:59 +02:00
.mailmap mailmap: add two more addresses of Uwe Kleine-König 2020-12-06 10:19:07 -08:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: Move Jason Cooper to CREDITS 2020-11-30 10:20:34 +01:00
Kbuild kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS f2fs: move ioctl interface definitions to separated file 2021-05-19 10:13:00 +02:00
Makefile Linux 5.10.45 2021-06-18 10:00:06 +02:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.