No description
Find a file
Andrey Ryabinin bf04afb613 mm: mempolicy: fix THP allocations escaping mempolicy restrictions
commit 3386353406 upstream.

alloc_pages_vma() may try to allocate THP page on the local NUMA node
first:

	page = __alloc_pages_node(hpage_node,
		gfp | __GFP_THISNODE | __GFP_NORETRY, order);

And if the allocation fails it retries allowing remote memory:

	if (!page && (gfp & __GFP_DIRECT_RECLAIM))
    		page = __alloc_pages_node(hpage_node,
					gfp, order);

However, this retry allocation completely ignores memory policy nodemask
allowing allocation to escape restrictions.

The first appearance of this bug seems to be the commit ac5b2c1891
("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings").

The bug disappeared later in the commit 89c83fb539 ("mm, thp:
consolidate THP gfp handling into alloc_hugepage_direct_gfpmask") and
reappeared again in slightly different form in the commit 76e654cc91
("mm, page_alloc: allow hugepage fallback to remote nodes when
madvised")

Fix this by passing correct nodemask to the __alloc_pages() call.

The demonstration/reproducer of the problem:

    $ mount -oremount,size=4G,huge=always /dev/shm/
    $ echo always > /sys/kernel/mm/transparent_hugepage/defrag
    $ cat mbind_thp.c
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <assert.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <numaif.h>

    #define SIZE 2ULL << 30
    int main(int argc, char **argv)
    {
        int fd;
        unsigned long long i;
        char *addr;
        pid_t pid;
        char buf[100];
        unsigned long nodemask = 1;

        fd = open("/dev/shm/test", O_RDWR|O_CREAT);
        assert(fd > 0);
        assert(ftruncate(fd, SIZE) == 0);

        addr = mmap(NULL, SIZE, PROT_READ|PROT_WRITE,
                           MAP_SHARED, fd, 0);

        assert(mbind(addr, SIZE, MPOL_BIND, &nodemask, 2, MPOL_MF_STRICT|MPOL_MF_MOVE)==0);
        for (i = 0; i < SIZE; i+=4096) {
          addr[i] = 1;
        }
        pid = getpid();
        snprintf(buf, sizeof(buf), "grep shm /proc/%d/numa_maps", pid);
        system(buf);
        sleep(10000);

        return 0;
    }
    $ gcc mbind_thp.c -o mbind_thp -lnuma
    $ numactl -H
    available: 2 nodes (0-1)
    node 0 cpus: 0 2
    node 0 size: 1918 MB
    node 0 free: 1595 MB
    node 1 cpus: 1 3
    node 1 size: 2014 MB
    node 1 free: 1731 MB
    node distances:
    node   0   1
      0:  10  20
      1:  20  10
    $ rm -f /dev/shm/test; taskset -c 0 ./mbind_thp
    7fd970a00000 bind:0 file=/dev/shm/test dirty=524288 active=0 N0=396800 N1=127488 kernelpagesize_kB=4

Link: https://lkml.kernel.org/r/20211208165343.22349-1-arbn@yandex-team.com
Fixes: ac5b2c1891 ("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings")
Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-29 12:23:37 +01:00
arch ARM: 9169/1: entry: fix Thumb2 bug in iWMMXt exception handling 2021-12-29 12:23:37 +01:00
block block: fix ioprio_get(IOPRIO_WHO_PGRP) vs setuid(2) 2021-12-14 14:49:02 +01:00
certs certs: Trigger creation of RSA module signing key if it's not an RSA key 2021-09-15 09:47:29 +02:00
crypto crypto: pcrypt - Delay write to padata->info 2021-11-17 09:48:40 +01:00
Documentation KVM: VMX: Fix stale docs for kvm-intel.emulate_invalid_guest_state 2021-12-29 12:23:37 +01:00
drivers usb: gadget: u_ether: fix race in setting MAC address in setup phase 2021-12-29 12:23:37 +01:00
fs f2fs: fix to do sanity check on last xattr entry in __f2fs_setxattr() 2021-12-29 12:23:37 +01:00
include net: skip virtio_net_hdr_set_proto if protocol already set 2021-12-29 12:23:34 +01:00
init kbuild: add CONFIG_LD_IS_LLD 2021-06-30 08:47:44 -04:00
ipc shm: extend forced shm destroy to support objects from several IPC nses 2021-12-01 09:23:35 +01:00
kernel rcu: Mark accesses to rcu_state.n_force_qs 2021-12-22 09:29:40 +01:00
lib siphash: use _unaligned version by default 2021-12-08 09:01:12 +01:00
LICENSES LICENSES: Rename other to deprecated 2019-05-03 06:34:32 -06:00
mm mm: mempolicy: fix THP allocations escaping mempolicy restrictions 2021-12-29 12:23:37 +01:00
net netfilter: fix regression in looped (broad|multi)cast's MAC handling 2021-12-29 12:23:34 +01:00
samples samples/kretprobes: Fix return value if register_kretprobe() failed 2021-11-17 09:48:39 +01:00
scripts recordmcount.pl: look for jgnop instruction as well as bcrl on s390 2021-12-22 09:29:35 +01:00
security selinux: fix race condition when computing ocontext SIDs 2021-12-17 10:12:24 +01:00
sound ALSA: hda/realtek: Amp init fixup for HP ZBook 15 G6 2021-12-29 12:23:36 +01:00
tools selftest/net/forwarding: declare NETIFS p9 p10 2021-12-22 09:29:36 +01:00
usr initramfs: restore default compression behavior 2020-04-08 09:08:38 +02:00
virt KVM: do not shrink halt_poll_ns below grow_start 2021-10-09 14:39:50 +02:00
.clang-format clang-format: Update with the latest for_each macro list 2019-08-31 10:00:51 +02:00
.cocciconfig
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes
.gitignore Modules updates for v5.4 2019-09-22 10:34:46 -07:00
.mailmap ARM: SoC fixes 2019-11-10 13:41:59 -08:00
COPYING COPYING: use the new text with points to the license files 2018-03-23 12:41:45 -06:00
CREDITS MAINTAINERS: Remove Simon as Renesas SoC Co-Maintainer 2019-10-10 08:12:51 -07:00
Kbuild kbuild: do not descend to ./Kbuild when cleaning 2019-08-21 21:03:58 +09:00
Kconfig docs: kbuild: convert docs to ReST and rename to *.rst 2019-06-14 14:21:21 -06:00
MAINTAINERS Documentation/llvm: add documentation on building w/ Clang/LLVM 2020-08-26 10:40:46 +02:00
Makefile Linux 5.4.168 2021-12-22 09:29:41 +01:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.