Merge branch 'akpm' (patches from Andrew)

Merge misc updates from Andrew Morton:
 "173 patches.

  Subsystems affected by this series: ia64, ocfs2, block, and mm (debug,
  pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap,
  bootmem, sparsemem, vmalloc, kasan, pagealloc, memory-failure,
  hugetlb, userfaultfd, vmscan, compaction, mempolicy, memblock,
  oom-kill, migration, ksm, percpu, vmstat, and madvise)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (173 commits)
  mm/madvise: add MADV_WILLNEED to process_madvise()
  mm/vmstat: remove unneeded return value
  mm/vmstat: simplify the array size calculation
  mm/vmstat: correct some wrong comments
  mm/percpu,c: remove obsolete comments of pcpu_chunk_populated()
  selftests: vm: add COW time test for KSM pages
  selftests: vm: add KSM merging time test
  mm: KSM: fix data type
  selftests: vm: add KSM merging across nodes test
  selftests: vm: add KSM zero page merging test
  selftests: vm: add KSM unmerge test
  selftests: vm: add KSM merge test
  mm/migrate: correct kernel-doc notation
  mm: wire up syscall process_mrelease
  mm: introduce process_mrelease system call
  memblock: make memblock_find_in_range method private
  mm/mempolicy.c: use in_task() in mempolicy_slab_node()
  mm/mempolicy: unify the create() func for bind/interleave/prefer-many policies
  mm/mempolicy: advertise new MPOL_PREFERRED_MANY
  mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY
  ...
This commit is contained in:
Linus Torvalds 2021-09-03 10:08:28 -07:00
commit 14726903c8
171 changed files with 3519 additions and 1723 deletions

View File

@ -0,0 +1,24 @@
What: /sys/kernel/mm/numa/
Date: June 2021
Contact: Linux memory management mailing list <linux-mm@kvack.org>
Description: Interface for NUMA
What: /sys/kernel/mm/numa/demotion_enabled
Date: June 2021
Contact: Linux memory management mailing list <linux-mm@kvack.org>
Description: Enable/disable demoting pages during reclaim
Page migration during reclaim is intended for systems
with tiered memory configurations. These systems have
multiple types of memory with varied performance
characteristics instead of plain NUMA systems where
the same kind of memory is found at varied distances.
Allowing page migration during reclaim enables these
systems to migrate pages from fast tiers to slow tiers
when the fast tier is under pressure. This migration
is performed before swap. It may move data to a NUMA
node that does not fall into the cpuset of the
allocating process which might be construed to violate
the guarantees of cpusets. This should not be enabled
on systems which need strict cpuset location
guarantees.

View File

@ -245,6 +245,13 @@ MPOL_INTERLEAVED
address range or file. During system boot up, the temporary
interleaved system default policy works in this mode.
MPOL_PREFERRED_MANY
This mode specifices that the allocation should be preferrably
satisfied from the nodemask specified in the policy. If there is
a memory pressure on all nodes in the nodemask, the allocation
can fall back to all existing numa nodes. This is effectively
MPOL_PREFERRED allowed for a mask rather than a single node.
NUMA memory policy supports the following optional mode flags:
MPOL_F_STATIC_NODES
@ -253,10 +260,10 @@ MPOL_F_STATIC_NODES
nodes changes after the memory policy has been defined.
Without this flag, any time a mempolicy is rebound because of a
change in the set of allowed nodes, the node (Preferred) or
nodemask (Bind, Interleave) is remapped to the new set of
allowed nodes. This may result in nodes being used that were
previously undesired.
change in the set of allowed nodes, the preferred nodemask (Preferred
Many), preferred node (Preferred) or nodemask (Bind, Interleave) is
remapped to the new set of allowed nodes. This may result in nodes
being used that were previously undesired.
With this flag, if the user-specified nodes overlap with the
nodes allowed by the task's cpuset, then the memory policy is

View File

@ -118,7 +118,8 @@ compaction_proactiveness
This tunable takes a value in the range [0, 100] with a default value of
20. This tunable determines how aggressively compaction is done in the
background. Setting it to 0 disables proactive compaction.
background. Write of a non zero value to this tunable will immediately
trigger the proactive compaction. Setting it to 0 disables proactive compaction.
Note that compaction has a non-trivial system-wide impact as pages
belonging to different processes are moved around, which could also lead

View File

@ -271,10 +271,15 @@ maps this page at its virtual address.
``void flush_dcache_page(struct page *page)``
Any time the kernel writes to a page cache page, _OR_
the kernel is about to read from a page cache page and
user space shared/writable mappings of this page potentially
exist, this routine is called.
This routines must be called when:
a) the kernel did write to a page that is in the page cache page
and / or in high memory
b) the kernel is about to read from a page cache page and user space
shared/writable mappings of this page potentially exist. Note
that {get,pin}_user_pages{_fast} already call flush_dcache_page
on any page found in the user address space and thus driver
code rarely needs to take this into account.
.. note::
@ -284,38 +289,34 @@ maps this page at its virtual address.
handling vfs symlinks in the page cache need not call
this interface at all.
The phrase "kernel writes to a page cache page" means,
specifically, that the kernel executes store instructions
that dirty data in that page at the page->virtual mapping
of that page. It is important to flush here to handle
D-cache aliasing, to make sure these kernel stores are
visible to user space mappings of that page.
The phrase "kernel writes to a page cache page" means, specifically,
that the kernel executes store instructions that dirty data in that
page at the page->virtual mapping of that page. It is important to
flush here to handle D-cache aliasing, to make sure these kernel stores
are visible to user space mappings of that page.
The corollary case is just as important, if there are users
which have shared+writable mappings of this file, we must make
sure that kernel reads of these pages will see the most recent
stores done by the user.
The corollary case is just as important, if there are users which have
shared+writable mappings of this file, we must make sure that kernel
reads of these pages will see the most recent stores done by the user.
If D-cache aliasing is not an issue, this routine may
simply be defined as a nop on that architecture.
If D-cache aliasing is not an issue, this routine may simply be defined
as a nop on that architecture.
There is a bit set aside in page->flags (PG_arch_1) as
"architecture private". The kernel guarantees that,
for pagecache pages, it will clear this bit when such
a page first enters the pagecache.
There is a bit set aside in page->flags (PG_arch_1) as "architecture
private". The kernel guarantees that, for pagecache pages, it will
clear this bit when such a page first enters the pagecache.
This allows these interfaces to be implemented much more
efficiently. It allows one to "defer" (perhaps indefinitely)
the actual flush if there are currently no user processes
mapping this page. See sparc64's flush_dcache_page and
update_mmu_cache implementations for an example of how to go
about doing this.
This allows these interfaces to be implemented much more efficiently.
It allows one to "defer" (perhaps indefinitely) the actual flush if
there are currently no user processes mapping this page. See sparc64's
flush_dcache_page and update_mmu_cache implementations for an example
of how to go about doing this.
The idea is, first at flush_dcache_page() time, if
page->mapping->i_mmap is an empty tree, just mark the architecture
private page flag bit. Later, in update_mmu_cache(), a check is
made of this flag bit, and if set the flush is done and the flag
bit is cleared.
The idea is, first at flush_dcache_page() time, if page_file_mapping()
returns a mapping, and mapping_mapped on that mapping returns %false,
just mark the architecture private page flag bit. Later, in
update_mmu_cache(), a check is made of this flag bit, and if set the
flush is done and the flag bit is cleared.
.. important::
@ -351,19 +352,6 @@ maps this page at its virtual address.
architectures). For incoherent architectures, it should flush
the cache of the page at vmaddr.
``void flush_kernel_dcache_page(struct page *page)``
When the kernel needs to modify a user page is has obtained
with kmap, it calls this function after all modifications are
complete (but before kunmapping it) to bring the underlying
page up to date. It is assumed here that the user has no
incoherent cached copies (i.e. the original page was obtained
from a mechanism like get_user_pages()). The default
implementation is a nop and should remain so on all coherent
architectures. On incoherent architectures, this should flush
the kernel cache for page (using page_address(page)).
``void flush_icache_range(unsigned long start, unsigned long end)``
When the kernel stores into addresses that it will execute

View File

@ -181,9 +181,16 @@ By default, KASAN prints a bug report only for the first invalid memory access.
With ``kasan_multi_shot``, KASAN prints a report on every invalid access. This
effectively disables ``panic_on_warn`` for KASAN reports.
Alternatively, independent of ``panic_on_warn`` the ``kasan.fault=`` boot
parameter can be used to control panic and reporting behaviour:
- ``kasan.fault=report`` or ``=panic`` controls whether to only print a KASAN
report or also panic the kernel (default: ``report``). The panic happens even
if ``kasan_multi_shot`` is enabled.
Hardware tag-based KASAN mode (see the section about various modes below) is
intended for use in production as a security mitigation. Therefore, it supports
boot parameters that allow disabling KASAN or controlling its features.
additional boot parameters that allow disabling KASAN or controlling features:
- ``kasan=off`` or ``=on`` controls whether KASAN is enabled (default: ``on``).
@ -199,10 +206,6 @@ boot parameters that allow disabling KASAN or controlling its features.
- ``kasan.stacktrace=off`` or ``=on`` disables or enables alloc and free stack
traces collection (default: ``on``).
- ``kasan.fault=report`` or ``=panic`` controls whether to only print a KASAN
report or also panic the kernel (default: ``report``). The panic happens even
if ``kasan_multi_shot`` is enabled.
Implementation details
----------------------

View File

@ -298,15 +298,6 @@ HyperSparc cpu就是这样一个具有这种属性的cpu。
用。默认的实现是nop对于所有相干的架构应该保持这样。对于不一致性
的架构它应该刷新vmaddr处的页面缓存。
``void flush_kernel_dcache_page(struct page *page)``
当内核需要修改一个用kmap获得的用户页时它会在所有修改完成后但在
kunmapping之前调用这个函数以使底层页面达到最新状态。这里假定用
户没有不一致性的缓存副本即原始页面是从类似get_user_pages()的机制
中获得的。默认的实现是一个nop在所有相干的架构上都应该如此。在不
一致性的架构上这应该刷新内核缓存中的页面使用page_address(page))。
``void flush_icache_range(unsigned long start, unsigned long end)``
当内核存储到它将执行的地址中时(例如在加载模块时),这个函数被调用。

View File

@ -180,7 +180,6 @@ Limitations
===========
- Not all page types are supported and never will. Most kernel internal
objects cannot be recovered, only LRU pages for now.
- Right now hugepage support is missing.
---
Andi Kleen, Oct 2009

View File

@ -486,3 +486,5 @@
554 common landlock_create_ruleset sys_landlock_create_ruleset
555 common landlock_add_rule sys_landlock_add_rule
556 common landlock_restrict_self sys_landlock_restrict_self
# 557 reserved for memfd_secret
558 common process_mrelease sys_process_mrelease

View File

@ -291,6 +291,7 @@ extern void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr
#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
extern void flush_dcache_page(struct page *);
#define ARCH_IMPLEMENTS_FLUSH_KERNEL_VMAP_RANGE 1
static inline void flush_kernel_vmap_range(void *addr, int size)
{
if ((cache_is_vivt() || cache_is_vipt_aliasing()))
@ -312,9 +313,6 @@ static inline void flush_anon_page(struct vm_area_struct *vma,
__flush_anon_page(vma, page, vmaddr);
}
#define ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE
extern void flush_kernel_dcache_page(struct page *);
#define flush_dcache_mmap_lock(mapping) xa_lock_irq(&mapping->i_pages)
#define flush_dcache_mmap_unlock(mapping) xa_unlock_irq(&mapping->i_pages)

View File

@ -1012,31 +1012,25 @@ static void __init reserve_crashkernel(void)
unsigned long long lowmem_max = __pa(high_memory - 1) + 1;
if (crash_max > lowmem_max)
crash_max = lowmem_max;
crash_base = memblock_find_in_range(CRASH_ALIGN, crash_max,
crash_size, CRASH_ALIGN);
crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
CRASH_ALIGN, crash_max);
if (!crash_base) {
pr_err("crashkernel reservation failed - No suitable area found.\n");
return;
}
} else {
unsigned long long crash_max = crash_base + crash_size;
unsigned long long start;
start = memblock_find_in_range(crash_base,
crash_base + crash_size,
crash_size, SECTION_SIZE);
if (start != crash_base) {
start = memblock_phys_alloc_range(crash_size, SECTION_SIZE,
crash_base, crash_max);
if (!start) {
pr_err("crashkernel reservation failed - memory is in use.\n");
return;
}
}
ret = memblock_reserve(crash_base, crash_size);
if (ret < 0) {
pr_warn("crashkernel reservation failed - memory is in use (0x%lx)\n",
(unsigned long)crash_base);
return;
}
pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n",
(unsigned long)(crash_size >> 20),
(unsigned long)(crash_base >> 20),

View File

@ -345,39 +345,6 @@ void flush_dcache_page(struct page *page)
}
EXPORT_SYMBOL(flush_dcache_page);
/*
* Ensure cache coherency for the kernel mapping of this page. We can
* assume that the page is pinned via kmap.
*
* If the page only exists in the page cache and there are no user
* space mappings, this is a no-op since the page was already marked
* dirty at creation. Otherwise, we need to flush the dirty kernel
* cache lines directly.
*/
void flush_kernel_dcache_page(struct page *page)
{
if (cache_is_vivt() || cache_is_vipt_aliasing()) {
struct address_space *mapping;
mapping = page_mapping_file(page);
if (!mapping || mapping_mapped(mapping)) {
void *addr;
addr = page_address(page);
/*
* kmap_atomic() doesn't set the page virtual
* address for highmem pages, and
* kunmap_atomic() takes care of cache
* flushing already.
*/
if (!IS_ENABLED(CONFIG_HIGHMEM) || addr)
__cpuc_flush_dcache_area(addr, PAGE_SIZE);
}
}
}
EXPORT_SYMBOL(flush_kernel_dcache_page);
/*
* Flush an anonymous page so that users of get_user_pages()
* can safely access the data. The expected sequence is:

View File

@ -166,12 +166,6 @@ void flush_dcache_page(struct page *page)
}
EXPORT_SYMBOL(flush_dcache_page);
void flush_kernel_dcache_page(struct page *page)
{
__cpuc_flush_dcache_area(page_address(page), PAGE_SIZE);
}
EXPORT_SYMBOL(flush_kernel_dcache_page);
void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
unsigned long uaddr, void *dst, const void *src,
unsigned long len)

View File

@ -460,3 +460,5 @@
444 common landlock_create_ruleset sys_landlock_create_ruleset
445 common landlock_add_rule sys_landlock_add_rule
446 common landlock_restrict_self sys_landlock_restrict_self
# 447 reserved for memfd_secret
448 common process_mrelease sys_process_mrelease

View File

@ -38,7 +38,7 @@
#define __ARM_NR_compat_set_tls (__ARM_NR_COMPAT_BASE + 5)
#define __ARM_NR_COMPAT_END (__ARM_NR_COMPAT_BASE + 0x800)
#define __NR_compat_syscalls 447
#define __NR_compat_syscalls 449
#endif
#define __ARCH_WANT_SYS_CLONE

View File

@ -901,6 +901,8 @@ __SYSCALL(__NR_landlock_create_ruleset, sys_landlock_create_ruleset)
__SYSCALL(__NR_landlock_add_rule, sys_landlock_add_rule)
#define __NR_landlock_restrict_self 446
__SYSCALL(__NR_landlock_restrict_self, sys_landlock_restrict_self)
#define __NR_process_mrelease 448
__SYSCALL(__NR_process_mrelease, sys_process_mrelease)
/*
* Please add new compat syscalls above this comment and update

View File

@ -92,12 +92,10 @@ void __init kvm_hyp_reserve(void)
* this is unmapped from the host stage-2, and fallback to PAGE_SIZE.
*/
hyp_mem_size = hyp_mem_pages << PAGE_SHIFT;
hyp_mem_base = memblock_find_in_range(0, memblock_end_of_DRAM(),
ALIGN(hyp_mem_size, PMD_SIZE),
PMD_SIZE);
hyp_mem_base = memblock_phys_alloc(ALIGN(hyp_mem_size, PMD_SIZE),
PMD_SIZE);
if (!hyp_mem_base)
hyp_mem_base = memblock_find_in_range(0, memblock_end_of_DRAM(),
hyp_mem_size, PAGE_SIZE);
hyp_mem_base = memblock_phys_alloc(hyp_mem_size, PAGE_SIZE);
else
hyp_mem_size = ALIGN(hyp_mem_size, PMD_SIZE);
@ -105,7 +103,6 @@ void __init kvm_hyp_reserve(void)
kvm_err("Failed to reserve hyp memory\n");
return;
}
memblock_reserve(hyp_mem_base, hyp_mem_size);
kvm_info("Reserved %lld MiB at 0x%llx\n", hyp_mem_size >> 20,
hyp_mem_base);

View File

@ -74,6 +74,7 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
static void __init reserve_crashkernel(void)
{
unsigned long long crash_base, crash_size;
unsigned long long crash_max = arm64_dma_phys_limit;
int ret;
ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
@ -84,33 +85,18 @@ static void __init reserve_crashkernel(void)
crash_size = PAGE_ALIGN(crash_size);
if (crash_base == 0) {
/* Current arm64 boot protocol requires 2MB alignment */
crash_base = memblock_find_in_range(0, arm64_dma_phys_limit,
crash_size, SZ_2M);
if (crash_base == 0) {
pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
crash_size);
return;
}
} else {
/* User specifies base address explicitly. */
if (!memblock_is_region_memory(crash_base, crash_size)) {
pr_warn("cannot reserve crashkernel: region is not memory\n");
return;
}
/* User specifies base address explicitly. */
if (crash_base)
crash_max = crash_base + crash_size;
if (memblock_is_region_reserved(crash_base, crash_size)) {
pr_warn("cannot reserve crashkernel: region overlaps reserved memory\n");
return;
}
if (!IS_ALIGNED(crash_base, SZ_2M)) {
pr_warn("cannot reserve crashkernel: base address is not 2MB aligned\n");
return;
}
/* Current arm64 boot protocol requires 2MB alignment */
crash_base = memblock_phys_alloc_range(crash_size, SZ_2M,
crash_base, crash_max);
if (!crash_base) {
pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
crash_size);
return;
}
memblock_reserve(crash_base, crash_size);
pr_info("crashkernel reserved: 0x%016llx - 0x%016llx (%lld MB)\n",
crash_base, crash_base + crash_size, crash_size >> 20);

View File

@ -56,17 +56,6 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
}
}
void flush_kernel_dcache_page(struct page *page)
{
struct address_space *mapping;
mapping = page_mapping_file(page);
if (!mapping || mapping_mapped(mapping))
dcache_wbinv_all();
}
EXPORT_SYMBOL(flush_kernel_dcache_page);
void flush_cache_range(struct vm_area_struct *vma, unsigned long start,
unsigned long end)
{

View File

@ -14,12 +14,10 @@ extern void flush_dcache_page(struct page *);
#define flush_cache_page(vma, page, pfn) cache_wbinv_all()
#define flush_cache_dup_mm(mm) cache_wbinv_all()
#define ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE
extern void flush_kernel_dcache_page(struct page *);
#define flush_dcache_mmap_lock(mapping) xa_lock_irq(&mapping->i_pages)
#define flush_dcache_mmap_unlock(mapping) xa_unlock_irq(&mapping->i_pages)
#define ARCH_IMPLEMENTS_FLUSH_KERNEL_VMAP_RANGE 1
static inline void flush_kernel_vmap_range(void *addr, int size)
{
dcache_wbinv_all();

View File

@ -283,8 +283,7 @@ int __kprobes kprobe_fault_handler(struct pt_regs *regs, unsigned int trapnr)
* normal page fault.
*/
regs->pc = (unsigned long) cur->addr;
if (!instruction_pointer(regs))
BUG();
BUG_ON(!instruction_pointer(regs));
if (kcb->kprobe_status == KPROBE_REENTER)
restore_previous_kprobe(kcb);

View File

@ -29,7 +29,6 @@ struct rsvd_region {
};
extern struct rsvd_region rsvd_region[IA64_MAX_RSVD_REGIONS + 1];
extern int num_rsvd_regions;
extern void find_memory (void);
extern void reserve_memory (void);
@ -40,7 +39,6 @@ extern unsigned long efi_memmap_init(u64 *s, u64 *e);
extern int find_max_min_low_pfn (u64, u64, void *);
extern unsigned long vmcore_find_descriptor_size(unsigned long address);
extern int reserve_elfcorehdr(u64 *start, u64 *end);
/*
* For rounding an address to the next IA64_GRANULE_SIZE or order

View File

@ -906,6 +906,6 @@ EXPORT_SYMBOL(acpi_unregister_ioapic);
/*
* acpi_suspend_lowlevel() - save kernel state and suspend.
*
* TBD when when IA64 starts to support suspend...
* TBD when IA64 starts to support suspend...
*/
int acpi_suspend_lowlevel(void) { return 0; }

View File

@ -131,7 +131,7 @@ unsigned long ia64_cache_stride_shift = ~0;
* We use a special marker for the end of memory and it uses the extra (+1) slot
*/
struct rsvd_region rsvd_region[IA64_MAX_RSVD_REGIONS + 1] __initdata;
int num_rsvd_regions __initdata;
static int num_rsvd_regions __initdata;
/*
@ -325,6 +325,31 @@ static inline void __init setup_crashkernel(unsigned long total, int *n)
{}
#endif
#ifdef CONFIG_CRASH_DUMP
static int __init reserve_elfcorehdr(u64 *start, u64 *end)
{
u64 length;
/* We get the address using the kernel command line,
* but the size is extracted from the EFI tables.
* Both address and size are required for reservation
* to work properly.
*/
if (!is_vmcore_usable())
return -EINVAL;
if ((length = vmcore_find_descriptor_size(elfcorehdr_addr)) == 0) {
vmcore_unusable();
return -EINVAL;
}
*start = (unsigned long)__va(elfcorehdr_addr);
*end = *start + length;
return 0;
}
#endif /* CONFIG_CRASH_DUMP */
/**
* reserve_memory - setup reserved memory areas
*
@ -522,32 +547,6 @@ static __init int setup_nomca(char *s)
}
early_param("nomca", setup_nomca);
#ifdef CONFIG_CRASH_DUMP
int __init reserve_elfcorehdr(u64 *start, u64 *end)
{
u64 length;
/* We get the address using the kernel command line,
* but the size is extracted from the EFI tables.
* Both address and size are required for reservation
* to work properly.
*/
if (!is_vmcore_usable())
return -EINVAL;
if ((length = vmcore_find_descriptor_size(elfcorehdr_addr)) == 0) {
vmcore_unusable();
return -EINVAL;
}
*start = (unsigned long)__va(elfcorehdr_addr);
*end = *start + length;
return 0;
}
#endif /* CONFIG_PROC_VMCORE */
void __init
setup_arch (char **cmdline_p)
{

View File

@ -367,3 +367,5 @@
444 common landlock_create_ruleset sys_landlock_create_ruleset
445 common landlock_add_rule sys_landlock_add_rule
446 common landlock_restrict_self sys_landlock_restrict_self
# 447 reserved for memfd_secret
448 common process_mrelease sys_process_mrelease

View File

@ -446,3 +446,5 @@
444 common landlock_create_ruleset sys_landlock_create_ruleset
445 common landlock_add_rule sys_landlock_add_rule
446 common landlock_restrict_self sys_landlock_restrict_self
# 447 reserved for memfd_secret
448 common process_mrelease sys_process_mrelease

View File

@ -112,8 +112,7 @@ extern int page_is_ram(unsigned long pfn);
# define page_to_phys(page) (page_to_pfn(page) << PAGE_SHIFT)
# define ARCH_PFN_OFFSET (memory_start >> PAGE_SHIFT)
# define pfn_valid(pfn) ((pfn) < (max_mapnr + ARCH_PFN_OFFSET))
# define pfn_valid(pfn) ((pfn) >= ARCH_PFN_OFFSET && (pfn) < (max_mapnr + ARCH_PFN_OFFSET))
# endif /* __ASSEMBLY__ */
#define virt_addr_valid(vaddr) (pfn_valid(virt_to_pfn(vaddr)))

View File

@ -443,8 +443,6 @@ extern int mem_init_done;
asmlinkage void __init mmu_init(void);
void __init *early_get_page(void);
#endif /* __ASSEMBLY__ */
#endif /* __KERNEL__ */

View File

@ -452,3 +452,5 @@
444 common landlock_create_ruleset sys_landlock_create_ruleset
445 common landlock_add_rule sys_landlock_add_rule
446 common landlock_restrict_self sys_landlock_restrict_self
# 447 reserved for memfd_secret
448 common process_mrelease sys_process_mrelease

View File

@ -265,18 +265,6 @@ asmlinkage void __init mmu_init(void)
dma_contiguous_reserve(memory_start + lowmem_size - 1);
}
/* This is only called until mem_init is done. */
void __init *early_get_page(void)
{
/*
* Mem start + kernel_tlb -> here is limit
* because of mem mapping from head.S
*/
return memblock_alloc_try_nid_raw(PAGE_SIZE, PAGE_SIZE,
MEMBLOCK_LOW_LIMIT, memory_start + kernel_tlb,
NUMA_NO_NODE);
}
void * __ref zalloc_maybe_bootmem(size_t size, gfp_t mask)
{
void *p;

View File

@ -33,6 +33,7 @@
#include <linux/init.h>
#include <linux/mm_types.h>
#include <linux/pgtable.h>
#include <linux/memblock.h>
#include <asm/pgalloc.h>
#include <linux/io.h>
@ -242,15 +243,13 @@ unsigned long iopa(unsigned long addr)
__ref pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
{
pte_t *pte;
if (mem_init_done) {
pte = (pte_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
} else {
pte = (pte_t *)early_get_page();
if (pte)
clear_page(pte);
}
return pte;
if (mem_init_done)
return (pte_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
else
return memblock_alloc_try_nid(PAGE_SIZE, PAGE_SIZE,
MEMBLOCK_LOW_LIMIT,
memory_start + kernel_tlb,
NUMA_NO_NODE);
}
void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t flags)

View File

@ -125,13 +125,7 @@ static inline void kunmap_noncoherent(void)
kunmap_coherent();
}
#define ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE
static inline void flush_kernel_dcache_page(struct page *page)
{
BUG_ON(cpu_has_dc_aliases && PageHighMem(page));
flush_dcache_page(page);
}
#define ARCH_IMPLEMENTS_FLUSH_KERNEL_VMAP_RANGE 1
/*
* For now flush_kernel_vmap_range and invalidate_kernel_vmap_range both do a
* cache writeback and invalidate operation.

View File

@ -452,8 +452,9 @@ static void __init mips_parse_crashkernel(void)
return;
if (crash_base <= 0) {
crash_base = memblock_find_in_range(CRASH_ALIGN, CRASH_ADDR_MAX,
crash_size, CRASH_ALIGN);
crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
CRASH_ALIGN,
CRASH_ADDR_MAX);
if (!crash_base) {
pr_warn("crashkernel reservation failed - No suitable area found.\n");
return;
@ -461,8 +462,9 @@ static void __init mips_parse_crashkernel(void)
} else {
unsigned long long start;
start = memblock_find_in_range(crash_base, crash_base + crash_size,
crash_size, 1);
start = memblock_phys_alloc_range(crash_size, 1,
crash_base,
crash_base + crash_size);
if (start != crash_base) {
pr_warn("Invalid memory region reserved for crash kernel\n");
return;
@ -656,10 +658,6 @@ static void __init arch_mem_init(char **cmdline_p)
mips_reserve_vmcore();
mips_parse_crashkernel();
#ifdef CONFIG_KEXEC
if (crashk_res.start != crashk_res.end)
memblock_reserve(crashk_res.start, resource_size(&crashk_res));
#endif
device_tree_init();
/*

View File

@ -385,3 +385,5 @@
444 n32 landlock_create_ruleset sys_landlock_create_ruleset
445 n32 landlock_add_rule sys_landlock_add_rule
446 n32 landlock_restrict_self sys_landlock_restrict_self
# 447 reserved for memfd_secret
448 n32 process_mrelease sys_process_mrelease

View File

@ -361,3 +361,5 @@
444 n64 landlock_create_ruleset sys_landlock_create_ruleset
445 n64 landlock_add_rule sys_landlock_add_rule
446 n64 landlock_restrict_self sys_landlock_restrict_self
# 447 reserved for memfd_secret
448 n64 process_mrelease sys_process_mrelease

View File

@ -434,3 +434,5 @@
444 o32 landlock_create_ruleset sys_landlock_create_ruleset
445 o32 landlock_add_rule sys_landlock_add_rule
446 o32 landlock_restrict_self sys_landlock_restrict_self
# 447 reserved for memfd_secret
448 o32 process_mrelease sys_process_mrelease

View File

@ -36,8 +36,7 @@ void copy_from_user_page(struct vm_area_struct *vma, struct page *page,
void flush_anon_page(struct vm_area_struct *vma,
struct page *page, unsigned long vaddr);
#define ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE
void flush_kernel_dcache_page(struct page *page);
#define ARCH_IMPLEMENTS_FLUSH_KERNEL_VMAP_RANGE 1
void flush_kernel_vmap_range(void *addr, int size);
void invalidate_kernel_vmap_range(void *addr, int size);
#define flush_dcache_mmap_lock(mapping) xa_lock_irq(&(mapping)->i_pages)

View File

@ -318,15 +318,6 @@ void flush_anon_page(struct vm_area_struct *vma,
local_irq_restore(flags);
}
void flush_kernel_dcache_page(struct page *page)
{
unsigned long flags;
local_irq_save(flags);
cpu_dcache_wbinval_page((unsigned long)page_address(page));
local_irq_restore(flags);
}
EXPORT_SYMBOL(flush_kernel_dcache_page);
void flush_kernel_vmap_range(void *addr, int size)
{
unsigned long flags;

View File

@ -36,16 +36,12 @@ void flush_cache_all_local(void);
void flush_cache_all(void);
void flush_cache_mm(struct mm_struct *mm);
#define ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE
void flush_kernel_dcache_page_addr(void *addr);
static inline void flush_kernel_dcache_page(struct page *page)
{
flush_kernel_dcache_page_addr(page_address(page));
}
#define flush_kernel_dcache_range(start,size) \
flush_kernel_dcache_range_asm((start), (start)+(size));
#define ARCH_IMPLEMENTS_FLUSH_KERNEL_VMAP_RANGE 1
void flush_kernel_vmap_range(void *vaddr, int size);
void invalidate_kernel_vmap_range(void *vaddr, int size);
@ -59,7 +55,7 @@ extern void flush_dcache_page(struct page *page);
#define flush_dcache_mmap_unlock(mapping) xa_unlock_irq(&mapping->i_pages)
#define flush_icache_page(vma,page) do { \
flush_kernel_dcache_page(page); \
flush_kernel_dcache_page_addr(page_address(page)); \
flush_kernel_icache_page(page_address(page)); \
} while (0)

View File

@ -334,7 +334,7 @@ void flush_dcache_page(struct page *page)
return;
}
flush_kernel_dcache_page(page);
flush_kernel_dcache_page_addr(page_address(page));
if (!mapping)
return;
@ -375,7 +375,6 @@ EXPORT_SYMBOL(flush_dcache_page);
/* Defined in arch/parisc/kernel/pacache.S */
EXPORT_SYMBOL(flush_kernel_dcache_range_asm);
EXPORT_SYMBOL(flush_kernel_dcache_page_asm);
EXPORT_SYMBOL(flush_data_cache_local);
EXPORT_SYMBOL(flush_kernel_icache_range_asm);

View File

@ -444,3 +444,5 @@
444 common landlock_create_ruleset sys_landlock_create_ruleset
445 common landlock_add_rule sys_landlock_add_rule
446 common landlock_restrict_self sys_landlock_restrict_self
# 447 reserved for memfd_secret
448 common process_mrelease sys_process_mrelease

View File

@ -526,3 +526,5 @@
444 common landlock_create_ruleset sys_landlock_create_ruleset
445 common landlock_add_rule sys_landlock_add_rule
446 common landlock_restrict_self sys_landlock_restrict_self
# 447 reserved for memfd_secret
448 common process_mrelease sys_process_mrelease

View File

@ -211,13 +211,11 @@ static int update_lmb_associativity_index(struct drmem_lmb *lmb)
static struct memory_block *lmb_to_memblock(struct drmem_lmb *lmb)
{
unsigned long section_nr;
struct mem_section *mem_sect;
struct memory_block *mem_block;
section_nr = pfn_to_section_nr(PFN_DOWN(lmb->base_addr));
mem_sect = __nr_to_section(section_nr);
mem_block = find_memory_block(mem_sect);
mem_block = find_memory_block(section_nr);
return mem_block;
}

View File

@ -819,38 +819,22 @@ static void __init reserve_crashkernel(void)
crash_size = PAGE_ALIGN(crash_size);
if (crash_base == 0) {
/*
* Current riscv boot protocol requires 2MB alignment for
* RV64 and 4MB alignment for RV32 (hugepage size)
*/
crash_base = memblock_find_in_range(search_start, search_end,
crash_size, PMD_SIZE);
if (crash_base == 0) {
pr_warn("crashkernel: couldn't allocate %lldKB\n",
crash_size >> 10);
return;
}
} else {
/* User specifies base address explicitly. */
if (!memblock_is_region_memory(crash_base, crash_size)) {
pr_warn("crashkernel: requested region is not memory\n");
return;
}
if (memblock_is_region_reserved(crash_base, crash_size)) {
pr_warn("crashkernel: requested region is reserved\n");
return;
}
if (!IS_ALIGNED(crash_base, PMD_SIZE)) {
pr_warn("crashkernel: requested region is misaligned\n");
return;
}
if (crash_base) {
search_start = crash_base;
search_end = crash_base + crash_size;
}
/*
* Current riscv boot protocol requires 2MB alignment for
* RV64 and 4MB alignment for RV32 (hugepage size)
*/
crash_base = memblock_phys_alloc_range(crash_size, PMD_SIZE,
search_start, search_end);
if (crash_base == 0) {
pr_warn("crashkernel: couldn't allocate %lldKB\n",
crash_size >> 10);
return;
}
memblock_reserve(crash_base, crash_size);
pr_info("crashkernel: reserved 0x%016llx - 0x%016llx (%lld MB)\n",
crash_base, crash_base + crash_size, crash_size >> 20);

View File

@ -677,8 +677,9 @@ static void __init reserve_crashkernel(void)
return;
}
low = crash_base ?: low;
crash_base = memblock_find_in_range(low, high, crash_size,
KEXEC_CRASH_MEM_ALIGN);
crash_base = memblock_phys_alloc_range(crash_size,
KEXEC_CRASH_MEM_ALIGN,
low, high);
}
if (!crash_base) {
@ -687,8 +688,10 @@ static void __init reserve_crashkernel(void)
return;
}
if (register_memory_notifier(&kdump_mem_nb))
if (register_memory_notifier(&kdump_mem_nb)) {
memblock_free(crash_base, crash_size);
return;
}
if (!oldmem_data.start && MACHINE_IS_VM)
diag10_range(PFN_DOWN(crash_base), PFN_DOWN(crash_size));

View File

@ -449,3 +449,5 @@
444 common landlock_create_ruleset sys_landlock_create_ruleset sys_landlock_create_ruleset
445 common landlock_add_rule sys_landlock_add_rule sys_landlock_add_rule
446 common landlock_restrict_self sys_landlock_restrict_self sys_landlock_restrict_self
# 447 reserved for memfd_secret
448 common process_mrelease sys_process_mrelease sys_process_mrelease

View File

@ -822,7 +822,7 @@ void do_secure_storage_access(struct pt_regs *regs)
break;
case KERNEL_FAULT:
page = phys_to_page(addr);
if (unlikely(!try_get_page(page)))
if (unlikely(!try_get_compound_head(page, 1)))
break;
rc = arch_make_page_accessible(page);
put_page(page);

View File

@ -63,6 +63,8 @@ static inline void flush_anon_page(struct vm_area_struct *vma,
if (boot_cpu_data.dcache.n_aliases && PageAnon(page))
__flush_anon_page(page, vmaddr);
}
#define ARCH_IMPLEMENTS_FLUSH_KERNEL_VMAP_RANGE 1
static inline void flush_kernel_vmap_range(void *addr, int size)
{
__flush_wback_region(addr, size);
@ -72,12 +74,6 @@ static inline void invalidate_kernel_vmap_range(void *addr, int size)
__flush_invalidate_region(addr, size);
}
#define ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE
static inline void flush_kernel_dcache_page(struct page *page)
{
flush_dcache_page(page);
}
extern void copy_to_user_page(struct vm_area_struct *vma,
struct page *page, unsigned long vaddr, void *dst, const void *src,
unsigned long len);

View File

@ -449,3 +449,5 @@
444 common landlock_create_ruleset sys_landlock_create_ruleset
445 common landlock_add_rule sys_landlock_add_rule
446 common landlock_restrict_self sys_landlock_restrict_self
# 447 reserved for memfd_secret
448 common process_mrelease sys_process_mrelease

View File

@ -492,3 +492,5 @@
444 common landlock_create_ruleset sys_landlock_create_ruleset
445 common landlock_add_rule sys_landlock_add_rule
446 common landlock_restrict_self sys_landlock_restrict_self
# 447 reserved for memfd_secret
448 common process_mrelease sys_process_mrelease

View File

@ -452,3 +452,4 @@
445 i386 landlock_add_rule sys_landlock_add_rule
446 i386 landlock_restrict_self sys_landlock_restrict_self
447 i386 memfd_secret sys_memfd_secret
448 i386 process_mrelease sys_process_mrelease

View File

@ -369,6 +369,7 @@
445 common landlock_add_rule sys_landlock_add_rule
446 common landlock_restrict_self sys_landlock_restrict_self
447 common memfd_secret sys_memfd_secret
448 common process_mrelease sys_process_mrelease
#
# Due to a historical design error, certain syscalls are numbered differently

View File

@ -109,14 +109,13 @@ static u32 __init allocate_aperture(void)
* memory. Unfortunately we cannot move it up because that would
* make the IOMMU useless.
*/
addr = memblock_find_in_range(GART_MIN_ADDR, GART_MAX_ADDR,
aper_size, aper_size);
addr = memblock_phys_alloc_range(aper_size, aper_size,
GART_MIN_ADDR, GART_MAX_ADDR);
if (!addr) {
pr_err("Cannot allocate aperture memory hole [mem %#010lx-%#010lx] (%uKB)\n",
addr, addr + aper_size - 1, aper_size >> 10);
return 0;
}
memblock_reserve(addr, aper_size);
pr_info("Mapping aperture over RAM [mem %#010lx-%#010lx] (%uKB)\n",
addr, addr + aper_size - 1, aper_size >> 10);
register_nosave_region(addr >> PAGE_SHIFT,

View File

@ -154,7 +154,7 @@ static struct ldt_struct *alloc_ldt_struct(unsigned int num_entries)
if (num_entries > LDT_ENTRIES)
return NULL;
new_ldt = kmalloc(sizeof(struct ldt_struct), GFP_KERNEL);
new_ldt = kmalloc(sizeof(struct ldt_struct), GFP_KERNEL_ACCOUNT);
if (!new_ldt)
return NULL;
@ -168,9 +168,9 @@ static struct ldt_struct *alloc_ldt_struct(unsigned int num_entries)
* than PAGE_SIZE.
*/
if (alloc_size > PAGE_SIZE)
new_ldt->entries = vzalloc(alloc_size);
new_ldt->entries = __vmalloc(alloc_size, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
else
new_ldt->entries = (void *)get_zeroed_page(GFP_KERNEL);
new_ldt->entries = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
if (!new_ldt->entries) {
kfree(new_ldt);

View File

@ -127,14 +127,12 @@ __ref void *alloc_low_pages(unsigned int num)
unsigned long ret = 0;
if (min_pfn_mapped < max_pfn_mapped) {
ret = memblock_find_in_range(
ret = memblock_phys_alloc_range(
PAGE_SIZE * num, PAGE_SIZE,
min_pfn_mapped << PAGE_SHIFT,
max_pfn_mapped << PAGE_SHIFT,
PAGE_SIZE * num , PAGE_SIZE);
max_pfn_mapped << PAGE_SHIFT);
}
if (ret)
memblock_reserve(ret, PAGE_SIZE * num);
else if (can_use_brk_pgt)
if (!ret && can_use_brk_pgt)
ret = __pa(extend_brk(PAGE_SIZE * num, PAGE_SIZE));
if (!ret)
@ -610,8 +608,17 @@ static void __init memory_map_top_down(unsigned long map_start,
unsigned long addr;
unsigned long mapped_ram_size = 0;
/* xen has big range in reserved near end of ram, skip it at first.*/
addr = memblock_find_in_range(map_start, map_end, PMD_SIZE, PMD_SIZE);
/*
* Systems that have many reserved areas near top of the memory,
* e.g. QEMU with less than 1G RAM and EFI enabled, or Xen, will
* require lots of 4K mappings which may exhaust pgt_buf.
* Start with top-most PMD_SIZE range aligned at PMD_SIZE to ensure
* there is enough mapped memory that can be allocated from
* memblock.
*/
addr = memblock_phys_alloc_range(PMD_SIZE, PMD_SIZE, map_start,
map_end);
memblock_free(addr, PMD_SIZE);
real_end = addr + PMD_SIZE;
/* step_size need to be small so pgt_buf from BRK could cover it */

View File

@ -376,15 +376,14 @@ static int __init numa_alloc_distance(void)
cnt++;
size = cnt * cnt * sizeof(numa_distance[0]);
phys = memblock_find_in_range(0, PFN_PHYS(max_pfn_mapped),
size, PAGE_SIZE);
phys = memblock_phys_alloc_range(size, PAGE_SIZE, 0,
PFN_PHYS(max_pfn_mapped));
if (!phys) {
pr_warn("Warning: can't allocate distance table!\n");
/* don't retry until explicitly reset */
numa_distance = (void *)1LU;
return -ENOMEM;
}
memblock_reserve(phys, size);
numa_distance = __va(phys);
numa_distance_cnt = cnt;

View File

@ -447,13 +447,12 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
if (numa_dist_cnt) {
u64 phys;
phys = memblock_find_in_range(0, PFN_PHYS(max_pfn_mapped),
phys_size, PAGE_SIZE);
phys = memblock_phys_alloc_range(phys_size, PAGE_SIZE, 0,
PFN_PHYS(max_pfn_mapped));
if (!phys) {
pr_warn("NUMA: Warning: can't allocate copy of distance table, disabling emulation\n");
goto no_emu;
}
memblock_reserve(phys, phys_size);
phys_dist = __va(phys);
for (i = 0; i < numa_dist_cnt; i++)

View File

@ -28,7 +28,7 @@ void __init reserve_real_mode(void)
WARN_ON(slab_is_available());
/* Has to be under 1M so we can execute real-mode AP code. */
mem = memblock_find_in_range(0, 1<<20, size, PAGE_SIZE);
mem = memblock_phys_alloc_range(size, PAGE_SIZE, 0, 1<<20);
if (!mem)
pr_info("No sub-1M memory is available for the trampoline\n");
else

View File

@ -417,3 +417,5 @@
444 common landlock_create_ruleset sys_landlock_create_ruleset
445 common landlock_add_rule sys_landlock_add_rule
446 common landlock_restrict_self sys_landlock_restrict_self
# 447 reserved for memfd_secret
448 common process_mrelease sys_process_mrelease

View File

@ -309,7 +309,7 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
static void bio_invalidate_vmalloc_pages(struct bio *bio)
{
#ifdef ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE
#ifdef ARCH_IMPLEMENTS_FLUSH_KERNEL_VMAP_RANGE
if (bio->bi_private && !op_is_write(bio_op(bio))) {
unsigned long i, len = 0;

View File

@ -583,8 +583,8 @@ void __init acpi_table_upgrade(void)
}
acpi_tables_addr =
memblock_find_in_range(0, ACPI_TABLE_UPGRADE_MAX_PHYS,
all_tables_size, PAGE_SIZE);
memblock_phys_alloc_range(all_tables_size, PAGE_SIZE,
0, ACPI_TABLE_UPGRADE_MAX_PHYS);
if (!acpi_tables_addr) {
WARN_ON(1);
return;
@ -599,7 +599,6 @@ void __init acpi_table_upgrade(void)
* Both memblock_reserve and e820__range_add (via arch_reserve_mem_area)
* works fine.
*/
memblock_reserve(acpi_tables_addr, all_tables_size);
arch_reserve_mem_area(acpi_tables_addr, all_tables_size);
/*

View File

@ -279,13 +279,10 @@ static int __init numa_alloc_distance(void)
int i, j;
size = nr_node_ids * nr_node_ids * sizeof(numa_distance[0]);
phys = memblock_find_in_range(0, PFN_PHYS(max_pfn),
size, PAGE_SIZE);
phys = memblock_phys_alloc_range(size, PAGE_SIZE, 0, PFN_PHYS(max_pfn));
if (WARN_ON(!phys))
return -ENOMEM;
memblock_reserve(phys, size);
numa_distance = __va(phys);
numa_distance_cnt = nr_node_ids;

View File

@ -578,9 +578,9 @@ static struct memory_block *find_memory_block_by_id(unsigned long block_id)
/*
* Called under device_hotplug_lock.
*/
struct memory_block *find_memory_block(struct mem_section *section)
struct memory_block *find_memory_block(unsigned long section_nr)
{
unsigned long block_id = memory_block_id(__section_nr(section));
unsigned long block_id = memory_block_id(section_nr);
return find_memory_block_by_id(block_id);
}

View File

@ -578,10 +578,6 @@ static bool jz4740_mmc_read_data(struct jz4740_mmc_host *host,
}
}
data->bytes_xfered += miter->length;
/* This can go away once MIPS implements
* flush_kernel_dcache_page */
flush_dcache_page(miter->page);
}
sg_miter_stop(miter);

View File

@ -941,7 +941,7 @@ mmc_spi_data_do(struct mmc_spi_host *host, struct mmc_command *cmd,
/* discard mappings */
if (direction == DMA_FROM_DEVICE)
flush_kernel_dcache_page(sg_page(sg));
flush_dcache_page(sg_page(sg));
kunmap(sg_page(sg));
if (dma_dev)
dma_unmap_page(dma_dev, dma_addr, PAGE_SIZE, dir);

View File

@ -33,18 +33,22 @@ static int __init early_init_dt_alloc_reserved_memory_arch(phys_addr_t size,
phys_addr_t *res_base)
{
phys_addr_t base;
int err = 0;
end = !end ? MEMBLOCK_ALLOC_ANYWHERE : end;
align = !align ? SMP_CACHE_BYTES : align;
base = memblock_find_in_range(start, end, size, align);
base = memblock_phys_alloc_range(size, align, start, end);
if (!base)
return -ENOMEM;
*res_base = base;
if (nomap)
return memblock_mark_nomap(base, size);
if (nomap) {
err = memblock_mark_nomap(base, size);
if (err)
memblock_free(base, size);
}
return memblock_reserve(base, size);
return err;
}
/*

View File

@ -3,6 +3,7 @@
* Implement the manual drop-all-pagecache function
*/
#include <linux/pagemap.h>
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/fs.h>
@ -27,7 +28,7 @@ static void drop_pagecache_sb(struct super_block *sb, void *unused)
* we need to reschedule to avoid softlockups.
*/
if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||
(inode->i_mapping->nrpages == 0 && !need_resched())) {
(mapping_empty(inode->i_mapping) && !need_resched())) {
spin_unlock(&inode->i_lock);
continue;
}

View File

@ -217,8 +217,10 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
* We are doing an exec(). 'current' is the process
* doing the exec and bprm->mm is the new process's mm.
*/
mmap_read_lock(bprm->mm);
ret = get_user_pages_remote(bprm->mm, pos, 1, gup_flags,
&page, NULL, NULL);
mmap_read_unlock(bprm->mm);
if (ret <= 0)
return NULL;
@ -574,7 +576,7 @@ static int copy_strings(int argc, struct user_arg_ptr argv,
}
if (kmapped_page) {
flush_kernel_dcache_page(kmapped_page);
flush_dcache_page(kmapped_page);
kunmap(kmapped_page);
put_arg_page(kmapped_page);
}
@ -592,7 +594,7 @@ static int copy_strings(int argc, struct user_arg_ptr argv,
ret = 0;
out:
if (kmapped_page) {
flush_kernel_dcache_page(kmapped_page);
flush_dcache_page(kmapped_page);
kunmap(kmapped_page);
put_arg_page(kmapped_page);
}
@ -634,7 +636,7 @@ int copy_string_kernel(const char *arg, struct linux_binprm *bprm)
kaddr = kmap_atomic(page);
flush_arg_page(bprm, pos & PAGE_MASK, page);
memcpy(kaddr + offset_in_page(pos), arg, bytes_to_copy);
flush_kernel_dcache_page(page);
flush_dcache_page(page);
kunmap_atomic(kaddr);
put_arg_page(page);
}

View File

@ -1051,7 +1051,8 @@ static int __init fcntl_init(void)
__FMODE_EXEC | __FMODE_NONOTIFY));
fasync_cache = kmem_cache_create("fasync_cache",
sizeof(struct fasync_struct), 0, SLAB_PANIC, NULL);
sizeof(struct fasync_struct), 0,
SLAB_PANIC | SLAB_ACCOUNT, NULL);
return 0;
}

View File

@ -406,6 +406,11 @@ static bool inode_do_switch_wbs(struct inode *inode,
inc_wb_stat(new_wb, WB_WRITEBACK);
}
if (mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK)) {
atomic_dec(&old_wb->writeback_inodes);
atomic_inc(&new_wb->writeback_inodes);
}
wb_get(new_wb);
/*
@ -1034,20 +1039,20 @@ restart:
* cgroup_writeback_by_id - initiate cgroup writeback from bdi and memcg IDs
* @bdi_id: target bdi id
* @memcg_id: target memcg css id
* @nr: number of pages to write, 0 for best-effort dirty flushing
* @reason: reason why some writeback work initiated
* @done: target wb_completion
*
* Initiate flush of the bdi_writeback identified by @bdi_id and @memcg_id
* with the specified parameters.
*/
int cgroup_writeback_by_id(u64 bdi_id, int memcg_id, unsigned long nr,
int cgroup_writeback_by_id(u64 bdi_id, int memcg_id,
enum wb_reason reason, struct wb_completion *done)
{
struct backing_dev_info *bdi;
struct cgroup_subsys_state *memcg_css;
struct bdi_writeback *wb;
struct wb_writeback_work *work;
unsigned long dirty;
int ret;
/* lookup bdi and memcg */
@ -1076,24 +1081,22 @@ int cgroup_writeback_by_id(u64 bdi_id, int memcg_id, unsigned long nr,
}
/*
* If @nr is zero, the caller is attempting to write out most of
* The caller is attempting to write out most of
* the currently dirty pages. Let's take the current dirty page
* count and inflate it by 25% which should be large enough to
* flush out most dirty pages while avoiding getting livelocked by
* concurrent dirtiers.
*
* BTW the memcg stats are flushed periodically and this is best-effort
* estimation, so some potential error is ok.
*/
if (!nr) {
unsigned long filepages, headroom, dirty, writeback;
mem_cgroup_wb_stats(wb, &filepages, &headroom, &dirty,
&writeback);
nr = dirty * 10 / 8;
}
dirty = memcg_page_state(mem_cgroup_from_css(memcg_css), NR_FILE_DIRTY);
dirty = dirty * 10 / 8;
/* issue the writeback work */
work = kzalloc(sizeof(*work), GFP_NOWAIT | __GFP_NOWARN);
if (work) {
work->nr_pages = nr;
work->nr_pages = dirty;
work->sync_mode = WB_SYNC_NONE;
work->range_cyclic = 1;
work->reason = reason;
@ -1999,7 +2002,6 @@ static long writeback_inodes_wb(struct bdi_writeback *wb, long nr_pages,
static long wb_writeback(struct bdi_writeback *wb,
struct wb_writeback_work *work)
{
unsigned long wb_start = jiffies;
long nr_pages = work->nr_pages;
unsigned long dirtied_before = jiffies;
struct inode *inode;
@ -2053,8 +2055,6 @@ static long wb_writeback(struct bdi_writeback *wb,
progress = __writeback_inodes_wb(wb, work);
trace_writeback_written(wb, work);
wb_update_bandwidth(wb, wb_start);
/*
* Did we write something? Try for more
*

View File

@ -254,7 +254,7 @@ static struct fs_context *alloc_fs_context(struct file_system_type *fs_type,
struct fs_context *fc;
int ret = -ENOMEM;
fc = kzalloc(sizeof(struct fs_context), GFP_KERNEL);
fc = kzalloc(sizeof(struct fs_context), GFP_KERNEL_ACCOUNT);
if (!fc)
return ERR_PTR(-ENOMEM);
@ -649,7 +649,7 @@ const struct fs_context_operations legacy_fs_context_ops = {
*/
static int legacy_init_fs_context(struct fs_context *fc)
{
fc->fs_private = kzalloc(sizeof(struct legacy_fs_context), GFP_KERNEL);
fc->fs_private = kzalloc(sizeof(struct legacy_fs_context), GFP_KERNEL_ACCOUNT);
if (!fc->fs_private)
return -ENOMEM;
fc->ops = &legacy_fs_context_ops;

View File

@ -770,7 +770,7 @@ static enum lru_status inode_lru_isolate(struct list_head *item,
return LRU_ROTATE;
}
if (inode_has_buffers(inode) || inode->i_data.nrpages) {
if (inode_has_buffers(inode) || !mapping_empty(&inode->i_data)) {
__iget(inode);
spin_unlock(&inode->i_lock);
spin_unlock(lru_lock);

View File

@ -2941,10 +2941,12 @@ static int __init filelock_init(void)
int i;
flctx_cache = kmem_cache_create("file_lock_ctx",
sizeof(struct file_lock_context), 0, SLAB_PANIC, NULL);
sizeof(struct file_lock_context), 0,
SLAB_PANIC | SLAB_ACCOUNT, NULL);
filelock_cache = kmem_cache_create("file_lock_cache",
sizeof(struct file_lock), 0, SLAB_PANIC, NULL);
sizeof(struct file_lock), 0,
SLAB_PANIC | SLAB_ACCOUNT, NULL);
for_each_possible_cpu(i) {
struct file_lock_list_struct *fll = per_cpu_ptr(&file_lock_list, i);

View File

@ -4089,7 +4089,9 @@ int vfs_unlink(struct user_namespace *mnt_userns, struct inode *dir,
return -EPERM;
inode_lock(target);
if (is_local_mountpoint(dentry))
if (IS_SWAPFILE(target))
error = -EPERM;
else if (is_local_mountpoint(dentry))
error = -EBUSY;
else {
error = security_inode_unlink(dir, dentry);
@ -4597,6 +4599,10 @@ int vfs_rename(struct renamedata *rd)
else if (target)
inode_lock(target);
error = -EPERM;
if (IS_SWAPFILE(source) || (target && IS_SWAPFILE(target)))
goto out;
error = -EBUSY;
if (is_local_mountpoint(old_dentry) || is_local_mountpoint(new_dentry))
goto out;

View File

@ -203,7 +203,8 @@ static struct mount *alloc_vfsmnt(const char *name)
goto out_free_cache;
if (name) {
mnt->mnt_devname = kstrdup_const(name, GFP_KERNEL);
mnt->mnt_devname = kstrdup_const(name,
GFP_KERNEL_ACCOUNT);
if (!mnt->mnt_devname)
goto out_free_id;
}
@ -3370,7 +3371,7 @@ static struct mnt_namespace *alloc_mnt_ns(struct user_namespace *user_ns, bool a
if (!ucounts)
return ERR_PTR(-ENOSPC);
new_ns = kzalloc(sizeof(struct mnt_namespace), GFP_KERNEL);
new_ns = kzalloc(sizeof(struct mnt_namespace), GFP_KERNEL_ACCOUNT);
if (!new_ns) {
dec_mnt_namespaces(ucounts);
return ERR_PTR(-ENOMEM);
@ -4306,7 +4307,7 @@ void __init mnt_init(void)
int err;
mnt_cache = kmem_cache_create("mnt_cache", sizeof(struct mount),
0, SLAB_HWCACHE_ALIGN | SLAB_PANIC, NULL);
0, SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT, NULL);
mount_hashtable = alloc_large_system_hash("Mount-cache",
sizeof(struct hlist_head),

View File

@ -16,6 +16,7 @@
#include <linux/debugfs.h>
#include <linux/seq_file.h>
#include <linux/time.h>
#include <linux/delay.h>
#include <linux/quotaops.h>
#include <linux/sched/signal.h>
@ -2721,7 +2722,7 @@ int ocfs2_inode_lock_tracker(struct inode *inode,
return status;
}
}
return tmp_oh ? 1 : 0;
return 1;
}
void ocfs2_inode_unlock_tracker(struct inode *inode,
@ -3912,6 +3913,17 @@ downconvert:
spin_unlock_irqrestore(&lockres->l_lock, flags);
ret = ocfs2_downconvert_lock(osb, lockres, new_level, set_lvb,
gen);
/* The dlm lock convert is being cancelled in background,
* ocfs2_cancel_convert() is asynchronous in fs/dlm,
* requeue it, try again later.
*/
if (ret == -EBUSY) {
ctl->requeue = 1;
mlog(ML_BASTS, "lockres %s, ReQ: Downconvert busy\n",
lockres->l_name);
ret = 0;
msleep(20);
}
leave:
if (ret)

View File

@ -357,7 +357,6 @@ int ocfs2_global_read_info(struct super_block *sb, int type)
}
oinfo->dqi_gi.dqi_sb = sb;
oinfo->dqi_gi.dqi_type = type;
ocfs2_qinfo_lock_res_init(&oinfo->dqi_gqlock, oinfo);
oinfo->dqi_gi.dqi_entry_size = sizeof(struct ocfs2_global_disk_dqblk);
oinfo->dqi_gi.dqi_ops = &ocfs2_global_ops;
oinfo->dqi_gqi_bh = NULL;

View File

@ -702,6 +702,8 @@ static int ocfs2_local_read_info(struct super_block *sb, int type)
info->dqi_priv = oinfo;
oinfo->dqi_type = type;
INIT_LIST_HEAD(&oinfo->dqi_chunk);
oinfo->dqi_gqinode = NULL;
ocfs2_qinfo_lock_res_init(&oinfo->dqi_gqlock, oinfo);
oinfo->dqi_rec = NULL;
oinfo->dqi_lqi_bh = NULL;
oinfo->dqi_libh = NULL;

View File

@ -191,7 +191,7 @@ EXPORT_SYMBOL(generic_pipe_buf_try_steal);
*/
bool generic_pipe_buf_get(struct pipe_inode_info *pipe, struct pipe_buffer *buf)
{
return try_get_page(buf->page);
return try_get_compound_head(buf->page, 1);
}
EXPORT_SYMBOL(generic_pipe_buf_get);

View File

@ -655,7 +655,7 @@ int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
goto out_nofds;
alloc_size = 6 * size;
bits = kvmalloc(alloc_size, GFP_KERNEL);
bits = kvmalloc(alloc_size, GFP_KERNEL_ACCOUNT);
if (!bits)
goto out_nofds;
}
@ -1000,7 +1000,7 @@ static int do_sys_poll(struct pollfd __user *ufds, unsigned int nfds,
len = min(todo, POLLFD_PER_PAGE);
walk = walk->next = kmalloc(struct_size(walk, entries, len),
GFP_KERNEL);
GFP_KERNEL_ACCOUNT);
if (!walk) {
err = -ENOMEM;
goto out_fds;

View File

@ -33,11 +33,6 @@ int sysctl_unprivileged_userfaultfd __read_mostly;
static struct kmem_cache *userfaultfd_ctx_cachep __read_mostly;
enum userfaultfd_state {
UFFD_STATE_WAIT_API,
UFFD_STATE_RUNNING,
};
/*
* Start with fault_pending_wqh and fault_wqh so they're more likely
* to be in the same cacheline.
@ -69,12 +64,10 @@ struct userfaultfd_ctx {
unsigned int flags;
/* features requested from the userspace */
unsigned int features;
/* state machine */
enum userfaultfd_state state;
/* released */
bool released;
/* memory mappings are changing because of non-cooperative event */
bool mmap_changing;
atomic_t mmap_changing;
/* mm with one ore more vmas attached to this userfaultfd_ctx */
struct mm_struct *mm;
};
@ -104,6 +97,14 @@ struct userfaultfd_wake_range {
unsigned long len;
};
/* internal indication that UFFD_API ioctl was successfully executed */
#define UFFD_FEATURE_INITIALIZED (1u << 31)
static bool userfaultfd_is_initialized(struct userfaultfd_ctx *ctx)
{
return ctx->features & UFFD_FEATURE_INITIALIZED;
}
static int userfaultfd_wake_function(wait_queue_entry_t *wq, unsigned mode,
int wake_flags, void *key)
{
@ -623,7 +624,8 @@ static void userfaultfd_event_wait_completion(struct userfaultfd_ctx *ctx,
* already released.
*/
out:
WRITE_ONCE(ctx->mmap_changing, false);
atomic_dec(&ctx->mmap_changing);
VM_BUG_ON(atomic_read(&ctx->mmap_changing) < 0);
userfaultfd_ctx_put(ctx);
}
@ -666,15 +668,14 @@ int dup_userfaultfd(struct vm_area_struct *vma, struct list_head *fcs)
refcount_set(&ctx->refcount, 1);
ctx->flags = octx->flags;
ctx->state = UFFD_STATE_RUNNING;
ctx->features = octx->features;
ctx->released = false;
ctx->mmap_changing = false;
atomic_set(&ctx->mmap_changing, 0);
ctx->mm = vma->vm_mm;
mmgrab(ctx->mm);
userfaultfd_ctx_get(octx);
WRITE_ONCE(octx->mmap_changing, true);
atomic_inc(&octx->mmap_changing);
fctx->orig = octx;
fctx->new = ctx;
list_add_tail(&fctx->list, fcs);
@ -721,7 +722,7 @@ void mremap_userfaultfd_prep(struct vm_area_struct *vma,
if (ctx->features & UFFD_FEATURE_EVENT_REMAP) {
vm_ctx->ctx = ctx;
userfaultfd_ctx_get(ctx);
WRITE_ONCE(ctx->mmap_changing, true);
atomic_inc(&ctx->mmap_changing);
} else {
/* Drop uffd context if remap feature not enabled */
vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
@ -766,7 +767,7 @@ bool userfaultfd_remove(struct vm_area_struct *vma,
return true;
userfaultfd_ctx_get(ctx);
WRITE_ONCE(ctx->mmap_changing, true);
atomic_inc(&ctx->mmap_changing);
mmap_read_unlock(mm);
msg_init(&ewq.msg);
@ -810,7 +811,7 @@ int userfaultfd_unmap_prep(struct vm_area_struct *vma,
return -ENOMEM;
userfaultfd_ctx_get(ctx);
WRITE_ONCE(ctx->mmap_changing, true);
atomic_inc(&ctx->mmap_changing);
unmap_ctx->ctx = ctx;
unmap_ctx->start = start;
unmap_ctx->end = end;
@ -943,38 +944,33 @@ static __poll_t userfaultfd_poll(struct file *file, poll_table *wait)
poll_wait(file, &ctx->fd_wqh, wait);
switch (ctx->state) {
case UFFD_STATE_WAIT_API:
if (!userfaultfd_is_initialized(ctx))
return EPOLLERR;
case UFFD_STATE_RUNNING:
/*
* poll() never guarantees that read won't block.
* userfaults can be waken before they're read().
*/
if (unlikely(!(file->f_flags & O_NONBLOCK)))
return EPOLLERR;
/*
* lockless access to see if there are pending faults
* __pollwait last action is the add_wait_queue but
* the spin_unlock would allow the waitqueue_active to
* pass above the actual list_add inside
* add_wait_queue critical section. So use a full
* memory barrier to serialize the list_add write of
* add_wait_queue() with the waitqueue_active read
* below.
*/
ret = 0;
smp_mb();
if (waitqueue_active(&ctx->fault_pending_wqh))
ret = EPOLLIN;
else if (waitqueue_active(&ctx->event_wqh))
ret = EPOLLIN;
return ret;
default:
WARN_ON_ONCE(1);
/*
* poll() never guarantees that read won't block.
* userfaults can be waken before they're read().
*/
if (unlikely(!(file->f_flags & O_NONBLOCK)))
return EPOLLERR;
}
/*
* lockless access to see if there are pending faults
* __pollwait last action is the add_wait_queue but
* the spin_unlock would allow the waitqueue_active to
* pass above the actual list_add inside
* add_wait_queue critical section. So use a full
* memory barrier to serialize the list_add write of
* add_wait_queue() with the waitqueue_active read
* below.
*/
ret = 0;
smp_mb();
if (waitqueue_active(&ctx->fault_pending_wqh))
ret = EPOLLIN;
else if (waitqueue_active(&ctx->event_wqh))
ret = EPOLLIN;
return ret;
}
static const struct file_operations userfaultfd_fops;
@ -1169,7 +1165,7 @@ static ssize_t userfaultfd_read(struct file *file, char __user *buf,
int no_wait = file->f_flags & O_NONBLOCK;
struct inode *inode = file_inode(file);
if (ctx->state == UFFD_STATE_WAIT_API)
if (!userfaultfd_is_initialized(ctx))
return -EINVAL;
for (;;) {
@ -1700,7 +1696,7 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx,
user_uffdio_copy = (struct uffdio_copy __user *) arg;
ret = -EAGAIN;
if (READ_ONCE(ctx->mmap_changing))
if (atomic_read(&ctx->mmap_changing))
goto out;
ret = -EFAULT;
@ -1757,7 +1753,7 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx *ctx,
user_uffdio_zeropage = (struct uffdio_zeropage __user *) arg;
ret = -EAGAIN;
if (READ_ONCE(ctx->mmap_changing))
if (atomic_read(&ctx->mmap_changing))
goto out;
ret = -EFAULT;
@ -1807,7 +1803,7 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx,
struct userfaultfd_wake_range range;
bool mode_wp, mode_dontwake;
if (READ_ONCE(ctx->mmap_changing))
if (atomic_read(&ctx->mmap_changing))
return -EAGAIN;
user_uffdio_wp = (struct uffdio_writeprotect __user *) arg;
@ -1855,7 +1851,7 @@ static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg)
user_uffdio_continue = (struct uffdio_continue __user *)arg;
ret = -EAGAIN;
if (READ_ONCE(ctx->mmap_changing))
if (atomic_read(&ctx->mmap_changing))
goto out;
ret = -EFAULT;
@ -1908,9 +1904,10 @@ out:
static inline unsigned int uffd_ctx_features(__u64 user_features)
{
/*
* For the current set of features the bits just coincide
* For the current set of features the bits just coincide. Set
* UFFD_FEATURE_INITIALIZED to mark the features as enabled.
*/
return (unsigned int)user_features;
return (unsigned int)user_features | UFFD_FEATURE_INITIALIZED;
}
/*
@ -1923,12 +1920,10 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
{
struct uffdio_api uffdio_api;
void __user *buf = (void __user *)arg;
unsigned int ctx_features;
int ret;
__u64 features;
ret = -EINVAL;
if (ctx->state != UFFD_STATE_WAIT_API)
goto out;
ret = -EFAULT;
if (copy_from_user(&uffdio_api, buf, sizeof(uffdio_api)))
goto out;
@ -1952,9 +1947,13 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
ret = -EFAULT;
if (copy_to_user(buf, &uffdio_api, sizeof(uffdio_api)))
goto out;
ctx->state = UFFD_STATE_RUNNING;
/* only enable the requested features for this uffd context */
ctx->features = uffd_ctx_features(features);
ctx_features = uffd_ctx_features(features);
ret = -EINVAL;
if (cmpxchg(&ctx->features, 0, ctx_features) != 0)
goto err_out;
ret = 0;
out:
return ret;
@ -1971,7 +1970,7 @@ static long userfaultfd_ioctl(struct file *file, unsigned cmd,
int ret = -EINVAL;
struct userfaultfd_ctx *ctx = file->private_data;
if (cmd != UFFDIO_API && ctx->state == UFFD_STATE_WAIT_API)
if (cmd != UFFDIO_API && !userfaultfd_is_initialized(ctx))
return -EINVAL;
switch(cmd) {
@ -2085,9 +2084,8 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
refcount_set(&ctx->refcount, 1);
ctx->flags = flags;
ctx->features = 0;
ctx->state = UFFD_STATE_WAIT_API;
ctx->released = false;
ctx->mmap_changing = false;
atomic_set(&ctx->mmap_changing, 0);
ctx->mm = current->mm;
/* prevent the mm struct to be freed */
mmgrab(ctx->mm);

View File

@ -116,6 +116,7 @@ struct bdi_writeback {
struct list_head b_dirty_time; /* time stamps are dirty */
spinlock_t list_lock; /* protects the b_* lists */
atomic_t writeback_inodes; /* number of inodes under writeback */
struct percpu_counter stat[NR_WB_STAT_ITEMS];
unsigned long congested; /* WB_[a]sync_congested flags */
@ -142,6 +143,7 @@ struct bdi_writeback {
spinlock_t work_lock; /* protects work_list & dwork scheduling */
struct list_head work_list;
struct delayed_work dwork; /* work item used for writeback */
struct delayed_work bw_dwork; /* work item used for bandwidth estimate */
unsigned long dirty_sleep; /* last wait */

View File

@ -288,6 +288,17 @@ static inline struct bdi_writeback *inode_to_wb(const struct inode *inode)
return inode->i_wb;
}
static inline struct bdi_writeback *inode_to_wb_wbc(
struct inode *inode,
struct writeback_control *wbc)
{
/*
* If wbc does not have inode attached, it means cgroup writeback was
* disabled when wbc started. Just use the default wb in that case.
*/
return wbc->wb ? wbc->wb : &inode_to_bdi(inode)->wb;
}
/**
* unlocked_inode_to_wb_begin - begin unlocked inode wb access transaction
* @inode: target inode
@ -366,6 +377,14 @@ static inline struct bdi_writeback *inode_to_wb(struct inode *inode)
return &inode_to_bdi(inode)->wb;
}
static inline struct bdi_writeback *inode_to_wb_wbc(
struct inode *inode,
struct writeback_control *wbc)
{
return inode_to_wb(inode);
}
static inline struct bdi_writeback *
unlocked_inode_to_wb_begin(struct inode *inode, struct wb_lock_cookie *cookie)
{

View File

@ -409,7 +409,7 @@ static inline void invalidate_inode_buffers(struct inode *inode) {}
static inline int remove_inode_buffers(struct inode *inode) { return 1; }
static inline int sync_mapping_buffers(struct address_space *mapping) { return 0; }
static inline void invalidate_bh_lrus_cpu(int cpu) {}
static inline bool has_bh_in_lru(int cpu, void *dummy) { return 0; }
static inline bool has_bh_in_lru(int cpu, void *dummy) { return false; }
#define buffer_heads_over_limit 0
#endif /* CONFIG_BLOCK */

View File

@ -84,6 +84,8 @@ static inline unsigned long compact_gap(unsigned int order)
extern unsigned int sysctl_compaction_proactiveness;
extern int sysctl_compaction_handler(struct ctl_table *table, int write,
void *buffer, size_t *length, loff_t *ppos);
extern int compaction_proactiveness_sysctl_handler(struct ctl_table *table,
int write, void *buffer, size_t *length, loff_t *ppos);
extern int sysctl_extfrag_threshold;
extern int sysctl_compact_unevictable_allowed;

View File

@ -130,10 +130,7 @@ static inline void flush_anon_page(struct vm_area_struct *vma, struct page *page
}
#endif
#ifndef ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE
static inline void flush_kernel_dcache_page(struct page *page)
{
}
#ifndef ARCH_IMPLEMENTS_FLUSH_KERNEL_VMAP_RANGE
static inline void flush_kernel_vmap_range(void *vaddr, int size)
{
}

View File

@ -121,6 +121,13 @@ static inline void hugetlb_cgroup_put_rsvd_cgroup(struct hugetlb_cgroup *h_cg)
css_put(&h_cg->css);
}
static inline void resv_map_dup_hugetlb_cgroup_uncharge_info(
struct resv_map *resv_map)
{
if (resv_map->css)
css_get(resv_map->css);
}
extern int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
struct hugetlb_cgroup **ptr);
extern int hugetlb_cgroup_charge_cgroup_rsvd(int idx, unsigned long nr_pages,
@ -199,6 +206,11 @@ static inline void hugetlb_cgroup_put_rsvd_cgroup(struct hugetlb_cgroup *h_cg)
{
}
static inline void resv_map_dup_hugetlb_cgroup_uncharge_info(
struct resv_map *resv_map)
{
}
static inline int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
struct hugetlb_cgroup **ptr)
{

View File

@ -99,8 +99,6 @@ void memblock_discard(void);
static inline void memblock_discard(void) {}
#endif
phys_addr_t memblock_find_in_range(phys_addr_t start, phys_addr_t end,
phys_addr_t size, phys_addr_t align);
void memblock_allow_resize(void);
int memblock_add_node(phys_addr_t base, phys_addr_t size, int nid);
int memblock_add(phys_addr_t base, phys_addr_t size);

View File

@ -105,14 +105,6 @@ struct mem_cgroup_reclaim_iter {
unsigned int generation;
};
struct lruvec_stat {
long count[NR_VM_NODE_STAT_ITEMS];
};
struct batched_lruvec_stat {
s32 count[NR_VM_NODE_STAT_ITEMS];
};
/*
* Bitmap and deferred work of shrinker::id corresponding to memcg-aware
* shrinkers, which have elements charged to this memcg.
@ -123,24 +115,30 @@ struct shrinker_info {
unsigned long *map;
};
struct lruvec_stats_percpu {
/* Local (CPU and cgroup) state */
long state[NR_VM_NODE_STAT_ITEMS];
/* Delta calculation for lockless upward propagation */
long state_prev[NR_VM_NODE_STAT_ITEMS];
};
struct lruvec_stats {
/* Aggregated (CPU and subtree) state */
long state[NR_VM_NODE_STAT_ITEMS];
/* Pending child counts during tree propagation */
long state_pending[NR_VM_NODE_STAT_ITEMS];
};
/*
* per-node information in memory controller.
*/
struct mem_cgroup_per_node {
struct lruvec lruvec;
/*
* Legacy local VM stats. This should be struct lruvec_stat and
* cannot be optimized to struct batched_lruvec_stat. Because
* the threshold of the lruvec_stat_cpu can be as big as
* MEMCG_CHARGE_BATCH * PAGE_SIZE. It can fit into s32. But this
* filed has no upper limit.
*/
struct lruvec_stat __percpu *lruvec_stat_local;
/* Subtree VM stats (batched updates) */
struct batched_lruvec_stat __percpu *lruvec_stat_cpu;
atomic_long_t lruvec_stat[NR_VM_NODE_STAT_ITEMS];
struct lruvec_stats_percpu __percpu *lruvec_stats_percpu;
struct lruvec_stats lruvec_stats;
unsigned long lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS];
@ -595,13 +593,6 @@ static inline struct obj_cgroup **page_objcgs_check(struct page *page)
}
#endif
static __always_inline bool memcg_stat_item_in_bytes(int idx)
{
if (idx == MEMCG_PERCPU_B)
return true;
return vmstat_item_in_bytes(idx);
}
static inline bool mem_cgroup_is_root(struct mem_cgroup *memcg)
{
return (memcg == root_mem_cgroup);
@ -693,13 +684,35 @@ static inline bool mem_cgroup_below_min(struct mem_cgroup *memcg)
page_counter_read(&memcg->memory);
}
int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask);
int __mem_cgroup_charge(struct page *page, struct mm_struct *mm,
gfp_t gfp_mask);
static inline int mem_cgroup_charge(struct page *page, struct mm_struct *mm,
gfp_t gfp_mask)
{
if (mem_cgroup_disabled())
return 0;
return __mem_cgroup_charge(page, mm, gfp_mask);
}
int mem_cgroup_swapin_charge_page(struct page *page, struct mm_struct *mm,
gfp_t gfp, swp_entry_t entry);
void mem_cgroup_swapin_uncharge_swap(swp_entry_t entry);
void mem_cgroup_uncharge(struct page *page);
void mem_cgroup_uncharge_list(struct list_head *page_list);
void __mem_cgroup_uncharge(struct page *page);
static inline void mem_cgroup_uncharge(struct page *page)
{
if (mem_cgroup_disabled())
return;
__mem_cgroup_uncharge(page);
}
void __mem_cgroup_uncharge_list(struct list_head *page_list);
static inline void mem_cgroup_uncharge_list(struct list_head *page_list)
{
if (mem_cgroup_disabled())
return;
__mem_cgroup_uncharge_list(page_list);
}
void mem_cgroup_migrate(struct page *oldpage, struct page *newpage);
@ -884,11 +897,6 @@ static inline bool mem_cgroup_online(struct mem_cgroup *memcg)
return !!(memcg->css.flags & CSS_ONLINE);
}
/*
* For memory reclaim.
*/
int mem_cgroup_select_victim_node(struct mem_cgroup *memcg);
void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru,
int zid, int nr_pages);
@ -955,22 +963,21 @@ static inline void mod_memcg_state(struct mem_cgroup *memcg,
local_irq_restore(flags);
}
static inline unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx)
{
return READ_ONCE(memcg->vmstats.state[idx]);
}
static inline unsigned long lruvec_page_state(struct lruvec *lruvec,
enum node_stat_item idx)
{
struct mem_cgroup_per_node *pn;
long x;
if (mem_cgroup_disabled())
return node_page_state(lruvec_pgdat(lruvec), idx);
pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
x = atomic_long_read(&pn->lruvec_stat[idx]);
#ifdef CONFIG_SMP
if (x < 0)
x = 0;
#endif
return x;
return READ_ONCE(pn->lruvec_stats.state[idx]);
}
static inline unsigned long lruvec_page_state_local(struct lruvec *lruvec,
@ -985,7 +992,7 @@ static inline unsigned long lruvec_page_state_local(struct lruvec *lruvec,
pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
for_each_possible_cpu(cpu)
x += per_cpu(pn->lruvec_stat_local->count[idx], cpu);
x += per_cpu(pn->lruvec_stats_percpu->state[idx], cpu);
#ifdef CONFIG_SMP
if (x < 0)
x = 0;
@ -993,6 +1000,8 @@ static inline unsigned long lruvec_page_state_local(struct lruvec *lruvec,
return x;
}
void mem_cgroup_flush_stats(void);
void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
int val);
void __mod_lruvec_kmem_state(void *p, enum node_stat_item idx, int val);
@ -1391,6 +1400,11 @@ static inline void mod_memcg_state(struct mem_cgroup *memcg,
{
}
static inline unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx)
{
return 0;
}
static inline unsigned long lruvec_page_state(struct lruvec *lruvec,
enum node_stat_item idx)
{
@ -1403,6 +1417,10 @@ static inline unsigned long lruvec_page_state_local(struct lruvec *lruvec,
return node_page_state(lruvec_pgdat(lruvec), idx);
}
static inline void mem_cgroup_flush_stats(void)
{
}
static inline void __mod_memcg_lruvec_state(struct lruvec *lruvec,
enum node_stat_item idx, int val)
{

View File

@ -90,7 +90,7 @@ int create_memory_block_devices(unsigned long start, unsigned long size,
void remove_memory_block_devices(unsigned long start, unsigned long size);
extern void memory_dev_init(void);
extern int memory_notify(unsigned long val, void *v);
extern struct memory_block *find_memory_block(struct mem_section *);
extern struct memory_block *find_memory_block(unsigned long section_nr);
typedef int (*walk_memory_blocks_func_t)(struct memory_block *, void *);
extern int walk_memory_blocks(unsigned long start, unsigned long size,
void *arg, walk_memory_blocks_func_t func);

View File

@ -184,6 +184,14 @@ extern bool vma_migratable(struct vm_area_struct *vma);
extern int mpol_misplaced(struct page *, struct vm_area_struct *, unsigned long);
extern void mpol_put_task_policy(struct task_struct *);
extern bool numa_demotion_enabled;
static inline bool mpol_is_preferred_many(struct mempolicy *pol)
{
return (pol->mode == MPOL_PREFERRED_MANY);
}
#else
struct mempolicy {};
@ -292,5 +300,13 @@ static inline nodemask_t *policy_nodemask_current(gfp_t gfp)
{
return NULL;
}
#define numa_demotion_enabled false
static inline bool mpol_is_preferred_many(struct mempolicy *pol)
{
return false;
}
#endif /* CONFIG_NUMA */
#endif

View File

@ -28,6 +28,7 @@ enum migrate_reason {
MR_NUMA_MISPLACED,
MR_CONTIG_RANGE,
MR_LONGTERM_PIN,
MR_DEMOTION,
MR_TYPES
};
@ -41,7 +42,8 @@ extern int migrate_page(struct address_space *mapping,
struct page *newpage, struct page *page,
enum migrate_mode mode);
extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free,
unsigned long private, enum migrate_mode mode, int reason);
unsigned long private, enum migrate_mode mode, int reason,
unsigned int *ret_succeeded);
extern struct page *alloc_migration_target(struct page *page, unsigned long private);
extern int isolate_movable_page(struct page *page, isolate_mode_t mode);
@ -56,7 +58,7 @@ extern int migrate_page_move_mapping(struct address_space *mapping,
static inline void putback_movable_pages(struct list_head *l) {}
static inline int migrate_pages(struct list_head *l, new_page_t new,
free_page_t free, unsigned long private, enum migrate_mode mode,
int reason)
int reason, unsigned int *ret_succeeded)
{ return -ENOSYS; }
static inline struct page *alloc_migration_target(struct page *page,
unsigned long private)
@ -166,6 +168,14 @@ struct migrate_vma {
int migrate_vma_setup(struct migrate_vma *args);
void migrate_vma_pages(struct migrate_vma *migrate);
void migrate_vma_finalize(struct migrate_vma *migrate);
int next_demotion_node(int node);
#else /* CONFIG_MIGRATION disabled: */
static inline int next_demotion_node(int node)
{
return NUMA_NO_NODE;
}
#endif /* CONFIG_MIGRATION */

View File

@ -1216,18 +1216,10 @@ static inline void get_page(struct page *page)
}
bool __must_check try_grab_page(struct page *page, unsigned int flags);
__maybe_unused struct page *try_grab_compound_head(struct page *page, int refs,
unsigned int flags);
struct page *try_grab_compound_head(struct page *page, int refs,
unsigned int flags);
static inline __must_check bool try_get_page(struct page *page)
{
page = compound_head(page);
if (WARN_ON_ONCE(page_ref_count(page) <= 0))
return false;
page_ref_inc(page);
return true;
}
struct page *try_get_compound_head(struct page *page, int refs);
static inline void put_page(struct page *page)
{
@ -1849,7 +1841,6 @@ int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
struct kvec;
int get_kernel_pages(const struct kvec *iov, int nr_pages, int write,
struct page **pages);
int get_kernel_page(unsigned long start, int write, struct page **pages);
struct page *get_dump_page(unsigned long addr);
extern int try_to_release_page(struct page * page, gfp_t gfp_mask);
@ -3121,7 +3112,7 @@ extern void memory_failure_queue_kick(int cpu);
extern int unpoison_memory(unsigned long pfn);
extern int sysctl_memory_failure_early_kill;
extern int sysctl_memory_failure_recovery;
extern void shake_page(struct page *p, int access);
extern void shake_page(struct page *p);
extern atomic_long_t num_poisoned_pages __read_mostly;
extern int soft_offline_page(unsigned long pfn, int flags);

View File

@ -846,6 +846,7 @@ typedef struct pglist_data {
enum zone_type kcompactd_highest_zoneidx;
wait_queue_head_t kcompactd_wait;
struct task_struct *kcompactd;
bool proactive_compact_trigger;
#endif
/*
* This is a per-node reserve of pages that are not available
@ -1342,7 +1343,6 @@ static inline struct mem_section *__nr_to_section(unsigned long nr)
return NULL;
return &mem_section[SECTION_NR_TO_ROOT(nr)][nr & SECTION_ROOT_MASK];
}
extern unsigned long __section_nr(struct mem_section *ms);
extern size_t mem_section_usage_size(void);
/*
@ -1365,7 +1365,7 @@ extern size_t mem_section_usage_size(void);
#define SECTION_TAINT_ZONE_DEVICE (1UL<<4)
#define SECTION_MAP_LAST_BIT (1UL<<5)
#define SECTION_MAP_MASK (~(SECTION_MAP_LAST_BIT-1))
#define SECTION_NID_SHIFT 3
#define SECTION_NID_SHIFT 6
static inline struct page *__section_mem_map_addr(struct mem_section *section)
{

View File

@ -736,7 +736,7 @@ extern void add_page_wait_queue(struct page *page, wait_queue_entry_t *waiter);
/*
* Fault everything in given userspace address range in.
*/
static inline int fault_in_pages_writeable(char __user *uaddr, int size)
static inline int fault_in_pages_writeable(char __user *uaddr, size_t size)
{
char __user *end = uaddr + size - 1;
@ -763,7 +763,7 @@ static inline int fault_in_pages_writeable(char __user *uaddr, int size)
return 0;
}
static inline int fault_in_pages_readable(const char __user *uaddr, int size)
static inline int fault_in_pages_readable(const char __user *uaddr, size_t size)
{
volatile char c;
const char __user *end = uaddr + size - 1;

View File

@ -174,13 +174,13 @@ static inline gfp_t current_gfp_context(gfp_t flags)
}
#ifdef CONFIG_LOCKDEP
extern void __fs_reclaim_acquire(void);
extern void __fs_reclaim_release(void);
extern void __fs_reclaim_acquire(unsigned long ip);
extern void __fs_reclaim_release(unsigned long ip);
extern void fs_reclaim_acquire(gfp_t gfp_mask);
extern void fs_reclaim_release(gfp_t gfp_mask);
#else
static inline void __fs_reclaim_acquire(void) { }
static inline void __fs_reclaim_release(void) { }
static inline void __fs_reclaim_acquire(unsigned long ip) { }
static inline void __fs_reclaim_release(unsigned long ip) { }
static inline void fs_reclaim_acquire(gfp_t gfp_mask) { }
static inline void fs_reclaim_release(gfp_t gfp_mask) { }
#endif
@ -306,7 +306,7 @@ set_active_memcg(struct mem_cgroup *memcg)
{
struct mem_cgroup *old;
if (in_interrupt()) {
if (!in_task()) {
old = this_cpu_read(int_active_memcg);
this_cpu_write(int_active_memcg, memcg);
} else {

View File

@ -18,6 +18,7 @@ struct shmem_inode_info {
unsigned long flags;
unsigned long alloced; /* data pages alloced to file */
unsigned long swapped; /* subtotal assigned to swap */
pgoff_t fallocend; /* highest fallocate endindex */
struct list_head shrinklist; /* shrinkable hpage inodes */
struct list_head swaplist; /* chain of maybes on swap */
struct shared_policy policy; /* NUMA memory alloc policy */
@ -31,7 +32,7 @@ struct shmem_sb_info {
struct percpu_counter used_blocks; /* How many are allocated */
unsigned long max_inodes; /* How many inodes are allowed */
unsigned long free_inodes; /* How many are left for allocation */
spinlock_t stat_lock; /* Serialize shmem_sb_info changes */
raw_spinlock_t stat_lock; /* Serialize shmem_sb_info changes */
umode_t mode; /* Mount mode for root directory */
unsigned char huge; /* Whether to try for hugepages */
kuid_t uid; /* Mount uid for root directory */
@ -85,7 +86,12 @@ extern void shmem_truncate_range(struct inode *inode, loff_t start, loff_t end);
extern int shmem_unuse(unsigned int type, bool frontswap,
unsigned long *fs_pages_to_unuse);
extern bool shmem_huge_enabled(struct vm_area_struct *vma);
extern bool shmem_is_huge(struct vm_area_struct *vma,
struct inode *inode, pgoff_t index);
static inline bool shmem_huge_enabled(struct vm_area_struct *vma)
{
return shmem_is_huge(vma, file_inode(vma->vm_file), vma->vm_pgoff);
}
extern unsigned long shmem_swap_usage(struct vm_area_struct *vma);
extern unsigned long shmem_partial_swap_usage(struct address_space *mapping,
pgoff_t start, pgoff_t end);
@ -93,9 +99,8 @@ extern unsigned long shmem_partial_swap_usage(struct address_space *mapping,
/* Flag allocation requirements to shmem_getpage */
enum sgp_type {
SGP_READ, /* don't exceed i_size, don't allocate page */
SGP_NOALLOC, /* similar, but fail on hole or use fallocated page */
SGP_CACHE, /* don't exceed i_size, may allocate page */
SGP_NOHUGE, /* like SGP_CACHE, but no huge pages */
SGP_HUGE, /* like SGP_CACHE, huge pages preferred */
SGP_WRITE, /* may exceed i_size, may allocate !Uptodate page */
SGP_FALLOC, /* like SGP_WRITE, but make existing page Uptodate */
};
@ -119,6 +124,18 @@ static inline bool shmem_file(struct file *file)
return shmem_mapping(file->f_mapping);
}
/*
* If fallocate(FALLOC_FL_KEEP_SIZE) has been used, there may be pages
* beyond i_size's notion of EOF, which fallocate has committed to reserving:
* which split_huge_page() must therefore not delete. This use of a single
* "fallocend" per inode errs on the side of not deleting a reservation when
* in doubt: there are plenty of cases when it preserves unreserved pages.
*/
static inline pgoff_t shmem_fallocend(struct inode *inode, pgoff_t eof)
{
return max(eof, SHMEM_I(inode)->fallocend);
}
extern bool shmem_charge(struct inode *inode, long pages);
extern void shmem_uncharge(struct inode *inode, long pages);

View File

@ -408,7 +408,7 @@ static inline bool node_reclaim_enabled(void)
extern void check_move_unevictable_pages(struct pagevec *pvec);
extern int kswapd_run(int nid);
extern void kswapd_run(int nid);
extern void kswapd_stop(int nid);
#ifdef CONFIG_SWAP
@ -721,7 +721,13 @@ static inline int mem_cgroup_swappiness(struct mem_cgroup *mem)
#endif
#if defined(CONFIG_SWAP) && defined(CONFIG_MEMCG) && defined(CONFIG_BLK_CGROUP)
extern void cgroup_throttle_swaprate(struct page *page, gfp_t gfp_mask);
extern void __cgroup_throttle_swaprate(struct page *page, gfp_t gfp_mask);
static inline void cgroup_throttle_swaprate(struct page *page, gfp_t gfp_mask)
{
if (mem_cgroup_disabled())
return;
__cgroup_throttle_swaprate(page, gfp_mask);
}
#else
static inline void cgroup_throttle_swaprate(struct page *page, gfp_t gfp_mask)
{
@ -730,8 +736,22 @@ static inline void cgroup_throttle_swaprate(struct page *page, gfp_t gfp_mask)
#ifdef CONFIG_MEMCG_SWAP
extern void mem_cgroup_swapout(struct page *page, swp_entry_t entry);
extern int mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry);
extern void mem_cgroup_uncharge_swap(swp_entry_t entry, unsigned int nr_pages);
extern int __mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry);
static inline int mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry)
{
if (mem_cgroup_disabled())
return 0;
return __mem_cgroup_try_charge_swap(page, entry);
}
extern void __mem_cgroup_uncharge_swap(swp_entry_t entry, unsigned int nr_pages);
static inline void mem_cgroup_uncharge_swap(swp_entry_t entry, unsigned int nr_pages)
{
if (mem_cgroup_disabled())
return;
__mem_cgroup_uncharge_swap(entry, nr_pages);
}
extern long mem_cgroup_get_nr_swap_pages(struct mem_cgroup *memcg);
extern bool mem_cgroup_swap_full(struct page *page);
#else

View File

@ -915,6 +915,7 @@ asmlinkage long sys_mincore(unsigned long start, size_t len,
asmlinkage long sys_madvise(unsigned long start, size_t len, int behavior);
asmlinkage long sys_process_madvise(int pidfd, const struct iovec __user *vec,
size_t vlen, int behavior, unsigned int flags);
asmlinkage long sys_process_mrelease(int pidfd, unsigned int flags);
asmlinkage long sys_remap_file_pages(unsigned long start, unsigned long size,
unsigned long prot, unsigned long pgoff,
unsigned long flags);

View File

@ -60,16 +60,16 @@ extern int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
extern ssize_t mcopy_atomic(struct mm_struct *dst_mm, unsigned long dst_start,
unsigned long src_start, unsigned long len,
bool *mmap_changing, __u64 mode);
atomic_t *mmap_changing, __u64 mode);
extern ssize_t mfill_zeropage(struct mm_struct *dst_mm,
unsigned long dst_start,
unsigned long len,
bool *mmap_changing);
atomic_t *mmap_changing);
extern ssize_t mcopy_continue(struct mm_struct *dst_mm, unsigned long dst_start,
unsigned long len, bool *mmap_changing);
unsigned long len, atomic_t *mmap_changing);
extern int mwriteprotect_range(struct mm_struct *dst_mm,
unsigned long start, unsigned long len,
bool enable_wp, bool *mmap_changing);
bool enable_wp, atomic_t *mmap_changing);
/* mm helpers */
static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma,

View File

@ -33,6 +33,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
PGREUSE,
PGSTEAL_KSWAPD,
PGSTEAL_DIRECT,
PGDEMOTE_KSWAPD,
PGDEMOTE_DIRECT,
PGSCAN_KSWAPD,
PGSCAN_DIRECT,
PGSCAN_DIRECT_THROTTLE,

Some files were not shown because too many files have changed in this diff Show More