Merge branch 'akpm' (second patch-bomb from Andrew)

Merge second patchbomb from Andrew Morton:
 - the rest of MM
 - misc fs fixes
 - add execveat() syscall
 - new ratelimit feature for fault-injection
 - decompressor updates
 - ipc/ updates
 - fallocate feature creep
 - fsnotify cleanups
 - a few other misc things

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (99 commits)
  cgroups: Documentation: fix trivial typos and wrong paragraph numberings
  parisc: percpu: update comments referring to __get_cpu_var
  percpu: update local_ops.txt to reflect this_cpu operations
  percpu: remove __get_cpu_var and __raw_get_cpu_var macros
  fsnotify: remove destroy_list from fsnotify_mark
  fsnotify: unify inode and mount marks handling
  fallocate: create FAN_MODIFY and IN_MODIFY events
  mm/cma: make kmemleak ignore CMA regions
  slub: fix cpuset check in get_any_partial
  slab: fix cpuset check in fallback_alloc
  shmdt: use i_size_read() instead of ->i_size
  ipc/shm.c: fix overly aggressive shmdt() when calls span multiple segments
  ipc/msg: increase MSGMNI, remove scaling
  ipc/sem.c: increase SEMMSL, SEMMNI, SEMOPM
  ipc/sem.c: change memory barrier in sem_lock() to smp_rmb()
  lib/decompress.c: consistency of compress formats for kernel image
  decompress_bunzip2: off by one in get_next_block()
  usr/Kconfig: make initrd compression algorithm selection not expert
  fault-inject: add ratelimit option
  ratelimit: add initialization macro
  ...
Linus Torvalds 2014-12-13 13:00:36 -08:00
commit 78a45c6f06
154 changed files with 3208 additions and 1363 deletions

View file

@ -445,7 +445,7 @@ across partially overlapping sets of CPUs would risk unstable dynamics
that would be beyond our understanding. So if each of two partially
overlapping cpusets enables the flag 'cpuset.sched_load_balance', then we
form a single sched domain that is a superset of both. We won't move
a task to a CPU outside it cpuset, but the scheduler load balancing
a task to a CPU outside its cpuset, but the scheduler load balancing
code might waste some compute cycles considering that possibility.
This mismatch is why there is not a simple one-to-one relation
@ -552,8 +552,8 @@ otherwise initial value -1 that indicates the cpuset has no request.
1 : search siblings (hyperthreads in a core).
2 : search cores in a package.
3 : search cpus in a node [= system wide on non-NUMA system]
( 4 : search nodes in a chunk of node [on NUMA system] )
( 5 : search system wide [on NUMA system] )
4 : search nodes in a chunk of node [on NUMA system]
5 : search system wide [on NUMA system]
The system default is architecture dependent. The system default
can be changed using the relax_domain_level= boot parameter.

View file

@ -326,7 +326,7 @@ per cgroup, instead of globally.
* tcp memory pressure: sockets memory pressure for the tcp protocol.
2.7.3 Common use cases
2.7.2 Common use cases
Because the "kmem" counter is fed to the main user counter, kernel memory can
never be limited completely independently of user memory. Say "U" is the user
@ -354,19 +354,19 @@ set:
3. User Interface
0. Configuration
3.0. Configuration
a. Enable CONFIG_CGROUPS
b. Enable CONFIG_MEMCG
c. Enable CONFIG_MEMCG_SWAP (to use swap extension)
d. Enable CONFIG_MEMCG_KMEM (to use kmem extension)
1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?)
3.1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?)
# mount -t tmpfs none /sys/fs/cgroup
# mkdir /sys/fs/cgroup/memory
# mount -t cgroup none /sys/fs/cgroup/memory -o memory
2. Make the new group and move bash into it
3.2. Make the new group and move bash into it
# mkdir /sys/fs/cgroup/memory/0
# echo $$ > /sys/fs/cgroup/memory/0/tasks

View file

@ -829,6 +829,15 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
CONFIG_DEBUG_PAGEALLOC, hence this option will not help
tracking down these problems.
debug_pagealloc=
[KNL] When CONFIG_DEBUG_PAGEALLOC is set, this
parameter enables the feature at boot time. It is
disabled by default. Leaving it off avoids allocating a
huge chunk of memory for debug pagealloc, and the system
then behaves much the same as a kernel built without
CONFIG_DEBUG_PAGEALLOC.
on: enable the feature
debugpat [X86] Enable PAT debugging
decnet.addr= [HW,NET]
@ -1228,9 +1237,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
multiple times interleaved with hugepages= to reserve
huge pages of different sizes. Valid pages sizes on
x86-64 are 2M (when the CPU supports "pse") and 1G
(when the CPU supports the "pdpe1gb" cpuinfo flag)
Note that 1GB pages can only be allocated at boot time
using hugepages= and not freed afterwards.
(when the CPU supports the "pdpe1gb" cpuinfo flag).
hvc_iucv= [S390] Number of z/VM IUCV hypervisor console (HVC)
terminal devices. Valid values: 0..8
@ -2506,6 +2513,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
OSS [HW,OSS]
See Documentation/sound/oss/oss-parameters.txt
page_owner= [KNL] Boot-time page_owner enabling option.
Storage of the information about who allocated
each page is disabled by default. With this switch,
we can turn it on.
on: enable the feature
panic= [KNL] Kernel behaviour on panic: delay <timeout>
timeout > 0: seconds before rebooting
timeout = 0: wait forever
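The debug_pagealloc= entry above is what makes the kernel_map_pages() -> __kernel_map_pages() renames further down in this commit (powerpc, sparc, s390, x86) useful: the per-arch hook only needs to run when the feature was requested on the command line. A rough sketch of the kind of guard wrapper this implies; the mm.h side of the change is not shown in this view, so the helper names below are assumptions:

	/* sketch only: guard the arch hook behind the boot-time switch */
	extern bool _debug_pagealloc_enabled;
	extern void __kernel_map_pages(struct page *page, int numpages, int enable);

	static inline bool debug_pagealloc_enabled(void)
	{
		return _debug_pagealloc_enabled;
	}

	static inline void kernel_map_pages(struct page *page, int numpages, int enable)
	{
		if (!debug_pagealloc_enabled())
			return;
		__kernel_map_pages(page, numpages, enable);
	}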

View file

@ -8,6 +8,11 @@ to implement them for any given architecture and shows how they can be used
properly. It also stresses on the precautions that must be taken when reading
those local variables across CPUs when the order of memory writes matters.
Note that local_t based operations are not recommended for general kernel use.
Please use the this_cpu operations instead unless there is really a special purpose.
Most uses of local_t in the kernel have been replaced by this_cpu operations.
this_cpu operations combine the relocation with the local_t like semantics in
a single instruction and yield more compact and faster executing code.
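As a concrete illustration of that combined form, the two-step local_t pattern shown later in this document can be written as a single this_cpu operation; a minimal sketch, where counters is just an illustrative per-cpu variable:

	#include <linux/percpu.h>

	static DEFINE_PER_CPU(long, counters);

	static void count_event(void)
	{
		/* relocation + increment in one preempt-safe operation */
		this_cpu_inc(counters);
	}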
* Purpose of local atomic operations
@ -87,10 +92,10 @@ the per cpu variable. For instance :
local_inc(&get_cpu_var(counters));
put_cpu_var(counters);
If you are already in a preemption-safe context, you can directly use
__get_cpu_var() instead.
If you are already in a preemption-safe context, you can use
this_cpu_ptr() instead.
local_inc(&__get_cpu_var(counters));
local_inc(this_cpu_ptr(&counters));
@ -134,7 +139,7 @@ static void test_each(void *info)
{
/* Increment the counter from a non preemptible context */
printk("Increment on cpu %d\n", smp_processor_id());
local_inc(&__get_cpu_var(counters));
local_inc(this_cpu_ptr(&counters));
/* This is what incrementing the variable would look like within a
* preemptible context (it disables preemption) :

View file

@ -116,10 +116,12 @@ set during run time.
auto_msgmni:
Enables/Disables automatic recomputing of msgmni upon memory add/remove
or upon ipc namespace creation/removal (see the msgmni description
above). Echoing "1" into this file enables msgmni automatic recomputing.
Echoing "0" turns it off. auto_msgmni default value is 1.
This variable has no effect and may be removed in future kernel
releases. Reading it always returns 0.
Up to Linux 3.17, it enabled/disabled automatic recomputing of msgmni
upon memory add/remove or upon ipc namespace creation/removal.
Echoing "1" into this file enabled msgmni automatic recomputing.
Echoing "0" turned it off. auto_msgmni default value was 1.
==============================================================

View file

@ -0,0 +1,81 @@
page owner: Tracking who allocated each page
-----------------------------------------------------------
* Introduction
page owner is for tracking who allocated each page.
It can be used to debug memory leaks or to find memory hogs.
When an allocation happens, information about the allocation, such as the
call stack and the order of the pages, is stored in per-page storage.
When we need to know the status of all pages, we can retrieve and analyze
this information.
Although we already have tracepoints for page allocation/free, using them
to analyze who allocated each page is rather complex. The trace buffer has
to be enlarged to keep events from being overwritten before the userspace
program is launched, and the launched program then has to continually dump
the trace buffer for later analysis, which is more likely to change system
behaviour than just keeping the data in memory, so it is bad for debugging.
page owner can also be used for various purposes. For example, accurate
fragmentation statistics can be obtained through gfp flag information of
each page. It is already implemented and activated if page owner is
enabled. Other usages are more than welcome.
page owner is disabled by default, so if you'd like to use it you need
to add "page_owner=on" to your boot cmdline. If the kernel is built
with page owner but the feature is not enabled at boot time, the runtime
overhead is marginal: no memory is needed to store owner information, so
there is no runtime memory overhead, and page owner inserts just two
unlikely branches into the page allocator hotpath; if they evaluate to
false, allocation proceeds just as in a kernel without page owner. These
two unlikely branches should not affect allocation performance. The
following is the kernel's code size change due to this facility.
- Without page owner
text data bss dec hex filename
40662 1493 644 42799 a72f mm/page_alloc.o
- With page owner
text data bss dec hex filename
40892 1493 644 43029 a815 mm/page_alloc.o
1427 24 8 1459 5b3 mm/page_ext.o
2722 50 0 2772 ad4 mm/page_owner.o
Although roughly 4 KB of code is added in total, page_alloc.o grows by only
230 bytes, and only half of that is in the hotpath. Building the kernel with
page owner and turning it on only when needed is a good option for debugging
kernel memory problems.
There is one caveat caused by an implementation detail: page owner
stores its information in memory obtained from the struct page extension.
On sparse memory systems this memory is initialized some time after the
page allocator starts, so until that initialization happens many pages can
be allocated without owner information. To fix this up, early allocated
pages are investigated and marked as allocated during the initialization
phase. Although that does not mean they carry the right owner information,
we can at least tell more accurately whether a page is allocated or not.
On an x86-64 VM with 2GB of memory, 13343 early allocated pages are caught
and marked this way, although most of them are allocated by the struct
page extension feature itself. After that, no page is left in an untracked
state.
* Usage
1) Build user-space helper
cd tools/vm
make page_owner_sort
2) Enable page owner
Add "page_owner=on" to boot cmdline.
3) Do the job that you want to debug
4) Analyze information from page owner
cat /sys/kernel/debug/page_owner > page_owner_full.txt
grep -v ^PFN page_owner_full.txt > page_owner.txt
./page_owner_sort page_owner.txt sorted_page_owner.txt
See the result about who allocated each page
in the sorted_page_owner.txt.

View file

@ -4045,7 +4045,7 @@ F: drivers/tty/serial/ucc_uart.c
FREESCALE SOC SOUND DRIVERS
M: Timur Tabi <timur@tabi.org>
M: Nicolin Chen <nicoleotsuka@gmail.com>
M: Xiubo Li <Li.Xiubo@freescale.com>
M: Xiubo Li <Xiubo.Lee@gmail.com>
L: alsa-devel@alsa-project.org (moderated for non-subscribers)
L: linuxppc-dev@lists.ozlabs.org
S: Maintained

View file

@ -5,6 +5,7 @@ config ARM
select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_HAVE_CUSTOM_GPIO_H
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_MIGHT_HAVE_PC_PARPORT
select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_USE_BUILTIN_BSWAP

View file

@ -2,6 +2,7 @@ config ARM64
def_bool y
select ARCH_BINFMT_ELF_RANDOMIZE_PIE
select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAS_SG_CHAIN
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_USE_CMPXCHG_LOCKREF

View file

@ -1,5 +1,6 @@
config MICROBLAZE
def_bool y
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_MIGHT_HAVE_PC_PARPORT
select ARCH_WANT_IPC_PARSE_VERSION
select ARCH_WANT_OPTIONAL_GPIOLIB

View file

@ -38,14 +38,14 @@
LDREGX \t2(\t1),\t2
addil LT%exception_data,%r27
LDREG RT%exception_data(%r1),\t1
/* t1 = &__get_cpu_var(exception_data) */
/* t1 = this_cpu_ptr(&exception_data) */
add,l \t1,\t2,\t1
/* t1 = t1->fault_ip */
LDREG EXCDATA_IP(\t1), \t1
.endm
#else
.macro get_fault_ip t1 t2
/* t1 = &__get_cpu_var(exception_data) */
/* t1 = this_cpu_ptr(&exception_data) */
addil LT%exception_data,%r27
LDREG RT%exception_data(%r1),\t2
/* t1 = t2->fault_ip */

View file

@ -129,6 +129,7 @@ config PPC
select HAVE_BPF_JIT if PPC64
select HAVE_ARCH_JUMP_LABEL
select ARCH_HAVE_NMI_SAFE_CMPXCHG
select ARCH_HAS_GCOV_PROFILE_ALL
select GENERIC_SMP_IDLE_THREAD
select GENERIC_CMOS_UPDATE
select GENERIC_TIME_VSYSCALL_OLD

View file

@ -1514,7 +1514,7 @@ static void kernel_unmap_linear_page(unsigned long vaddr, unsigned long lmi)
mmu_kernel_ssize, 0);
}
void kernel_map_pages(struct page *page, int numpages, int enable)
void __kernel_map_pages(struct page *page, int numpages, int enable)
{
unsigned long flags, vaddr, lmi;
int i;

View file

@ -429,7 +429,7 @@ static int change_page_attr(struct page *page, int numpages, pgprot_t prot)
}
void kernel_map_pages(struct page *page, int numpages, int enable)
void __kernel_map_pages(struct page *page, int numpages, int enable)
{
if (PageHighMem(page))
return;

View file

@ -65,6 +65,7 @@ config S390
def_bool y
select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
select ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAVE_NMI_SAFE_CMPXCHG
select ARCH_INLINE_READ_LOCK
select ARCH_INLINE_READ_LOCK_BH

View file

@ -120,7 +120,7 @@ static void ipte_range(pte_t *pte, unsigned long address, int nr)
}
}
void kernel_map_pages(struct page *page, int numpages, int enable)
void __kernel_map_pages(struct page *page, int numpages, int enable)
{
unsigned long address;
int nr, i, j;

View file

@ -16,6 +16,7 @@ config SUPERH
select HAVE_DEBUG_BUGVERBOSE
select ARCH_HAVE_CUSTOM_GPIO_H
select ARCH_HAVE_NMI_SAFE_CMPXCHG if (GUSA_RB || CPU_SH4A)
select ARCH_HAS_GCOV_PROFILE_ALL
select PERF_USE_VMALLOC
select HAVE_DEBUG_KMEMLEAK
select HAVE_KERNEL_GZIP

View file

@ -415,8 +415,9 @@
#define __NR_getrandom 347
#define __NR_memfd_create 348
#define __NR_bpf 349
#define __NR_execveat 350
#define NR_syscalls 350
#define NR_syscalls 351
/* Bitmask values returned from kern_features system call. */
#define KERN_FEATURE_MIXED_MODE_STACK 0x00000001

View file

@ -6,6 +6,11 @@ sys64_execve:
jmpl %g1, %g0
flushw
sys64_execveat:
set sys_execveat, %g1
jmpl %g1, %g0
flushw
#ifdef CONFIG_COMPAT
sunos_execv:
mov %g0, %o2
@ -13,6 +18,11 @@ sys32_execve:
set compat_sys_execve, %g1
jmpl %g1, %g0
flushw
sys32_execveat:
set compat_sys_execveat, %g1
jmpl %g1, %g0
flushw
#endif
.align 32

View file

@ -87,3 +87,4 @@ sys_call_table:
/*335*/ .long sys_syncfs, sys_sendmmsg, sys_setns, sys_process_vm_readv, sys_process_vm_writev
/*340*/ .long sys_ni_syscall, sys_kcmp, sys_finit_module, sys_sched_setattr, sys_sched_getattr
/*345*/ .long sys_renameat2, sys_seccomp, sys_getrandom, sys_memfd_create, sys_bpf
/*350*/ .long sys_execveat

View file

@ -88,6 +88,7 @@ sys_call_table32:
.word sys_syncfs, compat_sys_sendmmsg, sys_setns, compat_sys_process_vm_readv, compat_sys_process_vm_writev
/*340*/ .word sys_kern_features, sys_kcmp, sys_finit_module, sys_sched_setattr, sys_sched_getattr
.word sys32_renameat2, sys_seccomp, sys_getrandom, sys_memfd_create, sys_bpf
/*350*/ .word sys32_execveat
#endif /* CONFIG_COMPAT */
@ -167,3 +168,4 @@ sys_call_table:
.word sys_syncfs, sys_sendmmsg, sys_setns, sys_process_vm_readv, sys_process_vm_writev
/*340*/ .word sys_kern_features, sys_kcmp, sys_finit_module, sys_sched_setattr, sys_sched_getattr
.word sys_renameat2, sys_seccomp, sys_getrandom, sys_memfd_create, sys_bpf
/*350*/ .word sys64_execveat

View file

@ -1621,7 +1621,7 @@ static void __init kernel_physical_mapping_init(void)
}
#ifdef CONFIG_DEBUG_PAGEALLOC
void kernel_map_pages(struct page *page, int numpages, int enable)
void __kernel_map_pages(struct page *page, int numpages, int enable)
{
unsigned long phys_start = page_to_pfn(page) << PAGE_SHIFT;
unsigned long phys_end = phys_start + (numpages * PAGE_SIZE);

View file

@ -24,6 +24,7 @@ config X86
select ARCH_MIGHT_HAVE_ACPI_PDC if ACPI
select ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS
select ARCH_HAS_FAST_MULTIPLIER
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_MIGHT_HAVE_PC_PARPORT
select ARCH_MIGHT_HAVE_PC_SERIO
select HAVE_AOUT if X86_32

View file

@ -35,6 +35,7 @@ int ia32_classify_syscall(unsigned syscall)
case __NR_socketcall:
return 4;
case __NR_execve:
case __NR_execveat:
return 5;
default:
return 1;

View file

@ -480,6 +480,7 @@ GLOBAL(\label)
PTREGSCALL stub32_rt_sigreturn, sys32_rt_sigreturn
PTREGSCALL stub32_sigreturn, sys32_sigreturn
PTREGSCALL stub32_execve, compat_sys_execve
PTREGSCALL stub32_execveat, compat_sys_execveat
PTREGSCALL stub32_fork, sys_fork
PTREGSCALL stub32_vfork, sys_vfork

View file

@ -50,6 +50,7 @@ int audit_classify_syscall(int abi, unsigned syscall)
case __NR_openat:
return 3;
case __NR_execve:
case __NR_execveat:
return 5;
default:
return 0;

View file

@ -652,6 +652,20 @@ ENTRY(stub_execve)
CFI_ENDPROC
END(stub_execve)
ENTRY(stub_execveat)
CFI_STARTPROC
addq $8, %rsp
PARTIAL_FRAME 0
SAVE_REST
FIXUP_TOP_OF_STACK %r11
call sys_execveat
RESTORE_TOP_OF_STACK %r11
movq %rax,RAX(%rsp)
RESTORE_REST
jmp int_ret_from_sys_call
CFI_ENDPROC
END(stub_execveat)
/*
* sigreturn is special because it needs to restore all registers on return.
* This cannot be done with SYSRET, so use the IRET return path instead.
@ -697,6 +711,20 @@ ENTRY(stub_x32_execve)
CFI_ENDPROC
END(stub_x32_execve)
ENTRY(stub_x32_execveat)
CFI_STARTPROC
addq $8, %rsp
PARTIAL_FRAME 0
SAVE_REST
FIXUP_TOP_OF_STACK %r11
call compat_sys_execveat
RESTORE_TOP_OF_STACK %r11
movq %rax,RAX(%rsp)
RESTORE_REST
jmp int_ret_from_sys_call
CFI_ENDPROC
END(stub_x32_execveat)
#endif
/*

View file

@ -1817,7 +1817,7 @@ static int __set_pages_np(struct page *page, int numpages)
return __change_page_attr_set_clr(&cpa, 0);
}
void kernel_map_pages(struct page *page, int numpages, int enable)
void __kernel_map_pages(struct page *page, int numpages, int enable)
{
if (PageHighMem(page))
return;

View file

@ -364,3 +364,4 @@
355 i386 getrandom sys_getrandom
356 i386 memfd_create sys_memfd_create
357 i386 bpf sys_bpf
358 i386 execveat sys_execveat stub32_execveat

View file

@ -328,6 +328,7 @@
319 common memfd_create sys_memfd_create
320 common kexec_file_load sys_kexec_file_load
321 common bpf sys_bpf
322 64 execveat stub_execveat
#
# x32-specific system call numbers start at 512 to avoid cache impact
@ -366,3 +367,4 @@
542 x32 getsockopt compat_sys_getsockopt
543 x32 io_setup compat_sys_io_setup
544 x32 io_submit compat_sys_io_submit
545 x32 execveat stub_x32_execveat

View file

@ -31,6 +31,7 @@
#define stub_fork sys_fork
#define stub_vfork sys_vfork
#define stub_execve sys_execve
#define stub_execveat sys_execveat
#define stub_rt_sigreturn sys_rt_sigreturn
#define __SYSCALL_COMMON(nr, sym, compat) __SYSCALL_64(nr, sym, compat)

View file

@ -228,8 +228,8 @@ memory_block_action(unsigned long phys_index, unsigned long action, int online_t
struct page *first_page;
int ret;
first_page = pfn_to_page(phys_index << PFN_SECTION_SHIFT);
start_pfn = page_to_pfn(first_page);
start_pfn = phys_index << PFN_SECTION_SHIFT;
first_page = pfn_to_page(start_pfn);
switch (action) {
case MEM_ONLINE:

View file

@ -44,15 +44,14 @@ static const char *default_compressor = "lzo";
static unsigned int num_devices = 1;
#define ZRAM_ATTR_RO(name) \
static ssize_t zram_attr_##name##_show(struct device *d, \
static ssize_t name##_show(struct device *d, \
struct device_attribute *attr, char *b) \
{ \
struct zram *zram = dev_to_zram(d); \
return scnprintf(b, PAGE_SIZE, "%llu\n", \
(u64)atomic64_read(&zram->stats.name)); \
} \
static struct device_attribute dev_attr_##name = \
__ATTR(name, S_IRUGO, zram_attr_##name##_show, NULL);
static DEVICE_ATTR_RO(name);
static inline int init_done(struct zram *zram)
{
@ -287,19 +286,18 @@ static inline int is_partial_io(struct bio_vec *bvec)
/*
* Check if request is within bounds and aligned on zram logical blocks.
*/
static inline int valid_io_request(struct zram *zram, struct bio *bio)
static inline int valid_io_request(struct zram *zram,
sector_t start, unsigned int size)
{
u64 start, end, bound;
u64 end, bound;
/* unaligned request */
if (unlikely(bio->bi_iter.bi_sector &
(ZRAM_SECTOR_PER_LOGICAL_BLOCK - 1)))
if (unlikely(start & (ZRAM_SECTOR_PER_LOGICAL_BLOCK - 1)))
return 0;
if (unlikely(bio->bi_iter.bi_size & (ZRAM_LOGICAL_BLOCK_SIZE - 1)))
if (unlikely(size & (ZRAM_LOGICAL_BLOCK_SIZE - 1)))
return 0;
start = bio->bi_iter.bi_sector;
end = start + (bio->bi_iter.bi_size >> SECTOR_SHIFT);
end = start + (size >> SECTOR_SHIFT);
bound = zram->disksize >> SECTOR_SHIFT;
/* out of range */
if (unlikely(start >= bound || end > bound || start > end))
@ -453,7 +451,7 @@ static int zram_decompress_page(struct zram *zram, char *mem, u32 index)
}
static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec,
u32 index, int offset, struct bio *bio)
u32 index, int offset)
{
int ret;
struct page *page;
@ -645,14 +643,13 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
}
static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index,
int offset, struct bio *bio)
int offset, int rw)
{
int ret;
int rw = bio_data_dir(bio);
if (rw == READ) {
atomic64_inc(&zram->stats.num_reads);
ret = zram_bvec_read(zram, bvec, index, offset, bio);
ret = zram_bvec_read(zram, bvec, index, offset);
} else {
atomic64_inc(&zram->stats.num_writes);
ret = zram_bvec_write(zram, bvec, index, offset);
@ -853,7 +850,7 @@ static ssize_t reset_store(struct device *dev,
static void __zram_make_request(struct zram *zram, struct bio *bio)
{
int offset;
int offset, rw;
u32 index;
struct bio_vec bvec;
struct bvec_iter iter;
@ -868,6 +865,7 @@ static void __zram_make_request(struct zram *zram, struct bio *bio)
return;
}
rw = bio_data_dir(bio);
bio_for_each_segment(bvec, bio, iter) {
int max_transfer_size = PAGE_SIZE - offset;
@ -882,15 +880,15 @@ static void __zram_make_request(struct zram *zram, struct bio *bio)
bv.bv_len = max_transfer_size;
bv.bv_offset = bvec.bv_offset;
if (zram_bvec_rw(zram, &bv, index, offset, bio) < 0)
if (zram_bvec_rw(zram, &bv, index, offset, rw) < 0)
goto out;
bv.bv_len = bvec.bv_len - max_transfer_size;
bv.bv_offset += max_transfer_size;
if (zram_bvec_rw(zram, &bv, index + 1, 0, bio) < 0)
if (zram_bvec_rw(zram, &bv, index + 1, 0, rw) < 0)
goto out;
} else
if (zram_bvec_rw(zram, &bvec, index, offset, bio) < 0)
if (zram_bvec_rw(zram, &bvec, index, offset, rw) < 0)
goto out;
update_position(&index, &offset, &bvec);
@ -915,7 +913,8 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
if (unlikely(!init_done(zram)))
goto error;
if (!valid_io_request(zram, bio)) {
if (!valid_io_request(zram, bio->bi_iter.bi_sector,
bio->bi_iter.bi_size)) {
atomic64_inc(&zram->stats.invalid_io);
goto error;
}
@ -945,25 +944,64 @@ static void zram_slot_free_notify(struct block_device *bdev,
atomic64_inc(&zram->stats.notify_free);
}
static int zram_rw_page(struct block_device *bdev, sector_t sector,
struct page *page, int rw)
{
int offset, err;
u32 index;
struct zram *zram;
struct bio_vec bv;
zram = bdev->bd_disk->private_data;
if (!valid_io_request(zram, sector, PAGE_SIZE)) {
atomic64_inc(&zram->stats.invalid_io);
return -EINVAL;
}
down_read(&zram->init_lock);
if (unlikely(!init_done(zram))) {
err = -EIO;
goto out_unlock;
}
index = sector >> SECTORS_PER_PAGE_SHIFT;
offset = sector & (SECTORS_PER_PAGE - 1) << SECTOR_SHIFT;
bv.bv_page = page;
bv.bv_len = PAGE_SIZE;
bv.bv_offset = 0;
err = zram_bvec_rw(zram, &bv, index, offset, rw);
out_unlock:
up_read(&zram->init_lock);
/*
* If I/O fails, just return the error (i.e. non-zero) without
* calling page_endio.
* The upper layers of rw_page (e.g., swap_readpage,
* __swap_writepage) will then resubmit the I/O as a bio request,
* and bio->bi_end_io handles the error
* (e.g., SetPageError, set_page_dirty and extra work).
*/
if (err == 0)
page_endio(page, rw, 0);
return err;
}
static const struct block_device_operations zram_devops = {
.swap_slot_free_notify = zram_slot_free_notify,
.rw_page = zram_rw_page,
.owner = THIS_MODULE
};
static DEVICE_ATTR(disksize, S_IRUGO | S_IWUSR,
disksize_show, disksize_store);
static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL);
static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store);
static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
mem_limit_store);
static DEVICE_ATTR(mem_used_max, S_IRUGO | S_IWUSR, mem_used_max_show,
mem_used_max_store);
static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
max_comp_streams_show, max_comp_streams_store);
static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
comp_algorithm_show, comp_algorithm_store);
static DEVICE_ATTR_RW(disksize);
static DEVICE_ATTR_RO(initstate);
static DEVICE_ATTR_WO(reset);
static DEVICE_ATTR_RO(orig_data_size);
static DEVICE_ATTR_RO(mem_used_total);
static DEVICE_ATTR_RW(mem_limit);
static DEVICE_ATTR_RW(mem_used_max);
static DEVICE_ATTR_RW(max_comp_streams);
static DEVICE_ATTR_RW(comp_algorithm);
ZRAM_ATTR_RO(num_reads);
ZRAM_ATTR_RO(num_writes);
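The conversions above rely on the standard sysfs helper macros: DEVICE_ATTR_RO(name) declares a struct device_attribute called dev_attr_<name> with read-only mode and a show callback named <name>_show, which is why ZRAM_ATTR_RO now emits a plain name##_show function instead of zram_attr_##name##_show. A small sketch of the pattern with a hypothetical attribute foo (not one of zram's):

	#include <linux/device.h>

	/* DEVICE_ATTR_RO(foo) expects a function named foo_show */
	static ssize_t foo_show(struct device *dev,
				struct device_attribute *attr, char *buf)
	{
		return scnprintf(buf, PAGE_SIZE, "%d\n", 42);
	}
	static DEVICE_ATTR_RO(foo);	/* defines dev_attr_foo, mode S_IRUGO */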

View file

@ -66,8 +66,8 @@ static const size_t max_zpage_size = PAGE_SIZE / 4 * 3;
/* Flags for zram pages (table[page_no].value) */
enum zram_pageflags {
/* Page consists entirely of zeros */
ZRAM_ZERO = ZRAM_FLAG_SHIFT + 1,
ZRAM_ACCESS, /* page in now accessed */
ZRAM_ZERO = ZRAM_FLAG_SHIFT,
ZRAM_ACCESS, /* page is now accessed */
__NR_ZRAM_PAGEFLAGS,
};

View file

@ -509,45 +509,67 @@ static void finish_pri_tag(struct device_state *dev_state,
spin_unlock_irqrestore(&pasid_state->lock, flags);
}
static void handle_fault_error(struct fault *fault)
{
int status;
if (!fault->dev_state->inv_ppr_cb) {
set_pri_tag_status(fault->state, fault->tag, PPR_INVALID);
return;
}
status = fault->dev_state->inv_ppr_cb(fault->dev_state->pdev,
fault->pasid,
fault->address,
fault->flags);
switch (status) {
case AMD_IOMMU_INV_PRI_RSP_SUCCESS:
set_pri_tag_status(fault->state, fault->tag, PPR_SUCCESS);
break;
case AMD_IOMMU_INV_PRI_RSP_INVALID:
set_pri_tag_status(fault->state, fault->tag, PPR_INVALID);
break;
case AMD_IOMMU_INV_PRI_RSP_FAIL:
set_pri_tag_status(fault->state, fault->tag, PPR_FAILURE);
break;
default:
BUG();
}
}
static void do_fault(struct work_struct *work)
{
struct fault *fault = container_of(work, struct fault, work);
int npages, write;
struct page *page;
struct mm_struct *mm;
struct vm_area_struct *vma;
u64 address;
int ret, write;
write = !!(fault->flags & PPR_FAULT_WRITE);
down_read(&fault->state->mm->mmap_sem);
npages = get_user_pages(NULL, fault->state->mm,
fault->address, 1, write, 0, &page, NULL);
up_read(&fault->state->mm->mmap_sem);
mm = fault->state->mm;
address = fault->address;
if (npages == 1) {
put_page(page);
} else if (fault->dev_state->inv_ppr_cb) {
int status;
status = fault->dev_state->inv_ppr_cb(fault->dev_state->pdev,
fault->pasid,
fault->address,
fault->flags);
switch (status) {
case AMD_IOMMU_INV_PRI_RSP_SUCCESS:
set_pri_tag_status(fault->state, fault->tag, PPR_SUCCESS);
break;
case AMD_IOMMU_INV_PRI_RSP_INVALID:
set_pri_tag_status(fault->state, fault->tag, PPR_INVALID);
break;
case AMD_IOMMU_INV_PRI_RSP_FAIL:
set_pri_tag_status(fault->state, fault->tag, PPR_FAILURE);
break;
default:
BUG();
}
} else {
set_pri_tag_status(fault->state, fault->tag, PPR_INVALID);
down_read(&mm->mmap_sem);
vma = find_extend_vma(mm, address);
if (!vma || address < vma->vm_start) {
/* failed to get a vma in the right range */
up_read(&mm->mmap_sem);
handle_fault_error(fault);
goto out;
}
ret = handle_mm_fault(mm, vma, address, write);
if (ret & VM_FAULT_ERROR) {
/* failed to service fault */
up_read(&mm->mmap_sem);
handle_fault_error(fault);
goto out;
}
up_read(&mm->mmap_sem);
out:
finish_pri_tag(fault->dev_state, fault->state, fault->tag);
put_pasid_state(fault->state);

View file

@ -344,13 +344,20 @@ static int snvs_rtc_resume(struct device *dev)
return 0;
}
#endif
static const struct dev_pm_ops snvs_rtc_pm_ops = {
.suspend_noirq = snvs_rtc_suspend,
.resume_noirq = snvs_rtc_resume,
};
#define SNVS_RTC_PM_OPS (&snvs_rtc_pm_ops)
#else
#define SNVS_RTC_PM_OPS NULL
#endif
static const struct of_device_id snvs_dt_ids[] = {
{ .compatible = "fsl,sec-v4.0-mon-rtc-lp", },
{ /* sentinel */ }
@ -361,7 +368,7 @@ static struct platform_driver snvs_rtc_driver = {
.driver = {
.name = "snvs_rtc",
.owner = THIS_MODULE,
.pm = &snvs_rtc_pm_ops,
.pm = SNVS_RTC_PM_OPS,
.of_match_table = snvs_dt_ids,
},
.probe = snvs_rtc_probe,

View file

@ -418,7 +418,7 @@ static int ashmem_mmap(struct file *file, struct vm_area_struct *vma)
}
/*
* ashmem_shrink - our cache shrinker, called from mm/vmscan.c :: shrink_slab
* ashmem_shrink - our cache shrinker, called from mm/vmscan.c
*
* 'nr_to_scan' is the number of objects to scan for freeing.
*
@ -785,7 +785,6 @@ static long ashmem_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
.nr_to_scan = LONG_MAX,
};
ret = ashmem_shrink_count(&ashmem_shrinker, &sc);
nodes_setall(sc.nodes_to_scan);
ashmem_shrink_scan(&ashmem_shrinker, &sc);
}
break;

View file

@ -135,8 +135,10 @@ extern void affs_fix_checksum(struct super_block *sb, struct buffer_head *bh);
extern void secs_to_datestamp(time_t secs, struct affs_date *ds);
extern umode_t prot_to_mode(u32 prot);
extern void mode_to_prot(struct inode *inode);
__printf(3, 4)
extern void affs_error(struct super_block *sb, const char *function,
const char *fmt, ...);
__printf(3, 4)
extern void affs_warning(struct super_block *sb, const char *function,
const char *fmt, ...);
extern bool affs_nofilenametruncate(const struct dentry *dentry);

View file

@ -10,8 +10,6 @@
#include "affs.h"
static char ErrorBuffer[256];
/*
* Functions for accessing Amiga-FFS structures.
*/
@ -444,30 +442,30 @@ mode_to_prot(struct inode *inode)
void
affs_error(struct super_block *sb, const char *function, const char *fmt, ...)
{
va_list args;
struct va_format vaf;
va_list args;
va_start(args,fmt);
vsnprintf(ErrorBuffer,sizeof(ErrorBuffer),fmt,args);
va_end(args);
pr_crit("error (device %s): %s(): %s\n", sb->s_id,
function,ErrorBuffer);
va_start(args, fmt);
vaf.fmt = fmt;
vaf.va = &args;
pr_crit("error (device %s): %s(): %pV\n", sb->s_id, function, &vaf);
if (!(sb->s_flags & MS_RDONLY))
pr_warn("Remounting filesystem read-only\n");
sb->s_flags |= MS_RDONLY;
va_end(args);
}
void
affs_warning(struct super_block *sb, const char *function, const char *fmt, ...)
{
va_list args;
struct va_format vaf;
va_list args;
va_start(args,fmt);
vsnprintf(ErrorBuffer,sizeof(ErrorBuffer),fmt,args);
va_start(args, fmt);
vaf.fmt = fmt;
vaf.va = &args;
pr_warn("(device %s): %s(): %pV\n", sb->s_id, function, &vaf);
va_end(args);
pr_warn("(device %s): %s(): %s\n", sb->s_id,
function,ErrorBuffer);
}
bool

View file

@ -12,35 +12,10 @@
* affs regular file handling primitives
*/
#include <linux/aio.h>
#include "affs.h"
#if PAGE_SIZE < 4096
#error PAGE_SIZE must be at least 4096
#endif
static int affs_grow_extcache(struct inode *inode, u32 lc_idx);
static struct buffer_head *affs_alloc_extblock(struct inode *inode, struct buffer_head *bh, u32 ext);
static inline struct buffer_head *affs_get_extblock(struct inode *inode, u32 ext);
static struct buffer_head *affs_get_extblock_slow(struct inode *inode, u32 ext);
static int affs_file_open(struct inode *inode, struct file *filp);
static int affs_file_release(struct inode *inode, struct file *filp);
const struct file_operations affs_file_operations = {
.llseek = generic_file_llseek,
.read = new_sync_read,
.read_iter = generic_file_read_iter,
.write = new_sync_write,
.write_iter = generic_file_write_iter,
.mmap = generic_file_mmap,
.open = affs_file_open,
.release = affs_file_release,
.fsync = affs_file_fsync,
.splice_read = generic_file_splice_read,
};
const struct inode_operations affs_file_inode_operations = {
.setattr = affs_notify_change,
};
static int
affs_file_open(struct inode *inode, struct file *filp)
@ -355,7 +330,8 @@ affs_get_block(struct inode *inode, sector_t block, struct buffer_head *bh_resul
/* store new block */
if (bh_result->b_blocknr)
affs_warning(sb, "get_block", "block already set (%x)", bh_result->b_blocknr);
affs_warning(sb, "get_block", "block already set (%lx)",
(unsigned long)bh_result->b_blocknr);
AFFS_BLOCK(sb, ext_bh, block) = cpu_to_be32(blocknr);
AFFS_HEAD(ext_bh)->block_count = cpu_to_be32(block + 1);
affs_adjust_checksum(ext_bh, blocknr - bh_result->b_blocknr + 1);
@ -377,7 +353,8 @@ affs_get_block(struct inode *inode, sector_t block, struct buffer_head *bh_resul
return 0;
err_big:
affs_error(inode->i_sb,"get_block","strange block request %d", block);
affs_error(inode->i_sb, "get_block", "strange block request %d",
(int)block);
return -EIO;
err_ext:
// unlock cache
@ -412,6 +389,22 @@ static void affs_write_failed(struct address_space *mapping, loff_t to)
}
}
static ssize_t
affs_direct_IO(int rw, struct kiocb *iocb, struct iov_iter *iter,
loff_t offset)
{
struct file *file = iocb->ki_filp;
struct address_space *mapping = file->f_mapping;
struct inode *inode = mapping->host;
size_t count = iov_iter_count(iter);
ssize_t ret;
ret = blockdev_direct_IO(rw, iocb, inode, iter, offset, affs_get_block);
if (ret < 0 && (rw & WRITE))
affs_write_failed(mapping, offset + count);
return ret;
}
static int affs_write_begin(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, unsigned flags,
struct page **pagep, void **fsdata)
@ -438,6 +431,7 @@ const struct address_space_operations affs_aops = {
.writepage = affs_writepage,
.write_begin = affs_write_begin,
.write_end = generic_write_end,
.direct_IO = affs_direct_IO,
.bmap = _affs_bmap
};
@ -867,8 +861,9 @@ affs_truncate(struct inode *inode)
// lock cache
ext_bh = affs_get_extblock(inode, ext);
if (IS_ERR(ext_bh)) {
affs_warning(sb, "truncate", "unexpected read error for ext block %u (%d)",
ext, PTR_ERR(ext_bh));
affs_warning(sb, "truncate",
"unexpected read error for ext block %u (%ld)",
(unsigned int)ext, PTR_ERR(ext_bh));
return;
}
if (AFFS_I(inode)->i_lc) {
@ -914,8 +909,9 @@ affs_truncate(struct inode *inode)
struct buffer_head *bh = affs_bread_ino(inode, last_blk, 0);
u32 tmp;
if (IS_ERR(bh)) {
affs_warning(sb, "truncate", "unexpected read error for last block %u (%d)",
ext, PTR_ERR(bh));
affs_warning(sb, "truncate",
"unexpected read error for last block %u (%ld)",
(unsigned int)ext, PTR_ERR(bh));
return;
}
tmp = be32_to_cpu(AFFS_DATA_HEAD(bh)->next);
@ -961,3 +957,19 @@ int affs_file_fsync(struct file *filp, loff_t start, loff_t end, int datasync)
mutex_unlock(&inode->i_mutex);
return ret;
}
const struct file_operations affs_file_operations = {
.llseek = generic_file_llseek,
.read = new_sync_read,
.read_iter = generic_file_read_iter,
.write = new_sync_write,
.write_iter = generic_file_write_iter,
.mmap = generic_file_mmap,
.open = affs_file_open,
.release = affs_file_release,
.fsync = affs_file_fsync,
.splice_read = generic_file_splice_read,
};
const struct inode_operations affs_file_inode_operations = {
.setattr = affs_notify_change,
};

View file

@ -269,10 +269,6 @@ befs_readdir(struct file *file, struct dir_context *ctx)
}
ctx->pos++;
goto more;
befs_debug(sb, "<--- %s pos %lld", __func__, ctx->pos);
return 0;
}
static struct inode *

View file

@ -42,6 +42,10 @@ static int load_em86(struct linux_binprm *bprm)
return -ENOEXEC;
}
/* Need to be able to load the file after exec */
if (bprm->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE)
return -ENOENT;
allow_write_access(bprm->file);
fput(bprm->file);
bprm->file = NULL;

View file

@ -144,6 +144,10 @@ static int load_misc_binary(struct linux_binprm *bprm)
if (!fmt)
goto ret;
/* Need to be able to load the file after exec */
if (bprm->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE)
return -ENOENT;
if (!(fmt->flags & MISC_FMT_PRESERVE_ARGV0)) {
retval = remove_arg_zero(bprm);
if (retval)

View file

@ -24,6 +24,16 @@ static int load_script(struct linux_binprm *bprm)
if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!'))
return -ENOEXEC;
/*
* If the script filename will be inaccessible after exec, typically
* because it is a "/dev/fd/<fd>/.." path against an O_CLOEXEC fd, give
* up now (on the assumption that the interpreter will want to load
* this file).
*/
if (bprm->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE)
return -ENOENT;
/*
* This section does the #! interpretation.
* Sorta complicated, but hopefully it will work. -TYT

View file

@ -40,13 +40,14 @@ static void drop_pagecache_sb(struct super_block *sb, void *unused)
static void drop_slab(void)
{
int nr_objects;
struct shrink_control shrink = {
.gfp_mask = GFP_KERNEL,
};
nodes_setall(shrink.nodes_to_scan);
do {
nr_objects = shrink_slab(&shrink, 1000, 1000);
int nid;
nr_objects = 0;
for_each_online_node(nid)
nr_objects += shrink_node_slabs(GFP_KERNEL, nid,
1000, 1000);
} while (nr_objects > 10);
}

fs/exec.c
View file

@ -748,18 +748,25 @@ EXPORT_SYMBOL(setup_arg_pages);
#endif /* CONFIG_MMU */
static struct file *do_open_exec(struct filename *name)
static struct file *do_open_execat(int fd, struct filename *name, int flags)
{
struct file *file;
int err;
static const struct open_flags open_exec_flags = {
struct open_flags open_exec_flags = {
.open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,
.acc_mode = MAY_EXEC | MAY_OPEN,
.intent = LOOKUP_OPEN,
.lookup_flags = LOOKUP_FOLLOW,
};
file = do_filp_open(AT_FDCWD, name, &open_exec_flags);
if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
return ERR_PTR(-EINVAL);
if (flags & AT_SYMLINK_NOFOLLOW)
open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW;
if (flags & AT_EMPTY_PATH)
open_exec_flags.lookup_flags |= LOOKUP_EMPTY;
file = do_filp_open(fd, name, &open_exec_flags);
if (IS_ERR(file))
goto out;
@ -770,12 +777,13 @@ static struct file *do_open_exec(struct filename *name)
if (file->f_path.mnt->mnt_flags & MNT_NOEXEC)
goto exit;
fsnotify_open(file);
err = deny_write_access(file);
if (err)
goto exit;
if (name->name[0] != '\0')
fsnotify_open(file);
out:
return file;
@ -787,7 +795,7 @@ static struct file *do_open_exec(struct filename *name)
struct file *open_exec(const char *name)
{
struct filename tmp = { .name = name };
return do_open_exec(&tmp);
return do_open_execat(AT_FDCWD, &tmp, 0);
}
EXPORT_SYMBOL(open_exec);
@ -1428,10 +1436,12 @@ static int exec_binprm(struct linux_binprm *bprm)
/*
* sys_execve() executes a new program.
*/
static int do_execve_common(struct filename *filename,
struct user_arg_ptr argv,
struct user_arg_ptr envp)
static int do_execveat_common(int fd, struct filename *filename,
struct user_arg_ptr argv,
struct user_arg_ptr envp,
int flags)
{
char *pathbuf = NULL;
struct linux_binprm *bprm;
struct file *file;
struct files_struct *displaced;
@ -1472,7 +1482,7 @@ static int do_execve_common(struct filename *filename,
check_unsafe_exec(bprm);
current->in_execve = 1;
file = do_open_exec(filename);
file = do_open_execat(fd, filename, flags);
retval = PTR_ERR(file);
if (IS_ERR(file))
goto out_unmark;
@ -1480,7 +1490,28 @@ static int do_execve_common(struct filename *filename,
sched_exec();
bprm->file = file;
bprm->filename = bprm->interp = filename->name;
if (fd == AT_FDCWD || filename->name[0] == '/') {
bprm->filename = filename->name;
} else {
if (filename->name[0] == '\0')
pathbuf = kasprintf(GFP_TEMPORARY, "/dev/fd/%d", fd);
else
pathbuf = kasprintf(GFP_TEMPORARY, "/dev/fd/%d/%s",
fd, filename->name);
if (!pathbuf) {
retval = -ENOMEM;
goto out_unmark;
}
/*
* Record that a name derived from an O_CLOEXEC fd will be
* inaccessible after exec. Relies on having exclusive access to
* current->files (due to unshare_files above).
*/
if (close_on_exec(fd, rcu_dereference_raw(current->files->fdt)))
bprm->interp_flags |= BINPRM_FLAGS_PATH_INACCESSIBLE;
bprm->filename = pathbuf;
}
bprm->interp = bprm->filename;
retval = bprm_mm_init(bprm);
if (retval)
@ -1521,6 +1552,7 @@ static int do_execve_common(struct filename *filename,
acct_update_integrals(current);
task_numa_free(current);
free_bprm(bprm);
kfree(pathbuf);
putname(filename);
if (displaced)
put_files_struct(displaced);
@ -1538,6 +1570,7 @@ static int do_execve_common(struct filename *filename,
out_free:
free_bprm(bprm);
kfree(pathbuf);
out_files:
if (displaced)
@ -1553,7 +1586,18 @@ int do_execve(struct filename *filename,
{
struct user_arg_ptr argv = { .ptr.native = __argv };
struct user_arg_ptr envp = { .ptr.native = __envp };
return do_execve_common(filename, argv, envp);
return do_execveat_common(AT_FDCWD, filename, argv, envp, 0);
}
int do_execveat(int fd, struct filename *filename,
const char __user *const __user *__argv,
const char __user *const __user *__envp,
int flags)
{
struct user_arg_ptr argv = { .ptr.native = __argv };
struct user_arg_ptr envp = { .ptr.native = __envp };
return do_execveat_common(fd, filename, argv, envp, flags);
}
#ifdef CONFIG_COMPAT
@ -1569,7 +1613,23 @@ static int compat_do_execve(struct filename *filename,
.is_compat = true,
.ptr.compat = __envp,
};
return do_execve_common(filename, argv, envp);
return do_execveat_common(AT_FDCWD, filename, argv, envp, 0);
}
static int compat_do_execveat(int fd, struct filename *filename,
const compat_uptr_t __user *__argv,
const compat_uptr_t __user *__envp,
int flags)
{
struct user_arg_ptr argv = {
.is_compat = true,
.ptr.compat = __argv,
};
struct user_arg_ptr envp = {
.is_compat = true,
.ptr.compat = __envp,
};
return do_execveat_common(fd, filename, argv, envp, flags);
}
#endif
@ -1609,6 +1669,20 @@ SYSCALL_DEFINE3(execve,
{
return do_execve(getname(filename), argv, envp);
}
SYSCALL_DEFINE5(execveat,
int, fd, const char __user *, filename,
const char __user *const __user *, argv,
const char __user *const __user *, envp,
int, flags)
{
int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0;
return do_execveat(fd,
getname_flags(filename, lookup_flags, NULL),
argv, envp, flags);
}
#ifdef CONFIG_COMPAT
COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename,
const compat_uptr_t __user *, argv,
@ -1616,4 +1690,17 @@ COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename,
{
return compat_do_execve(getname(filename), argv, envp);
}
COMPAT_SYSCALL_DEFINE5(execveat, int, fd,
const char __user *, filename,
const compat_uptr_t __user *, argv,
const compat_uptr_t __user *, envp,
int, flags)
{
int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0;
return compat_do_execveat(fd,
getname_flags(filename, lookup_flags, NULL),
argv, envp, flags);
}
#endif
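For reference, the new syscall can be exercised from userspace before libc grows a wrapper by calling syscall(2) directly. A minimal sketch, assuming an x86-64 kernel with this series applied; the 322 syscall number comes from the 64-bit syscall table hunk above, and error handling is kept minimal:

	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	#ifndef __NR_execveat
	#define __NR_execveat 322	/* x86-64 number from the table above */
	#endif

	int main(void)
	{
		char *argv[] = { "true", NULL };
		char *envp[] = { NULL };
		int fd = open("/bin/true", O_RDONLY | O_CLOEXEC);

		if (fd < 0)
			return 1;
		/* AT_EMPTY_PATH: execute the file referred to by fd itself */
		syscall(__NR_execveat, fd, "", argv, envp, AT_EMPTY_PATH);
		perror("execveat");	/* reached only if the exec failed */
		return 1;
	}

This also ties in with the binfmt_script/binfmt_misc/binfmt_em86 hunks earlier: when fd is opened O_CLOEXEC and the target needs an interpreter, BINPRM_FLAGS_PATH_INACCESSIBLE makes the exec fail with ENOENT, because the constructed /dev/fd/<fd> path would no longer be usable once the descriptor is closed on exec.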

View file

@ -370,6 +370,7 @@ extern int fat_file_fsync(struct file *file, loff_t start, loff_t end,
int datasync);
/* fat/inode.c */
extern int fat_block_truncate_page(struct inode *inode, loff_t from);
extern void fat_attach(struct inode *inode, loff_t i_pos);
extern void fat_detach(struct inode *inode);
extern struct inode *fat_iget(struct super_block *sb, loff_t i_pos);

View file

@ -443,6 +443,9 @@ int fat_setattr(struct dentry *dentry, struct iattr *attr)
}
if (attr->ia_valid & ATTR_SIZE) {
error = fat_block_truncate_page(inode, attr->ia_size);
if (error)
goto out;
down_write(&MSDOS_I(inode)->truncate_lock);
truncate_setsize(inode, attr->ia_size);
fat_truncate_blocks(inode, attr->ia_size);

View file

@ -294,6 +294,18 @@ static sector_t _fat_bmap(struct address_space *mapping, sector_t block)
return blocknr;
}
/*
* fat_block_truncate_page() zeroes out a mapping from file offset `from'
* up to the end of the block which corresponds to `from'.
* This is required during truncate to physically zeroout the tail end
* of that block so it doesn't yield old data if the file is later grown.
* Also, avoid causing failure from fsx for cases of "data past EOF"
*/
int fat_block_truncate_page(struct inode *inode, loff_t from)
{
return block_truncate_page(inode->i_mapping, from, fat_get_block);
}
static const struct address_space_operations fat_aops = {
.readpage = fat_readpage,
.readpages = fat_readpages,

View file

@ -412,10 +412,10 @@ static int hugetlb_vmtruncate(struct inode *inode, loff_t offset)
pgoff = offset >> PAGE_SHIFT;
i_size_write(inode, offset);
mutex_lock(&mapping->i_mmap_mutex);
i_mmap_lock_write(mapping);
if (!RB_EMPTY_ROOT(&mapping->i_mmap))
hugetlb_vmtruncate_list(&mapping->i_mmap, pgoff);
mutex_unlock(&mapping->i_mmap_mutex);
i_mmap_unlock_write(mapping);
truncate_hugepages(inode, offset);
return 0;
}
@ -472,12 +472,12 @@ static struct inode *hugetlbfs_get_root(struct super_block *sb,
}
/*
* Hugetlbfs is not reclaimable; therefore its i_mmap_mutex will never
* Hugetlbfs is not reclaimable; therefore its i_mmap_rwsem will never
* be taken from reclaim -- unlike regular filesystems. This needs an
* annotation because huge_pmd_share() does an allocation under
* i_mmap_mutex.
* i_mmap_rwsem.
*/
static struct lock_class_key hugetlbfs_i_mmap_mutex_key;
static struct lock_class_key hugetlbfs_i_mmap_rwsem_key;
static struct inode *hugetlbfs_get_inode(struct super_block *sb,
struct inode *dir,
@ -495,8 +495,8 @@ static struct inode *hugetlbfs_get_inode(struct super_block *sb,
struct hugetlbfs_inode_info *info;
inode->i_ino = get_next_ino();
inode_init_owner(inode, dir, mode);
lockdep_set_class(&inode->i_mapping->i_mmap_mutex,
&hugetlbfs_i_mmap_mutex_key);
lockdep_set_class(&inode->i_mapping->i_mmap_rwsem,
&hugetlbfs_i_mmap_rwsem_key);
inode->i_mapping->a_ops = &hugetlbfs_aops;
inode->i_mapping->backing_dev_info =&hugetlbfs_backing_dev_info;
inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;

View file

@ -346,7 +346,7 @@ void address_space_init_once(struct address_space *mapping)
memset(mapping, 0, sizeof(*mapping));
INIT_RADIX_TREE(&mapping->page_tree, GFP_ATOMIC);
spin_lock_init(&mapping->tree_lock);
mutex_init(&mapping->i_mmap_mutex);
init_rwsem(&mapping->i_mmap_rwsem);
INIT_LIST_HEAD(&mapping->private_list);
spin_lock_init(&mapping->private_lock);
mapping->i_mmap = RB_ROOT;
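The hugetlbfs and inode.c hunks above replace i_mmap_mutex with i_mmap_rwsem and switch callers to the new i_mmap_lock_write()/i_mmap_unlock_write() helpers. The helpers themselves are not part of this view; a sketch of the thin wrappers they presumably are:

	static inline void i_mmap_lock_write(struct address_space *mapping)
	{
		down_write(&mapping->i_mmap_rwsem);
	}

	static inline void i_mmap_unlock_write(struct address_space *mapping)
	{
		up_write(&mapping->i_mmap_rwsem);
	}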

View file

@ -130,7 +130,7 @@ void final_putname(struct filename *name)
#define EMBEDDED_NAME_MAX (PATH_MAX - sizeof(struct filename))
static struct filename *
struct filename *
getname_flags(const char __user *filename, int flags, int *empty)
{
struct filename *result, *err;

View file

@ -69,8 +69,8 @@ static void dnotify_recalc_inode_mask(struct fsnotify_mark *fsn_mark)
if (old_mask == new_mask)
return;
if (fsn_mark->i.inode)
fsnotify_recalc_inode_mask(fsn_mark->i.inode);
if (fsn_mark->inode)
fsnotify_recalc_inode_mask(fsn_mark->inode);
}
/*

View file

@ -80,7 +80,7 @@ static void inotify_fdinfo(struct seq_file *m, struct fsnotify_mark *mark)
return;
inode_mark = container_of(mark, struct inotify_inode_mark, fsn_mark);
inode = igrab(mark->i.inode);
inode = igrab(mark->inode);
if (inode) {
seq_printf(m, "inotify wd:%x ino:%lx sdev:%x mask:%x ignored_mask:%x ",
inode_mark->wd, inode->i_ino, inode->i_sb->s_dev,
@ -112,7 +112,7 @@ static void fanotify_fdinfo(struct seq_file *m, struct fsnotify_mark *mark)
mflags |= FAN_MARK_IGNORED_SURV_MODIFY;
if (mark->flags & FSNOTIFY_MARK_FLAG_INODE) {
inode = igrab(mark->i.inode);
inode = igrab(mark->inode);
if (!inode)
return;
seq_printf(m, "fanotify ino:%lx sdev:%x mflags:%x mask:%x ignored_mask:%x ",
@ -122,7 +122,7 @@ static void fanotify_fdinfo(struct seq_file *m, struct fsnotify_mark *mark)
seq_putc(m, '\n');
iput(inode);
} else if (mark->flags & FSNOTIFY_MARK_FLAG_VFSMOUNT) {
struct mount *mnt = real_mount(mark->m.mnt);
struct mount *mnt = real_mount(mark->mnt);
seq_printf(m, "fanotify mnt_id:%x mflags:%x mask:%x ignored_mask:%x\n",
mnt->mnt_id, mflags, mark->mask, mark->ignored_mask);

View file

@ -242,13 +242,13 @@ int fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is,
if (inode_node) {
inode_mark = hlist_entry(srcu_dereference(inode_node, &fsnotify_mark_srcu),
struct fsnotify_mark, i.i_list);
struct fsnotify_mark, obj_list);
inode_group = inode_mark->group;
}
if (vfsmount_node) {
vfsmount_mark = hlist_entry(srcu_dereference(vfsmount_node, &fsnotify_mark_srcu),
struct fsnotify_mark, m.m_list);
struct fsnotify_mark, obj_list);
vfsmount_group = vfsmount_mark->group;
}

View file

@ -12,12 +12,19 @@ extern void fsnotify_flush_notify(struct fsnotify_group *group);
/* protects reads of inode and vfsmount marks list */
extern struct srcu_struct fsnotify_mark_srcu;
/* Calculate mask of events for a list of marks */
extern u32 fsnotify_recalc_mask(struct hlist_head *head);
/* compare two groups for sorting of marks lists */
extern int fsnotify_compare_groups(struct fsnotify_group *a,
struct fsnotify_group *b);
extern void fsnotify_set_inode_mark_mask_locked(struct fsnotify_mark *fsn_mark,
__u32 mask);
/* Add mark to a proper place in mark list */
extern int fsnotify_add_mark_list(struct hlist_head *head,
struct fsnotify_mark *mark,
int allow_dups);
/* add a mark to an inode */
extern int fsnotify_add_inode_mark(struct fsnotify_mark *mark,
struct fsnotify_group *group, struct inode *inode,
@ -31,6 +38,11 @@ extern int fsnotify_add_vfsmount_mark(struct fsnotify_mark *mark,
extern void fsnotify_destroy_vfsmount_mark(struct fsnotify_mark *mark);
/* inode specific destruction of a mark */
extern void fsnotify_destroy_inode_mark(struct fsnotify_mark *mark);
/* Destroy all marks in the given list */
extern void fsnotify_destroy_marks(struct list_head *to_free);
/* Find mark belonging to given group in the list of marks */
extern struct fsnotify_mark *fsnotify_find_mark(struct hlist_head *head,
struct fsnotify_group *group);
/* run the list of all marks associated with inode and flag them to be freed */
extern void fsnotify_clear_marks_by_inode(struct inode *inode);
/* run the list of all marks associated with vfsmount and flag them to be freed */

View file

@ -30,21 +30,6 @@
#include "../internal.h"
/*
* Recalculate the mask of events relevant to a given inode locked.
*/
static void fsnotify_recalc_inode_mask_locked(struct inode *inode)
{
struct fsnotify_mark *mark;
__u32 new_mask = 0;
assert_spin_locked(&inode->i_lock);
hlist_for_each_entry(mark, &inode->i_fsnotify_marks, i.i_list)
new_mask |= mark->mask;
inode->i_fsnotify_mask = new_mask;
}
/*
* Recalculate the inode->i_fsnotify_mask, or the mask of all FS_* event types
* any notifier is interested in hearing for this inode.
@ -52,7 +37,7 @@ static void fsnotify_recalc_inode_mask_locked(struct inode *inode)
void fsnotify_recalc_inode_mask(struct inode *inode)
{
spin_lock(&inode->i_lock);
fsnotify_recalc_inode_mask_locked(inode);
inode->i_fsnotify_mask = fsnotify_recalc_mask(&inode->i_fsnotify_marks);
spin_unlock(&inode->i_lock);
__fsnotify_update_child_dentry_flags(inode);
@ -60,23 +45,22 @@ void fsnotify_recalc_inode_mask(struct inode *inode)
void fsnotify_destroy_inode_mark(struct fsnotify_mark *mark)
{
struct inode *inode = mark->i.inode;
struct inode *inode = mark->inode;
BUG_ON(!mutex_is_locked(&mark->group->mark_mutex));
assert_spin_locked(&mark->lock);
spin_lock(&inode->i_lock);
hlist_del_init_rcu(&mark->i.i_list);
mark->i.inode = NULL;
hlist_del_init_rcu(&mark->obj_list);
mark->inode = NULL;
/*
* this mark is now off the inode->i_fsnotify_marks list and we
* hold the inode->i_lock, so this is the perfect time to update the
* inode->i_fsnotify_mask
*/
fsnotify_recalc_inode_mask_locked(inode);
inode->i_fsnotify_mask = fsnotify_recalc_mask(&inode->i_fsnotify_marks);
spin_unlock(&inode->i_lock);
}
@ -85,30 +69,19 @@ void fsnotify_destroy_inode_mark(struct fsnotify_mark *mark)
*/
void fsnotify_clear_marks_by_inode(struct inode *inode)
{
struct fsnotify_mark *mark, *lmark;
struct fsnotify_mark *mark;
struct hlist_node *n;
LIST_HEAD(free_list);
spin_lock(&inode->i_lock);
hlist_for_each_entry_safe(mark, n, &inode->i_fsnotify_marks, i.i_list) {
list_add(&mark->i.free_i_list, &free_list);
hlist_del_init_rcu(&mark->i.i_list);
hlist_for_each_entry_safe(mark, n, &inode->i_fsnotify_marks, obj_list) {
list_add(&mark->free_list, &free_list);
hlist_del_init_rcu(&mark->obj_list);
fsnotify_get_mark(mark);
}
spin_unlock(&inode->i_lock);
list_for_each_entry_safe(mark, lmark, &free_list, i.free_i_list) {
struct fsnotify_group *group;
spin_lock(&mark->lock);
fsnotify_get_group(mark->group);
group = mark->group;
spin_unlock(&mark->lock);
fsnotify_destroy_mark(mark, group);
fsnotify_put_mark(mark);
fsnotify_put_group(group);
}
fsnotify_destroy_marks(&free_list);
}
/*
@ -119,27 +92,6 @@ void fsnotify_clear_inode_marks_by_group(struct fsnotify_group *group)
fsnotify_clear_marks_by_group_flags(group, FSNOTIFY_MARK_FLAG_INODE);
}
/*
* given a group and inode, find the mark associated with that combination.
* if found take a reference to that mark and return it, else return NULL
*/
static struct fsnotify_mark *fsnotify_find_inode_mark_locked(
struct fsnotify_group *group,
struct inode *inode)
{
struct fsnotify_mark *mark;
assert_spin_locked(&inode->i_lock);
hlist_for_each_entry(mark, &inode->i_fsnotify_marks, i.i_list) {
if (mark->group == group) {
fsnotify_get_mark(mark);
return mark;
}
}
return NULL;
}
/*
* given a group and inode, find the mark associated with that combination.
* if found take a reference to that mark and return it, else return NULL
@ -150,7 +102,7 @@ struct fsnotify_mark *fsnotify_find_inode_mark(struct fsnotify_group *group,
struct fsnotify_mark *mark;
spin_lock(&inode->i_lock);
mark = fsnotify_find_inode_mark_locked(group, inode);
mark = fsnotify_find_mark(&inode->i_fsnotify_marks, group);
spin_unlock(&inode->i_lock);
return mark;
@ -168,10 +120,10 @@ void fsnotify_set_inode_mark_mask_locked(struct fsnotify_mark *mark,
assert_spin_locked(&mark->lock);
if (mask &&
mark->i.inode &&
mark->inode &&
!(mark->flags & FSNOTIFY_MARK_FLAG_OBJECT_PINNED)) {
mark->flags |= FSNOTIFY_MARK_FLAG_OBJECT_PINNED;
inode = igrab(mark->i.inode);
inode = igrab(mark->inode);
/*
* we shouldn't be able to get here if the inode wasn't
* already safely held in memory. But bug in case it
@ -192,9 +144,7 @@ int fsnotify_add_inode_mark(struct fsnotify_mark *mark,
struct fsnotify_group *group, struct inode *inode,
int allow_dups)
{
struct fsnotify_mark *lmark, *last = NULL;
int ret = 0;
int cmp;
int ret;
mark->flags |= FSNOTIFY_MARK_FLAG_INODE;
@ -202,37 +152,10 @@ int fsnotify_add_inode_mark(struct fsnotify_mark *mark,
assert_spin_locked(&mark->lock);
spin_lock(&inode->i_lock);
mark->i.inode = inode;
/* is mark the first mark? */
if (hlist_empty(&inode->i_fsnotify_marks)) {
hlist_add_head_rcu(&mark->i.i_list, &inode->i_fsnotify_marks);
goto out;
}
/* should mark be in the middle of the current list? */
hlist_for_each_entry(lmark, &inode->i_fsnotify_marks, i.i_list) {
last = lmark;
if ((lmark->group == group) && !allow_dups) {
ret = -EEXIST;
goto out;
}
cmp = fsnotify_compare_groups(lmark->group, mark->group);
if (cmp < 0)
continue;
hlist_add_before_rcu(&mark->i.i_list, &lmark->i.i_list);
goto out;
}
BUG_ON(last == NULL);
/* mark should be the last entry. last is the current last entry */
hlist_add_behind_rcu(&mark->i.i_list, &last->i.i_list);
out:
fsnotify_recalc_inode_mask_locked(inode);
mark->inode = inode;
ret = fsnotify_add_mark_list(&inode->i_fsnotify_marks, mark,
allow_dups);
inode->i_fsnotify_mask = fsnotify_recalc_mask(&inode->i_fsnotify_marks);
spin_unlock(&inode->i_lock);
return ret;

View file

@ -156,7 +156,7 @@ static int idr_callback(int id, void *p, void *data)
*/
if (fsn_mark)
printk(KERN_WARNING "fsn_mark->group=%p inode=%p wd=%d\n",
fsn_mark->group, fsn_mark->i.inode, i_mark->wd);
fsn_mark->group, fsn_mark->inode, i_mark->wd);
return 0;
}

View file

@ -433,7 +433,7 @@ static void inotify_remove_from_idr(struct fsnotify_group *group,
if (wd == -1) {
WARN_ONCE(1, "%s: i_mark=%p i_mark->wd=%d i_mark->group=%p"
" i_mark->inode=%p\n", __func__, i_mark, i_mark->wd,
i_mark->fsn_mark.group, i_mark->fsn_mark.i.inode);
i_mark->fsn_mark.group, i_mark->fsn_mark.inode);
goto out;
}
@ -442,7 +442,7 @@ static void inotify_remove_from_idr(struct fsnotify_group *group,
if (unlikely(!found_i_mark)) {
WARN_ONCE(1, "%s: i_mark=%p i_mark->wd=%d i_mark->group=%p"
" i_mark->inode=%p\n", __func__, i_mark, i_mark->wd,
i_mark->fsn_mark.group, i_mark->fsn_mark.i.inode);
i_mark->fsn_mark.group, i_mark->fsn_mark.inode);
goto out;
}
@ -456,9 +456,9 @@ static void inotify_remove_from_idr(struct fsnotify_group *group,
"mark->inode=%p found_i_mark=%p found_i_mark->wd=%d "
"found_i_mark->group=%p found_i_mark->inode=%p\n",
__func__, i_mark, i_mark->wd, i_mark->fsn_mark.group,
i_mark->fsn_mark.i.inode, found_i_mark, found_i_mark->wd,
i_mark->fsn_mark.inode, found_i_mark, found_i_mark->wd,
found_i_mark->fsn_mark.group,
found_i_mark->fsn_mark.i.inode);
found_i_mark->fsn_mark.inode);
goto out;
}
@ -470,7 +470,7 @@ static void inotify_remove_from_idr(struct fsnotify_group *group,
if (unlikely(atomic_read(&i_mark->fsn_mark.refcnt) < 3)) {
printk(KERN_ERR "%s: i_mark=%p i_mark->wd=%d i_mark->group=%p"
" i_mark->inode=%p\n", __func__, i_mark, i_mark->wd,
i_mark->fsn_mark.group, i_mark->fsn_mark.i.inode);
i_mark->fsn_mark.group, i_mark->fsn_mark.inode);
/* we can't really recover with bad ref cnting.. */
BUG();
}

View file

@ -110,6 +110,17 @@ void fsnotify_put_mark(struct fsnotify_mark *mark)
}
}
/* Calculate mask of events for a list of marks */
u32 fsnotify_recalc_mask(struct hlist_head *head)
{
u32 new_mask = 0;
struct fsnotify_mark *mark;
hlist_for_each_entry(mark, head, obj_list)
new_mask |= mark->mask;
return new_mask;
}
/*
* Any time a mark is getting freed we end up here.
* The caller had better be holding a reference to this mark so we don't actually
@ -133,7 +144,7 @@ void fsnotify_destroy_mark_locked(struct fsnotify_mark *mark,
mark->flags &= ~FSNOTIFY_MARK_FLAG_ALIVE;
if (mark->flags & FSNOTIFY_MARK_FLAG_INODE) {
inode = mark->i.inode;
inode = mark->inode;
fsnotify_destroy_inode_mark(mark);
} else if (mark->flags & FSNOTIFY_MARK_FLAG_VFSMOUNT)
fsnotify_destroy_vfsmount_mark(mark);
@ -150,7 +161,7 @@ void fsnotify_destroy_mark_locked(struct fsnotify_mark *mark,
mutex_unlock(&group->mark_mutex);
spin_lock(&destroy_lock);
list_add(&mark->destroy_list, &destroy_list);
list_add(&mark->g_list, &destroy_list);
spin_unlock(&destroy_lock);
wake_up(&destroy_waitq);
/*
@ -192,6 +203,27 @@ void fsnotify_destroy_mark(struct fsnotify_mark *mark,
mutex_unlock(&group->mark_mutex);
}
/*
* Destroy all marks in the given list. The marks must already be detached from
* their original inode / vfsmount.
*/
void fsnotify_destroy_marks(struct list_head *to_free)
{
struct fsnotify_mark *mark, *lmark;
struct fsnotify_group *group;
list_for_each_entry_safe(mark, lmark, to_free, free_list) {
spin_lock(&mark->lock);
fsnotify_get_group(mark->group);
group = mark->group;
spin_unlock(&mark->lock);
fsnotify_destroy_mark(mark, group);
fsnotify_put_mark(mark);
fsnotify_put_group(group);
}
}
void fsnotify_set_mark_mask_locked(struct fsnotify_mark *mark, __u32 mask)
{
assert_spin_locked(&mark->lock);
@ -245,6 +277,39 @@ int fsnotify_compare_groups(struct fsnotify_group *a, struct fsnotify_group *b)
return -1;
}
/* Add mark into proper place in given list of marks */
int fsnotify_add_mark_list(struct hlist_head *head, struct fsnotify_mark *mark,
int allow_dups)
{
struct fsnotify_mark *lmark, *last = NULL;
int cmp;
/* is mark the first mark? */
if (hlist_empty(head)) {
hlist_add_head_rcu(&mark->obj_list, head);
return 0;
}
/* should mark be in the middle of the current list? */
hlist_for_each_entry(lmark, head, obj_list) {
last = lmark;
if ((lmark->group == mark->group) && !allow_dups)
return -EEXIST;
cmp = fsnotify_compare_groups(lmark->group, mark->group);
if (cmp >= 0) {
hlist_add_before_rcu(&mark->obj_list, &lmark->obj_list);
return 0;
}
}
BUG_ON(last == NULL);
/* mark should be the last entry. last is the current last entry */
hlist_add_behind_rcu(&mark->obj_list, &last->obj_list);
return 0;
}
/*
* Attach an initialized mark to a given group and fs object.
* These marks may be used for the fsnotify backend to determine which
@ -305,7 +370,7 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark,
spin_unlock(&mark->lock);
spin_lock(&destroy_lock);
list_add(&mark->destroy_list, &destroy_list);
list_add(&mark->g_list, &destroy_list);
spin_unlock(&destroy_lock);
wake_up(&destroy_waitq);
@ -322,6 +387,24 @@ int fsnotify_add_mark(struct fsnotify_mark *mark, struct fsnotify_group *group,
return ret;
}
/*
* Given a list of marks, find the mark associated with the given group. If
* found, take a reference to that mark and return it, else return NULL.
*/
struct fsnotify_mark *fsnotify_find_mark(struct hlist_head *head,
struct fsnotify_group *group)
{
struct fsnotify_mark *mark;
hlist_for_each_entry(mark, head, obj_list) {
if (mark->group == group) {
fsnotify_get_mark(mark);
return mark;
}
}
return NULL;
}
/*
* clear any marks in a group in which mark->flags & flags is true
*/
@ -352,8 +435,8 @@ void fsnotify_clear_marks_by_group(struct fsnotify_group *group)
void fsnotify_duplicate_mark(struct fsnotify_mark *new, struct fsnotify_mark *old)
{
assert_spin_locked(&old->lock);
new->i.inode = old->i.inode;
new->m.mnt = old->m.mnt;
new->inode = old->inode;
new->mnt = old->mnt;
if (old->group)
fsnotify_get_group(old->group);
new->group = old->group;
@ -386,8 +469,8 @@ static int fsnotify_mark_destroy(void *ignored)
synchronize_srcu(&fsnotify_mark_srcu);
list_for_each_entry_safe(mark, next, &private_destroy_list, destroy_list) {
list_del_init(&mark->destroy_list);
list_for_each_entry_safe(mark, next, &private_destroy_list, g_list) {
list_del_init(&mark->g_list);
fsnotify_put_mark(mark);
}

View file

@ -32,31 +32,20 @@
void fsnotify_clear_marks_by_mount(struct vfsmount *mnt)
{
struct fsnotify_mark *mark, *lmark;
struct fsnotify_mark *mark;
struct hlist_node *n;
struct mount *m = real_mount(mnt);
LIST_HEAD(free_list);
spin_lock(&mnt->mnt_root->d_lock);
hlist_for_each_entry_safe(mark, n, &m->mnt_fsnotify_marks, m.m_list) {
list_add(&mark->m.free_m_list, &free_list);
hlist_del_init_rcu(&mark->m.m_list);
hlist_for_each_entry_safe(mark, n, &m->mnt_fsnotify_marks, obj_list) {
list_add(&mark->free_list, &free_list);
hlist_del_init_rcu(&mark->obj_list);
fsnotify_get_mark(mark);
}
spin_unlock(&mnt->mnt_root->d_lock);
list_for_each_entry_safe(mark, lmark, &free_list, m.free_m_list) {
struct fsnotify_group *group;
spin_lock(&mark->lock);
fsnotify_get_group(mark->group);
group = mark->group;
spin_unlock(&mark->lock);
fsnotify_destroy_mark(mark, group);
fsnotify_put_mark(mark);
fsnotify_put_group(group);
}
fsnotify_destroy_marks(&free_list);
}
void fsnotify_clear_vfsmount_marks_by_group(struct fsnotify_group *group)
@ -64,67 +53,36 @@ void fsnotify_clear_vfsmount_marks_by_group(struct fsnotify_group *group)
fsnotify_clear_marks_by_group_flags(group, FSNOTIFY_MARK_FLAG_VFSMOUNT);
}
/*
* Recalculate the mask of events relevant to a given vfsmount locked.
*/
static void fsnotify_recalc_vfsmount_mask_locked(struct vfsmount *mnt)
{
struct mount *m = real_mount(mnt);
struct fsnotify_mark *mark;
__u32 new_mask = 0;
assert_spin_locked(&mnt->mnt_root->d_lock);
hlist_for_each_entry(mark, &m->mnt_fsnotify_marks, m.m_list)
new_mask |= mark->mask;
m->mnt_fsnotify_mask = new_mask;
}
/*
* Recalculate the mnt->mnt_fsnotify_mask, or the mask of all FS_* event types
* any notifier is interested in hearing for this mount point
*/
void fsnotify_recalc_vfsmount_mask(struct vfsmount *mnt)
{
struct mount *m = real_mount(mnt);
spin_lock(&mnt->mnt_root->d_lock);
fsnotify_recalc_vfsmount_mask_locked(mnt);
m->mnt_fsnotify_mask = fsnotify_recalc_mask(&m->mnt_fsnotify_marks);
spin_unlock(&mnt->mnt_root->d_lock);
}
void fsnotify_destroy_vfsmount_mark(struct fsnotify_mark *mark)
{
struct vfsmount *mnt = mark->m.mnt;
struct vfsmount *mnt = mark->mnt;
struct mount *m = real_mount(mnt);
BUG_ON(!mutex_is_locked(&mark->group->mark_mutex));
assert_spin_locked(&mark->lock);
spin_lock(&mnt->mnt_root->d_lock);
hlist_del_init_rcu(&mark->m.m_list);
mark->m.mnt = NULL;
fsnotify_recalc_vfsmount_mask_locked(mnt);
hlist_del_init_rcu(&mark->obj_list);
mark->mnt = NULL;
m->mnt_fsnotify_mask = fsnotify_recalc_mask(&m->mnt_fsnotify_marks);
spin_unlock(&mnt->mnt_root->d_lock);
}
static struct fsnotify_mark *fsnotify_find_vfsmount_mark_locked(struct fsnotify_group *group,
struct vfsmount *mnt)
{
struct mount *m = real_mount(mnt);
struct fsnotify_mark *mark;
assert_spin_locked(&mnt->mnt_root->d_lock);
hlist_for_each_entry(mark, &m->mnt_fsnotify_marks, m.m_list) {
if (mark->group == group) {
fsnotify_get_mark(mark);
return mark;
}
}
return NULL;
}
/*
* given a group and vfsmount, find the mark associated with that combination.
* if found take a reference to that mark and return it, else return NULL
@ -132,10 +90,11 @@ static struct fsnotify_mark *fsnotify_find_vfsmount_mark_locked(struct fsnotify_
struct fsnotify_mark *fsnotify_find_vfsmount_mark(struct fsnotify_group *group,
struct vfsmount *mnt)
{
struct mount *m = real_mount(mnt);
struct fsnotify_mark *mark;
spin_lock(&mnt->mnt_root->d_lock);
mark = fsnotify_find_vfsmount_mark_locked(group, mnt);
mark = fsnotify_find_mark(&m->mnt_fsnotify_marks, group);
spin_unlock(&mnt->mnt_root->d_lock);
return mark;
@ -151,9 +110,7 @@ int fsnotify_add_vfsmount_mark(struct fsnotify_mark *mark,
int allow_dups)
{
struct mount *m = real_mount(mnt);
struct fsnotify_mark *lmark, *last = NULL;
int ret = 0;
int cmp;
int ret;
mark->flags |= FSNOTIFY_MARK_FLAG_VFSMOUNT;
@ -161,37 +118,9 @@ int fsnotify_add_vfsmount_mark(struct fsnotify_mark *mark,
assert_spin_locked(&mark->lock);
spin_lock(&mnt->mnt_root->d_lock);
mark->m.mnt = mnt;
/* is mark the first mark? */
if (hlist_empty(&m->mnt_fsnotify_marks)) {
hlist_add_head_rcu(&mark->m.m_list, &m->mnt_fsnotify_marks);
goto out;
}
/* should mark be in the middle of the current list? */
hlist_for_each_entry(lmark, &m->mnt_fsnotify_marks, m.m_list) {
last = lmark;
if ((lmark->group == group) && !allow_dups) {
ret = -EEXIST;
goto out;
}
cmp = fsnotify_compare_groups(lmark->group, mark->group);
if (cmp < 0)
continue;
hlist_add_before_rcu(&mark->m.m_list, &lmark->m.m_list);
goto out;
}
BUG_ON(last == NULL);
/* mark should be the last entry. last is the current last entry */
hlist_add_behind_rcu(&mark->m.m_list, &last->m.m_list);
out:
fsnotify_recalc_vfsmount_mask_locked(mnt);
mark->mnt = mnt;
ret = fsnotify_add_mark_list(&m->mnt_fsnotify_marks, mark, allow_dups);
m->mnt_fsnotify_mask = fsnotify_recalc_mask(&m->mnt_fsnotify_marks);
spin_unlock(&mnt->mnt_root->d_lock);
return ret;

View file

@ -295,6 +295,17 @@ int do_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
sb_start_write(inode->i_sb);
ret = file->f_op->fallocate(file, mode, offset, len);
/*
* Create inotify and fanotify events.
*
* To keep the logic simple, always create events if fallocate succeeds.
* This implies that events are created even if the file size remains
* unchanged, e.g. when using the flag FALLOC_FL_KEEP_SIZE.
*/
if (ret == 0)
fsnotify_modify(file);
sb_end_write(inode->i_sb);
return ret;
}
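For context, a minimal userspace sketch of the behaviour this hunk introduces. The file name and the 1 MiB length are illustrative assumptions, not taken from the patch: a process watching the file with inotify should now see IN_MODIFY once fallocate() succeeds, even when FALLOC_FL_KEEP_SIZE leaves the size unchanged.
#define _GNU_SOURCE
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

int main(void)
{
        /* "testfile" and the 1 MiB length are illustrative assumptions. */
        int fd = open("testfile", O_RDWR | O_CREAT, 0644);
        int ifd = inotify_init1(IN_CLOEXEC);
        int wd = inotify_add_watch(ifd, "testfile", IN_MODIFY);
        char buf[sizeof(struct inotify_event) + NAME_MAX + 1]
                __attribute__((aligned(__alignof__(struct inotify_event))));

        if (fd < 0 || ifd < 0 || wd < 0)
                return 1;

        /* With this patch an IN_MODIFY event is generated even though
         * FALLOC_FL_KEEP_SIZE leaves the file size unchanged. */
        if (fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, 1 << 20) == 0 &&
            read(ifd, buf, sizeof(buf)) > 0)
                printf("mask=0x%x\n", ((struct inotify_event *)buf)->mask);

        close(fd);
        close(ifd);
        return 0;
}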

View file

@ -25,7 +25,11 @@ static void *seq_buf_alloc(unsigned long size)
{
void *buf;
buf = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
/*
* __GFP_NORETRY to avoid oom-killings with high-order allocations -
* it's better to fall back to vmalloc() than to kill things.
*/
buf = kmalloc(size, GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
if (!buf && size > PAGE_SIZE)
buf = vmalloc(size);
return buf;
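As a rough illustration, here is the allocation/free pairing such a helper implies; the function names are illustrative, not from the patch. Since the buffer may come from either kmalloc() or vmalloc(), the matching free is kvfree(), which dispatches on the address.
/* Illustrative kernel-style sketch (not part of the patch). */
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>

static void *example_buf_alloc(unsigned long size)
{
        void *buf;

        /* Same pattern as seq_buf_alloc(): skip the OOM killer for
         * high-order attempts and fall back to vmalloc() instead. */
        buf = kmalloc(size, GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
        if (!buf && size > PAGE_SIZE)
                buf = vmalloc(size);
        return buf;
}

static void example_buf_free(void *buf)
{
        kvfree(buf);    /* handles both kmalloc() and vmalloc() memory */
}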

View file

@ -53,6 +53,10 @@ struct linux_binprm {
#define BINPRM_FLAGS_EXECFD_BIT 1
#define BINPRM_FLAGS_EXECFD (1 << BINPRM_FLAGS_EXECFD_BIT)
/* filename of the binary will be inaccessible after exec */
#define BINPRM_FLAGS_PATH_INACCESSIBLE_BIT 2
#define BINPRM_FLAGS_PATH_INACCESSIBLE (1 << BINPRM_FLAGS_PATH_INACCESSIBLE_BIT)
/* Function parameter for binfmt->coredump */
struct coredump_params {
const siginfo_t *siginfo;

View file

@ -45,6 +45,7 @@
* bitmap_set(dst, pos, nbits) Set specified bit area
* bitmap_clear(dst, pos, nbits) Clear specified bit area
* bitmap_find_next_zero_area(buf, len, pos, n, mask) Find bit free area
* bitmap_find_next_zero_area_off(buf, len, pos, n, mask) as above
* bitmap_shift_right(dst, src, n, nbits) *dst = *src >> n
* bitmap_shift_left(dst, src, n, nbits) *dst = *src << n
* bitmap_remap(dst, src, old, new, nbits) *dst = map(old, new)(src)
@ -114,11 +115,36 @@ extern int __bitmap_weight(const unsigned long *bitmap, unsigned int nbits);
extern void bitmap_set(unsigned long *map, unsigned int start, int len);
extern void bitmap_clear(unsigned long *map, unsigned int start, int len);
extern unsigned long bitmap_find_next_zero_area(unsigned long *map,
unsigned long size,
unsigned long start,
unsigned int nr,
unsigned long align_mask);
extern unsigned long bitmap_find_next_zero_area_off(unsigned long *map,
unsigned long size,
unsigned long start,
unsigned int nr,
unsigned long align_mask,
unsigned long align_offset);
/**
* bitmap_find_next_zero_area - find a contiguous aligned zero area
* @map: The address to base the search on
* @size: The bitmap size in bits
* @start: The bitnumber to start searching at
* @nr: The number of zeroed bits we're looking for
* @align_mask: Alignment mask for zero area
*
* The @align_mask should be one less than a power of 2; the effect is that
* the bit offset of every zero area this function finds is a multiple of that
* power of 2. A @align_mask of 0 means no alignment is required.
*/
static inline unsigned long
bitmap_find_next_zero_area(unsigned long *map,
unsigned long size,
unsigned long start,
unsigned int nr,
unsigned long align_mask)
{
return bitmap_find_next_zero_area_off(map, size, start, nr,
align_mask, 0);
}
extern int bitmap_scnprintf(char *buf, unsigned int len,
const unsigned long *src, int nbits);

View file

@ -357,6 +357,9 @@ asmlinkage long compat_sys_lseek(unsigned int, compat_off_t, unsigned int);
asmlinkage long compat_sys_execve(const char __user *filename, const compat_uptr_t __user *argv,
const compat_uptr_t __user *envp);
asmlinkage long compat_sys_execveat(int dfd, const char __user *filename,
const compat_uptr_t __user *argv,
const compat_uptr_t __user *envp, int flags);
asmlinkage long compat_sys_select(int n, compat_ulong_t __user *inp,
compat_ulong_t __user *outp, compat_ulong_t __user *exp,

View file

@ -5,6 +5,7 @@
#include <linux/types.h>
#include <linux/debugfs.h>
#include <linux/ratelimit.h>
#include <linux/atomic.h>
/*
@ -25,14 +26,18 @@ struct fault_attr {
unsigned long reject_end;
unsigned long count;
struct ratelimit_state ratelimit_state;
struct dentry *dname;
};
#define FAULT_ATTR_INITIALIZER { \
.interval = 1, \
.times = ATOMIC_INIT(1), \
.require_end = ULONG_MAX, \
.stacktrace_depth = 32, \
.verbose = 2, \
#define FAULT_ATTR_INITIALIZER { \
.interval = 1, \
.times = ATOMIC_INIT(1), \
.require_end = ULONG_MAX, \
.stacktrace_depth = 32, \
.ratelimit_state = RATELIMIT_STATE_INIT_DISABLED, \
.verbose = 2, \
.dname = NULL, \
}
#define DECLARE_FAULT_ATTR(name) struct fault_attr name = FAULT_ATTR_INITIALIZER
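For context, a hedged sketch of how a fault_attr initialized this way (now carrying a disabled ratelimit state) is consulted; the attribute and function names are illustrative.
#include <linux/fault-inject.h>
#include <linux/slab.h>

static DECLARE_FAULT_ATTR(fail_example_alloc);

static void *example_alloc(size_t size)
{
        if (should_fail(&fail_example_alloc, size))
                return NULL;    /* injected failure */
        return kmalloc(size, GFP_KERNEL);
}
The debugfs knobs, including the ratelimit option added later in this series, would be created with fault_create_debugfs_attr() when CONFIG_FAULT_INJECTION_DEBUG_FS is enabled.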

View file

@ -18,6 +18,7 @@
#include <linux/pid.h>
#include <linux/bug.h>
#include <linux/mutex.h>
#include <linux/rwsem.h>
#include <linux/capability.h>
#include <linux/semaphore.h>
#include <linux/fiemap.h>
@ -401,7 +402,7 @@ struct address_space {
atomic_t i_mmap_writable;/* count VM_SHARED mappings */
struct rb_root i_mmap; /* tree of private and shared mappings */
struct list_head i_mmap_nonlinear;/*list VM_NONLINEAR mappings */
struct mutex i_mmap_mutex; /* protect tree, count, list */
struct rw_semaphore i_mmap_rwsem; /* protect tree, count, list */
/* Protected by tree_lock together with the radix tree */
unsigned long nrpages; /* number of total pages */
unsigned long nrshadows; /* number of shadow entries */
@ -467,6 +468,26 @@ struct block_device {
int mapping_tagged(struct address_space *mapping, int tag);
static inline void i_mmap_lock_write(struct address_space *mapping)
{
down_write(&mapping->i_mmap_rwsem);
}
static inline void i_mmap_unlock_write(struct address_space *mapping)
{
up_write(&mapping->i_mmap_rwsem);
}
static inline void i_mmap_lock_read(struct address_space *mapping)
{
down_read(&mapping->i_mmap_rwsem);
}
static inline void i_mmap_unlock_read(struct address_space *mapping)
{
up_read(&mapping->i_mmap_rwsem);
}
/*
* Might pages of this file be mapped into userspace?
*/
@ -2075,6 +2096,7 @@ extern int vfs_open(const struct path *, struct file *, const struct cred *);
extern struct file * dentry_open(const struct path *, int, const struct cred *);
extern int filp_close(struct file *, fl_owner_t id);
extern struct filename *getname_flags(const char __user *, int, int *);
extern struct filename *getname(const char __user *);
extern struct filename *getname_kernel(const char *);

View file

@ -196,24 +196,6 @@ struct fsnotify_group {
#define FSNOTIFY_EVENT_PATH 1
#define FSNOTIFY_EVENT_INODE 2
/*
* Inode specific fields in an fsnotify_mark
*/
struct fsnotify_inode_mark {
struct inode *inode; /* inode this mark is associated with */
struct hlist_node i_list; /* list of marks by inode->i_fsnotify_marks */
struct list_head free_i_list; /* tmp list used when freeing this mark */
};
/*
* Mount point specific fields in an fsnotify_mark
*/
struct fsnotify_vfsmount_mark {
struct vfsmount *mnt; /* vfsmount this mark is associated with */
struct hlist_node m_list; /* list of marks by inode->i_fsnotify_marks */
struct list_head free_m_list; /* tmp list used when freeing this mark */
};
/*
* a mark is simply an object attached to an in core inode which allows an
* fsnotify listener to indicate they are either no longer interested in events
@ -230,11 +212,17 @@ struct fsnotify_mark {
* in kernel that found and may be using this mark. */
atomic_t refcnt; /* active things looking at this mark */
struct fsnotify_group *group; /* group this mark is for */
struct list_head g_list; /* list of marks by group->i_fsnotify_marks */
struct list_head g_list; /* list of marks by group->i_fsnotify_marks
* Also reused for queueing the mark into
* destroy_list when it is waiting for the
* end of the SRCU grace period before it
* can be freed */
spinlock_t lock; /* protect group and inode */
struct hlist_node obj_list; /* list of marks for inode / vfsmount */
struct list_head free_list; /* tmp list used when freeing this mark */
union {
struct fsnotify_inode_mark i;
struct fsnotify_vfsmount_mark m;
struct inode *inode; /* inode this mark is associated with */
struct vfsmount *mnt; /* vfsmount this mark is associated with */
};
__u32 ignored_mask; /* events types to ignore */
#define FSNOTIFY_MARK_FLAG_INODE 0x01
@ -243,7 +231,6 @@ struct fsnotify_mark {
#define FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY 0x08
#define FSNOTIFY_MARK_FLAG_ALIVE 0x10
unsigned int flags; /* vfsmount or inode mark? */
struct list_head destroy_list;
void (*free_mark)(struct fsnotify_mark *mark); /* called on final put+free */
};

View file

@ -110,11 +110,8 @@ struct vm_area_struct;
#define GFP_TEMPORARY (__GFP_WAIT | __GFP_IO | __GFP_FS | \
__GFP_RECLAIMABLE)
#define GFP_USER (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL)
#define GFP_HIGHUSER (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL | \
__GFP_HIGHMEM)
#define GFP_HIGHUSER_MOVABLE (__GFP_WAIT | __GFP_IO | __GFP_FS | \
__GFP_HARDWALL | __GFP_HIGHMEM | \
__GFP_MOVABLE)
#define GFP_HIGHUSER (GFP_USER | __GFP_HIGHMEM)
#define GFP_HIGHUSER_MOVABLE (GFP_HIGHUSER | __GFP_MOVABLE)
#define GFP_IOFS (__GFP_IO | __GFP_FS)
#define GFP_TRANSHUGE (GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
__GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN | \

View file

@ -7,15 +7,6 @@
#include <linux/notifier.h>
#include <linux/nsproxy.h>
/*
* ipc namespace events
*/
#define IPCNS_MEMCHANGED 0x00000001 /* Notify lowmem size changed */
#define IPCNS_CREATED 0x00000002 /* Notify new ipc namespace created */
#define IPCNS_REMOVED 0x00000003 /* Notify ipc namespace removed */
#define IPCNS_CALLBACK_PRI 0
struct user_namespace;
struct ipc_ids {
@ -38,7 +29,6 @@ struct ipc_namespace {
unsigned int msg_ctlmni;
atomic_t msg_bytes;
atomic_t msg_hdrs;
int auto_msgmni;
size_t shm_ctlmax;
size_t shm_ctlall;
@ -77,18 +67,8 @@ extern atomic_t nr_ipc_ns;
extern spinlock_t mq_lock;
#ifdef CONFIG_SYSVIPC
extern int register_ipcns_notifier(struct ipc_namespace *);
extern int cond_register_ipcns_notifier(struct ipc_namespace *);
extern void unregister_ipcns_notifier(struct ipc_namespace *);
extern int ipcns_notify(unsigned long);
extern void shm_destroy_orphaned(struct ipc_namespace *ns);
#else /* CONFIG_SYSVIPC */
static inline int register_ipcns_notifier(struct ipc_namespace *ns)
{ return 0; }
static inline int cond_register_ipcns_notifier(struct ipc_namespace *ns)
{ return 0; }
static inline void unregister_ipcns_notifier(struct ipc_namespace *ns) { }
static inline int ipcns_notify(unsigned long l) { return 0; }
static inline void shm_destroy_orphaned(struct ipc_namespace *ns) {}
#endif /* CONFIG_SYSVIPC */

View file

@ -21,6 +21,8 @@
#ifndef __KMEMLEAK_H
#define __KMEMLEAK_H
#include <linux/slab.h>
#ifdef CONFIG_DEBUG_KMEMLEAK
extern void kmemleak_init(void) __ref;

View file

@ -400,8 +400,8 @@ int memcg_cache_id(struct mem_cgroup *memcg);
void memcg_update_array_size(int num_groups);
struct kmem_cache *
__memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp);
struct kmem_cache *__memcg_kmem_get_cache(struct kmem_cache *cachep);
void __memcg_kmem_put_cache(struct kmem_cache *cachep);
int __memcg_charge_slab(struct kmem_cache *cachep, gfp_t gfp, int order);
void __memcg_uncharge_slab(struct kmem_cache *cachep, int order);
@ -492,7 +492,13 @@ memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp)
if (unlikely(fatal_signal_pending(current)))
return cachep;
return __memcg_kmem_get_cache(cachep, gfp);
return __memcg_kmem_get_cache(cachep);
}
static __always_inline void memcg_kmem_put_cache(struct kmem_cache *cachep)
{
if (memcg_kmem_enabled())
__memcg_kmem_put_cache(cachep);
}
#else
#define for_each_memcg_cache_index(_idx) \
@ -528,6 +534,10 @@ memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp)
{
return cachep;
}
static inline void memcg_kmem_put_cache(struct kmem_cache *cachep)
{
}
#endif /* CONFIG_MEMCG_KMEM */
#endif /* _LINUX_MEMCONTROL_H */
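As a simplified illustration of the get/put pairing the new memcg_kmem_put_cache() hook implies; in the real tree these calls sit in the slab allocator's internal pre/post-allocation hooks, not in a standalone helper like this one.
#include <linux/memcontrol.h>
#include <linux/slab.h>

static void *example_charged_alloc(struct kmem_cache *cachep, gfp_t flags)
{
        void *obj;

        cachep = memcg_kmem_get_cache(cachep, flags); /* may return a per-memcg clone */
        obj = kmem_cache_alloc(cachep, flags);
        memcg_kmem_put_cache(cachep);   /* new in this series: drop the reference */
        return obj;
}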

View file

@ -19,6 +19,7 @@
#include <linux/bit_spinlock.h>
#include <linux/shrinker.h>
#include <linux/resource.h>
#include <linux/page_ext.h>
struct mempolicy;
struct anon_vma;
@ -2060,7 +2061,22 @@ static inline void vm_stat_account(struct mm_struct *mm,
#endif /* CONFIG_PROC_FS */
#ifdef CONFIG_DEBUG_PAGEALLOC
extern void kernel_map_pages(struct page *page, int numpages, int enable);
extern bool _debug_pagealloc_enabled;
extern void __kernel_map_pages(struct page *page, int numpages, int enable);
static inline bool debug_pagealloc_enabled(void)
{
return _debug_pagealloc_enabled;
}
static inline void
kernel_map_pages(struct page *page, int numpages, int enable)
{
if (!debug_pagealloc_enabled())
return;
__kernel_map_pages(page, numpages, enable);
}
#ifdef CONFIG_HIBERNATION
extern bool kernel_page_present(struct page *page);
#endif /* CONFIG_HIBERNATION */
@ -2094,9 +2110,9 @@ int drop_caches_sysctl_handler(struct ctl_table *, int,
void __user *, size_t *, loff_t *);
#endif
unsigned long shrink_slab(struct shrink_control *shrink,
unsigned long nr_pages_scanned,
unsigned long lru_pages);
unsigned long shrink_node_slabs(gfp_t gfp_mask, int nid,
unsigned long nr_scanned,
unsigned long nr_eligible);
#ifndef CONFIG_MMU
#define randomize_va_space 0
@ -2155,20 +2171,36 @@ extern void copy_user_huge_page(struct page *dst, struct page *src,
unsigned int pages_per_huge_page);
#endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_HUGETLBFS */
extern struct page_ext_operations debug_guardpage_ops;
extern struct page_ext_operations page_poisoning_ops;
#ifdef CONFIG_DEBUG_PAGEALLOC
extern unsigned int _debug_guardpage_minorder;
extern bool _debug_guardpage_enabled;
static inline unsigned int debug_guardpage_minorder(void)
{
return _debug_guardpage_minorder;
}
static inline bool debug_guardpage_enabled(void)
{
return _debug_guardpage_enabled;
}
static inline bool page_is_guard(struct page *page)
{
return test_bit(PAGE_DEBUG_FLAG_GUARD, &page->debug_flags);
struct page_ext *page_ext;
if (!debug_guardpage_enabled())
return false;
page_ext = lookup_page_ext(page);
return test_bit(PAGE_EXT_DEBUG_GUARD, &page_ext->flags);
}
#else
static inline unsigned int debug_guardpage_minorder(void) { return 0; }
static inline bool debug_guardpage_enabled(void) { return false; }
static inline bool page_is_guard(struct page *page) { return false; }
#endif /* CONFIG_DEBUG_PAGEALLOC */

View file

@ -10,7 +10,6 @@
#include <linux/rwsem.h>
#include <linux/completion.h>
#include <linux/cpumask.h>
#include <linux/page-debug-flags.h>
#include <linux/uprobes.h>
#include <linux/page-flags-layout.h>
#include <asm/page.h>
@ -186,9 +185,6 @@ struct page {
void *virtual; /* Kernel virtual address (NULL if
not kmapped, ie. highmem) */
#endif /* WANT_PAGE_VIRTUAL */
#ifdef CONFIG_WANT_PAGE_DEBUG_FLAGS
unsigned long debug_flags; /* Use atomic bitops on this */
#endif
#ifdef CONFIG_KMEMCHECK
/*
@ -534,4 +530,12 @@ enum tlb_flush_reason {
NR_TLB_FLUSH_REASONS,
};
/*
* A swap entry has to fit into a "unsigned long", as the entry is hidden
* in the "index" field of the swapper address space.
*/
typedef struct {
unsigned long val;
} swp_entry_t;
#endif /* _LINUX_MM_TYPES_H */

View file

@ -154,7 +154,7 @@ struct mmu_notifier_ops {
* Therefore notifier chains can only be traversed when either
*
* 1. mmap_sem is held.
* 2. One of the reverse map locks is held (i_mmap_mutex or anon_vma->rwsem).
* 2. One of the reverse map locks is held (i_mmap_rwsem or anon_vma->rwsem).
* 3. No other concurrent thread can access the list (release)
*/
struct mmu_notifier {

View file

@ -722,6 +722,9 @@ typedef struct pglist_data {
int nr_zones;
#ifdef CONFIG_FLAT_NODE_MEM_MAP /* means !SPARSEMEM */
struct page *node_mem_map;
#ifdef CONFIG_PAGE_EXTENSION
struct page_ext *node_page_ext;
#endif
#endif
#ifndef CONFIG_NO_BOOTMEM
struct bootmem_data *bdata;
@ -1075,6 +1078,7 @@ static inline unsigned long early_pfn_to_nid(unsigned long pfn)
#define SECTION_ALIGN_DOWN(pfn) ((pfn) & PAGE_SECTION_MASK)
struct page;
struct page_ext;
struct mem_section {
/*
* This is, logically, a pointer to an array of struct
@ -1092,6 +1096,14 @@ struct mem_section {
/* See declaration of similar field in struct zone */
unsigned long *pageblock_flags;
#ifdef CONFIG_PAGE_EXTENSION
/*
* If SPARSEMEM, pgdat doesn't have page_ext pointer. We use
* section. (see page_ext.h about this.)
*/
struct page_ext *page_ext;
unsigned long pad;
#endif
/*
* WARNING: mem_section must be a power-of-2 in size for the
* calculation and use of SECTION_ROOT_MASK to make sense.

View file

@ -92,6 +92,17 @@ static inline bool oom_gfp_allowed(gfp_t gfp_mask)
extern struct task_struct *find_lock_task_mm(struct task_struct *p);
static inline bool task_will_free_mem(struct task_struct *task)
{
/*
* A coredumping process may sleep for an extended period in exit_mm(),
* so the oom killer cannot assume that the process will promptly exit
* and release memory.
*/
return (task->flags & PF_EXITING) &&
!(task->signal->flags & SIGNAL_GROUP_COREDUMP);
}
/* sysctls */
extern int sysctl_oom_dump_tasks;
extern int sysctl_oom_kill_allocating_task;

View file

@ -1,32 +0,0 @@
#ifndef LINUX_PAGE_DEBUG_FLAGS_H
#define LINUX_PAGE_DEBUG_FLAGS_H
/*
* page->debug_flags bits:
*
* PAGE_DEBUG_FLAG_POISON is set for poisoned pages. This is used to
* implement generic debug pagealloc feature. The pages are filled with
* poison patterns and set this flag after free_pages(). The poisoned
* pages are verified whether the patterns are not corrupted and clear
* the flag before alloc_pages().
*/
enum page_debug_flags {
PAGE_DEBUG_FLAG_POISON, /* Page is poisoned */
PAGE_DEBUG_FLAG_GUARD,
};
/*
* Ensure that CONFIG_WANT_PAGE_DEBUG_FLAGS reliably
* gets turned off when no debug features are enabling it!
*/
#ifdef CONFIG_WANT_PAGE_DEBUG_FLAGS
#if !defined(CONFIG_PAGE_POISONING) && \
!defined(CONFIG_PAGE_GUARD) \
/* && !defined(CONFIG_PAGE_DEBUG_SOMETHING_ELSE) && ... */
#error WANT_PAGE_DEBUG_FLAGS is turned on with no debug features!
#endif
#endif /* CONFIG_WANT_PAGE_DEBUG_FLAGS */
#endif /* LINUX_PAGE_DEBUG_FLAGS_H */

include/linux/page_ext.h (new file, 84 lines)
View file

@ -0,0 +1,84 @@
#ifndef __LINUX_PAGE_EXT_H
#define __LINUX_PAGE_EXT_H
#include <linux/types.h>
#include <linux/stacktrace.h>
struct pglist_data;
struct page_ext_operations {
bool (*need)(void);
void (*init)(void);
};
#ifdef CONFIG_PAGE_EXTENSION
/*
* page_ext->flags bits:
*
* PAGE_EXT_DEBUG_POISON is set for poisoned pages. It is used to
* implement the generic debug pagealloc feature: pages are filled with
* poison patterns and the flag is set after free_pages(); before
* alloc_pages() the patterns are verified to be intact and the flag is
* cleared.
*/
enum page_ext_flags {
PAGE_EXT_DEBUG_POISON, /* Page is poisoned */
PAGE_EXT_DEBUG_GUARD,
PAGE_EXT_OWNER,
};
/*
* Page Extension can be considered an extended mem_map:
* a page_ext structure is associated with every page descriptor and
* lets us attach extra information to the page.
* All page_ext are allocated at boot or on memory hotplug, so the
* page_ext for a pfn always exists.
*/
struct page_ext {
unsigned long flags;
#ifdef CONFIG_PAGE_OWNER
unsigned int order;
gfp_t gfp_mask;
struct stack_trace trace;
unsigned long trace_entries[8];
#endif
};
extern void pgdat_page_ext_init(struct pglist_data *pgdat);
#ifdef CONFIG_SPARSEMEM
static inline void page_ext_init_flatmem(void)
{
}
extern void page_ext_init(void);
#else
extern void page_ext_init_flatmem(void);
static inline void page_ext_init(void)
{
}
#endif
struct page_ext *lookup_page_ext(struct page *page);
#else /* !CONFIG_PAGE_EXTENSION */
struct page_ext;
static inline void pgdat_page_ext_init(struct pglist_data *pgdat)
{
}
static inline struct page_ext *lookup_page_ext(struct page *page)
{
return NULL;
}
static inline void page_ext_init(void)
{
}
static inline void page_ext_init_flatmem(void)
{
}
#endif /* CONFIG_PAGE_EXTENSION */
#endif /* __LINUX_PAGE_EXT_H */
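To show the need()/init() contract, a hypothetical page_ext client; everything below is illustrative. The real clients added in this series are debug_guardpage_ops, page_poisoning_ops and page_owner_ops, and wiring a new client up would also require an entry in the static ops table in mm/page_ext.c (and, if needed, a bit in page_ext_flags).
#include <linux/page_ext.h>

static bool need_example_tracker(void)
{
        /* usually gated on a boot parameter; always on for this sketch */
        return true;
}

static void init_example_tracker(void)
{
        /* one-time setup, runs after the page_ext areas are allocated */
}

struct page_ext_operations example_tracker_ops = {
        .need = need_example_tracker,
        .init = init_example_tracker,
};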

View file

@ -0,0 +1,38 @@
#ifndef __LINUX_PAGE_OWNER_H
#define __LINUX_PAGE_OWNER_H
#ifdef CONFIG_PAGE_OWNER
extern bool page_owner_inited;
extern struct page_ext_operations page_owner_ops;
extern void __reset_page_owner(struct page *page, unsigned int order);
extern void __set_page_owner(struct page *page,
unsigned int order, gfp_t gfp_mask);
static inline void reset_page_owner(struct page *page, unsigned int order)
{
if (likely(!page_owner_inited))
return;
__reset_page_owner(page, order);
}
static inline void set_page_owner(struct page *page,
unsigned int order, gfp_t gfp_mask)
{
if (likely(!page_owner_inited))
return;
__set_page_owner(page, order, gfp_mask);
}
#else
static inline void reset_page_owner(struct page *page, unsigned int order)
{
}
static inline void set_page_owner(struct page *page,
unsigned int order, gfp_t gfp_mask)
{
}
#endif /* CONFIG_PAGE_OWNER */
#endif /* __LINUX_PAGE_OWNER_H */

View file

@ -254,8 +254,6 @@ do { \
#endif /* CONFIG_SMP */
#define per_cpu(var, cpu) (*per_cpu_ptr(&(var), cpu))
#define __raw_get_cpu_var(var) (*raw_cpu_ptr(&(var)))
#define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
/*
* Must be an lvalue. Since @var must be a simple identifier,
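A short cheat-sheet for code that still used the removed accessors; the variable names are illustrative, and the replacements follow the conversions described in the percpu/local_ops documentation updates in this series.
#include <linux/list.h>
#include <linux/percpu.h>

DEFINE_PER_CPU(int, example_count);
DEFINE_PER_CPU(struct list_head, example_list);

static void example_conversions(void)
{
        struct list_head *l;
        int x;

        /* __get_cpu_var(example_count)++;  becomes: */
        this_cpu_inc(example_count);

        /* x = __get_cpu_var(example_count);  becomes: */
        x = this_cpu_read(example_count);

        /* l = &__get_cpu_var(example_list);  becomes: */
        l = this_cpu_ptr(&example_list);

        /* __this_cpu_* variants remain available for paths that already
         * run with preemption disabled. */
        (void)x;
        (void)l;
}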

View file

@ -17,14 +17,20 @@ struct ratelimit_state {
unsigned long begin;
};
#define DEFINE_RATELIMIT_STATE(name, interval_init, burst_init) \
\
struct ratelimit_state name = { \
#define RATELIMIT_STATE_INIT(name, interval_init, burst_init) { \
.lock = __RAW_SPIN_LOCK_UNLOCKED(name.lock), \
.interval = interval_init, \
.burst = burst_init, \
}
#define RATELIMIT_STATE_INIT_DISABLED \
RATELIMIT_STATE_INIT(ratelimit_state, 0, DEFAULT_RATELIMIT_BURST)
#define DEFINE_RATELIMIT_STATE(name, interval_init, burst_init) \
\
struct ratelimit_state name = \
RATELIMIT_STATE_INIT(name, interval_init, burst_init) \
static inline void ratelimit_state_init(struct ratelimit_state *rs,
int interval, int burst)
{

View file

@ -1364,6 +1364,10 @@ struct task_struct {
unsigned sched_reset_on_fork:1;
unsigned sched_contributes_to_load:1;
#ifdef CONFIG_MEMCG_KMEM
unsigned memcg_kmem_skip_account:1;
#endif
unsigned long atomic_flags; /* Flags needing atomic access. */
pid_t pid;
@ -1679,8 +1683,7 @@ struct task_struct {
/* bitmask and counter of trace recursion */
unsigned long trace_recursion;
#endif /* CONFIG_TRACING */
#ifdef CONFIG_MEMCG /* memcg uses this to do batch job */
unsigned int memcg_kmem_skip_account;
#ifdef CONFIG_MEMCG
struct memcg_oom_info {
struct mem_cgroup *memcg;
gfp_t gfp_mask;
@ -2482,6 +2485,10 @@ extern void do_group_exit(int);
extern int do_execve(struct filename *,
const char __user * const __user *,
const char __user * const __user *);
extern int do_execveat(int, struct filename *,
const char __user * const __user *,
const char __user * const __user *,
int);
extern long do_fork(unsigned long, unsigned long, unsigned long, int __user *, int __user *);
struct task_struct *fork_idle(int);
extern pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags);

View file

@ -18,8 +18,6 @@ struct shrink_control {
*/
unsigned long nr_to_scan;
/* shrink from these nodes */
nodemask_t nodes_to_scan;
/* current node being shrunk (for NUMA aware shrinkers) */
int nid;
};

View file

@ -493,7 +493,6 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
* @memcg: pointer to the memcg this cache belongs to
* @list: list_head for the list of all caches in this memcg
* @root_cache: pointer to the global, root cache, this cache was derived from
* @nr_pages: number of pages that belongs to this cache.
*/
struct memcg_cache_params {
bool is_root_cache;
@ -506,7 +505,6 @@ struct memcg_cache_params {
struct mem_cgroup *memcg;
struct list_head list;
struct kmem_cache *root_cache;
atomic_t nr_pages;
};
};
};

View file

@ -1,6 +1,8 @@
#ifndef __LINUX_STACKTRACE_H
#define __LINUX_STACKTRACE_H
#include <linux/types.h>
struct task_struct;
struct pt_regs;
@ -20,6 +22,8 @@ extern void save_stack_trace_tsk(struct task_struct *tsk,
struct stack_trace *trace);
extern void print_stack_trace(struct stack_trace *trace, int spaces);
extern int snprint_stack_trace(char *buf, size_t size,
struct stack_trace *trace, int spaces);
#ifdef CONFIG_USER_STACKTRACE_SUPPORT
extern void save_stack_trace_user(struct stack_trace *trace);
@ -32,6 +36,7 @@ extern void save_stack_trace_user(struct stack_trace *trace);
# define save_stack_trace_tsk(tsk, trace) do { } while (0)
# define save_stack_trace_user(trace) do { } while (0)
# define print_stack_trace(trace, spaces) do { } while (0)
# define snprint_stack_trace(buf, size, trace, spaces) do { } while (0)
#endif
#endif
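For context, a hedged sketch of the new snprint_stack_trace() helper (added for page_owner's proc output later in this series); the buffer sizes and the function name are illustrative.
#include <linux/kernel.h>
#include <linux/stacktrace.h>

static void example_dump_stack_to_buf(void)
{
        unsigned long entries[8];
        char buf[256];
        struct stack_trace trace = {
                .nr_entries  = 0,
                .max_entries = ARRAY_SIZE(entries),
                .entries     = entries,
                .skip        = 0,
        };

        save_stack_trace(&trace);
        snprint_stack_trace(buf, sizeof(buf), &trace, 0);
        pr_info("%s\n", buf);
}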

View file

@ -102,14 +102,6 @@ union swap_header {
} info;
};
/* A swap entry has to fit into a "unsigned long", as
* the entry is hidden in the "index" field of the
* swapper address space.
*/
typedef struct {
unsigned long val;
} swp_entry_t;
/*
* current->reclaim_state points to one of these when a task is running
* memory reclaim

View file

@ -877,4 +877,9 @@ asmlinkage long sys_seccomp(unsigned int op, unsigned int flags,
asmlinkage long sys_getrandom(char __user *buf, size_t count,
unsigned int flags);
asmlinkage long sys_bpf(int cmd, union bpf_attr *attr, unsigned int size);
asmlinkage long sys_execveat(int dfd, const char __user *filename,
const char __user *const __user *argv,
const char __user *const __user *envp, int flags);
#endif
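For context, a hedged userspace sketch of the new syscall. There was no libc wrapper at the time, so it goes through syscall(); the binary path is an illustrative assumption and the build assumes <sys/syscall.h> already defines __NR_execveat.
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
        char *argv[] = { "true", NULL };
        char *envp[] = { NULL };
        int fd = open("/bin/true", O_RDONLY | O_CLOEXEC);

        if (fd < 0)
                return 1;
        /* fexecve()-style invocation: empty pathname plus AT_EMPTY_PATH */
        syscall(__NR_execveat, fd, "", argv, envp, AT_EMPTY_PATH);
        perror("execveat");     /* reached only if the exec failed */
        return 1;
}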

View file

@ -90,6 +90,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
#ifdef CONFIG_DEBUG_VM_VMACACHE
VMACACHE_FIND_CALLS,
VMACACHE_FIND_HITS,
VMACACHE_FULL_FLUSHES,
#endif
NR_VM_EVENT_ITEMS
};

View file

@ -707,9 +707,11 @@ __SYSCALL(__NR_getrandom, sys_getrandom)
__SYSCALL(__NR_memfd_create, sys_memfd_create)
#define __NR_bpf 280
__SYSCALL(__NR_bpf, sys_bpf)
#define __NR_execveat 281
__SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)
#undef __NR_syscalls
#define __NR_syscalls 281
#define __NR_syscalls 282
/*
* All syscalls below here should go away really,

View file

@ -51,16 +51,28 @@ struct msginfo {
};
/*
* Scaling factor to compute msgmni:
* the memory dedicated to msg queues (msgmni * msgmnb) should occupy
* at most 1/MSG_MEM_SCALE of the lowmem (see the formula in ipc/msg.c):
* up to 8MB : msgmni = 16 (MSGMNI)
* 4 GB : msgmni = 8K
* more than 16 GB : msgmni = 32K (IPCMNI)
* MSGMNI, MSGMAX and MSGMNB are default values which can be
* modified by sysctl.
*
* MSGMNI is the upper limit for the number of message queues per
* namespace.
* It has been chosen to be as large as possible without facilitating
* scenarios where userspace causes overflows when adjusting the limits via
* operations of the form "retrieve current limit; add X; update limit".
*
* MSGMNB is the default size of a new message queue. Non-root tasks can
* decrease the size with msgctl(IPC_SET), while root tasks
* (actually: CAP_SYS_RESOURCE) can both increase and decrease the queue
* size. The optimal value is application dependent.
* 16384 is used because it was always used (since 0.99.10)
*
* MSGMAX is the maximum size of an individual message; it is a global
* (per-namespace) limit that applies for all message queues.
* It's set to 1/2 of MSGMNB, to ensure that at least two messages fit into
* the queue. This is also an arbitrary choice (since 2.6.0).
*/
#define MSG_MEM_SCALE 32
#define MSGMNI 16 /* <= IPCMNI */ /* max # of msg queue identifiers */
#define MSGMNI 32000 /* <= IPCMNI */ /* max # of msg queue identifiers */
#define MSGMAX 8192 /* <= INT_MAX */ /* max size of message (bytes) */
#define MSGMNB 16384 /* <= INT_MAX */ /* default max size of a message queue */
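A hedged userspace sketch that reads back the (now much larger) limits via msgctl(IPC_INFO), which fills a Linux-specific struct msginfo; nothing here comes from the patch itself.
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/msg.h>

int main(void)
{
        struct msginfo info;

        if (msgctl(0, IPC_INFO, (struct msqid_ds *)&info) < 0) {
                perror("msgctl(IPC_INFO)");
                return 1;
        }
        printf("msgmni=%d msgmnb=%d msgmax=%d\n",
               info.msgmni, info.msgmnb, info.msgmax);
        return 0;
}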

View file

@ -63,10 +63,22 @@ struct seminfo {
int semaem;
};
#define SEMMNI 128 /* <= IPCMNI max # of semaphore identifiers */
#define SEMMSL 250 /* <= 8 000 max num of semaphores per id */
/*
* SEMMNI, SEMMSL and SEMMNS are default values which can be
* modified by sysctl.
* The values have been chosen to be larger than necessary for any
* known configuration.
*
* SEMOPM should not be increased beyond 1000; otherwise there is a
* risk that semop()/semtimedop() fails due to kernel memory fragmentation when
* allocating the sop array.
*/
#define SEMMNI 32000 /* <= IPCMNI max # of semaphore identifiers */
#define SEMMSL 32000 /* <= INT_MAX max num of semaphores per id */
#define SEMMNS (SEMMNI*SEMMSL) /* <= INT_MAX max # of semaphores in system */
#define SEMOPM 32 /* <= 1 000 max num of ops per semop call */
#define SEMOPM 500 /* <= 1 000 max num of ops per semop call */
#define SEMVMX 32767 /* <= 32767 semaphore maximum value */
#define SEMAEM SEMVMX /* adjust on exit max value */
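To illustrate the sop array whose kmalloc() motivates keeping SEMOPM moderate, a hedged userspace sketch of a single semop() call applying several operations atomically; the key and permissions are illustrative assumptions.
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/sem.h>

int main(void)
{
        int id = semget(IPC_PRIVATE, 2, IPC_CREAT | 0600);
        struct sembuf sops[2] = {
                { .sem_num = 0, .sem_op = 1, .sem_flg = 0 },
                { .sem_num = 1, .sem_op = 1, .sem_flg = 0 },
        };

        /* The kernel copies these into a kmalloc'ed array of up to
         * SEMOPM entries before processing them as one atomic step. */
        if (id < 0 || semop(id, sops, 2) < 0) {
                perror("semop");
                return 1;
        }
        printf("both semaphores incremented atomically\n");
        semctl(id, 0, IPC_RMID);
        return 0;
}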

View file

@ -51,6 +51,7 @@
#include <linux/mempolicy.h>
#include <linux/key.h>
#include <linux/buffer_head.h>
#include <linux/page_ext.h>
#include <linux/debug_locks.h>
#include <linux/debugobjects.h>
#include <linux/lockdep.h>
@ -484,6 +485,11 @@ void __init __weak thread_info_cache_init(void)
*/
static void __init mm_init(void)
{
/*
* page_ext requires contiguous pages,
* bigger than MAX_ORDER unless SPARSEMEM.
*/
page_ext_init_flatmem();
mem_init();
kmem_cache_init();
percpu_init_late();
@ -621,6 +627,7 @@ asmlinkage __visible void __init start_kernel(void)
initrd_start = 0;
}
#endif
page_ext_init();
debug_objects_mem_init();
kmemleak_init();
setup_per_cpu_pageset();

View file

@ -3,7 +3,7 @@
#
obj-$(CONFIG_SYSVIPC_COMPAT) += compat.o
obj-$(CONFIG_SYSVIPC) += util.o msgutil.o msg.o sem.o shm.o ipcns_notifier.o syscall.o
obj-$(CONFIG_SYSVIPC) += util.o msgutil.o msg.o sem.o shm.o syscall.o
obj-$(CONFIG_SYSVIPC_SYSCTL) += ipc_sysctl.o
obj_mq-$(CONFIG_COMPAT) += compat_mq.o
obj-$(CONFIG_POSIX_MQUEUE) += mqueue.o msgutil.o $(obj_mq-y)

View file

@ -62,29 +62,6 @@ static int proc_ipc_dointvec_minmax_orphans(struct ctl_table *table, int write,
return err;
}
static int proc_ipc_callback_dointvec_minmax(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
{
struct ctl_table ipc_table;
size_t lenp_bef = *lenp;
int rc;
memcpy(&ipc_table, table, sizeof(ipc_table));
ipc_table.data = get_ipc(table);
rc = proc_dointvec_minmax(&ipc_table, write, buffer, lenp, ppos);
if (write && !rc && lenp_bef == *lenp)
/*
* Tunable has successfully been changed by hand. Disable its
* automatic adjustment. This simply requires unregistering
* the notifiers that trigger recalculation.
*/
unregister_ipcns_notifier(current->nsproxy->ipc_ns);
return rc;
}
static int proc_ipc_doulongvec_minmax(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
{
@ -96,54 +73,19 @@ static int proc_ipc_doulongvec_minmax(struct ctl_table *table, int write,
lenp, ppos);
}
/*
* Routine that is called when the file "auto_msgmni" has successfully been
* written.
* Two values are allowed:
* 0: unregister msgmni's callback routine from the ipc namespace notifier
* chain. This means that msgmni won't be recomputed anymore upon memory
* add/remove or ipc namespace creation/removal.
* 1: register back the callback routine.
*/
static void ipc_auto_callback(int val)
{
if (!val)
unregister_ipcns_notifier(current->nsproxy->ipc_ns);
else {
/*
* Re-enable automatic recomputing only if not already
* enabled.
*/
recompute_msgmni(current->nsproxy->ipc_ns);
cond_register_ipcns_notifier(current->nsproxy->ipc_ns);
}
}
static int proc_ipcauto_dointvec_minmax(struct ctl_table *table, int write,
static int proc_ipc_auto_msgmni(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
{
struct ctl_table ipc_table;
int oldval;
int rc;
int dummy = 0;
memcpy(&ipc_table, table, sizeof(ipc_table));
ipc_table.data = get_ipc(table);
oldval = *((int *)(ipc_table.data));
ipc_table.data = &dummy;
rc = proc_dointvec_minmax(&ipc_table, write, buffer, lenp, ppos);
if (write)
pr_info_once("writing to auto_msgmni has no effect");
if (write && !rc) {
int newval = *((int *)(ipc_table.data));
/*
* The file "auto_msgmni" has correctly been set.
* React by (un)registering the corresponding tunable, if the
* value has changed.
*/
if (newval != oldval)
ipc_auto_callback(newval);
}
return rc;
return proc_dointvec_minmax(&ipc_table, write, buffer, lenp, ppos);
}
#else
@ -151,8 +93,7 @@ static int proc_ipcauto_dointvec_minmax(struct ctl_table *table, int write,
#define proc_ipc_dointvec NULL
#define proc_ipc_dointvec_minmax NULL
#define proc_ipc_dointvec_minmax_orphans NULL
#define proc_ipc_callback_dointvec_minmax NULL
#define proc_ipcauto_dointvec_minmax NULL
#define proc_ipc_auto_msgmni NULL
#endif
static int zero;
@ -204,10 +145,19 @@ static struct ctl_table ipc_kern_table[] = {
.data = &init_ipc_ns.msg_ctlmni,
.maxlen = sizeof(init_ipc_ns.msg_ctlmni),
.mode = 0644,
.proc_handler = proc_ipc_callback_dointvec_minmax,
.proc_handler = proc_ipc_dointvec_minmax,
.extra1 = &zero,
.extra2 = &int_max,
},
{
.procname = "auto_msgmni",
.data = NULL,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_ipc_auto_msgmni,
.extra1 = &zero,
.extra2 = &one,
},
{
.procname = "msgmnb",
.data = &init_ipc_ns.msg_ctlmnb,
@ -224,15 +174,6 @@ static struct ctl_table ipc_kern_table[] = {
.mode = 0644,
.proc_handler = proc_ipc_dointvec,
},
{
.procname = "auto_msgmni",
.data = &init_ipc_ns.auto_msgmni,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_ipcauto_dointvec_minmax,
.extra1 = &zero,
.extra2 = &one,
},
#ifdef CONFIG_CHECKPOINT_RESTORE
{
.procname = "sem_next_id",

View file

@ -1,92 +0,0 @@
/*
* linux/ipc/ipcns_notifier.c
* Copyright (C) 2007 BULL SA. Nadia Derbey
*
* Notification mechanism for ipc namespaces:
* The callback routine registered in the memory chain invokes the ipcns
* notifier chain with the IPCNS_MEMCHANGED event.
* Each callback routine registered in the ipcns namespace recomputes msgmni
* for the owning namespace.
*/
#include <linux/msg.h>
#include <linux/rcupdate.h>
#include <linux/notifier.h>
#include <linux/nsproxy.h>
#include <linux/ipc_namespace.h>
#include "util.h"
static BLOCKING_NOTIFIER_HEAD(ipcns_chain);
static int ipcns_callback(struct notifier_block *self,
unsigned long action, void *arg)
{
struct ipc_namespace *ns;
switch (action) {
case IPCNS_MEMCHANGED: /* amount of lowmem has changed */
case IPCNS_CREATED:
case IPCNS_REMOVED:
/*
* It's time to recompute msgmni
*/
ns = container_of(self, struct ipc_namespace, ipcns_nb);
/*
* No need to get a reference on the ns: the 1st job of
* free_ipc_ns() is to unregister the callback routine.
* blocking_notifier_chain_unregister takes the wr lock to do
* it.
* When this callback routine is called the rd lock is held by
* blocking_notifier_call_chain.
* So the ipc ns cannot be freed while we are here.
*/
recompute_msgmni(ns);
break;
default:
break;
}
return NOTIFY_OK;
}
int register_ipcns_notifier(struct ipc_namespace *ns)
{
int rc;
memset(&ns->ipcns_nb, 0, sizeof(ns->ipcns_nb));
ns->ipcns_nb.notifier_call = ipcns_callback;
ns->ipcns_nb.priority = IPCNS_CALLBACK_PRI;
rc = blocking_notifier_chain_register(&ipcns_chain, &ns->ipcns_nb);
if (!rc)
ns->auto_msgmni = 1;
return rc;
}
int cond_register_ipcns_notifier(struct ipc_namespace *ns)
{
int rc;
memset(&ns->ipcns_nb, 0, sizeof(ns->ipcns_nb));
ns->ipcns_nb.notifier_call = ipcns_callback;
ns->ipcns_nb.priority = IPCNS_CALLBACK_PRI;
rc = blocking_notifier_chain_cond_register(&ipcns_chain,
&ns->ipcns_nb);
if (!rc)
ns->auto_msgmni = 1;
return rc;
}
void unregister_ipcns_notifier(struct ipc_namespace *ns)
{
blocking_notifier_chain_unregister(&ipcns_chain, &ns->ipcns_nb);
ns->auto_msgmni = 0;
}
int ipcns_notify(unsigned long val)
{
return blocking_notifier_call_chain(&ipcns_chain, val, NULL);
}

View file

@ -989,43 +989,12 @@ SYSCALL_DEFINE5(msgrcv, int, msqid, struct msgbuf __user *, msgp, size_t, msgsz,
return do_msgrcv(msqid, msgp, msgsz, msgtyp, msgflg, do_msg_fill);
}
/*
* Scale msgmni with the available lowmem size: the memory dedicated to msg
* queues should occupy at most 1/MSG_MEM_SCALE of lowmem.
* Also take into account the number of nsproxies created so far.
* This should be done staying within the (MSGMNI , IPCMNI/nr_ipc_ns) range.
*/
void recompute_msgmni(struct ipc_namespace *ns)
{
struct sysinfo i;
unsigned long allowed;
int nb_ns;
si_meminfo(&i);
allowed = (((i.totalram - i.totalhigh) / MSG_MEM_SCALE) * i.mem_unit)
/ MSGMNB;
nb_ns = atomic_read(&nr_ipc_ns);
allowed /= nb_ns;
if (allowed < MSGMNI) {
ns->msg_ctlmni = MSGMNI;
return;
}
if (allowed > IPCMNI / nb_ns) {
ns->msg_ctlmni = IPCMNI / nb_ns;
return;
}
ns->msg_ctlmni = allowed;
}
void msg_init_ns(struct ipc_namespace *ns)
{
ns->msg_ctlmax = MSGMAX;
ns->msg_ctlmnb = MSGMNB;
recompute_msgmni(ns);
ns->msg_ctlmni = MSGMNI;
atomic_set(&ns->msg_bytes, 0);
atomic_set(&ns->msg_hdrs, 0);
@ -1069,9 +1038,6 @@ void __init msg_init(void)
{
msg_init_ns(&init_ipc_ns);
printk(KERN_INFO "msgmni has been set to %d\n",
init_ipc_ns.msg_ctlmni);
ipc_init_proc_interface("sysvipc/msg",
" key msqid perms cbytes qnum lspid lrpid uid gid cuid cgid stime rtime ctime\n",
IPC_MSG_IDS, sysvipc_msg_proc_show);

View file

@ -45,14 +45,6 @@ static struct ipc_namespace *create_ipc_ns(struct user_namespace *user_ns,
msg_init_ns(ns);
shm_init_ns(ns);
/*
* msgmni has already been computed for the new ipc ns.
* Thus, do the ipcns creation notification before registering that
* new ipcns in the chain.
*/
ipcns_notify(IPCNS_CREATED);
register_ipcns_notifier(ns);
ns->user_ns = get_user_ns(user_ns);
return ns;
@ -99,25 +91,11 @@ void free_ipcs(struct ipc_namespace *ns, struct ipc_ids *ids,
static void free_ipc_ns(struct ipc_namespace *ns)
{
/*
* Unregistering the hotplug notifier at the beginning guarantees
* that the ipc namespace won't be freed while we are inside the
* callback routine. Since the blocking_notifier_chain_XXX routines
* hold a rw lock on the notifier list, unregister_ipcns_notifier()
* won't take the rw lock before blocking_notifier_call_chain() has
* released the rd lock.
*/
unregister_ipcns_notifier(ns);
sem_exit_ns(ns);
msg_exit_ns(ns);
shm_exit_ns(ns);
atomic_dec(&nr_ipc_ns);
/*
* Do the ipcns removal notification after decrementing nr_ipc_ns in
* order to have a correct value when recomputing msgmni.
*/
ipcns_notify(IPCNS_REMOVED);
put_user_ns(ns->user_ns);
proc_free_inum(ns->proc_inum);
kfree(ns);

View file

@ -326,10 +326,17 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
/* Then check that the global lock is free */
if (!spin_is_locked(&sma->sem_perm.lock)) {
/* spin_is_locked() is not a memory barrier */
smp_mb();
/*
* The ipc object lock check must be visible on all
* cores before rechecking the complex count. Otherwise
* we can race with another thread that does:
* complex_count++;
* spin_unlock(sem_perm.lock);
*/
smp_rmb();
/* Now repeat the test of complex_count:
/*
* Now repeat the test of complex_count:
* It can't change anymore until we drop sem->lock.
* Thus: if is now 0, then it will stay 0.
*/

Some files were not shown because too many files have changed in this diff.