linux-stable/include/linux/uaccess.h
Ingo Molnar f180053694 x86, mm: dont use non-temporal stores in pagecache accesses
Impact: standardize IO on cached ops

On modern CPUs it is almost always a bad idea to use non-temporal stores,
as the regression in this commit has shown it:

  30d697f: x86: fix performance regression in write() syscall

The kernel simply has no good information about whether using non-temporal
stores is a good idea or not - and trying to add heuristics only increases
complexity and inserts fragility.

The regression on cached write()s took very long to be found - over two
years. So dont take any chances and let the hardware decide how it makes
use of its caches.

The only exception is drivers/gpu/drm/i915/i915_gem.c: there were we are
absolutely sure that another entity (the GPU) will pick up the dirty
data immediately and that the CPU will not touch that data before the
GPU will.

Also, keep the _nocache() primitives to make it easier for people to
experiment with these details. There may be more clear-cut cases where
non-cached copies can be used, outside of filemap.c.

Cc: Salman Qazi <sqazi@google.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-02 11:06:49 +01:00

109 lines
3.2 KiB
C

#ifndef __LINUX_UACCESS_H__
#define __LINUX_UACCESS_H__
#include <linux/preempt.h>
#include <asm/uaccess.h>
/*
* These routines enable/disable the pagefault handler in that
* it will not take any locks and go straight to the fixup table.
*
* They have great resemblance to the preempt_disable/enable calls
* and in fact they are identical; this is because currently there is
* no other way to make the pagefault handlers do this. So we do
* disable preemption but we don't necessarily care about that.
*/
static inline void pagefault_disable(void)
{
inc_preempt_count();
/*
* make sure to have issued the store before a pagefault
* can hit.
*/
barrier();
}
static inline void pagefault_enable(void)
{
/*
* make sure to issue those last loads/stores before enabling
* the pagefault handler again.
*/
barrier();
dec_preempt_count();
/*
* make sure we do..
*/
barrier();
preempt_check_resched();
}
#ifndef ARCH_HAS_NOCACHE_UACCESS
static inline unsigned long __copy_from_user_inatomic_nocache(void *to,
const void __user *from, unsigned long n)
{
return __copy_from_user_inatomic(to, from, n);
}
static inline unsigned long __copy_from_user_nocache(void *to,
const void __user *from, unsigned long n)
{
return __copy_from_user(to, from, n);
}
#endif /* ARCH_HAS_NOCACHE_UACCESS */
/**
* probe_kernel_address(): safely attempt to read from a location
* @addr: address to read from - its type is type typeof(retval)*
* @retval: read into this variable
*
* Safely read from address @addr into variable @revtal. If a kernel fault
* happens, handle that and return -EFAULT.
* We ensure that the __get_user() is executed in atomic context so that
* do_page_fault() doesn't attempt to take mmap_sem. This makes
* probe_kernel_address() suitable for use within regions where the caller
* already holds mmap_sem, or other locks which nest inside mmap_sem.
* This must be a macro because __get_user() needs to know the types of the
* args.
*
* We don't include enough header files to be able to do the set_fs(). We
* require that the probe_kernel_address() caller will do that.
*/
#define probe_kernel_address(addr, retval) \
({ \
long ret; \
mm_segment_t old_fs = get_fs(); \
\
set_fs(KERNEL_DS); \
pagefault_disable(); \
ret = __copy_from_user_inatomic(&(retval), (__force typeof(retval) __user *)(addr), sizeof(retval)); \
pagefault_enable(); \
set_fs(old_fs); \
ret; \
})
/*
* probe_kernel_read(): safely attempt to read from a location
* @dst: pointer to the buffer that shall take the data
* @src: address to read from
* @size: size of the data chunk
*
* Safely read from address @src to the buffer at @dst. If a kernel fault
* happens, handle that and return -EFAULT.
*/
extern long probe_kernel_read(void *dst, void *src, size_t size);
/*
* probe_kernel_write(): safely attempt to write to a location
* @dst: address to write to
* @src: pointer to the data that shall be written
* @size: size of the data chunk
*
* Safely write to address @dst from the buffer at @src. If a kernel fault
* happens, handle that and return -EFAULT.
*/
extern long probe_kernel_write(void *dst, void *src, size_t size);
#endif /* __LINUX_UACCESS_H__ */