linux-stable/include
Nick Piggin 08291429cf mm: fix pagecache write deadlocks
Modify the core write() code so that it won't take a pagefault while holding a
lock on the pagecache page. There are a number of different deadlocks possible
if we try to do such a thing:

1.  generic_buffered_write
2.   lock_page
3.    prepare_write
4.     unlock_page+vmtruncate
5.     copy_from_user
6.      mmap_sem(r)
7.       handle_mm_fault
8.        lock_page (filemap_nopage)
9.    commit_write
10.  unlock_page

a. sys_munmap / sys_mlock / others
b.  mmap_sem(w)
c.   make_pages_present
d.    get_user_pages
e.     handle_mm_fault
f.      lock_page (filemap_nopage)

2,8	- recursive deadlock if page is same
2,8;2,8	- ABBA deadlock if page is different
2,6;b,f	- ABBA deadlock if page is same
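
The three cases above can be checked mechanically with a small userspace model. This is only a sketch under stated assumptions, not kernel code: the lock identifiers, the path tables and lock_order_can_deadlock() are hypothetical names made up for illustration, every lock is assumed to be held until its path ends, and the read/write modes of mmap_sem are ignored. Each path adds "held before acquired" edges to a graph; a cycle (including a lock ordered before itself) indicates a possible deadlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lock identifiers: two distinct pagecache pages plus mmap_sem. */
enum lock_id { PAGE_X, PAGE_Y, MMAP_SEM, NR_LOCKS };

/* One code path: locks in acquisition order, each held until the path ends. */
struct lock_path {
    const enum lock_id *locks;
    size_t nr;
};

/* 2,8 with the same page: lock_page, mmap_sem(r), lock_page again. */
static const enum lock_id write_faults_on_same_page[] = { PAGE_X, MMAP_SEM, PAGE_X };
/* 2,8;2,8: two writers, each faulting on the page the other one holds. */
static const enum lock_id write_x_faults_on_y[] = { PAGE_X, MMAP_SEM, PAGE_Y };
static const enum lock_id write_y_faults_on_x[] = { PAGE_Y, MMAP_SEM, PAGE_X };
/* b,f: mmap_sem(w), then lock_page via get_user_pages. */
static const enum lock_id munmap_path[] = { MMAP_SEM, PAGE_X };
/* With the fix: any fault happens while no pagecache page lock is held. */
static const enum lock_id fault_before_lock[] = { MMAP_SEM, PAGE_Y };
static const enum lock_id write_without_fault[] = { PAGE_X };

/* Returns true if the combined lock-order graph of all paths has a cycle. */
static bool lock_order_can_deadlock(const struct lock_path *paths, size_t nr_paths)
{
    bool edge[NR_LOCKS][NR_LOCKS];

    memset(edge, 0, sizeof(edge));

    /* Every earlier lock in a path is held when a later one is taken. */
    for (size_t p = 0; p < nr_paths; p++) {
        for (size_t i = 0; i < paths[p].nr; i++) {
            assert(paths[p].locks[i] < NR_LOCKS);
            for (size_t j = i + 1; j < paths[p].nr; j++)
                edge[paths[p].locks[i]][paths[p].locks[j]] = true;
        }
    }

    /* Transitive closure, so that longer cycles show up as self-edges. */
    for (int k = 0; k < NR_LOCKS; k++)
        for (int i = 0; i < NR_LOCKS; i++)
            for (int j = 0; j < NR_LOCKS; j++)
                if (edge[i][k] && edge[k][j])
                    edge[i][j] = true;

    for (int i = 0; i < NR_LOCKS; i++)
        if (edge[i][i])
            return true;
    return false;
}
```

Feeding it write_faults_on_same_page alone, the two swapped writers together, or write_x_faults_on_y together with munmap_path reports a cycle in each case. The three post-fix paths (fault_before_lock, write_without_fault, munmap_path) together do not.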

The solution is as follows:
1.  If we find the destination page is uptodate, continue as normal, but use
    atomic usercopies which do not take pagefaults and do not zero the uncopied
    tail of the destination. The destination is already uptodate, so we can
    commit_write the full length even if there was a partial copy: it does not
    matter that the tail was not modified, because if it is dirtied and written
    back to disk it will not cause any problems (uptodate *means* that the
    destination page is as new or newer than the copy on disk).

1a. The above requires that fault_in_pages_readable correctly returns access
    information, because atomic usercopies cannot distinguish a non-present
    page in a readable mapping from the lack of a readable mapping.

2.  If we find the destination page is not uptodate, unlock it (this could be
    optimized slightly), then allocate a temporary page to copy the
    source data into. Relock the destination page and continue with the copy.
    However, instead of a usercopy (which might take a fault), copy the data
    from the pinned temporary page via the kernel address space.
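
The two cases can be sketched as a userspace simulation. It is not the patch itself and rests on stated assumptions: all sim_* names are hypothetical, a stack buffer stands in for the temporary page, and the resident field models how many source bytes can be read without taking a fault. The only point it illustrates is that the copy that may fault never runs while the destination page is locked.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SIM_PAGE_SIZE 4096

/* Hypothetical stand-in for a pagecache page. */
struct sim_page {
    bool uptodate;
    bool locked;
    unsigned char data[SIM_PAGE_SIZE];
};

/* Hypothetical user buffer: only the first `resident` bytes are present. */
struct sim_user_buf {
    const unsigned char *src;
    size_t len;
    size_t resident;
};

/* Counts faults taken while the destination page is locked (should stay 0). */
static int sim_faults_under_lock;

/* Atomic copy: never faults, copies what is resident, leaves the tail alone. */
static size_t sim_copy_atomic(unsigned char *dst, const struct sim_user_buf *ub,
                              size_t n)
{
    size_t ok = n < ub->resident ? n : ub->resident;

    memcpy(dst, ub->src, ok);
    return ok;
}

/* Ordinary copy: may fault the source in, so it must run without the lock. */
static void sim_copy_may_fault(unsigned char *dst, struct sim_user_buf *ub,
                               size_t n, const struct sim_page *page)
{
    if (page->locked && ub->resident < n)
        sim_faults_under_lock++;
    ub->resident = ub->len;     /* the "fault" makes the source present */
    memcpy(dst, ub->src, n);
}

/* Returns the number of bytes copied into the page, or -1 on bad arguments. */
static long sim_buffered_write(struct sim_page *page, size_t offset,
                               struct sim_user_buf *ub, size_t seglen)
{
    unsigned char bounce[SIM_PAGE_SIZE];    /* stands in for the temporary page */
    size_t copied;

    if (offset > SIM_PAGE_SIZE || seglen > SIM_PAGE_SIZE - offset ||
        seglen > ub->len)
        return -1;

    page->locked = true;                    /* lock_page */
    if (page->uptodate) {
        /* Case 1: atomic copy under the lock; a short copy is acceptable. */
        copied = sim_copy_atomic(page->data + offset, ub, seglen);
    } else {
        /* Case 2: drop the lock, fill the bounce buffer, then relock. */
        page->locked = false;
        sim_copy_may_fault(bounce, ub, seglen, page);
        page->locked = true;
        memcpy(page->data + offset, bounce, seglen);
        copied = seglen;
    }
    page->locked = false;                   /* unlock_page */
    return (long)copied;
}
```

In this model a short return from the uptodate case would be handled by the caller faulting the source in again (the role fault_in_pages_readable plays in 1a) and retrying the remainder. The extra memcpy through the bounce buffer corresponds to the additional copy cost mentioned in the commit message.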

(also, rename maxlen to seglen, because it was confusing)

This increases the CPU/memory copy cost by almost 50% on the affected
workloads. That will be solved by introducing a new set of pagecache write
aops in a subsequent patch.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:54 -07:00
acpi
asm-alpha
asm-arm Merge branch 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm 2007-10-15 16:08:50 -07:00
asm-avr32 x86: optimize page faults like all other achitectures and kill notifier cruft 2007-10-16 09:42:50 -07:00
asm-blackfin Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2007-10-15 13:41:39 -07:00
asm-cris
asm-frv frv: missing casts in cmpxchg() 2007-10-14 12:41:51 -07:00
asm-generic Generic Virtual Memmap support for SPARSEMEM 2007-10-16 09:42:51 -07:00
asm-h8300
asm-ia64 IA64: SPARSEMEM_VMEMMAP 16K page size support 2007-10-16 09:42:51 -07:00
asm-m32r
asm-m68k m68k: Export cachectl.h 2007-10-13 09:41:03 -07:00
asm-m68knommu
asm-mips move a few definitions to au1000_xxs1500.c 2007-10-16 09:42:50 -07:00
asm-parisc
asm-powerpc ppc64: SPARSEMEM_VMEMMAP support 2007-10-16 09:42:51 -07:00
asm-ppc
asm-s390 x86: optimize page faults like all other achitectures and kill notifier cruft 2007-10-16 09:42:50 -07:00
asm-sh x86: optimize page faults like all other achitectures and kill notifier cruft 2007-10-16 09:42:50 -07:00
asm-sh64 Merge git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh64-2.6 2007-10-13 09:50:26 -07:00
asm-sparc [SPARC32]: Add irqflags.h to sparc32 and use it from generic code. 2007-10-13 21:53:11 -07:00
asm-sparc64 SPARC64: SPARSEMEM_VMEMMAP support 2007-10-16 09:42:51 -07:00
asm-um
asm-v850
asm-x86 x86_64: SPARSEMEM_VMEMMAP 2M page size support 2007-10-16 09:42:51 -07:00
asm-xtensa
crypto
keys
linux mm: fix pagecache write deadlocks 2007-10-16 09:42:54 -07:00
math-emu
media v4l: copy_to_user() is not a good method name 2007-10-13 09:58:59 -07:00
mtd
net [IPV6]: Replace sk_buff ** with sk_buff * in input handlers 2007-10-15 12:50:28 -07:00
pcmcia pcmcia: use DMA_MASK_NONE for the default for all pcmcia devices 2007-10-16 09:42:50 -07:00
rdma
rxrpc
scsi Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 2007-10-15 08:19:33 -07:00
sound
video
xen
Kbuild