Commit graph

72 commits

Author SHA1 Message Date
Chris Wilson
8a2421bd0d drm/i915: Wait upon userptr get-user-pages within execbuffer
This simply hides the EAGAIN caused by userptr when userspace causes
resource contention. However, it is quite beneficial with highly
contended userptr users as we avoid repeating the setup costs and
kernel-user context switches.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
2017-06-16 16:54:05 +01:00
Chris Wilson
7fc92e96c3 drm/i915: Store i915_gem_object_is_coherent() as a bit next to cache-dirty
For ease of use (i.e. avoiding a few checks and function calls), store
the object's cache coherency next to the cache is dirty bit.

Specifically this patch aims to reduce the frequency of no-op calls to
i915_gem_object_clflush() to counter-act the increase of such calls for
GPU only objects in the previous patch.

v2: Replace cache_dirty & ~cache_coherent with cache_dirty &&
!cache_coherent as gcc generates much better code for the latter
(Tvrtko)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dongwon Kim <dongwon.kim@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Tested-by: Dongwon Kim <dongwon.kim@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170616105455.16977-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-06-16 14:52:27 +01:00
Chris Wilson
e27ab73d17 drm/i915: Mark CPU cache as dirty on every transition for CPU writes
Currently, we only mark the CPU cache as dirty if we skip a clflush.
This leads to some confusion where we have to ask if the object is in
the write domain or missed a clflush. If we always mark the cache as
dirty, this becomes a much simply question to answer.

The goal remains to do as few clflushes as required and to do them as
late as possible, in the hope of deferring the work to a kthread and not
block the caller (e.g. execbuf, flips).

v2: Always call clflush before GPU execution when the cache_dirty flag
is set. This may cause some extra work on llc systems that migrate dirty
buffers back and forth - but we do try to limit that by only setting
cache_dirty at the end of the gpu sequence.

v3: Always mark the cache as dirty upon a level change, as we need to
invalidate any stale cachelines due to external writes.

Reported-by: Dongwon Kim <dongwon.kim@intel.com>
Fixes: a6a7cc4b7d ("drm/i915: Always flush the dirty CPU cache when pinning the scanout")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dongwon Kim <dongwon.kim@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Tested-by: Dongwon Kim <dongwon.kim@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170615123850.26843-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-06-16 14:50:52 +01:00
Michal Hocko
2098105ec6 drm: drop drm_[cm]alloc* helpers
Now that drm_[cm]alloc* helpers are simple one line wrappers around
kvmalloc_array and drm_free_large is just kvfree alias we can drop
them and replace by their native forms.

This shouldn't introduce any functional change.

Changes since v1
- fix typo in drivers/gpu//drm/etnaviv/etnaviv_gem.c - noticed by 0day
  build robot

Suggested-by: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Michal Hocko <mhocko@suse.com>drm: drop drm_[cm]alloc* helpers
[danvet: Fixup vgem which grew another user very recently.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Christian König <christian.koenig@amd.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517122312.GK18247@dhcp22.suse.cz
2017-05-18 17:22:39 +02:00
Chris Wilson
15c344f4d0 drm/i915/userptr: Reinvent GGTT self-faulting protection
lockdep doesn't like us taking the mm->mmap_sem inside the get_pages
callback for a couple of reasons. The straightforward deadlock:

[13755.434059] =============================================
[13755.434061] [ INFO: possible recursive locking detected ]
[13755.434064] 4.11.0-rc1-CI-CI_DRM_297+ #1 Tainted: G     U
[13755.434066] ---------------------------------------------
[13755.434068] gem_userptr_bli/8398 is trying to acquire lock:
[13755.434070]  (&mm->mmap_sem){++++++}, at: [<ffffffffa00c988a>] i915_gem_userptr_get_pages+0x5a/0x2e0 [i915]
[13755.434096]
               but task is already holding lock:
[13755.434098]  (&mm->mmap_sem){++++++}, at: [<ffffffff8104d485>] __do_page_fault+0x105/0x560
[13755.434105]
               other info that might help us debug this:
[13755.434108]  Possible unsafe locking scenario:

[13755.434110]        CPU0
[13755.434111]        ----
[13755.434112]   lock(&mm->mmap_sem);
[13755.434115]   lock(&mm->mmap_sem);
[13755.434117]
                *** DEADLOCK ***

[13755.434121]  May be due to missing lock nesting notation

[13755.434126] 2 locks held by gem_userptr_bli/8398:
[13755.434128]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8104d485>] __do_page_fault+0x105/0x560
[13755.434135]  #1:  (&obj->mm.lock){+.+.+.}, at: [<ffffffffa00b887d>] __i915_gem_object_get_pages+0x1d/0x70 [i915]
[13755.434156]
               stack backtrace:
[13755.434161] CPU: 3 PID: 8398 Comm: gem_userptr_bli Tainted: G     U          4.11.0-rc1-CI-CI_DRM_297+ #1
[13755.434165] Hardware name: GIGABYTE GB-BKi7(H)A-7500/MFLP7AP-00, BIOS F4 02/20/2017
[13755.434169] Call Trace:
[13755.434174]  dump_stack+0x67/0x92
[13755.434178]  __lock_acquire+0x133a/0x1b50
[13755.434182]  lock_acquire+0xc9/0x220
[13755.434200]  ? i915_gem_userptr_get_pages+0x5a/0x2e0 [i915]
[13755.434204]  down_read+0x42/0x70
[13755.434221]  ? i915_gem_userptr_get_pages+0x5a/0x2e0 [i915]
[13755.434238]  i915_gem_userptr_get_pages+0x5a/0x2e0 [i915]
[13755.434255]  ____i915_gem_object_get_pages+0x25/0x60 [i915]
[13755.434272]  __i915_gem_object_get_pages+0x59/0x70 [i915]
[13755.434288]  i915_gem_fault+0x397/0x6a0 [i915]
[13755.434304]  ? i915_gem_fault+0x1a1/0x6a0 [i915]
[13755.434308]  ? __lock_acquire+0x449/0x1b50
[13755.434311]  ? __lock_acquire+0x449/0x1b50
[13755.434315]  ? vm_mmap_pgoff+0xa9/0xd0
[13755.434318]  __do_fault+0x19/0x70
[13755.434321]  __handle_mm_fault+0x863/0xe50
[13755.434325]  handle_mm_fault+0x17f/0x370
[13755.434329]  ? handle_mm_fault+0x40/0x370
[13755.434332]  __do_page_fault+0x279/0x560
[13755.434336]  do_page_fault+0xc/0x10
[13755.434339]  page_fault+0x22/0x30
[13755.434342] RIP: 0033:0x7f5ab91b5880
[13755.434345] RSP: 002b:00007fff62922218 EFLAGS: 00010216
[13755.434348] RAX: 0000000000b74500 RBX: 00007f5ab7f81000 RCX: 0000000000000000
[13755.434352] RDX: 0000000000100000 RSI: 00007f5ab7f81000 RDI: 00007f5aba61c000
[13755.434355] RBP: 00007f5aba61c000 R08: 0000000000000007 R09: 0000000100000000
[13755.434359] R10: 000000000000037d R11: 00007f5ab91b5840 R12: 0000000000000001
[13755.434362] R13: 0000000000000005 R14: 0000000000000001 R15: 0000000000000000

and cyclic deadlocks:

[ 2566.458979] ======================================================
[ 2566.459054] [ INFO: possible circular locking dependency detected ]
[ 2566.459127] 4.11.0-rc1+ #26 Not tainted
[ 2566.459194] -------------------------------------------------------
[ 2566.459266] gem_streaming_w/759 is trying to acquire lock:
[ 2566.459334]  (&obj->mm.lock){+.+.+.}, at: [<ffffffffa034bc80>] i915_gem_object_pin_pages+0x0/0xc0 [i915]
[ 2566.459605]
[ 2566.459605] but task is already holding lock:
[ 2566.459699]  (&mm->mmap_sem){++++++}, at: [<ffffffff8106fd11>] __do_page_fault+0x121/0x500
[ 2566.459814]
[ 2566.459814] which lock already depends on the new lock.
[ 2566.459814]
[ 2566.459934]
[ 2566.459934] the existing dependency chain (in reverse order) is:
[ 2566.460030]
[ 2566.460030] -> #1 (&mm->mmap_sem){++++++}:
[ 2566.460139]        lock_acquire+0xfe/0x220
[ 2566.460214]        down_read+0x4e/0x90
[ 2566.460444]        i915_gem_userptr_get_pages+0x6e/0x340 [i915]
[ 2566.460669]        ____i915_gem_object_get_pages+0x8b/0xd0 [i915]
[ 2566.460900]        __i915_gem_object_get_pages+0x6a/0x80 [i915]
[ 2566.461132]        __i915_vma_do_pin+0x7fa/0x930 [i915]
[ 2566.461352]        eb_add_vma+0x67b/0x830 [i915]
[ 2566.461572]        eb_lookup_vmas+0xafe/0x1010 [i915]
[ 2566.461792]        i915_gem_do_execbuffer+0x715/0x2870 [i915]
[ 2566.462012]        i915_gem_execbuffer2+0x106/0x2b0 [i915]
[ 2566.462152]        drm_ioctl+0x36c/0x670 [drm]
[ 2566.462236]        do_vfs_ioctl+0x12c/0xa60
[ 2566.462317]        SyS_ioctl+0x41/0x70
[ 2566.462399]        entry_SYSCALL_64_fastpath+0x1c/0xb1
[ 2566.462477]
[ 2566.462477] -> #0 (&obj->mm.lock){+.+.+.}:
[ 2566.462587]        __lock_acquire+0x1602/0x1790
[ 2566.462661]        lock_acquire+0xfe/0x220
[ 2566.462893]        i915_gem_object_pin_pages+0x4c/0xc0 [i915]
[ 2566.463116]        i915_gem_fault+0x2c2/0x8c0 [i915]
[ 2566.463197]        __do_fault+0x42/0x130
[ 2566.463276]        __handle_mm_fault+0x92c/0x1280
[ 2566.463356]        handle_mm_fault+0x1e2/0x440
[ 2566.463443]        __do_page_fault+0x1c4/0x500
[ 2566.463529]        do_page_fault+0xc/0x10
[ 2566.463613]        page_fault+0x1f/0x30
[ 2566.463693]
[ 2566.463693] other info that might help us debug this:
[ 2566.463693]
[ 2566.463820]  Possible unsafe locking scenario:
[ 2566.463820]
[ 2566.463918]        CPU0                    CPU1
[ 2566.463988]        ----                    ----
[ 2566.464068]   lock(&mm->mmap_sem);
[ 2566.464143]                                lock(&obj->mm.lock);
[ 2566.464226]                                lock(&mm->mmap_sem);
[ 2566.464304]   lock(&obj->mm.lock);
[ 2566.464378]
[ 2566.464378]  *** DEADLOCK ***
[ 2566.464378]
[ 2566.464504] 1 lock held by gem_streaming_w/759:
[ 2566.464576]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8106fd11>] __do_page_fault+0x121/0x500
[ 2566.464699]
[ 2566.464699] stack backtrace:
[ 2566.464801] CPU: 0 PID: 759 Comm: gem_streaming_w Not tainted 4.11.0-rc1+ #26
[ 2566.464881] Hardware name: GIGABYTE GB-BXBT-1900/MZBAYAB-00, BIOS F8 03/02/2016
[ 2566.464983] Call Trace:
[ 2566.465061]  dump_stack+0x68/0x9f
[ 2566.465144]  print_circular_bug+0x20b/0x260
[ 2566.465234]  __lock_acquire+0x1602/0x1790
[ 2566.465323]  ? debug_check_no_locks_freed+0x1a0/0x1a0
[ 2566.465564]  ? i915_gem_object_wait+0x238/0x650 [i915]
[ 2566.465657]  ? debug_lockdep_rcu_enabled.part.4+0x1a/0x30
[ 2566.465749]  lock_acquire+0xfe/0x220
[ 2566.465985]  ? i915_sg_trim+0x1b0/0x1b0 [i915]
[ 2566.466223]  i915_gem_object_pin_pages+0x4c/0xc0 [i915]
[ 2566.466461]  ? i915_sg_trim+0x1b0/0x1b0 [i915]
[ 2566.466699]  i915_gem_fault+0x2c2/0x8c0 [i915]
[ 2566.466939]  ? i915_gem_pwrite_ioctl+0xce0/0xce0 [i915]
[ 2566.467030]  ? __lock_acquire+0x642/0x1790
[ 2566.467122]  ? __lock_acquire+0x642/0x1790
[ 2566.467209]  ? debug_lockdep_rcu_enabled+0x35/0x40
[ 2566.467299]  ? get_unmapped_area+0x1b4/0x1d0
[ 2566.467387]  __do_fault+0x42/0x130
[ 2566.467474]  __handle_mm_fault+0x92c/0x1280
[ 2566.467564]  ? __pmd_alloc+0x1e0/0x1e0
[ 2566.467651]  ? vm_mmap_pgoff+0x160/0x190
[ 2566.467740]  ? handle_mm_fault+0x111/0x440
[ 2566.467827]  handle_mm_fault+0x1e2/0x440
[ 2566.467914]  ? handle_mm_fault+0x5d/0x440
[ 2566.468002]  __do_page_fault+0x1c4/0x500
[ 2566.468090]  do_page_fault+0xc/0x10
[ 2566.468180]  page_fault+0x1f/0x30
[ 2566.468263] RIP: 0033:0x557895ced32a
[ 2566.468337] RSP: 002b:00007fffd6dd8a10 EFLAGS: 00010202
[ 2566.468419] RAX: 00007f659a4db000 RBX: 0000000000000003 RCX: 00007f659ad032da
[ 2566.468501] RDX: 0000000000000000 RSI: 0000000000100000 RDI: 0000000000000000
[ 2566.468586] RBP: 0000000000000007 R08: 0000000000000003 R09: 0000000100000000
[ 2566.468667] R10: 0000000000000001 R11: 0000000000000246 R12: 0000557895ceda60
[ 2566.468749] R13: 0000000000000001 R14: 00007fffd6dd8ac0 R15: 00007f659a4db000

By checking the status of the gup worker (serialized by the
obj->mm.lock) we can determine whether it is still active, has failed or
has succeeded. If the worker is still active (or failed), we know that
it cannot be bound and so we can skip taking struct_mutex (risking
potential recursion). As we check the worker status, we mark it to
discard any partial results, forcing us to restart on the next
get_pages.

Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Fixes: 1c8782dd31 ("drm/i915/userptr: Disallow wrapping GTT into a userptr")
Testcase: igt/gem_userptr_blits/map-fixed-invalidate-gup
Testcase: igt/gem_userptr_blits/dmabuf-sync
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170315140150.19432-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-03-16 10:21:25 +00:00
Chris Wilson
1c8782dd31 drm/i915/userptr: Disallow wrapping GTT into a userptr
If we allow the user to convert a GTT mmap address into a userptr, we
may end up in recursion hell, where currently we hit a mutex deadlock
but other possibilities include use-after-free during the
unbind/cancel_userptr.

[  143.203989] gem_userptr_bli D    0   902    898 0x00000000
[  143.204054] Call Trace:
[  143.204137]  __schedule+0x511/0x1180
[  143.204195]  ? pci_mmcfg_check_reserved+0xc0/0xc0
[  143.204274]  schedule+0x57/0xe0
[  143.204327]  schedule_timeout+0x383/0x670
[  143.204374]  ? trace_hardirqs_on_caller+0x187/0x280
[  143.204457]  ? trace_hardirqs_on_thunk+0x1a/0x1c
[  143.204507]  ? usleep_range+0x110/0x110
[  143.204657]  ? irq_exit+0x89/0x100
[  143.204710]  ? retint_kernel+0x2d/0x2d
[  143.204794]  ? trace_hardirqs_on_caller+0x187/0x280
[  143.204857]  ? _raw_spin_unlock_irq+0x33/0x60
[  143.204944]  wait_for_common+0x1f0/0x2f0
[  143.205006]  ? out_of_line_wait_on_atomic_t+0x170/0x170
[  143.205103]  ? wake_up_q+0xa0/0xa0
[  143.205159]  ? flush_workqueue_prep_pwqs+0x15a/0x2c0
[  143.205237]  wait_for_completion+0x1d/0x20
[  143.205292]  flush_workqueue+0x2e9/0xbb0
[  143.205339]  ? flush_workqueue+0x163/0xbb0
[  143.205418]  ? __schedule+0x533/0x1180
[  143.205498]  ? check_flush_dependency+0x1a0/0x1a0
[  143.205681]  i915_gem_userptr_mn_invalidate_range_start+0x1c7/0x270 [i915]
[  143.205865]  ? i915_gem_userptr_dmabuf_export+0x40/0x40 [i915]
[  143.205955]  __mmu_notifier_invalidate_range_start+0xc6/0x120
[  143.206044]  ? __mmu_notifier_invalidate_range_start+0x51/0x120
[  143.206123]  zap_page_range_single+0x1c7/0x1f0
[  143.206171]  ? unmap_single_vma+0x160/0x160
[  143.206260]  ? unmap_mapping_range+0xa9/0x1b0
[  143.206308]  ? vma_interval_tree_subtree_search+0x75/0xd0
[  143.206397]  unmap_mapping_range+0x18f/0x1b0
[  143.206444]  ? zap_vma_ptes+0x70/0x70
[  143.206524]  ? __pm_runtime_resume+0x67/0xa0
[  143.206723]  i915_gem_release_mmap+0x1ba/0x1c0 [i915]
[  143.206846]  i915_vma_unbind+0x5c2/0x690 [i915]
[  143.206925]  ? __lock_is_held+0x52/0x100
[  143.207076]  i915_gem_object_set_tiling+0x1db/0x650 [i915]
[  143.207236]  i915_gem_set_tiling_ioctl+0x1d3/0x3b0 [i915]
[  143.207377]  ? i915_gem_set_tiling_ioctl+0x5/0x3b0 [i915]
[  143.207457]  drm_ioctl+0x36c/0x670
[  143.207535]  ? debug_lockdep_rcu_enabled.part.0+0x1a/0x30
[  143.207730]  ? i915_gem_object_set_tiling+0x650/0x650 [i915]
[  143.207793]  ? drm_getunique+0x120/0x120
[  143.207875]  ? __handle_mm_fault+0x996/0x14a0
[  143.207939]  ? vm_insert_page+0x340/0x340
[  143.208028]  ? up_write+0x28/0x50
[  143.208086]  ? vm_mmap_pgoff+0x160/0x190
[  143.208163]  do_vfs_ioctl+0x12c/0xa60
[  143.208218]  ? debug_lockdep_rcu_enabled+0x35/0x40
[  143.208267]  ? ioctl_preallocate+0x150/0x150
[  143.208353]  ? __do_page_fault+0x36a/0x6e0
[  143.208400]  ? mark_held_locks+0x23/0xc0
[  143.208479]  ? up_read+0x1f/0x40
[  143.208526]  ? entry_SYSCALL_64_fastpath+0x5/0xc6
[  143.208669]  ? __fget_light+0xa7/0xc0
[  143.208747]  SyS_ioctl+0x41/0x70

To prevent the possibility of a deadlock, we defer scheduling the worker
until after we have proven that given the current mm, the userptr range
does not overlap a GGTT mmaping. If another thread tries to remap the
GGTT over the userptr before the worker is scheduled, it will be stopped
by its invalidate-range flushing the current work, before the deadlock
can occur.

v2: Improve discussion of how we end up in the deadlock.
v3: Don't forget to mark the userptr as active after a successful
gup_fast. Rename overlaps_ggtt to noncontiguous_or_overlaps_ggtt.
v4: Fix test ordering between invalid GTT mmaping and range completion
(Tvrtko)

Reported-by: Michał Winiarski <michal.winiarski@intel.com>
Testcase: igt/gem_userptr_blits/map-fixed-invalidate-gup
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170308215903.24171-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-03-09 07:31:14 +00:00
Chris Wilson
d151e9ce98 drm/i915/userptr: Only flush the workqueue if required
To avoid waiting for work from other invalidate-range threads where
not required, only wait on the userptr cancel workqueue if we have added
some work to it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170307205851.32578-2-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-03-09 07:30:49 +00:00
Chris Wilson
42953b3c51 drm/i915/userptr: Deactivate a failed userptr if the worker reports an EFAULT
If the worker fails, it no longer has pages to release and can be
immediately removed from the invalidate-tree.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170307205851.32578-1-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-03-09 07:30:23 +00:00
Ingo Molnar
6e84f31522 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/mm.h>
We are going to split <linux/sched/mm.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/mm.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

The APIs that are going to be moved first are:

   mm_alloc()
   __mmdrop()
   mmdrop()
   mmdrop_async_fn()
   mmdrop_async()
   mmget_not_zero()
   mmput()
   mmput_async()
   get_task_mm()
   mm_access()
   mm_release()

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:28 +01:00
Vegard Nossum
388f793455 mm: use mmget_not_zero() helper
We already have the helper, we can convert the rest of the kernel
mechanically using:

  git grep -l 'atomic_inc_not_zero.*mm_users' | xargs sed -i 's/atomic_inc_not_zero(&\(.*\)->mm_users)/mmget_not_zero\(\1\)/'

This is needed for a later patch that hooks into the helper, but might
be a worthwhile cleanup on its own.

Link: http://lkml.kernel.org/r/20161218123229.22952-3-vegard.nossum@oracle.com
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:48 -08:00
Vegard Nossum
f1f1007644 mm: add new mmgrab() helper
Apart from adding the helper function itself, the rest of the kernel is
converted mechanically using:

  git grep -l 'atomic_inc.*mm_count' | xargs sed -i 's/atomic_inc(&\(.*\)->mm_count);/mmgrab\(\1\);/'
  git grep -l 'atomic_inc.*mm_count' | xargs sed -i 's/atomic_inc(&\(.*\)\.mm_count);/mmgrab\(\&\1\);/'

This is needed for a later patch that hooks into the helper, but might
be a worthwhile cleanup on its own.

(Michal Hocko provided most of the kerneldoc comment.)

Link: http://lkml.kernel.org/r/20161218123229.22952-1-vegard.nossum@oracle.com
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:48 -08:00
Daniel Vetter
a402eae64d Linux 4.10-rc2
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJYaYNlAAoJEHm+PkMAQRiGtCUH/18PMUJpHqRKjxL3Yscw+QZC
 RmGlD/hwBRLUSgiTCfURNGKP4QZv2kQW7BGsGC72oL01lmxozsU72ixUIO+wXzDY
 K2b0OOKGZZWzFtaVm7Qs+5JhHAEKZcT046mLD8sjJuqkrFAhmNLKdwHjihKBEkm9
 J3s2tpdXdN0x/Uyga/GY9khEYIrvLPeBoKSz+JXcQKdC0iq3/+PMpWnN47QCNScr
 7azojkJkj/rs2cqVdOi7Wbh6PSqIvPsl8E3qJefpaVJF/IQaU1pFdy5g8kYm4V7T
 fr6HgIbuN4EQWdN/5cgKrUdpQyV7D8iYx02klk4R8WgfS0QMYoUcsg+XsTd02TI=
 =OhGe
 -----END PGP SIGNATURE-----

Merge tag 'v4.10-rc2' into drm-intel-next-queued

Backmerge Linux 4.10-rc2 to resync with our -fixes cherry-picks. I've
done the backmerge directly because Dave is on vacation.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2017-01-04 11:35:18 +01:00
Lorenzo Stoakes
5b56d49fc3 mm: add locked parameter to get_user_pages_remote()
Patch series "mm: unexport __get_user_pages_unlocked()".

This patch series continues the cleanup of get_user_pages*() functions
taking advantage of the fact we can now pass gup_flags as we please.

It firstly adds an additional 'locked' parameter to
get_user_pages_remote() to allow for its callers to utilise
VM_FAULT_RETRY functionality.  This is necessary as the invocation of
__get_user_pages_unlocked() in process_vm_rw_single_vec() makes use of
this and no other existing higher level function would allow it to do
so.

Secondly existing callers of __get_user_pages_unlocked() are replaced
with the appropriate higher-level replacement -
get_user_pages_unlocked() if the current task and memory descriptor are
referenced, or get_user_pages_remote() if other task/memory descriptors
are referenced (having acquiring mmap_sem.)

This patch (of 2):

Add a int *locked parameter to get_user_pages_remote() to allow
VM_FAULT_RETRY faulting behaviour similar to get_user_pages_[un]locked().

Taking into account the previous adjustments to get_user_pages*()
functions allowing for the passing of gup_flags, we are now in a
position where __get_user_pages_unlocked() need only be exported for his
ability to allow VM_FAULT_RETRY behaviour, this adjustment allows us to
subsequently unexport __get_user_pages_unlocked() as well as allowing
for future flexibility in the use of get_user_pages_remote().

[sfr@canb.auug.org.au: merge fix for get_user_pages_remote API change]
  Link: http://lkml.kernel.org/r/20161122210511.024ec341@canb.auug.org.au
Link: http://lkml.kernel.org/r/20161027095141.2569-2-lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:08 -08:00
Tvrtko Ursulin
187685cb90 drm/i915: Make GEM object alloc/free and stolen created take dev_priv
Where it is more appropriate and also to be consistent with
the direction of the driver.

v2: Leave out object alloc/free inlining. (Joonas Lahtinen)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2016-12-01 18:00:15 +00:00
Tvrtko Ursulin
0031fb9685 drm/i915: Assorted dev_priv cleanups
A small selection of macros which can only accept dev_priv from
now on and a resulting trickle of fixups.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
2016-11-11 14:58:26 +00:00
Tvrtko Ursulin
3599a91cc8 drm/i915: Allow shrinking of userptr objects once again
Commit 1bec9b0bda ("drm/i915/shrinker: Only shmemfs objects
are backed by swap") stopped considering the userptr objects
in shrinker callbacks.

Restore that so idle userptr objects can be discarded in order
to free up memory.

One change further to what was introduced in 1bec9b0bda is
to start considering userptr objects in oom but that should
also be a correct thing to do.

v2: Introduce I915_GEM_OBJECT_IS_SHRINKABLE. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 1bec9b0bda ("drm/i915/shrinker: Only shmemfs objects are backed by swap")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: <stable@vger.kernel.org>
Link: http://patchwork.freedesktop.org/patch/msgid/1478011450-6634-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-11-01 16:35:26 +00:00
Chris Wilson
548625ee8f drm/i915: Improve lockdep tracking for obj->mm.lock
The shrinker may appear to recurse into obj->mm.lock as the shrinker may
be called from a direct reclaim path whilst handling get_pages. We
filter out recursing on the same obj->mm.lock by inspecting
obj->mm.pages, but we do want to take the lock on a second object in
order to reap their pages. lockdep spots the recursion on the same
lockclass and needs annotation to avoid a false positive. To keep the
two paths distinct, create an enum to indicate which subclass of
obj->mm.lock we are using. This removes the false positive and avoids
masking real bugs.

Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161101121134.27504-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2016-11-01 13:01:44 +00:00
Chris Wilson
f0cd518206 drm/i915: Use lockless object free
Having moved the locked phase of freeing an object to a separate worker,
we can now declare to the core that we only need the unlocked variant of
driver->gem_free_object, and can use the simple unreference internally.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-20-chris@chris-wilson.co.uk
2016-10-28 20:53:50 +01:00
Chris Wilson
1233e2db19 drm/i915: Move object backing storage manipulation to its own locking
Break the allocation of the backing storage away from struct_mutex into
a per-object lock. This allows parallel page allocation, provided we can
do so outside of struct_mutex (i.e. set-domain-ioctl, pwrite, GTT
fault), i.e. before execbuf! The increased cost of the atomic counters
are hidden behind i915_vma_pin() for the typical case of execbuf, i.e.
as the object is typically bound between execbufs, the page_pin_count is
static. The cost will be felt around set-domain and pwrite, but offset
by the improvement from reduced struct_mutex contention.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-14-chris@chris-wilson.co.uk
2016-10-28 20:53:47 +01:00
Chris Wilson
03ac84f183 drm/i915: Pass around sg_table to get_pages/put_pages backend
The plan is to move obj->pages out from under the struct_mutex into its
own per-object lock. We need to prune any assumption of the struct_mutex
from the get_pages/put_pages backends, and to make it easier we pass
around the sg_table to operate on rather than indirectly via the obj.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-13-chris@chris-wilson.co.uk
2016-10-28 20:53:47 +01:00
Chris Wilson
a4f5ea64f0 drm/i915: Refactor object page API
The plan is to make obtaining the backing storage for the object avoid
struct_mutex (i.e. use its own locking). The first step is to update the
API so that normal users only call pin/unpin whilst working on the
backing storage.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-12-chris@chris-wilson.co.uk
2016-10-28 20:53:46 +01:00
Chris Wilson
96d7763452 drm/i915: Use a radixtree for random access to the object's backing storage
A while ago we switched from a contiguous array of pages into an sglist,
for that was both more convenient for mapping to hardware and avoided
the requirement for a vmalloc array of pages on every object. However,
certain GEM API calls (like pwrite, pread as well as performing
relocations) do desire access to individual struct pages. A quick hack
was to introduce a cache of the last access such that finding the
following page was quick - this works so long as the caller desired
sequential access. Walking backwards, or multiple callers, still hits a
slow linear search for each page. One solution is to store each
successful lookup in a radix tree.

v2: Rewrite building the radixtree for clarity, hopefully.

v3: Rearrange execbuf to avoid calling i915_gem_object_get_sg() from
within an atomic section and so relax the allocation context to a simple
GFP_KERNEL and mutex.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-10-chris@chris-wilson.co.uk
2016-10-28 20:53:45 +01:00
Chris Wilson
e95433c73a drm/i915: Rearrange i915_wait_request() accounting with callers
Our low-level wait routine has evolved from our generic wait interface
that handled unlocked, RPS boosting, waits with time tracking. If we
push our GEM fence tracking to use reservation_objects (required for
handling multiple timelines), we lose the ability to pass the required
information down to i915_wait_request(). However, if we push the extra
functionality from i915_wait_request() to the individual callsites
(i915_gem_object_wait_rendering and i915_gem_wait_ioctl) that make use
of those extras, we can both simplify our low level wait and prepare for
extending the GEM interface for use of reservation_objects.

v2: Rewrite i915_wait_request() kerneldocs

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-4-chris@chris-wilson.co.uk
2016-10-28 20:53:43 +01:00
Lorenzo Stoakes
9beae1ea89 mm: replace get_user_pages_remote() write/force parameters with gup_flags
This removes the 'write' and 'force' from get_user_pages_remote() and
replaces them with 'gup_flags' to make the use of FOLL_FORCE explicit in
callers as use of this flag can result in surprising behaviour (and
hence bugs) within the mm subsystem.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-19 08:12:02 -07:00
Chris Wilson
ea746f3659 drm/i915: Expand bool interruptible to pass flags to i915_wait_request()
We need finer control over wakeup behaviour during i915_wait_request(),
so expand the current bool interruptible to a bitmask.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-9-chris@chris-wilson.co.uk
2016-09-09 14:23:03 +01:00
Chris Wilson
364c8172ed drm/i915/userptr: Make gup errors stickier
Keep any error reported by the gup_worker until we are notified that the
arena has changed (via the mmu-notifier). This has the importance of
making two consecutive calls to i915_gem_object_get_pages() reporting
the same error, and curtailing a loop of detecting a fault and requeueing
a gup_worker.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-19-chris@chris-wilson.co.uk
2016-08-18 22:36:49 +01:00
Chris Wilson
f826ee21e5 drm/i915/userptr: Remove superfluous interruptible=false on waiting
Inside the kthread context, we can't be interrupted by signals so
touching the mm.interruptible flag is pointless and wait-request now
consumes EIO itself.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-4-git-send-email-chris@chris-wilson.co.uk
2016-08-05 10:54:36 +01:00
Chris Wilson
8a3b3d576c drm/i915: Convert non-blocking userptr waits for requests over to using RCU
We can completely avoid taking the struct_mutex around the non-blocking
waits by switching over to the RCU request management (trading the mutex
for a RCU read lock and some complex atomic operations). The improvement
is that we gain further contention reduction, and overall the code
become simpler due to the reduced mutex dancing.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-3-git-send-email-chris@chris-wilson.co.uk
2016-08-05 10:54:35 +01:00
Chris Wilson
573adb3962 drm/i915: Move obj->active:5 to obj->flags
We are motivated to avoid using a bitfield for obj->active for a couple
of reasons. Firstly, we wish to document our lockless read of obj->active
using READ_ONCE inside i915_gem_busy_ioctl() and that requires an
integral type (i.e. not a bitfield). Secondly, gcc produces abysmal code
when presented with a bitfield and that shows up high on the profiles of
request tracking (mainly due to excess memory traffic as it converts
the bitfield to a register and back and generates frequent AGI in the
process).

v2: BIT, break up a long line in compute the other engines, new paint
for i915_gem_object_is_active (now i915_gem_object_get_active).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-23-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:20:04 +01:00
Chris Wilson
776f32364d drm/i915: s/__i915_wait_request/i915_wait_request/
There is only one wait on request function now, so drop the "expert"
indication of leading __.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-21-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:29 +01:00
Chris Wilson
d72d908b56 drm/i915: Mark up i915_gem_active for locking annotation
The future annotations will track the locking used for access to ensure
that it is always sufficient. We make the preparations now to present
the API ahead and to make sure that GCC can eliminate the unused
parameter.

Before:	6298417 3619610  696320 10614347         a1f64b vmlinux
After:	6298417 3619610  696320 10614347         a1f64b vmlinux
(with i915 builtin)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-12-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:22 +01:00
Chris Wilson
27c01aaef0 drm/i915: Prepare i915_gem_active for annotations
In the future, we will want to add annotations to the i915_gem_active
struct. The API is thus expanded to hide direct access to the contents
of i915_gem_active and mediated instead through a number of helpers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-11-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:21 +01:00
Chris Wilson
381f371b25 drm/i915: Introduce i915_gem_active for request tracking
In the next patch, request tracking is made more generic and for that we
need a new expanded struct and to separate out the logic changes from
the mechanical churn, we split out the structure renaming into this
patch.

v2: Writer's block. Add some spiel about why we track requests.
v3: Now i915_gem_active.
v4: Now with i915_gem_active_set() for attaching to the active request.
v5: Use i915_gem_active_set() from inside the retirement handlers

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-10-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:20 +01:00
Chris Wilson
aa653a685d drm/i915: Be more careful when unbinding vma
When we call i915_vma_unbind(), we will wait upon outstanding rendering.
This will also trigger a retirement phase, which may update the object
lists. If, we extend request tracking to the VMA itself (rather than
keep it at the encompassing object), then there is a potential that the
obj->vma_list be modified for other elements upon i915_vma_unbind(). As
a result, if we walk over the object list and call i915_vma_unbind(), we
need to be prepared for that list to change.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-8-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:18 +01:00
Chris Wilson
34911fd30c drm/i915: Rename drm_gem_object_unreference_unlocked in preparation for lockless free
Whilst this ultimately wraps kref_put_mutex(), our goal here is the
lockless variant, so keep the _unlocked() suffix until we need it no
more.

s/drm_gem_object_unreference_unlocked/i915_gem_object_put_unlocked/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-7-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-6-git-send-email-chris@chris-wilson.co.uk
2016-07-20 13:40:13 +01:00
Chris Wilson
f8c417cdb1 drm/i915: Rename drm_gem_object_unreference in preparation for lockless free
Ultimately wraps kref_put(), so adopt its nomenclature for consistency
with other subsystems.

s/drm_gem_object_unreference/i915_gem_object_put/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-6-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-5-git-send-email-chris@chris-wilson.co.uk
2016-07-20 13:40:12 +01:00
Chris Wilson
25dc556a2a drm/i915: Wrap drm_gem_object_reference in i915_gem_object_get
Ultimately wraps kref_get(), so adopt its nomenclature for consistency
with other subsystems.

s/drm_gem_object_reference/i915_gem_object_get/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-5-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-4-git-send-email-chris@chris-wilson.co.uk
2016-07-20 13:40:11 +01:00
Chris Wilson
e8a261ea63 drm/i915: Rename request reference/unreference to get/put
Now that we derive requests from struct fence, swap over to its
nomenclature for references. It's shorter and more idiomatic across the
kernel.

s/i915_gem_request_reference/i915_gem_request_get/
s/i915_gem_request_unreference/i915_gem_request_put/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-2-git-send-email-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-1-git-send-email-chris@chris-wilson.co.uk
2016-07-20 13:40:09 +01:00
Dave Gordon
85d1225ec0 drm/i915: Introduce & use new lightweight SGL iterators
The existing for_each_sg_page() iterator is somewhat heavyweight, and is
limiting i915 driver performance in a few benchmarks. So here we
introduce somewhat lighter weight iterators, primarily for use with GEM
objects or other case where we need only deal with whole aligned pages.

Unlike the old iterator, the new iterators use an internal state
structure which is not intended to be accessed by the caller; instead
each takes as a parameter an output variable which is set before each
iteration. This makes them particularly simple to use :)

One of the new iterators provides the caller with the DMA address of
each page in turn; the other provides the 'struct page' pointer required
by many memory management operations.

Various uses of for_each_sg_page() are then converted to the new macros.

v2: Force inlining of the sg_iter constructor and make the union
anonymous.

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1463741647-15666-4-git-send-email-chris@chris-wilson.co.uk
2016-05-20 13:43:00 +01:00
Chris Wilson
72778cb266 drm/i915/userptr: Convert to drm_i915_private
userptr directly only uses drm_device in a single interface where it
meant to use drm_i915_private (everywhere else we have to derive it from
the drm_i915_gem_object and so require going from drm_device).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1463671036-3235-1-git-send-email-chris@chris-wilson.co.uk
2016-05-19 17:57:08 +01:00
Chris Wilson
f4457ae71f drm/i915: Prevent leaking of -EIO from i915_wait_request()
Reporting -EIO from i915_wait_request() has proven very troublematic
over the years, with numerous hard-to-reproduce bugs cropping up in the
corner case of where a reset occurs and the code wasn't expecting such
an error.

If the we reset the GPU or have detected a hang and wish to reset the
GPU, the request is forcibly complete and the wait broken. Currently, we
report either -EAGAIN or -EIO in order for the caller to retreat and
restart the wait (if appropriate) after dropping and then reacquiring
the struct_mutex (essential to allow the GPU reset to proceed). However,
if we take the view that the request is complete (no further work will
be done on it by the GPU because it is dead and soon to be reset), then
we can proceed with the task at hand and then drop the struct_mutex
allowing the reset to occur. This transfers the burden of checking
whether it is safe to proceed to the caller, which in all but one
instance it is safe - completely eliminating the source of all spurious
-EIO.

Of note, we only have two API entry points where we expect that
userspace can observe an EIO. First is when submitting an execbuf, if
the GPU is terminally wedged, then the operation cannot succeed and an
-EIO is reported. Secondly, existing userspace uses the throttle ioctl
to detect an already wedged GPU before starting using HW acceleration
(or to confirm that the GPU is wedged after an error condition). So if
the GPU is wedged when the user calls throttle, also report -EIO.

v2: Split more carefully the change to i915_wait_request() and assorted
ABI from the reset handling.
v3: Add a couple of WARN_ON(EIO) to the interruptible modesetting code
so that we don't start to leak EIO there in future (and break our hang
resistant modesetting).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-9-git-send-email-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-1-git-send-email-chris@chris-wilson.co.uk
2016-04-14 10:45:40 +01:00
Chris Wilson
299259a3a9 drm/i915: Store the reset counter when constructing a request
As the request is only valid during the same global reset epoch, we can
record the current reset_counter when constructing the request and reuse
it when waiting upon that request in future. This removes a very hairy
atomic check serialised by the struct_mutex at the time of waiting and
allows us to transfer those waits to a central dispatcher for all
waiters and all requests.

PS: With per-engine resets, we obviously cannot assume a global reset
epoch for the requests - a per-engine epoch makes the most sense. The
challenge then is how to handle checking in the waiter for when to break
the wait, as the fine-grained reset may also want to requeue the
request (i.e. the assumption that just because the epoch changes the
request is completed may be broken - or we just avoid breaking that
assumption with the fine-grained resets).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-7-git-send-email-chris@chris-wilson.co.uk
2016-04-14 10:45:40 +01:00
Chris Wilson
f470b19095 drm/i915/userptr: Store i915 backpointer for i915_mm_struct
Since we only ever use the drm_i915_private from the stored
i915_mm_struct->dev, save some electrons by storing the right
backpointer.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1459864801-28606-3-git-send-email-chris@chris-wilson.co.uk
2016-04-11 20:39:16 +01:00
Chris Wilson
40313f0cd0 drm/i915/userptr: Hold mmref whilst calling get-user-pages
Holding a reference to the containing task_struct is not sufficient to
prevent the mm_struct from being reaped under memory pressure. If this
happens whilst we are calling get_user_pages(), explosions erupt -
sometimes an immediate GPF, sometimes page flag corruption. To prevent
the target mm from being reaped as we are reading from it, acquire a
reference before we begin.

Testcase: igt/gem_shrink/*userptr
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1459864801-28606-2-git-send-email-chris@chris-wilson.co.uk
2016-04-11 20:39:01 +01:00
Chris Wilson
393afc2c3f drm/i915/userptr: Flush cancellations before mmu-notifier invalidate returns
In order to ensure that all invalidations are completed before the
operation returns to userspace (i.e. before the munmap() syscall returns)
we need to wait upon the outstanding operations.

We are allowed to block inside the invalidate_range_start callback, and
as struct_mutex is the inner lock with mmap_sem we can wait upon the
struct_mutex without provoking lockdep into warning about a deadlock.
However, we don't actually want to wait upon outstanding rendering
whilst holding the struct_mutex if we can help it otherwise we also
block other processes from submitting work to the GPU. So first we do a
wait without the lock and then when we reacquire the lock, we double
check that everything is ready for removing the invalidated pages.

Finally to wait upon the outstanding unpinning tasks, we create a
private workqueue as a means to conveniently wait upon all at once. The
drawback is that this workqueue is per-mm, so any threads concurrently
invalidating objects will wait upon each other. The advantage of using
the workqueue is that we can wait in parallel for completion of
rendering and unpinning of several objects (of particular importance if
the process terminates with a whole mm full of objects).

v2: Apply a cup of tea to the changelog.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94699
Testcase: igt/gem_userptr_blits/sync-unmap-cycles
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1459864801-28606-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2016-04-11 20:38:33 +01:00
Daniel Vetter
3970285319 Linux 4.6-rc3
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXCva8AAoJEHm+PkMAQRiGXBoIAIkrjxdbuT2nS9A3tHwkiFXa
 6/Th1UjbNaoLuZ+MckQHayAD9NcWY9lVjOUmFsSiSWMCQK/rTWDl8x5ITputrY2V
 VuhrJCwI7huEtu6GpRaJaUgwtdOjhIHz1Ue2MCdNIbKX3l+LjVyyJ9Vo8rruvZcR
 fC7kiivH04fYX58oQ+SHymCg54ny3qJEPT8i4+g26686m11hvZLI3UAs2PAn6ut+
 atCjxdQ4yLN3DWsbjuA7wYGWhTgFloxL4TIoisuOUc3FXnSi/ivIbXZvu4lUfisz
 LA2JBhfII3AEMBWG9xfGbXPijJTT4q7yNlTD0oYcnMtAt/Roh2F04asqB1LetEY=
 =bri6
 -----END PGP SIGNATURE-----

Merge tag 'v4.6-rc3' into drm-intel-next-queued

Linux 4.6-rc3

Backmerge requested by Chris Wilson to make his patches apply cleanly.
Tiny conflict in vmalloc.c with the (properly acked and all) patch in
drm-intel-next:

commit 4da56b99d9
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Apr 4 14:46:42 2016 +0100

    mm/vmap: Add a notifier for when we run out of vmap address space

and Linus' tree.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2016-04-11 19:25:13 +02:00
Chris Wilson
f2a85e1975 drm,i915: Introduce drm_malloc_gfp()
I have instances where I want to use drm_malloc_ab() but with a custom
gfp mask. And with those, where I want a temporary allocation, I want to
try a high-order kmalloc() before using a vmalloc().

So refactor my usage into drm_malloc_gfp().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: dri-devel@lists.freedesktop.org
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460113874-17366-6-git-send-email-chris@chris-wilson.co.uk
2016-04-11 17:13:10 +01:00
Kirill A. Shutemov
09cbfeaf1a mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized.  And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE.  And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below.  For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-04 10:41:08 -07:00
Linus Torvalds
266c73b777 Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
Pull drm updates from Dave Airlie:
 "This is the main drm pull request for 4.6 kernel.

  Overall the coolest thing here for me is the nouveau maxwell signed
  firmware support from NVidia, it's taken a long while to extract this
  from them.

  I also wish the ARM vendors just designed one set of display IP, ARM
  display block proliferation is definitely increasing.

  Core:
     - drm_event cleanups
     - Internal API cleanup making mode_fixup optional.
     - Apple GMUX vga switcheroo support.
     - DP AUX testing interface

  Panel:
     - Refactoring of DSI core for use over more transports.

  New driver:
     - ARM hdlcd driver

  i915:
     - FBC/PSR (framebuffer compression, panel self refresh) enabled by default.
     - Ongoing atomic display support work
     - Ongoing runtime PM work
     - Pixel clock limit checks
     - VBT DSI description support
     - GEM fixes
     - GuC firmware scheduler enhancements

  amdkfd:
     - Deferred probing fixes to avoid make file or link ordering.

  amdgpu/radeon:
     - ACP support for i2s audio support.
     - Command Submission/GPU scheduler/GPUVM optimisations
     - Initial GPU reset support for amdgpu

  vmwgfx:
     - Support for DX10 gen mipmaps
     - Pageflipping and other fixes.

  exynos:
     - Exynos5420 SoC support for FIMD
     - Exynos5422 SoC support for MIPI-DSI

  nouveau:
     - GM20x secure boot support - adds acceleration for Maxwell GPUs.
     - GM200 support
     - GM20B clock driver support
     - Power sensors work

  etnaviv:
     - Correctness fixes for GPU cache flushing
     - Better support for i.MX6 systems.

  imx-drm:
     - VBlank IRQ support
     - Fence support
     - OF endpoint support

  msm:
     - HDMI support for 8996 (snapdragon 820)
     - Adreno 430 support
     - Timestamp queries support

  virtio-gpu:
     - Fixes for Android support.

  rockchip:
     - Add support for Innosilicion HDMI

  rcar-du:
     - Support for 4 crtcs
     - R8A7795 support
     - RCar Gen 3 support

  omapdrm:
     - HDMI interlace output support
     - dma-buf import support
     - Refactoring to remove a lot of legacy code.

  tilcdc:
     - Rewrite of pageflipping code
     - dma-buf support
     - pinctrl support

  vc4:
     - HDMI modesetting bug fixes
     - Significant 3D performance improvement.

  fsl-dcu (FreeScale):
     - Lots of fixes

  tegra:
     - Two small fixes

  sti:
     - Atomic support for planes
     - Improved HDMI support"

* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (1063 commits)
  drm/amdgpu: release_pages requires linux/pagemap.h
  drm/sti: restore mode_fixup callback
  drm/amdgpu/gfx7: add MTYPE definition
  drm/amdgpu: removing BO_VAs shouldn't be interruptible
  drm/amd/powerplay: show uvd/vce power gate enablement for tonga.
  drm/amd/powerplay: show uvd/vce power gate info for fiji
  drm/amdgpu: use sched fence if possible
  drm/amdgpu: move ib.fence to job.fence
  drm/amdgpu: give a fence param to ib_free
  drm/amdgpu: include the right version of gmc header files for iceland
  drm/radeon: fix indentation.
  drm/amd/powerplay: add uvd/vce dpm enabling flag to fix the performance issue for CZ
  drm/amdgpu: switch back to 32bit hw fences v2
  drm/amdgpu: remove amdgpu_fence_is_signaled
  drm/amdgpu: drop the extra fence range check v2
  drm/amdgpu: signal fences directly in amdgpu_fence_process
  drm/amdgpu: cleanup amdgpu_fence_wait_empty v2
  drm/amdgpu: keep all fences in an RCU protected array v2
  drm/amdgpu: add number of hardware submissions to amdgpu_fence_driver_init_ring
  drm/amdgpu: RCU protected amd_sched_fence_release
  ...
2016-03-21 13:48:00 -07:00
Tvrtko Ursulin
ca377809d6 drm/i915: Avoid snooping with userptr where not supported
commit e5756c10d8
   Author: Imre Deak <imre.deak@intel.com>
   Date:   Fri Aug 14 18:43:30 2015 +0300

       drm/i915/bxt: don't allow cached GEM mappings on A stepping

Added an exception of disallowing snooping for Broxton A
stepping hardware but userptr was still enabling it regardless.

Move the check to HAS_SNOOP now that it is used from multiple
call sites and use it.

v2: Userptr cannot be supported when it cannot be coherent and
    generalize the code better. (Chris Wilson)

v3: Make has_snoop true only when !has_llc. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1456920631-34302-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-03-02 13:46:21 +00:00