linux-stable

Commit Graph

Author	SHA1	Message	Date
Ben Widawsky	2911a35b2e	drm/i915: use semaphores for the display plane In theory this will have performance and power improvements. Performance because we don't need to stall when the scanout BO is busy, and power because we don't have to stall when the BO is busy (and the ring can even go to sleep if the HW supports it). v2: squash 2 patches into 1 (me) un-inline the enable_semaphores function (Daniel) remove comment about SNB hangs from i915_gem_object_sync (Chris) rename intel_enable_semaphores to i915_semaphore_is_enabled (me) removed page flip comment; "no why" (Chris) To address other comments from Daniel (irc): update the comment to say 'vt-d is crap, don't enable semaphores' - I think you misinterpreted Chris' comment, it already exists. checking out whether we can pageflip on the render ring on ivb (didn't work on early silicon) - We don't want to enable workarounds for early silicon unless we have to. - I can't find any references in the docs about this. optionally use it if the fb is already busy on the render ring - This should be how the code already worked, unless I am misunderstanding your meaning. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-12 21:14:05 +02:00
Chris Wilson	9a5a53b392	drm/i915: Reorganise rules for get_fence/put_fence By simplifying the rules to calling get_fence when writing to the through the GTT in a tiled manner, and calling put_fence before writing to the object through the GTT in a linear manner, the code becomes clearer and there is less chance of making a mistake. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> [danvet: fixed up conflict with ppgtt code and spelling in a new comment.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-12 21:14:04 +02:00
Dave Airlie	effbc4fd8e	Merge branch 'drm-intel-next' of git://people.freedesktop.org/~danvet/drm-intel into drm-core-next Daniel Vetter wrote First pull request for 3.5-next, slightly large than usual because new things kept coming in since the last pull for 3.4. Highlights: - first batch of hw enablement for vlv (Jesse et al) and hsw (Eugeni). pci ids are not yet added, and there's still quite a few patches to merge (mostly modesetting). To make QA easier I've decided to merge this stuff in pieces. - loads of cleanups and prep patches spurred by the above. Especially vlv is a real frankenstein chip, but also hsw is stretching our driver's code design. Expect more to come in this area for 3.5. - more gmbus fixes, cleanups and improvements by Daniel Kurtz. Again, there are more patches needed (and some already queued up), but I wanted to split this a bit for better testing. - pwrite/pread rework and retuning. This series has been in the works for a few months already and a lot of i-g-t tests have been created for it. Now it's finally ready to be merged. Note that one patch in this series touches include/pagemap.h, that patch is acked-by akpm. - reduce mappable pressure and relocation throughput improvements from Chris. - mmap offset exhaustion mitigation by Chris Wilson. - a start at figuring out which codepaths in our messy dri1/ums+gem/kms driver we actually need to support by bailing out of unsupported case. The driver now refuses to load without kms on gen6+ and disallows a few ioctls that userspace never used in certain cases. More of this will definitely come. - More decoupling of global gtt and ppgtt. - Improved dual-link lvds detection by Takashi Iwai. - Shut up the compiler + plus fix the fallout (Ben) - Inverted panel brightness handling (mostly Acer manages to break things in this way). - Small fixlets and adjustements and some minor things to help debugging. Regression-wise QA reported quite a few issues on ivb, but all of them turned out to be hw stability issues which are already fixed in drm-intel-fixes (QA runs the nightly regression tests on -next alone, without -fixes automatically merged in). There's still one issue open on snb, it looks like occlusion query writes are not quite as cache coherent as we've expected. With some of the pwrite adjustements we can now reliably hit this. Kernel workaround for it is in the works." * 'drm-intel-next' of git://people.freedesktop.org/~danvet/drm-intel: (101 commits) drm/i915: VCS is not the last ring drm/i915: Add a dual link lvds quirk for MacBook Pro 8,2 drm/i915: make quirks more verbose drm/i915: dump the DMA fetch addr register on pre-gen6 drm/i915/sdvo: Include YRPB as an additional TV output type drm/i915: disallow gem init ioctl on ilk drm/i915: refuse to load on gen6+ without kms drm/i915: extract gt interrupt handler drm/i915: use render gen to switch ring irq functions drm/i915: rip out old HWSTAM missed irq WA for vlv drm/i915: open code gen6+ ring irqs drm/i915: ring irq cleanups drm/i915: add SFUSE_STRAP registers for digital port detection drm/i915: add WM_LINETIME registers drm/i915: add WRPLL clocks drm/i915: add LCPLL control registers drm/i915: add SSC offsets for SBI access drm/i915: add port clock selection support for HSW drm/i915: add S PLL control drm/i915: add PIXCLK_GATE register ... Conflicts: drivers/char/agp/intel-agp.h drivers/char/agp/intel-gtt.c drivers/gpu/drm/i915/i915_debugfs.c	2012-04-12 10:27:01 +01:00
Chris Wilson	7dd4906586	drm/i915: Mark untiled BLT commands as fenced on gen2/3 The BLT commands on gen2/3 utilize the fence registers and so we cannot modify any fences for the object whilst those commands are in flight. Currently we marked tiled commands as occupying a fence, but forgot to restrict the untiled commands from preventing a fence being assigned before they were completed. One side-effect is that we ten have to double check that a fence was allocated for a fenced buffer during move-to-active. Reported-by: Jiri Slaby <jirislaby@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43427 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47990 Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Testcase: i-g-t/tests/gem_tiled_after_untiled_blt Tested-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-01 12:26:05 +02:00
Daniel Vetter	f56f821feb	mm: extend prefault helpers to fault in more than PAGE_SIZE drm/i915 wants to read/write more than one page in its fastpath and hence needs to prefault more than PAGE_SIZE bytes. Add new functions in filemap.h to make that possible. Also kill a copy&pasted spurious space in both functions while at it. v2: As suggested by Andrew Morton, add a multipage parameter to both functions to avoid the additional branch for the pagemap.c hotpath. My gcc 4.6 here seems to dtrt and indeed reap these branches where not needed. v3: Becaus I couldn't find a way around adding a uaddr += PAGE_SIZE to the filemap.c hotpaths (that the compiler couldn't remove again), let's go with separate new functions for the multipage use-case. v4: Adjust comment to CodingStlye and fix spelling. Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:36:30 +02:00
Chris Wilson	dabdfe021a	drm/i915: Avoid using mappable space for relocation processing through the CPU We try to avoid writing the relocations through the uncached GTT, if the buffer is currently in the CPU write domain and so will be flushed out to main memory afterwards anyway. Also on SandyBridge we can safely write to the pages in cacheable memory, so long as the buffer is LLC mapped. In either of these cases, we therefore do not need to force the reallocation of the buffer into the mappable region of the GTT, reducing the aperture pressure. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:16:17 +02:00
Chris Wilson	1d83f4426f	drm/i915: Batch copy_from_user for relocation processing Originally the code tried to allocate a large enough array to perform the copy using vmalloc, performance wasn't great and throughput was improved by processing each individual relocation entry separately. This too is not as efficient as one would desire. A compromise would be to allocate a single page, or to allocate a few entries on the stack, and process the copy in batches. The latter gives simpler code and more consistent performance due to a lack of heuristic. x11perf -copywinwin10: n450/pnv i3-330m i5-2520m (cpu) before: 249000 785000 `1280000` (80%) page: 264000 896000 `1280000` (65%) on-stack: 264000 902000 `1280000` (67%) v2: Use 512-bytes of stack for batching rather than allocate a page. v3: Tidy the code slightly with more descriptive variable names Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-26 09:59:05 +02:00
Daniel Vetter	149c84077f	drm/i915: implement SNB workaround for lazy global gtt PIPE_CONTROL on snb needs global gtt mappings in place to workaround a hw gotcha. No other commands need such a workaround. Luckily we can detect a PIPE_CONTROL commands easily because they have a write_domain = I915_GEM_DOMAIN_INSTRUCTION (and nothing else has that). v2: Binding the target of such a reloc into the global gtt actually works instead of binding the source, which is rather pointless ... v3: Kill a superflous has_global_gtt_mapping assignement noticed by Chris Wilson. Reviewed-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-20 21:53:55 +01:00
Daniel Vetter	9edd576d89	Merge remote-tracking branch 'airlied/drm-fixes' into drm-intel-next-queued Back-merge from drm-fixes into drm-intel-next to sort out two things: - interlaced support: -fixes contains a bugfix to correctly clear interlaced configuration bits in case the bios sets up an interlaced mode and we want to set up the progressive mode (current kernels don't support interlaced). The actual feature work to support interlaced depends upon (and conflicts with) this bugfix. - forcewake voodoo to workaround missed IRQ issues: -fixes only enabled this for ivybridge, but some recent bug reports indicate that we need this on Sandybridge, too. But in a slightly different flavour and with other fixes and reworks on top. Additionally there are some forcewake cleanup patches heading to -next that would conflict with currrent -fixes. Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-02-10 17:14:49 +01:00
Daniel Vetter	7bddb01fb9	drm/i915: ppgtt binding/unbinding support This adds support to bind/unbind objects and wires it up. Objects are only put into the ppgtt when necessary, i.e. at execbuf time. Objects are still unconditionally put into the global gtt. v2: Kill the quick hack and explicitly pass cache_level to ppgtt_bind like for the global gtt function. Noticed by Chris Wilson. Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: Eugeni Dodonov <eugeni.dodonov@intel.com> Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-02-09 21:25:23 +01:00
Daniel Vetter	ff240199b6	drm/i915: s/DRM_ERROR/DRM_DEBUG in i915_gem_execbuffer.c These are all user-trigerable, so tune down their loudness a notch. For some of these we have i-g-t tests (because they prevent newly-discovered bugs), without this patches running the test suite leaves behind a dirty dmesg. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-02-09 11:10:32 +01:00
Daniel Vetter	4ca4a250ac	drm/i915: reject GTT domain in relocations This confuses our domain tracking and can (for gtt write domains) lead to a subsequent oops. Tested by tests/gem_exec_bad_domains from i-g-t. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-01-29 18:37:10 +01:00
Chris Wilson	1690e1eb7a	drm/i915: Separate fence pin counting from normal bind pin counting In order to correctly account for reserving space in the GTT and fences for a batch buffer, we need to independently track whether the fence is pinned due to a fenced GPU access in the batch or whether the buffer is pinned in the aperture. Currently we count the fenced as pinned if the buffer has already been seen in the execbuffer. This leads to a false accounting of available fence registers, causing frequent mass evictions. Worse, if coupled with the change to make i915_gem_object_get_fence() report EDADLK upon fence starvation, the batchbuffer can fail with only one fence required... Fixes intel-gpu-tools/tests/gem_fenced_exec_thrash Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38735 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Tested-by: Paul Neumann <paul104x@yahoo.de> [danvet: Resolve the functional conflict with Jesse Barnes sprite patches, acked by Chris Wilson on irc.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-01-29 18:23:37 +01:00
Daniel Vetter	96154f2fab	drm/i915: switch ring->id to be a real id ... and add a helpr function for the places where we want a flag. This way we can use ring->id to index into arrays. v2: Resurrect the missing beautification-space Chris Wilson noted. I'm moving this space around because I'll reuse ring_str in the next patch. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-01-29 17:32:58 +01:00
Ben Widawsky	b93f9cf14e	drm/i915: argument to control retiring behavior Sometimes it may be the case when we idle the gpu or wait on something we don't actually want to process the retiring list. This patch allows callers to choose the behavior. Reviewed-by: Keith Packard <keithp@keithp.com> Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-01-26 11:19:19 +01:00
Linus Torvalds	1a464cbb3d	Merge branch 'drm-core-next' of git://people.freedesktop.org/~airlied/linux * 'drm-core-next' of git://people.freedesktop.org/~airlied/linux: (307 commits) drm/nouveau/pm: fix build with HWMON off gma500: silence gcc warnings in mid_get_vbt_data() drm/ttm: fix condition (and vs or) drm/radeon: double lock typo in radeon_vm_bo_rmv() drm/radeon: use after free in radeon_vm_bo_add() drm/sis\|via: don't return stack garbage from free_mem ioctl drm/radeon/kms: remove pointless CS flags priority struct drm/radeon/kms: check if vm is supported in VA ioctl drm: introduce drm_can_sleep and use in intel/radeon drivers. (v2) radeon: Fix disabling PCI bus mastering on big endian hosts. ttm: fix agp since ttm tt rework agp: Fix multi-line warning message whitespace drm/ttm/dma: Fix accounting error when calling ttm_mem_global_free_page and don't try to free freed pages. drm/ttm/dma: Only call set_pages_array_wb when the page is not in WB pool. drm/radeon/kms: sync across multiple rings when doing bo moves v3 drm/radeon/kms: Add support for multi-ring sync in CS ioctl (v2) drm/radeon: GPU virtual memory support v22 drm: make DRM_UNLOCKED ioctls with their own mutex drm: no need to hold global mutex for static data drm/radeon/benchmark: common modes sweep ignores 640x480@32 ... Fix up trivial conflicts in radeon/evergreen.c and vmwgfx/vmwgfx_kms.c	2012-01-10 11:04:36 -08:00
Eric Anholt	ae662d3126	drm/i915: Add support for resetting the SO write pointers on gen7. These registers are automatically incremented by the hardware during transform feedback to track where the next streamed vertex output should go. Unlike the previous generation, which had a packet for setting the corresponding registers to a defined value, gen7 only has MI_LOAD_REGISTER_IMM to do so. That's a secure packet (since it loads an arbitrary register), so we need to do it from the kernel, and it needs to be settable atomically with the batchbuffer execution so that two clients doing transform feedback don't stomp on each others' state. Instead of building a more complicated interface involcing setting the registers to a specific value, just set them to 0 when asked and userland can tweak its pointers accordingly. Signed-off-by: Eric Anholt <eric@anholt.net> Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Keith Packard <keithp@keithp.com>	2012-01-03 09:31:18 -08:00
Ben Widawsky	84f9f938be	drm/i915: Force sync command ordering (Gen6+) The docs say this is required for Gen7, and since the bit was added for Gen6, we are also setting it there pit pf paranoia. Particularly as Chris points out, if PIPE_CONTROL counts as a 3d state packet. This was found through doc inspection by Ken and applies to Gen6+; Reported-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Keith Packard <keithp@keithp.com>	2012-01-03 09:09:44 -08:00
Ben Widawsky	e2971bdab2	drm/i915: relative_constants_mode race fix dev_priv keeps track of the current addressing mode that gets set at execbuffer time. Unfortunately the existing code was doing this before acquiring struct_mutex which leaves a race with another thread also doing an execbuffer. If that wasn't bad enough, relocate_slow drops struct_mutex which opens a much more likely error where another thread comes in and modifies the state while relocate_slow is being slow. The solution here is to just defer setting this state until we absolutely need it, and we know we'll have struct_mutex for the remainder of our code path. v2: Keith noticed a bug in the original patch. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Keith Packard <keithp@keithp.com>	2012-01-03 09:09:44 -08:00
Keith Packard	ebbd857e6b	drm/i915: Disable semaphores by default on SNB Semaphores still cause problems on some machines: > From Udo Steinberg: > > With Linux-3.2-rc6 I'm frequently seeing GPU hangs when large amounts of > text scroll in an xterm, such as when extracting a tar archive. Such as this > one (note the timestamps): > > I can reproduce it fairly easily with something > as simple as: > > while true; do dmesg; done This patch turns them off on SNB while leaving them on for IVB. Reported-by: Udo Steinberg <udo@hypervisor.org> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Eugeni Dodonov <eugeni@dodonov.net> Signed-off-by: Keith Packard <keithp@keithp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-12-26 21:07:26 -08:00
Eugeni Dodonov	f45b55575c	drm/i915: enable semaphores on per-device defaults This adds a default setting for semaphores parameter, and enables semaphores by default on IVB. For now, as semaphores interaction with VTd causes random issues on SNB, we do not enable them by default. But they can still be enabled via the semaphores=1 kernel parameter. v2: enables semaphores on SNB when IO remapping is disabled, with base on Keith Packard patch. CC: Daniel Vetter <daniel.vetter@ffwll.ch> CC: Ben Widawsky <ben@bwidawsk.net> CC: Keith Packard <keithp@keithp.com> CC: Jesse Barnes <jbarnes@virtuousgeek.org> CC: Chris Wilson <chris@chris-wilson.co.uk> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42696 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40564 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41353 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38862 Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com> Signed-off-by: Keith Packard <keithp@keithp.com>	2011-12-16 08:49:59 -08:00
Ben Widawsky	c8c99b0f0d	drm/i915: Dumb down the semaphore logic While I think the previous code is correct, it was hard to follow and hard to debug. Since we already have a ring abstraction, might as well use it to handle the semaphore updates and compares. I don't expect this code to make semaphores better or worse, but you never know... v2: Remove magic per Keith's suggestions. Ran Daniel's gem_ring_sync_loop test on this. v3: Ignored one of Keith's suggestions. v4: Removed some bloat per Daniel's recommendation. Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Keith Packard <keithp@keithp.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Keith Packard <keithp@keithp.com>	2011-09-21 14:52:41 -07:00
Eric Anholt	e92d03bff9	Revert "drm/i915: Kill GTT mappings when moving from GTT domain" This reverts commit `4a684a4117`. Userland has always been required to set the object's domain to GTT before using it through a GTT mapping, it's not something that the kernel is supposed to enforce. (The pagefault support is so that we can handle multiple mappings without userland having to pin across them, not so that userland can use GTT after GPU domains without telling the kernel). Fixes 19.2% +/- 0.8% (n=6) performance regression in cairo-gl firefox-talos-gfx on my T420 latop. Signed-off-by: Keith Packard <keithp@keithp.com>	2011-06-21 11:11:02 -07:00
Chris Wilson	d4aeee7760	drm/i915: Disable pagefaults along execbuffer relocation fast path Along the fast path for relocation handling, we attempt to copy directly from the user data structures whilst holding our mutex. This causes lockdep to warn about circular lock dependencies if we need to pagefault the user pages. [Since when handling a page fault on a mmapped bo, we need to acquire the struct mutex whilst already holding the mm semaphore, it is then verboten to acquire the mm semaphore when already holding the struct mutex. The likelihood of the user passing in the relocations contained in a GTT mmaped bo is low, but conceivable for extreme pathology.] In order to force the mm to return EFAULT rather than handle the pagefault, we therefore need to disable pagefaults across the relocation fast path. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@kernel.org Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2011-03-23 09:17:01 +00:00
Chris Wilson	47ae63e0c2	Merge branch 'drm-intel-fixes' into drm-intel-next Apply the trivial conflicting regression fixes, but keep GPU semaphores enabled. Conflicts: drivers/gpu/drm/i915/i915_drv.h drivers/gpu/drm/i915/i915_gem_execbuffer.c	2011-03-07 12:35:15 +00:00
Chris Wilson	c59a333f73	drm/i915: Only wait on a pending flip if we intend to write to the buffer ... as if we are only reading from it, we can do that concurrently with the queue flip. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-03-07 11:06:02 +00:00
Chris Wilson	a1656b9090	drm/i915: Disable GPU semaphores by default Andi Kleen narrowed his GPU hangs on his Sugar Bay (SNB desktop) rev 09 down to the use of GPU semaphores, and we already know that they appear broken up to Huron River (mobile) rev 08. (I'm optimistic that disabling GPU semaphores is simply hiding another bug by the latency and side-effects of the additional device interaction it introduces...) However, use of semaphores is a massive performance improvement... Only as long as the system remains stable. Enable at your peril. Reported-by: Andi Kleen <andi-fd@firstfloor.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33921 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-03-07 11:00:59 +00:00
Chris Wilson	e8b2c3c47a	drm/i915: Re-enable GPU semaphores for SandyBridge mobile This seems to be running stably on my test laptop, so hopefully the reported hangs where just symptoms of other bugs. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-03-01 19:22:52 +00:00
Chris Wilson	271d81b841	drm/i915: Allow relocation deltas outside of target bo Userspace has a legitimate requirement to use a delta that points to outside of the target bo, and so we need to enable this. (As this is an abi break, albeit a relaxation of the current restrictions, mark the change with a new flag.) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-03-01 16:01:02 +00:00
Chris Wilson	ce453d81cb	drm/i915: Use a device flag for non-interruptible phases The code paths for modesetting are growing in complexity as we may need to move the buffers around in order to fit the scanout in the aperture. Therefore we face a choice as to whether to thread the interruptible status through the entire pinning and unbinding code paths or to add a flag to the device when we may not be interrupted by a signal. This does the latter and so fixes a few instances of modesetting failures under stress. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-02-22 15:56:25 +00:00
Chris Wilson	8408c282f0	drm/i915: First try a normal large kmalloc for the temporary exec buffers As we just need a temporary array whilst performing the relocations for the execbuffer, first attempt to allocate using kmalloc even if it is not of order page-0. This avoids the overhead of remapping the discontiguous array and so gives a moderate boost to execution throughput. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-02-22 15:56:16 +00:00
Chris Wilson	c872522663	drm/i915: Protect against drm_gem_object not being the first member Dave Airlie spotted that we had a potential bug should we ever rearrange the drm_i915_gem_object so not the base drm_gem_object was not its first member. He noticed that we often convert the return of drm_gem_object_lookup() immediately into drm_i915_gem_object and then check the result for nullity. This is only valid when the base object is the first member and so the superobject has the same address. Play safe instead and use the compiler to convert back to the original return address for sanity testing. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-02-22 15:55:57 +00:00
Chris Wilson	db53a30261	drm/i915: Refine tracepoints A lot of minor tweaks to fix the tracepoints, improve the outputting for ftrace, and to generally make the tracepoints useful again. It is a start and enough to begin identifying performance issues and gaps in our coverage. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-02-07 14:59:18 +00:00
Chris Wilson	bdd92c9ad2	Merge branch 'drm-intel-fixes' into drm-intel-next Merge important suspend and resume regression fixes and resolve the small conflict. Conflicts: drivers/gpu/drm/i915/i915_dma.c	2011-01-24 23:45:32 +00:00
Chris Wilson	076e2c0eb8	drm/i915: Fix use of invalid array size for ring->sync_seqno There are I915_NUM_RINGS-1 inter-ring synchronisation counters, but we were clearing I915_NUM_RINGS of them. Oops. Reported-by: Jiri Slaby <jirislaby@gmail.com> Tested-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-01-23 12:52:11 +00:00
Chris Wilson	311bd68e02	drm/i915: Trivial sparse fixes Move code around and invoke iomem annotation in a few more places in order to silence sparse. Still a few more iomem annotations to go... Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-01-19 12:39:38 +00:00
Chris Wilson	1591192d3a	drm/i915: Disable GPU semaphores on SandyBridge mobile Hopefully, this is a temporary measure whilst the root cause is understood. At the moment, we experience a hard hang whilst looping urbanterror that has been identified as a result of the use of semaphores, but so far only on SNB mobile. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32752 Tested-by: mengmeng.meng@intel.com Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-01-14 09:51:05 +00:00
Chris Wilson	595dad76a0	drm/i915/execbuffer: Clear domains before beginning reloc processing After reordering the sequence of relocating objects, commit `6fe4f1404`, we can no longer rely on seeing all reloc targets prior to performing the relocation. As a result we were ignoring the need to flush objects from the render cache and invalidate the sampler caches, resulting in rendering glitches. So we need to clear the relocation domains earlier. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Tested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-01-13 16:06:05 +00:00
Chris Wilson	dd6864a4ed	drm/i915/execbuffer: Reorder relocations to match new object order On the fault path, commit `6fe4f140` introduction a regression whereby it changed the sequence of the objects but continued to use the original ordering of relocation entries. The result was that incorrect GTT offsets were being fed into the execbuffer causing lots of misrendering and potential hangs. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Tested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-01-13 16:06:05 +00:00
Chris Wilson	6fe4f14044	drm/i915/execbuffer: Reorder binding of objects to favour restrictions As the mappable portion of the aperture is always a small subset at the start of the GTT, it is allocated preferentially by drm_mm. This is useful in case we ever need to map an object later. However, if you have a large object that can consume the entire mappable region of the GTT this prevents the batchbuffer from fitting and so causing an error. Instead allocate all those that require a mapping up front in order to improve the likelihood of finding sufficient space to bind them. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-01-11 22:55:48 +00:00
Chris Wilson	36cf174230	drm/i915/execbuffer: Correctly clear the current object list upon EFAULT Before releasing the lock in order to copy the relocation list from user pages, we need to drop all the object references as another thread may usurp and execute another batchbuffer before we reacquire the lock. However, the code was buggy and failed to clear the list... Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@kernel.org	2011-01-11 22:55:29 +00:00
Chris Wilson	882417851a	drm/i915: Propagate error from flushing the ring ... in order to avoid a BUG() and potential unbounded waits. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-01-11 20:44:50 +00:00
Chris Wilson	b72f3acb71	drm/i915: Handle ringbuffer stalls when flushing Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-01-11 20:43:55 +00:00
Chris Wilson	63256ec534	drm/i915: Enforce write ordering through the GTT We need to ensure that writes through the GTT land before any modification to the MMIO registers and so must impose a mandatory write barrier when flushing the GTT domain. This was revealed by relaxing the write ordering by experimentally mapping the registers and the GATT as write-combining. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-01-11 20:42:53 +00:00
Chris Wilson	72bfa19c8d	drm/i915: Allow the application to choose the constant addressing mode The relative-to-general state default is useless as it means having to rewrite the streaming kernels for each batch. Relative-to-surface is more useful, as that stream usually needs to be rewritten for each batch. And absolute addressing mode, vital if you start streaming state, is also only available by adjusting the register... Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-12-20 09:41:36 +00:00
Chris Wilson	b8f7ab1788	drm/i915: Mark the user reloc error paths as unlikely Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-12-09 19:46:22 +00:00
Chris Wilson	67731b87e9	drm/i915: Eliminate drm_gem_object_lookup during relocation As we provide a list of all objects that will be accessed from the batchbuffer, we can build a lut of the handles associated with those objects for this invocation and use that to avoid the overhead of looking up those objects again for every relocation. The cost of building and searching a small hash table is much less than that of acquiring a spinlock, searching a radix tree and manipulating an atomic refcnt per relocation. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-12-09 19:46:21 +00:00
Chris Wilson	9b3826bf84	drm/i915: Ignore fenced commands for gpu access on gen4 Userspace should not have been declaring that it needed fenced GPU access with gen4+ as those GPUs have no fenced commands, but to be on the safe side it is easier to ignore userspace in case they did. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-12-05 23:18:32 +00:00
Chris Wilson	1ec14ad313	drm/i915: Implement GPU semaphores for inter-ring synchronisation on SNB The bulk of the change is to convert the growing list of rings into an array so that the relationship between the rings and the semaphore sync registers can be easily computed. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-12-05 00:37:38 +00:00
Chris Wilson	d9e86c0ee6	drm/i915: Pipelined fencing [infrastructure] With this change, every batchbuffer can use all available fences (save pinned and scanout, of course) without ever stalling the gpu! In theory. Currently the actual pipelined update of the register is disabled due to some stability issues. However, just the deferred update is a significant win. Based on a series of patches by Daniel Vetter. The premise is that before every access to a buffer through the GTT we have to declare whether we need a register or not. If the access is by the GPU, a pipelined update to the register is made via the ringbuffer, and we track the last seqno of the batches that access it. If by the CPU we wait for the last GPU access and update the register (either to clear or to set it for the current buffer). One advantage of being able to pipeline changes is that we can defer the actual updating of the fence register until we first need to access the object through the GTT, i.e. we can eliminate the stall on set_tiling. This is important as the userspace bo cache does not track the tiling status of active buffers which generate frequent stalls on gen3 when enabling tiling for an already bound buffer. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2010-12-02 10:07:05 +00:00
Chris Wilson	87ca9c8a7e	drm/i915: Prevent stalling for a GTT read back from a read-only GPU target Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-12-02 10:00:15 +00:00
Chris Wilson	c4e7a41467	drm/i915/ringbuffer: Handle cliprects in the caller This makes the various rings more consistent by removing the anomalous handing of the rendering ring execbuffer dispatch. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-11-30 14:17:51 +00:00
Chris Wilson	602606a472	drm/i915/execbuffer: On error, starting unwinding from the previous object As the error occurred on the current object, it means that its state was not changed and so it should be excluded from the unwind. Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-11-28 15:31:02 +00:00
Chris Wilson	432e58edc9	drm/i915: Avoid allocation for execbuffer object list Besides the minimal improvement in reducing the execbuffer overhead, the real benefit is clarifying a few routines. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-11-25 21:19:26 +00:00
Chris Wilson	54cf91dc4e	drm/i915: Split i915_gem_execbuffer into its own file. A number of dragons have been seen lurking within the execbuffer code. The first step is then to isolate them from the rest and begin to scrutinise them in depth. Suggested by Daniel Vetter. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-11-25 21:19:25 +00:00

1 2 3

105 Commits