Commit Graph

350 Commits

Author SHA1 Message Date
Oscar Mateo ba8b7ccb19 drm/i915/bdw: Workload submission mechanism for Execlists
This is what i915_gem_do_execbuffer calls when it wants to execute some
worload in an Execlists world.

v2: Check arguments before doing stuff in intel_execlists_submission. Also,
get rel_constants parsing right.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Drop the chipset flush, that's pre-gen6. And appease
checkpatch a bit .... again!]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 23:18:38 +02:00
Oscar Mateo a83014d3f8 drm/i915: Abstract the legacy workload submission mechanism away
As suggested by Daniel Vetter. The idea, in subsequent patches, is to
provide an alternative to these vfuncs for the Execlists submission
mechanism.

v2: Splitted into two and reordered to illustrate our intentions, instead
of showing it off. Also, remove the add_request vfunc and added the
stop_ring one.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet:
- Make checkpatch happy.
- Be grumpy about the excessive vtable.
- Ditch gt->is_ring_initialized.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 16:40:32 +02:00
Oscar Mateo ec3e9963a6 drm/i915/bdw: Deferred creation of user-created LRCs
The backing objects and ringbuffers for contexts created via open
fd are actually empty until the user starts sending execbuffers to
them. At that point, we allocate & populate them. We do this because,
at create time, we really don't know which engine is going to be used
with the context later on (and we don't want to waste memory on
objects that we might never use).

v2: As contexts created via ioctl can only be used with the render
ring, we have enough information to allocate & populate them right
away.

v3: Defer the creation always, even with ioctl-created contexts, as
requested by Daniel Vetter.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 16:25:58 +02:00
Chris Wilson 906843c3a1 drm/i915: Simplify relocate_entry_gtt() and make 64-bit safe
Even though we should not try to use 4+GiB GTTs on 32-bit systems, by
using a local variable we can future proof the code whilst making it
easier to read.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Appease checkpatch a bit.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 14:16:04 +02:00
Chris Wilson 060e82c6f4 drm/i915: Remove redundant list_empty(eb->vmas) tests in execbuffer
Part of the pre-validation for an execbuffer call is that there is at
least one object in the execlist. As we bail if we fail to lookup any
object, we can be sure that after the eb_lookup_vma() there is at least
one object in the vma list and so we do not need to assert.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 14:15:03 +02:00
Chris Wilson ad19f10bc2 drm/i915: Pre-validate the NEED_GTTS flag for execbuffer
We have an implementation requirement that precludes the user from
requesting a ggtt entry when the device is operating in ppgtt mode. Move
the current check from inside the execbuffer object collation to the
prevalidation phase.

v2: Roll both invalid flags checks into one

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 14:15:02 +02:00
Daniel Vetter da51a1e7e3 drm/i915: Fix secure dispatch with full ppgtt
Based upon a hunk from a patch from Chris Wilson, but augmented to:
- Process the batch in the full ppgtt vm so that self-relocations
  match again with userspace's expectations..
- Add a comment why plain pin for the global gtt binding is safe at
  that point.

v2: Drop local bind_vm variable (Chris).

v3: Explain why this works despite the lack of proper active tracking
for the ggtt batch vma.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 13:49:57 +02:00
Chris Wilson 82b6b6d786 drm/i915: Remove fenced_gpu_access and pending_fenced_gpu_access
This migrates the fence tracking onto the existing seqno
infrastructure so that the later conversion to tracking via requests is
simplified.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 12:20:25 +02:00
Chris Wilson e6a844687c drm/i915: Force CPU relocations if not GTT mapped
Move the decision on whether we need to have a mappable object during
execbuffer to the fore and then reuse that decision by propagating the
flag through to reservation. As a corollary, before doing the actual
relocation through the GTT, we can make sure that we do have a GTT
mapping through which to operate.

Note that the key to make this work is to ditch the
obj->map_and_fenceable unbind optimization - with full ppgtt it
doesn't make a lot of sense any more anyway.

v2: Revamp and resend to ease future patches.
v3: Refresh patch rationale

References: https://bugs.freedesktop.org/show_bug.cgi?id=81094
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
[danvet: Explain why obj->map_and_fenceable is key and split out the
secure batch fix.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 12:01:29 +02:00
Oscar Mateo 78382593e9 drm/i915: Extract the actual workload submission mechanism from execbuffer
So that we isolate the legacy ringbuffer submission mechanism, which becomes
a good candidate to be abstracted away. This is prep-work for Execlists (which
will its own workload submission mechanism).

No functional changes.

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-07-08 12:31:06 +02:00
Oscar Mateo 821d66dd7c drm/i915: Emphasize that ctx->id is merely a user handle
This is an Execlists preparatory patch, since they make context ID become an
overloaded term:

- In the software, it was used to distinguish which context userspace was
  trying to use.
- In the BSpec, the term is used to describe the 20-bits long field the
  hardware uses to it to discriminate the contexts that are submitted to
  the ELSP and inform the driver about their current status (via Context
  Switch Interrupts and Context Status Buffers).

Initially, I tried to make the different meanings converge, but it proved
impossible:

- The software ctx->id is per-filp, while the hardware one needs to be
  globally unique.
- Also, we multiplex several backing states objects per intel_context,
  and all of them need unique HW IDs.
- I tried adding a per-filp ID and then composing the HW context ID as:
  ctx->id + file_priv->id + ring->id, but the fact that the hardware only
  uses 20-bits means we have to artificially limit the number of filps or
  contexts the userspace can create.

The ctx->user_handle renaming bits are done with this Cocci patch (plus
manual frobbing of the struct declaration):

    @@
    struct intel_context c;
    @@
    - (c).id
    + c.user_handle

    @@
    struct intel_context *c;
    @@
    - (c)->id
    + c->user_handle

Also, while we are at it, s/DEFAULT_CONTEXT_ID/DEFAULT_CONTEXT_HANDLE and
change the type to unsigned 32 bits.

v2: s/handle/user_handle and change the type to uint32_t as suggested by
Chris Wilson.

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> (v1)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-07-08 12:30:41 +02:00
Daniel Vetter f99d70690e drm/i915: Track frontbuffer invalidation/flushing
So these are the guts of the new beast. This tracks when a frontbuffer
gets invalidated (due to frontbuffer rendering) and hence should be
constantly scaned out, and when it's flushed again and can be
compressed/one-shot-upload.

Rules for flushing are simple: The frontbuffer needs one more full
upload starting from the next vblank. Which means that the flushing
can _only_ be called once the frontbuffer update has been latched.

But this poses a problem for pageflips: We can't just delay the
flushing until the pageflip is latched, since that would pose the risk
that we override frontbuffer rendering that has been scheduled
in-between the pageflip ioctl and the actual latching.

To handle this track asynchronous invalidations (and also pageflip)
state per-ring and delay any in-between flushing until the rendering
has completed. And also cancel any delayed flushing if we get a new
invalidation request (whether delayed or not).

Also call intel_mark_fb_busy in both cases in all cases to make sure
that we keep the screen at the highest refresh rate both on flips,
synchronous plane updates and for frontbuffer rendering.

v2: Lots of improvements

Suggestions from Chris:
- Move invalidate/flush in flush_*_domain and set_to_*_domain.
- Drop the flush in busy_ioctl since it's redundant. Was a leftover
  from an earlier concept to track flips/delayed flushes.
- Don't forget about the initial modeset enable/final disable.
  Suggested by Chris.

Track flips accurately, too. Since flips complete independently of
rendering we need to track pending flips in a separate mask. Again if
an invalidate happens we need to cancel the evenutal flush to avoid
races.

v3:
Provide correct header declarations for flip functions. Currently not
needed outside of intel_display.c, but part of the proper interface.

v4: Add proper domain management to fbcon so that the fbcon buffer is
also tracked correctly.

v5: Fixup locking around the fbcon set_to_gtt_domain call.

v6: More comments from Chris:
- Split out fbcon changes.
- Drop superflous checks for potential scanout before calling intel_fb
  functions - we can micro-optimize this later.
- s/intel_fb_/intel_fb_obj_/ to make it clear that this deals in gem
  object. We already have precedence for fb_obj in the pin_and_fence
  functions.

v7: Clarify the semantics of the flip flush handling by renaming
things a bit:
- Don't go through a gem object but take the relevant frontbuffer bits
  directly. These functions center on the plane, the actual object is
  irrelevant - even a flip to the same object as already active should
  cause a flush.
- Add a new intel_frontbuffer_flip for synchronous plane updates. It
  currently just calls intel_frontbuffer_flush since the implemenation
  differs.

This way we achieve a clear split between one-shot update events on
one side and frontbuffer rendering with potentially a very long delay
between the invalidate and flush.

Chris and I also had some discussions about mark_busy and whether it
is appropriate to call from flush. But mark busy is a state which
should be derived from the 3 events (invalidate, flush, flip) we now
have by the users, like psr does by tracking relevant information in
psr.busy_frontbuffer_bits. DRRS (the only real use of mark_busy for
frontbuffer) needs to have similar logic. With that the overall
mark_busy in the core could be removed.

v8: Only when retiring gpu buffers only flush frontbuffer bits we
actually invalidated in a batch. Just for safety since before any
additional usage/invalidate we should always retire current rendering.
Suggested by Chris Wilson.

v9: Actually use intel_frontbuffer_flip in all appropriate places.
Spotted by Chris.

v10: Address more comments from Chris:
- Don't call _flip in set_base when the crtc is inactive, avoids redunancy
  in the modeset case with the initial enabling of all planes.
- Add comments explaining that the initial/final plane enable/disable
  still has work left to do before it's fully generic.

v11: Only invalidate for gtt/cpu access when writing. Spotted by Chris.

v12: s/_flush/_flip/ in intel_overlay.c per Chris' comment.

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-06-19 18:14:47 +02:00
Ville Syrjälä d593d992a9 drm/i915: Fix __user sparse warning
CHECK   linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c
linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c:1529:47: warning: incorrect type in initializer (different address spaces)
linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c:1529:47:    expected struct drm_i915_gem_exec_object2 *user_exec_list
linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c:1529:47:    got void [noderef] <asn:1>*
linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c:1533:61: warning: incorrect type in argument 1 (different address spaces)
linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c:1533:61:    expected void [noderef] <asn:1>*dst
linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c:1533:61:    got unsigned long long *<noident>

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-06-13 17:45:29 +02:00
Dave Airlie 8d4ad9d4bb Merge commit '9e9a928eed8796a0a1aaed7e0b676db86ba84594' into drm-next
Merge drm-fixes into drm-next.

Both i915 and radeon need this done for later patches.

Conflicts:
	drivers/gpu/drm/drm_crtc_helper.c
	drivers/gpu/drm/i915/i915_drv.h
	drivers/gpu/drm/i915/i915_gem.c
	drivers/gpu/drm/i915/i915_gem_execbuffer.c
	drivers/gpu/drm/i915/i915_gem_gtt.c
2014-06-05 20:28:59 +10:00
Chris Wilson d23db88c3a drm/i915: Prevent negative relocation deltas from wrapping
This is pure evil. Userspace, I'm looking at you SNA, repacks batch
buffers on the fly after generation as they are being passed to the
kernel for execution. These batches also contain self-referenced
relocations as a single buffer encompasses the state commands, kernels,
vertices and sampler. During generation the buffers are placed at known
offsets within the full batch, and then the relocation deltas (as passed
to the kernel) are tweaked as the batch is repacked into a smaller buffer.
This means that userspace is passing negative relocations deltas, which
subsequently wrap to large values if the batch is at a low address. The
GPU hangs when it then tries to use the large value as a base for its
address offsets, rather than wrapping back to the real value (as one
would hope). As the GPU uses positive offsets from the base, we can
treat the relocation address as the minimum address read by the GPU.
For the upper bound, we trust that userspace will not read beyond the
end of the buffer.

So, how do we fix negative relocations from wrapping? We can either
check that every relocation looks valid when we write it, and then
position each object such that we prevent the offset wraparound, or we
just special-case the self-referential behaviour of SNA and force all
batches to be above 256k. Daniel prefers the latter approach.

This fixes a GPU hang when it tries to use an address (relocation +
offset) greater than the GTT size. The issue would occur quite easily
with full-ppgtt as each fd gets its own VM space, so low offsets would
often be handed out. However, with the rearrangement of the low GTT due
to capturing the BIOS framebuffer, it is already affecting kernels 3.15
onwards. I think only IVB+ is susceptible to this bug, but the workaround
should only kick in rarely, so it seems sensible to always apply it.

v3: Use a bias for batch buffers to prevent small negative delta relocations
from wrapping.

v4 from Daniel:
- s/BIAS/BATCH_OFFSET_BIAS/
- Extract eb_vma_misplaced/i915_vma_misplaced since the conditions
  were growing rather cumbersome.
- Add a comment to eb_get_batch explaining why we do this.
- Apply the batch offset bias everywhere but mention that we've only
  observed it on gen7 gpus.
- Drop PIN_OFFSET_FIX for now, that slipped in from a feature patch.

v5: Add static to eb_get_batch, spotted by 0-day tester.

Testcase: igt/gem_bad_reloc
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78533
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v3)
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-27 11:18:40 +03:00
Chris Wilson 9aab8bff7a drm/i915: Only copy back the modified fields to userspace from execbuffer
We only want to modifiy a single field in the userspace view of the
execbuffer command buffer, so explicitly change that rather than copy
everything back again.

This serves two purposes:

1. The single fields are much cheaper to copy (constant size so the
copy uses special case code) and much smaller than the whole array.

2. We modify the array for internal use that need to be masked from
the user.

Note: We need this backported since without it the next bugfix will
blow up when userspace recycles batchbuffers and relocations.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-27 11:18:39 +03:00
Oscar Mateo 273497e5cd drm/i915: s/i915_hw_context/intel_context
Up until now, contexts had one (and only one) backing object that was
used by the hardware to save/restore render ring contexts (via the
MI_SET_CONTEXT command). Other rings did not have or need this, so
our i915_hw_context struct had a 1:1 relationship with a a real HW
context.

With Logical Ring Contexts and Execlists, this is not possible anymore:
all rings need a backing object, and it cannot be reused. To prepare
for that, rename our contexts to the more generic term intel_context.

No functional changes.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-22 23:41:17 +02:00
Oscar Mateo a4872ba6d0 drm/i915: s/intel_ring_buffer/intel_engine_cs
In the upcoming patches we plan to break the correlation between
engine command streamers (a.k.a. rings) and ringbuffers, so it
makes sense to refactor the code and make the change obvious.

No functional changes.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-22 23:01:05 +02:00
Daniel Vetter bdf1e7e3db drm/i915: move bsd dispatch index somewhere better
Adding stuff at the bottom is really no how this should be done, since
that's the place for ums/dri dungeons.

This was added in

commit a8ebba75b3
Author: Zhao Yakui <yakui.zhao@intel.com>
Date:   Thu Apr 17 10:37:40 2014 +0800

    drm/i915: Use the coarse ping-pong mechanism based on drm fd to dispatch the BSD command on BDW GT3

Also add a note to prevent this from happening again - people really
should be less lazy and take more time to look for a good home of
their new driver-global state.

Cc: Imre Deak <imre.deak@intel.com>
Cc: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-22 15:06:31 +02:00
Chris Wilson 227f782e46 drm/i915: Retire requests before creating a new one
More fallout from

commit c8725f3dc0
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Mar 17 12:21:55 2014 +0000

    drm/i915: Do not call retire_requests from wait_for_rendering

is that we can completely fill all of memory using small objects, such
that we exhaust the filp space, and spend all of our time evicting
objects from the aperture. As such, we never fill the ring, and never
trigger the last resort flushing in

commit 1cf0ba1474
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon May 5 09:07:33 2014 +0100

    drm/i915: Flush request queue when waiting for ring space

and so all the requests are left active and the objects keep that last
active reference. Eventually the system comes to a halt as it runs out
of memory.

The impact is mainly limited to test cases as regular userspace will
trigger retirement by manually checking whether an object is active.

Testcase: igt/gem_lut_handle
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78724
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Guo Jinxian <jinxianx.guo@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-19 15:30:56 +02:00
Daniel Vetter ffd93f2480 drm/i915: Work-around garbage DR4 from UXA
Somehow UXA submits a completely bogus DR4 value since essentially
forever. It was originally introduced in

commit bade7d7d2505a10a8a7d24b084aff9742e2d6d64
Author: Eric Anholt <eric@anholt.net>
Date:   Fri Jun 6 14:03:25 2008 -0700

    Use the DRM for submitting batchbuffers when available.

and dutifully copied around ever since. Since we want to keep the
general dirt catching around just special case the UXA value.

This regression was introduced in

commit 9cb346648d
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Thu Apr 24 08:09:11 2014 +0200

    drm/i915: Catch dirt in unused execbuffer fields

Comment from Chris' review:

"To be fair, it is a sensible value if one supposes a Region style API to
cliprects. Under that API, DR[14] define the extents of the clip region,
and ((0,0), (0,0)) [DR1==DR4==0] would mean all clipped, do not draw
anything."

v2: Pimp commit message a bit and remove the double space.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78494
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jörg Otte <jrg.otte@gmail.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-13 17:16:13 +02:00
Ben Widawsky d9ceb957fd drm/i915: Support 64b relocations
All the rest of the code to enable this is in my branch. Without my
branch, hitting > 32b offsets is impossible. The code has always
"supported" 64b, but it's never actually been run of tested. This change
doesn't actually fix anything. [1] I am not sure why X won't work yet. I
do not get hangs or obvious errors.

There are 3 fixes grouped together here. First is to remove the
hardcoded 0 for the upper dword of the relocation. The next fix is to
use a 64b value for target_offset. The final fix is to not directly
apply target_offset to reloc->delta. reloc->delta is part of ABI, and so
we cannot change it. As it stands, 32b is enough to represent everything
we're interested in representing anyway. The main problem is, we cannot
add greater than 32b values to it directly.

[1] Almost all of intel-gpu-tools is not yet ready to test 64b
relocations. There are a few places that expect 32b values for offsets
and these all won't work.

Cc: Rafael Barbalho <rafael.barbalho@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 16:04:23 +02:00
Ben Widawsky 9bcb144c83 drm/i915: Support 64b execbuf
Previously, our code only had a 32b offset value for where the
batchbuffer starts. With full PPGTT, and 64b canonical GPU address
space, that is an insufficient value. The code to expand is pretty
straight forward, and only one platform needs to do anything with the
extra bits.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 16:01:58 +02:00
Chris Wilson c8725f3dc0 drm/i915: Do not call retire_requests from wait_for_rendering
A common issue we have is that retiring requests causes recursion
through GTT manipulation or page table manipulation which we can only
handle at very specific points. However, to maintain internal
consistency (enforced through our sanity checks on write_domain at
various points in the GEM object lifecycle) we do need to retire the
object prior to marking it with a new write_domain, and also clear the
write_domain for the implicit flush following a batch.

Note that this then allows the unbound objects to still be on the active
lists, and so care must be taken when removing objects from unbound lists
(similar to the caveats we face processing the bound lists).

v2: Fix i915_gem_shrink_all() to handle updated object lifetime rules,
by refactoring it to call into __i915_gem_shrink().

v3: Missed an object-retire prior to changing cache domains in
i915_gem_object_set_cache_leve()

v4: Rebase

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 09:09:15 +02:00
Zhao Yakui a8ebba75b3 drm/i915: Use the coarse ping-pong mechanism based on drm fd to dispatch the BSD command on BDW GT3
The BDW GT3 has two independent BSD rings, which can be used to process the
video commands. To be simpler, it is transparent to user-space driver/middle.
Instead the kernel driver will decide which ring is to dispatch the BSD video
command.

As every BSD ring is powerful, it is enough to dispatch the BSD video command
based on the drm fd. In such case it can play back video stream while encoding
another video stream. The coarse ping-pong mechanism is used to determine
which BSD ring is used to dispatch the BSD video command.

V1->V2: Follow Daniel's comment and use the simple ping-pong mechanism.
This is only to add the support of dual BSD rings on BDW GT3 machine.
The further optimization will be considered in another patch set.

V2->V3: Follow Daniel's comment to use the struct_mutext instead of
atomic_t during determining which ring can be used to dispatch Video command.

Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 09:08:49 +02:00
Zhao Yakui b1a93306ed drm/i915: Update the restrict check to filter out wrong Ring ID passed by user-space
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 09:08:44 +02:00
Daniel Vetter 9cb346648d drm/i915: Catch dirt in unused execbuffer fields
We need to make sure that userspace keeps on following the contract,
otherwise we won't be able to use the reserved fields at all.

v2: Add DRM_DEBUG (Chris)

Testcase: igt/gem_exec_params/*-dirt
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 09:08:43 +02:00
Daniel Vetter c0f5b82cd1 drm/i915: Catch abuse of I915_EXEC_CONSTANTS_*
A bit tricky since 0 is also a valid constant ...

v2: Add DRM_DEBUG (Chris)

Testcase: igt/gem_exec_params/rel-constants-*
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 09:08:42 +02:00
Daniel Vetter 9d662da8b6 drm/i915: Catch abuse of I915_EXEC_GEN7_SOL_RESET
Currently we catch it, but silently succeed. Our userspace is
better than this.

v2: Add DRM_DEBUG (Chris)

Testcase: igt/gem_exec_params/sol-reset-*
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 09:08:41 +02:00
Dave Airlie 885ac04ab3 Merge tag 'drm-intel-next-2014-04-16' of git://anongit.freedesktop.org/drm-intel into drm-next
drm-intel-next-2014-04-16:
- vlv infoframe fixes from Jesse
- dsi/mipi fixes from Shobhit
- gen8 pageflip fixes for LRI/SRM from Damien
- cmd parser fixes from Brad Volkin
- some prep patches for CHV, DRRS, ...
- and tons of little things all over
drm-intel-next-2014-04-04:
- cmd parser for gen7 but only in enforcing and not yet granting mode - the
  batch copying stuff is still missing. Also performance is a bit ... rough
  (Brad Volkin + OACONTROL fix from Ken).
- deprecate UMS harder (i.e. CONFIG_BROKEN)
- interrupt rework from Paulo Zanoni
- runtime PM support for bdw and snb, again from Paulo
- a pile of refactorings from various people all over the place to prep for new
  stuff (irq reworks, power domain polish, ...)

drm-intel-next-2014-04-04:
- cmd parser for gen7 but only in enforcing and not yet granting mode - the
  batch copying stuff is still missing. Also performance is a bit ... rough
  (Brad Volkin + OACONTROL fix from Ken).
- deprecate UMS harder (i.e. CONFIG_BROKEN)
- interrupt rework from Paulo Zanoni
- runtime PM support for bdw and snb, again from Paulo
- a pile of refactorings from various people all over the place to prep for new
  stuff (irq reworks, power domain polish, ...)

Conflicts:
	drivers/gpu/drm/i915/i915_gem_context.c
2014-05-01 09:11:37 +10:00
Chris Wilson 691e6415c8 drm/i915: Always use kref tracking for all contexts.
If we always initialize kref for the context, even if we are using fake
contexts for hangstats when there is no hw support, we can forgo the
dance to dereference the ctx->obj and inspect whether we are permitted
to use kref inside i915_gem_context_reference() and _unreference().

My ulterior motive here is to improve the debugging of a use-after-free
of ctx->obj. This patch avoids the dereference here and instead forces
the assertion checks associated with kref.

v2: Refactor the fake contexts to being even more like the real
contexts, so that there is much less duplicated and special case code.

v3: Tweaks.
v4: Tweaks, minor.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76671
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: lu hua <huax.lu@intel.com>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
[Jani: tiny change to backport to drm-intel-fixes.]
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2014-04-11 13:29:51 +03:00
Ben Widawsky 935f38d694 drm/i915: Unref context on failed eb_create
I opted to do this instead of grabbing the context reference after
eb_create since eb_create can potentially call the shrinker, and that
makes things very complicated. This simple patch balances the ref count
without requiring a great deal of review to make sure the shrinker path
is safe.

Theoretically (by design) the shrinker can end up destroying a context,
which enforces the reasoning for doing the fix this way instead of
moving the reference to later in the function.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-04-09 14:37:11 +02:00
Jani Nikula 50227e1cae drm/i915: prefer struct drm_i915_private to drm_i915_private_t
Remove the rest of the references to drm_i915_private_t. No functional
changes.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
[danvet: Drop hunk in i915_cmd_parser.c]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-03-31 15:34:21 +02:00
Brad Volkin 351e3db2b3 drm/i915: Implement command buffer parsing logic
The command parser scans batch buffers submitted via execbuffer ioctls before
the driver submits them to hardware. At a high level, it looks for several
things:

1) Commands which are explicitly defined as privileged or which should only be
   used by the kernel driver. The parser generally rejects such commands, with
   the provision that it may allow some from the drm master process.
2) Commands which access registers. To support correct/enhanced userspace
   functionality, particularly certain OpenGL extensions, the parser provides a
   whitelist of registers which userspace may safely access (for both normal and
   drm master processes).
3) Commands which access privileged memory (i.e. GGTT, HWS page, etc). The
   parser always rejects such commands.

See the overview comment in the source for more details.

This patch only implements the logic. Subsequent patches will build the tables
that drive the parser.

v2: Don't set the secure bit if the parser succeeds
Fail harder during init
Makefile cleanup
Kerneldoc cleanup
Clarify module param description
Convert ints to bools in a few places
Move client/subclient defs to i915_reg.h
Remove the bits_count field

OTC-Tracker: AXIA-4631
Change-Id: I50b98c71c6655893291c78a2d1b8954577b37a30
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
[danvet: Appease checkpatch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-03-07 22:37:00 +01:00
Daniel Vetter 8ea99c9287 drm/i915: Only bind each object rather than for every execbuffer
One side-effect of the introduction of ppgtt was that we needed to
rebind the object into the appropriate vm (and global gtt in some
peculiar cases). For simplicity this was done twice for every object on
every call to execbuffer. However, that adds a tremendous amount of CPU
overhead (rewriting all the PTE for all objects into WC memory) per
draw. The fix is to push all the decision about which vm to bind into
and when down into the low-level bind routines through hints rather than
performing the bind unconditionally in the execbuffer routine.

Note that this is a regression introduced in the full ppgtt feature
branch, before this we've only done re-bound objects when the relevant
has_(aliasing_ppgtt|global_gtt)_mapping flag was clear. But since
that's per-object and not per-vma that optimization broke.

v2: Split out prep work and unrelated changes.

v3: Bring back functional change around PIN_GLOBAL that I've
accidentally split out.

v4: Remove the temporary hack for the old binding logic to avoid
bisection issues.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72906
Tested-by: jianx.zhou@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-02-14 14:18:38 +01:00
Daniel Vetter bf3d149b25 drm/i915: split PIN_GLOBAL out from PIN_MAPPABLE
With abitrary pin flags it makes sense to split out a "please bind
this into global gtt" from the "please allocate in the mappable
range".

Use this unconditionally in our global gtt pin helper since this is
what its callers want. Later patches will drop PIN_MAPPABLE where it's
not strictly needed.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-02-14 14:17:27 +01:00
Daniel Vetter 1ec9e26dda drm/i915: Consolidate binding parameters into flags
Anything more than just one bool parameter is just a pain to read,
symbolic constants are much better.

Split out from Chris' vma-binding rework patch.

v2: Undo the behaviour change in object_pin that Chris spotted.

v3: Split out misplaced hunk to handle set_cache_level errors,
spotted by Jani.

v4: Keep the current over-zealous binding logic in the execbuffer code
working with a quick hack while the overall binding code gets shuffled
around.

v5: Reorder the PIN_ flags for more natural patch splitup.

v6: Pull out the PIN_GLOBAL split-up again.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-02-14 14:16:58 +01:00
Jani Nikula d330a9530c drm/i915: move module parameters into a struct, in a new file
With 20+ module parameters, I think referring to them via a struct
improves clarity over just having a bunch of globals. While at it, move
the parameter initialization and definitions into a new file
i915_params.c to reduce clutter in i915_drv.c.

Apart from the ill-named i915_enable_rc6, i915_enable_fbc and
i915_enable_ppgtt parameters, for which we lose the "i915_" prefix
internally, the module parameters now look the same both on the kernel
command line and in code. For example, "i915.modeset".

The downsides of the change are losing static on a couple of variables
and not having the initialization and module_param_named() right next to
each other. On the other hand, all module parameters are now defined in
one place at i915_params.c. Plus you can do this to find all module
parameter references:

$ git grep "i915\." -- drivers/gpu/drm/i915

v2:
- move the definitions into a new file
- s/i915_params/i915/
- make i915_try_reset i915.reset, for consistency

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-01-27 17:16:45 +01:00
Rodrigo Vivi a25eebb0af drm: dp helper: Add DP test sink CRC definition.
This address will be used to verify panel CRC for test and
validation purposes.

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
[danvet: Fix whitespace fail.]
Acked-by: Dave Airlie <airlied@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-01-27 09:55:23 +01:00
Daniel Vetter 0e5539b923 Merge branch 'topic/ppgtt' into drm-intel-next-queued
Because whatever.*

* This should contain a fairly long list of issues and still
unresolved resgressions, but I didn't really get a vote.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-01-25 21:14:57 +01:00
Ben Widawsky 8b78f0e588 drm/i915: Clarify relocation errnos
While trying to find a random -EINVAL from a failing test, I noticed we
had a few hard to follow return values.

The first two hunks in this patch replace completely useless
initialization of ret. The last several hunks help to distinguish
between altering 'return ret' and 'return <ERROR>'

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-01-22 09:58:25 +01:00
Daniel Vetter 0d9d349d87 Merge commit origin/master into drm-intel-next
Conflicts are getting out of hand, and now we have to shuffle even
more in -next which was also shuffled in -fixes (the call for
drm_mode_config_reset needs to move yet again).

So do a proper backmerge. I wanted to wait with this for the 3.13
relaese, but alas let's just do this now.

Conflicts:
	drivers/gpu/drm/i915/i915_reg.h
	drivers/gpu/drm/i915/intel_ddi.c
	drivers/gpu/drm/i915/intel_display.c
	drivers/gpu/drm/i915/intel_pm.c

Besides the conflict around the forcewake get/put (where we chaged the
called function in -fixes and added a new parameter in -next) code all
the current conflicts are of the adjacent lines changed type.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-01-16 22:06:30 +01:00
Ben Widawsky 72ad5c45f0 drm/i915/ppgtt: Fix ioctl errno for "no such context"
Without this fix the ioctls silently succeeded (but actually did
nothing).

It makes all the code which calls into this function way too confusing.

v2: Fix destroy IOCTL as well

v3: Clarify the other two callers of i915_gem_context_get() to never
check for NULL. (Mika)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72903
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Testcase: igt/gem_ctx_exec/basic
[danvet: Fix up the commit message and actually bother to mention the
testcase this fixes.]
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-01-07 08:50:11 +01:00
Daniel Vetter a7c1d426ef drm/i915: Don't check for NEEDS_GTT when deciding the address space
This means something different and is only relevant for gen6 and the
reason why we cant use anything else than aliasing ppgtt there.

Note that the currently implemented logic for secure batches is
broken: Userspace wants the buffer both in ppgtt (for self-referencing
relocations) and in ggtt (for priveledge operations).

This is the same issue the command parser is also facing.
Unfortunately our coverage for corner-cases of self-referencing
batches is spotty.

Note that this will break vsync'ed Xv and DRI2 copies.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 17:50:40 +01:00
Daniel Vetter 2c9f8d56a1 drm/i915: Reject NEEDS_GTT relocations with full ppgtt
Doesn't make sense. Spotted while fixing an issue Chris
noticed in the same area.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 17:50:39 +01:00
Daniel Vetter 7c9c4b8f5d drm/i915: Reject non-default contexts on non-render again
This reverts the abi-change from

commit 67e3d2979b
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Fri Dec 6 14:11:01 2013 -0800

    drm/i915: Permit contexts on all rings

We don't actually need this, only the internal changes to allow
contexts on all rings for the purpose of ppgtt switching are required.
And I'm not sure whether this is the right thing to do given some of
the hw features in the pipeline.

Also, new abi needs userspace patches as a proof-of-need, which is
completely lacking here.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 17:50:38 +01:00
Ben Widawsky 7e0d96bc03 drm/i915: Use multiple VMs -- the point of no return
As with processes which run on the CPU, the goal of multiple VMs is to
provide process isolation. Specific to GEN, there is also the ability to
map more objects per process (2GB each instead of 2Gb-2k total).

For the most part, all the pipes have been laid, and all we need to do
is remove asserts and actually start changing address spaces with the
context switch. Since prior to this we've converted the setting of the
page tables to a streamed version, this is quite easy.

One important thing to point out (since it'd been hotly contested) is
that with this patch, every context created will have it's own address
space (provided the HW can do it).

v2: Disable BDW on rebase

NOTE: I tried to make this commit as small as possible. I needed one
place where I could "turn everything on" and that is here. It could be
split into finer commits, but I didn't really see much point.

Cc: Eric Anholt <eric@anholt.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 16:24:52 +01:00
Daniel Vetter 3d7f0f9dcc Merge commit drm-intel-fixes into topic/ppgtt
I need the tricky do_switch fix before I can merge the final piece of
the ppgtt enabling puzzle. Otherwise the conflict will be a real pain
to resolve since the do_switch hunk from -fixes must be placed at the
exact right place within a hunk in the next patch.

Conflicts:
	drivers/gpu/drm/i915/i915_gem_context.c
	drivers/gpu/drm/i915/i915_gem_execbuffer.c
	drivers/gpu/drm/i915/intel_display.c

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 16:23:37 +01:00
Ben Widawsky 41bde5535a drm/i915: Get context early in execbuf
We need to have the address space when reserving space for the objects.
Since the address space and context are tied together, and reserve
occurs before context switch (for good reason), we must lookup our
context earlier in the process.

This leaves some room for optimizations where we no longer need to use
ctx_id in certain places. This will be addressed in a subsequent patch.

Important tricky bit:
Because slow relocations during execbuffer drop struct_mutex

Perhaps it would be best to acquire the reference when we get the
context, but I'll save that for another day (note I have written the
patch before, and I found the changes required to be uglier than this).

Note that since we currently access everything via context id, and not
the data structure this is fine, though not desirable. The next change
attempts to get the context only once via the context ID idr lookup, and
as such, the following can happen:

CTX-A is created, refcount = 1
CTX-A execbuf, mutex dropped
close IOCTL called on CTX-A, refcount = 0
CTX-A resumes in execbuf.

v2: Rebased on top of
commit b6359918b8
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date:   Wed Oct 30 15:44:16 2013 +0200

    drm/i915: add i915_get_reset_stats_ioctl

v3: Rebased on top of
commit 25b3dfc87b
Author: Mika Westerberg <mika.westerberg@linux.intel.com>
Date:   Tue Nov 12 11:57:30 2013 +0200

Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date:   Tue Nov 26 16:14:33 2013 +0200

    drm/i915: check context reset stats before relocations

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 15:52:42 +01:00
Ben Widawsky 67e3d2979b drm/i915: Permit contexts on all rings
If we want to use contexts in more abstract terms (specifically with
PPGTT in mind), we need to allow them to be specified for any ring.

Since the upcoming patches will bring about the use of multiple address
spaces, and each ring needs to have an address space programmed (which
we intend to do at context switch time), we can no longer only use RCS.

With multiple rings having a last context, we must now unreference these
contexts.

NOTE: This commit requires an update to intel-gpu-tools to make it not
fail.

v2: Rebased with some logical conflicts.
Squashed in the context fini refcount patch

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 15:27:53 +01:00
Ben Widawsky ca01b12b40 drm/i915: Simplify ring handling in execbuf
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 15:27:52 +01:00
Ben Widawsky 3e7a032295 drm/i915: Remove vm arg from relocate entry
The only place we were using it was for GEN6, which won't have PPGTT
support anyway (ie. the VM is always the same). To clear things up,
(it only added confusion for me since it doesn't allow us to assert
vma->vm is what we always want, when just looking at the code).

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 15:27:50 +01:00
Ben Widawsky 6f65e29aca drm/i915: Create bind/unbind abstraction for VMAs
To sum up what goes on here, we abstract the vma binding, similarly to
the previous object binding. This helps for distinguishing legacy
binding, versus modern binding. To keep the code churn as minimal as
possible, I am leaving in insert_entries(). It serves as the per
platform pte writing basically. bind_vma and insert_entries do share a
lot of similarities, and I did have designs to combine the two, but as
mentioned already... too much churn in an already massive patchset.

What follows are the 3 commits which existed discretely in the original
submissions. Upon rebasing on Broadwell support, it became clear that
separation was not good, and only made for more error prone code. Below
are the 3 commit messages with all their history.

drm/i915: Add bind/unbind object functions to VMA
drm/i915: Use the new vm [un]bind functions
drm/i915: reduce vm->insert_entries() usage

drm/i915: Add bind/unbind object functions to VMA

As we plumb the code with more VM information, it has become more
obvious that the easiest way to deal with bind and unbind is to simply
put the function pointers in the vm, and let those choose the correct
way to handle the page table updates. This change allows many places in
the code to simply be vm->bind, and not have to worry about
distinguishing PPGTT vs GGTT.

Notice that this patch has no impact on functionality. I've decided to
save the actual change until the next patch because I think it's easier
to review that way. I'm happy to squash the two, or let Daniel do it on
merge.

v2:
Make ggtt handle the quirky aliasing ppgtt
Add flags to bind object to support above
Don't ever call bind/unbind directly for PPGTT until we have real, full
PPGTT (use NULLs to assert this)
Make sure we rebind the ggtt if there already is a ggtt binding.  This
happens on set cache levels.
Use VMA for bind/unbind (Daniel, Ben)

v3: Reorganize ggtt_vma_bind to be more concise and easier to read
(Ville). Change logic in unbind to only unbind ggtt when there is a
global mapping, and to remove a redundant check if the aliasing ppgtt
exists.

v4: Make the bind function a bit smarter about the cache levels to avoid
unnecessary multiple remaps. "I accept it is a wart, I think unifying
the pin_vma / bind_vma could be unified later" (Chris)
Removed the git notes, and put version info here. (Daniel)

v5: Update the comment to not suck (Chris)

v6:
Move bind/unbind to the VMA. It makes more sense in the VMA structure
(always has, but I was previously lazy). With this change, it will allow
us to keep a distinct insert_entries.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

drm/i915: Use the new vm [un]bind functions

Building on the last patch which created the new function pointers in
the VM for bind/unbind, here we actually put those new function pointers
to use.

Split out as a separate patch to aid in review. I'm fine with squashing
into the previous patch if people request it.

v2: Updated to address the smart ggtt which can do aliasing as needed
Make sure we bind to global gtt when mappable and fenceable. I thought
we could get away without this initialy, but we cannot.

v3: Make the global GTT binding explicitly use the ggtt VM for
bind_vma(). While at it, use the new ggtt_vma helper (Chris)

At this point the original mailing list thread diverges. ie.

v4^:
use target_obj instead of obj for gen6 relocate_entry
vma->bind_vma() can be called safely during pin. So simply do that
instead of the complicated conditionals.
Don't restore PPGTT bound objects on resume path
Bug fix in resume path for globally bound Bos
Properly handle secure dispatch
Rebased on vma bind/unbind conversion

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

drm/i915: reduce vm->insert_entries() usage

FKA: drm/i915: eliminate vm->insert_entries()

With bind/unbind function pointers in place, we no longer need
insert_entries. We could, and want, to remove clear_range, however it's
not totally easy at this point. Since it's used in a couple of place
still that don't only deal in objects: setup, ppgtt init, and restore
gtt mappings.

v2: Don't actually remove insert_entries, just limit its usage. It will
be useful when we introduce gen8. It will always be called from the vma
bind/unbind.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 15:27:50 +01:00
Ben Widawsky d7f46fc4e7 drm/i915: Make pin count per VMA
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 15:27:49 +01:00
Chris Wilson 9ae9ab5220 drm/i915: Prevent double unref following alloc failure during execbuffer
Whilst looking up the objects required for an execbuffer, an untimely
allocation failure in creating the vma results in the object being
unreferenced from two lists. The ownership during the lookup is meant to
be moved from the list of objects being looked to the vma, and this
double unreference upon error results in a use-after-free.

Fixes regression from
commit 27173f1f95
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Wed Aug 14 11:38:36 2013 +0200

    drm/i915: Convert execbuf code to use vmas

Based on the fix by Ben Widawsky.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: stable@vger.kernel.org
[danvet: Bikeshed the crucial comment above the ownership transfer as
discussed on irc.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-12 10:44:57 +01:00
Paulo Zanoni f65c916898 drm/i915: add runtime put/get calls at the basic places
If I add code to enable runtime PM on my Haswell machine, start a
desktop environment, then enable runtime PM, these functions will
complain that they're trying to read/write registers while the
graphics card is suspended.

v2: - Simplify i915_gem_fault changes.

Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
[danvet: Drop the hunk in i915_hangcheck_elapsed, it's the wrong thing
to do.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-10 22:47:33 +01:00
Mika Kuoppala d299cce76e drm/i915: check context reset stats before relocations
Doing it early prevents moving and relocating objects in vain
for contexts that won't get any GPU time.

Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-04 13:20:31 +01:00
Chris Wilson a415d35564 drm/i915: Pin relocations for the duration of constructing the execbuffer
As the execbuffer dispatch grows ever more complex and involves multiple
stages of moving objects into the aperture, we need to take greater care
that we do not evict our execbuffer objects prior to dispatch. This is
relatively simple as we can just keep the objects pinned for not just
the relocation but until we are finished.

One such example is the possibility of the context switch causing an
eviction or hitting the shrinker in order to fit its object into the
aperture.

Link: http://lists.freedesktop.org/archives/intel-gfx/2013-November/036166.html
Reported-by: "Siluvery, Arun" <arun.siluvery@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: stable@vger.kernel.org
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Add the additional explanations from Chris to the commit
message.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-11-27 09:04:36 +01:00
Ben Widawsky 5ce097254e drm/i915: Missed dropped VMA conversion
This belonged in
commit 07fe0b1280
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Wed Jul 31 17:00:10 2013 -0700

    drm/i915: plumb VM into bind/unbind code

But it was somehow missed along the way.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-11-26 10:12:32 +01:00
Ben Widawsky 17601cbc93 drm/i915: Removed unused vm args
i915_gem_execbuffer_relocate became defunct in:
commit 27173f1f95
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Wed Aug 14 11:38:36 2013 +0200

    drm/i915: Convert execbuf code to use vmas

eb_create: never used?

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: The lingering vm parameter to eb_create might have been back
from the days where we didn't yet keep both vmas and obj lists in the
eb struct. But I didn't check really.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-11-26 10:11:21 +01:00
Ben Widawsky 28cf541543 drm/i915/bdw: unleash PPGTT
v2: Squash in fix from Ben: Set PPGTT batches as necessary

This fixes the regression in the last couple of days when we enabled
PPGTT.

v3: Squash in fixup to still use GTT for secure batches from Ville:

BDW doesn't have a separate secure vs. non-secure bit in
MI_BATCH_BUFFER_START. So for secure batches we have to simply
leave the PPGTT bit unset. Fortunately older generations (except
HSW) had similar limitations so execbuffer already creates a GTT
mapping for all secure batches.

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-11-08 18:09:48 +01:00
Ben Widawsky 3c94ceeee2 drm/i915/bdw: Support 64b relocations
We don't actually return any to userspace yet, however we can pretend
like we do now so userspace will support it when it happens.

This is just to please Chris as the code itself isn't ready for > 64b
relocations.

v2: Rebase on top of the refactored relocate_entry_gtt|cpu functions.

v3: Squash in fixup from Rafal Barbalho for 64 byte relocs using cpu
relocs and those crossing a page boundary.

v4: Squash in a fixup for the fixup from Rafael.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Barbalho, Rafael <rafael.barbalho@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-11-08 18:09:41 +01:00
Ben Widawsky e2d05a8b1e drm/i915: Convert active API to VMA
Even though we track object activity and not VMA, because we have the
active_list be based on the VM, it makes the most sense to use VMAs in
the APIs.

NOTE: Daniel intends to eventually rip out active/inactive LRUs, but for
now, leave them be.

v2: Remove leftover hunk from the previous patch which didn't keep
i915_gem_object_move_to_active. That patch had to rely on the ring to
get the dev instead of the obj. (Chris)

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-01 07:45:21 +02:00
Daniel Vetter b205ca5721 drm/i915: Use unsigned for overflow checks in execbuf
There's actually no real risk since we already check for stricter
constraints earlier (using UINT_MAX / sizeof (struct
drm_i915_gem_exec_object2) as the limit). But in eb_create we use
signed integers, which steals a factor of 2. Luckily struct
drm_i915_gem_exec_object2 for this to not matter.

Still, be consistent and use unsigned integers.

Similar use unsinged integers when checking for overflows in the
relocation entry processing.

I've also added a new subtests to igt/gem_reloc_overflow to also
test for overflowing args->buffer_count values.

v2: Give the variables again tighter scope to make it clear that the
computation is purely local and doesn't leak out to the 2nd block.
Requested by Chris Wilson.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-01 07:45:03 +02:00
Daniel Vetter a1e2265332 drm/i915: Use kcalloc more
No buffer overflows here, but better safe than sorry.

v2:
- Fixup the sizeof conversion, I've missed the pointer deref (Jani).
- Drop the redundant GFP_ZERO, kcalloc alreads memsets (Jani).
- Use kmalloc_array for the execbuf fastpath to avoid the memset
  (Chris). I've opted to leave all other conversions as-is since they
  aren't in a fastpath and dealing with cleared memory instead of
  random garbage is just generally nicer.

Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
[danvet: Drop the contentious kmalloc_array hunk in execbuf.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-01 07:45:01 +02:00
Ben Widawsky 68c8c17f52 drm/i915: evict VM instead of everything
When reserving objects during execbuf, it is possible to come across an
object which will not fit given the current fragmentation of the address
space. We do not have any defragment in drm_mm, so the strategy is to
instead evict everything, and reallocate objects.

With the upcoming addition of multiple VMs, there is no point to evict
everything since doing so is overkill for the specific case mentioned
above.

Recommended-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: One additional s/evict_everything/evict_vm/ to update a
comment in the code.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-12 21:58:22 +02:00
Mika Kuoppala be62acb4cc drm/i915: ban badly behaving contexts
Now when we have mechanism in place to track which context
was guilty of hanging the gpu, it is possible to punish
for bad behaviour.

If context has recently submitted a faulty batchbuffers guilty of
gpu hang and submits another batch which hangs gpu in quick
succession, ban it permanently. If ctx is banned, no more
batchbuffers will be queued for execution.

There is no need for global wedge machinery anymore and
it would be unwise to wedge the whole gpu if we have multiple
hanging batches queued for execution. Instead just ban
the guilty ones and carry on.

v2: Store guilty ban status bool in gpu_error instead of pointers
    that might become danling before hang is declared.

v3: Use return value for banned status instead of stashing state
    into gpu_error (Chris Wilson)

v4: - rebase on top of fixed hang stats api
    - add define for ban period
    - rename commit and improve commit msg

v5: - rely context banning instead of wedging the gpu
    - beautification and fix for ban calculation (Chris)

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-06 17:55:50 +02:00
Chris Wilson 2cc86b8260 drm/i915: Always prefer CPU relocations with LLC
A follow-on to the update of the LLC coherency logic is that we can rely
on the LLC being coherent with the CS for rewriting batchbuffers
irrespective of their cache domain. (This should have no effect
currently as all the batch buffers are expected to be I915_CACHE_LLC and
so using the cpu relocation path anyway.)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-04 17:34:43 +02:00
Daniel Vetter e656a6cba0 drm/i915: inline vma_create into lookup_or_create_vma
In the execbuf code we don't clean up any vmas which ended up not
getting bound for code simplicity. To make sure that we don't end up
creating multiple vma for the same vm kill the somewhat dangerous
vma_create function and inline it into lookup_or_create.

This is just a safety measure to prevent surprises in the future.

Also update the somewhat confused comment in the execbuf code and
clarify what kind of magic is going on with a new one.

v2: Keep the function separate as requested by Chris. But give it a __
prefix for paranoia and move it tighter together with the other vma
stuff.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-04 17:34:41 +02:00
Ben Widawsky 27173f1f95 drm/i915: Convert execbuf code to use vmas
In order to transition more of our code over to using a VMA instead of
an <OBJ, VM> pair - we must have the vma accessible at execbuf time. Up
until now, we've only had a VMA when actually binding an object.

The previous patch helped handle the distinction on bound vs. unbound.
This patch will help us catch leaks, and other issues before we actually
shuffle a bunch of stuff around.

This attempts to convert all the execbuf code to speak in vmas. Since
the execbuf code is very self contained it was a nice isolated
conversion.

The meat of the code is about turning eb_objects into eb_vma, and then
wiring up the rest of the code to use vmas instead of obj, vm pairs.

Unfortunately, to do this, we must move the exec_list link from the obj
structure. This list is reused in the eviction code, so we must also
modify the eviction code to make this work.

WARNING: This patch makes an already hotly profiled path slower. The cost is
unavoidable. In reply to this mail, I will attach the extra data.

v2: Release table lock early, and two a 2 phase vma lookup to avoid
having to use a GFP_ATOMIC. (Chris)

v3: s/obj_exec_list/obj_exec_link/
Updates to address
commit 6d2b888569
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Aug 7 18:30:54 2013 +0100

    drm/i915: List objects allocated from stolen memory in debugfs

v4: Use obj = vma->obj for neatness in some places (Chris)
need_reloc_mappable() should return false if ppgtt (Chris)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Split out prep patches. Also remove a FIXME comment which is
now taken care of.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-04 17:34:41 +02:00
Daniel Vetter d4d36014ca drm/i915: fix up the relocate_entry refactoring
Somehow we've lost the error handling in the patch split-up between
the internal and external patch. This regression has been introduced
in

commit 5032d871f7
Author: Rafael Barbalho <rafael.barbalho@intel.com>
Date:   Wed Aug 21 17:10:51 2013 +0100

    drm/i915: Cleaning up the relocate entry function

This bug is exercised by igt/gem_reloc_vs_gpu/interruptible.

Cc: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-03 19:18:01 +02:00
Rafael Barbalho 5032d871f7 drm/i915: Cleaning up the relocate entry function
As the relocate entry function was getting a bit too big I've moved
the code that used to use either the cpu or the gtt to for the
relocation into two separate functions.

Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-23 14:52:31 +02:00
Chris Wilson 000433b67e drm/i915: Only do a chipset flush after a clflush
Now that we skip clflushes more often, return a boolean indicating
whether the clflush was actually performed, and only if it was do the
chipset flush. (Though on most of the architectures where the clflush will
be skipped, the chipset flush is a no-op!)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-22 13:31:34 +02:00
Chris Wilson 2c22569bba drm/i915: Update rules for writing through the LLC with the cpu
As mentioned in the previous commit, reads and writes from both the CPU
and GPU go through the LLC. This gives us coherency between the CPU and
GPU irrespective of the attribute settings either device sets. We can
use to avoid having to clflush even uncached memory.

Except for the scanout.

The scanout resides within another functional block that does not use
the LLC but reads directly from main memory. So in order to maintain
coherency with the scanout, writes to uncached memory must be flushed.
In order to optimize writes elsewhere, we start tracking whether an
framebuffer is attached to an object.

v2: Use pin_display tracking rather than fb_count (to ensure we flush
cursors as well etc) and only force the clflush along explicit writes to
the scanout paths (i.e. pin_to_display_plane and pwrite into scanout).

v3: Force the flush after hitting the slowpath in pwrite, as after
dropping the lock the object's cache domain may be invalidated. (Ville)

Based on a patch by Ville Syrjälä.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-10 11:20:49 +02:00
Ben Widawsky ca191b1313 drm/i915: mm_list is per VMA
formerly: "drm/i915: Create VMAs (part 5) - move mm_list"

The mm_list is used for the active/inactive LRUs. Since those LRUs are
per address space, the link should be per VMx .

Because we'll only ever have 1 VMA before this point, it's not incorrect
to defer this change until this point in the patch series, and doing it
here makes the change much easier to understand.

Shamelessly manipulated out of Daniel:
"active/inactive stuff is used by eviction when we run out of address
space, so needs to be per-vma and per-address space. Bound/unbound otoh
is used by the shrinker which only cares about the amount of memory used
and not one bit about in which address space this memory is all used in.
Of course to actual kick out an object we need to unbind it from every
address space, but for that we have the per-object list of vmas."

v2: only bump GGTT LRU in i915_gem_object_set_to_gtt_domain (Chris)

v3: Moved earlier in the series

v4: Add dropped message from v3

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Frob patch to apply and use vma->node.size directly as
discused with Ben. Also drop a needles BUG_ON before move_to_inactive,
the function itself has the same check.]
[danvet 2nd: Rebase on top of the lost "drm/i915: Cleanup more of VMA
in destroy", specifically unlink the vma from the mm_list in
vma_unbind (to keep it symmetric with bind_to_vm) instead of
vma_destroy.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-08 14:06:58 +02:00
Ben Widawsky 9843877d10 drm/i915: turn bound_ggtt checks to bound_any
In some places, we want to know if an object is bound in any address
space, and not just the global GTT. This often applies when there is a
single global resource (object, pages, etc.)

function                             |      reason
--------------------------------------------------
i915_gem_object_is_inactive          | global object
i915_gem_object_put_pages            | object's pages
915_gem_object_unpin                 | global object
i915_gem_execbuffer_unreserve_object | temporary until we plumb vma
pread/pwrite                         | see the note below

Note: set_to_gtt_domain in pwrite/pread is abused as a wait_rendering
call - but that once only worked if the object is bound. We really
should replace this with a plain wait_rendering call, which would have
the upside that in pread it would be clearer that we actually only
wait for oustanding gpu writes.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Explain the set_to_gtt_domain in pwrite/pread and volunteer
Ben to replace those with wait_rendering calls.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-08 14:04:43 +02:00
Ben Widawsky 07fe0b1280 drm/i915: plumb VM into bind/unbind code
As alluded to in several patches, and it will be reiterated later... A
VMA is an abstraction for a GEM BO bound into an address space.
Therefore it stands to reason, that the existing bind, and unbind are
the ones which will be the most impacted. This patch implements this,
and updates all callers which weren't already updated in the series
(because it was too messy).

This patch represents the bulk of an earlier, larger patch. I've pulled
out a bunch of things by the request of Daniel. The history is preserved
for posterity with the email convention of ">" One big change from the
original patch aside from a bunch of cropping is I've created an
i915_vma_unbind() function. That is because we always have the VMA
anyway, and doing an extra lookup is useful. There is a caveat, we
retain an i915_gem_object_ggtt_unbind, for the global cases which might
not talk in VMAs.

> drm/i915: plumb VM into object operations
>
> This patch was formerly known as:
> "drm/i915: Create VMAs (part 3) - plumbing"
>
> This patch adds a VM argument, bind/unbind, and the object
> offset/size/color getters/setters. It preserves the old ggtt helper
> functions because things still need, and will continue to need them.
>
> Some code will still need to be ported over after this.
>
> v2: Fix purge to pick an object and unbind all vmas
> This was doable because of the global bound list change.
>
> v3: With the commit to actually pin/unpin pages in place, there is no
> longer a need to check if unbind succeeded before calling put_pages().
> Make put_pages only BUG() after checking pin count.
>
> v4: Rebased on top of the new hangcheck work by Mika
> plumbed eb_destroy also
> Many checkpatch related fixes
>
> v5: Very large rebase
>
> v6:
> Change BUG_ON to WARN_ON (Daniel)
> Rename vm to ggtt in preallocate stolen, since it is always ggtt when
> dealing with stolen memory. (Daniel)
> list_for_each will short-circuit already (Daniel)
> remove superflous space (Daniel)
> Use per object list of vmas (Daniel)
> Make obj_bound_any() use obj_bound for each vm (Ben)
> s/bind_to_gtt/bind_to_vm/ (Ben)
>
> Fixed up the inactive shrinker. As Daniel noticed the code could
> potentially count the same object multiple times. While it's not
> possible in the current case, since 1 object can only ever be bound into
> 1 address space thus far - we may as well try to get something more
> future proof in place now. With a prep patch before this to switch over
> to using the bound list + inactive check, we're now able to carry that
> forward for every address space an object is bound into.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Rebase on top of the loss of "drm/i915: Cleanup more of VMA
in destroy".]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-08 14:04:20 +02:00
Ben Widawsky 28d6a7bfa2 drm/i915: thread address space through execbuf
This represents the first half of hooking up VMs to execbuf. Here we
basically pass an address space all around to the different internal
functions. It should be much more readable, and have less risk than the
second half, which begins switching over to using VMAs instead of an
obj,vm.

The overall series echoes this style of, "add a VM, then make it smart
later"

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Switch a BUG_ON to WARN_ON.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-05 19:04:11 +02:00
Ben Widawsky c37e220461 drm/i915: Add VM to pin
To verbalize it, one can say, "pin an object into the given address
space." The semantics of pinning remain the same otherwise.

Certain objects will always have to be bound into the global GTT.
Therefore, global GTT is a special case, and keep a special interface
around for it (i915_gem_obj_ggtt_pin).

v2: s/i915_gem_ggtt_pin/i915_gem_obj_ggtt_pin

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-05 19:04:09 +02:00
Chris Wilson de51f04f06 drm/i915: Replace open-coded offset_in_page()
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-26 19:45:11 +02:00
Xiong Zhang 0b74b508f7 drm/i915: add prefault_disable module option
prefault is stll enabled by default which prevent most of pwrite/pread/reloc
from running slow path, in order to verify these slow pathes, prefault need
to be disabled.

Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
[danvet: Make checkpatch happy and bikeshed the module option help
text a bit.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-19 09:29:26 +02:00
Chris Wilson e852096986 drm/i915: Replace open-coding of DEFAULT_CONTEXT_ID
The intent of the check is made more clear if we use the proper name for
0 here.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-16 10:40:40 +02:00
Daniel Vetter db1b76ca6a drm/i915: don't frob mm.suspended when not using ums
In kernel modeset driver mode we're in full control of the chip,
always. So there's no need at all to set mm.suspended in
i915_gem_idle. Hence move that out into the leavevt ioctl. Since
i915_gem_idle doesn't suspend gem any more we can also drop the
re-enabling for KMS in the thaw function.

Also clean up the handling of mm.suspend at driver load by coalescing
all the assignments.

Stumbled over while reading through our resume code for unrelated
reasons.

v2: Shovel mm.suspended into the (newly created) ums dungeon as
suggested by Chris Wilson. The plan is that once we've completely
stopped relying on the register save/restore code we could shovel even
that in there.

v3: Improve the locking for the entervt/leavevt ioctls a bit by moving
the dev->struct_mutex locking outside of i915_gem_idle. Also don't
clear dev_priv->ums.mm_suspended for the kms case, we allocate it with
kzalloc. Both suggested by Chris Wilson.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v2)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-10 14:30:25 +02:00
Ben Widawsky f343c5f647 drm/i915: Getter/setter for object attributes
Soon we want to gut a lot of our existing assumptions how many address
spaces an object can live in, and in doing so, embed the drm_mm_node in
the object (and later the VMA).

It's possible in the future we'll want to add more getter/setter
methods, but for now this is enough to enable the VMAs.

v2: Reworked commit message (Ben)
Added comments to the main functions (Ben)
sed -i "s/i915_gem_obj_set_color/i915_gem_obj_ggtt_set_color/" drivers/gpu/drm/i915/*.[ch]
sed -i "s/i915_gem_obj_bound/i915_gem_obj_ggtt_bound/" drivers/gpu/drm/i915/*.[ch]
sed -i "s/i915_gem_obj_size/i915_gem_obj_ggtt_size/" drivers/gpu/drm/i915/*.[ch]
sed -i "s/i915_gem_obj_offset/i915_gem_obj_ggtt_offset/" drivers/gpu/drm/i915/*.[ch]
(Daniel)

v3: Rebased on new reserve_node patch
Changed DRM_DEBUG_KMS to actually work (will need fixing later)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-08 22:04:34 +02:00
Mika Kuoppala 7d736f4f0b drm/i915: add batch bo to i915_add_request()
In order to track down a batch buffer and context which
caused the ring to hang, store reference to bo into the request struct.
Request can also cause gpu to hang after the batch in the flush section
in the ring. To detect this add start of the flush portion offset into the
request.

v2: Included comment about request vs batch_obj lifetimes (Chris Wilson)

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-06-13 17:42:16 +02:00
Mika Kuoppala 0025c0772d drm/i915: change i915_add_request to macro
Only execbuffer needed all the parameters on i915_add_request().
By putting __i915_add_request behind macro, all current callsites
become cleaner. Following patch will introduce a new parameter
for __i915_add_request. With this patch, only the relevant callsite
will reflect the change making commit smaller and easier to understand.

v2: _i915_add_request as function name (Chris Wilson)

v3: change name __i915_add_request and fix ordering of params (Ben Widawsky)

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-06-13 17:42:15 +02:00
Chris Wilson c65355bbef drm/i915: Track when we dirty the scanout with render commands
This is required for tracking render damage for use with FBC and will be
used in subsequent patches.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-06-07 17:56:45 +02:00
Xiang, Haihao 82f91b6e93 drm/i915: add I915_EXEC_VEBOX to i915_gem_do_execbuffer()
A user can run batchbuffer via VEBOX ring.

Signed-off-by: Xiang, Haihao <haihao.xiang@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-05-31 20:54:21 +02:00
Dave Airlie 399403c7ce Merge tag 'drm-intel-next-2013-03-23' of git://people.freedesktop.org/~danvet/drm-intel into drm-next
Daniel writes:
Highlights:
- Imre's for_each_sg_pages rework (now also with the stolen mem backed
  case fixed with a hack) plus the drm prime sg list coalescing patch from
  Rahul Sharma. I have some follow-up cleanups pending, already acked by
  Andrew Morton.
- Some prep-work for the crazy no-pch/display-less platform by Ben.
- Some vlv patches, by far not all (Jesse et al).
- Clean up the HDMI/SDVO #define confusion (Paulo)
- gen2-4 vblank fixes from Ville.
- Unclaimed register warning fixes for hsw (Paulo). More still to come ...
- Complete pageflips which have been stuck in a gpu hang, should prevent
  stuck gl compositors (Ville).
- pm patches for vt-switchless resume (Jesse). Note that the i915 enabling
  is not (yet) included, that took a bit longer to settle. PM patches are
  acked by Rafael Wysocki.
- Minor fixlets all over from various people.

* tag 'drm-intel-next-2013-03-23' of git://people.freedesktop.org/~danvet/drm-intel: (79 commits)
  drm/i915: Implement WaSwitchSolVfFArbitrationPriority
  drm/i915: Set the VIC in AVI infoframe for SDVO
  drm/i915: Kill a strange comment about DPMS functions
  drm/i915: Correct sandybrige overclocking
  drm/i915: Introduce GEN7_FEATURES for device info
  drm/i915: Move num_pipes to intel info
  drm/i915: fixup pd vs pt confusion in gen6 ppgtt code
  style nit: Align function parameter continuation properly.
  drm/i915: VLV doesn't have HDMI on port C
  drm/i915: DSPFW and BLC regs are in the display offset range
  drm/i915: set conservative clock gating values on VLV v2
  drm/i915: fix WaDisablePSDDualDispatchEnable on VLV v2
  drm/i915: add more VLV IDs
  drm/i915: use VLV DIP routines on VLV v2
  drm/i915: add media well to VLV force wake routines v2
  drm/i915: don't use plane pipe select on VLV
  drm: modify pages_to_sg prime helper to create optimized SG table
  drm/i915: use for_each_sg_page for setting up the gtt ptes
  drm/i915: create compact dma scatter lists for gem objects
  drm/i915: handle walking compact dma scatter lists
  ...
2013-04-05 10:18:13 +10:00
Lauri Kasanen 27b7c63a7c drm/i915: Fix build failure
ERROR: "__build_bug_on_failed" [drivers/gpu/drm/i915/i915.ko] undefined!

Originally reported at http://www.gossamer-threads.com/lists/linux/kernel/1631803
FDO bug #62775

This needs to be backported to both 3.7 and 3.8 stable trees. Doesn't apply straight,
but it's a quick change.

Signed-off-by: Lauri Kasanen <cand@gmx.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62775
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-03-27 15:05:42 +01:00
Ben Widawsky 41fda59682 drm/i915: Remove unneeded dev argument
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-03-18 03:03:19 +01:00
Ben Widawsky cf144969d5 drm/i915: Remove unused file arg from execbuf
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-03-18 03:02:53 +01:00
Kees Cook 3118a4f652 drm/i915: bounds check execbuffer relocation count
It is possible to wrap the counter used to allocate the buffer for
relocation copies. This could lead to heap writing overflows.

CVE-2013-0913

v3: collapse test, improve comment
v2: move check into validate_exec_list

Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Pinkie Pie
Cc: stable@vger.kernel.org
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-03-13 21:31:03 +01:00
Kees Cook 3058753583 drm/i915: clarify reasoning for the access_ok call
This clarifies the comment above the access_ok check so a missing
VERIFY_READ doesn't alarm anyone.

v2:
 - rewrote comment, thanks to Chris Wilson

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
[danvet: add patch history log to commit message.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-03-13 21:17:28 +01:00
Ville Syrjälä 2bb4629add drm/i915: Add to_user_ptr()
to_user_ptr() simply casts a pointer passed as u64 from user space
to void __user * correctly. Using this lets us get rid of all the
tiresome casts.

The idea came from Chris Wilson <chris@chris-wilson.co.uk>.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-03-03 19:49:11 +01:00
Dave Airlie 6dc1c49da6 Merge branch 'fbcon-locking-fixes' of ssh://people.freedesktop.org/~airlied/linux into drm-next
This pulls in most of Linus tree up to -rc6, this fixes the worst lockdep
reported issues and re-enables fbcon lockdep.

(not the fbcon maintainer)
* 'fbcon-locking-fixes' of ssh://people.freedesktop.org/~airlied/linux: (529 commits)
  Revert "Revert "console: implement lockdep support for console_lock""
  fbcon: fix locking harder
  fb: Yet another band-aid for fixing lockdep mess
  fb: rework locking to fix lock ordering on takeover
2013-02-08 12:10:18 +10:00
Ben Widawsky 5d4545aef5 drm/i915: Create a gtt structure
The purpose of the gtt structure is to help isolate our gtt specific
properties from the rest of the code (in doing so it help us finish the
isolation from the AGP connection).

The following members are pulled out (and renamed):
gtt_start
gtt_total
gtt_mappable_end
gtt_mappable
gtt_base_addr
gsm

The gtt structure will serve as a nice place to put gen specific gtt
routines in upcoming patches. As far as what else I feel belongs in this
structure: it is meant to encapsulate the GTT's physical properties.
This is why I've not added fields which track various drm_mm properties,
or things like gtt_mtrr (which is itself a pretty transient field).

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
[Ben modified commit messages]
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-01-17 22:33:56 +01:00
Chris Wilson eef90ccb8a drm/i915: Use the reloc.handle as an index into the execbuffer array
Using copywinwin10 as an example that is dependent upon emitting a lot
of relocations (2 per operation), we see improvements of:

c2d/gm45: 618000.0/sec to 623000.0/sec.
i3-330m: 748000.0/sec to 789000.0/sec.

(measured relative to a baseline with neither optimisations applied).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-01-17 22:23:47 +01:00
Daniel Vetter ed5982e6ce drm/i915: Allow userspace to hint that the relocations were known
Userspace is able to hint to the kernel that its command stream and
auxiliary state buffers already hold the correct presumed addresses and
so the relocation process may be skipped if the kernel does not need to
move any buffers in preparation for the execbuffer. Thus for the common
case where the allotment of buffers is static between batches, we can
avoid the overhead of individually checking the relocation entries.

Note that this requires userspace to supply the domain tracking and
requests for workarounds itself that would otherwise be computed based
upon the relocation entries.

Using copywinwin10 as an example that is dependent upon emitting a lot
of relocations (2 per operation), we see improvements of:

c2d/gm45: 618000.0/sec to 632000.0/sec.
i3-330m: 748000.0/sec to 830000.0/sec.

(measured relative to a baseline with neither optimisations applied).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Imre Deak <imre.deak@intel.com>
[danvet: Fixup merge conflict in userspace header due to different
baseline trees.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-01-17 22:23:36 +01:00
Chris Wilson bcffc3faa6 drm/i915: Move the execbuffer objects list from the stack into the tracker
Instead of passing around the eb-objects hashtable and a separate object
list, we can include the object list into the eb-objects structure for
convenience.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-01-17 22:08:02 +01:00
Chris Wilson 3b96eff447 drm/i915: Take the handle idr spinlock once for looking up the exec objects
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-01-17 22:08:01 +01:00
Chris Wilson 419fa72a19 drm/i915: Mark a temporary allocation for copy-from-user as such
The difference is that the kernel will then know that this memory will
be reclaimable in the near future.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-01-17 22:08:00 +01:00
Dave Airlie b5cc6c0387 Merge tag 'drm-intel-next-2012-12-21' of git://people.freedesktop.org/~danvet/drm-intel into drm-next
Daniel writes:
- seqno wrap fixes and debug infrastructure from Mika Kuoppala and Chris
  Wilson
- some leftover kill-agp on gen6+ patches from Ben
- hotplug improvements from Damien
- clear fb when allocated from stolen, avoids dirt on the fbcon (Chris)
- Stolen mem support from Chris Wilson, one of the many steps to get to
  real fastboot support.
- Some DDI code cleanups from Paulo.
- Some refactorings around lvds and dp code.
- some random little bits&pieces

* tag 'drm-intel-next-2012-12-21' of git://people.freedesktop.org/~danvet/drm-intel: (93 commits)
  drm/i915: Return the real error code from intel_set_mode()
  drm/i915: Make GSM void
  drm/i915: Move GSM mapping into dev_priv
  drm/i915: Move even more gtt code to i915_gem_gtt
  drm/i915: Make next_seqno debugs entry to use i915_gem_set_seqno
  drm/i915: Introduce i915_gem_set_seqno()
  drm/i915: Always clear semaphore mboxes on seqno wrap
  drm/i915: Initialize hardware semaphore state on ring init
  drm/i915: Introduce ring set_seqno
  drm/i915: Missed conversion to gtt_pte_t
  drm/i915: Bug on unsupported swizzled platforms
  drm/i915: BUG() if fences are used on unsupported platform
  drm/i915: fixup overlay stolen memory leak
  drm/i915: clean up PIPECONF bpc #defines
  drm/i915: add intel_dp_set_signal_levels
  drm/i915: remove leftover display.update_wm assignment
  drm/i915: check for the PCH when setting pch_transcoder
  drm/i915: Clear the stolen fb before enabling
  drm/i915: Access to snooped system memory through the GTT is incoherent
  drm/i915: Remove stale comment about intel_dp_detect()
  ...

Conflicts:
	drivers/gpu/drm/i915/intel_display.c
2013-01-17 20:34:08 +10:00
Chris Wilson 262b6d363f drm/i915: Invalidate the relocation presumed_offsets along the slow path
In the slow path, we are forced to copy the relocations prior to
acquiring the struct mutex in order to handle pagefaults. We forgo
copying the new offsets back into the relocation entries in order to
prevent a recursive locking bug should we trigger a pagefault whilst
holding the mutex for the reservations of the execbuffer. Therefore, we
need to reset the presumed_offsets just in case the objects are rebound
back into their old locations after relocating for this exexbuffer - if
that were to happen we would assume the relocations were valid and leave
the actual pointers to the kernels dangling, instant hang.

Fixes regression from commit bcf50e2775
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun Nov 21 22:07:12 2010 +0000

    drm/i915: Handle pagefaults in execbuffer user relocations

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55984
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@fwll.ch>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-01-16 10:53:38 +01:00
Daniel Vetter b45305fce5 drm/i915: Implement workaround for broken CS tlb on i830/845
Now that Chris Wilson demonstrated that the key for stability on early
gen 2 is to simple _never_ exchange the physical backing storage of
batch buffers I've tried a stab at a kernel solution. Doesn't look too
nefarious imho, now that I don't try to be too clever for my own good
any more.

v2: After discussing the various techniques, we've decided to always blit
batches on the suspect devices, but allow userspace to opt out of the
kernel workaround assume full responsibility for providing coherent
batches. The principal reason is that avoiding the blit does improve
performance in a few key microbenchmarks and also in cairo-trace
replays.

Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet:
- Drop the hunk which uses HAS_BROKEN_CS_TLB to implement the ring
  wrap w/a. Suggested by Chris Wilson.
- Also add the ACTHD check from Chris Wilson for the error state
  dumping, so that we still catch batches when userspace opts out of
  the w/a.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-17 17:27:02 +01:00
Chris Wilson c1f093e09c drm/i915: Remove check for conflicting relocation write-domains
Simply use the last write-domain set for the object in the batch,
trusting userspace to have correctly flushed the caches between usage as
a write target. This check dates back from the golden age of having only
a single operation per batch with the kernel repeating it for each
cliprect, and conflicts both with userspace trying to efficiently batch
multiple operations and with reducing the kernel overhead of relocation
processing.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-03 20:13:16 +01:00
Ville Syrjälä ca9c46c5c7 drm/i915: Kill i915_gem_execbuffer_wait_for_flips()
As per Chris Wilson's suggestion make
i915_gem_execbuffer_wait_for_flips() go away.

This was used to stall the GPU ring while there are pending
page flips involving the relevant BO. Ie. while the BO is still
being scanned out by the display controller.

The recommended alternative is to use the page flip events to
wait for the page flips to fully complete before reusing the BO
of the old front buffer. Or use more buffers.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: don't remove obj->pending_flips, still required due to
reorder patches.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-11-29 11:58:15 +01:00
Chris Wilson 9d7730914f drm/i915: Preallocate next seqno before touching the ring
Based on the work by Mika Kuoppala, we realised that we need to handle
seqno wraparound prior to committing our changes to the ring. The most
obvious point then is to grab the seqno inside intel_ring_begin(), and
then to reuse that seqno for all ring operations until the next request.
As intel_ring_begin() can fail, the callers must already be prepared to
handle such failure and so we can safely add further checks.

This patch looks like it should be split up into the interface
changes and the tweaks to move seqno wrapping from the execbuffer into
the core seqno increment. However, I found no easy way to break it into
incremental steps without introducing further broken behaviour.

v2: Mika found a silly mistake and a subtle error in the existing code;
inside i915_gem_retire_requests() we were resetting the sync_seqno of
the target ring based on the seqno from this ring - which are only
related by the order of their allocation, not retirement. Hence we were
applying the optimisation that the rings were synchronised too early,
fortunately the only real casualty there is the handling of seqno
wrapping.

v3: Do not forget to reset the sync_seqno upon module reinitialisation,
ala resume.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=863861
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> [v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-11-29 11:43:52 +01:00
Chris Wilson be7cb6347e drm/i915: Remove bogus test for a present execbuffer
The intention of checking obj->gtt_offset!=0 is to verify that the
target object was listed in the execbuffer and had been bound into the
GTT. This is guarranteed by the earlier rearrangement to split the
execbuffer operation into reserve and relocation phases and then
verified by the check that the target handle had been processed during
the reservation phase.

However, the actual checking of obj->gtt_offset==0 is bogus as we can
indeed reference an object at offset 0. For instance, the framebuffer
installed by the BIOS often resides at offset 0 - causing EINVAL as we
legimately try to render using the stolen fb.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-11-21 17:45:03 +01:00
Ben Widawsky e76e9aebcd drm/i915: Stop using AGP layer for GEN6+
As a quick hack we make the old intel_gtt structure mutable so we can
fool a bunch of the existing code which depends on elements in that data
structure. We can/should try to remove this in a subsequent patch.

This should preserve the old gtt init behavior which upon writing these
patches seems incorrect. The next patch will fix these things.

The one exception is VLV which doesn't have the preserved flush control
write behavior. Since we want to do that for all GEN6+ stuff, we'll
handle that in a later patch. Mainstream VLV support doesn't actually
exist yet anyway.

v2: Update the comment to remove the "voodoo"
Check that the last pte written matches what we readback

v3: actually kill cache_level_to_agp_type since most of the flags will
disappear in an upcoming patch

v4: v3 was actually not what we wanted (Daniel)
Make the ggtt bind assertions better and stricter (Chris)
Fix some uncaught errors at gtt init (Chris)
Some other random stuff that Chris wanted

v5: check for i==0 in gen6_ggtt_bind_object to shut up gcc (Ben)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by [v4]: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Make the cache_level -> agp_flags conversion for pre-gen6 a
tad more robust by mapping everything != CACHE_NONE to the cached agp
flag - we have a 1:1 uncached mapping, but different modes of
cacheable (at least on later generations). Suggested by Chris Wilson.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-11-11 23:51:42 +01:00
Daniel Vetter c2fb791692 Linux 3.7-rc2
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJQgvdwAAoJEHm+PkMAQRiG+3AH/i2XsqqN3VctL0nnbWfvds+Q
 aKulfIdJTjKiVAsawPUtRqReZ8ijiebrgA/53lZLlrFOoPPQ5+LHmnSyQF6gErOY
 NuAE1lijXDRM1pwBlhvOBbAj26wUobGjqONFJ9OkKr758Ue8ds/Q7UdxyEgmYgmg
 tvVMzfRcICzryUV3PcqL+3cNPpCUdT6wGGRJ9DCv/jvGiWKExWhOle5oltrmxk+D
 NsqRcws5pEubfHE4J8BvNWr8lE1kHfYVhrJETiLJUiN2XAJcbI4Jy7rU/3EGteNS
 0HMZdaPPjV874lohdM70X2225SbYrCVkAYB5hnZCTeC3tYyCawBBPMQoyAiOcmU=
 =+861
 -----END PGP SIGNATURE-----

Merge tag 'v3.7-rc2' into drm-intel-next-queued

Linux 3.7-rc2

Backmerge to solve two ugly conflicts:
- uapi. We've already added new ioctl definitions for -next. Do I need to say more?
- wc support gtt ptes. We've had to revert this for snb+ for 3.7 and
  also fix a few other things in the code. Now we know how to make it
  work on snb+, but to avoid losing the other fixes do the backmerge
  first before re-enabling wc gtt ptes on snb+.

And a few other minor things, among them git getting confused in
intel_dp.c and seemingly causing a conflict out of nothing ...

Conflicts:
	drivers/gpu/drm/i915/i915_reg.h
	drivers/gpu/drm/i915/intel_display.c
	drivers/gpu/drm/i915/intel_dp.c
	drivers/gpu/drm/i915/intel_modes.c
	include/drm/i915_drm.h

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-10-22 14:34:51 +02:00
Chris Wilson d7d4eeddb8 drm/i915: Allow DRM_ROOT_ONLY|DRM_MASTER to submit privileged batchbuffers
With the introduction of per-process GTT space, the hardware designers
thought it wise to also limit the ability to write to MMIO space to only
a "secure" batch buffer. The ability to rewrite registers is the only
way to program the hardware to perform certain operations like scanline
waits (required for tear-free windowed updates). So we either have a
choice of adding an interface to perform those synchronized updates
inside the kernel, or we permit certain processes the ability to write
to the "safe" registers from within its command stream. This patch
exposes the ability to submit a SECURE batch buffer to
DRM_ROOT_ONLY|DRM_MASTER processes.

v2: Haswell split up bit8 into a ppgtt bit (still bit8) and a security
bit (bit 13, accidentally not set). Also add a comment explaining why
secure batches need a global gtt binding.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
[danvet: added hsw fixup.]
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-10-17 21:06:59 +02:00
Linus Torvalds 612a9aab56 Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
Pull drm merge (part 1) from Dave Airlie:
 "So first of all my tree and uapi stuff has a conflict mess, its my
  fault as the nouveau stuff didn't hit -next as were trying to rebase
  regressions out of it before we merged.

  Highlights:
   - SH mobile modesetting driver and associated helpers
   - some DRM core documentation
   - i915 modesetting rework, haswell hdmi, haswell and vlv fixes, write
     combined pte writing, ilk rc6 support,
   - nouveau: major driver rework into a hw core driver, makes features
     like SLI a lot saner to implement,
   - psb: add eDP/DP support for Cedarview
   - radeon: 2 layer page tables, async VM pte updates, better PLL
     selection for > 2 screens, better ACPI interactions

  The rest is general grab bag of fixes.

  So why part 1? well I have the exynos pull req which came in a bit
  late but was waiting for me to do something they shouldn't have and it
  looks fairly safe, and David Howells has some more header cleanups
  he'd like me to pull, that seem like a good idea, but I'd like to get
  this merge out of the way so -next dosen't get blocked."

Tons of conflicts mostly due to silly include line changes, but mostly
mindless.  A few other small semantic conflicts too, noted from Dave's
pre-merged branch.

* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (447 commits)
  drm/nv98/crypt: fix fuc build with latest envyas
  drm/nouveau/devinit: fixup various issues with subdev ctor/init ordering
  drm/nv41/vm: fix and enable use of "real" pciegart
  drm/nv44/vm: fix and enable use of "real" pciegart
  drm/nv04/dmaobj: fixup vm target handling in preparation for nv4x pcie
  drm/nouveau: store supported dma mask in vmmgr
  drm/nvc0/ibus: initial implementation of subdev
  drm/nouveau/therm: add support for fan-control modes
  drm/nouveau/hwmon: rename pwm0* to pmw1* to follow hwmon's rules
  drm/nouveau/therm: calculate the pwm divisor on nv50+
  drm/nouveau/fan: rewrite the fan tachometer driver to get more precision, faster
  drm/nouveau/therm: move thermal-related functions to the therm subdev
  drm/nouveau/bios: parse the pwm divisor from the perf table
  drm/nouveau/therm: use the EXTDEV table to detect i2c monitoring devices
  drm/nouveau/therm: rework thermal table parsing
  drm/nouveau/gpio: expose the PWM/TOGGLE parameter found in the gpio vbios table
  drm/nouveau: fix pm initialization order
  drm/nouveau/bios: check that fixed tvdac gpio data is valid before using it
  drm/nouveau: log channel debug/error messages from client object rather than drm client
  drm/nouveau: have drm debugging macros build on top of core macros
  ...
2012-10-03 23:29:23 -07:00
David Howells 760285e7e7 UAPI: (Scripted) Convert #include "..." to #include <path/...> in drivers/gpu/
Convert #include "..." to #include <path/...> in drivers/gpu/.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dave Jones <davej@redhat.com>
2012-10-02 18:01:07 +01:00
David Howells 4126d5d61f UAPI: (Scripted) Remove redundant DRM UAPI header #inclusions from drivers/gpu/.
Remove redundant DRM UAPI header #inclusions from drivers/gpu/.

Remove redundant #inclusions of core DRM UAPI headers (drm.h, drm_mode.h and
drm_sarea.h).  They are now #included via drmP.h and drm_crtc.h via a preceding
patch.

Without this patch and the patch to make include the UAPI headers from the core
headers, after the UAPI split, the DRM C sources cannot find these UAPI headers
because the DRM code relies on specific -I flags to make #include "..."  work
on headers in include/drm/ - but that does not work after the UAPI split without
adding more -I flags.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dave Jones <davej@redhat.com>
2012-10-02 18:01:05 +01:00
Chris Wilson 41783eea1a drm/i915: Assert that the exec object lookup table is a power-of-two
As we make the simplification of using a power-of-two size for the
execbuffer handle-to-object TLB, we should validate that this is actually
true and so clarify that premise.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-09-20 14:23:11 +02:00
Chris Wilson ba7a64587b drm/i915: Drop the misleading cast to the wrong user pointer type
The exec_list is of type drm_i915_gem_exec_object2 and so casting it to
a drm_i915_gem_relocation_entry is very confusing!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-09-20 14:23:05 +02:00
Chris Wilson 9da3da660d drm/i915: Replace the array of pages with a scatterlist
Rather than have multiple data structures for describing our page layout
in conjunction with the array of pages, we can migrate all users over to
a scatterlist.

One major advantage, other than unifying the page tracking structures,
this offers is that we replace the vmalloc'ed array (which can be up to
a megabyte in size) with a chain of individual pages which helps reduce
memory pressure.

The disadvantage is that we then do not have a simple array to iterate,
or to access randomly. The common case for this is in the relocation
processing, which will typically fit within a single scatterlist page
and so be almost the same cost as the simple array. For iterating over
the array, the extra function call could be optimised away, but in
reality is an insignificant cost of either binding the pages, or
performing the pwrite/pread.

v2: Fix drm_clflush_sg() to not invoke wbinvd as well! And fix the
trivial compile error from rebasing.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-09-20 14:22:57 +02:00
Chris Wilson 7788a76520 drm/i915: Avoid unbinding due to an interrupted pin_and_fence during execbuffer
If we need to stall in order to complete the pin_and_fence operation
during execbuffer reservation, there is a high likelihood that the
operation will be interrupted by a signal (thanks X!). In order to
simplify the cleanup along that error path, the object was
unconditionally unbound and the error propagated. However, being
interrupted here is far more common than I would like and so we can
strive to avoid the extra work by eliminating the forced unbind.

v2: In discussion over the indecent colour of the new functions and
unwind path, we realised that we can use the new unreserve function to
clean up the code even further.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-24 21:02:51 +02:00
Chris Wilson 504c7267a1 drm/i915: Use cpu relocations if the object is in the GTT but not mappable
This prevents the case of unbinding the object in order to process the
relocations through the GTT and then rebinding it only to then proceed
to use cpu relocations as the object is now in the CPU write domain. By
choosing to use cpu relocations up front, we can therefore avoid the
rebind penalty.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-24 02:29:16 +02:00
Chris Wilson 86a1ee26bb drm/i915: Only pwrite through the GTT if there is space in the aperture
Avoid stalling and waiting for the GPU by checking to see if there is
sufficient inactive space in the aperture for us to bind the buffer
prior to writing through the GTT. If there is inadequate space we will
have to stall waiting for the GPU, and incur overheads moving objects
about. Instead, only incur the clflush overhead on the target object by
writing through shmem.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-24 02:03:33 +02:00
Chris Wilson 6c085a728c drm/i915: Track unbound pages
When dealing with a working set larger than the GATT, or even the
mappable aperture when touching through the GTT, we end up with evicting
objects only to rebind them at a new offset again later. Moving an
object into and out of the GTT requires clflushing the pages, thus
causing a double-clflush penalty for rebinding.

To avoid having to clflush on rebinding, we can track the pages as they
are evicted from the GTT and only relinquish those pages on memory
pressure.

As usual, if it were not for the handling of out-of-memory condition and
having to manually shrink our own bo caches, it would be a net reduction
of code. Alas.

Note: The patch also contains a few changes to the last-hope
evict_everything logic in i916_gem_execbuffer.c - we no longer try to
only evict the purgeable stuff in a first try (since that's superflous
and only helps in OOM corner-cases, not fragmented-gtt trashing
situations).

Also, the extraction of the get_pages retry loop from bind_to_gtt (and
other callsites) to get_pages should imo have been a separate patch.

v2: Ditch the newly added put_pages (for unbound objects only) in
i915_gem_reset. A quick irc discussion hasn't revealed any important
reason for this, so if we need this, I'd like to have a git blame'able
explanation for it.

v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Split out code movements and rant a bit in the commit message
with a few Notes. Done v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-21 14:34:11 +02:00
Daniel Vetter a22ddff8be Linux 3.6-rc2
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iQEcBAABAgAGBQJQLWtvAAoJEHm+PkMAQRiG/DYH+wd0FqfEuYkYk4KPyAPuhKpX
 zX7HYfLvyJE/ZYIdrhjq1E6Xm2KNr7gtX7/Rdzi2W38M9sjbYzwG1UGIw51qnxWy
 yZJH9BGkfyQgQPeuDGohfB6DkDy2JWr2eqMDvakjOwgBsIzji0PQD/f3UvndhtUa
 c+tTj/kjavHE1Yr2Wy6OnRZz3Uc0hIMn/Q0JqtbCs3LUgEV1KA4OEAe56XNz4Ku4
 WE+FFaGFPvtriQsQON+ohPS5IC8jzQGK/0vbrJ4lWjFnZy4gvZXnborTOwD0WSQG
 fbsNuxp1AaM2/pqfMwXm1w0ADvwOITHNiwwXf9id6DoK81QwTFpUdvKpn6yB6gQ=
 =rurr
 -----END PGP SIGNATURE-----

Merge tag 'v3.6-rc2' into drm-intel-next

Backmerge Linux 3.6-rc2 to resolve a few funny conflicts before we put
even more madness on top:

- drivers/gpu/drm/i915/i915_irq.c: Just a spurious WARN removed in
  -fixes, that has been changed in a variable-rename in -next, too.

- drivers/gpu/drm/i915/intel_ringbuffer.c: -next remove scratch_addr
  (since all their users have been extracted in another fucntion),
  -fixes added another user for a hw workaroudn.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-17 09:01:08 +02:00
Eric Anholt e844b990b1 drm/i915: Don't forget to apply SNB PIPE_CONTROL GTT workaround.
If a buffer that was the target of a PIPE_CONTROL from userland was a
reused one that hadn't been evicted which had not previously had this
workaround applied, then the early return for a correct
presumed_offset in this function meant we would not bind it into the
GTT and the write would land somewhere else.

Fixes reproducible failures with GL_EXT_timer_query usage in apitrace,
and I also expect it to fix the intermittent OQ issues on snb that
danvet's been working on.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=48019
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=52932
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Carl Worth <cworth@cworth.org>
Tested-by: Carl Worth <cworth@cworth.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-05 21:45:01 +02:00
Chris Wilson f047e395dd drm/i915: Avoid concurrent access when marking the device as idle/busy
As suggested by Daniel, rip out the independent timers for device and
crtc busyness and integrate the manual powermanagement of the display
engine into the GEM core and its request tracking. The benefits are that
the code is a lot smaller, fewer moving parts and should fit more neatly
into the overall activity tracking of the driver.

v2: Complete overhaul and removal of the racy timers and workers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-25 18:23:56 +02:00
Chris Wilson a7b9761d0a drm/i915: Split i915_gem_flush_ring() into seperate invalidate/flush funcs
By moving the function to intel_ringbuffer and currying the appropriate
parameter, hopefully we make the callsites easier to read and
understand.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-25 18:23:55 +02:00
Chris Wilson 016fd0c1ae drm/i915: Clear the pending_gpu_fenced_access flag at the start of execbuffer
Otherwise once we use the buffer with a BLT command on gen2/3, we will
always regard future command submissions as continuing the fenced
access. However, now that we flush/invalidate between every batch we can
drop this pessimism.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-25 18:23:55 +02:00
Daniel Vetter 6ac42f4148 drm/i915: Replace the complex flushing logic with simple invalidate/flush all
Now that we unconditionally flush and invalidate between every batch
buffer, we no longer need the complex logic to decide which domains
require flushing. Remove it and rejoice.

v2 (danvet): Keep around the flip waiting logic. It's gross and
broken, I know, but we can't just kill that thing ... even if we just
keep it around as a reminder that things are broken.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-25 18:23:54 +02:00
Chris Wilson 69c2fc8913 drm/i915: Remove the per-ring write list
This is now handled by a global flag to ensure we emit a flush before
the next serialisation point (if we failed to queue one previously).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-25 18:23:53 +02:00
Chris Wilson 0201f1ecf4 drm/i915: Replace the pending_gpu_write flag with an explicit seqno
As we always flush the GPU cache prior to emitting the breadcrumb, we no
longer have to worry about the deferred flush causing the
pending_gpu_write to be delayed. So we can instead utilize the known
last_write_seqno to hopefully minimise the wait times.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-25 18:23:52 +02:00
Chris Wilson 3bb73aba1e drm/i915: Allow late allocation of request for i915_add_request()
Request preallocation was added to i915_add_request() in order to
support the overlay. However, not all users care and can quite happily
ignore the failure to allocate the request as they will simply repeat
the request in the future.

By pushing the allocation down into i915_add_request(), we can then
remove some rather ugly error handling in the callers.

v2: Nullify request->file_priv otherwise we chase a garbage pointer
when retiring requests.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-25 18:23:51 +02:00
Eric Anholt 0da5cec1de drm/i915: Set the context before setting up regs for the context.
Fixes failures in transform feedback on gen7 because our SOL_RESET
flag was setting the transform feedback offsets in the old context
(occasionally happened to be ours) instead of the new context.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-25 10:39:59 +02:00
Chris Wilson 09cf7c9a12 drm/i915: Insert a flush between batches if the breadcrumb was dropped
If we drop the breadcrumb request after a batch due to a signal for
example we aim to fix it up at the next opportunity. In this case we
emit a second batchbuffer with no waits upon the first and so no
opportunity to insert the missing request, so we need to emit the
missing flush for coherency. (Note that that invalidating the render
cache is the same as flushing it, so there should have been no
observable corruption.)

Note that beside simply adding the missing flush, avoiding potential
render corruption, this will also fix at least parts of the problem
introduced by some funny interaction of these two commits:

commit de2b998552
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Jul 4 22:52:50 2012 +0200

    drm/i915: don't return a spurious -EIO from intel_ring_begin

which allowed intel_ring_begin to return -ERESTARTSYS and

commit cc889e0f6c
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Jun 13 20:45:19 2012 +0200

    drm/i915: disable flushing_list/gpu_write_list

which essentially disabled the flushing list.

The issue happens when we submit a batch & emit it, but get
interrupted (thanks to the first patch) while trying to emit the
flush. On the next batch we still assume that the full gpu domain
handling is in effect and hence compute the invalidate&flushing
domains. But thanks to the 2nd patch we totally ignore these and only
invalidate all gpu domains, presuming that any required flushes have
been issued already.  Which is wrong and eventually results in us
updating the new write_domain values with the computed
pending_write_domain values, which leaves an object with write_domain
== 0 on the gpu_write_list.

As soon as we try to unbind that object, things blow up.

Fix this by emitting the missing flush according to the new
ring->gpu_caches_dirty flag.

Note that this does _not_ fix all the current cases where we end up
with an object on the flushing_list that can't be flushed.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=52040
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Add bug explanation to commit message.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-20 12:21:40 +02:00
Daniel Vetter cc889e0f6c drm/i915: disable flushing_list/gpu_write_list
This is just the minimal patch to disable all this code so that we can
do decent amounts of QA before we rip it all out.

The complicating thing is that we need to flush the gpu caches after
the batchbuffer is emitted. Which is past the point of no return where
execbuffer can't fail any more (otherwise we risk submitting the same
batch multiple times).

Hence we need to add a flag to track whether any caches associated
with that ring are dirty. And emit the flush in add_request if that's
the case.

Note that this has a quite a few behaviour changes:
- Caches get flushed/invalidated unconditionally.
- Invalidation now happens after potential inter-ring sync.

I've bantered around a bit with Chris on irc whether this fixes
anything, and it might or might not. The only thing clear is that with
these changes it's much easier to reason about correctness.

Also rip out a lone get_next_request_seqno in the execbuffer
retire_commands function. I've dug around and I couldn't figure out
why that is still there, with the outstanding lazy request stuff it
shouldn't be necessary.

v2: Chris Wilson complained that I also invalidate the read caches
when flushing after a batchbuffer. Now optimized.

v3: Added some comments to explain the new flushing behaviour.

Cc: Eric Anholt <eric@anholt.net>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-06-20 13:54:28 +02:00
Ben Widawsky 6e0a69dbc8 drm/i915/context: switch contexts with execbuf2
Use the rsvd1 field in execbuf2 to specify the context ID associated
with the workload. This will allow the driver to do the proper context
switch when/if needed.

v2: Add checks for context switches on rings not supporting contexts.
Before the code would silently ignore such requests.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2012-06-14 17:36:21 +02:00
Chris Wilson a15817cf16 drm/i915: Check whether the ring is initialised prior to dispatch
Rather than use the magic feature tests HAS_BLT/HAS_BSD just check
whether the ring we are about to dispatch the execbuffer on is
initialised.

v2: Use intel_ring_initialized()

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-19 22:39:53 +02:00
Chris Wilson acb87dfb4b drm/i915: Limit calling mark-busy only for potential scanouts
The principle of intel_mark_busy() is that we want to spot the
transition of when the display engine is being used in order to bump
powersaving modes and increase display clocks. As such it is only
important when the display is changing, i.e. when rendering to the
scanout or other sprite/plane, and these are characterised by being
pinned.

v2: Mark the whole device as busy on execbuffer and pageflips as well
and rebase against dinq for the minor bug fix to be immediately
applicable.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: fix compile fail.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-08 15:10:34 +02:00
Daniel Vetter 5e13a0c5ec Merge remote-tracking branch 'airlied/drm-core-next' into drm-intel-next-queued
Backmerge of drm-next to resolve a few ugly conflicts and to get a few
fixes from 3.4-rc6 (which drm-next has already merged). Note that this
merge also restricts the stencil cache lra evict policy workaround to
snb (as it should) - I had to frob the code anyway because the
CM0_MASK_SHIFT define died in the masked bit cleanups.

We need the backmerge to get Paulo Zanoni's infoframe regression fix
for gm45 - further bugfixes from him touch the same area and would
needlessly conflict.

Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-08 13:39:59 +02:00
Daniel Vetter dc257cf154 Linux 3.4-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iQEcBAABAgAGBQJPpvY9AAoJEHm+PkMAQRiGpEoIAJgbu+Y8gITnBK/wh9O6zy3S
 5jie5KK4YWdbJsvO58WbNr3CyVIwGIqQ2dUZLiU59aBVLarlGw8xor0MmW+cZwhp
 6fBHaf0qDYAV0MZjD+mnnExOiCRyISa2lPmsfu9dAWywh5KGe6/oAP6/qcXIyok3
 KZyl3qQf4ENpaZPHwZPXCEkUvtuyHgNiszN+QXEadA3s19Ot4VGe9A3VGw+GNrSm
 JqFIq3acQAbKa5BYaqf7TQC02v2FI7//eqt6QHxTqbE6a7LGbTvLfX3HlJ2mnfqa
 1R6QHhM4y4OZDHbaMT2raHZ8WuLXzhehJzhP8Co7AHFOKwVKOb5XbcUr2RrukMU=
 =HkMd
 -----END PGP SIGNATURE-----

Merge tag 'v3.4-rc6' into drm-intel-next

Conflicts:
	drivers/gpu/drm/i915/intel_display.c

Ok, this is a fun story of git totally messing things up. There
/shouldn't/ be any conflict in here, because the fixes in -rc6 do only
touch functions that have not been changed in -next.

The offending commits in drm-next are 14415745b2..1fa611065 which
simply move a few functions from intel_display.c to intel_pm.c. The
problem seems to be that git diff gets completely confused:

$ git diff 14415745b2..1fa611065

is a nice mess in intel_display.c, and the diff leaks into totally
unrelated functions, whereas

$git diff --minimal  14415745b2..1fa611065

is exactly what we want.

Unfortunately there seems to be no way to teach similar smarts to the
merge diff and conflict generation code, because with the minimal diff
there really shouldn't be any conflicts. For added hilarity, every
time something in that area changes the + and - lines in the diff move
around like crazy, again resulting in new conflicts. So I fear this
mess will stay with us for a little longer (and might result in
another backmerge down the road).

Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-07 14:02:14 +02:00
Daniel Vetter 6ebebc9206 drm/i915: disallow clip rects on gen5+
Unfortunately there has been dri1 userspace that used gem to manage
the gtt and hence also needed cliprects in the execbuf ioctl. So
we can't ever remove that code without breaking the ioctl abi.

But at least we can disable it on gen5+, because these horrible
versions of mesa have not supported these chips.

Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-03 11:18:29 +02:00
Ben Widawsky b2da9fe5d5 drm/i915: remove do_retire from i915_wait_request
This originates from a hack by me to quickly fix a bug in an earlier
patch where we needed control over whether or not waiting on a seqno
actually did any retire list processing. Since the two operations aren't
clearly related, we should pull the parameter out of the wait function,
and make the caller responsible for retiring if the action is desired.

The only function call site which did not get an explicit retire_request call
(on purpose) is i915_gem_inactive_shrink(). That code was already calling
retire_request a second time.

v2: don't modify any behavior excepit i915_gem_inactive_shrink(Daniel)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-03 11:18:20 +02:00
Xi Wang 44afb3a043 drm/i915: fix integer overflow in i915_gem_do_execbuffer()
On 32-bit systems, a large args->num_cliprects from userspace via ioctl
may overflow the allocation size, leading to out-of-bounds access.

This vulnerability was introduced in commit 432e58ed ("drm/i915: Avoid
allocation for execbuffer object list").

Signed-off-by: Xi Wang <xi.wang@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-23 22:32:15 +02:00
Xi Wang ed8cd3b2cd drm/i915: fix integer overflow in i915_gem_execbuffer2()
On 32-bit systems, a large args->buffer_count from userspace via ioctl
may overflow the allocation size, leading to out-of-bounds access.

This vulnerability was introduced in commit 8408c282 ("drm/i915:
First try a normal large kmalloc for the temporary exec buffers").

Signed-off-by: Xi Wang <xi.wang@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-23 22:32:02 +02:00
Chris Wilson 06d9813157 drm/i915: Remove the pipelined parameter from get_fence()
We never succeeded in getting pipelined fencing to work (unresolved
spurious GPU hangs), so begin the process of dismantling and removal
the broken code.

Step 1 is the removal of the pipeline parameter to get_fence().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-18 13:15:43 +02:00
Chris Wilson 7b09638f45 drm/i915: Always flush tiling changes before accessing through the GTT
As we defer updating the fence register from set-tiling to the point of
use, we need to declare every access through the GTT as either fenced or
unfenced.

This patches fixes an old bug in the execbuffer relocation processing
which could conceivably be hit by a pathological userspace.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-18 10:48:38 +02:00
Ben Widawsky 2911a35b2e drm/i915: use semaphores for the display plane
In theory this will have performance and power improvements. Performance
because we don't need to stall when the scanout BO is busy, and power
because we don't have to stall when the BO is busy (and the ring can
even go to sleep if the HW supports it).

v2:
squash 2 patches into 1 (me)
un-inline the enable_semaphores function (Daniel)
remove comment about SNB hangs from i915_gem_object_sync (Chris)
rename intel_enable_semaphores to i915_semaphore_is_enabled (me)
removed page flip comment; "no why" (Chris)

To address other comments from Daniel (irc):
update the comment to say 'vt-d is crap, don't enable semaphores'
  - I think you misinterpreted Chris' comment, it already exists.
checking out whether we can pageflip on the render ring on ivb (didn't
work on early silicon)
  - We don't want to enable workarounds for early silicon unless we have
    to.
  - I can't find any references in the docs about this.
optionally use it if the fb is already busy on the render ring
  - This should be how the code already worked, unless I am
    misunderstanding your meaning.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-12 21:14:05 +02:00
Chris Wilson 9a5a53b392 drm/i915: Reorganise rules for get_fence/put_fence
By simplifying the rules to calling get_fence when writing to the
through the GTT in a tiled manner, and calling put_fence before writing
to the object through the GTT in a linear manner, the code becomes
clearer and there is less chance of making a mistake.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[danvet: fixed up conflict with ppgtt code and spelling in a new
comment.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-12 21:14:04 +02:00
Dave Airlie effbc4fd8e Merge branch 'drm-intel-next' of git://people.freedesktop.org/~danvet/drm-intel into drm-core-next
Daniel Vetter wrote
First pull request for 3.5-next, slightly large than usual because new
things kept coming in since the last pull for 3.4.
Highlights:
- first batch of hw enablement for vlv (Jesse et al) and hsw (Eugeni). pci
 ids are not yet added, and there's still quite a few patches to merge
 (mostly modesetting). To make QA easier I've decided to merge this stuff
 in pieces.
- loads of cleanups and prep patches spurred by the above. Especially vlv
 is a real frankenstein chip, but also hsw is stretching our driver's
 code design. Expect more to come in this area for 3.5.
- more gmbus fixes, cleanups and improvements by Daniel Kurtz. Again,
 there are more patches needed (and some already queued up), but I wanted
 to split this a bit for better testing.
- pwrite/pread rework and retuning. This series has been in the works for
 a few months already and a lot of i-g-t tests have been created for it.
 Now it's finally ready to be merged.  Note that one patch in this series
 touches include/pagemap.h, that patch is acked-by akpm.
- reduce mappable pressure and relocation throughput improvements from
 Chris.
- mmap offset exhaustion mitigation by Chris Wilson.
- a start at figuring out which codepaths in our messy dri1/ums+gem/kms
 driver we actually need to support by bailing out of unsupported case.
 The driver now refuses to load without kms on gen6+ and disallows a few
 ioctls that userspace never used in certain cases. More of this will
 definitely come.
- More decoupling of global gtt and ppgtt.
- Improved dual-link lvds detection by Takashi Iwai.
- Shut up the compiler + plus fix the fallout (Ben)
- Inverted panel brightness handling (mostly Acer manages to break things
 in this way).
- Small fixlets and adjustements and some minor things to help debugging.

Regression-wise QA reported quite a few issues on ivb, but all of them
turned out to be hw stability issues which are already fixed in
drm-intel-fixes (QA runs the nightly regression tests on -next alone,
without -fixes automatically merged in). There's still one issue open on
snb, it looks like occlusion query writes are not quite as cache coherent
as we've expected. With some of the pwrite adjustements we can now
reliably hit this. Kernel workaround for it is in the works."

* 'drm-intel-next' of git://people.freedesktop.org/~danvet/drm-intel: (101 commits)
  drm/i915: VCS is not the last ring
  drm/i915: Add a dual link lvds quirk for MacBook Pro 8,2
  drm/i915: make quirks more verbose
  drm/i915: dump the DMA fetch addr register on pre-gen6
  drm/i915/sdvo: Include YRPB as an additional TV output type
  drm/i915: disallow gem init ioctl on ilk
  drm/i915: refuse to load on gen6+ without kms
  drm/i915: extract gt interrupt handler
  drm/i915: use render gen to switch ring irq functions
  drm/i915: rip out old HWSTAM missed irq WA for vlv
  drm/i915: open code gen6+ ring irqs
  drm/i915: ring irq cleanups
  drm/i915: add SFUSE_STRAP registers for digital port detection
  drm/i915: add WM_LINETIME registers
  drm/i915: add WRPLL clocks
  drm/i915: add LCPLL control registers
  drm/i915: add SSC offsets for SBI access
  drm/i915: add port clock selection support for HSW
  drm/i915: add S PLL control
  drm/i915: add PIXCLK_GATE register
  ...

Conflicts:
	drivers/char/agp/intel-agp.h
	drivers/char/agp/intel-gtt.c
	drivers/gpu/drm/i915/i915_debugfs.c
2012-04-12 10:27:01 +01:00
Chris Wilson 7dd4906586 drm/i915: Mark untiled BLT commands as fenced on gen2/3
The BLT commands on gen2/3 utilize the fence registers and so we cannot
modify any fences for the object whilst those commands are in flight.
Currently we marked tiled commands as occupying a fence, but forgot to
restrict the untiled commands from preventing a fence being assigned
before they were completed.

One side-effect is that we ten have to double check that a fence was
allocated for a fenced buffer during move-to-active.

Reported-by: Jiri Slaby <jirislaby@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43427
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47990
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Testcase: i-g-t/tests/gem_tiled_after_untiled_blt
Tested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-01 12:26:05 +02:00
Daniel Vetter f56f821feb mm: extend prefault helpers to fault in more than PAGE_SIZE
drm/i915 wants to read/write more than one page in its fastpath
and hence needs to prefault more than PAGE_SIZE bytes.

Add new functions in filemap.h to make that possible.

Also kill a copy&pasted spurious space in both functions while at it.

v2: As suggested by Andrew Morton, add a multipage parameter to both
functions to avoid the additional branch for the pagemap.c hotpath.
My gcc 4.6 here seems to dtrt and indeed reap these branches where not
needed.

v3: Becaus I couldn't find a way around adding a uaddr += PAGE_SIZE to
the filemap.c hotpaths (that the compiler couldn't remove again),
let's go with separate new functions for the multipage use-case.

v4: Adjust comment to CodingStlye and fix spelling.

Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:36:30 +02:00
Chris Wilson dabdfe021a drm/i915: Avoid using mappable space for relocation processing through the CPU
We try to avoid writing the relocations through the uncached GTT, if the
buffer is currently in the CPU write domain and so will be flushed out to
main memory afterwards anyway. Also on SandyBridge we can safely write
to the pages in cacheable memory, so long as the buffer is LLC mapped.
In either of these cases, we therefore do not need to force the
reallocation of the buffer into the mappable region of the GTT, reducing
the aperture pressure.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:16:17 +02:00
Chris Wilson 1d83f4426f drm/i915: Batch copy_from_user for relocation processing
Originally the code tried to allocate a large enough array to perform
the copy using vmalloc, performance wasn't great and throughput was
improved by processing each individual relocation entry separately.
This too is not as efficient as one would desire. A compromise would be
to allocate a single page, or to allocate a few entries on the stack,
and process the copy in batches. The latter gives simpler code and more
consistent performance due to a lack of heuristic.

x11perf -copywinwin10:	n450/pnv	i3-330m		i5-2520m (cpu)
               before: 	  249000	 785000		 1280000 (80%)
                 page:	  264000	 896000		 1280000 (65%)
             on-stack:	  264000	 902000		 1280000 (67%)

v2: Use 512-bytes of stack for batching rather than allocate a page.
v3: Tidy the code slightly with more descriptive variable names

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-26 09:59:05 +02:00
Daniel Vetter 149c84077f drm/i915: implement SNB workaround for lazy global gtt
PIPE_CONTROL on snb needs global gtt mappings in place to workaround a
hw gotcha. No other commands need such a workaround. Luckily we can
detect a PIPE_CONTROL commands easily because they have a write_domain
= I915_GEM_DOMAIN_INSTRUCTION (and nothing else has that).

v2: Binding the target of such a reloc into the global gtt actually
works instead of binding the source, which is rather pointless ...

v3: Kill a superflous has_global_gtt_mapping assignement noticed by
Chris Wilson.

Reviewed-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-20 21:53:55 +01:00
Daniel Vetter 9edd576d89 Merge remote-tracking branch 'airlied/drm-fixes' into drm-intel-next-queued
Back-merge from drm-fixes into drm-intel-next to sort out two things:

- interlaced support: -fixes contains a bugfix to correctly clear
  interlaced configuration bits in case the bios sets up an interlaced
  mode and we want to set up the progressive mode (current kernels
  don't support interlaced). The actual feature work to support
  interlaced depends upon (and conflicts with) this bugfix.

- forcewake voodoo to workaround missed IRQ issues: -fixes only enabled
  this for ivybridge, but some recent bug reports indicate that we
  need this on Sandybridge, too. But in a slightly different flavour
  and with other fixes and reworks on top. Additionally there are some
  forcewake cleanup patches heading to -next that would conflict with
  currrent -fixes.

Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-02-10 17:14:49 +01:00
Daniel Vetter 7bddb01fb9 drm/i915: ppgtt binding/unbinding support
This adds support to bind/unbind objects and wires it up. Objects are
only put into the ppgtt when necessary, i.e. at execbuf time.

Objects are still unconditionally put into the global gtt.

v2: Kill the quick hack and explicitly pass cache_level to ppgtt_bind
like for the global gtt function. Noticed by Chris Wilson.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-02-09 21:25:23 +01:00
Daniel Vetter ff240199b6 drm/i915: s/DRM_ERROR/DRM_DEBUG in i915_gem_execbuffer.c
These are all user-trigerable, so tune down their loudness a notch.
For some of these we have i-g-t tests (because they prevent
newly-discovered bugs), without this patches running the test suite
leaves behind a dirty dmesg.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-02-09 11:10:32 +01:00
Daniel Vetter 4ca4a250ac drm/i915: reject GTT domain in relocations
This confuses our domain tracking and can (for gtt write domains) lead
to a subsequent oops.

Tested by tests/gem_exec_bad_domains from i-g-t.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-01-29 18:37:10 +01:00
Chris Wilson 1690e1eb7a drm/i915: Separate fence pin counting from normal bind pin counting
In order to correctly account for reserving space in the GTT and fences
for a batch buffer, we need to independently track whether the fence is
pinned due to a fenced GPU access in the batch or whether the buffer is
pinned in the aperture. Currently we count the fenced as pinned if the
buffer has already been seen in the execbuffer. This leads to a false
accounting of available fence registers, causing frequent mass evictions.
Worse, if coupled with the change to make i915_gem_object_get_fence()
report EDADLK upon fence starvation, the batchbuffer can fail with only
one fence required...

Fixes intel-gpu-tools/tests/gem_fenced_exec_thrash

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38735
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Tested-by: Paul Neumann <paul104x@yahoo.de>
[danvet: Resolve the functional conflict with Jesse Barnes sprite
patches, acked by Chris Wilson on irc.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-01-29 18:23:37 +01:00
Daniel Vetter 96154f2fab drm/i915: switch ring->id to be a real id
... and add a helpr function for the places where we want a flag.

This way we can use ring->id to index into arrays.

v2: Resurrect the missing beautification-space Chris Wilson noted.
I'm moving this space around because I'll reuse ring_str in the next
patch.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-01-29 17:32:58 +01:00
Ben Widawsky b93f9cf14e drm/i915: argument to control retiring behavior
Sometimes it may be the case when we idle the gpu or wait on something
we don't actually want to process the retiring list. This patch allows
callers to choose the behavior.

Reviewed-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-01-26 11:19:19 +01:00
Linus Torvalds 1a464cbb3d Merge branch 'drm-core-next' of git://people.freedesktop.org/~airlied/linux
* 'drm-core-next' of git://people.freedesktop.org/~airlied/linux: (307 commits)
  drm/nouveau/pm: fix build with HWMON off
  gma500: silence gcc warnings in mid_get_vbt_data()
  drm/ttm: fix condition (and vs or)
  drm/radeon: double lock typo in radeon_vm_bo_rmv()
  drm/radeon: use after free in radeon_vm_bo_add()
  drm/sis|via: don't return stack garbage from free_mem ioctl
  drm/radeon/kms: remove pointless CS flags priority struct
  drm/radeon/kms: check if vm is supported in VA ioctl
  drm: introduce drm_can_sleep and use in intel/radeon drivers. (v2)
  radeon: Fix disabling PCI bus mastering on big endian hosts.
  ttm: fix agp since ttm tt rework
  agp: Fix multi-line warning message whitespace
  drm/ttm/dma: Fix accounting error when calling ttm_mem_global_free_page and don't try to free freed pages.
  drm/ttm/dma: Only call set_pages_array_wb when the page is not in WB pool.
  drm/radeon/kms: sync across multiple rings when doing bo moves v3
  drm/radeon/kms: Add support for multi-ring sync in CS ioctl (v2)
  drm/radeon: GPU virtual memory support v22
  drm: make DRM_UNLOCKED ioctls with their own mutex
  drm: no need to hold global mutex for static data
  drm/radeon/benchmark: common modes sweep ignores 640x480@32
  ...

Fix up trivial conflicts in radeon/evergreen.c and vmwgfx/vmwgfx_kms.c
2012-01-10 11:04:36 -08:00
Eric Anholt ae662d3126 drm/i915: Add support for resetting the SO write pointers on gen7.
These registers are automatically incremented by the hardware during
transform feedback to track where the next streamed vertex output
should go.  Unlike the previous generation, which had a packet for
setting the corresponding registers to a defined value, gen7 only has
MI_LOAD_REGISTER_IMM to do so.  That's a secure packet (since it loads
an arbitrary register), so we need to do it from the kernel, and it
needs to be settable atomically with the batchbuffer execution so that
two clients doing transform feedback don't stomp on each others'
state.

Instead of building a more complicated interface involcing setting the
registers to a specific value, just set them to 0 when asked and
userland can tweak its pointers accordingly.

Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Keith Packard <keithp@keithp.com>
2012-01-03 09:31:18 -08:00
Ben Widawsky 84f9f938be drm/i915: Force sync command ordering (Gen6+)
The docs say this is required for Gen7, and since the bit was added for
Gen6, we are also setting it there pit pf paranoia. Particularly as
Chris points out, if PIPE_CONTROL counts as a 3d state packet.

This was found through doc inspection by Ken and applies to Gen6+;

Reported-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Keith Packard <keithp@keithp.com>
2012-01-03 09:09:44 -08:00
Ben Widawsky e2971bdab2 drm/i915: relative_constants_mode race fix
dev_priv keeps track of the current addressing mode that gets set at
execbuffer time. Unfortunately the existing code was doing this before
acquiring struct_mutex which leaves a race with another thread also
doing an execbuffer. If that wasn't bad enough, relocate_slow drops
struct_mutex which opens a much more likely error where another thread
comes in and modifies the state while relocate_slow is being slow.

The solution here is to just defer setting this state until we
absolutely need it, and we know we'll have struct_mutex for the
remainder of our code path.

v2: Keith noticed a bug in the original patch.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Keith Packard <keithp@keithp.com>
2012-01-03 09:09:44 -08:00
Keith Packard ebbd857e6b drm/i915: Disable semaphores by default on SNB
Semaphores still cause problems on some machines:

> From Udo Steinberg:
>
> With Linux-3.2-rc6 I'm frequently seeing GPU hangs when large amounts of
> text scroll in an xterm, such as when extracting a tar archive. Such as this
> one (note the timestamps):
>
>  I can reproduce it fairly easily with something
>  as simple as:
>
>	  while true; do dmesg; done

This patch turns them off on SNB while leaving them on for IVB.

Reported-by: Udo Steinberg <udo@hypervisor.org>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Eugeni Dodonov <eugeni@dodonov.net>
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-26 21:07:26 -08:00
Eugeni Dodonov f45b55575c drm/i915: enable semaphores on per-device defaults
This adds a default setting for semaphores parameter, and enables
semaphores by default on IVB.

For now, as semaphores interaction with VTd causes random issues on
SNB, we do not enable them by default. But they can still be enabled
via the semaphores=1 kernel parameter.

v2: enables semaphores on SNB when IO remapping is disabled, with base
on Keith Packard patch.

CC: Daniel Vetter <daniel.vetter@ffwll.ch>
CC: Ben Widawsky <ben@bwidawsk.net>
CC: Keith Packard <keithp@keithp.com>
CC: Jesse Barnes <jbarnes@virtuousgeek.org>
CC: Chris Wilson <chris@chris-wilson.co.uk>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42696
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40564
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41353
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38862
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-12-16 08:49:59 -08:00
Ben Widawsky c8c99b0f0d drm/i915: Dumb down the semaphore logic
While I think the previous code is correct, it was hard to follow and
hard to debug. Since we already have a ring abstraction, might as well
use it to handle the semaphore updates and compares.

I don't expect this code to make semaphores better or worse, but you
never know...

v2:
Remove magic per Keith's suggestions.
Ran Daniel's gem_ring_sync_loop test on this.

v3:
Ignored one of Keith's suggestions.

v4:
Removed some bloat per Daniel's recommendation.

Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Keith Packard <keithp@keithp.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-09-21 14:52:41 -07:00
Eric Anholt e92d03bff9 Revert "drm/i915: Kill GTT mappings when moving from GTT domain"
This reverts commit 4a684a4117.
Userland has always been required to set the object's domain to GTT
before using it through a GTT mapping, it's not something that the
kernel is supposed to enforce.  (The pagefault support is so that we
can handle multiple mappings without userland having to pin across
them, not so that userland can use GTT after GPU domains without
telling the kernel).

Fixes 19.2% +/- 0.8% (n=6) performance regression in cairo-gl
firefox-talos-gfx on my T420 latop.

Signed-off-by: Keith Packard <keithp@keithp.com>
2011-06-21 11:11:02 -07:00
Chris Wilson d4aeee7760 drm/i915: Disable pagefaults along execbuffer relocation fast path
Along the fast path for relocation handling, we attempt to copy directly
from the user data structures whilst holding our mutex. This causes
lockdep to warn about circular lock dependencies if we need to pagefault
the user pages. [Since when handling a page fault on a mmapped bo, we
need to acquire the struct mutex whilst already holding the mm
semaphore, it is then verboten to acquire the mm semaphore when already
holding the struct mutex. The likelihood of the user passing in the
relocations contained in a GTT mmaped bo is low, but conceivable for
extreme pathology.] In order to force the mm to return EFAULT rather
than handle the pagefault, we therefore need to disable pagefaults
across the relocation fast path.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2011-03-23 09:17:01 +00:00
Chris Wilson 47ae63e0c2 Merge branch 'drm-intel-fixes' into drm-intel-next
Apply the trivial conflicting regression fixes, but keep GPU semaphores
enabled.

Conflicts:
	drivers/gpu/drm/i915/i915_drv.h
	drivers/gpu/drm/i915/i915_gem_execbuffer.c
2011-03-07 12:35:15 +00:00
Chris Wilson c59a333f73 drm/i915: Only wait on a pending flip if we intend to write to the buffer
... as if we are only reading from it, we can do that concurrently with
the queue flip.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-03-07 11:06:02 +00:00
Chris Wilson a1656b9090 drm/i915: Disable GPU semaphores by default
Andi Kleen narrowed his GPU hangs on his Sugar Bay (SNB desktop) rev 09
down to the use of GPU semaphores, and we already know that they appear
broken up to Huron River (mobile) rev 08. (I'm optimistic that disabling
GPU semaphores is simply hiding another bug by the latency and
side-effects of the additional device interaction it introduces...)

However, use of semaphores is a massive performance improvement... Only
as long as the system remains stable. Enable at your peril.

Reported-by: Andi Kleen <andi-fd@firstfloor.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33921
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-03-07 11:00:59 +00:00
Chris Wilson e8b2c3c47a drm/i915: Re-enable GPU semaphores for SandyBridge mobile
This seems to be running stably on my test laptop, so hopefully the
reported hangs where just symptoms of other bugs.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-03-01 19:22:52 +00:00
Chris Wilson 271d81b841 drm/i915: Allow relocation deltas outside of target bo
Userspace has a legitimate requirement to use a delta that points to
outside of the target bo, and so we need to enable this. (As this is an
abi break, albeit a relaxation of the current restrictions, mark the change
with a new flag.)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-03-01 16:01:02 +00:00
Chris Wilson ce453d81cb drm/i915: Use a device flag for non-interruptible phases
The code paths for modesetting are growing in complexity as we may need
to move the buffers around in order to fit the scanout in the aperture.
Therefore we face a choice as to whether to thread the interruptible status
through the entire pinning and unbinding code paths or to add a flag to
the device when we may not be interrupted by a signal. This does the
latter and so fixes a few instances of modesetting failures under stress.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-02-22 15:56:25 +00:00
Chris Wilson 8408c282f0 drm/i915: First try a normal large kmalloc for the temporary exec buffers
As we just need a temporary array whilst performing the relocations for
the execbuffer, first attempt to allocate using kmalloc even if it is
not of order page-0. This avoids the overhead of remapping the
discontiguous array and so gives a moderate boost to execution
throughput.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-02-22 15:56:16 +00:00
Chris Wilson c872522663 drm/i915: Protect against drm_gem_object not being the first member
Dave Airlie spotted that we had a potential bug should we ever rearrange
the drm_i915_gem_object so not the base drm_gem_object was not its first
member. He noticed that we often convert the return of
drm_gem_object_lookup() immediately into drm_i915_gem_object and then
check the result for nullity. This is only valid when the base object is
the first member and so the superobject has the same address. Play safe
instead and use the compiler to convert back to the original return
address for sanity testing.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-02-22 15:55:57 +00:00
Chris Wilson db53a30261 drm/i915: Refine tracepoints
A lot of minor tweaks to fix the tracepoints, improve the outputting for
ftrace, and to generally make the tracepoints useful again. It is a start
and enough to begin identifying performance issues and gaps in our
coverage.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-02-07 14:59:18 +00:00
Chris Wilson bdd92c9ad2 Merge branch 'drm-intel-fixes' into drm-intel-next
Merge important suspend and resume regression fixes and resolve the
small conflict.

Conflicts:
	drivers/gpu/drm/i915/i915_dma.c
2011-01-24 23:45:32 +00:00
Chris Wilson 076e2c0eb8 drm/i915: Fix use of invalid array size for ring->sync_seqno
There are I915_NUM_RINGS-1 inter-ring synchronisation counters, but we
were clearing I915_NUM_RINGS of them. Oops.

Reported-by: Jiri Slaby <jirislaby@gmail.com>
Tested-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-01-23 12:52:11 +00:00
Chris Wilson 311bd68e02 drm/i915: Trivial sparse fixes
Move code around and invoke iomem annotation in a few more places in
order to silence sparse. Still a few more iomem annotations to go...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-01-19 12:39:38 +00:00
Chris Wilson 1591192d3a drm/i915: Disable GPU semaphores on SandyBridge mobile
Hopefully, this is a temporary measure whilst the root cause is
understood. At the moment, we experience a hard hang whilst looping
urbanterror that has been identified as a result of the use of
semaphores, but so far only on SNB mobile.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32752
Tested-by: mengmeng.meng@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-01-14 09:51:05 +00:00
Chris Wilson 595dad76a0 drm/i915/execbuffer: Clear domains before beginning reloc processing
After reordering the sequence of relocating objects, commit 6fe4f1404,
we can no longer rely on seeing all reloc targets prior to performing
the relocation. As a result we were ignoring the need to flush objects
from the render cache and invalidate the sampler caches, resulting in
rendering glitches. So we need to clear the relocation domains earlier.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-01-13 16:06:05 +00:00
Chris Wilson dd6864a4ed drm/i915/execbuffer: Reorder relocations to match new object order
On the fault path, commit 6fe4f140 introduction a regression whereby it
changed the sequence of the objects but continued to use the original
ordering of relocation entries. The result was that incorrect GTT offsets
were being fed into the execbuffer causing lots of misrendering and
potential hangs.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-01-13 16:06:05 +00:00
Chris Wilson 6fe4f14044 drm/i915/execbuffer: Reorder binding of objects to favour restrictions
As the mappable portion of the aperture is always a small subset at the
start of the GTT, it is allocated preferentially by drm_mm. This is
useful in case we ever need to map an object later. However, if you have
a large object that can consume the entire mappable region of the
GTT this prevents the batchbuffer from fitting and so causing an error.
Instead allocate all those that require a mapping up front in order to
improve the likelihood of finding sufficient space to bind them.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-01-11 22:55:48 +00:00
Chris Wilson 36cf174230 drm/i915/execbuffer: Correctly clear the current object list upon EFAULT
Before releasing the lock in order to copy the relocation list from user
pages, we need to drop all the object references as another thread may
usurp and execute another batchbuffer before we reacquire the lock.
However, the code was buggy and failed to clear the list...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
2011-01-11 22:55:29 +00:00
Chris Wilson 882417851a drm/i915: Propagate error from flushing the ring
... in order to avoid a BUG() and potential unbounded waits.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-01-11 20:44:50 +00:00
Chris Wilson b72f3acb71 drm/i915: Handle ringbuffer stalls when flushing
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-01-11 20:43:55 +00:00
Chris Wilson 63256ec534 drm/i915: Enforce write ordering through the GTT
We need to ensure that writes through the GTT land before any
modification to the MMIO registers and so must impose a mandatory write
barrier when flushing the GTT domain. This was revealed by relaxing the
write ordering by experimentally mapping the registers and the GATT as
write-combining.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-01-11 20:42:53 +00:00
Chris Wilson 72bfa19c8d drm/i915: Allow the application to choose the constant addressing mode
The relative-to-general state default is useless as it means having to
rewrite the streaming kernels for each batch. Relative-to-surface is
more useful, as that stream usually needs to be rewritten for each
batch. And absolute addressing mode, vital if you start streaming
state, is also only available by adjusting the register...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-12-20 09:41:36 +00:00
Chris Wilson b8f7ab1788 drm/i915: Mark the user reloc error paths as unlikely
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-12-09 19:46:22 +00:00
Chris Wilson 67731b87e9 drm/i915: Eliminate drm_gem_object_lookup during relocation
As we provide a list of all objects that will be accessed from the
batchbuffer, we can build a lut of the handles associated with those
objects for this invocation and use that to avoid the overhead of
looking up those objects again for every relocation.

The cost of building and searching a small hash table is much less than
that of acquiring a spinlock, searching a radix tree and manipulating an
atomic refcnt per relocation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-12-09 19:46:21 +00:00
Chris Wilson 9b3826bf84 drm/i915: Ignore fenced commands for gpu access on gen4
Userspace should not have been declaring that it needed fenced GPU
access with gen4+ as those GPUs have no fenced commands, but to be on
the safe side it is easier to ignore userspace in case they did.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-12-05 23:18:32 +00:00
Chris Wilson 1ec14ad313 drm/i915: Implement GPU semaphores for inter-ring synchronisation on SNB
The bulk of the change is to convert the growing list of rings into an
array so that the relationship between the rings and the semaphore sync
registers can be easily computed.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-12-05 00:37:38 +00:00
Chris Wilson d9e86c0ee6 drm/i915: Pipelined fencing [infrastructure]
With this change, every batchbuffer can use all available fences (save
pinned and scanout, of course) without ever stalling the gpu!

In theory. Currently the actual pipelined update of the register is
disabled due to some stability issues. However, just the deferred update
is a significant win.

Based on a series of patches by Daniel Vetter.

The premise is that before every access to a buffer through the GTT we
have to declare whether we need a register or not. If the access is by
the GPU, a pipelined update to the register is made via the ringbuffer,
and we track the last seqno of the batches that access it. If by the
CPU we wait for the last GPU access and update the register (either
to clear or to set it for the current buffer).

One advantage of being able to pipeline changes is that we can defer the
actual updating of the fence register until we first need to access the
object through the GTT, i.e. we can eliminate the stall on set_tiling.
This is important as the userspace bo cache does not track the tiling
status of active buffers which generate frequent stalls on gen3 when
enabling tiling for an already bound buffer.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2010-12-02 10:07:05 +00:00
Chris Wilson 87ca9c8a7e drm/i915: Prevent stalling for a GTT read back from a read-only GPU target
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-12-02 10:00:15 +00:00
Chris Wilson c4e7a41467 drm/i915/ringbuffer: Handle cliprects in the caller
This makes the various rings more consistent by removing the anomalous
handing of the rendering ring execbuffer dispatch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-11-30 14:17:51 +00:00
Chris Wilson 602606a472 drm/i915/execbuffer: On error, starting unwinding from the previous object
As the error occurred on the current object, it means that its state was
not changed and so it should be excluded from the unwind.

Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-11-28 15:31:02 +00:00
Chris Wilson 432e58edc9 drm/i915: Avoid allocation for execbuffer object list
Besides the minimal improvement in reducing the execbuffer overhead, the
real benefit is clarifying a few routines.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-11-25 21:19:26 +00:00
Chris Wilson 54cf91dc4e drm/i915: Split i915_gem_execbuffer into its own file.
A number of dragons have been seen lurking within the execbuffer code.
The first step is then to isolate them from the rest and begin to
scrutinise them in depth. Suggested by Daniel Vetter.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-11-25 21:19:25 +00:00