Commit graph

83 commits

Author SHA1 Message Date
Chris Wilson
a4f5ea64f0 drm/i915: Refactor object page API
The plan is to make obtaining the backing storage for the object avoid
struct_mutex (i.e. use its own locking). The first step is to update the
API so that normal users only call pin/unpin whilst working on the
backing storage.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-12-chris@chris-wilson.co.uk
2016-10-28 20:53:46 +01:00
Akash Goel
3b3f1650b1 drm/i915: Allocate intel_engine_cs structure only for the enabled engines
With the possibility of addition of many more number of rings in future,
the drm_i915_private structure could bloat as an array, of type
intel_engine_cs, is embedded inside it.
	struct intel_engine_cs engine[I915_NUM_ENGINES];
Though this is still fine as generally there is only a single instance of
drm_i915_private structure used, but not all of the possible rings would be
enabled or active on most of the platforms. Some memory can be saved by
allocating intel_engine_cs structure only for the enabled/active engines.
Currently the engine/ring ID is kept static and dev_priv->engine[] is simply
indexed using the enums defined in intel_engine_id.
To save memory and continue using the static engine/ring IDs, 'engine' is
defined as an array of pointers.
	struct intel_engine_cs *engine[I915_NUM_ENGINES];
dev_priv->engine[engine_ID] will be NULL for disabled engine instances.

There is a text size reduction of 928 bytes, from 1028200 to 1027272, for
i915.o file (but for i915.ko file text size remain same as 1193131 bytes).

v2:
- Remove the engine iterator field added in drm_i915_private structure,
  instead pass a local iterator variable to the for_each_engine**
  macros. (Chris)
- Do away with intel_engine_initialized() and instead directly use the
  NULL pointer check on engine pointer. (Chris)

v3:
- Remove for_each_engine_id() macro, as the updated macro for_each_engine()
  can be used in place of it. (Chris)
- Protect the access to Render engine Fault register with a NULL check, as
  engine specific init is done later in Driver load sequence.

v4:
- Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
- Kill the superfluous init_engine_lists().

v5:
- Cleanup the intel_engines_init() & intel_engines_setup(), with respect to
  allocation of intel_engine_cs structure. (Chris)

v6:
- Rebase.

v7:
- Optimize the for_each_engine_masked() macro. (Chris)
- Change the type of 'iter' local variable to enum intel_engine_id. (Chris)
- Rebase.

v8: Rebase.

v9: Rebase.

v10:
- For index calculation use engine ID instead of pointer based arithmetic in
  intel_engine_sync_index() as engine pointers are not contiguous now (Chris)
- For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas)
- Use for_each_engine macro for cleanup in intel_engines_init() and remove
  check for NULL engine pointer in cleanup() routines. (Joonas)

v11: Rebase.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
2016-10-14 09:58:43 +01:00
Jani Nikula
911f4869eb drm/i915: use NULL for NULL pointers
Fix sparse warning:

drivers/gpu/drm/i915/i915_cmd_parser.c:987:72: warning: Using plain
integer as NULL pointer

Fixes: 52a42cec4b ("drm/i915/cmdparser: Accelerate copies from WC memory")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1473946137-1931-4-git-send-email-jani.nikula@intel.com
2016-09-16 10:35:43 +03:00
Chris Wilson
52a42cec4b drm/i915/cmdparser: Accelerate copies from WC memory
If we need to use clflush to prepare our batch for reads from memory, we
can bypass the cache instead by using non-temporal copies.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-39-chris@chris-wilson.co.uk
2016-08-18 22:37:01 +01:00
Chris Wilson
76ff480ec9 drm/i915/cmdparser: Use binary search for faster register lookup
A significant proportion of the cmdparsing time for some batches is the
cost to find the register in the mmiotable. We ensure that those tables
are in ascending order such that we could do a binary search if it was
ever merited. It is.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-38-chris@chris-wilson.co.uk
2016-08-18 22:37:01 +01:00
Chris Wilson
ea884f09e5 drm/i915/cmdparser: Check for SKIP descriptors first
If the command descriptor says to skip it, ignore checking for anyother
other conflict.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-37-chris@chris-wilson.co.uk
2016-08-18 22:37:00 +01:00
Chris Wilson
efdfd91f9b drm/i915/cmdparser: Compare against the previous command descriptor
On the blitter (and in test code), we see long sequences of repeated
commands, e.g. XY_PIXEL_BLT, XY_SCANLINE_BLT, or XY_SRC_COPY. For these,
we can skip the hashtable lookup by remembering the previous command
descriptor and doing a straightforward compare of the command header.
The corollary is that we need to do one extra comparison before lookup
up new commands.

v2: Less magic mask (ok, it is still magic, but now you cannot see!)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-36-chris@chris-wilson.co.uk
2016-08-18 22:37:00 +01:00
Chris Wilson
d6a4ead7a3 drm/i915/cmdparser: Improve hash function
The existing code's hashfunction is very suboptimal (most 3D commands
use the same bucket degrading the hash to a long list). The code even
acknowledge that the issue was known and the fix simple:

/*
 * If we attempt to generate a perfect hash, we should be able to look at bits
 * 31:29 of a command from a batch buffer and use the full mask for that
 * client. The existing INSTR_CLIENT_MASK/SHIFT defines can be used for this.
 */

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-35-chris@chris-wilson.co.uk
2016-08-18 22:36:59 +01:00
Chris Wilson
ed13033f02 drm/i915/cmdparser: Only cache the dst vmap
For simplicity, we want to continue using a contiguous mapping of the
command buffer, but we can reduce the number of vmappings we hold by
switching over to a page-by-page copy from the user batch buffer to the
shadow. The cost for saving one linear mapping is about 5% in trivial
workloads - which is more or less the overhead in calling kmap_atomic().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-34-chris@chris-wilson.co.uk
2016-08-18 22:36:59 +01:00
Chris Wilson
0b5372727b drm/i915/cmdparser: Use cached vmappings
The single largest factor in the overhead of parsing the commands is the
setup of the virtual mapping to provide a continuous block for the batch
buffer. If we keep those vmappings around (against the better judgement
of mm/vmalloc.c, which we offset by handwaving and looking suggestively
at the shrinker) we can dramatically improve the performance of the
parser for small batches (such as media workloads). Furthermore, we can
use the prepare shmem read/write functions to determine  how best we
need to clflush the range (rather than every page of the object).

The impact of caching both src/dst vmaps is +80% on ivb and +140% on byt
for the throughput on small batches. (Caching just the dst vmap and
iterating over the src, doing a page by page copy is roughly 5% slower
on both platforms. That may be an acceptable trade-off to eliminate one
cached vmapping, and we may be able to reduce the per-page copying overhead
further.) For *this* simple test case, the cmdparser is now within a
factor of 2 of ideal performance.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-33-chris@chris-wilson.co.uk
2016-08-18 22:36:59 +01:00
Chris Wilson
068715b922 drm/i915/cmdparser: Add the TIMESTAMP register for the other engines
Since I have been using the BCS_TIMESTAMP to measure latency of
execution upon the blitter ring, allow regular userspace to also read
from that register. They are already allowed RCS_TIMESTAMP!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-32-chris@chris-wilson.co.uk
2016-08-18 22:36:58 +01:00
Chris Wilson
7756e45407 drm/i915/cmdparser: Make initialisation failure non-fatal
If the developer adds a register in the wrong order, we BUG during boot.
That makes development and testing very difficult. Let's be a bit more
friendly and disable the command parser with a big warning if the tables
are invalid.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-31-chris@chris-wilson.co.uk
2016-08-18 22:36:58 +01:00
Chris Wilson
43394c7d0d drm/i915: Extract i915_gem_obj_prepare_shmem_write()
This is a companion to i915_gem_obj_prepare_shmem_read() that prepares
the backing storage for direct writes. It first serialises with the GPU,
pins the backing storage and then indicates what clfushes are required in
order for the writes to be coherent.

Whilst here, fix support for ancient CPUs without clflush for which we
cannot do the GTT+clflush tricks.

v2: Add i915_gem_obj_finish_shmem_access() for symmetry

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-8-chris@chris-wilson.co.uk
2016-08-18 22:36:44 +01:00
Chris Wilson
33a051a5fc drm/i915/cmdparser: Remove stray intel_engine_cs *ring
When we refer to intel_engine_cs, we want to use engine so as not to
confuse ourselves about ringbuffers.

v2: Rename all the functions as well, as well as a few more stray comments.
v3: Split the really long error message strings

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-6-git-send-email-chris@chris-wilson.co.uk
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469606850-28659-1-git-send-email-chris@chris-wilson.co.uk
2016-07-27 16:23:05 +01:00
Tvrtko Ursulin
14bb2c1179 drm/i915: Fix a buch of kerneldoc warnings
Just a bunch of stale kerneldocs generating warnings when
building the docs. Mostly function parameters so not very
useful but still.

v2: Tidy.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1464958937-23344-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-06-06 13:04:26 +01:00
Chris Wilson
c033666a94 drm/i915: Store a i915 backpointer from engine, and use it
text	   data	    bss	    dec	    hex	filename
6309351	3578714	 696320	10584385	 a18141	vmlinux
6308391	3578714	 696320	10583425	 a17d81	vmlinux

Almost 1KiB of code reduction.

v2: More s/INTEL_INFO()->gen/INTEL_GEN()/ and IS_GENx() conversions

   text	   data	    bss	    dec	    hex	filename
6304579	3578778	 696320	10579677	 a16edd	vmlinux
6303427	3578778	 696320	10578525	 a16a5d	vmlinux

Now over 1KiB!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1462545621-30125-3-git-send-email-chris@chris-wilson.co.uk
2016-05-09 13:41:24 +01:00
Kenneth Graunke
6761d0a184 drm/i915: Allow MI_LOAD_REGISTER_REG between whitelisted registers.
Allowing register copies where the source and destination are both
whitelisted should be safe, and is useful.  For example, Mesa uses
this to load the command streamer math registers with data from the
pipeline statistics counters.

v2: Reject writes to OACONTROL (and reads as well :(

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> # v1
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1462521014-13595-1-git-send-email-chris@chris-wilson.co.uk
2016-05-09 08:30:48 +01:00
Chris Wilson
1ca3712ca3 drm/i915: Report command parser version 0 if disabled
If the command parser is not active, then it is appropriate to report it
as operating at version 0 as no higher mode is supported. This greatly
simplifies userspace querying for the command parser as we then do not
need to second guess when it will be active (a mixture of module
parameters and generational support, which may change over time).

v2: s/comand/command/ misspelling in comment

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1462368336-21230-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2016-05-05 08:40:02 +01:00
Jordan Justen
6cf0716c03 drm/i915: Bump command parser version for new whitelisted registers
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1457335830-30923-6-git-send-email-jordan.l.justen@intel.com
2016-03-21 10:03:26 +01:00
Jordan Justen
1b85066bb1 drm/i915: Add Haswell CS GPR registers to whitelist
This is needed for the Mesa Vulkan driver on Haswell.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1457335830-30923-5-git-send-email-jordan.l.justen@intel.com
2016-03-21 10:03:17 +01:00
Jordan Justen
99c5aeca94 drm/i915: Move Haswell registers to separate whitelist table
Now that we can whitelist registers only on Haswell, move HSW_SCRATCH1
and HSW_ROW_CHICKEN3 into a separate Haswell only table.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1457335830-30923-4-git-send-email-jordan.l.justen@intel.com
2016-03-21 10:02:46 +01:00
Jordan Justen
361b027bc6 drm/i915: Use an array of register tables in command parser
For Haswell, we will want another table of registers while retaining
the large common table of whitelisted registers shared by all gen7
devices.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
[danvet: Pipe patch through sed -e 's/\<ring\>/engine/g' to make it
apply.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-03-21 10:02:01 +01:00
Jordan Justen
a6573e1f54 drm/i915: Add TIMESTAMP to register whitelist
This is needed for the Mesa Vulkan driver on Haswell.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1457335830-30923-2-git-send-email-jordan.l.justen@intel.com
2016-03-21 09:56:42 +01:00
Tvrtko Ursulin
0bc40be85f drm/i915: Rename intel_engine_cs function parameters
@@
identifier func;
@@
func(..., struct intel_engine_cs *
- ring
+ engine
, ...)
{
<...
- ring
+ engine
...>
}
@@
identifier func;
type T;
@@
T func(..., struct intel_engine_cs *
- ring
+ engine
, ...);

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-03-16 15:33:10 +00:00
Ville Syrjälä
f0f59a00a1 drm/i915: Type safe register read/write
Make I915_READ and I915_WRITE more type safe by wrapping the register
offset in a struct. This should eliminate most of the fumbles we've had
with misplaced parens.

This only takes care of normal mmio registers. We could extend the idea
to other register types and define each with its own struct. That way
you wouldn't be able to accidentally pass the wrong thing to a specific
register access function.

The gpio_reg setup is probably the ugliest thing left. But I figure I'd
just leave it for now, and wait for some divine inspiration to strike
before making it nice.

As for the generated code, it's actually a bit better sometimes. Eg.
looking at i915_irq_handler(), we can see the following change:
  lea    0x70024(%rdx,%rax,1),%r9d
  mov    $0x1,%edx
- movslq %r9d,%r9
- mov    %r9,%rsi
- mov    %r9,-0x58(%rbp)
- callq  *0xd8(%rbx)
+ mov    %r9d,%esi
+ mov    %r9d,-0x48(%rbp)
 callq  *0xd8(%rbx)

So previously gcc thought the register offset might be signed and
decided to sign extend it, just in case. The rest appears to be
mostly just minor shuffling of instructions.

v2: i915_mmio_reg_{offset,equal,valid}() helpers added
    s/_REG/_MMIO/ in the register defines
    mo more switch statements left to worry about
    ring_emit stuff got sorted in a prep patch
    cmd parser, lrc context and w/a batch buildup also in prep patch
    vgpu stuff cleaned up and moved to a prep patch
    all other unrelated changes split out
v3: Rebased due to BXT DSI/BLC, MOCS, etc.
v4: Rebased due to churn, s/i915_mmio_reg_t/i915_reg_t/

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1447853606-2751-1-git-send-email-ville.syrjala@linux.intel.com
2015-11-18 15:39:11 +02:00
Ville Syrjälä
e597ef4045 drm/i915: Make the cmd parser 64bit regs explicit
Add defines for the upper halves of the registers used by the cmd
parser. Getting rid of the arithmetic with the register offset
will help in making registers type safe.

v2: s/_HI/_UDW/ (Chris)

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1446839080-18732-1-git-send-email-ville.syrjala@linux.intel.com
2015-11-18 14:35:20 +02:00
Jordan Justen
7b9748cb51 drm/i915: Add GEN7_GPGPU_DISPATCHDIMX/Y/Z to the register whitelist
This is required to support glDispatchComputeIndirect for gen7.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-10-06 10:40:22 +02:00
Chris Wilson
614f4ad798 drm/i915: Fix cmdparser STORE/LOAD command descriptors
Fixes regression from
commit f1afe24f0e
Author: Arun Siluvery <arun.siluvery@linux.intel.com>
Date:   Tue Aug 4 16:22:20 2015 +0100

    drm/i915: Change SRM, LRM instructions to use correct length

which forgot to account for the length bias when declaring the fixed
length.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91844
Reported-by: Andreas Reis <andreas.reis@gmail.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-09-04 10:43:09 +02:00
Francisco Jerez
2bbe6bbb0d drm/i915: Bump command parser version number.
This was forgotten in

commit d351f6d948
Author: Francisco Jerez <currojerez@riseup.net>
Date:   Fri May 29 16:44:15 2015 +0300

    drm/i915: Add SCRATCH1 and ROW_CHICKEN3 to the register whitelist.

Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-09-01 11:40:15 +02:00
Arun Siluvery
f1afe24f0e drm/i915: Change SRM, LRM instructions to use correct length
MI_STORE_REGISTER_MEM, MI_LOAD_REGISTER_MEM instructions are not really
variable length instructions unlike MI_LOAD_REGISTER_IMM where it expects
(reg, addr) pairs so use fixed length for these instructions.

v2: rebase

Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
[danvet: Appease checkpatch as Mika spotted in i915_reg.h - it seems
terminally unhappy about i915_cmd_parser.c so that would be a separate
patch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-08-26 08:44:41 +02:00
Hanno Böck
8453580cb8 drm/i915: Fix command parser table validator
As we may like to use a bisection search on the tables in future, we
need them to be ordered. For convenience we expect the compiled tables
to be order and check on initialisation. However, the validator used the
wrong iterators failed to spot the misordered MI tables and instead
walked off into the unknown (as spotted by kasan).

Signed-off-by: Hanno Boeck <hanno@hboeck.de>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Again hand-assemble patch ...]
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2015-07-29 10:31:04 +02:00
Hanno Böck
9f58582c7a drm/i915: Properly sort MI coomand table
In the future, we may want to speed up command/register searching using
a bisection and so we require them to be in ascending order respectively
by command value or register address. However, this was not true for one
pair in the MI table; make it so.

Signed-off-by: Hanno Boeck <hanno@hboeck.de>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Hand-assemble patch from raw patch from Hanno and commit message from Chris.]
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2015-07-29 10:29:58 +02:00
Arun Siluvery
9e00084750 drm/i915: Update WaFlushCoherentL3CacheLinesAtContextSwitch
In this WA we need to set GEN8_L3SQCREG4[21:21] and reset it after PIPE_CONTROL
instruction but there is a slight complication as this is applied in WA batch
where the values are only initialized once.
Dave identified an issue with the current implementation where the register value
is read once at the beginning and it is reused; this patch corrects this by saving
the register value to memory, update register with the bit of our interest and
restore it back with original value.

This implementation uses MI_LOAD_REGISTER_MEM which is currently only used
by command parser and was using a default length of 0. This is now updated
with correct length and moved to appropriate place.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-07-06 14:37:39 +02:00
Francisco Jerez
d351f6d948 drm/i915: Add SCRATCH1 and ROW_CHICKEN3 to the register whitelist.
Only bit 27 of SCRATCH1 and bit 6 of ROW_CHICKEN3 are allowed to be
set because of security-sensitive bits we don't want userspace to mess
with.  On HSW hardware the whitelisted bits control whether atomic
read-modify-write operations are performed on L3 or on GTI, and when
set to L3 (which can be 10x-30x better performing than on GTI,
depending on the application) require great care to avoid a system
hang, so we currently program them to be handled on GTI by default.

Beignet can immediately start taking advantage of this change to
enable L3 atomics.  Mesa should eventually switch to L3 atomics too,
but a number of non-trivial changes are still required so it will
continue using GTI atomics for now.

Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-15 12:34:58 +02:00
Francisco Jerez
4e86f725ce drm/i915: Extend the parser to check register writes against a mask/value pair.
In some cases it might be unnecessary or dangerous to give userspace
the right to write arbitrary values to some register, even though it
might be desirable to give it control of some of its bits.  This patch
extends the register whitelist entries to contain a mask/value pair in
addition to the register offset.  For registers with non-zero mask,
any LRM writes and LRI writes where the bits of the immediate given by
the mask don't match the specified value will be rejected.

This will be used in my next patch to grant userspace partial write
access to some sensitive registers.

Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-15 12:34:50 +02:00
Francisco Jerez
6a65c5b932 drm/i915: Fix command parser to validate multiple register access with the same command.
Until now the software command checker assumed that commands could
read or write at most a single register per packet.  This is not
necessarily the case, MI_LOAD_REGISTER_IMM expects a variable-length
list of offset/value pairs and writes them in sequence.  The previous
code would only check whether the first entry was valid, effectively
allowing userspace to write unrestricted registers of the MMIO space
by sending a multi-register write with a legal first register, with
potential security implications on Gen6 and 7 hardware.

Fix it by extending the drm_i915_cmd_descriptor table to represent
multi-register access and making validate_cmd() iterate for all
register offsets present in the command packet.

Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-15 12:34:26 +02:00
Chris Wilson
de4e783a3f drm/i915: Tidy batch pool logic
Move the madvise logic out of the execbuffer main path into the
relatively rare allocation path, making the execbuffer manipulation less
fragile.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-10 08:56:04 +02:00
Mika Kuoppala
72c5ba9562 drm/i915: Fix vmap_batch page iterator overrun
vmap_batch() calculates amount of needed pages for the mapping
we are going to create. And it uses this page count as an
argument for the for_each_sg_pages() macro. The macro takes the number
of sg list entities as an argument, not the page count. So we ended
up iterating through all the pages on the mapped object, corrupting
memory past the smaller pages[] array.

Fix this by bailing out when we have enough pages.

This regression has been introduced in

commit 17cabf571e
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jan 14 11:20:57 2015 +0000

    drm/i915: Trim the command parser allocations

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-03-17 22:30:31 +01:00
Chris Wilson
17cabf571e drm/i915: Trim the command parser allocations
Currently, the command parser tries to create a secondary batch exactly
as large as the original, and vmap both. This is open to abuse by
userspace using extremely large batch objects, but only executing very
short batches. For example, this would be if userspace were to implement
a command submission ringbuffer. However, we only need to allocate pages
for just the contents of the command sequence in the batch - all
relocations copied to the secondary batch will reference the original
batch and so there can be no access to the secondary batch outside of
the explicit execution region.

Testcase: igt/gem_exec_big #ivb,byt,hsw
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88308
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-02-23 17:07:40 +01:00
Jordan Justen
c61200c2c7 drm/i915: Add GPGPU_THREADS_DISPATCHED to the register whitelist
This will allow us to read the number of dispatched compute threads
for GL_ARB_pipeline_statistics_query.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-16 10:39:10 +01:00
Brad Volkin
7174537627 drm/i915: Tidy up execbuffer command parsing code
Move it to a separate function since the main do_execbuffer function
already has so much going on.

v2:
- Move pin/unpin calls inside i915_parse_cmds() (Chris W, v4 7/7
  feedback)

Issue: VIZ-4719
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-By: Jon Bloomfield <jon.bloomfield@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-16 10:39:10 +01:00
Brad Volkin
b9ffd80ed6 drm/i915: Use batch length instead of object size in command parser
Previously we couldn't trust the user-supplied batch length because
it came directly from userspace (i.e. untrusted code). It would have
affected what commands software parsed without regard to what hardware
would actually execute, leaving a potential hole.

With the parser now copying the user supplied batch buffer and writing
MI_NOP commands to any space after the copied region, we can safely use
the batch length input. This should be a performance win as the actual
batch length is frequently much smaller than the allocated object size.

v2: Fix handling of non-zero batch_start_offset

Issue: VIZ-4719
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-By: Jon Bloomfield <jon.bloomfield@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-16 10:39:09 +01:00
Brad Volkin
78a423772d drm/i915: Use batch pools with the command parser
This patch sets up all of the tracking and copying necessary to
use batch pools with the command parser and dispatches the copied
(shadow) batch to the hardware.

After this patch, the parser is in 'enabling' mode.

Note that performance takes a hit from the copy in some cases
and will likely need some work. At a rough pass, the memcpy
appears to be the bottleneck. Without having done a deeper
analysis, two ideas that come to mind are:
1) Copy sections of the batch at a time, as they are reached
   by parsing. Might improve cache locality.
2) Copy only up to the userspace-supplied batch length and
   memset the rest of the buffer. Reduces the number of reads.

v2:
- Remove setting the capacity of the pool
- One global pool instead of per-ring pools
- Replace batch_obj with shadow_batch_obj and hook into eb->vmas
- Memset any space in the shadow batch beyond what gets copied
- Rebased on execlist prep refactoring

v3:
- Rebase on chained batch handling
- Squash in setting the secure dispatch flag
- Add a note about the interaction w/secure dispatch pinning
- Check for request->batch_obj == NULL in i915_gem_free_request

v4:
- Fix read domains for shadow_batch_obj
- Remove the set_to_gtt_domain call from i915_parse_cmds
- ggtt_pin/unpin in the parser block to simplify error handling
- Check USES_FULL_PPGTT before setting DISPATCH_SECURE flag
- Remove i915_gem_batch_pool_put calls

v5:
- Move 'pending_read_domains |= I915_GEM_DOMAIN_COMMAND' after
  the parser (danvet, from v4 0/7 feedback)

Issue: VIZ-4719
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-By: Jon Bloomfield <jon.bloomfield@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-16 10:39:09 +01:00
Michael H. Nguyen
86ef630d53 drm/i915: Add MI_SET_APPID cmd to cmd parser tables
Was missing.

Issue: VIZ-4701
Signed-off-by: Michael H. Nguyen <michael.h.nguyen@intel.com>
Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-10 17:47:21 +01:00
Daniel Vetter
bfc882b4e3 drm/i915: Flatten engine init control flow
Now that sanity prevails and we have the clean split between software
init and starting the engines we can drop all the "have we allocate
this struct already?" nonsense.

Execlist code could benefit quite a bit more still, but that's for
another patch.

Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2014-12-03 09:35:29 +01:00
Neil Roberts
f1f55cc055 drm/i915: Add the predicate source registers to the register whitelist
The predicate source registers are needed to implement conditional
rendering without stalling. The two source registers are used to load
the previous values of the PS_DEPTH_COUNT register saved from
PIPE_CONTROL commands. These can then be compared and used to set the
predicate enable bit via the MI_PREDICATE command.

The command parser version number is increased to 2 to make it easier
to detect the new functionality in user space.

Signed-off-by: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-11-14 10:29:24 +01:00
Daniel Vetter
eb84f976c8 Merge remote-tracking branch 'airlied/drm-next' into HEAD
Backmerge drm-next so that I can keep merging patches. Specifically I
want:
- atomic stuff, yay!
- eld parsing patch from Jani.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2014-11-10 10:55:35 +01:00
Dave Airlie
1f9e14baa9 Merge tag 'topic/core-stuff-2014-11-05' of git://anongit.freedesktop.org/drm-intel into drm-next
Just various stuff all over from a bunch of people. Shortlog gives a beter
overview, it's really all misc drm patches.

* tag 'topic/core-stuff-2014-11-05' of git://anongit.freedesktop.org/drm-intel:
  drm/edid: add #defines and helpers for ELD
  drm/dp: Add counters in the drm_dp_aux struct for I2C NACKs and DEFERs
  drm: Remove compiler BUG_ON() test
  drm: Fix DRM_FORCE_ON_DIGITAL use
  drm/gma500: Don't destroy DRM properties in the driver
  drm/i915: Don't destroy DRM properties in the driver
  drm: Add a note to drm_property_create() about property lifetime
  gpu: drm: Fix warning caused by a parameter description in drm_crtc.c
  drm/dp-helper: Move the legacy helpers to gma500
  drm/crtc: Remove duplicated ioctl code
  drm/crtc: Fix two typos
  gpu:drm: Fix typo in Documentation/DocBook/drm.xml
  gpu: drm: drm_dp_mst_topology.c: Fix improper use of strncat
  drm: drm_err: Remove unnecessary __func__ argument
  drm: Implement O_NONBLOCK support on /dev/dri/cardN
2014-11-07 10:58:46 +10:00
Brad Volkin
42c7156af9 drm/i915: Abort command parsing for chained batches
libva uses chained batch buffers in a way that the command parser
can't generally handle. Fortunately, libva doesn't need to write
registers from batch buffers in the way that mesa does, so this
patch causes the driver to fall back to non-secure dispatch if
the parser detects a chained batch buffer.

Note: The 2nd hunk to munge the error code of the parser looks a bit
superflous. At least until we have the batch copy code ready and can
run the cmd parser in granting mode. But it isn't since we still need
to let existing libva buffers pass (though not with elevated privs
ofc!).

Testcase: igt/gem_exec_parse/chained-batch
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
[danvet: Add note - this confused me in review and Brad clarified
things (after a few mails ...).]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-11-04 14:04:54 +01:00
Dave Airlie
bbf0ef0334 Merge tag 'drm-intel-next-2014-10-03-no-ppgtt' of git://anongit.freedesktop.org/drm-intel into drm-next
Ok, new attempt, this time around with full ppgtt disabled again.

drm-intel-next-2014-10-03:
- first batch of skl stage 1 enabling
- fixes from Rodrigo to the PSR, fbc and sink crc code
- kerneldoc for the frontbuffer tracking code, runtime pm code and the basic
  interrupt enable/disable functions
- smaller stuff all over
drm-intel-next-2014-09-19:
- bunch more i830M fixes from Ville
- full ppgtt now again enabled by default
- more ppgtt fixes from Michel Thierry and Chris Wilson
- plane config work from Gustavo Padovan
- spinlock clarifications
- piles of smaller improvements all over, as usual

* tag 'drm-intel-next-2014-10-03-no-ppgtt' of git://anongit.freedesktop.org/drm-intel: (114 commits)
  Revert "drm/i915: Enable full PPGTT on gen7"
  drm/i915: Update DRIVER_DATE to 20141003
  drm/i915: Remove the duplicated logic between the two shrink phases
  drm/i915: kerneldoc for interrupt enable/disable functions
  drm/i915: Use dev_priv instead of dev in irq setup functions
  drm/i915: s/pm._irqs_disabled/pm.irqs_enabled/
  drm/i915: Clear TX FIFO reset master override bits on chv
  drm/i915: Make sure hardware uses the correct swing margin/deemph bits on chv
  drm/i915: make sink_crc return -EIO on aux read/write failure
  drm/i915: Constify send buffer for intel_dp_aux_ch
  drm/i915: De-magic the PSR AUX message
  drm/i915: Reinstate error level message for non-simulated gpu hangs
  drm/i915: Kerneldoc for intel_runtime_pm.c
  drm/i915: Call runtime_pm_disable directly
  drm/i915: Move intel_display_set_init_power to intel_runtime_pm.c
  drm/i915: Bikeshed rpm functions name a bit.
  drm/i915: Extract intel_runtime_pm.c
  drm/i915: Remove intel_modeset_suspend_hw
  drm/i915: spelling fixes for frontbuffer tracking kerneldoc
  drm/i915: Tighting frontbuffer tracking around flips
  ...
2014-10-28 12:37:58 +10:00