linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-10-03 23:58:05 +00:00

Author	SHA1	Message	Date
Ville Syrjälä	6ec5bd3489	drm/i915: Deprecate I915_SET_COLORKEY_NONE Deprecate the silly I915_SET_COLORKEY_NONE flag. The obvious way to disable colorkey is to just set flags to 0, which is exactly what the intel ddx has been doing all along. Currently when userspace sets the flags to 0, we end up in a funny state where colorkey is disabled, but various colorkey vs. scaling checks still consider colorkey to be enabled, and thus we don't allow plane scaling to kick in. In case there is some other userspace out there that actually uses this flag (unlikely as this is an i915 specific uapi) we'll keep on accepting it. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180202204231.27905-1-ville.syrjala@linux.intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2018-02-05 20:54:01 +02:00
Tvrtko Ursulin	3452fa3095	drm/i915/pmu: Aggregate all RC6 states into one counter Chris has discovered that RC6, RC6p and RC6pp counters are mutually exclusive, and even that on some SNB SKUs you get RC6p increasing, and on the others RC6. Furthermore RC6p and RC6pp were only present starting from GEN6 until, GEN7, not including Haswell. All this combined makes it questionable whether we need to reserve new ABI for these counters. One idea was to just combine them all under the RC6 counter to simplify things for userspace. So that is what this patch does. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171124171331.17981-1-tvrtko.ursulin@linux.intel.com	2017-11-24 17:20:04 +00:00
Tvrtko Ursulin	b552ae444e	drm/i915/pmu: Drop I915_ENGINE_SAMPLE_MAX from uapi headers We have agreed during the engine classes discussion that fields marked as non-ABI are better left out altogether from uapi headers. v2: Use a local define for maintanability. (Chris Wilson) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171123100701.18430-1-tvrtko.ursulin@linux.intel.com	2017-11-23 12:27:43 +00:00
Tvrtko Ursulin	6060b6aec0	drm/i915/pmu: Add RC6 residency metrics For clients like intel-gpu-overlay it is easier to read the counters via the perf API than having to parse sysfs. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171121181852.16128-9-tvrtko.ursulin@linux.intel.com	2017-11-22 11:25:06 +00:00
Tvrtko Ursulin	0cd4684d6e	drm/i915/pmu: Add interrupt count metric For clients like intel-gpu-overlay it is easier to read the count via the perf API than having to parse /proc. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171121181852.16128-7-tvrtko.ursulin@linux.intel.com	2017-11-22 11:25:04 +00:00
Tvrtko Ursulin	b46a33e271	drm/i915/pmu: Expose a PMU interface for perf queries From: Chris Wilson <chris@chris-wilson.co.uk> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> From: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> The first goal is to be able to measure GPU (and invidual ring) busyness without having to poll registers from userspace. (Which not only incurs holding the forcewake lock indefinitely, perturbing the system, but also runs the risk of hanging the machine.) As an alternative we can use the perf event counter interface to sample the ring registers periodically and send those results to userspace. Functionality we are exporting to userspace is via the existing perf PMU API and can be exercised via the existing tools. For example: perf stat -a -e i915/rcs0-busy/ -I 1000 Will print the render engine busynnes once per second. All the performance counters can be enumerated (perf list) and have their unit of measure correctly reported in sysfs. v1-v2 (Chris Wilson): v2: Use a common timer for the ring sampling. v3: (Tvrtko Ursulin) * Decouple uAPI from i915 engine ids. * Complete uAPI defines. * Refactor some code to helpers for clarity. * Skip sampling disabled engines. * Expose counters in sysfs. * Pass in fake regs to avoid null ptr deref in perf core. * Convert to class/instance uAPI. * Use shared driver code for rc6 residency, power and frequency. v4: (Dmitry Rogozhkin) * Register PMU with .task_ctx_nr=perf_invalid_context * Expose cpumask for the PMU with the single CPU in the mask * Properly support pmu->stop(): it should call pmu->read() * Properly support pmu->del(): it should call stop(event, PERF_EF_UPDATE) * Introduce refcounting of event subscriptions. * Make pmu.busy_stats a refcounter to avoid busy stats going away with some deleted event. * Expose cpumask for i915 PMU to avoid multiple events creation of the same type followed by counter aggregation by perf-stat. * Track CPUs getting online/offline to migrate perf context. If (likely) cpumask will initially set CPU0, CONFIG_BOOTPARAM_HOTPLUG_CPU0 will be needed to see effect of CPU status tracking. * End result is that only global events are supported and perf stat works correctly. * Deny perf driver level sampling - it is prohibited for uncore PMU. v5: (Tvrtko Ursulin) * Don't hardcode number of engine samplers. * Rewrite event ref-counting for correctness and simplicity. * Store initial counter value when starting already enabled events to correctly report values to all listeners. * Fix RC6 residency readout. * Comments, GPL header. v6: * Add missing entry to v4 changelog. * Fix accounting in CPU hotplug case by copying the approach from arch/x86/events/intel/cstate.c. (Dmitry Rogozhkin) v7: * Log failure message only on failure. * Remove CPU hotplug notification state on unregister. v8: * Fix error unwind on failed registration. * Checkpatch cleanup. v9: * Drop the energy metric, it is available via intel_rapl_perf. (Ville Syrjälä) * Use HAS_RC6(p). (Chris Wilson) * Handle unsupported non-engine events. (Dmitry Rogozhkin) * Rebase for intel_rc6_residency_ns needing caller managed runtime pm. * Drop HAS_RC6 checks from the read callback since creating those events will be rejected at init time already. * Add counter units to sysfs so perf stat output is nicer. * Cleanup the attribute tables for brevity and readability. v10: * Fixed queued accounting. v11: * Move intel_engine_lookup_user to intel_engine_cs.c * Commit update. (Joonas Lahtinen) v12: * More accurate sampling. (Chris Wilson) * Store and report frequency in MHz for better usability from perf stat. * Removed metrics: queued, interrupts, rc6 counters. * Sample engine busyness based on seqno difference only for less MMIO (and forcewake) on all platforms. (Chris Wilson) v13: * Comment spelling, use mul_u32_u32 to work around potential GCC issue and somne code alignment changes. (Chris Wilson) v14: * Rebase. v15: * Rebase for RPS refactoring. v16: * Use the dynamic slot in the CPU hotplug state machine so that we are free to setup our state as multi-instance. Previously we were re-using the CPUHP_AP_PERF_X86_UNCORE_ONLINE slot which is neither used as multi-instance, nor owned by our driver to start with. * Register the CPU hotplug handlers after the PMU, otherwise the callback will get called before the PMU is initialized which can end up in perf_pmu_migrate_context with an un-initialized base. * Added workaround for a probable bug in cpuhp core. v17: * Remove workaround for the cpuhp bug. v18: * Rebase for drm_i915_gem_engine_class getting upstream before us. v19: * Rebase. (trivial) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171121181852.16128-2-tvrtko.ursulin@linux.intel.com	2017-11-22 11:24:57 +00:00
Lionel Landwerlin	dab9178333	drm/i915: expose command stream timestamp frequency to userspace We use to have this fixed per generation, but starting with CNL userspace cannot tell just off the PCI ID. Let's make this information available. This is particularly useful for performance monitoring where much of the normalization work is done using those timestamps (this include pipeline statistics in both GL & Vulkan as well as OA reports). v2: Use variables for 24MHz/19.2MHz values (Ewelina) Renamed function & coding style (Sagar) v3: Fix frequency read on Broadwell (Sagar) Fix missing divide by 4 on <= gen4 (Sagar) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Tested-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171110190845.32574-7-lionel.g.landwerlin@intel.com	2017-11-13 15:59:30 +00:00
Chris Wilson	d2b4b97933	drm/i915: Record the default hw state after reset upon load Take a copy of the HW state after a reset upon module loading by executing a context switch from a blank context to the kernel context, thus saving the default hw state over the blank context image. We can then use the default hw state to initialise any future context, ensuring that each starts with the default view of hw state. v2: Unmap our default state from the GTT after stealing it from the context. This should stop us from accidentally overwriting it via the GTT (and frees up some precious GTT space). Testcase: igt/gem_ctx_isolation Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171110142634.10551-7-chris@chris-wilson.co.uk	2017-11-10 17:23:10 +00:00
Tvrtko Ursulin	1803fcbca2	drm/i915: Define an engine class enum for the uABI We want to be able to report back to userspace details about an engine's class, and in return for userspace to be able to request actions regarding certain classes of engines. To isolate the uABI from any variations between hw generations, we define an abstract class for the engines and internally map onto the hw. v2: Remove MAX from the uABI; keep it internal if we need it, but don't let userspace make the mistake of using it themselves. v3: s/OTHER/INVALID/ The use of OTHER is ill-defined, so remove it from the uABI as any future new type of engine can define a class to suit it. But keep a reserved value for an invalid class, so that we can always unambiguously express when something doesn't belong to the classification. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> #v2 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171110142634.10551-1-chris@chris-wilson.co.uk	2017-11-10 17:20:24 +00:00
Tvrtko Ursulin	ebcaa1ff8b	drm/i915: Reject unknown syncobj flags We have to reject unknown flags for uAPI considerations, and also because the curent implementation limits their i915 storage space to two bits. v2: (Chris Wilson) * Fix fail in ABI check. * Added unknown flags and BUILD_BUG_ON. v3: * Use ARCH_KMALLOC_MINALIGN instead of alignof. (Chris Wilson) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: `cf6e7bac63` ("drm/i915: Add support for drm syncobjs") Cc: Jason Ekstrand <jason@jlekstrand.net> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: David Airlie <airlied@linux.ie> Cc: intel-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171031102326.9738-1-tvrtko.ursulin@linux.intel.com	2017-11-03 09:28:05 +00:00
Joonas Lahtinen	822a4b6732	drm/i915: Don't use BIT() in UAPI section Lets not introduce BIT() macro requirement for UAPI for now. Fixes: `3fd3a6ffe2` ("drm/i915: Simplify i915_reg_read_ioctl") Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171006104559.17312-1-joonas.lahtinen@linux.intel.com	2017-10-06 14:08:55 +03:00
Chris Wilson	ac14fbd460	drm/i915/scheduler: Support user-defined priorities Use a priority stored in the context as the initial value when submitting a request. This allows us to change the default priority on a per-context basis, allowing different contexts to be favoured with GPU time at the expense of lower importance work. The user can adjust the context's priority via I915_CONTEXT_PARAM_PRIORITY, with more positive values being higher priority (they will be serviced earlier, after their dependencies have been resolved). Any prerequisite work for an execbuf will have its priority raised to match the new request as required. Normal users can specify any value in the range of -1023 to 0 [default], i.e. they can reduce the priority of their workloads (and temporarily boost it back to normal if so desired). Privileged users can specify any value in the range of -1023 to 1023, [default is 0], i.e. they can raise their priority above all overs and so potentially starve the system. Note that the existing schedulers are not fair, nor load balancing, the execution is strictly by priority on a first-come, first-served basis, and the driver may choose to boost some requests above the range available to users. This priority was originally based around nice(2), but evolved to allow clients to adjust their priority within a small range, and allow for a privileged high priority range. For example, this can be used to implement EGL_IMG_context_priority https://www.khronos.org/registry/egl/extensions/IMG/EGL_IMG_context_priority.txt EGL_CONTEXT_PRIORITY_LEVEL_IMG determines the priority level of the context to be created. This attribute is a hint, as an implementation may not support multiple contexts at some priority levels and system policy may limit access to high priority contexts to appropriate system privilege level. The default value for EGL_CONTEXT_PRIORITY_LEVEL_IMG is EGL_CONTEXT_PRIORITY_MEDIUM_IMG." so we can map PRIORITY_HIGH -> 1023 [privileged, will failback to 0] PRIORITY_MED -> 0 [default] PRIORITY_LOW -> -1023 They also map onto the priorities used by VkQueue (and a VkQueue is essentially a timeline, our i915_gem_context under full-ppgtt). v2: s/CAP_SYS_ADMIN/CAP_SYS_NICE/ v3: Report min/max user priorities as defines in the uapi, and rebase internal priorities on the exposed values. Testcase: igt/gem_exec_schedule Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171003203453.15692-9-chris@chris-wilson.co.uk	2017-10-04 17:52:46 +01:00
Chris Wilson	bf64e0b00e	drm/i915: Expand I915_PARAM_HAS_SCHEDULER into a capability bitmask In the next few patches, we wish to enable different features for the scheduler, some which may subtlety change ABI (e.g. allow requests to be reordered under different circumstances). So we need to make sure userspace is cognizant of the changes (if they care), by which we employ the usual method of a GETPARAM. We already have an I915_PARAM_HAS_SCHEDULER (which notes the existing ability to reorder requests to avoid bubbles), and now we wish to extend that to be a bitmask to describe the different capabilities implemented. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171003203453.15692-7-chris@chris-wilson.co.uk	2017-10-04 17:52:46 +01:00
Lionel Landwerlin	ee427e2595	uapi/drm/i915: document field usage of drm_i915_perf_oa_config Document the expected length of buffers config pointers (tuple of u32 values). Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20170918114241.30105-1-lionel.g.landwerlin@intel.com	2017-09-18 14:46:21 +01:00
Joonas Lahtinen	3fd3a6ffe2	drm/i915: Simplify i915_reg_read_ioctl Convert to use the freshly available made INTEL_GEN_MASK for easier grepping and improve function readability and clarify the UABI documentation. No functional changes. v2: - Lift GEM_BUG_ONs and use is_power_of_2 (Chris) - Retain -EINVAL on bad flags behavior (Chris) v3: - Extract flags with 'entry->size - 1' (Chris) v4: - Add GEM_BUG_ON on for flags vs entry offset (Chris) v5: - Use 'u16' to match 'dev_priv' (Ville) v6: - Fix checkpatch.pl errors Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20170913115255.13851-2-joonas.lahtinen@linux.intel.com	2017-09-14 11:15:52 +03:00
Chris Wilson	17ad4fdd09	drm/i915/perf: Remove __user from u64 in drm_i915_perf_oa_config Sparse complains that these integers from which we form void __user *, and so we don't need the annotation itself inside the uABI. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170901145729.21363-2-chris@chris-wilson.co.uk Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-09-05 11:57:10 +01:00
Jason Ekstrand	cf6e7bac63	drm/i915: Add support for drm syncobjs This commit adds support for waiting on or signaling DRM syncobjs as part of execbuf. It does so by hijacking the currently unused cliprects pointer to instead point to an array of i915_gem_exec_fence structs which containe a DRM syncobj and a flags parameter which specifies whether to wait on it or to signal it. This implementation theoretically allows for both flags to be set in which case it waits on the dma_fence that was in the syncobj and then immediately replaces it with the dma_fence from the current execbuf. v2: - Rebase on new syncobj API v3: - Pull everything out into helpers - Do all allocation in gem_execbuffer2 - Pack the flags in the bottom 2 bits of the drm_syncobj* v4: - Prevent a potential race on syncobj->fence Testcase: igt/gem_exec_fence/syncobj* Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Link: https://patchwork.freedesktop.org/patch/msgid/1499289202-25441-1-git-send-email-jason.ekstrand@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20170815145733.4562-1-chris@chris-wilson.co.uk	2017-08-15 16:46:57 +01:00
Lionel Landwerlin	f89823c212	drm/i915/perf: Implement I915_PERF_ADD/REMOVE_CONFIG interface The motivation behind this new interface is expose at runtime the creation of new OA configs which can be used as part of the i915 perf open interface. This will enable the kernel to learn new configs which may be experimental, or otherwise not part of the core set currently available through the i915 perf interface. v2: Drop DRM_ERROR for userspace errors (Matthew) Add padding to userspace structure (Matthew) s/guid/uuid/ (Matthew) v3: Use u32 instead of int to iterate through registers (Matthew) v4: Lock access to dynamic config list (Lionel) v5: by Matthew: Fix uninitialized error values Fix incorrect unwiding when opening perf stream Use kmalloc_array() to store register Use uuid_is_valid() to valid config uuids Declare ioctls as write only Check padding members are set to 0 by Lionel: Return ENOENT rather than EINVAL when trying to remove non existing config v6: by Chris: Use ref counts for OA configs Store UUID in drm_i915_perf_oa_config rather then using pointer Shuffle fields of drm_i915_perf_oa_config to avoid padding v7: by Chris Rename uapi pointers fields to end with '_ptr' v8: by Andrzej, Marek, Sebastian Update register whitelisting by Lionel Add more register names for documentation Allow configuration programming in non-paranoid mode Add support for value filter for a couple of registers already programmed in other part of the kernel v9: Documentation fix (Lionel) Allow writing WAIT_FOR_RC6_EXIT only on Gen8+ (Andrzej) v10: Perform read access_ok() on register pointers (Lionel) Signed-off-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Andrzej Datczuk <andrzej.datczuk@intel.com> Reviewed-by: Andrzej Datczuk <andrzej.datczuk@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170803165812.2373-2-lionel.g.landwerlin@intel.com	2017-08-03 18:19:53 +01:00
Chris Wilson	1a71cf2fa6	drm/i915: Allow execbuffer to use the first object as the batch Currently, the last object in the execlist is the always the batch. However, when building the batch buffer we often know the batch object first and if we can use the first slot in the execlist we can emit relocation instructions relative to it immediately and avoid a separate pass to adjust the relocations to point to the last execlist slot. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2017-06-16 16:54:05 +01:00
Robert Bragg	19f81df285	drm/i915/perf: Add OA unit support for Gen 8+ Enables access to OA unit metrics for BDW, CHV, SKL and BXT which all share (more-or-less) the same OA unit design. Of particular note in comparison to Haswell: some OA unit HW config state has become per-context state and as a consequence it is somewhat more complicated to manage synchronous state changes from the cpu while there's no guarantee of what context (if any) is currently actively running on the gpu. The periodic sampling frequency which can be particularly useful for system-wide analysis (as opposed to command stream synchronised MI_REPORT_PERF_COUNT commands) is perhaps the most surprising state to have become per-context save and restored (while the OABUFFER destination is still a shared, system-wide resource). This support for gen8+ takes care to consider a number of timing challenges involved in synchronously updating per-context state primarily by programming all config state from the cpu and updating all current and saved contexts synchronously while the OA unit is still disabled. The driver intentionally avoids depending on command streamer programming to update OA state considering the lack of synchronization between the automatic loading of OACTXCONTROL state (that includes the periodic sampling state and enable state) on context restore and the parsing of any general purpose BB the driver can control. I.e. this implementation is careful to avoid the possibility of a context restore temporarily enabling any out-of-date periodic sampling state. In addition to the risk of transiently-out-of-date state being loaded automatically; there are also internal HW latencies involved in the loading of MUX configurations which would be difficult to account for from the command streamer (and we only want to enable the unit when once the MUX configuration is complete). Since the Gen8+ OA unit design no longer supports clock gating the unit off for a single given context (which effectively stopped any progress of counters while any other context was running) and instead supports tagging OA reports with a context ID for filtering on the CPU, it means we can no longer hide the system-wide progress of counters from a non-privileged application only interested in metrics for its own context. Although we could theoretically try and subtract the progress of other contexts before forwarding reports via read() we aren't in a position to filter reports captured via MI_REPORT_PERF_COUNT commands. As a result, for Gen8+, we always require the dev.i915.perf_stream_paranoid to be unset for any access to OA metrics if not root. v5: Drain submitted requests when enabling metric set to ensure no lite-restore erases the context image we just updated (Lionel) v6: In addition to drain, switch to kernel context & update all context in place (Chris) v7: Add missing mutex_unlock() if switching to kernel context fails (Matthew) v8: Simplify OA period/flex-eu-counters programming by using the batchbuffer instead of modifying ctx-image (Lionel) v9: Back to updating the context image (due to erroneous testing, batchbuffer programming the OA unit doesn't actually work) (Lionel) Pin context before updating context image (Chris) Drop MMIO programming now that we switch to a kernel context with right values in initial context image (Chris) v10: Just pin_map the contexts we want to modify or let the configuration happen on first use (Chris) v11: Update kernel context OA config through the batchbuffer rather than on the fly ctx-image update (Lionel) v12: Rework OA context registers update again by swithing away from user contexts and reconfiguring the kernel context through the batchbuffer and updating all the other contexts' context image. Also take care to lock slice/subslice configuration when OA is on. (Lionel) v13: Request rpcs updates on all engine when updating the OA config (Lionel) v14: Drop any kind of rpcs management now that we monitor sseu configuration changes in a later patch (Lionel) Remove usleep after programming the NOA configs on Gen8+, this doesn't seem to be needed (Lionel) v15: Respect coding style for block comments (Chris) v16: Add missing i915_add_request() in case we fail to emit OA configuration (Matthew) Signed-off-by: Robert Bragg <robert@sixbynine.org> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> \o/ Signed-off-by: Ben Widawsky <ben@bwidawsk.net>	2017-06-14 12:31:57 -07:00
Robert Bragg	f532023381	drm/i915: expose _SUBSLICE_MASK GETPARM Assuming a uniform mask across all slices, this enables userspace to determine the specific sub slices can be enabled. This information is required, for example, to be able to analyse some OA counter reports where the counter configuration depends on the HW sub slice configuration. Signed-off-by: Robert Bragg <robert@sixbynine.org> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>	2017-06-14 12:31:57 -07:00
Robert Bragg	7fed555c02	drm/i915: expose _SLICE_MASK GETPARM Enables userspace to determine the maximum number of slices that can be enabled on the device and also know what specific slices can be enabled. This information is required, for example, to be able to analyse some OA counter reports where the counter configuration depends on the HW slice configuration. Signed-off-by: Robert Bragg <robert@sixbynine.org> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>	2017-06-14 12:31:57 -07:00
Chris Wilson	b0fd47adc6	drm/i915: Copy user requested buffers into the error state Introduce a new execobject.flag (EXEC_OBJECT_CAPTURE) that userspace may use to indicate that it wants the contents of this buffer preserved in the error state (/sys/class/drm/cardN/error) following a GPU hang involving this batch. Use this at your discretion, the contents of the error state. although compressed, are allocated with GFP_ATOMIC (i.e. limited) and kept for all eternity (until the error state is destroyed). Based on an earlier patch by Ben Widawsky <ben@bwidawsk.net> Testcase: igt/gem_exec_capture Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <ben@bwidawsk.net> Cc: Matt Turner <mattst88@gmail.com> Acked-by: Ben Widawsky <ben@bwidawsk.net> Acked-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170415093902.22581-1-chris@chris-wilson.co.uk	2017-04-15 12:39:57 +01:00
Chris Wilson	e22d8e3c69	drm/i915: Treat WC a separate cache domain When discussing a new WC mmap, we based the interface upon the assumption that GTT was fully coherent. How naive! Commits `3b5724d702` ("drm/i915: Wait for writes through the GTT to land before reading back") and `ed4596ea99` ("drm/i915/guc: WA to address the Ringbuffer coherency issue") demonstrate that writes through the GTT are indeed delayed and may be overtaken by direct WC access. To be safe, if userspace is mixing WC mmaps with other potential GTT access (pwrite, GTT mmaps) it should use set_domain(WC). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96563 Testcase: igt/gem_pwrite/small-gtt* Testcase: igt/drv_selftest/coherency Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170412110111.26626-2-chris@chris-wilson.co.uk	2017-04-12 12:35:17 +01:00
Chris Wilson	fec0445caa	drm/i915: Support explicit fencing for execbuf Now that the user can opt-out of implicit fencing, we need to give them back control over the fencing. We employ sync_file to wrap our drm_i915_gem_request and provide an fd that userspace can merge with other sync_file fds and pass back to the kernel to wait upon before future execution. Testcase: igt/gem_exec_fence Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Acked-by: Chad Versace <chadversary@chromium.org> Link: http://patchwork.freedesktop.org/patch/msgid/20170127094008.27489-2-chris@chris-wilson.co.uk	2017-01-27 19:55:48 +00:00
Chris Wilson	77ae995789	drm/i915: Enable userspace to opt-out of implicit fencing Userspace is faced with a dilemma. The kernel requires implicit fencing to manage resource usage (we always must wait for the GPU to finish before releasing its PTE) and for third parties. However, userspace may wish to avoid this serialisation if it is either using explicit fencing between parties and wants more fine-grained access to buffers (e.g. it may partition the buffer between uses and track fences on ranges rather than the implicit fences tracking the whole object). It follows that userspace needs a mechanism to avoid the kernel's serialisation on its implicit fences before execbuf execution. The next question is whether this is an object, execbuf or context flag. Hybrid users (such as using explicit EGL_ANDROID_native_sync fencing on shared winsys buffers, but implicit fencing on internal surfaces) require a per-object level flag. Given that this flag need to be only set once for the lifetime of the object, this reduces the convenience of having an execbuf or context level flag (and avoids having multiple pieces of uABI controlling the same feature). Incorrect use of this flag will result in rendering corruption and GPU hangs - but will not result in use-after-free or similar resource tracking issues. Serious caveat: write ordering is not strictly correct after setting this flag on a render target on multiple engines. This affects all subsequent GEM operations (execbuf, set-domain, pread) and shared dma-buf operations. A fix is possible - but costly (both in terms of further ABI changes and runtime overhead). Testcase: igt/gem_exec_async Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Acked-by: Chad Versace <chadversary@chromium.org> Link: http://patchwork.freedesktop.org/patch/msgid/20170127094008.27489-1-chris@chris-wilson.co.uk	2017-01-27 19:55:35 +00:00
Anusha Srivatsa	5464cd6576	drm/i915/get_params: Add HuC status to getparams This patch will allow for getparams to return the status of the HuC. As the HuC has to be validated by the GuC this patch uses the validated status to show when the HuC is loaded and ready for use. You cannot use the loaded status as with the GuC as the HuC is verified after it is loaded and is not usable until it is verified. v2: removed the forewakes as the registers are already force-woken. (T.Ursulin) v3: rebased on top of drm-tip. Removed any reference to intel_huc.h v4: rebased. Rename I915_PARAM_HAS_HUC to I915_PARAM_HUC_STATUS. Remove intel_is_huc_valid() since it is used only in one place. Put the case of I915_PARAM_HAS_HUC() in the right place. v5: rebased. Add a comment to specify that I915_READ(reg) does not read garbage value. The register HUC_STATUS2 is force woken and no rpm is needed. Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com> Signed-off-by: Peter Antoine <peter.antoine@intel.com> Reviewed-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1484755558-1234-6-git-send-email-anusha.srivatsa@intel.com	2017-01-19 11:19:10 +02:00
Chris Wilson	cd8bddc4ab	drm/i915/perf: Treat u64 in uabi as a normal integer Forgo marking up the u64 integer representing a user pointer as this just annoys sparse. The conversion from u64 to a user pointer is managed by u64_to_user_ptr(). Fixes: `eec688e142` ("drm/i915: Add i915 perf infrastructure") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Robert Bragg <robert@sixbynine.org> Cc: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161130164649.26809-1-chris@chris-wilson.co.uk Reviewed-by: Matthew Auld <matthew.auld@intel.com>	2016-12-01 07:37:51 +00:00
Robert Bragg	d79651522e	drm/i915: Enable i915 perf stream for Haswell OA unit Gen graphics hardware can be set up to periodically write snapshots of performance counters into a circular buffer via its Observation Architecture and this patch exposes that capability to userspace via the i915 perf interface. v2: Make sure to initialize ->specific_ctx_id when opening, without relying on _pin_notify hook, in case ctx already pinned. v3: Revert back to pinning ctx upfront when opening stream, removing need to hook in to pinning and to update OACONTROL on the fly. Signed-off-by: Robert Bragg <robert@sixbynine.org> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Sourab Gupta <sourab.gupta@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-7-robert@sixbynine.org	2016-11-22 14:38:13 +01:00
Robert Bragg	eec688e142	drm/i915: Add i915 perf infrastructure Adds base i915 perf infrastructure for Gen performance metrics. This adds a DRM_IOCTL_I915_PERF_OPEN ioctl that takes an array of uint64 properties to configure a stream of metrics and returns a new fd usable with standard VFS system calls including read() to read typed and sized records; ioctl() to enable or disable capture and poll() to wait for data. A stream is opened something like: uint64_t properties[] = { /* Single context sampling / DRM_I915_PERF_PROP_CTX_HANDLE, ctx_handle, / Include OA reports in samples / DRM_I915_PERF_PROP_SAMPLE_OA, true, / OA unit configuration */ DRM_I915_PERF_PROP_OA_METRICS_SET, metrics_set_id, DRM_I915_PERF_PROP_OA_FORMAT, report_format, DRM_I915_PERF_PROP_OA_EXPONENT, period_exponent, }; struct drm_i915_perf_open_param parm = { .flags = I915_PERF_FLAG_FD_CLOEXEC \| I915_PERF_FLAG_FD_NONBLOCK \| I915_PERF_FLAG_DISABLED, .properties_ptr = (uint64_t)properties, .num_properties = sizeof(properties) / 16, }; int fd = drmIoctl(drm_fd, DRM_IOCTL_I915_PERF_OPEN, &param); Records read all start with a common { type, size } header with DRM_I915_PERF_RECORD_SAMPLE being of most interest. Sample records contain an extensible number of fields and it's the DRM_I915_PERF_PROP_SAMPLE_xyz properties given when opening that determine what's included in every sample. No specific streams are supported yet so any attempt to open a stream will return an error. v2: use i915_gem_context_get() - Chris Wilson v3: update read() interface to avoid passing state struct - Chris Wilson fix some rebase fallout, with i915-perf init/deinit v4: s/DRM_IORW/DRM_IOW/ - Emil Velikov Signed-off-by: Robert Bragg <robert@sixbynine.org> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Sourab Gupta <sourab.gupta@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-2-robert@sixbynine.org	2016-11-22 14:27:18 +01:00
Mika Kuoppala	841021713a	drm/i915: Add bannable context parameter Now when driver has per context scoring of 'hanging badness' and also subsequent hangs during short windows are allowed, if there is progress made in between, it does not make sense to expose a ban timing window as a context parameter anymore. Let the scoring be the sole indicator for ban policy and substitute ban period context parameter as a boolean to get/set context bannable property. v2: allow non root to opt into being banned (Chris) Cc: Chris Wilson <chris@chris-wilson.co.uk> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>	2016-11-21 14:36:40 +02:00
Chris Wilson	0de9136dbb	drm/i915/scheduler: Signal the arrival of a new request The start of the scheduler, add a hook into request submission for the scheduler to see the arrival of new requests and prepare its runqueues. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-6-chris@chris-wilson.co.uk	2016-11-14 21:00:26 +00:00
Chris Wilson	4cc6907501	drm/i915: Add I915_PARAM_MMAP_GTT_VERSION to advertise unlimited mmaps Now that we have working partial VMA and faulting support for all objects, including fence support, advertise to userspace that it can take advantage of unlimited GGTT mmaps. v2: Make room in the kerneldoc for a more detailed explanation of the limitations of the GTT mmap interface. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20160825180519.11341-1-chris@chris-wilson.co.uk	2016-08-26 08:42:26 +01:00
Chris Wilson	1255501d86	drm/i915: Embrace the race in busy-ioctl Daniel Vetter proposed a new challenge to the serialisation inside the busy-ioctl that exposed a flaw that could result in us reporting the wrong engine as being busy. If the request is reallocated as we test its busyness and then reassigned to this object by another thread, we would not notice that the test itself was incorrect. We are faced with a choice of using __i915_gem_active_get_request_rcu() to first acquire a reference to the request preventing the race, or to acknowledge the race and accept the limitations upon the accuracy of the busy flags. Note that we guarantee that we never falsely report the object as idle (providing userspace itself doesn't race), and so the most important use of the busy-ioctl and its guarantees are fulfilled. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471337440-16777-1-git-send-email-chris@chris-wilson.co.uk	2016-08-16 10:35:02 +01:00
Chris Wilson	deeb1519b6	drm/i915: Document and reject invalid tiling modes Through the GTT interface to the fence registers, we can only handle linear, X and Y tiling. The more esoteric tiling patterns are ignored. Document that the tiling ABI only supports upto Y tiling, and reject any attempts to set a tiling mode other than NONE, X or Y. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-17-git-send-email-chris@chris-wilson.co.uk	2016-08-05 10:54:42 +01:00
Chris Wilson	91b2db6f65	drm/i915: Pad GTT views of exec objects up to user specified size Our GPUs impose certain requirements upon buffers that depend upon how exactly they are used. Typically this is expressed as that they require a larger surface than would be naively computed by pitch * height. Normally such requirements are hidden away in the userspace driver, but when we accept pointers from strangers and later impose extra conditions on them, the original client allocator has no idea about the monstrosities in the GPU and we require the userspace driver to inform the kernel how many padding pages are required beyond the client allocation. v2: Long time, no see v3: Try an anonymous union for uapi struct compatibility Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-7-git-send-email-chris@chris-wilson.co.uk	2016-08-04 20:19:53 +01:00
Imre Deak	3373ce2ecc	drm/i915: Give proper names to MOCS entries The purpose for each MOCS entry isn't well defined atm. Defining these is important to remove any uncertainty about the use of these entries for example in terms of performance and GPU/CPU coherency. Suggested by Ville. v4: - Rename I915_MOCS_AUTO to I915_MOCS_PTE. (Chris) CC: Rong R Yang <rong.r.yang@intel.com> CC: Yakui Zhao <yakui.zhao@intel.com> CC: Ville Syrjälä <ville.syrjala@linux.intel.com> CC: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467383528-16142-1-git-send-email-imre.deak@intel.com	2016-07-19 20:35:37 +03:00
Dave Gordon	9e2793f6e4	drm/i915: compile-time consistency check on __EXEC_OBJECT flags Two different sets of flag bits are stored in the 'flags' member of a 'struct drm_i915_gem_exec_object2', and they're defined in two different source files, increasing the risk of an accidental clash. Some flags in this field are supplied by the user; these are defined in i915_drm.h, and they start from the LSB and work up. Other flags are defined in i915_gem_execbuffer, for internal use within that file only; they start from the MSB and work down. So here we add a compile-time check that the two sets of flags do not overlap, which would cause all sorts of confusion. Signed-off-by: Dave Gordon <david.s.gordon@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1468504324-12690-1-git-send-email-david.s.gordon@intel.com	2016-07-19 09:06:16 +02:00
Chris Wilson	bc3d674462	drm/i915: Allow userspace to request no-error-capture upon GPU hangs igt likes to inject GPU hangs into its command streams. However, as we expect these hangs, we don't actually want them recorded in the dmesg output or stored in the i915_error_state (usually). To accommodate this allow userspace to set a flag on the context that any hang emanating from that context will not be recorded. We still do the error capture (otherwise how do we find the guilty context and know its intent?) as part of the reason for random GPU hang injection is to exercise the race conditions between the error capture and normal execution. v2: Split out the request->ringbuf error capture changes. v3: Move the flag defines next to the intel_context->flags definition Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Dave Gordon <david.s.gordon@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467616119-4093-9-git-send-email-chris@chris-wilson.co.uk	2016-07-04 08:18:24 +01:00
arun.siluvery@linux.intel.com	37f501afed	drm/i915/bxt: Export pooled eu info to userspace Pooled EU is a bxt only feature and kernel changes are already merged. This feature is not yet exposed to userspace as the support was not yet available. Beignet team expressed interest and added patches to use this. Since we now have a user and patches to use them, expose them from the kernel side as well. v2: fix compile error [1] https://lists.freedesktop.org/archives/beignet/2016-June/007698.html [2] https://lists.freedesktop.org/archives/beignet/2016-June/007699.html Cc: Winiarski, Michal <michal.winiarski@intel.com> Cc: Zou, Nanhai <nanhai.zou@intel.com> Cc: Yang, Rong R <rong.r.yang@intel.com> Cc: Tim Gore <tim.gore@intel.com> Cc: Jeff McGee <jeff.mcgee@intel.com> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467369782-25992-1-git-send-email-arun.siluvery@linux.intel.com Acked-by: Jani Nikula <jani.nikula@intel.com>	2016-07-01 14:53:52 +01:00
Emil Velikov	b1c1f5c400	drm/i915: add extern C guard for the UAPI header Cc: Jani Nikula <jani.nikula@linux.intel.com> Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Acked-by: Daniel Vetter <daniel.vetter@intel.com>	2016-05-13 14:05:53 +01:00
Tvrtko Ursulin	d9da6aa035	drm/i915: Fix VCS ring selection after uapi decoupling This got broken in: commit `de1add3605` Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Date: Fri Jan 15 15:12:50 2016 +0000 drm/i915: Decouple execbuf uAPI from internal implementation BSD ring flags need to be shifted before they can be considered indices into the ring array. Reported by Zhipeng Gong. v2: Simplify the code. (Chris Wilson) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Zhipeng Gong <zhipeng.gong@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1453902069-31353-1-git-send-email-tvrtko.ursulin@linux.intel.com Testcase: igt/gem_exec_basic # bdw-gt3	2016-01-28 10:25:49 +00:00
Chris Wilson	426960bed3	drm/i915: Seal busy-ioctl uABI and prevent leaking of internal ids Tvrtko was looking through the execbuffer-ioctl and noticed that the uABI was tightly coupled to our internal engine identifiers. Close inspection also revealed that we leak those internal engine identifiers through the busy-ioctl, and those internal identifiers already do not match the user identifiers. Fortuitiously, there is only one user of the set of busy rings from the busy-ioctl, and they only wish to choose between the RENDER and the BLT engines. Let's fix the userspace ABI while we still can. v2: Update the uAPI documentation to explain the identifiers. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Testcase: igt/gem_busy Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1452876706-21620-1-git-send-email-chris@chris-wilson.co.uk	2016-01-21 11:00:35 +00:00
Dave Airlie	ade1ba7346	Merge tag 'drm-intel-next-2015-12-18' of git://anongit.freedesktop.org/drm-intel into drm-next - fix atomic watermark recomputation logic (Maarten) - modeset sequence fixes for LPT (Ville) - more kbl enabling&prep work (Rodrigo, Wayne) - first bits for mst audio - page dirty tracking fixes from Dave Gordon - new get_eld hook from Takashi, also included in the sound tree - fixup cursor handling when placed at address 0 (Ville) - refactor VBT parsing code (Jani) - rpm wakelock debug infrastructure ( Imre) - fbdev is pinned again (Chris) - tune the busywait logic to avoid wasting cpu cycles (Chris) * tag 'drm-intel-next-2015-12-18' of git://anongit.freedesktop.org/drm-intel: (81 commits) drm/i915: Update DRIVER_DATE to 20151218 drm/i915/skl: Default to noncoherent access up to F0 drm/i915: Only spin whilst waiting on the current request drm/i915: Limit the busy wait on requests to 5us not 10ms! drm/i915: Break busywaiting for requests on pending signals drm/i915: don't enable autosuspend on platforms without RPM support drm/i915/backlight: prefer dev_priv over dev pointer drm/i915: Disable primary plane if we fail to reconstruct BIOS fb (v2) drm/i915: Pin the ifbdev for the info->system_base GGTT mmapping drm/i915: Set the map-and-fenceable flag for preallocated objects drm/i915: mdelay(10) considered harmful drm/i915: check that we are in an RPM atomic section in GGTT PTE updaters drm/i915: add support for checking RPM atomic sections drm/i915: check that we hold an RPM wakelock ref before we put it drm/i915: add support for checking if we hold an RPM reference drm/i915: use assert_rpm_wakelock_held instead of opencoding it drm/i915: add assert_rpm_wakelock_held helper drm/i915: remove HAS_RUNTIME_PM check from RPM get/put/assert helpers drm/i915: get a permanent RPM reference on platforms w/o RPM support drm/i915: refactor RPM disabling due to RC6 being disabled ...	2015-12-23 14:22:09 +10:00
Dave Airlie	663a233eef	Merge branch 'drm-header-fixes' of https://github.com/GabrielL/linux into drm-next Fix all the problems with the header files and userspace builds off them. I really care so little about this, but hey who am I to stop progress. * 'drm-header-fixes' of https://github.com/GabrielL/linux: (30 commits) drm: fix inclusion of drm.h in via_drm.h drm: fix inclusion of drm.h in vmwgfx_drm.h drm: fix inclusion of drm.h in virtgpu_drm.h drm: fix inclusion of drm.h in tegra_drm.h drm: fix inclusion of drm.h in savage_drm.h drm: fix inclusion of drm.h in r128_drm.h drm: fix inclusion of drm.h in qxl_drm.h drm: fix inclusion of drm.h in omap_drm.h drm: fix inclusion of drm.h in msm_drm.h drm: fix inclusion of drm.h in mga_drm.h drm: fix inclusion of drm.h in exynos_sarea.h drm: fix inclusion of drm.h in i810_drm.h drm: fix inclusion of drm.h in exynos_sarea.h drm: fix inclusion of drm.h in drm_sarea.h drm: drm_mode.h fix includes drm: drm_fourcc.h fix includes drm: include drm.h in armada_drm.h include/uapi/drm/amdgpu_drm.h: use __u32 and __u64 from <linux/types.h> drm: Kbuild: add admgpu_drm.h to the installed headers drm: use __u{32,64} instead of uint{32,64}_t in virtgpu_drm.h ...	2015-12-11 13:46:05 +10:00
Gabriel Laskar	1049102ff7	drm: fix inclusion of drm.h in exynos_sarea.h Using `#include "drm.h"` instead of `#include <drm/drm.h>` allow drm headers to be moved in another directory without changes, like for the libdrm imports. Signed-off-by: Gabriel Laskar <gabriel@lse.epita.fr> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> CC: Emil Velikov <emil.l.velikov@gmail.com> CC: Mikko Rapeli <mikko.rapeli@iki.fi>	2015-12-10 12:33:23 +01:00
Chris Wilson	506a8e87d8	drm/i915: Add soft-pinning API for execbuffer Userspace can pass in an offset that it presumes the object is located at. The kernel will then do its utmost to fit the object into that location. The assumption is that userspace is handling its own object locations (for example along with full-ppgtt) and that the kernel will rarely have to make space for the user's requests. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> v2: Fixed incorrect eviction found by Michal Winiarski - fix suggested by Chris Wilson. Fixed incorrect error paths causing crash found by Michal Winiarski. (Not published externally) v3: Rebased because of trivial conflict in object_bind_to_vm. Fixed eviction to allow eviction of soft-pinned objects when another soft-pinned object used by a subsequent execbuffer overlaps reported by Michal Winiarski. (Not published externally) v4: Moved soft-pinned objects to the front of ordered_vmas so that they are pinned first after an address conflict happens to avoid repeated conflicts in rare cases (Suggested by Chris Wilson). Expanded comment on drm_i915_gem_exec_object2.offset to cover this new API. v5: Added I915_PARAM_HAS_EXEC_SOFTPIN parameter for detecting this capability (Kristian). Added check for multiple pinnings on eviction (Akash). Made sure buffers are not considered misplaced without the user specifying EXEC_OBJECT_SUPPORTS_48B_ADDRESS. User must assume responsibility for any addressing workarounds. Updated object2.offset field comment again to clarify NO_RELOC case (Chris). checkpatch cleanup. v6: Trivial rebase on latest drm-intel-nightly v7: Catch attempts to pin above the max virtual address size and return EINVAL (Tvrtko). Decouple EXEC_OBJECT_SUPPORTS_48B_ADDRESS and EXEC_OBJECT_PINNED flags, user must pass both flags in any attempt to pin something at an offset above 4GB (Chris, Daniel Vetter). Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Akash Goel <akash.goel@intel.com> Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Cc: Michal Winiarski <michal.winiarski@intel.com> Cc: Zou Nanhai <nanhai.zou@intel.com> Cc: Kristian Høgsberg <hoegsberg@gmail.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Acked-by: PDT Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1449575707-20933-1-git-send-email-thomas.daniel@intel.com	2015-12-09 10:20:17 +00:00
Ville Syrjälä	8697600b40	drm/i915: Make the high dword offset more explicit in i915_reg_read_ioctl Store the upper dword of the register offset in the whitelist as well. This would allow it to read register where the two halves aren't sitting right next to each other, and it'll make it easier to make register access type safe. While at it change the register offsets to u32 from u64. Our register space isn't quite that big, yet :) v2: Use ldw/udw as the suffixes, and add a note about 64bit wide split regs (Chris) Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1446839021-18599-1-git-send-email-ville.syrjala@linux.intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-11-18 14:35:16 +02:00
Chris Wilson	fa8848f278	drm/i915: Report context GTT size Since the beginning we have conflated the size of the global GTT with that of the per-process context sizes. In recent times (gen8+), those are no longer the same where the global GTT is limited to 2/4GiB but the per-process GTT may be anything up to 256TiB. Userspace knows nothing of this discrepancy and outside of one or two hacks, uses the getaperture ioctl to determine the maximum size it can use. Let's leave that as reporting the global GTT and use the context reporting method to describe the per-process value (which naturally fallsback to reporting the aliasing or global on older platforms, so userspace can always use this method where available). Testcase: igt/gem_userptr_blits/minor-normal-sync Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90065 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-10-19 12:16:46 +02:00
Michel Thierry	101b506a7f	drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset There are some allocations that must be only referenced by 32-bit offsets. To limit the chances of having the first 4GB already full, objects not requiring this workaround use DRM_MM_SEARCH_BELOW/ DRM_MM_CREATE_TOP flags In specific, any resource used with flat/heapless (0x00000000-0xfffff000) General State Heap (GSH) or Instruction State Heap (ISH) must be in a 32-bit range, because the General State Offset and Instruction State Offset are limited to 32-bits. Objects must have EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag to indicate if they can be allocated above the 32-bit address range. To limit the chances of having the first 4GB already full, objects will use DRM_MM_SEARCH_BELOW + DRM_MM_CREATE_TOP flags when possible. The libdrm user of the EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag is here: http://lists.freedesktop.org/archives/intel-gfx/2015-September/075836.html v2: Changed flag logic from neeeds_32b, to supports_48b. v3: Moved 48-bit support flag back to exec_object. (Chris, Daniel) v4: Split pin flags into PIN_ZONE_4G and PIN_HIGH; update PIN_OFFSET_MASK to use last PIN_ defined instead of hard-coded value; use correct limit check in eb_vma_misplaced. (Chris) v5: Don't touch PIN_OFFSET_MASK and update workaround comment (Chris) v6: Apply pin-high for ggtt too (Chris) v7: Handle simultaneous pin-high and pin-mappable end correctly (Akash) Fix check for entries currently using +4GB addresses, use min_t and other polish in object_bind_to_vm (Chris) v8: Commit message updated to point to libdrm patch. v9: vmas are allocated in the correct ozone, so only check flag when the vma has not been allocated. (Chris) Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v4) Signed-off-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-10-01 18:12:17 +02:00
Artem Savkov	16f7249ddf	uapi/drm/i915_drm.h: fix userspace compilation. commit `346add7834` Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Tue Jul 14 18:07:30 2015 +0200 drm/i915: Use expcitly fixed type in compat32 structs changed the type of param field in drm_i915_getparam from int to s32. This header is exported to userspace and needs to use userspace type __s32 instead. This fixes userspace compilation errors like the following: include/drm/i915_drm.h:361:2: error: unknown type name 's32' s32 param; Signed-off-by: Artem Savkov <asavkov@redhat.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2015-09-02 16:26:58 +03:00
Dave Airlie	4eebf60b74	Linux 4.2-rc7 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJV0R4AAAoJEHm+PkMAQRiG8xIH/AmiRd+JDrs0qqEy46p6X8Gn 0lB5/KsGycvIGIBTiy2nZzcT0Ly6LeFUKUjzPytlOhIZPMrxMVMShDaQKCXXIMUr 1mN6hkvpkLNnUhvL2fR6mm0zkjbz3zZEazFY+Jic8wQrtSkHgfH0DXqSAo8le0f8 kNrd5BPPhIwvpHGaNGFdTpbgpPcalXyQk/fHyvDGidbyXzY/d7l05QfYJ6XCD4Zm IAy48iK5BFts2+z3aOYrOeuuCcm1qFX8YArqzE1rfPp+U/LQpfUfij4cmOqDLn/F qnv9E7bRRVovvrgKe4I3Trta8kT53VLJvqpdw2Usqo8zvhs4VyrYpHC+gEE6YUY= =9Rd4 -----END PGP SIGNATURE----- Merge tag 'v4.2-rc7' into drm-next Linux 4.2-rc7 Backmerge master for i915 fixes	2015-08-17 14:13:53 +10:00
Chris Wilson	648a9bc530	drm/i915: Use two 32bit reads for select 64bit REG_READ ioctls Since the hardware sometimes mysteriously totally flummoxes the 64bit read of a 64bit register when read using a single instruction, split the read into two instructions. Since the read here is of automatically incrementing timestamp counters, we also have to be very careful in order to make sure that it does not increment between the two instructions. However, since userspace tried to workaround this issue and so enshrined this ABI for a broken hardware read and in the process neglected that the read only fails in some environments, we have to introduce a new uABI flag for userspace to request the 2x32 bit accurate read of the timestamp. v2: Fix alignment check and include details of the workaround for userspace. Reported-by: Karol Herbst <freedesktop@karolherbst.de> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91317 Testcase: igt/gem_reg_read Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: stable@vger.kernel.org Tested-by: Michał Winiarski <michal.winiarski@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-07-21 11:49:13 +02:00
Daniel Vetter	346add7834	drm/i915: Use expcitly fixed type in compat32 structs I was confused shortly whether the compat was needed for the int, until I noticed the pointer in the original. Also remove typedef. v2: Review from Chris. - Add comments. - Also change the int param in the original structure. Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-07-15 11:39:59 +02:00
Abdiel Janulgue	a9ed33ca07	drm/i915: Expose I915_EXEC_RESOURCE_STREAMER flag and getparam Ensures that the batch buffer is executed by the resource streamer. And will let userspace know whether Resource Streamer is supported in the kernel. v2: Don't skip 1<<15 for the exec flags (Jani Nikula) v3: Use HAS_RESOURCE_STREAMER macro for execbuf validation (Chris Wilson) (from getparam patch) v2: Update I915_PARAM_HAS_RESOURCE_STREAMER so it's after I915_PARAM_HAS_GPU_RESET. v3: Only advertise RS support for hardware that supports it. v4: Add HAS_RESOURCE_STREAMER() macro (Chris) Testcase: igt/gem_exec_params Cc: Jani Nikula <jani.nikula@intel.com> Cc: Kenneth Graunke <kenneth@whitecape.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com> [danvet: squash in getparam patch since it'd break bisect, suggested by Chris.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-07-06 10:36:46 +02:00
Chris Wilson	49e4d842f0	drm/i915: Report to userspace if we have a (presumed) working GPU reset In igt, we want to test handling of GPU hangs, both for recovery purposes and for reporting. However, we don't want to inject a genuine GPU hang onto a machine that cannot recover and so be permenantly wedged. Rather than embed heuristics into igt, have the kernel report exactly when it expects the GPU reset to work. This can also be usefully extended in future to indicate different levels of fine-grained resets. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Tim Gore <tim.gore@intel.com> Cc: Tomas Elf <tomas.elf@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-06-15 16:59:58 +02:00
David Weinehall	b1b38278e1	drm/i915: add a context parameter to {en, dis}able zero address mapping Export a new context parameter that can be set/queried through the context_{get,set}param ioctls. This parameter is passed as a context flag and decides whether or not a GPU address mapping is allowed to be made at address zero. The default is to allow such mappings. Signed-off-by: David Weinehall <david.weinehall@intel.com> Acked-by: "Zou, Nanhai" <nanhai.zou@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-05-29 10:15:19 +02:00
Damien Lespiau	21631f10ea	drm/i915: Fix the confusing comment about the ioctl limits It was reported that this comment was confusing, and indeed it is. v2: (one year later!) Add the range for the DRM_I915_* iotcl defines (Daniel) Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-05-26 17:20:30 +02:00
Chris Wilson	ea9da4e460	drm/i915: Allow disabling the destination colorkey for overlay Sometimes userspace wants a true overlay that is never clipped. In such cases, we need to disable the destination colorkey. However, it is currently unconditionally enabled in the overlay with no means of disabling. So rectify that by always default to on, and extending the UPDATE_ATTR ioctl to support explicit disabling of the colorkey. This is contrast to the spite code which requires explicit enabling of either the destination or source colorkey. Handling source colorkey is still todo for the overlay. (Of course it may be worth migrating overlay to sprite before then.) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-04-10 08:55:54 +02:00
Tommi Rantala	2c60fae148	drm/i915: fix definition of the DRM_IOCTL_I915_GET_SPRITE_COLORKEY ioctl Fix definition of the DRM_IOCTL_I915_GET_SPRITE_COLORKEY ioctl, so that it is different from the DRM_IOCTL_I915_SET_SPRITE_COLORKEY ioctl. Note that this is just for accuracy, the ioctl implementation itself is totally unused and already ripped out. Signed-off-by: Tommi Rantala <tt.rantala@gmail.com> [danvet: Add note that this is a dead ioctl.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-03-27 09:10:26 +01:00
Jeff McGee	a1559ffefb	drm/i915: Export total subslice and EU counts Setup new I915_GETPARAM ioctl entries for subslice total and EU total. Userspace drivers need these values when constructing GPGPU commands. This kernel query method is intended to replace the PCI ID-based tables that userspace drivers currently maintain. The kernel driver can employ fuse register reads as needed to ensure the most accurate determination of GT config attributes. This first became important with Cherryview in which the config could differ between devices with the same PCI ID. The kernel detection of these values is device-specific and not included in this patch. Because zero is not a valid value for any of these parameters, a value of zero is interpreted as unknown for the device. Userspace drivers should continue to maintain ID-based tables for older devices not supported by the new query method. v2: Increment our I915_GETPARAM indices to fit after REVISION which was merged ahead of us. For: VIZ-4636 Signed-off-by: Jeff McGee <jeff.mcgee@intel.com> Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com> Acked-by: Zhigang Gong <zhigang.gong@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-03-17 22:30:31 +01:00
Neil Roberts	27cd44618b	drm/i915: Add I915_PARAM_REVISION Adds a parameter which can be used with DRM_I915_GETPARAM to query the GPU revision. The intention is to use this in Mesa to implement the WaDisableSIMD16On3SrcInstr workaround on Skylake but only for revision 2. Signed-off-by: Neil Roberts <neil@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-03-17 22:29:54 +01:00
Zhipeng Gong	08e16dc874	drm/i915: add I915_PARAM_HAS_BSD2 to i915_getparam This will let userland only try to use the new ring when the appropriate kernel is present v2: change the number to be consistent with upstream (Zhipeng) Signed-off-by: Zhipeng Gong <zhipeng.gong@intel.com> Reviewed--by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-01-27 09:51:05 +01:00
Zhipeng Gong	8d360dffd6	drm/i915: Specify bsd rings through exec flag On Skylake GT3 we have 2 Video Command Streamers (VCS), which is asymmetrical. For example, HEVC GPU commands can be only dispatched to VCS1 ring. But userspace has no control when using VCS1 or VCS2. This patch introduces a mechanism to avoid the default ping-pong mode and use one specific ring through execution flag. This mechanism is usable for all the platforms with 2 VCS rings. The open source usage is from these two commits in vaapi/intel: commit 702050f04131a44ef8ac16651708ce8a8d98e4b8 Author: Zhao, Yakui <yakui.zhao@intel.com> Date: Mon Nov 17 12:44:19 2014 +0800 Allow the batchbuffer to be submitted with override flag commit a56efcdf27d11ad9b21664b4a2cda72d7f90f5a8 Author: Zhao Yakui <yakui.zhao@intel.com> Date: Mon Nov 17 12:44:22 2014 +0800 Add the override flag to assure that HEVC video command always uses BSD ring0 for SKL GT3 machine v2: fix whitespace (Rodrigo) v3: remove incorrect chunk that came on -collector rebase. (Rodrigo) v4: change the comment (Zhipeng) v5: address Daniel's comment (Zhipeng) Signed-off-by: Zhipeng Gong <zhipeng.gong@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-01-27 09:51:05 +01:00
Chris Wilson	c9dc0f3598	drm/i915: Add ioctl to set per-context parameters Sometimes we wish to tweak how an individual context behaves. Since we always create a context for every filp, this means that individual processes can fine tune their behaviour even if they do not explicitly create a context. The first example parameter here is to enable multi-process GPU testing, but the interface should be able to cope with passing arbitrarily complex parameters. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Testcase: igt/gem_reset_stats/ban-period-* Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-01-07 18:19:06 +01:00
Akash Goel	1816f92363	drm/i915: Support creation of unbound wc user mappings for objects This patch provides support to create write-combining virtual mappings of GEM object. It intends to provide the same funtionality of 'mmap_gtt' interface without the constraints and contention of a limited aperture space, but requires clients handles the linear to tile conversion on their own. This is for improving the CPU write operation performance, as with such mapping, writes and reads are almost 50% faster than with mmap_gtt. Similar to the GTT mmapping, unlike the regular CPU mmapping, it avoids the cache flush after update from CPU side, when object is passed onto GPU. This type of mapping is specially useful in case of sub-region update, i.e. when only a portion of the object is to be updated. Using a CPU mmap in such cases would normally incur a clflush of the whole object, and using a GTT mmapping would likely require eviction of an active object or fence and thus stall. The write-combining CPU mmap avoids both. To ensure the cache coherency, before using this mapping, the GTT domain has been reused here. This provides the required cache flush if the object is in CPU domain or synchronization against the concurrent rendering. Although the access through an uncached mmap should automatically invalidate the cache lines, this may not be true for non-temporal write instructions and also not all pages of the object may be updated at any given point of time through this mapping. Having a call to get_pages in set_to_gtt_domain function, as added in the earlier patch 'drm/i915: Broaden application of set-domain(GTT)', would guarantee the clflush and so there will be no cachelines holding the data for the object before it is accessed through this map. The drm_i915_gem_mmap structure (for the DRM_I915_GEM_MMAP_IOCTL) has been extended with a new flags field (defaulting to 0 for existent users). In order for userspace to detect the extended ioctl, a new parameter I915_PARAM_MMAP_VERSION has been added for versioning the ioctl interface. v2: Fix error handling, invalid flag detection, renaming (ickle) v3: Rebase to latest drm-intel-nightly codebase The new mmapping is exercised by igt/gem_mmap_wc, igt/gem_concurrent_blit and igt/gem_gtt_speed. Change-Id: Ie883942f9e689525f72fe9a8d3780c3a9faa769a Signed-off-by: Akash Goel <akash.goel@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-01-06 09:08:00 +01:00
Chris Wilson	6a2c4232ec	drm/i915: Make the physical object coherent with GTT Currently objects for which the hardware needs a contiguous physical address are allocated a shadow backing storage to satisfy the contraint. This shadow buffer is not wired into the normal obj->pages and so the physical object is incoherent with accesses via the GPU, GTT and CPU. By setting up the appropriate scatter-gather table, we can allow userspace to access the physical object via either a GTT mmaping of or by rendering into the GEM bo. However, keeping the CPU mmap of the shmemfs backing storage coherent with the contiguous shadow is not yet possible. Fortuituously, CPU mmaps of objects requiring physical addresses are not expected to be coherent anyway. This allows the physical constraint of the GEM object to be transparent to userspace and allow it to efficiently render into or update them via the GTT and GPU. v2: Fix leak of pci handle spotted by Ville v3: Remove the now duplicate call to detach_phys_object during free. v4: Wait for rendering before pwrite. As this patch makes it possible to render into the phys object, we should make it correct as well! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-11-14 10:29:18 +01:00
Chris Wilson	70f2f5c704	drm/i915: Report the actual swizzling back to userspace Userspace cares about whether or not swizzling depends on the page address for its direct access into bound objects. Extend the get_tiling ioctl to report the physical swizzling value in addition to the logical swizzling value so that userspace can accurately determine when it is possible for manual detiling. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Akash Goel <akash.goel@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Testcase: igt/gem_tiled_wc Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-11-07 18:42:01 +01:00
Chris Wilson	5cc9ed4b9a	drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl By exporting the ability to map user address and inserting PTEs representing their backing pages into the GTT, we can exploit UMA in order to utilize normal application data as a texture source or even as a render target (depending upon the capabilities of the chipset). This has a number of uses, with zero-copy downloads to the GPU and efficient readback making the intermixed streaming of CPU and GPU operations fairly efficient. This ability has many widespread implications from faster rendering of client-side software rasterisers (chromium), mitigation of stalls due to read back (firefox) and to faster pipelining of texture data (such as pixel buffer objects in GL or data blobs in CL). v2: Compile with CONFIG_MMU_NOTIFIER v3: We can sleep while performing invalidate-range, which we can utilise to drop our page references prior to the kernel manipulating the vma (for either discard or cloning) and so protect normal users. v4: Only run the invalidate notifier if the range intercepts the bo. v5: Prevent userspace from attempting to GTT mmap non-page aligned buffers v6: Recheck after reacquire mutex for lost mmu. v7: Fix implicit padding of ioctl struct by rounding to next 64bit boundary. v8: Fix rebasing error after forwarding porting the back port. v9: Limit the userptr to page aligned entries. We now expect userspace to handle all the offset-in-page adjustments itself. v10: Prevent vma from being copied across fork to avoid issues with cow. v11: Drop vma behaviour changes -- locking is nigh on impossible. Use a worker to load user pages to avoid lock inversions. v12: Use get_task_mm()/mmput() for correct refcounting of mm. v13: Use a worker to release the mmu_notifier to avoid lock inversion v14: Decouple mmu_notifier from struct_mutex using a custom mmu_notifer with its own locking and tree of objects for each mm/mmu_notifier. v15: Prevent overlapping userptr objects, and invalidate all objects within the mmu_notifier range v16: Fix a typo for iterating over multiple objects in the range and rearrange error path to destroy the mmu_notifier locklessly. Also close a race between invalidate_range and the get_pages_worker. v17: Close a race between get_pages_worker/invalidate_range and fresh allocations of the same userptr range - and notice that struct_mutex was presumed to be held when during creation it wasn't. v18: Sigh. Fix the refactor of st_set_pages() to allocate enough memory for the struct sg_table and to clear it before reporting an error. v19: Always error out on read-only userptr requests as we don't have the hardware infrastructure to support them at the moment. v20: Refuse to implement read-only support until we have the required infrastructure - but reserve the bit in flags for future use. v21: use_mm() is not required for get_user_pages(). It is only meant to be used to fix up the kernel thread's current->mm for use with copy_user(). v22: Use sg_alloc_table_from_pages for that chunky feeling v23: Export a function for sanity checking dma-buf rather than encode userptr details elsewhere, and clean up comments based on suggestions by Bradley. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com> Cc: Akash Goel <akash.goel@intel.com> Cc: "Volkin, Bradley D" <bradley.d.volkin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com> [danvet: Frob ioctl allocation to pick the next one - will cause a bit of fuss with create2 apparently, but such are the rules.] [danvet2: oops, forgot to git add after manual patch application] [danvet3: Appease sparse.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-16 19:31:29 +02:00
Brad Volkin	d728c8ef8b	drm/i915: Add a CMD_PARSER_VERSION getparam So userspace can query the kernel for command parser support. v2: Add i915_cmd_parser_get_version(), history log, and kerneldoc OTC-Tracker: AXIA-4631 Change-Id: I58af650db9f6753c2dcac9c54ab432fd31db302f Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-04-01 22:58:15 +02:00
Geert Uytterhoeven	c3d19d3c3f	drm/i915: Spelling s/auxilliary/auxiliary/ Signed-off-by: Geert Uytterhoeven <geert+renesas@linux-m68k.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-01-22 09:58:24 +01:00
Mika Kuoppala	b6359918b8	drm/i915: add i915_get_reset_stats_ioctl This ioctl returns reset stats for specified context. The struct returned contains context loss counters. reset_count: all resets across all contexts batch_active: active batches lost on resets batch_pending: pending batches lost on resets v2: get rid of state tracking completely and deliver only counts. Idea from Chris Wilson. v3: fix commit message v4: default context handled inside i915_gem_context_get_hang_stats v5: reset_count only for priviledged process v6: ctx=0 needs CAP_SYS_ADMIN for batch_* counters (Chris Wilson) v7: context hang stats never returns NULL v8: rebased on top of reworked context hang stats DRM_RENDER_ALLOW for ioctl v9: use DEFAULT_CONTEXT_ID. Improve comments for ioctl struct members Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Ian Romanick <idr@freedesktop.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-11-12 14:15:48 +01:00
Ben Widawsky	35a85ac606	drm/i915: Add second slice l3 remapping Certain HSW SKUs have a second bank of L3. This L3 remapping has a separate register set, and interrupt from the first "slice". A slice is simply a term to define some subset of the GPU's l3 cache. This patch implements both the interrupt handler, and ability to communicate with userspace about this second slice. v2: Remove redundant check about non-existent slice. Change warning about interrupts of unknown slices to WARN_ON_ONCE Handle the case where we get 2 slice interrupts concurrently, and switch the tracking of interrupts to be non-destructive (all Ville) Don't enable/mask the second slice parity interrupt for ivb/vlv (even though all docs I can find claim it's rsvd) (Ville + Bryan) Keep BYT excluded from L3 parity v3: Fix the slice = ffs to be decremented by one (found by Ville). When I initially did my testing on the series, I was using 1-based slice counting, so this code was correct. Not sure why my simpler tests that I've been running since then didn't pick it up sooner. Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-09-19 20:37:04 +02:00
Chris Wilson	651d794fae	drm/i915: Use Write-Through cacheing for the display plane on Iris Haswell GT3e has the unique feature of supporting Write-Through cacheing of objects within the eLLC/LLC. The purpose of this is to enable the display plane to remain coherent whilst objects lie resident in the eLLC/LLC - so that we, in theory, get the best of both worlds, perfect display and fast access. However, we still need to be careful as the CPU does not see the WT when accessing the cache. In particular, this means that we need to flush the cache lines after writing to an object through the CPU, and on transitioning from a cached state to WT. v2: Actually do the clflush on transition to WT, nagging by Ville. v3: Flush the CPU cache after writes into WT objects. v4: Rease onto LLC updates and report WT as "uncached" for get_cache_level_ioctl to remain symmetric with set_cache_level_ioctl. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-22 13:31:38 +02:00
Daniel Vetter	35c7ab421a	drm/i915: reserve I915_CACHING_DISPLAY and document cache modes Resolve the catch-22 of igt needing a stable number and patches first needing testcases by reserving the interface number up-front. v2: Improve the spelling a bit. v3: More spelling fail spotted by Chris. Requested-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-22 13:31:34 +02:00
Ben Widawsky	cce723ed09	drm/i915: Make i915 events part of uapi Make the uevent strings part of the user API for people who wish to write their own listeners. v2: Make a space in the string concatenation. (Chad) Use the "UEVENT" suffix intead of "EVENT" (Chad) Make kernel-doc parseable Docbook comments (Daniel) v3: Undid reset change introduced in last submission (Daniel) Fixed up comments to address removal changes. Thanks to Daniel Vetter for a majority of the parity error comments. CC: Chad Versace <chad.versace@linux.intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-07-19 18:26:57 +02:00
Xiang, Haihao	a1f2cc73c7	drm/i915: add I915_PARAM_HAS_VEBOX to i915_getparam This will let userland only try to use the new ring when the appropriate kernel is present Signed-off-by: Xiang, Haihao <haihao.xiang@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-05-31 20:54:22 +02:00
Xiang, Haihao	82f91b6e93	drm/i915: add I915_EXEC_VEBOX to i915_gem_do_execbuffer() A user can run batchbuffer via VEBOX ring. Signed-off-by: Xiang, Haihao <haihao.xiang@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-05-31 20:54:21 +02:00
Chris Wilson	eef90ccb8a	drm/i915: Use the reloc.handle as an index into the execbuffer array Using copywinwin10 as an example that is dependent upon emitting a lot of relocations (2 per operation), we see improvements of: c2d/gm45: 618000.0/sec to 623000.0/sec. i3-330m: 748000.0/sec to 789000.0/sec. (measured relative to a baseline with neither optimisations applied). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-01-17 22:23:47 +01:00
Daniel Vetter	ed5982e6ce	drm/i915: Allow userspace to hint that the relocations were known Userspace is able to hint to the kernel that its command stream and auxiliary state buffers already hold the correct presumed addresses and so the relocation process may be skipped if the kernel does not need to move any buffers in preparation for the execbuffer. Thus for the common case where the allotment of buffers is static between batches, we can avoid the overhead of individually checking the relocation entries. Note that this requires userspace to supply the domain tracking and requests for workarounds itself that would otherwise be computed based upon the relocation entries. Using copywinwin10 as an example that is dependent upon emitting a lot of relocations (2 per operation), we see improvements of: c2d/gm45: 618000.0/sec to 632000.0/sec. i3-330m: 748000.0/sec to 830000.0/sec. (measured relative to a baseline with neither optimisations applied). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Imre Deak <imre.deak@intel.com> [danvet: Fixup merge conflict in userspace header due to different baseline trees.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-01-17 22:23:36 +01:00
Daniel Vetter	b45305fce5	drm/i915: Implement workaround for broken CS tlb on i830/845 Now that Chris Wilson demonstrated that the key for stability on early gen 2 is to simple _never_ exchange the physical backing storage of batch buffers I've tried a stab at a kernel solution. Doesn't look too nefarious imho, now that I don't try to be too clever for my own good any more. v2: After discussing the various techniques, we've decided to always blit batches on the suspect devices, but allow userspace to opt out of the kernel workaround assume full responsibility for providing coherent batches. The principal reason is that avoiding the blit does improve performance in a few key microbenchmarks and also in cairo-trace replays. Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: - Drop the hunk which uses HAS_BROKEN_CS_TLB to implement the ring wrap w/a. Suggested by Chris Wilson. - Also add the ACTHD check from Chris Wilson for the error state dumping, so that we still catch batches when userspace opts out of the w/a.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-12-17 17:27:02 +01:00
Daniel Vetter	c2fb791692	Linux 3.7-rc2 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJQgvdwAAoJEHm+PkMAQRiG+3AH/i2XsqqN3VctL0nnbWfvds+Q aKulfIdJTjKiVAsawPUtRqReZ8ijiebrgA/53lZLlrFOoPPQ5+LHmnSyQF6gErOY NuAE1lijXDRM1pwBlhvOBbAj26wUobGjqONFJ9OkKr758Ue8ds/Q7UdxyEgmYgmg tvVMzfRcICzryUV3PcqL+3cNPpCUdT6wGGRJ9DCv/jvGiWKExWhOle5oltrmxk+D NsqRcws5pEubfHE4J8BvNWr8lE1kHfYVhrJETiLJUiN2XAJcbI4Jy7rU/3EGteNS 0HMZdaPPjV874lohdM70X2225SbYrCVkAYB5hnZCTeC3tYyCawBBPMQoyAiOcmU= =+861 -----END PGP SIGNATURE----- Merge tag 'v3.7-rc2' into drm-intel-next-queued Linux 3.7-rc2 Backmerge to solve two ugly conflicts: - uapi. We've already added new ioctl definitions for -next. Do I need to say more? - wc support gtt ptes. We've had to revert this for snb+ for 3.7 and also fix a few other things in the code. Now we know how to make it work on snb+, but to avoid losing the other fixes do the backmerge first before re-enabling wc gtt ptes on snb+. And a few other minor things, among them git getting confused in intel_dp.c and seemingly causing a conflict out of nothing ... Conflicts: drivers/gpu/drm/i915/i915_reg.h drivers/gpu/drm/i915/intel_display.c drivers/gpu/drm/i915/intel_dp.c drivers/gpu/drm/i915/intel_modes.c include/drm/i915_drm.h Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-10-22 14:34:51 +02:00
David Howells	718dcedd7e	UAPI: (Scripted) Disintegrate include/drm Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Jones <davej@redhat.com>	2012-10-04 18:21:50 +01:00

1 2 3 4

183 commits