Merge tag 'drm-intel-gt-next-2021-08-06-1' of ssh://git.freedesktop.org/git/drm/drm-intel into drm-next

UAPI Changes:

- Add I915_MMAP_OFFSET_FIXED

  On devices with local memory `I915_MMAP_OFFSET_FIXED` is the only valid
  type. On devices without local memory, this type is invalid.

  When `I915_MMAP_OFFSET_FIXED` is specified, the caching mode is WC or WB,
  depending on the object placement on creation. WB will be used when the
  object can only exist in system memory, WC otherwise.

  Userspace: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11888

- Reinstate the mmap ioctl for (already released) integrated Gen12 platforms

  Rationale: Otherwise the media driver breaks, e.g. on ADL-P. The long-term
  goal is still to sunset the IOCTL even for integrated and require using
  mmap_offset.

- Reject caching/set_domain IOCTLs on discrete

  Expected to become immutable property of the BO

- Disallow changing context parameters after first use on Gen12 and earlier
- Require setting context parameters at creation on platforms after Gen12

  Rationale (for both): Allow less dynamic changes to the context to simplify
  the implementation and avoid users shooting themselves in the foot.

- Drop I915_CONTEXT_PARAM_RINGSIZE

  Userspace PR for compute-driver has not been merged

- Drop I915_CONTEXT_PARAM_NO_ZEROMAP

  Userspace PR for libdrm / Beignet was never landed

- Drop CONTEXT_CLONE API

  Userspace PR for Mesa was never landed

- Drop getparam support for I915_CONTEXT_PARAM_ENGINES

  Only existed for symmetry wrt. setparam, never used.

- Disallow bonding of virtual engines

  Drop the prep work, no hardware has been released needing it.

- (Implicit) Disable gpu relocations

  Media userspace was the last userspace to still use them. They
  have converted so performance can be regained with an update.

Core Changes:

- Merge topic branch 'topic/i915-ttm-2021-06-11' (from Maarten)
- Merge topic branch 'topic/revid_steppings' (from Matt R)
- Merge topic branch 'topic/xehp-dg2-definitions-2021-07-21' (from Matt R)
- Backmerges drm-next (Rodrigo)

Driver Changes:

- Initial workarounds for ADL-P (Clint)
- Preliminary code for XeHP/DG2 (Stuart, Umesh, Matt R, Prathap, Ram,
  Venkata, Akeem, Tvrtko, John, Lucas)
- Fix ADL-S DMA mask size to 39 bits (Tejas)
- Remove code for CNL (Lucas)
- Add ADL-P GuC/HuC firmwares (John)
- Update HuC to 7.9.3 for TGL/ADL-S/RKL (John)
- Fix -EDEADLK handling regression (Ville)
- Implement Wa_1508744258 for DG1 and Gen12 iGFX (Jose)
- Extend Wa_1406941453 to ADL-S (Jose)
- Drop unnecessary workarounds per stepping for SKL/BXT/ICL (Matt R)
- Use fuse info to enable SFC on Gen12 (Venkata)
- Unconditionally flush the pages on acquire on EHL/JSL (Matt A)
- Probe existence of backing struct pages upon userptr creation (Chris, Matt A)

- Add an intermediate GEM proto-context to delay real context creation (Jason)
- Implement SINGLE_TIMELINE with a syncobj (Jason)
- Set the watchdog timeout directly in intel_context_set_gem (Jason)
- Disallow userspace from creating contexts with too many engines (Jason)
- Revert "drm/i915/gem: Asynchronous cmdparser" (Jason)
- Revert "drm/i915: Propagate errors on awaiting already signaled fences" (Jason)
- Revert "drm/i915: Skip over MI_NOOP when parsing" (Jason)
- Revert "drm/i915: Shrink the GEM kmem_caches upon idling" (Daniel)
- Always let TTM handle object migration (Jason)
- Correct the locking and pin pattern for dma-buf (Thomas H, Michael R, Jason)
- Migrate to system at dma-buf attach time (Thomas, Michael R)

- MAJOR refactoring of the GuC backend code to allow for enabling on Gen11+
  (Matt B, John, Michal Wa., Fernando, Daniele, Vinay)
- Update GuC firmware interface to v62.0.0 (John, Michal Wa., Matt B)
- Add GuCRC feature to hand over the control of HW RC6 to the GuC on
  Gen12+ when GuC submission is enabled (Vinay, Sujaritha, Daniele,
  John, Tvrtko)
- Use the correct IRQ during resume and eliminate DRM IRQ midlayer (Thomas Z)
- Add pipelined page migration and clearing (Chris, Thomas H)
- Use TTM for system memory on discrete (Thomas H)
- Implement object migration for display vs. dma-buf (Thomas H)
- Perform execbuffer object locking as a separate step (Thomas H)
- Add support for explicit L3BANK steering (Matt, Daniele)
- Remove duplicated call to ops->pread (Daniel)
- Fix pagefault disabling in the first execbuf slowpath (Daniel)
- Simplify userptr locking (Thomas H)
- Improvements to the GuC CTB code (Matt B, John)
- Make GT workaround upper bounds exclusive (Matt R)
- Check for nomodeset in i915_init() first (Daniel)
- Delete now unused gpu reloc code (Daniel)

- Document RFC plans for GuC submission, DRM scheduler and new parallel
  submit uAPI (Matt B)
- Reintroduce buddy allocator this time with TTM (Matt A)
- Support forcing page size with LMEM (Matt A)
- Add i915_sched_engine to abstract a submission queue between backends (Matt B)
- Use accelerated move in TTM (Ram)
- Fix memory leaks from TTM backend (Thomas H)
- Introduce WW transaction helper (Thomas H)
- Improve debug Kconfig texts a bit (Daniel)
- Unify user object creation code (Jason)
- Use a table for i915_init/exit (Jason)
- Move slabs to module init/exit (Daniel)
- Remove now unused i915_globals (Daniel)
- Extract i915_module.c (Daniel)

- Consistently use adl-p/adl-s in WA comments (Jose)
- Finish INTEL_GEN and friends conversion (Lucas)
- Correct variable/function namings (Lucas)
- Code checker fixes (Wan, Matt A)
- Tracepoint improvements (Matt B)
- Kerneldoc improvements (Tvrtko, Jason, Matt A, Maarten)
- Selftest improvements (Chris, Matt A, Tejas, Thomas H, John, Matt B,
  Rahul, Vinay)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YQ0JmYiXhGskNcrI@jlahtine-mobl.ger.corp.intel.com
Dave Airlie 2021-08-12 09:43:38 +10:00
commit 25fed6b324
208 changed files with 17562 additions and 7651 deletions

@ -422,9 +422,16 @@ Batchbuffer Parsing
User Batchbuffer Execution
--------------------------
.. kernel-doc:: drivers/gpu/drm/i915/gem/i915_gem_context_types.h
.. kernel-doc:: drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
:doc: User command execution
Scheduling
----------
.. kernel-doc:: drivers/gpu/drm/i915/i915_scheduler_types.h
:functions: i915_sched_engine
Logical Rings, Logical Ring Contexts and Execlists
--------------------------------------------------
@ -518,6 +525,14 @@ GuC-based command submission
.. kernel-doc:: drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
:doc: GuC-based command submission
GuC ABI
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. kernel-doc:: drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h
.. kernel-doc:: drivers/gpu/drm/i915/gt/uc/abi/guc_communication_mmio_abi.h
.. kernel-doc:: drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
.. kernel-doc:: drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
HuC
---
.. kernel-doc:: drivers/gpu/drm/i915/gt/uc/intel_huc.c

@ -0,0 +1,122 @@
/* SPDX-License-Identifier: MIT */
/*
* Copyright © 2021 Intel Corporation
*/
#define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see i915_context_engines_parallel_submit */
/**
* struct drm_i915_context_engines_parallel_submit - Configure engine for
* parallel submission.
*
* Setup a slot in the context engine map to allow multiple BBs to be submitted
* in a single execbuf IOCTL. Those BBs will then be scheduled to run on the GPU
* in parallel. Multiple hardware contexts are created internally in the i915 to
* run these BBs. Once a slot is configured for N BBs, only N BBs can be
* submitted in each execbuf IOCTL, and this is implicit behavior: the user
* doesn't tell the execbuf IOCTL there are N BBs; the execbuf IOCTL knows how
* many BBs there are based on the slot's configuration. The N BBs are the last
* N buffer objects, or the first N if I915_EXEC_BATCH_FIRST is set.
*
* The default placement behavior is to create implicit bonds between each
* context if each context maps to more than 1 physical engine (e.g. context is
* a virtual engine). Also, we only allow contexts of the same engine class, and
* these contexts must be in logically contiguous order. Examples of the
* placement behavior are described below. Lastly, the default is to not allow
* BBs to be preempted mid-BB; rather, coordinated preemption is inserted on all
* hardware contexts between each set of BBs. Flags may be added in the future
* to change both of these default behaviors.
*
* Returns -EINVAL if hardware context placement configuration is invalid or if
* the placement configuration isn't supported on the platform / submission
* interface.
* Returns -ENODEV if extension isn't supported on the platform / submission
* interface.
*
* .. code-block:: none
*
* Example 1 pseudo code:
* CS[X] = generic engine of same class, logical instance X
* INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
* set_engines(INVALID)
* set_parallel(engine_index=0, width=2, num_siblings=1,
* engines=CS[0],CS[1])
*
* Results in the following valid placement:
* CS[0], CS[1]
*
* Example 2 pseudo code:
* CS[X] = generic engine of same class, logical instance X
* INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
* set_engines(INVALID)
* set_parallel(engine_index=0, width=2, num_siblings=2,
* engines=CS[0],CS[2],CS[1],CS[3])
*
* Results in the following valid placements:
* CS[0], CS[1]
* CS[2], CS[3]
*
* This can also be thought of as 2 virtual engines described by the 2-D array
* in the engines field, with bonds placed between each index of the
* virtual engines, e.g. CS[0] is bonded to CS[1] and CS[2] is bonded to
* CS[3].
* VE[0] = CS[0], CS[2]
* VE[1] = CS[1], CS[3]
*
* Example 3 pseudo code:
* CS[X] = generic engine of same class, logical instance X
* INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
* set_engines(INVALID)
* set_parallel(engine_index=0, width=2, num_siblings=2,
* engines=CS[0],CS[1],CS[1],CS[3])
*
* Results in the following valid and invalid placements:
* CS[0], CS[1]
* CS[1], CS[3] - Not logically contiguous, return -EINVAL
*/
struct drm_i915_context_engines_parallel_submit {
/**
* @base: base user extension.
*/
struct i915_user_extension base;
/**
* @engine_index: slot for parallel engine
*/
__u16 engine_index;
/**
* @width: number of contexts per parallel engine
*/
__u16 width;
/**
* @num_siblings: number of siblings per context
*/
__u16 num_siblings;
/**
* @mbz16: reserved for future use; must be zero
*/
__u16 mbz16;
/**
* @flags: all undefined flags must be zero; currently no flags are defined
*/
__u64 flags;
/**
* @mbz64: reserved for future use; must be zero
*/
__u64 mbz64[3];
/**
* @engines: 2-d array of engine instances to configure parallel engine
*
* length = width (i) * num_siblings (j)
* index = j + i * num_siblings
*/
struct i915_engine_class_instance engines[0];
} __packed;
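
The layout rule in the @engines comment (index = j + i * num_siblings) can be sketched as a small helper. This is only an illustration under stated assumptions: `parallel_engine_index()` is a hypothetical name that does not exist in the uAPI or the driver, and `i` is taken to be the context (width) index while `j` is the sibling index, as the comment describes.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the i915 uAPI: return the position in the
 * flattened engines[] array of sibling j of context i, following the
 * documented layout index = j + i * num_siblings.
 */
static size_t parallel_engine_index(size_t i, size_t j,
				    size_t width, size_t num_siblings)
{
	assert(i < width);		/* context index within the slot */
	assert(j < num_siblings);	/* sibling index within the context */

	return j + i * num_siblings;
}
```

With the array from Example 2 (engines=CS[0],CS[2],CS[1],CS[3], width=2, num_siblings=2), sibling 0 of context 1 sits at index 2, which is CS[1].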

@ -0,0 +1,148 @@
=========================================
I915 GuC Submission/DRM Scheduler Section
=========================================
Upstream plan
=============
For upstream the overall plan for landing GuC submission and integrating the
i915 with the DRM scheduler is:
* Merge basic GuC submission
* Basic submission support for all gen11+ platforms
* Not enabled by default on any current platforms but can be enabled via
modparam enable_guc
* Lots of rework will need to be done to integrate with DRM scheduler so
no need to nit pick everything in the code, it just should be
functional, no major coding style / layering errors, and not regress
execlists
* Update IGTs / selftests as needed to work with GuC submission
* Enable CI on supported platforms for a baseline
* Rework / get CI healthy for GuC submission in place as needed
* Merge new parallel submission uAPI
* Bonding uAPI completely incompatible with GuC submission, plus it has
severe design issues in general, which is why we want to retire it no
matter what
* New uAPI adds I915_CONTEXT_ENGINES_EXT_PARALLEL context setup step
which configures a slot with N contexts
* After I915_CONTEXT_ENGINES_EXT_PARALLEL a user can submit N batches to
a slot in a single execbuf IOCTL and the batches run on the GPU in
parallel
* Initially only for GuC submission but execlists can be supported if
needed
* Convert the i915 to use the DRM scheduler
* GuC submission backend fully integrated with DRM scheduler
* All request queues removed from backend (e.g. all backpressure
handled in DRM scheduler)
* Resets / cancels hook in DRM scheduler
* Watchdog hooks into DRM scheduler
* Lots of complexity of the GuC backend can be pulled out once
integrated with DRM scheduler (e.g. state machine gets
simpler, locking gets simpler, etc...)
* Execlists backend will do the minimum required to hook into the DRM scheduler
* Legacy interface
* Features like timeslicing / preemption / virtual engines would
be difficult to integrate with the DRM scheduler and these
features are not required for GuC submission as the GuC does
these things for us
* ROI low on fully integrating into DRM scheduler
* Fully integrating would add lots of complexity to DRM
scheduler
* Port i915 priority inheritance / boosting feature in DRM scheduler
* Used for i915 page flip, may be useful to other DRM drivers as
well
* Will be an optional feature in the DRM scheduler
* Remove in-order completion assumptions from DRM scheduler
* Even when using the DRM scheduler the backends will handle
preemption, timeslicing, etc... so it is possible for jobs to
finish out of order
* Pull out i915 priority levels and use DRM priority levels
* Optimize DRM scheduler as needed
TODOs for GuC submission upstream
=================================
* Need an update to GuC firmware / i915 to enable error state capture
* Open source tool to decode GuC logs
* Public GuC spec
New uAPI for basic GuC submission
=================================
No major changes are required to the uAPI for basic GuC submission. The only
change is a new scheduler attribute: I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP.
This attribute indicates the 2k i915 user priority levels are statically mapped
into 3 levels as follows:
* -1k to -1 Low priority
* 0 Medium priority
* 1 to 1k High priority
This is needed because the GuC only has 4 priority bands. The highest priority
band is reserved for the kernel. This aligns with the DRM scheduler priority
levels too.
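
The static mapping above can be sketched in a few lines of C. This is a minimal illustration, not the driver's code: the enum and function names are hypothetical, and the caller is assumed to have already validated that the user priority lies within the allowed range.

```c
/* Hypothetical band names for illustration; not the kernel's identifiers. */
enum example_prio_band {
	EXAMPLE_PRIO_LOW,
	EXAMPLE_PRIO_MEDIUM,
	EXAMPLE_PRIO_HIGH,
};

/*
 * Collapse an i915 user priority into one of the three bands listed above:
 * negative values are low, zero is medium, positive values are high.
 * Range checking of user_prio is assumed to happen elsewhere.
 */
static enum example_prio_band example_map_user_priority(int user_prio)
{
	if (user_prio < 0)
		return EXAMPLE_PRIO_LOW;
	if (user_prio > 0)
		return EXAMPLE_PRIO_HIGH;
	return EXAMPLE_PRIO_MEDIUM;
}
```

The fourth (highest) GuC band is deliberately absent from the sketch because it is reserved for the kernel and is not reachable from a user priority.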
Spec references:
----------------
* https://www.khronos.org/registry/EGL/extensions/IMG/EGL_IMG_context_priority.txt
* https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/chap5.html#devsandqueues-priority
* https://spec.oneapi.com/level-zero/latest/core/api.html#ze-command-queue-priority-t
New parallel submission uAPI
============================
The existing bonding uAPI is completely broken with GuC submission because
whether a submission is a single context submit or parallel submit isn't known
until execbuf time, when it is activated via I915_SUBMIT_FENCE. To submit multiple
contexts in parallel with the GuC the context must be explicitly registered with
N contexts and all N contexts must be submitted in a single command to the GuC.
The GuC interfaces do not support dynamically changing between N contexts as the
bonding uAPI does. Hence the need for a new parallel submission interface. Also
the legacy bonding uAPI is quite confusing and not intuitive at all. Furthermore
I915_SUBMIT_FENCE is by design a future fence, so not really something we should
continue to support.
The new parallel submission uAPI consists of 3 parts:
* Export engines logical mapping
* A 'set_parallel' extension to configure contexts for parallel
submission
* Extend execbuf2 IOCTL to support submitting N BBs in a single IOCTL
Export engines logical mapping
------------------------------
Certain use cases require BBs to be placed on engine instances in logical order
(e.g. split-frame on gen11+). The logical mapping of engine instances can change
based on fusing. Rather than making UMDs be aware of fusing, simply expose the
logical mapping with the existing query engine info IOCTL. Also the GuC
submission interface currently only supports submitting multiple contexts to
engines in logical order which is a new requirement compared to execlists.
Lastly, all current platforms have at most 2 engine instances and the logical
order is the same as uAPI order. This will change on platforms with more than 2
engine instances.
A single bit will be added to drm_i915_engine_info.flags indicating that the
logical instance has been returned and a new field,
drm_i915_engine_info.logical_instance, returns the logical instance.
A 'set_parallel' extension to configure contexts for parallel submission
------------------------------------------------------------------------
The 'set_parallel' extension configures a slot for parallel submission of N BBs.
It is a setup step that must be called before using any of the contexts. See
I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE or I915_CONTEXT_ENGINES_EXT_BOND for
similar existing examples. Once a slot is configured for parallel submission the
execbuf2 IOCTL can be called, submitting N BBs in a single IOCTL. Initially only
GuC submission is supported. Execlists support can be added later if needed.
Add I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT and
drm_i915_context_engines_parallel_submit to the uAPI to implement this
extension.
.. kernel-doc:: Documentation/gpu/rfc/i915_parallel_execbuf.h
:functions: drm_i915_context_engines_parallel_submit
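
The "logically contiguous" rule from the examples in the header can be sketched as a check over logical instance numbers. This is an assumption-laden illustration: `example_placements_contiguous()` is a hypothetical name, the input is a plain array of logical instances rather than the uAPI structure, and the real driver may validate placements differently.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical check, not the driver's implementation. logical[] holds
 * width * num_siblings logical instance numbers laid out as
 * index = j + i * num_siblings (i = context, j = sibling). Each placement
 * is formed by taking sibling j of every context; it is accepted only if
 * the logical instances increase by exactly one from context to context.
 */
static bool example_placements_contiguous(const unsigned int *logical,
					  size_t width, size_t num_siblings)
{
	for (size_t j = 0; j < num_siblings; j++) {
		for (size_t i = 1; i < width; i++) {
			unsigned int prev = logical[j + (i - 1) * num_siblings];
			unsigned int cur = logical[j + i * num_siblings];

			if (cur != prev + 1)
				return false;	/* would map to -EINVAL */
		}
	}

	return true;
}
```

Fed the three examples from the header, the sketch accepts {0, 1} and {0, 2, 1, 3} and rejects {0, 1, 1, 3}, because the second placement there pairs CS[1] with CS[3].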
Extend execbuf2 IOCTL to support submitting N BBs in a single IOCTL
-------------------------------------------------------------------
Contexts that have been configured with the 'set_parallel' extension can only
submit N BBs in a single execbuf2 IOCTL. The BBs are either the last N objects
in the drm_i915_gem_exec_object2 list or the first N if I915_EXEC_BATCH_FIRST is
set. The number of BBs is implicit based on the slot submitted and how it has
been configured by 'set_parallel' or other extensions. No uAPI changes are
required to the execbuf2 IOCTL.
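
How the N BBs are located in the object list can be sketched as follows. The function name and the boolean parameter are hypothetical stand-ins: `batch_first` represents whether I915_EXEC_BATCH_FIRST was set, and `n_batches` represents the width configured for the slot. This is a sketch of the rule stated above, not the execbuf implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the index of the first BB in an object list
 * of num_objects entries, or -1 if the list cannot hold n_batches BBs.
 * The BBs occupy the first n_batches entries when batch_first is true and
 * the last n_batches entries otherwise.
 */
static long example_first_batch_index(size_t num_objects, size_t n_batches,
				      bool batch_first)
{
	if (n_batches == 0 || n_batches > num_objects)
		return -1;

	return batch_first ? 0 : (long)(num_objects - n_batches);
}
```

For a list of 5 objects on a slot configured for 2 BBs, the BBs are entries 3 and 4 by default, or entries 0 and 1 when the batch-first flag is set.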

@ -19,3 +19,7 @@ host such documentation:
.. toctree::
i915_gem_lmem.rst
.. toctree::
i915_scheduler.rst

@ -207,6 +207,8 @@ config DRM_I915_LOW_LEVEL_TRACEPOINTS
This provides the ability to precisely monitor engine utilisation
and also analyze the request dependency resolving timeline.
Recommended for driver developers only.
If in doubt, say "N".
config DRM_I915_DEBUG_VBLANK_EVADE
@ -220,6 +222,8 @@ config DRM_I915_DEBUG_VBLANK_EVADE
is exceeded, even if there isn't an actual risk of missing
the vblank.
Recommended for driver developers only.
If in doubt, say "N".
config DRM_I915_DEBUG_RUNTIME_PM
@ -232,4 +236,6 @@ config DRM_I915_DEBUG_RUNTIME_PM
runtime PM functionality. This may introduce overhead during
driver loading, suspend and resume operations.
Recommended for driver developers only.
If in doubt, say "N"

@ -38,6 +38,7 @@ i915-y += i915_drv.o \
i915_irq.o \
i915_getparam.o \
i915_mitigations.o \
i915_module.o \
i915_params.o \
i915_pci.o \
i915_scatterlist.o \
@ -89,7 +90,6 @@ gt-y += \
gt/gen8_ppgtt.o \
gt/intel_breadcrumbs.o \
gt/intel_context.o \
gt/intel_context_param.o \
gt/intel_context_sseu.o \
gt/intel_engine_cs.o \
gt/intel_engine_heartbeat.o \
@ -108,6 +108,7 @@ gt-y += \
gt/intel_gtt.o \
gt/intel_llc.o \
gt/intel_lrc.o \
gt/intel_migrate.o \
gt/intel_mocs.o \
gt/intel_ppgtt.o \
gt/intel_rc6.o \
@ -135,7 +136,6 @@ i915-y += $(gt-y)
gem-y += \
gem/i915_gem_busy.o \
gem/i915_gem_clflush.o \
gem/i915_gem_client_blt.o \
gem/i915_gem_context.o \
gem/i915_gem_create.o \
gem/i915_gem_dmabuf.o \
@ -143,7 +143,6 @@ gem-y += \
gem/i915_gem_execbuffer.o \
gem/i915_gem_internal.o \
gem/i915_gem_object.o \
gem/i915_gem_object_blt.o \
gem/i915_gem_lmem.o \
gem/i915_gem_mman.o \
gem/i915_gem_pages.o \
@ -162,15 +161,17 @@ gem-y += \
i915-y += \
$(gem-y) \
i915_active.o \
i915_buddy.o \
i915_cmd_parser.o \
i915_gem_evict.o \
i915_gem_gtt.o \
i915_gem_ww.o \
i915_gem.o \
i915_globals.o \
i915_query.o \
i915_request.o \
i915_scheduler.o \
i915_trace_points.o \
i915_ttm_buddy_manager.o \
i915_vma.o \
intel_wopcm.o
@ -185,6 +186,8 @@ i915-y += gt/uc/intel_uc.o \
gt/uc/intel_guc_fw.o \
gt/uc/intel_guc_log.o \
gt/uc/intel_guc_log_debugfs.o \
gt/uc/intel_guc_rc.o \
gt/uc/intel_guc_slpc.o \
gt/uc/intel_guc_submission.o \
gt/uc/intel_huc.o \
gt/uc/intel_huc_debugfs.o \
@ -277,7 +280,9 @@ i915-y += i915_perf.o
# Post-mortem debug and GPU hang state capture
i915-$(CONFIG_DRM_I915_CAPTURE_ERROR) += i915_gpu_error.o
i915-$(CONFIG_DRM_I915_SELFTEST) += \
gem/selftests/i915_gem_client_blt.o \
gem/selftests/igt_gem_utils.o \
selftests/intel_scheduler_helpers.o \
selftests/i915_random.o \
selftests/i915_selftest.o \
selftests/igt_atomic.o \

@ -1331,6 +1331,9 @@ retry:
ret = i915_gem_object_lock(obj, &ww);
if (!ret && phys_cursor)
ret = i915_gem_object_attach_phys(obj, alignment);
else if (!ret && HAS_LMEM(dev_priv))
ret = i915_gem_object_migrate(obj, &ww, INTEL_REGION_LMEM);
/* TODO: Do we need to sync when migration becomes async? */
if (!ret)
ret = i915_gem_object_pin_pages(obj);
if (ret)
@ -11778,7 +11781,7 @@ intel_user_framebuffer_create(struct drm_device *dev,
/* object is backed with LMEM for discrete */
i915 = to_i915(obj->base.dev);
if (HAS_LMEM(i915) && !i915_gem_object_validates_to_lmem(obj)) {
if (HAS_LMEM(i915) && !i915_gem_object_can_migrate(obj, INTEL_REGION_LMEM)) {
/* object is "remote", not in local memory */
i915_gem_object_put(obj);
return ERR_PTR(-EREMOTE);

@ -5799,7 +5799,7 @@ static void tgl_bw_buddy_init(struct drm_i915_private *dev_priv)
int config, i;
if (IS_ALDERLAKE_S(dev_priv) ||
IS_DG1_REVID(dev_priv, DG1_REVID_A0, DG1_REVID_A0) ||
IS_DG1_DISPLAY_STEP(dev_priv, STEP_A0, STEP_A0) ||
IS_TGL_DISPLAY_STEP(dev_priv, STEP_A0, STEP_B0))
/* Wa_1409767108:tgl,dg1,adl-s */
table = wa_1409767108_buddy_page_masks;

@ -2667,15 +2667,15 @@ static bool cnl_ddi_hdmi_pll_dividers(struct intel_crtc_state *crtc_state)
}
/*
* Display WA #22010492432: ehl, tgl
* Display WA #22010492432: ehl, tgl, adl-p
* Program half of the nominal DCO divider fraction value.
*/
static bool
ehl_combo_pll_div_frac_wa_needed(struct drm_i915_private *i915)
{
return ((IS_PLATFORM(i915, INTEL_ELKHARTLAKE) &&
IS_JSL_EHL_REVID(i915, EHL_REVID_B0, REVID_FOREVER)) ||
IS_TIGERLAKE(i915)) &&
IS_JSL_EHL_DISPLAY_STEP(i915, STEP_B0, STEP_FOREVER)) ||
IS_TIGERLAKE(i915) || IS_ALDERLAKE_P(i915)) &&
i915->dpll.ref_clks.nssc == 38400;
}

@ -594,7 +594,7 @@ static void hsw_activate_psr2(struct intel_dp *intel_dp)
if (intel_dp->psr.psr2_sel_fetch_enabled) {
/* WA 1408330847 */
if (IS_TGL_DISPLAY_STEP(dev_priv, STEP_A0, STEP_A0) ||
IS_RKL_REVID(dev_priv, RKL_REVID_A0, RKL_REVID_A0))
IS_RKL_DISPLAY_STEP(dev_priv, STEP_A0, STEP_A0))
intel_de_rmw(dev_priv, CHICKEN_PAR1_1,
DIS_RAM_BYPASS_PSR2_MAN_TRACK,
DIS_RAM_BYPASS_PSR2_MAN_TRACK);
@ -1342,7 +1342,7 @@ static void intel_psr_disable_locked(struct intel_dp *intel_dp)
/* WA 1408330847 */
if (intel_dp->psr.psr2_sel_fetch_enabled &&
(IS_TGL_DISPLAY_STEP(dev_priv, STEP_A0, STEP_A0) ||
IS_RKL_REVID(dev_priv, RKL_REVID_A0, RKL_REVID_A0)))
IS_RKL_DISPLAY_STEP(dev_priv, STEP_A0, STEP_A0)))
intel_de_rmw(dev_priv, CHICKEN_PAR1_1,
DIS_RAM_BYPASS_PSR2_MAN_TRACK, 0);

@ -24,13 +24,11 @@ static void __do_clflush(struct drm_i915_gem_object *obj)
i915_gem_object_flush_frontbuffer(obj, ORIGIN_CPU);
}
static int clflush_work(struct dma_fence_work *base)
static void clflush_work(struct dma_fence_work *base)
{
struct clflush *clflush = container_of(base, typeof(*clflush), base);
__do_clflush(clflush->obj);
return 0;
}
static void clflush_release(struct dma_fence_work *base)

@ -1,355 +0,0 @@
// SPDX-License-Identifier: MIT
/*
* Copyright © 2019 Intel Corporation
*/
#include "i915_drv.h"
#include "gt/intel_context.h"
#include "gt/intel_engine_pm.h"
#include "i915_gem_client_blt.h"
#include "i915_gem_object_blt.h"
struct i915_sleeve {
struct i915_vma *vma;
struct drm_i915_gem_object *obj;
struct sg_table *pages;
struct i915_page_sizes page_sizes;
};
static int vma_set_pages(struct i915_vma *vma)
{
struct i915_sleeve *sleeve = vma->private;
vma->pages = sleeve->pages;
vma->page_sizes = sleeve->page_sizes;
return 0;
}
static void vma_clear_pages(struct i915_vma *vma)
{
GEM_BUG_ON(!vma->pages);
vma->pages = NULL;
}
static void vma_bind(struct i915_address_space *vm,
struct i915_vm_pt_stash *stash,
struct i915_vma *vma,
enum i915_cache_level cache_level,
u32 flags)
{
vm->vma_ops.bind_vma(vm, stash, vma, cache_level, flags);
}
static void vma_unbind(struct i915_address_space *vm, struct i915_vma *vma)
{
vm->vma_ops.unbind_vma(vm, vma);
}
static const struct i915_vma_ops proxy_vma_ops = {
.set_pages = vma_set_pages,
.clear_pages = vma_clear_pages,
.bind_vma = vma_bind,
.unbind_vma = vma_unbind,
};
static struct i915_sleeve *create_sleeve(struct i915_address_space *vm,
struct drm_i915_gem_object *obj,
struct sg_table *pages,
struct i915_page_sizes *page_sizes)
{
struct i915_sleeve *sleeve;
struct i915_vma *vma;
int err;
sleeve = kzalloc(sizeof(*sleeve), GFP_KERNEL);
if (!sleeve)
return ERR_PTR(-ENOMEM);
vma = i915_vma_instance(obj, vm, NULL);
if (IS_ERR(vma)) {
err = PTR_ERR(vma);
goto err_free;
}
vma->private = sleeve;
vma->ops = &proxy_vma_ops;
sleeve->vma = vma;
sleeve->pages = pages;
sleeve->page_sizes = *page_sizes;
return sleeve;
err_free:
kfree(sleeve);
return ERR_PTR(err);
}
static void destroy_sleeve(struct i915_sleeve *sleeve)
{
kfree(sleeve);
}
struct clear_pages_work {
struct dma_fence dma;
struct dma_fence_cb cb;
struct i915_sw_fence wait;
struct work_struct work;
struct irq_work irq_work;
struct i915_sleeve *sleeve;
struct intel_context *ce;
u32 value;
};
static const char *clear_pages_work_driver_name(struct dma_fence *fence)
{
return DRIVER_NAME;
}
static const char *clear_pages_work_timeline_name(struct dma_fence *fence)
{
return "clear";
}
static void clear_pages_work_release(struct dma_fence *fence)
{
struct clear_pages_work *w = container_of(fence, typeof(*w), dma);
destroy_sleeve(w->sleeve);
i915_sw_fence_fini(&w->wait);
BUILD_BUG_ON(offsetof(typeof(*w), dma));
dma_fence_free(&w->dma);
}
static const struct dma_fence_ops clear_pages_work_ops = {
.get_driver_name = clear_pages_work_driver_name,
.get_timeline_name = clear_pages_work_timeline_name,
.release = clear_pages_work_release,
};
static void clear_pages_signal_irq_worker(struct irq_work *work)
{
struct clear_pages_work *w = container_of(work, typeof(*w), irq_work);
dma_fence_signal(&w->dma);
dma_fence_put(&w->dma);
}
static void clear_pages_dma_fence_cb(struct dma_fence *fence,
struct dma_fence_cb *cb)
{
struct clear_pages_work *w = container_of(cb, typeof(*w), cb);
if (fence->error)
dma_fence_set_error(&w->dma, fence->error);
/*
* Push the signalling of the fence into yet another worker to avoid
* the nightmare locking around the fence spinlock.
*/
irq_work_queue(&w->irq_work);
}
static void clear_pages_worker(struct work_struct *work)
{
struct clear_pages_work *w = container_of(work, typeof(*w), work);
struct drm_i915_gem_object *obj = w->sleeve->vma->obj;
struct i915_vma *vma = w->sleeve->vma;
struct i915_gem_ww_ctx ww;
struct i915_request *rq;
struct i915_vma *batch;
int err = w->dma.error;
if (unlikely(err))
goto out_signal;
if (obj->cache_dirty) {
if (i915_gem_object_has_struct_page(obj))
drm_clflush_sg(w->sleeve->pages);
obj->cache_dirty = false;
}
obj->read_domains = I915_GEM_GPU_DOMAINS;
obj->write_domain = 0;
i915_gem_ww_ctx_init(&ww, false);
intel_engine_pm_get(w->ce->engine);
retry:
err = intel_context_pin_ww(w->ce, &ww);
if (err)
goto out_signal;
batch = intel_emit_vma_fill_blt(w->ce, vma, &ww, w->value);
if (IS_ERR(batch)) {
err = PTR_ERR(batch);
goto out_ctx;
}
rq = i915_request_create(w->ce);
if (IS_ERR(rq)) {
err = PTR_ERR(rq);
goto out_batch;
}
/* There's no way the fence has signalled */
if (dma_fence_add_callback(&rq->fence, &w->cb,
clear_pages_dma_fence_cb))
GEM_BUG_ON(1);
err = intel_emit_vma_mark_active(batch, rq);
if (unlikely(err))
goto out_request;
/*
* w->dma is already exported via (vma|obj)->resv we need only
* keep track of the GPU activity within this vma/request, and
* propagate the signal from the request to w->dma.
*/
err = __i915_vma_move_to_active(vma, rq);
if (err)
goto out_request;
if (rq->engine->emit_init_breadcrumb) {
err = rq->engine->emit_init_breadcrumb(rq);
if (unlikely(err))
goto out_request;
}
err = rq->engine->emit_bb_start(rq,
batch->node.start, batch->node.size,
0);
out_request:
if (unlikely(err)) {
i915_request_set_error_once(rq, err);
err = 0;
}
i915_request_add(rq);
out_batch:
intel_emit_vma_release(w->ce, batch);
out_ctx:
intel_context_unpin(w->ce);
out_signal:
if (err == -EDEADLK) {
err = i915_gem_ww_ctx_backoff(&ww);
if (!err)
goto retry;
}
i915_gem_ww_ctx_fini(&ww);
i915_vma_unpin(w->sleeve->vma);
intel_engine_pm_put(w->ce->engine);
if (unlikely(err)) {
dma_fence_set_error(&w->dma, err);
dma_fence_signal(&w->dma);
dma_fence_put(&w->dma);
}
}
static int pin_wait_clear_pages_work(struct clear_pages_work *w,
struct intel_context *ce)
{
struct i915_vma *vma = w->sleeve->vma;
struct i915_gem_ww_ctx ww;
int err;
i915_gem_ww_ctx_init(&ww, false);
retry:
err = i915_gem_object_lock(vma->obj, &ww);
if (err)
goto out;
err = i915_vma_pin_ww(vma, &ww, 0, 0, PIN_USER);
if (unlikely(err))
goto out;
err = i915_sw_fence_await_reservation(&w->wait,
vma->obj->base.resv, NULL,
true, 0, I915_FENCE_GFP);
if (err)
goto err_unpin_vma;
dma_resv_add_excl_fence(vma->obj->base.resv, &w->dma);
err_unpin_vma:
if (err)
i915_vma_unpin(vma);
out:
if (err == -EDEADLK) {
err = i915_gem_ww_ctx_backoff(&ww);
if (!err)
goto retry;
}
i915_gem_ww_ctx_fini(&ww);
return err;
}
static int __i915_sw_fence_call
clear_pages_work_notify(struct i915_sw_fence *fence,
enum i915_sw_fence_notify state)
{
struct clear_pages_work *w = container_of(fence, typeof(*w), wait);
switch (state) {
case FENCE_COMPLETE:
schedule_work(&w->work);
break;
case FENCE_FREE:
dma_fence_put(&w->dma);
break;
}
return NOTIFY_DONE;
}
static DEFINE_SPINLOCK(fence_lock);
/* XXX: better name please */
int i915_gem_schedule_fill_pages_blt(struct drm_i915_gem_object *obj,
struct intel_context *ce,
struct sg_table *pages,
struct i915_page_sizes *page_sizes,
u32 value)
{
struct clear_pages_work *work;
struct i915_sleeve *sleeve;
int err;
sleeve = create_sleeve(ce->vm, obj, pages, page_sizes);
if (IS_ERR(sleeve))
return PTR_ERR(sleeve);
work = kmalloc(sizeof(*work), GFP_KERNEL);
if (!work) {
destroy_sleeve(sleeve);
return -ENOMEM;
}
work->value = value;
work->sleeve = sleeve;
work->ce = ce;
INIT_WORK(&work->work, clear_pages_worker);
init_irq_work(&work->irq_work, clear_pages_signal_irq_worker);
dma_fence_init(&work->dma, &clear_pages_work_ops, &fence_lock, 0, 0);
i915_sw_fence_init(&work->wait, clear_pages_work_notify);
err = pin_wait_clear_pages_work(work, ce);
if (err < 0)
dma_fence_set_error(&work->dma, err);
dma_fence_get(&work->dma);
i915_sw_fence_commit(&work->wait);
return err;
}
#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
#include "selftests/i915_gem_client_blt.c"
#endif

@ -1,21 +0,0 @@
/* SPDX-License-Identifier: MIT */
/*
* Copyright © 2019 Intel Corporation
*/
#ifndef __I915_GEM_CLIENT_BLT_H__
#define __I915_GEM_CLIENT_BLT_H__
#include <linux/types.h>
struct drm_i915_gem_object;
struct i915_page_sizes;
struct intel_context;
struct sg_table;
int i915_gem_schedule_fill_pages_blt(struct drm_i915_gem_object *obj,
struct intel_context *ce,
struct sg_table *pages,
struct i915_page_sizes *page_sizes,
u32 value);
#endif

@ -133,6 +133,9 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
int i915_gem_context_reset_stats_ioctl(struct drm_device *dev, void *data,
struct drm_file *file);
struct i915_gem_context *
i915_gem_context_lookup(struct drm_i915_file_private *file_priv, u32 id);
static inline struct i915_gem_context *
i915_gem_context_get(struct i915_gem_context *ctx)
{
@ -221,6 +224,9 @@ i915_gem_engines_iter_next(struct i915_gem_engines_iter *it);
for (i915_gem_engines_iter_init(&(it), (engines)); \
((ce) = i915_gem_engines_iter_next(&(it)));)
void i915_gem_context_module_exit(void);
int i915_gem_context_module_init(void);
struct i915_lut_handle *i915_lut_handle_alloc(void);
void i915_lut_handle_free(struct i915_lut_handle *lut);


@ -30,22 +30,176 @@ struct i915_address_space;
struct intel_timeline;
struct intel_ring;
/**
* struct i915_gem_engines - A set of engines
*/
struct i915_gem_engines {
union {
/** @link: Link in i915_gem_context::stale::engines */
struct list_head link;
/** @rcu: RCU to use when freeing */
struct rcu_head rcu;
};
/** @fence: Fence used for delayed destruction of engines */
struct i915_sw_fence fence;
/** @ctx: i915_gem_context backpointer */
struct i915_gem_context *ctx;
/** @num_engines: Number of engines in this set */
unsigned int num_engines;
/** @engines: Array of engines */
struct intel_context *engines[];
};
/**
* struct i915_gem_engines_iter - Iterator for an i915_gem_engines set
*/
struct i915_gem_engines_iter {
/** @idx: Index into i915_gem_engines::engines */
unsigned int idx;
/** @engines: Engine set being iterated */
const struct i915_gem_engines *engines;
};
/**
* enum i915_gem_engine_type - Describes the type of an i915_gem_proto_engine
*/
enum i915_gem_engine_type {
/** @I915_GEM_ENGINE_TYPE_INVALID: An invalid engine */
I915_GEM_ENGINE_TYPE_INVALID = 0,
/** @I915_GEM_ENGINE_TYPE_PHYSICAL: A single physical engine */
I915_GEM_ENGINE_TYPE_PHYSICAL,
/** @I915_GEM_ENGINE_TYPE_BALANCED: A load-balanced engine set */
I915_GEM_ENGINE_TYPE_BALANCED,
};
/**
* struct i915_gem_proto_engine - prototype engine
*
* This struct describes an engine that a context may contain. Engines
* have three types:
*
* - I915_GEM_ENGINE_TYPE_INVALID: Invalid engines can be created but they
* show up as a NULL in i915_gem_engines::engines[i] and any attempt to
* use them by the user results in -EINVAL. They are also useful during
* proto-context construction because the client may create invalid
* engines and then set them up later as virtual engines.
*
* - I915_GEM_ENGINE_TYPE_PHYSICAL: A single physical engine, described by
* i915_gem_proto_engine::engine.
*
* - I915_GEM_ENGINE_TYPE_BALANCED: A load-balanced engine set, described
* by i915_gem_proto_engine::num_siblings and i915_gem_proto_engine::siblings.
*/
struct i915_gem_proto_engine {
/** @type: Type of this engine */
enum i915_gem_engine_type type;
/** @engine: Engine, for physical */
struct intel_engine_cs *engine;
/** @num_siblings: Number of balanced siblings */
unsigned int num_siblings;
/** @siblings: Balanced siblings */
struct intel_engine_cs **siblings;
/** @sseu: Client-set SSEU parameters */
struct intel_sseu sseu;
};
/**
* struct i915_gem_proto_context - prototype context
*
* The struct i915_gem_proto_context represents the creation parameters for
* a struct i915_gem_context. This is used to gather parameters provided
* either through creation flags or via SET_CONTEXT_PARAM so that, when we
* create the final i915_gem_context, those parameters can be immutable.
*
* The context uAPI allows for two methods of setting context parameters:
* SET_CONTEXT_PARAM and CONTEXT_CREATE_EXT_SETPARAM. The former is
* allowed to be called at any time while the latter happens as part of
* GEM_CONTEXT_CREATE. Currently, everything settable via one is settable
* via the other. While some params are fairly simple and setting them on
* a live context is harmless, such as the context priority, others are
* far trickier, such as the VM or the set of engines. To avoid some truly
* nasty race conditions, we don't allow setting the VM or the set of
* engines on live contexts.
*
* The way we dealt with this without breaking older userspace that sets
* the VM or engine set via SET_CONTEXT_PARAM is to delay the creation of
* the actual context until after the client is done configuring it with
* SET_CONTEXT_PARAM. From the perspective of the client, it has the same
* u32 context ID the whole time. From the perspective of i915, however,
* it's an i915_gem_proto_context right up until the point where we attempt
* to do something which the proto-context can't handle at which point the
* real context gets created.
*
* This is accomplished via a little xarray dance. When GEM_CONTEXT_CREATE
* is called, we create a proto-context, reserve a slot in context_xa but
* leave it NULL, and store the proto-context in the corresponding slot in
* proto_context_xa. Then, whenever we go to look up a context, we first
* check context_xa. If it's there, we return the i915_gem_context and
* we're done. If it's not, we look in proto_context_xa and, if we find it
* there, we create the actual context and kill the proto-context.
*
* At the time we made this change (April, 2021), we did a fairly complete
* audit of existing userspace to ensure this wouldn't break anything:
*
* - Mesa/i965 didn't use the engines or VM APIs at all
*
* - Mesa/ANV used the engines API but via CONTEXT_CREATE_EXT_SETPARAM and
* didn't use the VM API.
*
* - Mesa/iris didn't use the engines or VM APIs at all
*
* - The open-source compute-runtime didn't yet use the engines API but
* did use the VM API via SET_CONTEXT_PARAM. However, CONTEXT_SETPARAM
* was always the second ioctl on that context, immediately following
* GEM_CONTEXT_CREATE.
*
* - The media driver sets engines and bonding/balancing via
* SET_CONTEXT_PARAM. However, CONTEXT_SETPARAM to set the VM was
* always the second ioctl on that context, immediately following
* GEM_CONTEXT_CREATE and setting engines immediately followed that.
*
* In order for this dance to work properly, any modification to an
* i915_gem_proto_context that is exposed to the client via
* drm_i915_file_private::proto_context_xa must be guarded by
* drm_i915_file_private::proto_context_lock. The exception is when a
* proto-context has not yet been exposed such as when handling
* CONTEXT_CREATE_EXT_SETPARAM during GEM_CONTEXT_CREATE.
*/
struct i915_gem_proto_context {
/** @vm: See &i915_gem_context.vm */
struct i915_address_space *vm;
/** @user_flags: See &i915_gem_context.user_flags */
unsigned long user_flags;
/** @sched: See &i915_gem_context.sched */
struct i915_sched_attr sched;
/** @num_user_engines: Number of user-specified engines or -1 */
int num_user_engines;
/** @user_engines: User-specified engines */
struct i915_gem_proto_engine *user_engines;
/** @legacy_rcs_sseu: Client-set SSEU parameters for the legacy RCS */
struct intel_sseu legacy_rcs_sseu;
/** @single_timeline: See &i915_gem_context.syncobj */
bool single_timeline;
};
/**
* struct i915_gem_context - client state
*
@ -53,10 +207,10 @@ struct i915_gem_engines_iter {
* logical hardware state for a particular client.
*/
struct i915_gem_context {
/** i915: i915 device backpointer */
/** @i915: i915 device backpointer */
struct drm_i915_private *i915;
/** file_priv: owning file descriptor */
/** @file_priv: owning file descriptor */
struct drm_i915_file_private *file_priv;
/**
@ -81,9 +235,23 @@ struct i915_gem_context {
* CONTEXT_USER_ENGINES flag is set).
*/
struct i915_gem_engines __rcu *engines;
struct mutex engines_mutex; /* guards writes to engines */
struct intel_timeline *timeline;
/** @engines_mutex: guards writes to engines */
struct mutex engines_mutex;
/**
* @syncobj: Shared timeline syncobj
*
* When the SINGLE_TIMELINE flag is set on context creation, we
* emulate a single timeline across all engines using this syncobj.
* For every execbuffer2 call, this syncobj is used as both an in-
* and out-fence. Unlike the real intel_timeline, this doesn't
* provide perfect atomic in-order guarantees if the client races
* with itself by calling execbuffer2 twice concurrently. However,
* if userspace races with itself, that's not likely to yield well-
* defined results anyway so we choose to not care.
*/
struct drm_syncobj *syncobj;
/**
* @vm: unique address space (GTT)
@ -106,7 +274,7 @@ struct i915_gem_context {
*/
struct pid *pid;
/** link: place with &drm_i915_private.context_list */
/** @link: place with &drm_i915_private.context_list */
struct list_head link;
/**
@ -129,7 +297,6 @@ struct i915_gem_context {
* @user_flags: small set of booleans controlled by the user
*/
unsigned long user_flags;
#define UCONTEXT_NO_ZEROMAP 0
#define UCONTEXT_NO_ERROR_CAPTURE 1
#define UCONTEXT_BANNABLE 2
#define UCONTEXT_RECOVERABLE 3
@ -142,11 +309,13 @@ struct i915_gem_context {
#define CONTEXT_CLOSED 0
#define CONTEXT_USER_ENGINES 1
/** @mutex: guards everything that isn't engines or handles_vma */
struct mutex mutex;
/** @sched: scheduler parameters */
struct i915_sched_attr sched;
/** guilty_count: How many times this context has caused a GPU hang. */
/** @guilty_count: How many times this context has caused a GPU hang. */
atomic_t guilty_count;
/**
* @active_count: How many times this context was active during a GPU
@ -154,25 +323,23 @@ struct i915_gem_context {
*/
atomic_t active_count;
struct {
u64 timeout_us;
} watchdog;
/**
* @hang_timestamp: The last time(s) this context caused a GPU hang
*/
unsigned long hang_timestamp[2];
#define CONTEXT_FAST_HANG_JIFFIES (120 * HZ) /* 3 hangs within 120s? Banned! */
/** remap_slice: Bitmask of cache lines that need remapping */
/** @remap_slice: Bitmask of cache lines that need remapping */
u8 remap_slice;
/**
* handles_vma: rbtree to look up our context specific obj/vma for
* @handles_vma: rbtree to look up our context specific obj/vma for
* the user handle. (user handles are per fd, but the binding is
* per vm, which may be one per context or shared with the global GTT)
*/
struct radix_tree_root handles_vma;
/** @lut_mutex: Locks handles_vma */
struct mutex lut_mutex;
/**
@ -184,8 +351,11 @@ struct i915_gem_context {
*/
char name[TASK_COMM_LEN + 8];
/** @stale: tracks stale engines to be destroyed */
struct {
/** @lock: guards engines */
spinlock_t lock;
/** @engines: list of stale engines */
struct list_head engines;
} stale;
};
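
The lookup dance described in the i915_gem_proto_context comment above can be modelled in userspace roughly as follows. This is a minimal sketch under stated assumptions, not the driver code: the two fixed-size arrays stand in for context_xa and proto_context_xa, the names file_priv, proto_ctx, real_ctx, ctx_create, ctx_set_vm and ctx_lookup are hypothetical, a plain integer vm_id stands in for the VM, and the proto_context_lock that the real code requires is omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_CTX 16 /* hypothetical table size */

struct proto_ctx { int vm_id; }; /* stand-in for i915_gem_proto_context */
struct real_ctx  { int vm_id; }; /* stand-in for i915_gem_context */

struct file_priv {
	struct real_ctx  *context[MAX_CTX]; /* models context_xa */
	struct proto_ctx *proto[MAX_CTX];   /* models proto_context_xa */
};

/* GEM_CONTEXT_CREATE: the real slot stays NULL, only a proto-context exists. */
static int ctx_create(struct file_priv *fp, uint32_t id)
{
	if (id >= MAX_CTX || fp->context[id] || fp->proto[id])
		return -1;
	fp->proto[id] = calloc(1, sizeof(*fp->proto[id]));
	return fp->proto[id] ? 0 : -1;
}

/* Setting the VM is only accepted while the id still maps to a proto-context. */
static int ctx_set_vm(struct file_priv *fp, uint32_t id, int vm_id)
{
	if (id >= MAX_CTX || !fp->proto[id])
		return -1; /* unknown id, or context already finalized */
	fp->proto[id]->vm_id = vm_id;
	return 0;
}

/* Lookup: return the real context, finalizing the proto-context on first use. */
static struct real_ctx *ctx_lookup(struct file_priv *fp, uint32_t id)
{
	struct real_ctx *ctx;

	if (id >= MAX_CTX)
		return NULL;
	if (fp->context[id])
		return fp->context[id];
	if (!fp->proto[id])
		return NULL;

	ctx = malloc(sizeof(*ctx));
	if (!ctx)
		return NULL;
	assert(!fp->context[id]);
	ctx->vm_id = fp->proto[id]->vm_id; /* parameters become immutable here */
	free(fp->proto[id]);               /* kill the proto-context */
	fp->proto[id] = NULL;
	fp->context[id] = ctx;
	return ctx;
}
```

From the client's point of view the id never changes; only the first ctx_lookup() decides when the parameters are frozen, which is why a later ctx_set_vm() on the same id fails in this model.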


@ -11,13 +11,14 @@
#include "i915_trace.h"
#include "i915_user_extensions.h"
static u32 object_max_page_size(struct drm_i915_gem_object *obj)
static u32 object_max_page_size(struct intel_memory_region **placements,
unsigned int n_placements)
{
u32 max_page_size = 0;
int i;
for (i = 0; i < obj->mm.n_placements; i++) {
struct intel_memory_region *mr = obj->mm.placements[i];
for (i = 0; i < n_placements; i++) {
struct intel_memory_region *mr = placements[i];
GEM_BUG_ON(!is_power_of_2(mr->min_page_size));
max_page_size = max_t(u32, max_page_size, mr->min_page_size);
@ -27,10 +28,13 @@ static u32 object_max_page_size(struct drm_i915_gem_object *obj)
return max_page_size;
}
static void object_set_placements(struct drm_i915_gem_object *obj,
struct intel_memory_region **placements,
unsigned int n_placements)
static int object_set_placements(struct drm_i915_gem_object *obj,
struct intel_memory_region **placements,
unsigned int n_placements)
{
struct intel_memory_region **arr;
unsigned int i;
GEM_BUG_ON(!n_placements);
/*
@ -44,9 +48,20 @@ static void object_set_placements(struct drm_i915_gem_object *obj,
obj->mm.placements = &i915->mm.regions[mr->id];
obj->mm.n_placements = 1;
} else {
obj->mm.placements = placements;
arr = kmalloc_array(n_placements,
sizeof(struct intel_memory_region *),
GFP_KERNEL);
if (!arr)
return -ENOMEM;
for (i = 0; i < n_placements; i++)
arr[i] = placements[i];
obj->mm.placements = arr;
obj->mm.n_placements = n_placements;
}
return 0;
}
static int i915_gem_publish(struct drm_i915_gem_object *obj,
@ -67,22 +82,46 @@ static int i915_gem_publish(struct drm_i915_gem_object *obj,
return 0;
}
static int
i915_gem_setup(struct drm_i915_gem_object *obj, u64 size)
/**
* __i915_gem_object_create_user - Creates a new object using the same path
* as DRM_I915_GEM_CREATE_EXT
* @i915: i915 private
* @size: size of the buffer, in bytes
* @placements: possible placement regions, in priority order
* @n_placements: number of possible placement regions
*
* This function is exposed primarily for selftests and does very little
* error checking. It is assumed that the set of placement regions has
* already been verified to be valid.
*/
struct drm_i915_gem_object *
__i915_gem_object_create_user(struct drm_i915_private *i915, u64 size,
struct intel_memory_region **placements,
unsigned int n_placements)
{
struct intel_memory_region *mr = obj->mm.placements[0];
struct intel_memory_region *mr = placements[0];
struct drm_i915_gem_object *obj;
unsigned int flags;
int ret;
size = round_up(size, object_max_page_size(obj));
i915_gem_flush_free_objects(i915);
size = round_up(size, object_max_page_size(placements, n_placements));
if (size == 0)
return -EINVAL;
return ERR_PTR(-EINVAL);
/* For most of the ABI (e.g. mmap) we think in system pages */
GEM_BUG_ON(!IS_ALIGNED(size, PAGE_SIZE));
if (i915_gem_object_size_2big(size))
return -E2BIG;
return ERR_PTR(-E2BIG);
obj = i915_gem_object_alloc();
if (!obj)
return ERR_PTR(-ENOMEM);
ret = object_set_placements(obj, placements, n_placements);
if (ret)
goto object_free;
/*
* I915_BO_ALLOC_USER will make sure the object is cleared before
@ -90,14 +129,20 @@ i915_gem_setup(struct drm_i915_gem_object *obj, u64 size)
*/
flags = I915_BO_ALLOC_USER;
ret = mr->ops->init_object(mr, obj, size, flags);
ret = mr->ops->init_object(mr, obj, size, 0, flags);
if (ret)
return ret;
goto object_free;
GEM_BUG_ON(size != obj->base.size);
trace_i915_gem_object_create(obj);
return 0;
return obj;
object_free:
if (obj->mm.n_placements > 1)
kfree(obj->mm.placements);
i915_gem_object_free(obj);
return ERR_PTR(ret);
}
int
@ -110,7 +155,6 @@ i915_gem_dumb_create(struct drm_file *file,
enum intel_memory_type mem_type;
int cpp = DIV_ROUND_UP(args->bpp, 8);
u32 format;
int ret;
switch (cpp) {
case 1:
@ -143,22 +187,13 @@ i915_gem_dumb_create(struct drm_file *file,
if (HAS_LMEM(to_i915(dev)))
mem_type = INTEL_MEMORY_LOCAL;
obj = i915_gem_object_alloc();
if (!obj)
return -ENOMEM;
mr = intel_memory_region_by_type(to_i915(dev), mem_type);
object_set_placements(obj, &mr, 1);
ret = i915_gem_setup(obj, args->size);
if (ret)
goto object_free;
obj = __i915_gem_object_create_user(to_i915(dev), args->size, &mr, 1);
if (IS_ERR(obj))
return PTR_ERR(obj);
return i915_gem_publish(obj, file, &args->size, &args->handle);
object_free:
i915_gem_object_free(obj);
return ret;
}
/**
@ -175,31 +210,20 @@ i915_gem_create_ioctl(struct drm_device *dev, void *data,
struct drm_i915_gem_create *args = data;
struct drm_i915_gem_object *obj;
struct intel_memory_region *mr;
int ret;
i915_gem_flush_free_objects(i915);
obj = i915_gem_object_alloc();
if (!obj)
return -ENOMEM;
mr = intel_memory_region_by_type(i915, INTEL_MEMORY_SYSTEM);
object_set_placements(obj, &mr, 1);
ret = i915_gem_setup(obj, args->size);
if (ret)
goto object_free;
obj = __i915_gem_object_create_user(i915, args->size, &mr, 1);
if (IS_ERR(obj))
return PTR_ERR(obj);
return i915_gem_publish(obj, file, &args->size, &args->handle);
object_free:
i915_gem_object_free(obj);
return ret;
}
struct create_ext {
struct drm_i915_private *i915;
struct drm_i915_gem_object *vanilla_object;
struct intel_memory_region *placements[INTEL_REGION_UNKNOWN];
unsigned int n_placements;
};
static void repr_placements(char *buf, size_t size,
@ -230,8 +254,7 @@ static int set_placements(struct drm_i915_gem_create_ext_memory_regions *args,
struct drm_i915_private *i915 = ext_data->i915;
struct drm_i915_gem_memory_class_instance __user *uregions =
u64_to_user_ptr(args->regions);
struct drm_i915_gem_object *obj = ext_data->vanilla_object;
struct intel_memory_region **placements;
struct intel_memory_region *placements[INTEL_REGION_UNKNOWN];
u32 mask;
int i, ret = 0;
@ -245,6 +268,8 @@ static int set_placements(struct drm_i915_gem_create_ext_memory_regions *args,
ret = -EINVAL;
}
BUILD_BUG_ON(ARRAY_SIZE(i915->mm.regions) != ARRAY_SIZE(placements));
BUILD_BUG_ON(ARRAY_SIZE(ext_data->placements) != ARRAY_SIZE(placements));
if (args->num_regions > ARRAY_SIZE(i915->mm.regions)) {
drm_dbg(&i915->drm, "num_regions is too large\n");
ret = -EINVAL;
@ -253,21 +278,13 @@ static int set_placements(struct drm_i915_gem_create_ext_memory_regions *args,
if (ret)
return ret;
placements = kmalloc_array(args->num_regions,
sizeof(struct intel_memory_region *),
GFP_KERNEL);
if (!placements)
return -ENOMEM;
mask = 0;
for (i = 0; i < args->num_regions; i++) {
struct drm_i915_gem_memory_class_instance region;
struct intel_memory_region *mr;
if (copy_from_user(&region, uregions, sizeof(region))) {
ret = -EFAULT;
goto out_free;
}
if (copy_from_user(&region, uregions, sizeof(region)))
return -EFAULT;
mr = intel_memory_region_lookup(i915,
region.memory_class,
@ -293,14 +310,14 @@ static int set_placements(struct drm_i915_gem_create_ext_memory_regions *args,
++uregions;
}
if (obj->mm.placements) {
if (ext_data->n_placements) {
ret = -EINVAL;
goto out_dump;
}
object_set_placements(obj, placements, args->num_regions);
if (args->num_regions == 1)
kfree(placements);
ext_data->n_placements = args->num_regions;
for (i = 0; i < args->num_regions; i++)
ext_data->placements[i] = placements[i];
return 0;
@ -308,11 +325,11 @@ out_dump:
if (1) {
char buf[256];
if (obj->mm.placements) {
if (ext_data->n_placements) {
repr_placements(buf,
sizeof(buf),
obj->mm.placements,
obj->mm.n_placements);
ext_data->placements,
ext_data->n_placements);
drm_dbg(&i915->drm,
"Placements were already set in previous EXT. Existing placements: %s\n",
buf);
@ -322,8 +339,6 @@ out_dump:
drm_dbg(&i915->drm, "New placements(so far validated): %s\n", buf);
}
out_free:
kfree(placements);
return ret;
}
@ -358,44 +373,30 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
struct drm_i915_private *i915 = to_i915(dev);
struct drm_i915_gem_create_ext *args = data;
struct create_ext ext_data = { .i915 = i915 };
struct intel_memory_region **placements_ext;
struct drm_i915_gem_object *obj;
int ret;
if (args->flags)
return -EINVAL;
i915_gem_flush_free_objects(i915);
obj = i915_gem_object_alloc();
if (!obj)
return -ENOMEM;
ext_data.vanilla_object = obj;
ret = i915_user_extensions(u64_to_user_ptr(args->extensions),
create_extensions,
ARRAY_SIZE(create_extensions),
&ext_data);
placements_ext = obj->mm.placements;
if (ret)
goto object_free;
return ret;
if (!placements_ext) {
struct intel_memory_region *mr =
if (!ext_data.n_placements) {
ext_data.placements[0] =
intel_memory_region_by_type(i915, INTEL_MEMORY_SYSTEM);
object_set_placements(obj, &mr, 1);
ext_data.n_placements = 1;
}
ret = i915_gem_setup(obj, args->size);
if (ret)
goto object_free;
obj = __i915_gem_object_create_user(i915, args->size,
ext_data.placements,
ext_data.n_placements);
if (IS_ERR(obj))
return PTR_ERR(obj);
return i915_gem_publish(obj, file, &args->size, &args->handle);
object_free:
if (obj->mm.n_placements > 1)
kfree(placements_ext);
i915_gem_object_free(obj);
return ret;
}
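
The size handling in __i915_gem_object_create_user() (round the request up to the largest min_page_size among the candidate placements) can be illustrated with a small userspace sketch. The struct region type, the helper names and the 4 KiB / 64 KiB page sizes used in the note below are assumptions for illustration only; the real code operates on struct intel_memory_region and uses the kernel's round_up() and max_t().

```c
#include <assert.h>
#include <stdint.h>

struct region { uint32_t min_page_size; }; /* stand-in for intel_memory_region */

static int is_pow2(uint32_t v)
{
	return v && !(v & (v - 1));
}

/* Largest minimum page size across all candidate placements. */
static uint32_t max_min_page_size(struct region *const *placements,
				  unsigned int n_placements)
{
	uint32_t max = 0;
	unsigned int i;

	for (i = 0; i < n_placements; i++) {
		uint32_t sz = placements[i]->min_page_size;

		assert(is_pow2(sz)); /* mirrors the GEM_BUG_ON in the diff */
		if (sz > max)
			max = sz;
	}
	return max;
}

/* Round size up so the object fits whichever placement it ends up in. */
static uint64_t rounded_size(uint64_t size, struct region *const *placements,
			     unsigned int n_placements)
{
	uint64_t align = max_min_page_size(placements, n_placements);

	assert(align); /* at least one placement is required */
	return (size + align - 1) & ~(align - 1);
}
```

With a hypothetical system region of 4096 bytes and a local region of 65536 bytes, a 1-byte request rounds to 65536 when both placements are allowed but to 4096 when only system memory is allowed. A request of 0 stays 0, which is the case the function in the diff rejects with -EINVAL.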


@ -12,6 +12,8 @@
#include "i915_gem_object.h"
#include "i915_scatterlist.h"
I915_SELFTEST_DECLARE(static bool force_different_devices;)
static struct drm_i915_gem_object *dma_buf_to_obj(struct dma_buf *buf)
{
return to_intel_bo(buf->priv);
@ -25,15 +27,11 @@ static struct sg_table *i915_gem_map_dma_buf(struct dma_buf_attachment *attachme
struct scatterlist *src, *dst;
int ret, i;
ret = i915_gem_object_pin_pages_unlocked(obj);
if (ret)
goto err;
/* Copy sg so that we make an independent mapping */
st = kmalloc(sizeof(struct sg_table), GFP_KERNEL);
if (st == NULL) {
ret = -ENOMEM;
goto err_unpin_pages;
goto err;
}
ret = sg_alloc_table(st, obj->mm.pages->nents, GFP_KERNEL);
@ -58,8 +56,6 @@ err_free_sg:
sg_free_table(st);
err_free:
kfree(st);
err_unpin_pages:
i915_gem_object_unpin_pages(obj);
err:
return ERR_PTR(ret);
}
@ -68,13 +64,9 @@ static void i915_gem_unmap_dma_buf(struct dma_buf_attachment *attachment,
struct sg_table *sg,
enum dma_data_direction dir)
{
struct drm_i915_gem_object *obj = dma_buf_to_obj(attachment->dmabuf);
dma_unmap_sgtable(attachment->dev, sg, dir, DMA_ATTR_SKIP_CPU_SYNC);
sg_free_table(sg);
kfree(sg);
i915_gem_object_unpin_pages(obj);
}
static int i915_gem_dmabuf_vmap(struct dma_buf *dma_buf, struct dma_buf_map *map)
@ -168,7 +160,46 @@ retry:
return err;
}
static int i915_gem_dmabuf_attach(struct dma_buf *dmabuf,
struct dma_buf_attachment *attach)
{
struct drm_i915_gem_object *obj = dma_buf_to_obj(dmabuf);
struct i915_gem_ww_ctx ww;
int err;
if (!i915_gem_object_can_migrate(obj, INTEL_REGION_SMEM))
return -EOPNOTSUPP;
for_i915_gem_ww(&ww, err, true) {
err = i915_gem_object_lock(obj, &ww);
if (err)
continue;
err = i915_gem_object_migrate(obj, &ww, INTEL_REGION_SMEM);
if (err)
continue;
err = i915_gem_object_wait_migration(obj, 0);
if (err)
continue;
err = i915_gem_object_pin_pages(obj);
}
return err;
}
static void i915_gem_dmabuf_detach(struct dma_buf *dmabuf,
struct dma_buf_attachment *attach)
{
struct drm_i915_gem_object *obj = dma_buf_to_obj(dmabuf);
i915_gem_object_unpin_pages(obj);
}
static const struct dma_buf_ops i915_dmabuf_ops = {
.attach = i915_gem_dmabuf_attach,
.detach = i915_gem_dmabuf_detach,
.map_dma_buf = i915_gem_map_dma_buf,
.unmap_dma_buf = i915_gem_unmap_dma_buf,
.release = drm_gem_dmabuf_release,
@ -204,6 +235,8 @@ static int i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object *obj)
struct sg_table *pages;
unsigned int sg_page_sizes;
assert_object_held(obj);
pages = dma_buf_map_attachment(obj->base.import_attach,
DMA_BIDIRECTIONAL);
if (IS_ERR(pages))
@ -241,7 +274,8 @@ struct drm_gem_object *i915_gem_prime_import(struct drm_device *dev,
if (dma_buf->ops == &i915_dmabuf_ops) {
obj = dma_buf_to_obj(dma_buf);
/* is it from our device? */
if (obj->base.dev == dev) {
if (obj->base.dev == dev &&
!I915_SELFTEST_ONLY(force_different_devices)) {
/*
* Importing dmabuf exported from our own gem increases
* refcount on gem itself instead of f_count of dmabuf.
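
The new i915_gem_dmabuf_attach() runs lock, migrate, wait and pin inside for_i915_gem_ww(), using `continue` on every error. The macro itself is not part of this diff, so the sketch below rests on an assumption about its behaviour: the body is retried only while it fails with -EDEADLK, and any other result ends the loop. All names here (ww_ctx, fake_obj, run_with_backoff, attach_body) are hypothetical userspace stand-ins.

```c
#include <assert.h>
#include <errno.h>

struct ww_ctx { int backoffs; }; /* stand-in for i915_gem_ww_ctx */

struct fake_obj {
	int contended; /* how many more lock attempts will hit -EDEADLK */
	int in_smem;   /* set once the object has been "migrated" */
	int pin_count; /* pages pinned for the lifetime of the attachment */
};

typedef int (*ww_body_fn)(struct ww_ctx *ww, void *data);

/* Retry the body while it reports a lock-ordering conflict. */
static int run_with_backoff(struct ww_ctx *ww, ww_body_fn body, void *data)
{
	int err;

	ww->backoffs = 0;
	do {
		err = body(ww, data);
		if (err == -EDEADLK)
			ww->backoffs++; /* real code drops its locks and waits here */
	} while (err == -EDEADLK);

	return err;
}

/* One pass of the attach sequence: lock, migrate, wait, pin. */
static int attach_body(struct ww_ctx *ww, void *data)
{
	struct fake_obj *obj = data;

	(void)ww;
	if (obj->contended > 0) { /* models i915_gem_object_lock() failing */
		obj->contended--;
		return -EDEADLK;
	}
	obj->in_smem = 1;         /* models migrate + wait_migration */
	obj->pin_count++;         /* models i915_gem_object_pin_pages() */
	return 0;
}
```

In this model the pin is taken exactly once however many times the body is retried, because the pin is the last step and earlier failures return before reaching it. The matching unpin lives in detach, which is why map/unmap in the diff no longer pin or unpin pages themselves.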


@ -268,6 +268,9 @@ int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
struct drm_i915_gem_object *obj;
int err = 0;
if (IS_DGFX(to_i915(dev)))
return -ENODEV;
rcu_read_lock();
obj = i915_gem_object_lookup_rcu(file, args->handle);
if (!obj) {
@ -303,6 +306,9 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
enum i915_cache_level level;
int ret = 0;
if (IS_DGFX(i915))
return -ENODEV;
switch (args->caching) {
case I915_CACHING_NONE:
level = I915_CACHE_NONE;
@ -375,7 +381,7 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
struct i915_vma *vma;
int ret;
/* Frame buffer must be in LMEM (no migration yet) */
/* Frame buffer must be in LMEM */
if (HAS_LMEM(i915) && !i915_gem_object_is_lmem(obj))
return ERR_PTR(-EINVAL);
@ -484,6 +490,9 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, void *data,
u32 write_domain = args->write_domain;
int err;
if (IS_DGFX(to_i915(dev)))
return -ENODEV;
/* Only handle setting domains to types used by the CPU. */
if ((write_domain | read_domains) & I915_GEM_GPU_DOMAINS)
return -EINVAL;


@ -277,18 +277,9 @@ struct i915_execbuffer {
bool has_llc : 1;
bool has_fence : 1;
bool needs_unfenced : 1;
struct i915_request *rq;
u32 *rq_cmd;
unsigned int rq_size;
struct intel_gt_buffer_pool_node *pool;
} reloc_cache;
struct intel_gt_buffer_pool_node *reloc_pool; /** relocation pool for -EDEADLK handling */
struct intel_context *reloc_context;
u64 invalid_flags; /** Set of execobj.flags that are invalid */
u32 context_flags; /** Set of execobj.flags to insert from the ctx */
u64 batch_len; /** Length of batch within object */
u32 batch_start_offset; /** Location within object of batch */
@ -539,9 +530,6 @@ eb_validate_vma(struct i915_execbuffer *eb,
entry->flags |= EXEC_OBJECT_NEEDS_GTT | __EXEC_OBJECT_NEEDS_MAP;
}
if (!(entry->flags & EXEC_OBJECT_PINNED))
entry->flags |= eb->context_flags;
return 0;
}
@ -741,17 +729,13 @@ static int eb_select_context(struct i915_execbuffer *eb)
struct i915_gem_context *ctx;
ctx = i915_gem_context_lookup(eb->file->driver_priv, eb->args->rsvd1);
if (unlikely(!ctx))
return -ENOENT;
if (unlikely(IS_ERR(ctx)))
return PTR_ERR(ctx);
eb->gem_context = ctx;
if (rcu_access_pointer(ctx->vm))
eb->invalid_flags |= EXEC_OBJECT_NEEDS_GTT;
eb->context_flags = 0;
if (test_bit(UCONTEXT_NO_ZEROMAP, &ctx->user_flags))
eb->context_flags |= __EXEC_OBJECT_NEEDS_BIAS;
return 0;
}
@ -920,6 +904,23 @@ err:
return err;
}
static int eb_lock_vmas(struct i915_execbuffer *eb)
{
unsigned int i;
int err;
for (i = 0; i < eb->buffer_count; i++) {
struct eb_vma *ev = &eb->vma[i];
struct i915_vma *vma = ev->vma;
err = i915_gem_object_lock(vma->obj, &eb->ww);
if (err)
return err;
}
return 0;
}
static int eb_validate_vmas(struct i915_execbuffer *eb)
{
unsigned int i;
@ -927,15 +928,15 @@ static int eb_validate_vmas(struct i915_execbuffer *eb)
INIT_LIST_HEAD(&eb->unbound);
err = eb_lock_vmas(eb);
if (err)
return err;
for (i = 0; i < eb->buffer_count; i++) {
struct drm_i915_gem_exec_object2 *entry = &eb->exec[i];
struct eb_vma *ev = &eb->vma[i];
struct i915_vma *vma = ev->vma;
err = i915_gem_object_lock(vma->obj, &eb->ww);
if (err)
return err;
err = eb_pin_vma(eb, entry, ev);
if (err == -EDEADLK)
return err;
@ -992,7 +993,7 @@ eb_get_vma(const struct i915_execbuffer *eb, unsigned long handle)
}
}
static void eb_release_vmas(struct i915_execbuffer *eb, bool final, bool release_userptr)
static void eb_release_vmas(struct i915_execbuffer *eb, bool final)
{
const unsigned int count = eb->buffer_count;
unsigned int i;
@ -1006,11 +1007,6 @@ static void eb_release_vmas(struct i915_execbuffer *eb, bool final, bool release
eb_unreserve_vma(ev);
if (release_userptr && ev->flags & __EXEC_OBJECT_USERPTR_INIT) {
ev->flags &= ~__EXEC_OBJECT_USERPTR_INIT;
i915_gem_object_userptr_submit_fini(vma->obj);
}
if (final)
i915_vma_put(vma);
}
@ -1020,8 +1016,6 @@ static void eb_release_vmas(struct i915_execbuffer *eb, bool final, bool release
static void eb_destroy(const struct i915_execbuffer *eb)
{
GEM_BUG_ON(eb->reloc_cache.rq);
if (eb->lut_size > 0)
kfree(eb->buckets);
}
@ -1033,14 +1027,6 @@ relocation_target(const struct drm_i915_gem_relocation_entry *reloc,
return gen8_canonical_addr((int)reloc->delta + target->node.start);
}
static void reloc_cache_clear(struct reloc_cache *cache)
{
cache->rq = NULL;
cache->rq_cmd = NULL;
cache->pool = NULL;
cache->rq_size = 0;
}
static void reloc_cache_init(struct reloc_cache *cache,
struct drm_i915_private *i915)
{
@ -1053,7 +1039,6 @@ static void reloc_cache_init(struct reloc_cache *cache,
cache->has_fence = cache->graphics_ver < 4;
cache->needs_unfenced = INTEL_INFO(i915)->unfenced_needs_alignment;
cache->node.flags = 0;
reloc_cache_clear(cache);
}
static inline void *unmask_page(unsigned long p)
@ -1075,48 +1060,10 @@ static inline struct i915_ggtt *cache_to_ggtt(struct reloc_cache *cache)
return &i915->ggtt;
}
static void reloc_cache_put_pool(struct i915_execbuffer *eb, struct reloc_cache *cache)
{
if (!cache->pool)
return;
/*
* This is a bit nasty, normally we keep objects locked until the end
* of execbuffer, but we already submit this, and have to unlock before
* dropping the reference. Fortunately we can only hold 1 pool node at
* a time, so this should be harmless.
*/
i915_gem_ww_unlock_single(cache->pool->obj);
intel_gt_buffer_pool_put(cache->pool);
cache->pool = NULL;
}
static void reloc_gpu_flush(struct i915_execbuffer *eb, struct reloc_cache *cache)
{
struct drm_i915_gem_object *obj = cache->rq->batch->obj;
GEM_BUG_ON(cache->rq_size >= obj->base.size / sizeof(u32));
cache->rq_cmd[cache->rq_size] = MI_BATCH_BUFFER_END;
i915_gem_object_flush_map(obj);
i915_gem_object_unpin_map(obj);
intel_gt_chipset_flush(cache->rq->engine->gt);
i915_request_add(cache->rq);
reloc_cache_put_pool(eb, cache);
reloc_cache_clear(cache);
eb->reloc_pool = NULL;
}
static void reloc_cache_reset(struct reloc_cache *cache, struct i915_execbuffer *eb)
{
void *vaddr;
if (cache->rq)
reloc_gpu_flush(eb, cache);
if (!cache->vaddr)
return;
@ -1298,295 +1245,6 @@ static void clflush_write32(u32 *addr, u32 value, unsigned int flushes)
*addr = value;
}
static int reloc_move_to_gpu(struct i915_request *rq, struct i915_vma *vma)
{
struct drm_i915_gem_object *obj = vma->obj;
int err;
assert_vma_held(vma);
if (obj->cache_dirty & ~obj->cache_coherent)
i915_gem_clflush_object(obj, 0);
obj->write_domain = 0;
err = i915_request_await_object(rq, vma->obj, true);
if (err == 0)
err = i915_vma_move_to_active(vma, rq, EXEC_OBJECT_WRITE);
return err;
}
static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
struct intel_engine_cs *engine,
struct i915_vma *vma,
unsigned int len)
{
struct reloc_cache *cache = &eb->reloc_cache;
struct intel_gt_buffer_pool_node *pool = eb->reloc_pool;
struct i915_request *rq;
struct i915_vma *batch;
u32 *cmd;
int err;
if (!pool) {
pool = intel_gt_get_buffer_pool(engine->gt, PAGE_SIZE,
cache->has_llc ?
I915_MAP_WB :
I915_MAP_WC);
if (IS_ERR(pool))
return PTR_ERR(pool);
}
eb->reloc_pool = NULL;
err = i915_gem_object_lock(pool->obj, &eb->ww);
if (err)
goto err_pool;
cmd = i915_gem_object_pin_map(pool->obj, pool->type);
if (IS_ERR(cmd)) {
err = PTR_ERR(cmd);
goto err_pool;
}
intel_gt_buffer_pool_mark_used(pool);
memset32(cmd, 0, pool->obj->base.size / sizeof(u32));
batch = i915_vma_instance(pool->obj, vma->vm, NULL);
if (IS_ERR(batch)) {
err = PTR_ERR(batch);
goto err_unmap;
}
err = i915_vma_pin_ww(batch, &eb->ww, 0, 0, PIN_USER | PIN_NONBLOCK);
if (err)
goto err_unmap;
if (engine == eb->context->engine) {
rq = i915_request_create(eb->context);
} else {
struct intel_context *ce = eb->reloc_context;
if (!ce) {
ce = intel_context_create(engine);
if (IS_ERR(ce)) {
err = PTR_ERR(ce);
goto err_unpin;
}
i915_vm_put(ce->vm);
ce->vm = i915_vm_get(eb->context->vm);
eb->reloc_context = ce;
}
err = intel_context_pin_ww(ce, &eb->ww);
if (err)
goto err_unpin;
rq = i915_request_create(ce);
intel_context_unpin(ce);
}
if (IS_ERR(rq)) {
err = PTR_ERR(rq);
goto err_unpin;
}
err = intel_gt_buffer_pool_mark_active(pool, rq);
if (err)
goto err_request;
err = reloc_move_to_gpu(rq, vma);
if (err)
goto err_request;
err = eb->engine->emit_bb_start(rq,
batch->node.start, PAGE_SIZE,
cache->graphics_ver > 5 ? 0 : I915_DISPATCH_SECURE);
if (err)
goto skip_request;
assert_vma_held(batch);
err = i915_request_await_object(rq, batch->obj, false);
if (err == 0)
err = i915_vma_move_to_active(batch, rq, 0);
if (err)
goto skip_request;
rq->batch = batch;
i915_vma_unpin(batch);
cache->rq = rq;
cache->rq_cmd = cmd;
cache->rq_size = 0;
cache->pool = pool;
/* Return with batch mapping (cmd) still pinned */
return 0;
skip_request:
i915_request_set_error_once(rq, err);
err_request:
i915_request_add(rq);
err_unpin:
i915_vma_unpin(batch);
err_unmap:
i915_gem_object_unpin_map(pool->obj);
err_pool:
eb->reloc_pool = pool;
return err;
}
static bool reloc_can_use_engine(const struct intel_engine_cs *engine)
{
return engine->class != VIDEO_DECODE_CLASS || GRAPHICS_VER(engine->i915) != 6;
}
static u32 *reloc_gpu(struct i915_execbuffer *eb,
struct i915_vma *vma,
unsigned int len)
{
struct reloc_cache *cache = &eb->reloc_cache;
u32 *cmd;
if (cache->rq_size > PAGE_SIZE/sizeof(u32) - (len + 1))
reloc_gpu_flush(eb, cache);
if (unlikely(!cache->rq)) {
int err;
struct intel_engine_cs *engine = eb->engine;
/* If we need to copy for the cmdparser, we will stall anyway */
if (eb_use_cmdparser(eb))
return ERR_PTR(-EWOULDBLOCK);
if (!reloc_can_use_engine(engine)) {
engine = engine->gt->engine_class[COPY_ENGINE_CLASS][0];
if (!engine)
return ERR_PTR(-ENODEV);
}
err = __reloc_gpu_alloc(eb, engine, vma, len);
if (unlikely(err))
return ERR_PTR(err);
}
cmd = cache->rq_cmd + cache->rq_size;
cache->rq_size += len;
return cmd;
}
static inline bool use_reloc_gpu(struct i915_vma *vma)
{
if (DBG_FORCE_RELOC == FORCE_GPU_RELOC)
return true;
if (DBG_FORCE_RELOC)
return false;
return !dma_resv_test_signaled(vma->resv, true);
}
static unsigned long vma_phys_addr(struct i915_vma *vma, u32 offset)
{
struct page *page;
unsigned long addr;
GEM_BUG_ON(vma->pages != vma->obj->mm.pages);
page = i915_gem_object_get_page(vma->obj, offset >> PAGE_SHIFT);
addr = PFN_PHYS(page_to_pfn(page));
GEM_BUG_ON(overflows_type(addr, u32)); /* expected dma32 */
return addr + offset_in_page(offset);
}
static int __reloc_entry_gpu(struct i915_execbuffer *eb,
struct i915_vma *vma,
u64 offset,
u64 target_addr)
{
const unsigned int ver = eb->reloc_cache.graphics_ver;
unsigned int len;
u32 *batch;
u64 addr;
if (ver >= 8)
len = offset & 7 ? 8 : 5;
else if (ver >= 4)
len = 4;
else
len = 3;
batch = reloc_gpu(eb, vma, len);
if (batch == ERR_PTR(-EDEADLK))
return -EDEADLK;
else if (IS_ERR(batch))
return false;
addr = gen8_canonical_addr(vma->node.start + offset);
if (ver >= 8) {
if (offset & 7) {
*batch++ = MI_STORE_DWORD_IMM_GEN4;
*batch++ = lower_32_bits(addr);
*batch++ = upper_32_bits(addr);
*batch++ = lower_32_bits(target_addr);
addr = gen8_canonical_addr(addr + 4);
*batch++ = MI_STORE_DWORD_IMM_GEN4;
*batch++ = lower_32_bits(addr);
*batch++ = upper_32_bits(addr);
*batch++ = upper_32_bits(target_addr);
} else {
*batch++ = (MI_STORE_DWORD_IMM_GEN4 | (1 << 21)) + 1;
*batch++ = lower_32_bits(addr);
*batch++ = upper_32_bits(addr);
*batch++ = lower_32_bits(target_addr);
*batch++ = upper_32_bits(target_addr);
}
} else if (ver >= 6) {
*batch++ = MI_STORE_DWORD_IMM_GEN4;
*batch++ = 0;
*batch++ = addr;
*batch++ = target_addr;
} else if (IS_I965G(eb->i915)) {
*batch++ = MI_STORE_DWORD_IMM_GEN4;
*batch++ = 0;
*batch++ = vma_phys_addr(vma, offset);
*batch++ = target_addr;
} else if (ver >= 4) {
*batch++ = MI_STORE_DWORD_IMM_GEN4 | MI_USE_GGTT;
*batch++ = 0;
*batch++ = addr;
*batch++ = target_addr;
} else if (ver >= 3 &&
!(IS_I915G(eb->i915) || IS_I915GM(eb->i915))) {
*batch++ = MI_STORE_DWORD_IMM | MI_MEM_VIRTUAL;
*batch++ = addr;
*batch++ = target_addr;
} else {
*batch++ = MI_STORE_DWORD_IMM;
*batch++ = vma_phys_addr(vma, offset);
*batch++ = target_addr;
}
return true;
}
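
The length selection at the top of __reloc_entry_gpu() can be read in isolation. The following is a minimal standalone sketch of that selection only; the name model_reloc_cmd_len() is hypothetical and not part of i915. It assumes nothing beyond what the removed code shows: graphics version 8 and later need 5 dwords for a qword-aligned 64-bit store or 8 dwords for two separate 32-bit stores, versions 4 to 7 need 4 dwords, and older versions need 3.

```c
#include <stdint.h>

/*
 * Hypothetical model of the dword count chosen by the removed
 * __reloc_entry_gpu(); "ver" stands in for reloc_cache.graphics_ver.
 */
static unsigned int model_reloc_cmd_len(unsigned int ver, uint64_t offset)
{
	if (ver >= 8)
		/* An offset that is not 8-byte aligned is split into two stores. */
		return (offset & 7) ? 8 : 5;
	if (ver >= 4)
		return 4;
	return 3;
}
```

For example, model_reloc_cmd_len(12, 4) selects the two-store form because offset 4 is not qword aligned.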
static int reloc_entry_gpu(struct i915_execbuffer *eb,
struct i915_vma *vma,
u64 offset,
u64 target_addr)
{
if (eb->reloc_cache.vaddr)
return false;
if (!use_reloc_gpu(vma))
return false;
return __reloc_entry_gpu(eb, vma, offset, target_addr);
}
static u64
relocate_entry(struct i915_vma *vma,
const struct drm_i915_gem_relocation_entry *reloc,
@ -1595,32 +1253,25 @@ relocate_entry(struct i915_vma *vma,
{
u64 target_addr = relocation_target(reloc, target);
u64 offset = reloc->offset;
int reloc_gpu = reloc_entry_gpu(eb, vma, offset, target_addr);
if (reloc_gpu < 0)
return reloc_gpu;
if (!reloc_gpu) {
bool wide = eb->reloc_cache.use_64bit_reloc;
void *vaddr;
bool wide = eb->reloc_cache.use_64bit_reloc;
void *vaddr;
repeat:
vaddr = reloc_vaddr(vma->obj, eb,
offset >> PAGE_SHIFT);
if (IS_ERR(vaddr))
return PTR_ERR(vaddr);
vaddr = reloc_vaddr(vma->obj, eb,
offset >> PAGE_SHIFT);
if (IS_ERR(vaddr))
return PTR_ERR(vaddr);
GEM_BUG_ON(!IS_ALIGNED(offset, sizeof(u32)));
clflush_write32(vaddr + offset_in_page(offset),
lower_32_bits(target_addr),
eb->reloc_cache.vaddr);
GEM_BUG_ON(!IS_ALIGNED(offset, sizeof(u32)));
clflush_write32(vaddr + offset_in_page(offset),
lower_32_bits(target_addr),
eb->reloc_cache.vaddr);
if (wide) {
offset += sizeof(u32);
target_addr >>= 32;
wide = false;
goto repeat;
}
if (wide) {
offset += sizeof(u32);
target_addr >>= 32;
wide = false;
goto repeat;
}
return target->node.start | UPDATE;
@ -1992,7 +1643,7 @@ repeat:
}
/* We may process another execbuffer during the unlock... */
eb_release_vmas(eb, false, true);
eb_release_vmas(eb, false);
i915_gem_ww_ctx_fini(&eb->ww);
if (rq) {
@ -2061,9 +1712,7 @@ repeat_validate:
list_for_each_entry(ev, &eb->relocs, reloc_link) {
if (!have_copy) {
pagefault_disable();
err = eb_relocate_vma(eb, ev);
pagefault_enable();
if (err)
break;
} else {
@ -2096,7 +1745,7 @@ repeat_validate:
err:
if (err == -EDEADLK) {
eb_release_vmas(eb, false, false);
eb_release_vmas(eb, false);
err = i915_gem_ww_ctx_backoff(&eb->ww);
if (!err)
goto repeat_validate;
@ -2193,7 +1842,7 @@ retry:
err:
if (err == -EDEADLK) {
eb_release_vmas(eb, false, false);
eb_release_vmas(eb, false);
err = i915_gem_ww_ctx_backoff(&eb->ww);
if (!err)
goto retry;
@ -2270,7 +1919,7 @@ static int eb_move_to_gpu(struct i915_execbuffer *eb)
#ifdef CONFIG_MMU_NOTIFIER
if (!err && (eb->args->flags & __EXEC_USERPTR_USED)) {
spin_lock(&eb->i915->mm.notifier_lock);
read_lock(&eb->i915->mm.notifier_lock);
/*
* count is always at least 1, otherwise __EXEC_USERPTR_USED
@ -2288,7 +1937,7 @@ static int eb_move_to_gpu(struct i915_execbuffer *eb)
break;
}
spin_unlock(&eb->i915->mm.notifier_lock);
read_unlock(&eb->i915->mm.notifier_lock);
}
#endif
@ -3156,8 +2805,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
eb.exec = exec;
eb.vma = (struct eb_vma *)(exec + args->buffer_count + 1);
eb.vma[0].vma = NULL;
eb.reloc_pool = eb.batch_pool = NULL;
eb.reloc_context = NULL;
eb.batch_pool = NULL;
eb.invalid_flags = __EXEC_OBJECT_UNKNOWN_FLAGS;
reloc_cache_init(&eb.reloc_cache, eb.i915);
@ -3232,7 +2880,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
err = eb_lookup_vmas(&eb);
if (err) {
eb_release_vmas(&eb, true, true);
eb_release_vmas(&eb, true);
goto err_engine;
}
@ -3255,9 +2903,6 @@ i915_gem_do_execbuffer(struct drm_device *dev,
batch = eb.batch->vma;
/* All GPU relocation batches must be submitted prior to the user rq */
GEM_BUG_ON(eb.reloc_cache.rq);
/* Allocate a request for this batch buffer nice and early. */
eb.request = i915_request_create(eb.context);
if (IS_ERR(eb.request)) {
@ -3265,11 +2910,20 @@ i915_gem_do_execbuffer(struct drm_device *dev,
goto err_vma;
}
if (unlikely(eb.gem_context->syncobj)) {
struct dma_fence *fence;
fence = drm_syncobj_fence_get(eb.gem_context->syncobj);
err = i915_request_await_dma_fence(eb.request, fence);
dma_fence_put(fence);
if (err)
goto err_ext;
}
if (in_fence) {
if (args->flags & I915_EXEC_FENCE_SUBMIT)
err = i915_request_await_execution(eb.request,
in_fence,
eb.engine->bond_execute);
in_fence);
else
err = i915_request_await_dma_fence(eb.request,
in_fence);
@ -3322,10 +2976,16 @@ err_request:
fput(out_fence->file);
}
}
if (unlikely(eb.gem_context->syncobj)) {
drm_syncobj_replace_fence(eb.gem_context->syncobj,
&eb.request->fence);
}
i915_request_put(eb.request);
err_vma:
eb_release_vmas(&eb, true, true);
eb_release_vmas(&eb, true);
if (eb.trampoline)
i915_vma_unpin(eb.trampoline);
WARN_ON(err == -EDEADLK);
@ -3333,10 +2993,6 @@ err_vma:
if (eb.batch_pool)
intel_gt_buffer_pool_put(eb.batch_pool);
if (eb.reloc_pool)
intel_gt_buffer_pool_put(eb.reloc_pool);
if (eb.reloc_context)
intel_context_put(eb.reloc_context);
err_engine:
eb_put_engine(&eb);
err_context:
@ -3450,7 +3106,3 @@ end:;
kvfree(exec2_list);
return err;
}
#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
#include "selftests/i915_gem_execbuffer.c"
#endif

@ -177,8 +177,8 @@ i915_gem_object_create_internal(struct drm_i915_private *i915,
return ERR_PTR(-ENOMEM);
drm_gem_private_object_init(&i915->drm, &obj->base, size);
i915_gem_object_init(obj, &i915_gem_object_internal_ops, &lock_class,
I915_BO_ALLOC_STRUCT_PAGE);
i915_gem_object_init(obj, &i915_gem_object_internal_ops, &lock_class, 0);
obj->mem_flags |= I915_BO_FLAG_STRUCT_PAGE;
/*
* Mark the object as volatile, such that the pages are marked as

@ -23,27 +23,6 @@ i915_gem_object_lmem_io_map(struct drm_i915_gem_object *obj,
return io_mapping_map_wc(&obj->mm.region->iomap, offset, size);
}
/**
* i915_gem_object_validates_to_lmem - Whether the object is resident in
* lmem when pages are present.
* @obj: The object to check.
*
* Migratable objects residency may change from under us if the object is
* not pinned or locked. This function is intended to be used to check whether
* the object can only reside in lmem when pages are present.
*
* Return: Whether the object is always resident in lmem when pages are
* present.
*/
bool i915_gem_object_validates_to_lmem(struct drm_i915_gem_object *obj)
{
struct intel_memory_region *mr = READ_ONCE(obj->mm.region);
return !i915_gem_object_migratable(obj) &&
mr && (mr->type == INTEL_MEMORY_LOCAL ||
mr->type == INTEL_MEMORY_STOLEN_LOCAL);
}
/**
* i915_gem_object_is_lmem - Whether the object is resident in
* lmem
@ -71,11 +50,64 @@ bool i915_gem_object_is_lmem(struct drm_i915_gem_object *obj)
mr->type == INTEL_MEMORY_STOLEN_LOCAL);
}
/**
* __i915_gem_object_is_lmem - Whether the object is resident in
* lmem while in the fence signaling critical path.
* @obj: The object to check.
*
* This function is intended to be called from within the fence signaling
* path where the fence keeps the object from being migrated. For example
* during gpu reset or similar.
*
* Return: Whether the object is resident in lmem.
*/
bool __i915_gem_object_is_lmem(struct drm_i915_gem_object *obj)
{
struct intel_memory_region *mr = READ_ONCE(obj->mm.region);
#ifdef CONFIG_LOCKDEP
GEM_WARN_ON(dma_resv_test_signaled(obj->base.resv, true));
#endif
return mr && (mr->type == INTEL_MEMORY_LOCAL ||
mr->type == INTEL_MEMORY_STOLEN_LOCAL);
}
/**
* __i915_gem_object_create_lmem_with_ps - Create lmem object and force the
* minimum page size for the backing pages.
* @i915: The i915 instance.
* @size: The size in bytes for the object. Note that we need to round the size
* up depending on the @page_size. The final object size can be fished out from
* the drm GEM object.
* @page_size: The requested minimum page size in bytes for this object. This is
* useful if we need something bigger than the regions min_page_size due to some
* hw restriction, or in some very specialised cases where it needs to be
* smaller, where the internal fragmentation cost is too great when rounding up
* the object size.
* @flags: The optional BO allocation flags.
*
* Note that this interface assumes you know what you are doing when forcing the
* @page_size. If this is smaller than the regions min_page_size then it can
* never be inserted into any GTT, otherwise it might lead to undefined
* behaviour.
*
* Return: The object pointer, which might be an ERR_PTR in the case of failure.
*/
struct drm_i915_gem_object *
__i915_gem_object_create_lmem_with_ps(struct drm_i915_private *i915,
resource_size_t size,
resource_size_t page_size,
unsigned int flags)
{
return i915_gem_object_create_region(i915->mm.regions[INTEL_REGION_LMEM],
size, page_size, flags);
}
struct drm_i915_gem_object *
i915_gem_object_create_lmem(struct drm_i915_private *i915,
resource_size_t size,
unsigned int flags)
{
return i915_gem_object_create_region(i915->mm.regions[INTEL_REGION_LMEM],
size, flags);
size, 0, flags);
}
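
The kernel-doc for __i915_gem_object_create_lmem_with_ps() notes that @size is rounded up depending on @page_size. As a rough illustration of that rounding, and not the driver's actual code path, a sketch could look like the following. The helper name round_up_to_page_size() is hypothetical, and the sketch assumes @page_size is a non-zero power of two.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of i915: round "size" up to the next
 * multiple of "page_size". Assumes page_size is a non-zero power of two.
 */
static uint64_t round_up_to_page_size(uint64_t size, uint64_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return (size + page_size - 1) & ~(page_size - 1);
}
```

Under these assumptions a 1-byte request with a 64 KiB minimum page size would occupy a full 64 KiB, which is the internal fragmentation cost the comment refers to.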

@ -21,6 +21,13 @@ i915_gem_object_lmem_io_map(struct drm_i915_gem_object *obj,
bool i915_gem_object_is_lmem(struct drm_i915_gem_object *obj);
bool __i915_gem_object_is_lmem(struct drm_i915_gem_object *obj);
struct drm_i915_gem_object *
__i915_gem_object_create_lmem_with_ps(struct drm_i915_private *i915,
resource_size_t size,
resource_size_t page_size,
unsigned int flags);
struct drm_i915_gem_object *
i915_gem_object_create_lmem(struct drm_i915_private *i915,
resource_size_t size,

@ -645,7 +645,8 @@ mmap_offset_attach(struct drm_i915_gem_object *obj,
goto insert;
/* Attempt to reap some mmap space from dead objects */
err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT);
err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT,
NULL);
if (err)
goto err;
@ -679,13 +680,19 @@ __assign_mmap_offset(struct drm_i915_gem_object *obj,
return -ENODEV;
if (obj->ops->mmap_offset) {
if (mmap_type != I915_MMAP_TYPE_FIXED)
return -ENODEV;
*offset = obj->ops->mmap_offset(obj);
return 0;
}
if (mmap_type == I915_MMAP_TYPE_FIXED)
return -ENODEV;
if (mmap_type != I915_MMAP_TYPE_GTT &&
!i915_gem_object_has_struct_page(obj) &&
!i915_gem_object_type_has(obj, I915_GEM_OBJECT_HAS_IOMEM))
!i915_gem_object_has_iomem(obj))
return -ENODEV;
mmo = mmap_offset_attach(obj, mmap_type, file);
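
The two I915_MMAP_TYPE_FIXED checks in the hunk above are symmetric: a backend that supplies its own mmap_offset hook accepts only the FIXED type, and every other backend rejects it. A minimal standalone sketch of just that gate follows. The names model_mmap_type and model_check_mmap_type() are hypothetical, and the later struct-page/iomem check for non-GTT types is deliberately left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the i915 mmap types. */
enum model_mmap_type {
	MODEL_MMAP_GTT,
	MODEL_MMAP_WC,
	MODEL_MMAP_WB,
	MODEL_MMAP_UC,
	MODEL_MMAP_FIXED,
};

/*
 * Model of the FIXED gate only: returns 0 if the type passes this gate,
 * -ENODEV otherwise. "backend_has_mmap_offset" stands in for
 * obj->ops->mmap_offset being set.
 */
static int model_check_mmap_type(bool backend_has_mmap_offset,
				 enum model_mmap_type type)
{
	if (backend_has_mmap_offset)
		return type == MODEL_MMAP_FIXED ? 0 : -ENODEV;
	if (type == MODEL_MMAP_FIXED)
		return -ENODEV;
	return 0;
}
```

This matches the UAPI note that FIXED is the only valid type on devices with local memory and is invalid elsewhere, assuming that objects on such devices are the ones whose backend provides the hook.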
@ -709,7 +716,12 @@ __assign_mmap_offset_handle(struct drm_file *file,
if (!obj)
return -ENOENT;
err = i915_gem_object_lock_interruptible(obj, NULL);
if (err)
goto out_put;
err = __assign_mmap_offset(obj, mmap_type, offset, file);
i915_gem_object_unlock(obj);
out_put:
i915_gem_object_put(obj);
return err;
}
@ -722,7 +734,9 @@ i915_gem_dumb_mmap_offset(struct drm_file *file,
{
enum i915_mmap_type mmap_type;
if (boot_cpu_has(X86_FEATURE_PAT))
if (HAS_LMEM(to_i915(dev)))
mmap_type = I915_MMAP_TYPE_FIXED;
else if (boot_cpu_has(X86_FEATURE_PAT))
mmap_type = I915_MMAP_TYPE_WC;
else if (!i915_ggtt_has_aperture(&to_i915(dev)->ggtt))
return -ENODEV;
@ -793,6 +807,10 @@ i915_gem_mmap_offset_ioctl(struct drm_device *dev, void *data,
type = I915_MMAP_TYPE_UC;
break;
case I915_MMAP_OFFSET_FIXED:
type = I915_MMAP_TYPE_FIXED;
break;
default:
return -EINVAL;
}
@ -933,10 +951,7 @@ int i915_gem_mmap(struct file *filp, struct vm_area_struct *vma)
return PTR_ERR(anon);
}
vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
if (i915_gem_object_has_iomem(obj))
vma->vm_flags |= VM_IO;
vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO;
/*
* We keep the ref on mmo->obj, not vm_file, but we require
@ -966,6 +981,9 @@ int i915_gem_mmap(struct file *filp, struct vm_area_struct *vma)
vma->vm_ops = &vm_ops_cpu;
break;
case I915_MMAP_TYPE_FIXED:
GEM_WARN_ON(1);
fallthrough;
case I915_MMAP_TYPE_WB:
vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
vma->vm_ops = &vm_ops_cpu;

@ -30,14 +30,10 @@
#include "i915_gem_context.h"
#include "i915_gem_mman.h"
#include "i915_gem_object.h"
#include "i915_globals.h"
#include "i915_memcpy.h"
#include "i915_trace.h"
static struct i915_global_object {
struct i915_global base;
struct kmem_cache *slab_objects;
} global;
static struct kmem_cache *slab_objects;
static const struct drm_gem_object_funcs i915_gem_object_funcs;
@ -45,7 +41,7 @@ struct drm_i915_gem_object *i915_gem_object_alloc(void)
{
struct drm_i915_gem_object *obj;
obj = kmem_cache_zalloc(global.slab_objects, GFP_KERNEL);
obj = kmem_cache_zalloc(slab_objects, GFP_KERNEL);
if (!obj)
return NULL;
obj->base.funcs = &i915_gem_object_funcs;
@ -55,7 +51,7 @@ struct drm_i915_gem_object *i915_gem_object_alloc(void)
void i915_gem_object_free(struct drm_i915_gem_object *obj)
{
return kmem_cache_free(global.slab_objects, obj);
return kmem_cache_free(slab_objects, obj);
}
void i915_gem_object_init(struct drm_i915_gem_object *obj,
@ -475,34 +471,200 @@ bool i915_gem_object_migratable(struct drm_i915_gem_object *obj)
return obj->mm.n_placements > 1;
}
/**
* i915_gem_object_has_struct_page - Whether the object is page-backed
* @obj: The object to query.
*
* This function should only be called while the object is locked or pinned,
* otherwise the page backing may change under the caller.
*
* Return: True if page-backed, false otherwise.
*/
bool i915_gem_object_has_struct_page(const struct drm_i915_gem_object *obj)
{
#ifdef CONFIG_LOCKDEP
if (IS_DGFX(to_i915(obj->base.dev)) &&
i915_gem_object_evictable((void __force *)obj))
assert_object_held_shared(obj);
#endif
return obj->mem_flags & I915_BO_FLAG_STRUCT_PAGE;
}
/**
* i915_gem_object_has_iomem - Whether the object is iomem-backed
* @obj: The object to query.
*
* This function should only be called while the object is locked or pinned,
* otherwise the iomem backing may change under the caller.
*
* Return: True if iomem-backed, false otherwise.
*/
bool i915_gem_object_has_iomem(const struct drm_i915_gem_object *obj)
{
#ifdef CONFIG_LOCKDEP
if (IS_DGFX(to_i915(obj->base.dev)) &&
i915_gem_object_evictable((void __force *)obj))
assert_object_held_shared(obj);
#endif
return obj->mem_flags & I915_BO_FLAG_IOMEM;
}
/**
* i915_gem_object_can_migrate - Whether an object likely can be migrated
*
* @obj: The object to migrate
* @id: The region intended to migrate to
*
* Check whether the object backend supports migration to the
* given region. Note that pinning may affect the ability to migrate as
* returned by this function.
*
* This function is primarily intended as a helper for checking the
* possibility to migrate objects and might be slightly less permissive
* than i915_gem_object_migrate() when it comes to objects with the
* I915_BO_ALLOC_USER flag set.
*
* Return: true if migration is possible, false otherwise.
*/
bool i915_gem_object_can_migrate(struct drm_i915_gem_object *obj,
enum intel_region_id id)
{
struct drm_i915_private *i915 = to_i915(obj->base.dev);
unsigned int num_allowed = obj->mm.n_placements;
struct intel_memory_region *mr;
unsigned int i;
GEM_BUG_ON(id >= INTEL_REGION_UNKNOWN);
GEM_BUG_ON(obj->mm.madv != I915_MADV_WILLNEED);
mr = i915->mm.regions[id];
if (!mr)
return false;
if (obj->mm.region == mr)
return true;
if (!i915_gem_object_evictable(obj))
return false;
if (!obj->ops->migrate)
return false;
if (!(obj->flags & I915_BO_ALLOC_USER))
return true;
if (num_allowed == 0)
return false;
for (i = 0; i < num_allowed; ++i) {
if (mr == obj->mm.placements[i])
return true;
}
return false;
}
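
The order of the checks in i915_gem_object_can_migrate() matters: being already in the target region wins over every later restriction, and the placement list is only consulted for objects created with I915_BO_ALLOC_USER. The sketch below restates that order on a simplified stand-in structure. The struct model_obj, its fields, and model_can_migrate() are hypothetical, region ids are plain integers here, and the GEM_BUG_ON() sanity checks are omitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the fields the check reads. */
struct model_obj {
	int region;             /* id of the region currently backing the object */
	bool evictable;         /* i915_gem_object_evictable() result */
	bool has_migrate_op;    /* obj->ops->migrate is set */
	bool alloc_user;        /* I915_BO_ALLOC_USER is set */
	const int *placements;  /* allowed region ids */
	size_t n_placements;
};

static bool model_can_migrate(const struct model_obj *obj, int target,
			      bool target_exists)
{
	size_t i;

	if (!target_exists)
		return false;
	if (obj->region == target)
		return true;
	if (!obj->evictable)
		return false;
	if (!obj->has_migrate_op)
		return false;
	if (!obj->alloc_user)
		return true;

	/* An empty placement list falls through to "false", as in the original. */
	for (i = 0; i < obj->n_placements; i++) {
		if (obj->placements[i] == target)
			return true;
	}
	return false;
}
```

In this model a pinned (non-evictable) object still reports true for its current region, which mirrors the early return in the function above.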
/**
* i915_gem_object_migrate - Migrate an object to the desired region id
* @obj: The object to migrate.
* @ww: An optional struct i915_gem_ww_ctx. If NULL, the backend may
* not be successful in evicting other objects to make room for this object.
* @id: The region id to migrate to.
*
* Attempt to migrate the object to the desired memory region. The
* object backend must support migration and the object may not be
* pinned, (explicitly pinned pages or pinned vmas). The object must
* be locked.
* On successful completion, the object will have pages pointing to
* memory in the new region, but an async migration task may not have
* completed yet, and to accomplish that, i915_gem_object_wait_migration()
* must be called.
*
* Note: the @ww parameter is not used yet, but included to make sure
* callers put some effort into obtaining a valid ww ctx if one is
* available.
*
* Return: 0 on success. Negative error code on failure. In particular may
* return -ENXIO on lack of region space, -EDEADLK for deadlock avoidance
* if @ww is set, -EINTR or -ERESTARTSYS if signal pending, and
* -EBUSY if the object is pinned.
*/
int i915_gem_object_migrate(struct drm_i915_gem_object *obj,
struct i915_gem_ww_ctx *ww,
enum intel_region_id id)
{
struct drm_i915_private *i915 = to_i915(obj->base.dev);
struct intel_memory_region *mr;
GEM_BUG_ON(id >= INTEL_REGION_UNKNOWN);
GEM_BUG_ON(obj->mm.madv != I915_MADV_WILLNEED);
assert_object_held(obj);
mr = i915->mm.regions[id];
GEM_BUG_ON(!mr);
if (!i915_gem_object_can_migrate(obj, id))
return -EINVAL;
if (!obj->ops->migrate) {
if (GEM_WARN_ON(obj->mm.region != mr))
return -EINVAL;
return 0;
}
return obj->ops->migrate(obj, mr);
}
/**
* i915_gem_object_placement_possible - Check whether the object can be
* placed at certain memory type
* @obj: Pointer to the object
* @type: The memory type to check
*
* Return: True if the object can be placed in @type. False otherwise.
*/
bool i915_gem_object_placement_possible(struct drm_i915_gem_object *obj,
enum intel_memory_type type)
{
unsigned int i;
if (!obj->mm.n_placements) {
switch (type) {
case INTEL_MEMORY_LOCAL:
return i915_gem_object_has_iomem(obj);
case INTEL_MEMORY_SYSTEM:
return i915_gem_object_has_pages(obj);
default:
/* Ignore stolen for now */
GEM_BUG_ON(1);
return false;
}
}
for (i = 0; i < obj->mm.n_placements; i++) {
if (obj->mm.placements[i]->type == type)
return true;
}
return false;
}
void i915_gem_init__objects(struct drm_i915_private *i915)
{
INIT_WORK(&i915->mm.free_work, __i915_gem_free_work);
}
static void i915_global_objects_shrink(void)
void i915_objects_module_exit(void)
{
kmem_cache_shrink(global.slab_objects);
kmem_cache_destroy(slab_objects);
}
static void i915_global_objects_exit(void)
int __init i915_objects_module_init(void)
{
kmem_cache_destroy(global.slab_objects);
}
static struct i915_global_object global = { {
.shrink = i915_global_objects_shrink,
.exit = i915_global_objects_exit,
} };
int __init i915_global_objects_init(void)
{
global.slab_objects =
KMEM_CACHE(drm_i915_gem_object, SLAB_HWCACHE_ALIGN);
if (!global.slab_objects)
slab_objects = KMEM_CACHE(drm_i915_gem_object, SLAB_HWCACHE_ALIGN);
if (!slab_objects)
return -ENOMEM;
i915_global_register(&global.base);
return 0;
}
@ -515,6 +677,7 @@ static const struct drm_gem_object_funcs i915_gem_object_funcs = {
#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
#include "selftests/huge_gem_object.c"
#include "selftests/huge_pages.c"
#include "selftests/i915_gem_migrate.c"
#include "selftests/i915_gem_object.c"
#include "selftests/i915_gem_coherency.c"
#endif

@ -12,10 +12,14 @@
#include <drm/drm_device.h>
#include "display/intel_frontbuffer.h"
#include "intel_memory_region.h"
#include "i915_gem_object_types.h"
#include "i915_gem_gtt.h"
#include "i915_gem_ww.h"
#include "i915_vma_types.h"
enum intel_region_id;
/*
* XXX: There is a prevalence of the assumption that we fit the
* object's page count inside a 32bit _signed_ variable. Let's document
@ -44,6 +48,9 @@ static inline bool i915_gem_object_size_2big(u64 size)
void i915_gem_init__objects(struct drm_i915_private *i915);
void i915_objects_module_exit(void);
int i915_objects_module_init(void);
struct drm_i915_gem_object *i915_gem_object_alloc(void);
void i915_gem_object_free(struct drm_i915_gem_object *obj);
@ -57,6 +64,10 @@ i915_gem_object_create_shmem(struct drm_i915_private *i915,
struct drm_i915_gem_object *
i915_gem_object_create_shmem_from_data(struct drm_i915_private *i915,
const void *data, resource_size_t size);
struct drm_i915_gem_object *
__i915_gem_object_create_user(struct drm_i915_private *i915, u64 size,
struct intel_memory_region **placements,
unsigned int n_placements);
extern const struct drm_i915_gem_object_ops i915_gem_shmem_ops;
@ -147,7 +158,7 @@ i915_gem_object_put(struct drm_i915_gem_object *obj)
/*
* If more than one potential simultaneous locker, assert held.
*/
static inline void assert_object_held_shared(struct drm_i915_gem_object *obj)
static inline void assert_object_held_shared(const struct drm_i915_gem_object *obj)
{
/*
* Note mm list lookup is protected by
@ -169,13 +180,17 @@ static inline int __i915_gem_object_lock(struct drm_i915_gem_object *obj,
else
ret = dma_resv_lock(obj->base.resv, ww ? &ww->ctx : NULL);
if (!ret && ww)
if (!ret && ww) {
i915_gem_object_get(obj);
list_add_tail(&obj->obj_link, &ww->obj_list);
}
if (ret == -EALREADY)
ret = 0;
if (ret == -EDEADLK)
if (ret == -EDEADLK) {
i915_gem_object_get(obj);
ww->contended = obj;
}
return ret;
}
@ -261,17 +276,9 @@ i915_gem_object_type_has(const struct drm_i915_gem_object *obj,
return obj->ops->flags & flags;
}
static inline bool
i915_gem_object_has_struct_page(const struct drm_i915_gem_object *obj)
{
return obj->flags & I915_BO_ALLOC_STRUCT_PAGE;
}
bool i915_gem_object_has_struct_page(const struct drm_i915_gem_object *obj);
static inline bool
i915_gem_object_has_iomem(const struct drm_i915_gem_object *obj)
{
return i915_gem_object_type_has(obj, I915_GEM_OBJECT_HAS_IOMEM);
}
bool i915_gem_object_has_iomem(const struct drm_i915_gem_object *obj);
static inline bool
i915_gem_object_is_shrinkable(const struct drm_i915_gem_object *obj)
@ -342,22 +349,22 @@ struct scatterlist *
__i915_gem_object_get_sg(struct drm_i915_gem_object *obj,
struct i915_gem_object_page_iter *iter,
unsigned int n,
unsigned int *offset, bool allow_alloc, bool dma);
unsigned int *offset, bool dma);
static inline struct scatterlist *
i915_gem_object_get_sg(struct drm_i915_gem_object *obj,
unsigned int n,
unsigned int *offset, bool allow_alloc)
unsigned int *offset)
{
return __i915_gem_object_get_sg(obj, &obj->mm.get_page, n, offset, allow_alloc, false);
return __i915_gem_object_get_sg(obj, &obj->mm.get_page, n, offset, false);
}
static inline struct scatterlist *
i915_gem_object_get_sg_dma(struct drm_i915_gem_object *obj,
unsigned int n,
unsigned int *offset, bool allow_alloc)
unsigned int *offset)
{
return __i915_gem_object_get_sg(obj, &obj->mm.get_dma_page, n, offset, allow_alloc, true);
return __i915_gem_object_get_sg(obj, &obj->mm.get_dma_page, n, offset, true);
}
struct page *
@ -598,7 +605,18 @@ bool i915_gem_object_evictable(struct drm_i915_gem_object *obj);
bool i915_gem_object_migratable(struct drm_i915_gem_object *obj);
bool i915_gem_object_validates_to_lmem(struct drm_i915_gem_object *obj);
int i915_gem_object_migrate(struct drm_i915_gem_object *obj,
struct i915_gem_ww_ctx *ww,
enum intel_region_id id);
bool i915_gem_object_can_migrate(struct drm_i915_gem_object *obj,
enum intel_region_id id);
int i915_gem_object_wait_migration(struct drm_i915_gem_object *obj,
unsigned int flags);
bool i915_gem_object_placement_possible(struct drm_i915_gem_object *obj,
enum intel_memory_type type);
#ifdef CONFIG_MMU_NOTIFIER
static inline bool
@ -609,14 +627,12 @@ i915_gem_object_is_userptr(struct drm_i915_gem_object *obj)
int i915_gem_object_userptr_submit_init(struct drm_i915_gem_object *obj);
int i915_gem_object_userptr_submit_done(struct drm_i915_gem_object *obj);
void i915_gem_object_userptr_submit_fini(struct drm_i915_gem_object *obj);
int i915_gem_object_userptr_validate(struct drm_i915_gem_object *obj);
#else
static inline bool i915_gem_object_is_userptr(struct drm_i915_gem_object *obj) { return false; }
static inline int i915_gem_object_userptr_submit_init(struct drm_i915_gem_object *obj) { GEM_BUG_ON(1); return -ENODEV; }
static inline int i915_gem_object_userptr_submit_done(struct drm_i915_gem_object *obj) { GEM_BUG_ON(1); return -ENODEV; }
static inline void i915_gem_object_userptr_submit_fini(struct drm_i915_gem_object *obj) { GEM_BUG_ON(1); }
static inline int i915_gem_object_userptr_validate(struct drm_i915_gem_object *obj) { GEM_BUG_ON(1); return -ENODEV; }
#endif

@ -1,461 +0,0 @@
// SPDX-License-Identifier: MIT
/*
* Copyright © 2019 Intel Corporation
*/
#include "i915_drv.h"
#include "gt/intel_context.h"
#include "gt/intel_engine_pm.h"
#include "gt/intel_gpu_commands.h"
#include "gt/intel_gt.h"
#include "gt/intel_gt_buffer_pool.h"
#include "gt/intel_ring.h"
#include "i915_gem_clflush.h"
#include "i915_gem_object_blt.h"
struct i915_vma *intel_emit_vma_fill_blt(struct intel_context *ce,
struct i915_vma *vma,
struct i915_gem_ww_ctx *ww,
u32 value)
{
struct drm_i915_private *i915 = ce->vm->i915;
const u32 block_size = SZ_8M; /* ~1ms at 8GiB/s preemption delay */
struct intel_gt_buffer_pool_node *pool;
struct i915_vma *batch;
u64 offset;
u64 count;
u64 rem;
u32 size;
u32 *cmd;
int err;
GEM_BUG_ON(intel_engine_is_virtual(ce->engine));
intel_engine_pm_get(ce->engine);
count = div_u64(round_up(vma->size, block_size), block_size);
size = (1 + 8 * count) * sizeof(u32);
size = round_up(size, PAGE_SIZE);
pool = intel_gt_get_buffer_pool(ce->engine->gt, size, I915_MAP_WC);
if (IS_ERR(pool)) {
err = PTR_ERR(pool);
goto out_pm;
}
err = i915_gem_object_lock(pool->obj, ww);
if (err)
goto out_put;
batch = i915_vma_instance(pool->obj, ce->vm, NULL);
if (IS_ERR(batch)) {
err = PTR_ERR(batch);
goto out_put;
}
err = i915_vma_pin_ww(batch, ww, 0, 0, PIN_USER);
if (unlikely(err))
goto out_put;
/* we pinned the pool, mark it as such */
intel_gt_buffer_pool_mark_used(pool);
cmd = i915_gem_object_pin_map(pool->obj, pool->type);
if (IS_ERR(cmd)) {
err = PTR_ERR(cmd);
goto out_unpin;
}
rem = vma->size;
offset = vma->node.start;
do {
u32 size = min_t(u64, rem, block_size);
GEM_BUG_ON(size >> PAGE_SHIFT > S16_MAX);
if (GRAPHICS_VER(i915) >= 8) {
*cmd++ = XY_COLOR_BLT_CMD | BLT_WRITE_RGBA | (7 - 2);
*cmd++ = BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | PAGE_SIZE;
*cmd++ = 0;
*cmd++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
*cmd++ = lower_32_bits(offset);
*cmd++ = upper_32_bits(offset);
*cmd++ = value;
} else {
*cmd++ = XY_COLOR_BLT_CMD | BLT_WRITE_RGBA | (6 - 2);
*cmd++ = BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | PAGE_SIZE;
*cmd++ = 0;
*cmd++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
*cmd++ = offset;
*cmd++ = value;
}
/* Allow ourselves to be preempted in between blocks. */
*cmd++ = MI_ARB_CHECK;
offset += size;
rem -= size;
} while (rem);
*cmd = MI_BATCH_BUFFER_END;
i915_gem_object_flush_map(pool->obj);
i915_gem_object_unpin_map(pool->obj);
intel_gt_chipset_flush(ce->vm->gt);
batch->private = pool;
return batch;
out_unpin:
i915_vma_unpin(batch);
out_put:
intel_gt_buffer_pool_put(pool);
out_pm:
intel_engine_pm_put(ce->engine);
return ERR_PTR(err);
}
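
The batch sizing in the removed intel_emit_vma_fill_blt() follows from the emit loop: each 8 MiB block takes at most 7 dwords of XY_COLOR_BLT plus one MI_ARB_CHECK, and a single MI_BATCH_BUFFER_END terminates the batch. A standalone sketch of that arithmetic is shown below. The name model_fill_batch_bytes() and the MODEL_ constants are hypothetical, and a 4 KiB PAGE_SIZE is assumed.

```c
#include <stdint.h>

#define MODEL_BLOCK_SIZE (8ull << 20) /* SZ_8M, as in the removed code */
#define MODEL_PAGE_SIZE  4096ull      /* assumption: 4 KiB pages */

/* Bytes of batch buffer needed to fill a vma of "vma_size" bytes. */
static uint32_t model_fill_batch_bytes(uint64_t vma_size)
{
	/* Number of 8 MiB blocks, rounding up. */
	uint64_t count = (vma_size + MODEL_BLOCK_SIZE - 1) / MODEL_BLOCK_SIZE;
	/* 8 dwords per block, plus 1 dword for MI_BATCH_BUFFER_END. */
	uint64_t size = (1 + 8 * count) * sizeof(uint32_t);

	/* Round up to a whole number of pages. */
	size = (size + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE * MODEL_PAGE_SIZE;
	return (uint32_t)size;
}
```

With these assumptions, any vma up to 127 blocks (just under 1 GiB) fits its commands in a single page, and a 1 GiB vma needs two.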
int intel_emit_vma_mark_active(struct i915_vma *vma, struct i915_request *rq)
{
int err;
err = i915_request_await_object(rq, vma->obj, false);
if (err == 0)
err = i915_vma_move_to_active(vma, rq, 0);
if (unlikely(err))
return err;
return intel_gt_buffer_pool_mark_active(vma->private, rq);
}
void intel_emit_vma_release(struct intel_context *ce, struct i915_vma *vma)
{
i915_vma_unpin(vma);
intel_gt_buffer_pool_put(vma->private);
intel_engine_pm_put(ce->engine);
}
static int
move_obj_to_gpu(struct drm_i915_gem_object *obj,
struct i915_request *rq,
bool write)
{
if (obj->cache_dirty & ~obj->cache_coherent)
i915_gem_clflush_object(obj, 0);
return i915_request_await_object(rq, obj, write);
}
int i915_gem_object_fill_blt(struct drm_i915_gem_object *obj,
struct intel_context *ce,
u32 value)
{
struct i915_gem_ww_ctx ww;
struct i915_request *rq;
struct i915_vma *batch;
struct i915_vma *vma;
int err;
vma = i915_vma_instance(obj, ce->vm, NULL);
if (IS_ERR(vma))
return PTR_ERR(vma);
i915_gem_ww_ctx_init(&ww, true);
intel_engine_pm_get(ce->engine);
retry:
err = i915_gem_object_lock(obj, &ww);
if (err)
goto out;
err = intel_context_pin_ww(ce, &ww);
if (err)
goto out;
err = i915_vma_pin_ww(vma, &ww, 0, 0, PIN_USER);
if (err)
goto out_ctx;
batch = intel_emit_vma_fill_blt(ce, vma, &ww, value);
if (IS_ERR(batch)) {
err = PTR_ERR(batch);
goto out_vma;
}
rq = i915_request_create(ce);
if (IS_ERR(rq)) {
err = PTR_ERR(rq);
goto out_batch;
}
err = intel_emit_vma_mark_active(batch, rq);
if (unlikely(err))
goto out_request;
err = move_obj_to_gpu(vma->obj, rq, true);
if (err == 0)
err = i915_vma_move_to_active(vma, rq, EXEC_OBJECT_WRITE);
if (unlikely(err))
goto out_request;
if (ce->engine->emit_init_breadcrumb)
err = ce->engine->emit_init_breadcrumb(rq);
if (likely(!err))
err = ce->engine->emit_bb_start(rq,
batch->node.start,
batch->node.size,
0);
out_request:
if (unlikely(err))
i915_request_set_error_once(rq, err);
i915_request_add(rq);
out_batch:
intel_emit_vma_release(ce, batch);
out_vma:
i915_vma_unpin(vma);
out_ctx:
intel_context_unpin(ce);
out:
if (err == -EDEADLK) {
err = i915_gem_ww_ctx_backoff(&ww);
if (!err)
goto retry;
}
i915_gem_ww_ctx_fini(&ww);
intel_engine_pm_put(ce->engine);
return err;
}
/* Wa_1209644611:icl,ehl */
static bool wa_1209644611_applies(struct drm_i915_private *i915, u32 size)
{
u32 height = size >> PAGE_SHIFT;
if (GRAPHICS_VER(i915) != 11)
return false;
return height % 4 == 3 && height <= 8;
}
struct i915_vma *intel_emit_vma_copy_blt(struct intel_context *ce,
struct i915_gem_ww_ctx *ww,
struct i915_vma *src,
struct i915_vma *dst)
{
struct drm_i915_private *i915 = ce->vm->i915;
const u32 block_size = SZ_8M; /* ~1ms at 8GiB/s preemption delay */
struct intel_gt_buffer_pool_node *pool;
struct i915_vma *batch;
u64 src_offset, dst_offset;
u64 count, rem;
u32 size, *cmd;
int err;
GEM_BUG_ON(src->size != dst->size);
GEM_BUG_ON(intel_engine_is_virtual(ce->engine));
intel_engine_pm_get(ce->engine);
count = div_u64(round_up(dst->size, block_size), block_size);
size = (1 + 11 * count) * sizeof(u32);
size = round_up(size, PAGE_SIZE);
pool = intel_gt_get_buffer_pool(ce->engine->gt, size, I915_MAP_WC);
if (IS_ERR(pool)) {
err = PTR_ERR(pool);
goto out_pm;
}
err = i915_gem_object_lock(pool->obj, ww);
if (err)
goto out_put;
batch = i915_vma_instance(pool->obj, ce->vm, NULL);
if (IS_ERR(batch)) {
err = PTR_ERR(batch);
goto out_put;
}
err = i915_vma_pin_ww(batch, ww, 0, 0, PIN_USER);
if (unlikely(err))
goto out_put;
/* we pinned the pool, mark it as such */
intel_gt_buffer_pool_mark_used(pool);
cmd = i915_gem_object_pin_map(pool->obj, pool->type);
if (IS_ERR(cmd)) {
err = PTR_ERR(cmd);
goto out_unpin;
}
rem = src->size;
src_offset = src->node.start;
dst_offset = dst->node.start;
do {
size = min_t(u64, rem, block_size);
GEM_BUG_ON(size >> PAGE_SHIFT > S16_MAX);
if (GRAPHICS_VER(i915) >= 9 &&
!wa_1209644611_applies(i915, size)) {
*cmd++ = GEN9_XY_FAST_COPY_BLT_CMD | (10 - 2);
*cmd++ = BLT_DEPTH_32 | PAGE_SIZE;
*cmd++ = 0;
*cmd++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
*cmd++ = lower_32_bits(dst_offset);
*cmd++ = upper_32_bits(dst_offset);
*cmd++ = 0;
*cmd++ = PAGE_SIZE;
*cmd++ = lower_32_bits(src_offset);
*cmd++ = upper_32_bits(src_offset);
} else if (GRAPHICS_VER(i915) >= 8) {
*cmd++ = XY_SRC_COPY_BLT_CMD | BLT_WRITE_RGBA | (10 - 2);
*cmd++ = BLT_DEPTH_32 | BLT_ROP_SRC_COPY | PAGE_SIZE;
*cmd++ = 0;
*cmd++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
*cmd++ = lower_32_bits(dst_offset);
*cmd++ = upper_32_bits(dst_offset);
*cmd++ = 0;
*cmd++ = PAGE_SIZE;
*cmd++ = lower_32_bits(src_offset);
*cmd++ = upper_32_bits(src_offset);
} else {
*cmd++ = SRC_COPY_BLT_CMD | BLT_WRITE_RGBA | (6 - 2);
*cmd++ = BLT_DEPTH_32 | BLT_ROP_SRC_COPY | PAGE_SIZE;
*cmd++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE;
*cmd++ = dst_offset;
*cmd++ = PAGE_SIZE;
*cmd++ = src_offset;
}
/* Allow ourselves to be preempted in between blocks. */
*cmd++ = MI_ARB_CHECK;
src_offset += size;
dst_offset += size;
rem -= size;
} while (rem);
*cmd = MI_BATCH_BUFFER_END;
i915_gem_object_flush_map(pool->obj);
i915_gem_object_unpin_map(pool->obj);
intel_gt_chipset_flush(ce->vm->gt);
batch->private = pool;
return batch;
out_unpin:
i915_vma_unpin(batch);
out_put:
intel_gt_buffer_pool_put(pool);
out_pm:
intel_engine_pm_put(ce->engine);
return ERR_PTR(err);
}
int i915_gem_object_copy_blt(struct drm_i915_gem_object *src,
struct drm_i915_gem_object *dst,
struct intel_context *ce)
{
struct i915_address_space *vm = ce->vm;
struct i915_vma *vma[2], *batch;
struct i915_gem_ww_ctx ww;
struct i915_request *rq;
int err, i;
vma[0] = i915_vma_instance(src, vm, NULL);
if (IS_ERR(vma[0]))
return PTR_ERR(vma[0]);
vma[1] = i915_vma_instance(dst, vm, NULL);
if (IS_ERR(vma[1]))
return PTR_ERR(vma[1]);
i915_gem_ww_ctx_init(&ww, true);
intel_engine_pm_get(ce->engine);
retry:
err = i915_gem_object_lock(src, &ww);
if (!err)
err = i915_gem_object_lock(dst, &ww);
if (!err)
err = intel_context_pin_ww(ce, &ww);
if (err)
goto out;
err = i915_vma_pin_ww(vma[0], &ww, 0, 0, PIN_USER);
if (err)
goto out_ctx;
err = i915_vma_pin_ww(vma[1], &ww, 0, 0, PIN_USER);
if (unlikely(err))
goto out_unpin_src;
batch = intel_emit_vma_copy_blt(ce, &ww, vma[0], vma[1]);
if (IS_ERR(batch)) {
err = PTR_ERR(batch);
goto out_unpin_dst;
}
rq = i915_request_create(ce);
if (IS_ERR(rq)) {
err = PTR_ERR(rq);
goto out_batch;
}
err = intel_emit_vma_mark_active(batch, rq);
if (unlikely(err))
goto out_request;
for (i = 0; i < ARRAY_SIZE(vma); i++) {
err = move_obj_to_gpu(vma[i]->obj, rq, i);
if (unlikely(err))
goto out_request;
}
for (i = 0; i < ARRAY_SIZE(vma); i++) {
unsigned int flags = i ? EXEC_OBJECT_WRITE : 0;
err = i915_vma_move_to_active(vma[i], rq, flags);
if (unlikely(err))
goto out_request;
}
if (rq->engine->emit_init_breadcrumb) {
err = rq->engine->emit_init_breadcrumb(rq);
if (unlikely(err))
goto out_request;
}
err = rq->engine->emit_bb_start(rq,
batch->node.start, batch->node.size,
0);
out_request:
if (unlikely(err))
i915_request_set_error_once(rq, err);
i915_request_add(rq);
out_batch:
intel_emit_vma_release(ce, batch);
out_unpin_dst:
i915_vma_unpin(vma[1]);
out_unpin_src:
i915_vma_unpin(vma[0]);
out_ctx:
intel_context_unpin(ce);
out:
if (err == -EDEADLK) {
err = i915_gem_ww_ctx_backoff(&ww);
if (!err)
goto retry;
}
i915_gem_ww_ctx_fini(&ww);
intel_engine_pm_put(ce->engine);
return err;
}
#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
#include "selftests/i915_gem_object_blt.c"
#endif
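The batch sizing and the Wa_1209644611 check in the removed `intel_emit_vma_copy_blt()` can be restated as standalone arithmetic. The following is a minimal sketch, not the driver code: the `sketch_` names are hypothetical, and a 4 KiB page size is assumed. Each 8 MiB block costs 11 dwords (a 10-dword blit plus `MI_ARB_CHECK`), and one extra dword holds `MI_BATCH_BUFFER_END`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86. */
#define SKETCH_PAGE_SHIFT 12u
#define SKETCH_PAGE_SIZE  (1u << SKETCH_PAGE_SHIFT)
/* Copy granularity taken from the removed function (SZ_8M). */
#define SKETCH_BLOCK_SIZE (8u << 20)

static uint64_t sketch_round_up(uint64_t x, uint64_t align)
{
	assert(align != 0);
	return (x + align - 1) / align * align;
}

/* Number of 8 MiB blocks needed to cover the object. */
uint64_t sketch_copy_block_count(uint64_t object_size)
{
	return sketch_round_up(object_size, SKETCH_BLOCK_SIZE) / SKETCH_BLOCK_SIZE;
}

/* Batch size in bytes: 11 dwords per block, 1 dword terminator, page aligned. */
uint64_t sketch_copy_batch_bytes(uint64_t object_size)
{
	uint64_t count = sketch_copy_block_count(object_size);
	uint64_t bytes = (1 + 11 * count) * sizeof(uint32_t);

	return sketch_round_up(bytes, SKETCH_PAGE_SIZE);
}

/* Mirrors the predicate in the diff: only graphics version 11 is affected. */
bool sketch_wa_1209644611_applies(unsigned int graphics_ver, uint32_t size)
{
	uint32_t height = size >> SKETCH_PAGE_SHIFT;

	if (graphics_ver != 11)
		return false;

	return height % 4 == 3 && height <= 8;
}
```

Under these assumptions a 1 GiB copy needs 128 blocks, i.e. 1409 dwords, which rounds up to two pages of batch buffer.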

@ -1,39 +0,0 @@
/* SPDX-License-Identifier: MIT */
/*
* Copyright © 2019 Intel Corporation
*/
#ifndef __I915_GEM_OBJECT_BLT_H__
#define __I915_GEM_OBJECT_BLT_H__
#include <linux/types.h>
#include "gt/intel_context.h"
#include "gt/intel_engine_pm.h"
#include "i915_vma.h"
struct drm_i915_gem_object;
struct i915_gem_ww_ctx;
struct i915_vma *intel_emit_vma_fill_blt(struct intel_context *ce,
struct i915_vma *vma,
struct i915_gem_ww_ctx *ww,
u32 value);
struct i915_vma *intel_emit_vma_copy_blt(struct intel_context *ce,
struct i915_gem_ww_ctx *ww,
struct i915_vma *src,
struct i915_vma *dst);
int intel_emit_vma_mark_active(struct i915_vma *vma, struct i915_request *rq);
void intel_emit_vma_release(struct intel_context *ce, struct i915_vma *vma);
int i915_gem_object_fill_blt(struct drm_i915_gem_object *obj,
struct intel_context *ce,
u32 value);
int i915_gem_object_copy_blt(struct drm_i915_gem_object *src,
struct drm_i915_gem_object *dst,
struct intel_context *ce);
#endif

@ -18,6 +18,7 @@
struct drm_i915_gem_object;
struct intel_fronbuffer;
struct intel_memory_region;
/*
* struct i915_lut_handle tracks the fast lookups from handle to vma used
@ -33,10 +34,9 @@ struct i915_lut_handle {
struct drm_i915_gem_object_ops {
unsigned int flags;
#define I915_GEM_OBJECT_HAS_IOMEM BIT(1)
#define I915_GEM_OBJECT_IS_SHRINKABLE BIT(2)
#define I915_GEM_OBJECT_IS_PROXY BIT(3)
#define I915_GEM_OBJECT_NO_MMAP BIT(4)
#define I915_GEM_OBJECT_IS_SHRINKABLE BIT(1)
#define I915_GEM_OBJECT_IS_PROXY BIT(2)
#define I915_GEM_OBJECT_NO_MMAP BIT(3)
/* Interface between the GEM object and its backing storage.
* get_pages() is called once prior to the use of the associated set
@ -78,12 +78,100 @@ struct drm_i915_gem_object_ops {
* delayed_free - Override the default delayed free implementation
*/
void (*delayed_free)(struct drm_i915_gem_object *obj);
/**
* migrate - Migrate object to a different region either for
* pinning or for as long as the object lock is held.
*/
int (*migrate)(struct drm_i915_gem_object *obj,
struct intel_memory_region *mr);
void (*release)(struct drm_i915_gem_object *obj);
const struct vm_operations_struct *mmap_ops;
const char *name; /* friendly name for debug, e.g. lockdep classes */
};
/**
* enum i915_cache_level - The supported GTT caching values for system memory
* pages.
*
* These translate to some special GTT PTE bits when binding pages into some
* address space. It also determines whether an object, or rather its pages are
* coherent with the GPU, when also reading or writing through the CPU cache
* with those pages.
*
* Userspace can also control this through struct drm_i915_gem_caching.
*/
enum i915_cache_level {
/**
* @I915_CACHE_NONE:
*
* GPU access is not coherent with the CPU cache. If the cache is dirty
* and we need the underlying pages to be coherent with some later GPU
* access then we need to manually flush the pages.
*
* On shared LLC platforms reads and writes through the CPU cache are
* still coherent even with this setting. See also
* &drm_i915_gem_object.cache_coherent for more details. Due to this we
* should only ever use uncached for scanout surfaces, otherwise we end
* up over-flushing in some places.
*
* This is the default on non-LLC platforms.
*/
I915_CACHE_NONE = 0,
/**
* @I915_CACHE_LLC:
*
* GPU access is coherent with the CPU cache. If the cache is dirty,
* then the GPU will ensure that access remains coherent, when both
* reading and writing through the CPU cache. GPU writes can dirty the
* CPU cache.
*
* Not used for scanout surfaces.
*
* Applies to both platforms with shared LLC(HAS_LLC), and snooping
* based platforms(HAS_SNOOP).
*
* This is the default on shared LLC platforms. The only exception is
* scanout objects, where the display engine is not coherent with the
* CPU cache. For such objects I915_CACHE_NONE or I915_CACHE_WT is
* automatically applied by the kernel in pin_for_display, if userspace
* has not done so already.
*/
I915_CACHE_LLC,
/**
* @I915_CACHE_L3_LLC:
*
* Explicitly enable the Gfx L3 cache, with coherent LLC.
*
* The Gfx L3 sits between the domain specific caches, e.g
* sampler/render caches, and the larger LLC. LLC is coherent with the
* GPU, but L3 is only visible to the GPU, so likely needs to be flushed
* when the workload completes.
*
* Not used for scanout surfaces.
*
* Only exposed on some gen7 + GGTT. More recent hardware has dropped
* this explicit setting, where it should now be enabled by default.
*/
I915_CACHE_L3_LLC,
/**
* @I915_CACHE_WT:
*
* Write-through. Used for scanout surfaces.
*
* The GPU can utilise the caches, while still having the display engine
* be coherent with GPU writes, as a result we don't need to flush the
* CPU caches when moving out of the render domain. This is the default
* setting chosen by the kernel, if supported by the HW, otherwise we
* fall back to I915_CACHE_NONE. On the CPU side writes through the CPU
* cache still need to be flushed, to remain coherent with the display
* engine.
*/
I915_CACHE_WT,
};
enum i915_map_type {
I915_MAP_WB = 0,
I915_MAP_WC,
@ -97,6 +185,7 @@ enum i915_mmap_type {
I915_MMAP_TYPE_WC,
I915_MMAP_TYPE_WB,
I915_MMAP_TYPE_UC,
I915_MMAP_TYPE_FIXED,
};
struct i915_mmap_offset {
@ -201,25 +290,138 @@ struct drm_i915_gem_object {
unsigned long flags;
#define I915_BO_ALLOC_CONTIGUOUS BIT(0)
#define I915_BO_ALLOC_VOLATILE BIT(1)
#define I915_BO_ALLOC_STRUCT_PAGE BIT(2)
#define I915_BO_ALLOC_CPU_CLEAR BIT(3)
#define I915_BO_ALLOC_USER BIT(4)
#define I915_BO_ALLOC_CPU_CLEAR BIT(2)
#define I915_BO_ALLOC_USER BIT(3)
#define I915_BO_ALLOC_FLAGS (I915_BO_ALLOC_CONTIGUOUS | \
I915_BO_ALLOC_VOLATILE | \
I915_BO_ALLOC_STRUCT_PAGE | \
I915_BO_ALLOC_CPU_CLEAR | \
I915_BO_ALLOC_USER)
#define I915_BO_READONLY BIT(5)
#define I915_TILING_QUIRK_BIT 6 /* unknown swizzling; do not release! */
#define I915_BO_READONLY BIT(4)
#define I915_TILING_QUIRK_BIT 5 /* unknown swizzling; do not release! */
/*
* Is the object to be mapped as read-only to the GPU
* Only honoured if hardware has relevant pte bit
/**
* @mem_flags - Mutable placement-related flags
*
* These are flags that indicate specifics of the memory region
* the object is currently in. As such they are only stable
* either under the object lock or if the object is pinned.
*/
unsigned int mem_flags;
#define I915_BO_FLAG_STRUCT_PAGE BIT(0) /* Object backed by struct pages */
#define I915_BO_FLAG_IOMEM BIT(1) /* Object backed by IO memory */
/**
* @cache_level: The desired GTT caching level.
*
* See enum i915_cache_level for possible values, along with what
* each does.
*/
unsigned int cache_level:3;
unsigned int cache_coherent:2;
/**
* @cache_coherent:
*
* Track whether the pages are coherent with the GPU if reading or
* writing through the CPU caches. This largely depends on the
* @cache_level setting.
*
* On platforms which don't have the shared LLC(HAS_SNOOP), like on Atom
* platforms, coherency must be explicitly requested with some special
* GTT caching bits(see enum i915_cache_level). When enabling coherency
* it does come at a performance and power cost on such platforms. On
* the flip side, once coherency is enabled the kernel does not need to
* manually flush buffers which need to be coherent with the GPU; manual
* flushing is only needed if the object is not coherent, i.e.
* @cache_coherent is zero.
*
* On platforms that share the LLC with the CPU(HAS_LLC), all GT memory
* access will automatically snoop the CPU caches(even with CACHE_NONE).
* The one exception is when dealing with the display engine, like with
* scanout surfaces. To handle this the kernel will always flush the
* surface out of the CPU caches when preparing it for scanout. Also
* note that since scanout surfaces are only ever read by the display
* engine we only need to care about flushing any writes through the CPU
* cache, reads on the other hand will always be coherent.
*
* Something strange here is why @cache_coherent is not a simple
* boolean, i.e coherent vs non-coherent. The reasoning for this is back
* to the display engine not being fully coherent. As a result scanout
* surfaces will either be marked as I915_CACHE_NONE or I915_CACHE_WT.
* In the case of seeing I915_CACHE_NONE the kernel makes the assumption
* that this is likely a scanout surface, and will set @cache_coherent
* as only I915_BO_CACHE_COHERENT_FOR_READ, on platforms with the shared
* LLC. The kernel uses this to always flush writes through the CPU
* cache as early as possible, where it can, in effect keeping
* @cache_dirty clean, so we can potentially avoid stalling when
* flushing the surface just before doing the scanout. This does mean
* we might unnecessarily flush non-scanout objects in some places, but
* the default assumption is that all normal objects should be using
* I915_CACHE_LLC, at least on platforms with the shared LLC.
*
* Supported values:
*
* I915_BO_CACHE_COHERENT_FOR_READ:
*
* On shared LLC platforms, we use this for special scanout surfaces,
* where the display engine is not coherent with the CPU cache. As such
* we need to ensure we flush any writes before doing the scanout. As an
* optimisation we try to flush any writes as early as possible to avoid
* stalling later.
*
* Thus for scanout surfaces using I915_CACHE_NONE, on shared LLC
* platforms, we use:
*
* cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ
*
* While for normal objects that are fully coherent, including special
* scanout surfaces marked as I915_CACHE_WT, we use:
*
* cache_coherent = I915_BO_CACHE_COHERENT_FOR_READ |
* I915_BO_CACHE_COHERENT_FOR_WRITE
*
* And then for objects that are not coherent at all we use:
*
* cache_coherent = 0
*
* I915_BO_CACHE_COHERENT_FOR_WRITE:
*
* When writing through the CPU cache, the GPU is still coherent. Note
* that this also implies I915_BO_CACHE_COHERENT_FOR_READ.
*/
#define I915_BO_CACHE_COHERENT_FOR_READ BIT(0)
#define I915_BO_CACHE_COHERENT_FOR_WRITE BIT(1)
unsigned int cache_coherent:2;
/**
* @cache_dirty:
*
* Track if we are dirty with writes through the CPU cache for this
* object. As a result reading directly from main memory might yield
* stale data.
*
* This also ties into whether the kernel is tracking the object as
* coherent with the GPU, as per @cache_coherent, as it determines if
* flushing might be needed at various points.
*
* Another part of @cache_dirty is managing flushing when first
* acquiring the pages for system memory, at this point the pages are
* considered foreign, so the default assumption is that the cache is
* dirty, for example the page zeroing done by the kernel might leave
* writes through the CPU cache, or swapping-in, while the actual data in
* main memory is potentially stale. Note that this is a potential
* security issue when dealing with userspace objects and zeroing. Now,
* whether we actually need to apply the big sledgehammer of flushing all
* the pages on acquire depends on if @cache_coherent is marked as
* I915_BO_CACHE_COHERENT_FOR_WRITE, i.e that the GPU will be coherent
* for both reads and writes through the CPU cache.
*
* Note that on shared LLC platforms we still apply the heavy flush for
* I915_CACHE_NONE objects, under the assumption that this is going to
* be used for scanout.
*
* Update: On some hardware there is now also the 'Bypass LLC' MOCS
* entry, which defeats our @cache_coherent tracking, since userspace
* can freely bypass the CPU cache when touching the pages with the GPU,
* where the kernel is completely unaware. On such platforms we need to
* apply the sledgehammer-on-acquire regardless of @cache_coherent.
*/
unsigned int cache_dirty:1;
/**
@ -265,9 +467,10 @@ struct drm_i915_gem_object {
struct intel_memory_region *region;
/**
* Memory manager node allocated for this object.
* Memory manager resource allocated for this object. Only
* needed for the mock region.
*/
void *st_mm_node;
struct ttm_resource *res;
/**
* Element within memory_region->objects or region->purgeable
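The @cache_coherent and @cache_dirty kernel-doc above describes a small decision table. The sketch below is one possible reading of that text, not the driver's implementation: the `sketch_` names are hypothetical, and `bypass_llc_user_object` stands in for the "Bypass LLC MOCS" case that the documentation says defeats the tracking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for enum i915_cache_level. */
enum sketch_cache_level {
	SKETCH_CACHE_NONE = 0,
	SKETCH_CACHE_LLC,
	SKETCH_CACHE_L3_LLC,
	SKETCH_CACHE_WT,
};

#define SKETCH_COHERENT_FOR_READ  (1u << 0)
#define SKETCH_COHERENT_FOR_WRITE (1u << 1)

/*
 * Anything other than CACHE_NONE is treated as fully coherent.
 * CACHE_NONE on a shared-LLC platform is assumed to be a scanout
 * surface and is coherent for reads only. Otherwise not coherent.
 */
unsigned int sketch_cache_coherent(enum sketch_cache_level level, bool has_llc)
{
	assert(level <= SKETCH_CACHE_WT);

	if (level != SKETCH_CACHE_NONE)
		return SKETCH_COHERENT_FOR_READ | SKETCH_COHERENT_FOR_WRITE;
	if (has_llc)
		return SKETCH_COHERENT_FOR_READ;
	return 0;
}

/*
 * Whether freshly acquired pages need the heavy flush: always when
 * userspace could bypass the LLC, otherwise only when the object is
 * not write-coherent.
 */
bool sketch_flush_on_acquire(unsigned int cache_coherent,
			     bool bypass_llc_user_object)
{
	if (bypass_llc_user_object)
		return true;
	return !(cache_coherent & SKETCH_COHERENT_FOR_WRITE);
}
```

Read this way, a `CACHE_NONE` object on a shared-LLC platform still gets the heavy flush on acquire, which matches the note that such objects are assumed to be headed for scanout.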

@ -321,8 +321,7 @@ static void *i915_gem_object_map_pfn(struct drm_i915_gem_object *obj,
dma_addr_t addr;
void *vaddr;
if (type != I915_MAP_WC)
return ERR_PTR(-ENODEV);
GEM_BUG_ON(type != I915_MAP_WC);
if (n_pfn > ARRAY_SIZE(stack)) {
/* Too big for stack -- allocate temporary array instead */
@ -351,7 +350,7 @@ void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj,
int err;
if (!i915_gem_object_has_struct_page(obj) &&
!i915_gem_object_type_has(obj, I915_GEM_OBJECT_HAS_IOMEM))
!i915_gem_object_has_iomem(obj))
return ERR_PTR(-ENXIO);
assert_object_held(obj);
@ -374,6 +373,34 @@ void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj,
}
GEM_BUG_ON(!i915_gem_object_has_pages(obj));
/*
* For discrete, our CPU mappings need to be consistent in order to
* function correctly on !x86. When mapping things through TTM, we use
* the same rules to determine the caching type.
*
* The caching rules, starting from DG1:
*
* - If the object can be placed in device local-memory, then the
* pages should be allocated and mapped as write-combined only.
*
* - Everything else is always allocated and mapped as write-back,
* with the guarantee that everything is also coherent with the
* GPU.
*
* Internal users of lmem are already expected to get this right, so no
* fudging needed there.
*/
if (i915_gem_object_placement_possible(obj, INTEL_MEMORY_LOCAL)) {
if (type != I915_MAP_WC && !obj->mm.n_placements) {
ptr = ERR_PTR(-ENODEV);
goto err_unpin;
}
type = I915_MAP_WC;
} else if (IS_DGFX(to_i915(obj->base.dev))) {
type = I915_MAP_WB;
}
ptr = page_unpack_bits(obj->mm.mapping, &has_type);
if (ptr && has_type != type) {
if (pinned) {
@ -467,7 +494,7 @@ __i915_gem_object_get_sg(struct drm_i915_gem_object *obj,
struct i915_gem_object_page_iter *iter,
unsigned int n,
unsigned int *offset,
bool allow_alloc, bool dma)
bool dma)
{
struct scatterlist *sg;
unsigned int idx, count;
@ -489,9 +516,6 @@ __i915_gem_object_get_sg(struct drm_i915_gem_object *obj,
if (n < READ_ONCE(iter->sg_idx))
goto lookup;
if (!allow_alloc)
goto manual_lookup;
mutex_lock(&iter->lock);
/* We prefer to reuse the last sg so that repeated lookup of this
@ -541,16 +565,7 @@ scan:
if (unlikely(n < idx)) /* insertion completed by another thread */
goto lookup;
goto manual_walk;
manual_lookup:
idx = 0;
sg = obj->mm.pages->sgl;
count = __sg_page_count(sg);
manual_walk:
/*
* In case we failed to insert the entry into the radixtree, we need
/* In case we failed to insert the entry into the radixtree, we need
* to look beyond the current sg.
*/
while (idx + count <= n) {
@ -597,7 +612,7 @@ i915_gem_object_get_page(struct drm_i915_gem_object *obj, unsigned int n)
GEM_BUG_ON(!i915_gem_object_has_struct_page(obj));
sg = i915_gem_object_get_sg(obj, n, &offset, true);
sg = i915_gem_object_get_sg(obj, n, &offset);
return nth_page(sg_page(sg), offset);
}
@ -623,7 +638,7 @@ i915_gem_object_get_dma_address_len(struct drm_i915_gem_object *obj,
struct scatterlist *sg;
unsigned int offset;
sg = i915_gem_object_get_sg_dma(obj, n, &offset, true);
sg = i915_gem_object_get_sg_dma(obj, n, &offset);
if (len)
*len = sg_dma_len(sg) - (offset << PAGE_SHIFT);
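The caching rule added to `i915_gem_object_pin_map()` in this file can be isolated as a pure function. This is a minimal sketch under stated assumptions: the `sketch_` names and the boolean parameters are hypothetical replacements for `i915_gem_object_placement_possible()`, `obj->mm.n_placements` and `IS_DGFX()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for I915_MAP_WB / I915_MAP_WC. */
enum sketch_map_type {
	SKETCH_MAP_WB = 0,
	SKETCH_MAP_WC,
};

/*
 * Resolve the CPU mapping type following the rules in the hunk:
 *  - objects that may live in local memory are always mapped WC; an
 *    object without a placement list that asks for anything else is
 *    rejected with -ENODEV;
 *  - everything else on a discrete device is mapped WB;
 *  - on integrated devices the requested type is kept.
 */
int sketch_resolve_map_type(bool lmem_possible, unsigned int n_placements,
			    bool is_dgfx, enum sketch_map_type requested,
			    enum sketch_map_type *resolved)
{
	assert(resolved);

	if (lmem_possible) {
		if (requested != SKETCH_MAP_WC && n_placements == 0)
			return -ENODEV;
		*resolved = SKETCH_MAP_WC;
	} else if (is_dgfx) {
		*resolved = SKETCH_MAP_WB;
	} else {
		*resolved = requested;
	}

	return 0;
}
```

Note that a request for WB on an object with an explicit placement list is silently converted to WC rather than rejected.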

@ -76,7 +76,7 @@ static int i915_gem_object_get_pages_phys(struct drm_i915_gem_object *obj)
intel_gt_chipset_flush(&to_i915(obj->base.dev)->gt);
/* We're no longer struct page backed */
obj->flags &= ~I915_BO_ALLOC_STRUCT_PAGE;
obj->mem_flags &= ~I915_BO_FLAG_STRUCT_PAGE;
__i915_gem_object_set_pages(obj, st, sg->length);
return 0;

@ -13,11 +13,7 @@ void i915_gem_object_init_memory_region(struct drm_i915_gem_object *obj,
{
obj->mm.region = intel_memory_region_get(mem);
if (obj->base.size <= mem->min_page_size)
obj->flags |= I915_BO_ALLOC_CONTIGUOUS;
mutex_lock(&mem->objects.lock);
list_add(&obj->mm.region_link, &mem->objects.list);
mutex_unlock(&mem->objects.lock);
}
@ -36,9 +32,11 @@ void i915_gem_object_release_memory_region(struct drm_i915_gem_object *obj)
struct drm_i915_gem_object *
i915_gem_object_create_region(struct intel_memory_region *mem,
resource_size_t size,
resource_size_t page_size,
unsigned int flags)
{
struct drm_i915_gem_object *obj;
resource_size_t default_page_size;
int err;
/*
@ -52,7 +50,14 @@ i915_gem_object_create_region(struct intel_memory_region *mem,
if (!mem)
return ERR_PTR(-ENODEV);
size = round_up(size, mem->min_page_size);
default_page_size = mem->min_page_size;
if (page_size)
default_page_size = page_size;
GEM_BUG_ON(!is_power_of_2_u64(default_page_size));
GEM_BUG_ON(default_page_size < PAGE_SIZE);
size = round_up(size, default_page_size);
GEM_BUG_ON(!size);
GEM_BUG_ON(!IS_ALIGNED(size, I915_GTT_MIN_ALIGNMENT));
@ -64,7 +69,7 @@ i915_gem_object_create_region(struct intel_memory_region *mem,
if (!obj)
return ERR_PTR(-ENOMEM);
err = mem->ops->init_object(mem, obj, size, flags);
err = mem->ops->init_object(mem, obj, size, page_size, flags);
if (err)
goto err_object_free;
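The new `page_size` argument changes how `i915_gem_object_create_region()` rounds the object size. A minimal sketch of that rounding follows, assuming 4 KiB system pages; `sketch_region_object_size` is a hypothetical name, and plain `assert()` stands in for `GEM_BUG_ON()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB system pages. */
#define SKETCH_PAGE_SIZE 4096u

static bool sketch_is_power_of_2(uint64_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/*
 * A page_size of zero selects the region's minimum page size;
 * any other value overrides it. The size is then rounded up to
 * the chosen page size.
 */
uint64_t sketch_region_object_size(uint64_t size, uint64_t min_page_size,
				   uint64_t page_size)
{
	uint64_t default_page_size = page_size ? page_size : min_page_size;

	assert(sketch_is_power_of_2(default_page_size));
	assert(default_page_size >= SKETCH_PAGE_SIZE);

	size = (size + default_page_size - 1) & ~(default_page_size - 1);
	assert(size != 0);

	return size;
}
```

For example, a 4097-byte request with a 64 KiB page size override becomes a 64 KiB object, while the same request with `page_size == 0` in a 4 KiB region becomes 8 KiB.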

@ -19,6 +19,7 @@ void i915_gem_object_release_memory_region(struct drm_i915_gem_object *obj);
struct drm_i915_gem_object *
i915_gem_object_create_region(struct intel_memory_region *mem,
resource_size_t size,
resource_size_t page_size,
unsigned int flags);
#endif

@ -182,6 +182,24 @@ rebuild_st:
if (i915_gem_object_needs_bit17_swizzle(obj))
i915_gem_object_do_bit_17_swizzle(obj, st);
/*
* EHL and JSL add the 'Bypass LLC' MOCS entry, which should make it
* possible for userspace to bypass the GTT caching bits set by the
* kernel, as per the given object cache_level. This is troublesome
* since the heavy flush we apply when first gathering the pages is
* skipped if the kernel thinks the object is coherent with the GPU. As
* a result it might be possible to bypass the cache and read the
* contents of the page directly, which could be stale data. If it's
* just a case of userspace shooting themselves in the foot then so be
* it, but since i915 takes the stance of always zeroing memory before
* handing it to userspace, we need to prevent this.
*
* By setting cache_dirty here we make the clflush in set_pages
* unconditional on such platforms.
*/
if (IS_JSL_EHL(i915) && obj->flags & I915_BO_ALLOC_USER)
obj->cache_dirty = true;
__i915_gem_object_set_pages(obj, st, sg_page_sizes);
return 0;
@ -302,6 +320,7 @@ void i915_gem_object_put_pages_shmem(struct drm_i915_gem_object *obj, struct sg_
struct pagevec pvec;
struct page *page;
GEM_WARN_ON(IS_DGFX(to_i915(obj->base.dev)));
__i915_gem_object_release_shmem(obj, pages, true);
i915_gem_gtt_finish_pages(obj, pages);
@ -444,7 +463,7 @@ shmem_pread(struct drm_i915_gem_object *obj,
static void shmem_release(struct drm_i915_gem_object *obj)
{
if (obj->flags & I915_BO_ALLOC_STRUCT_PAGE)
if (i915_gem_object_has_struct_page(obj))
i915_gem_object_release_memory_region(obj);
fput(obj->base.filp);
@ -489,6 +508,7 @@ static int __create_shmem(struct drm_i915_private *i915,
static int shmem_object_init(struct intel_memory_region *mem,
struct drm_i915_gem_object *obj,
resource_size_t size,
resource_size_t page_size,
unsigned int flags)
{
static struct lock_class_key lock_class;
@ -513,9 +533,8 @@ static int shmem_object_init(struct intel_memory_region *mem,
mapping_set_gfp_mask(mapping, mask);
GEM_BUG_ON(!(mapping_gfp_mask(mapping) & __GFP_RECLAIM));
i915_gem_object_init(obj, &i915_gem_shmem_ops, &lock_class,
I915_BO_ALLOC_STRUCT_PAGE);
i915_gem_object_init(obj, &i915_gem_shmem_ops, &lock_class, 0);
obj->mem_flags |= I915_BO_FLAG_STRUCT_PAGE;
obj->write_domain = I915_GEM_DOMAIN_CPU;
obj->read_domains = I915_GEM_DOMAIN_CPU;
@ -548,7 +567,7 @@ i915_gem_object_create_shmem(struct drm_i915_private *i915,
resource_size_t size)
{
return i915_gem_object_create_region(i915->mm.regions[INTEL_REGION_SMEM],
size, 0);
size, 0, 0);
}
/* Allocate a new GEM object and fill it with the supplied data */
@ -561,6 +580,7 @@ i915_gem_object_create_shmem_from_data(struct drm_i915_private *dev_priv,
resource_size_t offset;
int err;
GEM_WARN_ON(IS_DGFX(dev_priv));
obj = i915_gem_object_create_shmem(dev_priv, round_up(size, PAGE_SIZE));
if (IS_ERR(obj))
return obj;

@ -670,6 +670,7 @@ static int __i915_gem_object_create_stolen(struct intel_memory_region *mem,
static int _i915_gem_object_stolen_init(struct intel_memory_region *mem,
struct drm_i915_gem_object *obj,
resource_size_t size,
resource_size_t page_size,
unsigned int flags)
{
struct drm_i915_private *i915 = mem->i915;
@ -708,7 +709,7 @@ struct drm_i915_gem_object *
i915_gem_object_create_stolen(struct drm_i915_private *i915,
resource_size_t size)
{
return i915_gem_object_create_region(i915->mm.stolen_region, size, 0);
return i915_gem_object_create_region(i915->mm.stolen_region, size, 0, 0);
}
static int init_stolen_smem(struct intel_memory_region *mem)

@ -15,6 +15,9 @@
#include "gem/i915_gem_ttm.h"
#include "gem/i915_gem_mman.h"
#include "gt/intel_migrate.h"
#include "gt/intel_engine_pm.h"
#define I915_PL_LMEM0 TTM_PL_PRIV
#define I915_PL_SYSTEM TTM_PL_SYSTEM
#define I915_PL_STOLEN TTM_PL_VRAM
@ -24,6 +27,11 @@
#define I915_TTM_PRIO_NO_PAGES 1
#define I915_TTM_PRIO_HAS_PAGES 2
/*
* Size of struct ttm_place vector in on-stack struct ttm_placement allocs
*/
#define I915_TTM_MAX_PLACEMENTS INTEL_REGION_UNKNOWN
/**
* struct i915_ttm_tt - TTM page vector with additional private information
* @ttm: The base TTM page vector.
@ -42,36 +50,123 @@ struct i915_ttm_tt {
struct sg_table *cached_st;
};
static const struct ttm_place lmem0_sys_placement_flags[] = {
{
.fpfn = 0,
.lpfn = 0,
.mem_type = I915_PL_LMEM0,
.flags = 0,
}, {
.fpfn = 0,
.lpfn = 0,
.mem_type = I915_PL_SYSTEM,
.flags = 0,
}
};
static struct ttm_placement i915_lmem0_placement = {
.num_placement = 1,
.placement = &lmem0_sys_placement_flags[0],
.num_busy_placement = 1,
.busy_placement = &lmem0_sys_placement_flags[0],
static const struct ttm_place sys_placement_flags = {
.fpfn = 0,
.lpfn = 0,
.mem_type = I915_PL_SYSTEM,
.flags = 0,
};
static struct ttm_placement i915_sys_placement = {
.num_placement = 1,
.placement = &lmem0_sys_placement_flags[1],
.placement = &sys_placement_flags,
.num_busy_placement = 1,
.busy_placement = &lmem0_sys_placement_flags[1],
.busy_placement = &sys_placement_flags,
};
static int i915_ttm_err_to_gem(int err)
{
/* Fastpath */
if (likely(!err))
return 0;
switch (err) {
case -EBUSY:
/*
* TTM likes to convert -EDEADLK to -EBUSY, and wants us to
* restart the operation, since we don't record the contending
* lock. We use -EAGAIN to restart.
*/
return -EAGAIN;
case -ENOSPC:
/*
* Memory type / region is full, and we can't evict.
* Except possibly system, that returns -ENOMEM;
*/
return -ENXIO;
default:
break;
}
return err;
}
static bool gpu_binds_iomem(struct ttm_resource *mem)
{
return mem->mem_type != TTM_PL_SYSTEM;
}
static bool cpu_maps_iomem(struct ttm_resource *mem)
{
/* Once / if we support GGTT, this is also false for cached ttm_tts */
return mem->mem_type != TTM_PL_SYSTEM;
}
static enum i915_cache_level
i915_ttm_cache_level(struct drm_i915_private *i915, struct ttm_resource *res,
struct ttm_tt *ttm)
{
return ((HAS_LLC(i915) || HAS_SNOOP(i915)) && !gpu_binds_iomem(res) &&
ttm->caching == ttm_cached) ? I915_CACHE_LLC :
I915_CACHE_NONE;
}
static void i915_ttm_adjust_lru(struct drm_i915_gem_object *obj);
static enum ttm_caching
i915_ttm_select_tt_caching(const struct drm_i915_gem_object *obj)
{
/*
* Objects only allowed in system get cached cpu-mappings.
* Other objects get WC mapping for now. Even if in system.
*/
if (obj->mm.region->type == INTEL_MEMORY_SYSTEM &&
obj->mm.n_placements <= 1)
return ttm_cached;
return ttm_write_combined;
}
static void
i915_ttm_place_from_region(const struct intel_memory_region *mr,
struct ttm_place *place,
unsigned int flags)
{
memset(place, 0, sizeof(*place));
place->mem_type = intel_region_to_ttm_type(mr);
if (flags & I915_BO_ALLOC_CONTIGUOUS)
place->flags = TTM_PL_FLAG_CONTIGUOUS;
}
static void
i915_ttm_placement_from_obj(const struct drm_i915_gem_object *obj,
struct ttm_place *requested,
struct ttm_place *busy,
struct ttm_placement *placement)
{
unsigned int num_allowed = obj->mm.n_placements;
unsigned int flags = obj->flags;
unsigned int i;
placement->num_placement = 1;
i915_ttm_place_from_region(num_allowed ? obj->mm.placements[0] :
obj->mm.region, requested, flags);
/* Cache this on object? */
placement->num_busy_placement = num_allowed;
for (i = 0; i < placement->num_busy_placement; ++i)
i915_ttm_place_from_region(obj->mm.placements[i], busy + i, flags);
if (num_allowed == 0) {
*busy = *requested;
placement->num_busy_placement = 1;
}
placement->placement = requested;
placement->busy_placement = busy;
}
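The selection logic in `i915_ttm_placement_from_obj()` can be illustrated without the TTM types. In this sketch regions are reduced to plain integer ids, and `struct sketch_placement` and `SKETCH_MAX_PLACEMENTS` are hypothetical; it only shows which region is requested first and which regions remain as busy fallbacks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bound, playing the role of I915_TTM_MAX_PLACEMENTS. */
#define SKETCH_MAX_PLACEMENTS 4

struct sketch_placement {
	int requested;                    /* region tried first */
	int busy[SKETCH_MAX_PLACEMENTS];  /* regions allowed under pressure */
	size_t num_busy;
};

/*
 * With a placement list, the first entry is requested and the whole
 * list is allowed as busy placement. Without a list, the object's
 * current region is both the requested and the only busy placement.
 */
void sketch_placement_from_obj(int current_region, const int *allowed,
			       size_t num_allowed,
			       struct sketch_placement *out)
{
	size_t i;

	assert(out);
	assert(num_allowed <= SKETCH_MAX_PLACEMENTS);

	out->requested = num_allowed ? allowed[0] : current_region;

	out->num_busy = num_allowed;
	for (i = 0; i < num_allowed; i++)
		out->busy[i] = allowed[i];

	if (num_allowed == 0) {
		out->busy[0] = out->requested;
		out->num_busy = 1;
	}
}
```

The ordering of the placement list therefore expresses a preference: the first region is where the object should go, and the remaining entries only come into play when that region is full.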
static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
uint32_t page_flags)
{
@ -89,7 +184,8 @@ static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
man->use_tt)
page_flags |= TTM_PAGE_FLAG_ZERO_ALLOC;
ret = ttm_tt_init(&i915_tt->ttm, bo, page_flags, ttm_write_combined);
ret = ttm_tt_init(&i915_tt->ttm, bo, page_flags,
i915_ttm_select_tt_caching(obj));
if (ret) {
kfree(i915_tt);
return NULL;
@ -119,6 +215,7 @@ static void i915_ttm_tt_destroy(struct ttm_device *bdev, struct ttm_tt *ttm)
struct i915_ttm_tt *i915_tt = container_of(ttm, typeof(*i915_tt), ttm);
ttm_tt_destroy_common(bdev, ttm);
ttm_tt_fini(ttm);
kfree(i915_tt);
}
@ -128,11 +225,7 @@ static bool i915_ttm_eviction_valuable(struct ttm_buffer_object *bo,
struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
/* Will do for now. Our pinned objects are still on TTM's LRU lists */
if (!i915_gem_object_evictable(obj))
return false;
/* This isn't valid with a buddy allocator */
return ttm_bo_eviction_valuable(bo, place);
return i915_gem_object_evictable(obj);
}
static void i915_ttm_evict_flags(struct ttm_buffer_object *bo,
@ -175,6 +268,55 @@ static void i915_ttm_free_cached_io_st(struct drm_i915_gem_object *obj)
obj->ttm.cached_io_st = NULL;
}
static void
i915_ttm_adjust_domains_after_move(struct drm_i915_gem_object *obj)
{
struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
if (cpu_maps_iomem(bo->resource) || bo->ttm->caching != ttm_cached) {
obj->write_domain = I915_GEM_DOMAIN_WC;
obj->read_domains = I915_GEM_DOMAIN_WC;
} else {
obj->write_domain = I915_GEM_DOMAIN_CPU;
obj->read_domains = I915_GEM_DOMAIN_CPU;
}
}
static void i915_ttm_adjust_gem_after_move(struct drm_i915_gem_object *obj)
{
struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
unsigned int cache_level;
unsigned int i;
/*
* If object was moved to an allowable region, update the object
* region to consider it migrated. Note that if it's currently not
* in an allowable region, it's evicted and we don't update the
* object region.
*/
if (intel_region_to_ttm_type(obj->mm.region) != bo->resource->mem_type) {
for (i = 0; i < obj->mm.n_placements; ++i) {
struct intel_memory_region *mr = obj->mm.placements[i];
if (intel_region_to_ttm_type(mr) == bo->resource->mem_type &&
mr != obj->mm.region) {
i915_gem_object_release_memory_region(obj);
i915_gem_object_init_memory_region(obj, mr);
break;
}
}
}
obj->mem_flags &= ~(I915_BO_FLAG_STRUCT_PAGE | I915_BO_FLAG_IOMEM);
obj->mem_flags |= cpu_maps_iomem(bo->resource) ? I915_BO_FLAG_IOMEM :
I915_BO_FLAG_STRUCT_PAGE;
cache_level = i915_ttm_cache_level(to_i915(bo->base.dev), bo->resource,
bo->ttm);
i915_gem_object_set_cache_coherency(obj, cache_level);
}
static void i915_ttm_purge(struct drm_i915_gem_object *obj)
{
struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
@ -190,8 +332,10 @@ static void i915_ttm_purge(struct drm_i915_gem_object *obj)
/* TTM's purge interface. Note that we might be reentering. */
ret = ttm_bo_validate(bo, &place, &ctx);
if (!ret) {
obj->write_domain = 0;
obj->read_domains = 0;
i915_ttm_adjust_gem_after_move(obj);
i915_ttm_free_cached_io_st(obj);
obj->mm.madv = __I915_MADV_PURGED;
}
@ -214,6 +358,7 @@ static void i915_ttm_delete_mem_notify(struct ttm_buffer_object *bo)
if (likely(obj)) {
/* This releases all gem object bindings to the backend. */
i915_ttm_free_cached_io_st(obj);
__i915_gem_free_object(obj);
}
}
@ -273,13 +418,75 @@ i915_ttm_resource_get_st(struct drm_i915_gem_object *obj,
struct ttm_resource *res)
{
struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
struct ttm_resource_manager *man =
ttm_manager_type(bo->bdev, res->mem_type);
if (man->use_tt)
if (!gpu_binds_iomem(res))
return i915_ttm_tt_get_st(bo->ttm);
return intel_region_ttm_node_to_st(obj->mm.region, res);
/*
* If CPU mapping differs, we need to add the ttm_tt pages to
* the resulting st. Might make sense for GGTT.
*/
GEM_WARN_ON(!cpu_maps_iomem(res));
return intel_region_ttm_resource_to_st(obj->mm.region, res);
}
static int i915_ttm_accel_move(struct ttm_buffer_object *bo,
struct ttm_resource *dst_mem,
struct sg_table *dst_st)
{
struct drm_i915_private *i915 = container_of(bo->bdev, typeof(*i915),
bdev);
struct ttm_resource_manager *src_man =
ttm_manager_type(bo->bdev, bo->resource->mem_type);
struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
struct sg_table *src_st;
struct i915_request *rq;
struct ttm_tt *ttm = bo->ttm;
enum i915_cache_level src_level, dst_level;
int ret;
if (!i915->gt.migrate.context)
return -EINVAL;
dst_level = i915_ttm_cache_level(i915, dst_mem, ttm);
if (!ttm || !ttm_tt_is_populated(ttm)) {
if (bo->type == ttm_bo_type_kernel)
return -EINVAL;
if (ttm && !(ttm->page_flags & TTM_PAGE_FLAG_ZERO_ALLOC))
return 0;
intel_engine_pm_get(i915->gt.migrate.context->engine);
ret = intel_context_migrate_clear(i915->gt.migrate.context, NULL,
dst_st->sgl, dst_level,
gpu_binds_iomem(dst_mem),
0, &rq);
if (!ret && rq) {
i915_request_wait(rq, 0, MAX_SCHEDULE_TIMEOUT);
i915_request_put(rq);
}
intel_engine_pm_put(i915->gt.migrate.context->engine);
} else {
src_st = src_man->use_tt ? i915_ttm_tt_get_st(ttm) :
obj->ttm.cached_io_st;
src_level = i915_ttm_cache_level(i915, bo->resource, ttm);
intel_engine_pm_get(i915->gt.migrate.context->engine);
ret = intel_context_migrate_copy(i915->gt.migrate.context,
NULL, src_st->sgl, src_level,
gpu_binds_iomem(bo->resource),
dst_st->sgl, dst_level,
gpu_binds_iomem(dst_mem),
&rq);
if (!ret && rq) {
i915_request_wait(rq, 0, MAX_SCHEDULE_TIMEOUT);
i915_request_put(rq);
}
intel_engine_pm_put(i915->gt.migrate.context->engine);
}
return ret;
}
static int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
@ -290,8 +497,6 @@ static int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
struct ttm_resource_manager *dst_man =
ttm_manager_type(bo->bdev, dst_mem->mem_type);
struct ttm_resource_manager *src_man =
ttm_manager_type(bo->bdev, bo->resource->mem_type);
struct intel_memory_region *dst_reg, *src_reg;
union {
struct ttm_kmap_iter_tt tt;
@ -332,34 +537,40 @@ static int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
if (IS_ERR(dst_st))
return PTR_ERR(dst_st);
/* If we start mapping GGTT, we can no longer use man::use_tt here. */
dst_iter = dst_man->use_tt ?
ttm_kmap_iter_tt_init(&_dst_iter.tt, bo->ttm) :
ttm_kmap_iter_iomap_init(&_dst_iter.io, &dst_reg->iomap,
dst_st, dst_reg->region.start);
ret = i915_ttm_accel_move(bo, dst_mem, dst_st);
if (ret) {
/* If we start mapping GGTT, we can no longer use man::use_tt here. */
dst_iter = !cpu_maps_iomem(dst_mem) ?
ttm_kmap_iter_tt_init(&_dst_iter.tt, bo->ttm) :
ttm_kmap_iter_iomap_init(&_dst_iter.io, &dst_reg->iomap,
dst_st, dst_reg->region.start);
src_iter = src_man->use_tt ?
ttm_kmap_iter_tt_init(&_src_iter.tt, bo->ttm) :
ttm_kmap_iter_iomap_init(&_src_iter.io, &src_reg->iomap,
obj->ttm.cached_io_st,
src_reg->region.start);
src_iter = !cpu_maps_iomem(bo->resource) ?
ttm_kmap_iter_tt_init(&_src_iter.tt, bo->ttm) :
ttm_kmap_iter_iomap_init(&_src_iter.io, &src_reg->iomap,
obj->ttm.cached_io_st,
src_reg->region.start);
ttm_move_memcpy(bo, dst_mem->num_pages, dst_iter, src_iter);
ttm_move_memcpy(bo, dst_mem->num_pages, dst_iter, src_iter);
}
/* Below dst_mem becomes bo->resource. */
ttm_bo_move_sync_cleanup(bo, dst_mem);
i915_ttm_adjust_domains_after_move(obj);
i915_ttm_free_cached_io_st(obj);
if (!dst_man->use_tt) {
if (gpu_binds_iomem(dst_mem) || cpu_maps_iomem(dst_mem)) {
obj->ttm.cached_io_st = dst_st;
obj->ttm.get_io_page.sg_pos = dst_st->sgl;
obj->ttm.get_io_page.sg_idx = 0;
}
i915_ttm_adjust_gem_after_move(obj);
return 0;
}
static int i915_ttm_io_mem_reserve(struct ttm_device *bdev, struct ttm_resource *mem)
{
if (mem->mem_type < I915_PL_LMEM0)
if (!cpu_maps_iomem(mem))
return 0;
mem->bus.caching = ttm_write_combined;
@ -378,7 +589,7 @@ static unsigned long i915_ttm_io_mem_pfn(struct ttm_buffer_object *bo,
GEM_WARN_ON(bo->ttm);
sg = __i915_gem_object_get_sg(obj, &obj->ttm.get_io_page, page_offset, &ofs, true, true);
sg = __i915_gem_object_get_sg(obj, &obj->ttm.get_io_page, page_offset, &ofs, true);
return ((base + sg_dma_address(sg)) >> PAGE_SHIFT) + ofs;
}
@ -406,7 +617,8 @@ struct ttm_device_funcs *i915_ttm_driver(void)
return &i915_ttm_bo_driver;
}
static int i915_ttm_get_pages(struct drm_i915_gem_object *obj)
static int __i915_ttm_get_pages(struct drm_i915_gem_object *obj,
struct ttm_placement *placement)
{
struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
struct ttm_operation_ctx ctx = {
@ -414,25 +626,111 @@ static int i915_ttm_get_pages(struct drm_i915_gem_object *obj)
.no_wait_gpu = false,
};
struct sg_table *st;
int real_num_busy;
int ret;
/* Move to the requested placement. */
ret = ttm_bo_validate(bo, &i915_lmem0_placement, &ctx);
if (ret)
return ret == -ENOSPC ? -ENXIO : ret;
/* First try only the requested placement. No eviction. */
real_num_busy = fetch_and_zero(&placement->num_busy_placement);
ret = ttm_bo_validate(bo, placement, &ctx);
if (ret) {
ret = i915_ttm_err_to_gem(ret);
/*
* Anything that wants to restart the operation gets to
* do that.
*/
if (ret == -EDEADLK || ret == -EINTR || ret == -ERESTARTSYS ||
ret == -EAGAIN)
return ret;
/* Object either has a page vector or is an iomem object */
st = bo->ttm ? i915_ttm_tt_get_st(bo->ttm) : obj->ttm.cached_io_st;
if (IS_ERR(st))
return PTR_ERR(st);
__i915_gem_object_set_pages(obj, st, i915_sg_dma_sizes(st->sgl));
/*
* If the initial attempt fails, allow all accepted placements,
* evicting if necessary.
*/
placement->num_busy_placement = real_num_busy;
ret = ttm_bo_validate(bo, placement, &ctx);
if (ret)
return i915_ttm_err_to_gem(ret);
}
i915_ttm_adjust_lru(obj);
if (bo->ttm && !ttm_tt_is_populated(bo->ttm)) {
ret = ttm_tt_populate(bo->bdev, bo->ttm, &ctx);
if (ret)
return ret;
i915_ttm_adjust_domains_after_move(obj);
i915_ttm_adjust_gem_after_move(obj);
}
if (!i915_gem_object_has_pages(obj)) {
/* Object either has a page vector or is an iomem object */
st = bo->ttm ? i915_ttm_tt_get_st(bo->ttm) : obj->ttm.cached_io_st;
if (IS_ERR(st))
return PTR_ERR(st);
__i915_gem_object_set_pages(obj, st, i915_sg_dma_sizes(st->sgl));
}
return ret;
}
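
Note (illustrative, not part of the patch): the retry logic in __i915_ttm_get_pages() above can be modelled in plain userspace C. Everything in this sketch is a hypothetical stand-in. `struct placement_sketch`, `struct fake_state`, `fake_validate()` and `validate_two_pass()` are invented names, and the callback takes the place of ttm_bo_validate(). -ERESTARTSYS is omitted because it is kernel-internal, and unlike the kernel code the sketch restores the busy count unconditionally.

```c
#include <errno.h>

/* Hypothetical model of a placement list; num_busy mirrors num_busy_placement. */
struct placement_sketch {
	unsigned int num_busy;
};

/* Hypothetical callback standing in for ttm_bo_validate(). */
typedef int (*validate_fn)(const struct placement_sketch *p, void *data);

/* Errors that ask the caller to restart the whole operation. */
static int is_restart_error(int err)
{
	return err == -EDEADLK || err == -EINTR || err == -EAGAIN;
}

/*
 * Pass 1 hides the busy placements so nothing is evicted.
 * Pass 2 runs only if pass 1 failed with a non-restart error,
 * and allows every accepted placement, evicting if necessary.
 */
int validate_two_pass(struct placement_sketch *p, validate_fn validate,
		      void *data)
{
	const unsigned int real_num_busy = p->num_busy;
	int ret;

	p->num_busy = 0;
	ret = validate(p, data);
	p->num_busy = real_num_busy;
	if (!ret || is_restart_error(ret))
		return ret;

	return validate(p, data);
}

/* Hypothetical backend used only to exercise the sketch. */
struct fake_state {
	int needs_eviction;	/* preferred placement is full */
	int first_error;	/* error reported while eviction is forbidden */
	int calls;		/* number of validate attempts */
};

int fake_validate(const struct placement_sketch *p, void *data)
{
	struct fake_state *s = data;

	s->calls++;
	if (s->needs_eviction && p->num_busy == 0)
		return s->first_error;
	return 0;
}
```

With `fake_validate()` as the backend, an object that fits without eviction costs one validate call, one that needs eviction costs two, and a restart error such as -EINTR is returned after the first call without a second attempt.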
static int i915_ttm_get_pages(struct drm_i915_gem_object *obj)
{
struct ttm_place requested, busy[I915_TTM_MAX_PLACEMENTS];
struct ttm_placement placement;
GEM_BUG_ON(obj->mm.n_placements > I915_TTM_MAX_PLACEMENTS);
/* Move to the requested placement. */
i915_ttm_placement_from_obj(obj, &requested, busy, &placement);
return __i915_ttm_get_pages(obj, &placement);
}
/**
* DOC: Migration vs eviction
*
* GEM migration may not be the same as TTM migration / eviction. If
* the TTM core decides to evict an object it may be evicted to a
* TTM memory type that is not in the object's allowable GEM regions, or
* in fact theoretically to a TTM memory type that doesn't correspond to
* a GEM memory region. In that case the object's GEM region is not
* updated, and the data is migrated back to the GEM region at
* get_pages time. TTM may however set up CPU ptes to the object even
* when it is evicted.
* GEM forced migration using the i915_ttm_migrate() op is allowed even
* to regions that are not in the object's list of allowable placements.
*/
static int i915_ttm_migrate(struct drm_i915_gem_object *obj,
struct intel_memory_region *mr)
{
struct ttm_place requested;
struct ttm_placement placement;
int ret;
i915_ttm_place_from_region(mr, &requested, obj->flags);
placement.num_placement = 1;
placement.num_busy_placement = 1;
placement.placement = &requested;
placement.busy_placement = &requested;
ret = __i915_ttm_get_pages(obj, &placement);
if (ret)
return ret;
/*
* Reinitialize the region bindings. This is primarily
* required for objects where the new region is not in
* its allowable placements.
*/
if (obj->mm.region != mr) {
i915_gem_object_release_memory_region(obj);
i915_gem_object_init_memory_region(obj, mr);
}
return 0;
}
static void i915_ttm_put_pages(struct drm_i915_gem_object *obj,
struct sg_table *st)
{
@ -561,15 +859,15 @@ static u64 i915_ttm_mmap_offset(struct drm_i915_gem_object *obj)
return drm_vma_node_offset_addr(&obj->base.vma_node);
}
const struct drm_i915_gem_object_ops i915_gem_ttm_obj_ops = {
static const struct drm_i915_gem_object_ops i915_gem_ttm_obj_ops = {
.name = "i915_gem_object_ttm",
.flags = I915_GEM_OBJECT_HAS_IOMEM,
.get_pages = i915_ttm_get_pages,
.put_pages = i915_ttm_put_pages,
.truncate = i915_ttm_purge,
.adjust_lru = i915_ttm_adjust_lru,
.delayed_free = i915_ttm_delayed_free,
.migrate = i915_ttm_migrate,
.mmap_offset = i915_ttm_mmap_offset,
.mmap_ops = &vm_ops_ttm,
};
@ -596,37 +894,32 @@ void i915_ttm_bo_destroy(struct ttm_buffer_object *bo)
int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
struct drm_i915_gem_object *obj,
resource_size_t size,
resource_size_t page_size,
unsigned int flags)
{
static struct lock_class_key lock_class;
struct drm_i915_private *i915 = mem->i915;
struct ttm_operation_ctx ctx = {
.interruptible = true,
.no_wait_gpu = false,
};
enum ttm_bo_type bo_type;
size_t alignment = 0;
int ret;
/* Adjust alignment to GPU- and CPU huge page sizes. */
if (mem->is_range_manager) {
if (size >= SZ_1G)
alignment = SZ_1G >> PAGE_SHIFT;
else if (size >= SZ_2M)
alignment = SZ_2M >> PAGE_SHIFT;
else if (size >= SZ_64K)
alignment = SZ_64K >> PAGE_SHIFT;
}
drm_gem_private_object_init(&i915->drm, &obj->base, size);
i915_gem_object_init(obj, &i915_gem_ttm_obj_ops, &lock_class, flags);
i915_gem_object_init_memory_region(obj, mem);
i915_gem_object_make_unshrinkable(obj);
obj->read_domains = I915_GEM_DOMAIN_WC | I915_GEM_DOMAIN_GTT;
i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
INIT_RADIX_TREE(&obj->ttm.get_io_page.radix, GFP_KERNEL | __GFP_NOWARN);
mutex_init(&obj->ttm.get_io_page.lock);
bo_type = (obj->flags & I915_BO_ALLOC_USER) ? ttm_bo_type_device :
ttm_bo_type_kernel;
obj->base.vma_node.driver_private = i915_gem_to_ttm(obj);
/* Forcing the page size is kernel internal only */
GEM_BUG_ON(page_size && obj->mm.n_placements);
/*
* If this function fails, it will call the destructor, but
* our caller still owns the object. So no freeing in the
@ -634,14 +927,39 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
* Similarly, in delayed_destroy, we can't call ttm_bo_put()
* until successful initialization.
*/
obj->base.vma_node.driver_private = i915_gem_to_ttm(obj);
ret = ttm_bo_init(&i915->bdev, i915_gem_to_ttm(obj), size,
bo_type, &i915_sys_placement, alignment,
true, NULL, NULL, i915_ttm_bo_destroy);
ret = ttm_bo_init_reserved(&i915->bdev, i915_gem_to_ttm(obj), size,
bo_type, &i915_sys_placement,
page_size >> PAGE_SHIFT,
&ctx, NULL, NULL, i915_ttm_bo_destroy);
if (ret)
return i915_ttm_err_to_gem(ret);
if (!ret)
obj->ttm.created = true;
obj->ttm.created = true;
i915_ttm_adjust_domains_after_move(obj);
i915_ttm_adjust_gem_after_move(obj);
i915_gem_object_unlock(obj);
/* i915 wants -ENXIO when out of memory region space. */
return (ret == -ENOSPC) ? -ENXIO : ret;
return 0;
}
static const struct intel_memory_region_ops ttm_system_region_ops = {
.init_object = __i915_gem_ttm_object_init,
};
struct intel_memory_region *
i915_gem_ttm_system_setup(struct drm_i915_private *i915,
u16 type, u16 instance)
{
struct intel_memory_region *mr;
mr = intel_memory_region_create(i915, 0,
totalram_pages() << PAGE_SHIFT,
PAGE_SIZE, 0,
type, instance,
&ttm_system_region_ops);
if (IS_ERR(mr))
return mr;
intel_memory_region_set_name(mr, "system-ttm");
return mr;
}

@ -44,5 +44,6 @@ i915_ttm_to_gem(struct ttm_buffer_object *bo)
int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
struct drm_i915_gem_object *obj,
resource_size_t size,
resource_size_t page_size,
unsigned int flags);
#endif

@ -67,11 +67,11 @@ static bool i915_gem_userptr_invalidate(struct mmu_interval_notifier *mni,
if (!mmu_notifier_range_blockable(range))
return false;
spin_lock(&i915->mm.notifier_lock);
write_lock(&i915->mm.notifier_lock);
mmu_interval_set_seq(mni, cur_seq);
spin_unlock(&i915->mm.notifier_lock);
write_unlock(&i915->mm.notifier_lock);
/*
* We don't wait when the process is exiting. This is valid
@ -107,16 +107,15 @@ i915_gem_userptr_init__mmu_notifier(struct drm_i915_gem_object *obj)
static void i915_gem_object_userptr_drop_ref(struct drm_i915_gem_object *obj)
{
struct drm_i915_private *i915 = to_i915(obj->base.dev);
struct page **pvec = NULL;
spin_lock(&i915->mm.notifier_lock);
assert_object_held_shared(obj);
if (!--obj->userptr.page_ref) {
pvec = obj->userptr.pvec;
obj->userptr.pvec = NULL;
}
GEM_BUG_ON(obj->userptr.page_ref < 0);
spin_unlock(&i915->mm.notifier_lock);
if (pvec) {
const unsigned long num_pages = obj->base.size >> PAGE_SHIFT;
@ -128,7 +127,6 @@ static void i915_gem_object_userptr_drop_ref(struct drm_i915_gem_object *obj)
static int i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj)
{
struct drm_i915_private *i915 = to_i915(obj->base.dev);
const unsigned long num_pages = obj->base.size >> PAGE_SHIFT;
unsigned int max_segment = i915_sg_segment_size();
struct sg_table *st;
@ -141,16 +139,13 @@ static int i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj)
if (!st)
return -ENOMEM;
spin_lock(&i915->mm.notifier_lock);
if (GEM_WARN_ON(!obj->userptr.page_ref)) {
spin_unlock(&i915->mm.notifier_lock);
ret = -EFAULT;
if (!obj->userptr.page_ref) {
ret = -EAGAIN;
goto err_free;
}
obj->userptr.page_ref++;
pvec = obj->userptr.pvec;
spin_unlock(&i915->mm.notifier_lock);
alloc_table:
sg = __sg_alloc_table_from_pages(st, pvec, num_pages, 0,
@ -241,7 +236,7 @@ i915_gem_userptr_put_pages(struct drm_i915_gem_object *obj,
i915_gem_object_userptr_drop_ref(obj);
}
static int i915_gem_object_userptr_unbind(struct drm_i915_gem_object *obj, bool get_pages)
static int i915_gem_object_userptr_unbind(struct drm_i915_gem_object *obj)
{
struct sg_table *pages;
int err;
@ -259,15 +254,11 @@ static int i915_gem_object_userptr_unbind(struct drm_i915_gem_object *obj, bool
if (!IS_ERR_OR_NULL(pages))
i915_gem_userptr_put_pages(obj, pages);
if (get_pages)
err = ____i915_gem_object_get_pages(obj);
return err;
}
int i915_gem_object_userptr_submit_init(struct drm_i915_gem_object *obj)
{
struct drm_i915_private *i915 = to_i915(obj->base.dev);
const unsigned long num_pages = obj->base.size >> PAGE_SHIFT;
struct page **pvec;
unsigned int gup_flags = 0;
@ -277,38 +268,21 @@ int i915_gem_object_userptr_submit_init(struct drm_i915_gem_object *obj)
if (obj->userptr.notifier.mm != current->mm)
return -EFAULT;
notifier_seq = mmu_interval_read_begin(&obj->userptr.notifier);
ret = i915_gem_object_lock_interruptible(obj, NULL);
if (ret)
return ret;
/* optimistically try to preserve current pages while unlocked */
if (i915_gem_object_has_pages(obj) &&
!mmu_interval_check_retry(&obj->userptr.notifier,
obj->userptr.notifier_seq)) {
spin_lock(&i915->mm.notifier_lock);
if (obj->userptr.pvec &&
!mmu_interval_read_retry(&obj->userptr.notifier,
obj->userptr.notifier_seq)) {
obj->userptr.page_ref++;
/* We can keep using the current binding, this is the fastpath */
ret = 1;
}
spin_unlock(&i915->mm.notifier_lock);
}
if (!ret) {
/* Make sure userptr is unbound for next attempt, so we don't use stale pages. */
ret = i915_gem_object_userptr_unbind(obj, false);
}
i915_gem_object_unlock(obj);
if (ret < 0)
return ret;
if (ret > 0)
if (notifier_seq == obj->userptr.notifier_seq && obj->userptr.pvec) {
i915_gem_object_unlock(obj);
return 0;
}
notifier_seq = mmu_interval_read_begin(&obj->userptr.notifier);
ret = i915_gem_object_userptr_unbind(obj);
i915_gem_object_unlock(obj);
if (ret)
return ret;
pvec = kvmalloc_array(num_pages, sizeof(struct page *), GFP_KERNEL);
if (!pvec)
@ -329,7 +303,9 @@ int i915_gem_object_userptr_submit_init(struct drm_i915_gem_object *obj)
}
ret = 0;
spin_lock(&i915->mm.notifier_lock);
ret = i915_gem_object_lock_interruptible(obj, NULL);
if (ret)
goto out;
if (mmu_interval_read_retry(&obj->userptr.notifier,
!obj->userptr.page_ref ? notifier_seq :
@ -341,12 +317,14 @@ int i915_gem_object_userptr_submit_init(struct drm_i915_gem_object *obj)
if (!obj->userptr.page_ref++) {
obj->userptr.pvec = pvec;
obj->userptr.notifier_seq = notifier_seq;
pvec = NULL;
ret = ____i915_gem_object_get_pages(obj);
}
obj->userptr.page_ref--;
out_unlock:
spin_unlock(&i915->mm.notifier_lock);
i915_gem_object_unlock(obj);
out:
if (pvec) {
@ -369,11 +347,6 @@ int i915_gem_object_userptr_submit_done(struct drm_i915_gem_object *obj)
return 0;
}
void i915_gem_object_userptr_submit_fini(struct drm_i915_gem_object *obj)
{
i915_gem_object_userptr_drop_ref(obj);
}
int i915_gem_object_userptr_validate(struct drm_i915_gem_object *obj)
{
int err;
@ -396,7 +369,6 @@ int i915_gem_object_userptr_validate(struct drm_i915_gem_object *obj)
i915_gem_object_unlock(obj);
}
i915_gem_object_userptr_submit_fini(obj);
return err;
}
@ -450,6 +422,34 @@ static const struct drm_i915_gem_object_ops i915_gem_userptr_ops = {
#endif
static int
probe_range(struct mm_struct *mm, unsigned long addr, unsigned long len)
{
const unsigned long end = addr + len;
struct vm_area_struct *vma;
int ret = -EFAULT;
mmap_read_lock(mm);
for (vma = find_vma(mm, addr); vma; vma = vma->vm_next) {
/* Check for holes, note that we also update the addr below */
if (vma->vm_start > addr)
break;
if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
break;
if (vma->vm_end >= end) {
ret = 0;
break;
}
addr = vma->vm_end;
}
mmap_read_unlock(mm);
return ret;
}
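
Note (illustrative, not part of the patch): the hole check that probe_range() above performs over the VMA list can be sketched in userspace C over a sorted array. The names `struct range_sketch` and `probe_range_sketch()` are hypothetical, the `is_io` flag stands in for VM_PFNMAP | VM_MIXEDMAP, and the initial skip loop is an assumed model of find_vma() (first entry whose end lies above the address).

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA covering [start, end). */
struct range_sketch {
	unsigned long start;
	unsigned long end;
	int is_io;	/* models VM_PFNMAP | VM_MIXEDMAP */
};

/*
 * Return 0 if [addr, addr + len) is fully covered by contiguous,
 * non-IO ranges, -EFAULT otherwise. The array must be sorted by
 * start address and must not contain overlapping entries.
 */
int probe_range_sketch(const struct range_sketch *r, size_t n,
		       unsigned long addr, unsigned long len)
{
	const unsigned long end = addr + len;
	size_t i = 0;

	/* Model of find_vma(): first range that ends above addr. */
	while (i < n && r[i].end <= addr)
		i++;

	for (; i < n; i++) {
		/* A range starting above addr means there is a hole. */
		if (r[i].start > addr)
			break;
		if (r[i].is_io)
			break;
		if (r[i].end >= end)
			return 0;
		/* Continue the walk from the end of this range. */
		addr = r[i].end;
	}

	return -EFAULT;
}
```

As in the patch, the result only describes the address space at the moment of the walk, so a later unmap or remap can still invalidate it.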
/*
* Creates a new mm object that wraps some normal memory from the process
* context - user memory.
@ -505,7 +505,8 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
}
if (args->flags & ~(I915_USERPTR_READ_ONLY |
I915_USERPTR_UNSYNCHRONIZED))
I915_USERPTR_UNSYNCHRONIZED |
I915_USERPTR_PROBE))
return -EINVAL;
if (i915_gem_object_size_2big(args->user_size))
@ -532,14 +533,24 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
return -ENODEV;
}
if (args->flags & I915_USERPTR_PROBE) {
/*
* Check that the range pointed to represents real struct
* pages and not iomappings (at this moment in time!)
*/
ret = probe_range(current->mm, args->user_ptr, args->user_size);
if (ret)
return ret;
}
#ifdef CONFIG_MMU_NOTIFIER
obj = i915_gem_object_alloc();
if (obj == NULL)
return -ENOMEM;
drm_gem_private_object_init(dev, &obj->base, args->user_size);
i915_gem_object_init(obj, &i915_gem_userptr_ops, &lock_class,
I915_BO_ALLOC_STRUCT_PAGE);
i915_gem_object_init(obj, &i915_gem_userptr_ops, &lock_class, 0);
obj->mem_flags = I915_BO_FLAG_STRUCT_PAGE;
obj->read_domains = I915_GEM_DOMAIN_CPU;
obj->write_domain = I915_GEM_DOMAIN_CPU;
i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
@ -572,7 +583,7 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
int i915_gem_init_userptr(struct drm_i915_private *dev_priv)
{
#ifdef CONFIG_MMU_NOTIFIER
spin_lock_init(&dev_priv->mm.notifier_lock);
rwlock_init(&dev_priv->mm.notifier_lock);
#endif
return 0;

@ -104,8 +104,8 @@ static void fence_set_priority(struct dma_fence *fence,
engine = rq->engine;
rcu_read_lock(); /* RCU serialisation for set-wedged protection */
if (engine->schedule)
engine->schedule(rq, attr);
if (engine->sched_engine->schedule)
engine->sched_engine->schedule(rq, attr);
rcu_read_unlock();
}
@ -290,3 +290,22 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
i915_gem_object_put(obj);
return ret;
}
/**
* i915_gem_object_wait_migration - Sync an accelerated migration operation
* @obj: The migrating object.
* @flags: waiting flags. Currently supports only I915_WAIT_INTERRUPTIBLE.
*
* Wait for any pending async migration operation on the object,
* whether it's explicitly (i915_gem_object_migrate()) or implicitly
* (swapin, initial clearing) initiated.
*
* Return: 0 if successful, -ERESTARTSYS if a signal was hit during waiting.
*/
int i915_gem_object_wait_migration(struct drm_i915_gem_object *obj,
unsigned int flags)
{
might_sleep();
/* NOP for now. */
return 0;
}

@ -114,8 +114,8 @@ huge_gem_object(struct drm_i915_private *i915,
return ERR_PTR(-ENOMEM);
drm_gem_private_object_init(&i915->drm, &obj->base, dma_size);
i915_gem_object_init(obj, &huge_ops, &lock_class,
I915_BO_ALLOC_STRUCT_PAGE);
i915_gem_object_init(obj, &huge_ops, &lock_class, 0);
obj->mem_flags |= I915_BO_FLAG_STRUCT_PAGE;
obj->read_domains = I915_GEM_DOMAIN_CPU;
obj->write_domain = I915_GEM_DOMAIN_CPU;

@ -167,9 +167,8 @@ huge_pages_object(struct drm_i915_private *i915,
return ERR_PTR(-ENOMEM);
drm_gem_private_object_init(&i915->drm, &obj->base, size);
i915_gem_object_init(obj, &huge_page_ops, &lock_class,
I915_BO_ALLOC_STRUCT_PAGE);
i915_gem_object_init(obj, &huge_page_ops, &lock_class, 0);
obj->mem_flags |= I915_BO_FLAG_STRUCT_PAGE;
i915_gem_object_set_volatile(obj);
obj->write_domain = I915_GEM_DOMAIN_CPU;
@ -497,7 +496,8 @@ static int igt_mock_memory_region_huge_pages(void *arg)
int i;
for (i = 0; i < ARRAY_SIZE(flags); ++i) {
obj = i915_gem_object_create_region(mem, page_size,
obj = i915_gem_object_create_region(mem,
page_size, page_size,
flags[i]);
if (IS_ERR(obj)) {
err = PTR_ERR(obj);

@ -5,6 +5,7 @@
#include "i915_selftest.h"
#include "gt/intel_context.h"
#include "gt/intel_engine_user.h"
#include "gt/intel_gt.h"
#include "gt/intel_gpu_commands.h"
@ -16,118 +17,6 @@
#include "huge_gem_object.h"
#include "mock_context.h"
static int __igt_client_fill(struct intel_engine_cs *engine)
{
struct intel_context *ce = engine->kernel_context;
struct drm_i915_gem_object *obj;
I915_RND_STATE(prng);
IGT_TIMEOUT(end);
u32 *vaddr;
int err = 0;
intel_engine_pm_get(engine);
do {
const u32 max_block_size = S16_MAX * PAGE_SIZE;
u32 sz = min_t(u64, ce->vm->total >> 4, prandom_u32_state(&prng));
u32 phys_sz = sz % (max_block_size + 1);
u32 val = prandom_u32_state(&prng);
u32 i;
sz = round_up(sz, PAGE_SIZE);
phys_sz = round_up(phys_sz, PAGE_SIZE);
pr_debug("%s with phys_sz= %x, sz=%x, val=%x\n", __func__,
phys_sz, sz, val);
obj = huge_gem_object(engine->i915, phys_sz, sz);
if (IS_ERR(obj)) {
err = PTR_ERR(obj);
goto err_flush;
}
vaddr = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WB);
if (IS_ERR(vaddr)) {
err = PTR_ERR(vaddr);
goto err_put;
}
/*
* XXX: The goal is to move this to get_pages, so try to dirty the
* CPU cache first to check that we do the required clflush
* before scheduling the blt for !llc platforms. This matches
* some version of reality where at get_pages the pages
* themselves may not yet be coherent with the GPU(swap-in). If
* we are missing the flush then we should see the stale cache
* values after we do the set_to_cpu_domain and pick it up as a
* test failure.
*/
memset32(vaddr, val ^ 0xdeadbeaf,
huge_gem_object_phys_size(obj) / sizeof(u32));
if (!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE))
obj->cache_dirty = true;
err = i915_gem_schedule_fill_pages_blt(obj, ce, obj->mm.pages,
&obj->mm.page_sizes,
val);
if (err)
goto err_unpin;
i915_gem_object_lock(obj, NULL);
err = i915_gem_object_set_to_cpu_domain(obj, false);
i915_gem_object_unlock(obj);
if (err)
goto err_unpin;
for (i = 0; i < huge_gem_object_phys_size(obj) / sizeof(u32); ++i) {
if (vaddr[i] != val) {
pr_err("vaddr[%u]=%x, expected=%x\n", i,
vaddr[i], val);
err = -EINVAL;
goto err_unpin;
}
}
i915_gem_object_unpin_map(obj);
i915_gem_object_put(obj);
} while (!time_after(jiffies, end));
goto err_flush;
err_unpin:
i915_gem_object_unpin_map(obj);
err_put:
i915_gem_object_put(obj);
err_flush:
if (err == -ENOMEM)
err = 0;
intel_engine_pm_put(engine);
return err;
}
static int igt_client_fill(void *arg)
{
int inst = 0;
do {
struct intel_engine_cs *engine;
int err;
engine = intel_engine_lookup_user(arg,
I915_ENGINE_CLASS_COPY,
inst++);
if (!engine)
return 0;
err = __igt_client_fill(engine);
if (err == -ENOMEM)
err = 0;
if (err)
return err;
} while (1);
}
#define WIDTH 512
#define HEIGHT 32
@ -693,7 +582,6 @@ static int igt_client_tiled_blits(void *arg)
int i915_gem_client_blt_live_selftests(struct drm_i915_private *i915)
{
static const struct i915_subtest tests[] = {
SUBTEST(igt_client_fill),
SUBTEST(igt_client_tiled_blits),
};

@ -680,7 +680,7 @@ static int igt_ctx_exec(void *arg)
struct i915_gem_context *ctx;
struct intel_context *ce;
ctx = kernel_context(i915);
ctx = kernel_context(i915, NULL);
if (IS_ERR(ctx)) {
err = PTR_ERR(ctx);
goto out_file;
@ -813,16 +813,12 @@ static int igt_shared_ctx_exec(void *arg)
struct i915_gem_context *ctx;
struct intel_context *ce;
ctx = kernel_context(i915);
ctx = kernel_context(i915, ctx_vm(parent));
if (IS_ERR(ctx)) {
err = PTR_ERR(ctx);
goto out_test;
}
mutex_lock(&ctx->mutex);
__assign_ppgtt(ctx, ctx_vm(parent));
mutex_unlock(&ctx->mutex);
ce = i915_gem_context_get_engine(ctx, engine->legacy_idx);
GEM_BUG_ON(IS_ERR(ce));
@ -1875,125 +1871,6 @@ out_file:
return err;
}
static bool skip_unused_engines(struct intel_context *ce, void *data)
{
return !ce->state;
}
static void mock_barrier_task(void *data)
{
unsigned int *counter = data;
++*counter;
}
static int mock_context_barrier(void *arg)
{
#undef pr_fmt
#define pr_fmt(x) "context_barrier_task():" # x
struct drm_i915_private *i915 = arg;
struct i915_gem_context *ctx;
struct i915_request *rq;
unsigned int counter;
int err;
/*
* The context barrier provides us with a callback after it emits
* a request; useful for retiring old state after loading new.
*/
ctx = mock_context(i915, "mock");
if (!ctx)
return -ENOMEM;
counter = 0;
err = context_barrier_task(ctx, 0, NULL, NULL, NULL,
mock_barrier_task, &counter);
if (err) {
pr_err("Failed at line %d, err=%d\n", __LINE__, err);
goto out;
}
if (counter == 0) {
pr_err("Did not retire immediately with 0 engines\n");
err = -EINVAL;
goto out;
}
counter = 0;
err = context_barrier_task(ctx, ALL_ENGINES, skip_unused_engines,
NULL, NULL, mock_barrier_task, &counter);
if (err) {
pr_err("Failed at line %d, err=%d\n", __LINE__, err);
goto out;
}
if (counter == 0) {
pr_err("Did not retire immediately for all unused engines\n");
err = -EINVAL;
goto out;
}
rq = igt_request_alloc(ctx, i915->gt.engine[RCS0]);
if (IS_ERR(rq)) {
pr_err("Request allocation failed!\n");
goto out;
}
i915_request_add(rq);
counter = 0;
context_barrier_inject_fault = BIT(RCS0);
err = context_barrier_task(ctx, ALL_ENGINES, NULL, NULL, NULL,
mock_barrier_task, &counter);
context_barrier_inject_fault = 0;
if (err == -ENXIO)
err = 0;
else
pr_err("Did not hit fault injection!\n");
if (counter != 0) {
pr_err("Invoked callback on error!\n");
err = -EIO;
}
if (err)
goto out;
counter = 0;
err = context_barrier_task(ctx, ALL_ENGINES, skip_unused_engines,
NULL, NULL, mock_barrier_task, &counter);
if (err) {
pr_err("Failed at line %d, err=%d\n", __LINE__, err);
goto out;
}
mock_device_flush(i915);
if (counter == 0) {
pr_err("Did not retire on each active engines\n");
err = -EINVAL;
goto out;
}
out:
mock_context_close(ctx);
return err;
#undef pr_fmt
#define pr_fmt(x) x
}
int i915_gem_context_mock_selftests(void)
{
static const struct i915_subtest tests[] = {
SUBTEST(mock_context_barrier),
};
struct drm_i915_private *i915;
int err;
i915 = mock_gem_device();
if (!i915)
return -ENOMEM;
err = i915_subtests(tests, i915);
mock_destroy_device(i915);
return err;
}
int i915_gem_context_live_selftests(struct drm_i915_private *i915)
{
static const struct i915_subtest tests[] = {

@ -35,7 +35,7 @@ static int igt_dmabuf_export(void *arg)
static int igt_dmabuf_import_self(void *arg)
{
struct drm_i915_private *i915 = arg;
struct drm_i915_gem_object *obj;
struct drm_i915_gem_object *obj, *import_obj;
struct drm_gem_object *import;
struct dma_buf *dmabuf;
int err;
@ -65,10 +65,19 @@ static int igt_dmabuf_import_self(void *arg)
err = -EINVAL;
goto out_import;
}
import_obj = to_intel_bo(import);
i915_gem_object_lock(import_obj, NULL);
err = __i915_gem_object_get_pages(import_obj);
i915_gem_object_unlock(import_obj);
if (err) {
pr_err("Same object dma-buf get_pages failed!\n");
goto out_import;
}
err = 0;
out_import:
i915_gem_object_put(to_intel_bo(import));
i915_gem_object_put(import_obj);
out_dmabuf:
dma_buf_put(dmabuf);
out:
@ -76,6 +85,180 @@ out:
return err;
}
static int igt_dmabuf_import_same_driver_lmem(void *arg)
{
struct drm_i915_private *i915 = arg;
struct intel_memory_region *lmem = i915->mm.regions[INTEL_REGION_LMEM];
struct drm_i915_gem_object *obj;
struct drm_gem_object *import;
struct dma_buf *dmabuf;
int err;
if (!lmem)
return 0;
force_different_devices = true;
obj = __i915_gem_object_create_user(i915, PAGE_SIZE, &lmem, 1);
if (IS_ERR(obj)) {
pr_err("__i915_gem_object_create_user failed with err=%ld\n",
PTR_ERR(obj));
err = PTR_ERR(obj);
goto out_ret;
}
dmabuf = i915_gem_prime_export(&obj->base, 0);
if (IS_ERR(dmabuf)) {
pr_err("i915_gem_prime_export failed with err=%ld\n",
PTR_ERR(dmabuf));
err = PTR_ERR(dmabuf);
goto out;
}
/*
* We expect an import of an LMEM-only object to fail with
* -EOPNOTSUPP because it can't be migrated to SMEM.
*/
import = i915_gem_prime_import(&i915->drm, dmabuf);
if (!IS_ERR(import)) {
drm_gem_object_put(import);
pr_err("i915_gem_prime_import succeeded when it shouldn't have\n");
err = -EINVAL;
} else if (PTR_ERR(import) != -EOPNOTSUPP) {
pr_err("i915_gem_prime_import failed with the wrong err=%ld\n",
PTR_ERR(import));
err = PTR_ERR(import);
} else {
/* -EOPNOTSUPP is the expected outcome, so report success. */
err = 0;
}
dma_buf_put(dmabuf);
out:
i915_gem_object_put(obj);
out_ret:
force_different_devices = false;
return err;
}
static int igt_dmabuf_import_same_driver(struct drm_i915_private *i915,
struct intel_memory_region **regions,
unsigned int num_regions)
{
struct drm_i915_gem_object *obj, *import_obj;
struct drm_gem_object *import;
struct dma_buf *dmabuf;
struct dma_buf_attachment *import_attach;
struct sg_table *st;
long timeout;
int err;
force_different_devices = true;
obj = __i915_gem_object_create_user(i915, PAGE_SIZE,
regions, num_regions);
if (IS_ERR(obj)) {
pr_err("__i915_gem_object_create_user failed with err=%ld\n",
PTR_ERR(obj));
err = PTR_ERR(obj);
goto out_ret;
}
dmabuf = i915_gem_prime_export(&obj->base, 0);
if (IS_ERR(dmabuf)) {
pr_err("i915_gem_prime_export failed with err=%ld\n",
PTR_ERR(dmabuf));
err = PTR_ERR(dmabuf);
goto out;
}
import = i915_gem_prime_import(&i915->drm, dmabuf);
if (IS_ERR(import)) {
pr_err("i915_gem_prime_import failed with err=%ld\n",
PTR_ERR(import));
err = PTR_ERR(import);
goto out_dmabuf;
}
if (import == &obj->base) {
pr_err("i915_gem_prime_import reused gem object!\n");
err = -EINVAL;
goto out_import;
}
import_obj = to_intel_bo(import);
i915_gem_object_lock(import_obj, NULL);
err = __i915_gem_object_get_pages(import_obj);
if (err) {
pr_err("Different objects dma-buf get_pages failed!\n");
i915_gem_object_unlock(import_obj);
goto out_import;
}
/*
* If the exported object is not in system memory, something
* weird is going on. TODO: When p2p is supported, this is no
* longer considered weird.
*/
if (obj->mm.region != i915->mm.regions[INTEL_REGION_SMEM]) {
pr_err("Exported dma-buf is not in system memory\n");
err = -EINVAL;
}
i915_gem_object_unlock(import_obj);
/* Now try to fake an importer */
import_attach = dma_buf_attach(dmabuf, obj->base.dev->dev);
if (IS_ERR(import_attach)) {
err = PTR_ERR(import_attach);
goto out_import;
}
st = dma_buf_map_attachment(import_attach, DMA_BIDIRECTIONAL);
if (IS_ERR(st)) {
err = PTR_ERR(st);
goto out_detach;
}
timeout = dma_resv_wait_timeout(dmabuf->resv, false, true, 5 * HZ);
if (!timeout) {
pr_err("dmabuf wait for exclusive fence timed out.\n");
timeout = -ETIME;
}
err = timeout > 0 ? 0 : timeout;
dma_buf_unmap_attachment(import_attach, st, DMA_BIDIRECTIONAL);
out_detach:
dma_buf_detach(dmabuf, import_attach);
out_import:
i915_gem_object_put(import_obj);
out_dmabuf:
dma_buf_put(dmabuf);
out:
i915_gem_object_put(obj);
out_ret:
force_different_devices = false;
return err;
}
static int igt_dmabuf_import_same_driver_smem(void *arg)
{
struct drm_i915_private *i915 = arg;
struct intel_memory_region *smem = i915->mm.regions[INTEL_REGION_SMEM];
return igt_dmabuf_import_same_driver(i915, &smem, 1);
}
static int igt_dmabuf_import_same_driver_lmem_smem(void *arg)
{
struct drm_i915_private *i915 = arg;
struct intel_memory_region *regions[2];
if (!i915->mm.regions[INTEL_REGION_LMEM])
return 0;
regions[0] = i915->mm.regions[INTEL_REGION_LMEM];
regions[1] = i915->mm.regions[INTEL_REGION_SMEM];
return igt_dmabuf_import_same_driver(i915, regions, 2);
}
static int igt_dmabuf_import(void *arg)
{
struct drm_i915_private *i915 = arg;
@ -286,6 +469,9 @@ int i915_gem_dmabuf_live_selftests(struct drm_i915_private *i915)
{
static const struct i915_subtest tests[] = {
SUBTEST(igt_dmabuf_export),
SUBTEST(igt_dmabuf_import_same_driver_lmem),
SUBTEST(igt_dmabuf_import_same_driver_smem),
SUBTEST(igt_dmabuf_import_same_driver_lmem_smem),
};
return i915_subtests(tests, i915);


@ -0,0 +1,243 @@
// SPDX-License-Identifier: MIT
/*
* Copyright © 2020-2021 Intel Corporation
*/
#include "gt/intel_migrate.h"
static int igt_fill_check_buffer(struct drm_i915_gem_object *obj,
bool fill)
{
struct drm_i915_private *i915 = to_i915(obj->base.dev);
unsigned int i, count = obj->base.size / sizeof(u32);
enum i915_map_type map_type =
i915_coherent_map_type(i915, obj, false);
u32 *cur;
int err = 0;
assert_object_held(obj);
cur = i915_gem_object_pin_map(obj, map_type);
if (IS_ERR(cur))
return PTR_ERR(cur);
if (fill)
for (i = 0; i < count; ++i)
*cur++ = i;
else
for (i = 0; i < count; ++i)
if (*cur++ != i) {
pr_err("Object content mismatch at location %d of %d\n", i, count);
err = -EINVAL;
break;
}
i915_gem_object_unpin_map(obj);
return err;
}
static int igt_create_migrate(struct intel_gt *gt, enum intel_region_id src,
enum intel_region_id dst)
{
struct drm_i915_private *i915 = gt->i915;
struct intel_memory_region *src_mr = i915->mm.regions[src];
struct drm_i915_gem_object *obj;
struct i915_gem_ww_ctx ww;
int err = 0;
GEM_BUG_ON(!src_mr);
/* Switch object backing-store on create */
obj = i915_gem_object_create_region(src_mr, PAGE_SIZE, 0, 0);
if (IS_ERR(obj))
return PTR_ERR(obj);
for_i915_gem_ww(&ww, err, true) {
err = i915_gem_object_lock(obj, &ww);
if (err)
continue;
err = igt_fill_check_buffer(obj, true);
if (err)
continue;
err = i915_gem_object_migrate(obj, &ww, dst);
if (err)
continue;
err = i915_gem_object_pin_pages(obj);
if (err)
continue;
if (i915_gem_object_can_migrate(obj, src))
err = -EINVAL;
i915_gem_object_unpin_pages(obj);
err = i915_gem_object_wait_migration(obj, true);
if (err)
continue;
err = igt_fill_check_buffer(obj, false);
}
i915_gem_object_put(obj);
return err;
}
static int igt_smem_create_migrate(void *arg)
{
return igt_create_migrate(arg, INTEL_REGION_LMEM, INTEL_REGION_SMEM);
}
static int igt_lmem_create_migrate(void *arg)
{
return igt_create_migrate(arg, INTEL_REGION_SMEM, INTEL_REGION_LMEM);
}
static int igt_same_create_migrate(void *arg)
{
return igt_create_migrate(arg, INTEL_REGION_LMEM, INTEL_REGION_LMEM);
}
static int lmem_pages_migrate_one(struct i915_gem_ww_ctx *ww,
struct drm_i915_gem_object *obj)
{
int err;
err = i915_gem_object_lock(obj, ww);
if (err)
return err;
if (i915_gem_object_is_lmem(obj)) {
err = i915_gem_object_migrate(obj, ww, INTEL_REGION_SMEM);
if (err) {
pr_err("Object failed migration to smem\n");
return err;
}
if (i915_gem_object_is_lmem(obj)) {
pr_err("object still backed by lmem\n");
err = -EINVAL;
}
if (!i915_gem_object_has_struct_page(obj)) {
pr_err("object not backed by struct page\n");
err = -EINVAL;
}
} else {
err = i915_gem_object_migrate(obj, ww, INTEL_REGION_LMEM);
if (err) {
pr_err("Object failed migration to lmem\n");
return err;
}
if (i915_gem_object_has_struct_page(obj)) {
pr_err("object still backed by struct page\n");
err = -EINVAL;
}
if (!i915_gem_object_is_lmem(obj)) {
pr_err("object not backed by lmem\n");
err = -EINVAL;
}
}
return err;
}
static int igt_lmem_pages_migrate(void *arg)
{
struct intel_gt *gt = arg;
struct drm_i915_private *i915 = gt->i915;
struct drm_i915_gem_object *obj;
struct i915_gem_ww_ctx ww;
struct i915_request *rq;
int err;
int i;
/* From LMEM to shmem and back again */
obj = i915_gem_object_create_lmem(i915, SZ_2M, 0);
if (IS_ERR(obj))
return PTR_ERR(obj);
/* Initial GPU fill, sync, CPU initialization. */
for_i915_gem_ww(&ww, err, true) {
err = i915_gem_object_lock(obj, &ww);
if (err)
continue;
err = ____i915_gem_object_get_pages(obj);
if (err)
continue;
err = intel_migrate_clear(&gt->migrate, &ww, NULL,
obj->mm.pages->sgl, obj->cache_level,
i915_gem_object_is_lmem(obj),
0xdeadbeaf, &rq);
if (rq) {
dma_resv_add_excl_fence(obj->base.resv, &rq->fence);
i915_request_put(rq);
}
if (err)
continue;
err = i915_gem_object_wait(obj, I915_WAIT_INTERRUPTIBLE,
5 * HZ);
if (err)
continue;
err = igt_fill_check_buffer(obj, true);
if (err)
continue;
}
if (err)
goto out_put;
/*
* Migrate to and from smem without explicitly syncing.
* Finalize with data in smem for fast readout.
*/
for (i = 1; i <= 5; ++i) {
for_i915_gem_ww(&ww, err, true)
err = lmem_pages_migrate_one(&ww, obj);
if (err)
goto out_put;
}
err = i915_gem_object_lock_interruptible(obj, NULL);
if (err)
goto out_put;
/* Finally sync migration and check content. */
err = i915_gem_object_wait_migration(obj, true);
if (err)
goto out_unlock;
err = igt_fill_check_buffer(obj, false);
out_unlock:
i915_gem_object_unlock(obj);
out_put:
i915_gem_object_put(obj);
return err;
}
int i915_gem_migrate_live_selftests(struct drm_i915_private *i915)
{
static const struct i915_subtest tests[] = {
SUBTEST(igt_smem_create_migrate),
SUBTEST(igt_lmem_create_migrate),
SUBTEST(igt_same_create_migrate),
SUBTEST(igt_lmem_pages_migrate),
};
if (!HAS_LMEM(i915))
return 0;
return intel_gt_live_subtests(tests, &i915->gt);
}
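The migration selftest above relies on a simple fill-then-verify pattern: igt_fill_check_buffer() writes each dword's index into the object, and after migration checks that every dword still holds its index. A minimal userspace sketch of that idea follows. The helper names are hypothetical and a plain array stands in for the pinned object mapping.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write each element's own index into the buffer. */
static void fill_pattern(uint32_t *buf, size_t count)
{
	for (size_t i = 0; i < count; i++)
		buf[i] = (uint32_t)i;
}

/* Hypothetical helper: 0 if the pattern is intact, -EINVAL on first mismatch. */
static int check_pattern(const uint32_t *buf, size_t count)
{
	for (size_t i = 0; i < count; i++)
		if (buf[i] != (uint32_t)i)
			return -EINVAL;
	return 0;
}
```

Because the expected value is derived from the position, a copy that drops, duplicates or reorders pages shows up as a mismatch without keeping a reference buffer.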


@ -573,6 +573,14 @@ err:
return 0;
}
static enum i915_mmap_type default_mapping(struct drm_i915_private *i915)
{
if (HAS_LMEM(i915))
return I915_MMAP_TYPE_FIXED;
return I915_MMAP_TYPE_GTT;
}
static bool assert_mmap_offset(struct drm_i915_private *i915,
unsigned long size,
int expected)
@ -585,7 +593,7 @@ static bool assert_mmap_offset(struct drm_i915_private *i915,
if (IS_ERR(obj))
return expected && expected == PTR_ERR(obj);
ret = __assign_mmap_offset(obj, I915_MMAP_TYPE_GTT, &offset, NULL);
ret = __assign_mmap_offset(obj, default_mapping(i915), &offset, NULL);
i915_gem_object_put(obj);
return ret == expected;
@ -689,7 +697,7 @@ static int igt_mmap_offset_exhaustion(void *arg)
goto out;
}
err = __assign_mmap_offset(obj, I915_MMAP_TYPE_GTT, &offset, NULL);
err = __assign_mmap_offset(obj, default_mapping(i915), &offset, NULL);
if (err) {
pr_err("Unable to insert object into reclaimed hole\n");
goto err_obj;
@ -831,34 +839,25 @@ static int wc_check(struct drm_i915_gem_object *obj)
static bool can_mmap(struct drm_i915_gem_object *obj, enum i915_mmap_type type)
{
struct drm_i915_private *i915 = to_i915(obj->base.dev);
bool no_map;
if (HAS_LMEM(i915))
return type == I915_MMAP_TYPE_FIXED;
else if (type == I915_MMAP_TYPE_FIXED)
return false;
if (type == I915_MMAP_TYPE_GTT &&
!i915_ggtt_has_aperture(&to_i915(obj->base.dev)->ggtt))
return false;
if (type != I915_MMAP_TYPE_GTT &&
!i915_gem_object_has_struct_page(obj) &&
!i915_gem_object_type_has(obj, I915_GEM_OBJECT_HAS_IOMEM))
return false;
i915_gem_object_lock(obj, NULL);
no_map = (type != I915_MMAP_TYPE_GTT &&
!i915_gem_object_has_struct_page(obj) &&
!i915_gem_object_has_iomem(obj));
i915_gem_object_unlock(obj);
return true;
}
static void object_set_placements(struct drm_i915_gem_object *obj,
struct intel_memory_region **placements,
unsigned int n_placements)
{
GEM_BUG_ON(!n_placements);
if (n_placements == 1) {
struct drm_i915_private *i915 = to_i915(obj->base.dev);
struct intel_memory_region *mr = placements[0];
obj->mm.placements = &i915->mm.regions[mr->id];
obj->mm.n_placements = 1;
} else {
obj->mm.placements = placements;
obj->mm.n_placements = n_placements;
}
return !no_map;
}
#define expand32(x) (((x) << 0) | ((x) << 8) | ((x) << 16) | ((x) << 24))
@ -955,18 +954,18 @@ static int igt_mmap(void *arg)
struct drm_i915_gem_object *obj;
int err;
obj = i915_gem_object_create_region(mr, sizes[i], I915_BO_ALLOC_USER);
obj = __i915_gem_object_create_user(i915, sizes[i], &mr, 1);
if (obj == ERR_PTR(-ENODEV))
continue;
if (IS_ERR(obj))
return PTR_ERR(obj);
object_set_placements(obj, &mr, 1);
err = __igt_mmap(i915, obj, I915_MMAP_TYPE_GTT);
if (err == 0)
err = __igt_mmap(i915, obj, I915_MMAP_TYPE_WC);
if (err == 0)
err = __igt_mmap(i915, obj, I915_MMAP_TYPE_FIXED);
i915_gem_object_put(obj);
if (err)
@ -984,14 +983,21 @@ static const char *repr_mmap_type(enum i915_mmap_type type)
case I915_MMAP_TYPE_WB: return "wb";
case I915_MMAP_TYPE_WC: return "wc";
case I915_MMAP_TYPE_UC: return "uc";
case I915_MMAP_TYPE_FIXED: return "fixed";
default: return "unknown";
}
}
static bool can_access(const struct drm_i915_gem_object *obj)
static bool can_access(struct drm_i915_gem_object *obj)
{
return i915_gem_object_has_struct_page(obj) ||
i915_gem_object_type_has(obj, I915_GEM_OBJECT_HAS_IOMEM);
bool access;
i915_gem_object_lock(obj, NULL);
access = i915_gem_object_has_struct_page(obj) ||
i915_gem_object_has_iomem(obj);
i915_gem_object_unlock(obj);
return access;
}
static int __igt_mmap_access(struct drm_i915_private *i915,
@ -1075,15 +1081,13 @@ static int igt_mmap_access(void *arg)
struct drm_i915_gem_object *obj;
int err;
obj = i915_gem_object_create_region(mr, PAGE_SIZE, I915_BO_ALLOC_USER);
obj = __i915_gem_object_create_user(i915, PAGE_SIZE, &mr, 1);
if (obj == ERR_PTR(-ENODEV))
continue;
if (IS_ERR(obj))
return PTR_ERR(obj);
object_set_placements(obj, &mr, 1);
err = __igt_mmap_access(i915, obj, I915_MMAP_TYPE_GTT);
if (err == 0)
err = __igt_mmap_access(i915, obj, I915_MMAP_TYPE_WB);
@ -1091,6 +1095,8 @@ static int igt_mmap_access(void *arg)
err = __igt_mmap_access(i915, obj, I915_MMAP_TYPE_WC);
if (err == 0)
err = __igt_mmap_access(i915, obj, I915_MMAP_TYPE_UC);
if (err == 0)
err = __igt_mmap_access(i915, obj, I915_MMAP_TYPE_FIXED);
i915_gem_object_put(obj);
if (err)
@ -1220,18 +1226,18 @@ static int igt_mmap_gpu(void *arg)
struct drm_i915_gem_object *obj;
int err;
obj = i915_gem_object_create_region(mr, PAGE_SIZE, I915_BO_ALLOC_USER);
obj = __i915_gem_object_create_user(i915, PAGE_SIZE, &mr, 1);
if (obj == ERR_PTR(-ENODEV))
continue;
if (IS_ERR(obj))
return PTR_ERR(obj);
object_set_placements(obj, &mr, 1);
err = __igt_mmap_gpu(i915, obj, I915_MMAP_TYPE_GTT);
if (err == 0)
err = __igt_mmap_gpu(i915, obj, I915_MMAP_TYPE_WC);
if (err == 0)
err = __igt_mmap_gpu(i915, obj, I915_MMAP_TYPE_FIXED);
i915_gem_object_put(obj);
if (err)
@ -1375,18 +1381,18 @@ static int igt_mmap_revoke(void *arg)
struct drm_i915_gem_object *obj;
int err;
obj = i915_gem_object_create_region(mr, PAGE_SIZE, I915_BO_ALLOC_USER);
obj = __i915_gem_object_create_user(i915, PAGE_SIZE, &mr, 1);
if (obj == ERR_PTR(-ENODEV))
continue;
if (IS_ERR(obj))
return PTR_ERR(obj);
object_set_placements(obj, &mr, 1);
err = __igt_mmap_revoke(i915, obj, I915_MMAP_TYPE_GTT);
if (err == 0)
err = __igt_mmap_revoke(i915, obj, I915_MMAP_TYPE_WC);
if (err == 0)
err = __igt_mmap_revoke(i915, obj, I915_MMAP_TYPE_FIXED);
i915_gem_object_put(obj);
if (err)
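The can_mmap() change above encodes the rule from the merge description: with local memory only the fixed mmap type is accepted, and without local memory the fixed type is rejected. Below is a simplified sketch of that gate alone. The names are hypothetical, and it deliberately ignores the aperture and backing-store checks the selftest also performs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for enum i915_mmap_type. */
enum mmap_type_sketch { MMAP_GTT, MMAP_WC, MMAP_WB, MMAP_UC, MMAP_FIXED };

/* First-stage gate only: which mmap types are even considered. */
static bool mmap_type_allowed(bool has_lmem, enum mmap_type_sketch type)
{
	if (has_lmem)
		return type == MMAP_FIXED;	/* discrete: fixed is the only valid type */
	return type != MMAP_FIXED;		/* integrated: fixed is invalid */
}
```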


@ -1,597 +0,0 @@
// SPDX-License-Identifier: MIT
/*
* Copyright © 2019 Intel Corporation
*/
#include <linux/sort.h>
#include "gt/intel_gt.h"
#include "gt/intel_engine_user.h"
#include "i915_selftest.h"
#include "gem/i915_gem_context.h"
#include "selftests/igt_flush_test.h"
#include "selftests/i915_random.h"
#include "selftests/mock_drm.h"
#include "huge_gem_object.h"
#include "mock_context.h"
static int wrap_ktime_compare(const void *A, const void *B)
{
const ktime_t *a = A, *b = B;
return ktime_compare(*a, *b);
}
static int __perf_fill_blt(struct drm_i915_gem_object *obj)
{
struct drm_i915_private *i915 = to_i915(obj->base.dev);
int inst = 0;
do {
struct intel_engine_cs *engine;
ktime_t t[5];
int pass;
int err;
engine = intel_engine_lookup_user(i915,
I915_ENGINE_CLASS_COPY,
inst++);
if (!engine)
return 0;
intel_engine_pm_get(engine);
for (pass = 0; pass < ARRAY_SIZE(t); pass++) {
struct intel_context *ce = engine->kernel_context;
ktime_t t0, t1;
t0 = ktime_get();
err = i915_gem_object_fill_blt(obj, ce, 0);
if (err)
break;
err = i915_gem_object_wait(obj,
I915_WAIT_ALL,
MAX_SCHEDULE_TIMEOUT);
if (err)
break;
t1 = ktime_get();
t[pass] = ktime_sub(t1, t0);
}
intel_engine_pm_put(engine);
if (err)
return err;
sort(t, ARRAY_SIZE(t), sizeof(*t), wrap_ktime_compare, NULL);
pr_info("%s: blt %zd KiB fill: %lld MiB/s\n",
engine->name,
obj->base.size >> 10,
div64_u64(mul_u32_u32(4 * obj->base.size,
1000 * 1000 * 1000),
t[1] + 2 * t[2] + t[3]) >> 20);
} while (1);
}
static int perf_fill_blt(void *arg)
{
struct drm_i915_private *i915 = arg;
static const unsigned long sizes[] = {
SZ_4K,
SZ_64K,
SZ_2M,
SZ_64M
};
int i;
for (i = 0; i < ARRAY_SIZE(sizes); i++) {
struct drm_i915_gem_object *obj;
int err;
obj = i915_gem_object_create_internal(i915, sizes[i]);
if (IS_ERR(obj))
return PTR_ERR(obj);
err = __perf_fill_blt(obj);
i915_gem_object_put(obj);
if (err)
return err;
}
return 0;
}
static int __perf_copy_blt(struct drm_i915_gem_object *src,
struct drm_i915_gem_object *dst)
{
struct drm_i915_private *i915 = to_i915(src->base.dev);
int inst = 0;
do {
struct intel_engine_cs *engine;
ktime_t t[5];
int pass;
int err = 0;
engine = intel_engine_lookup_user(i915,
I915_ENGINE_CLASS_COPY,
inst++);
if (!engine)
return 0;
intel_engine_pm_get(engine);
for (pass = 0; pass < ARRAY_SIZE(t); pass++) {
struct intel_context *ce = engine->kernel_context;
ktime_t t0, t1;
t0 = ktime_get();
err = i915_gem_object_copy_blt(src, dst, ce);
if (err)
break;
err = i915_gem_object_wait(dst,
I915_WAIT_ALL,
MAX_SCHEDULE_TIMEOUT);
if (err)
break;
t1 = ktime_get();
t[pass] = ktime_sub(t1, t0);
}
intel_engine_pm_put(engine);
if (err)
return err;
sort(t, ARRAY_SIZE(t), sizeof(*t), wrap_ktime_compare, NULL);
pr_info("%s: blt %zd KiB copy: %lld MiB/s\n",
engine->name,
src->base.size >> 10,
div64_u64(mul_u32_u32(4 * src->base.size,
1000 * 1000 * 1000),
t[1] + 2 * t[2] + t[3]) >> 20);
} while (1);
}
static int perf_copy_blt(void *arg)
{
struct drm_i915_private *i915 = arg;
static const unsigned long sizes[] = {
SZ_4K,
SZ_64K,
SZ_2M,
SZ_64M
};
int i;
for (i = 0; i < ARRAY_SIZE(sizes); i++) {
struct drm_i915_gem_object *src, *dst;
int err;
src = i915_gem_object_create_internal(i915, sizes[i]);
if (IS_ERR(src))
return PTR_ERR(src);
dst = i915_gem_object_create_internal(i915, sizes[i]);
if (IS_ERR(dst)) {
err = PTR_ERR(dst);
goto err_src;
}
err = __perf_copy_blt(src, dst);
i915_gem_object_put(dst);
err_src:
i915_gem_object_put(src);
if (err)
return err;
}
return 0;
}
struct igt_thread_arg {
struct intel_engine_cs *engine;
struct i915_gem_context *ctx;
struct file *file;
struct rnd_state prng;
unsigned int n_cpus;
};
static int igt_fill_blt_thread(void *arg)
{
struct igt_thread_arg *thread = arg;
struct intel_engine_cs *engine = thread->engine;
struct rnd_state *prng = &thread->prng;
struct drm_i915_gem_object *obj;
struct i915_gem_context *ctx;
struct intel_context *ce;
unsigned int prio;
IGT_TIMEOUT(end);
u64 total, max;
int err;
ctx = thread->ctx;
if (!ctx) {
ctx = live_context_for_engine(engine, thread->file);
if (IS_ERR(ctx))
return PTR_ERR(ctx);
prio = i915_prandom_u32_max_state(I915_PRIORITY_MAX, prng);
ctx->sched.priority = prio;
}
ce = i915_gem_context_get_engine(ctx, 0);
GEM_BUG_ON(IS_ERR(ce));
/*
* If we have a tiny shared address space, like for the GGTT
* then we can't be too greedy.
*/
max = ce->vm->total;
if (i915_is_ggtt(ce->vm) || thread->ctx)
max = div_u64(max, thread->n_cpus);
max >>= 4;
total = PAGE_SIZE;
do {
/* Aim to keep the runtime under reasonable bounds! */
const u32 max_phys_size = SZ_64K;
u32 val = prandom_u32_state(prng);
u32 phys_sz;
u32 sz;
u32 *vaddr;
u32 i;
total = min(total, max);
sz = i915_prandom_u32_max_state(total, prng) + 1;
phys_sz = sz % max_phys_size + 1;
sz = round_up(sz, PAGE_SIZE);
phys_sz = round_up(phys_sz, PAGE_SIZE);
phys_sz = min(phys_sz, sz);
pr_debug("%s with phys_sz= %x, sz=%x, val=%x\n", __func__,
phys_sz, sz, val);
obj = huge_gem_object(engine->i915, phys_sz, sz);
if (IS_ERR(obj)) {
err = PTR_ERR(obj);
goto err_flush;
}
vaddr = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WB);
if (IS_ERR(vaddr)) {
err = PTR_ERR(vaddr);
goto err_put;
}
/*
* Make sure the potentially async clflush does its job, if
* required.
*/
memset32(vaddr, val ^ 0xdeadbeaf,
huge_gem_object_phys_size(obj) / sizeof(u32));
if (!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE))
obj->cache_dirty = true;
err = i915_gem_object_fill_blt(obj, ce, val);
if (err)
goto err_unpin;
err = i915_gem_object_wait(obj, 0, MAX_SCHEDULE_TIMEOUT);
if (err)
goto err_unpin;
for (i = 0; i < huge_gem_object_phys_size(obj) / sizeof(u32); i += 17) {
if (!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_READ))
drm_clflush_virt_range(&vaddr[i], sizeof(vaddr[i]));
if (vaddr[i] != val) {
pr_err("vaddr[%u]=%x, expected=%x\n", i,
vaddr[i], val);
err = -EINVAL;
goto err_unpin;
}
}
i915_gem_object_unpin_map(obj);
i915_gem_object_put(obj);
total <<= 1;
} while (!time_after(jiffies, end));
goto err_flush;
err_unpin:
i915_gem_object_unpin_map(obj);
err_put:
i915_gem_object_put(obj);
err_flush:
if (err == -ENOMEM)
err = 0;
intel_context_put(ce);
return err;
}
static int igt_copy_blt_thread(void *arg)
{
struct igt_thread_arg *thread = arg;
struct intel_engine_cs *engine = thread->engine;
struct rnd_state *prng = &thread->prng;
struct drm_i915_gem_object *src, *dst;
struct i915_gem_context *ctx;
struct intel_context *ce;
unsigned int prio;
IGT_TIMEOUT(end);
u64 total, max;
int err;
ctx = thread->ctx;
if (!ctx) {
ctx = live_context_for_engine(engine, thread->file);
if (IS_ERR(ctx))
return PTR_ERR(ctx);
prio = i915_prandom_u32_max_state(I915_PRIORITY_MAX, prng);
ctx->sched.priority = prio;
}
ce = i915_gem_context_get_engine(ctx, 0);
GEM_BUG_ON(IS_ERR(ce));
/*
* If we have a tiny shared address space, like for the GGTT
* then we can't be too greedy.
*/
max = ce->vm->total;
if (i915_is_ggtt(ce->vm) || thread->ctx)
max = div_u64(max, thread->n_cpus);
max >>= 4;
total = PAGE_SIZE;
do {
/* Aim to keep the runtime under reasonable bounds! */
const u32 max_phys_size = SZ_64K;
u32 val = prandom_u32_state(prng);
u32 phys_sz;
u32 sz;
u32 *vaddr;
u32 i;
total = min(total, max);
sz = i915_prandom_u32_max_state(total, prng) + 1;
phys_sz = sz % max_phys_size + 1;
sz = round_up(sz, PAGE_SIZE);
phys_sz = round_up(phys_sz, PAGE_SIZE);
phys_sz = min(phys_sz, sz);
pr_debug("%s with phys_sz= %x, sz=%x, val=%x\n", __func__,
phys_sz, sz, val);
src = huge_gem_object(engine->i915, phys_sz, sz);
if (IS_ERR(src)) {
err = PTR_ERR(src);
goto err_flush;
}
vaddr = i915_gem_object_pin_map_unlocked(src, I915_MAP_WB);
if (IS_ERR(vaddr)) {
err = PTR_ERR(vaddr);
goto err_put_src;
}
memset32(vaddr, val,
huge_gem_object_phys_size(src) / sizeof(u32));
i915_gem_object_unpin_map(src);
if (!(src->cache_coherent & I915_BO_CACHE_COHERENT_FOR_READ))
src->cache_dirty = true;
dst = huge_gem_object(engine->i915, phys_sz, sz);
if (IS_ERR(dst)) {
err = PTR_ERR(dst);
goto err_put_src;
}
vaddr = i915_gem_object_pin_map_unlocked(dst, I915_MAP_WB);
if (IS_ERR(vaddr)) {
err = PTR_ERR(vaddr);
goto err_put_dst;
}
memset32(vaddr, val ^ 0xdeadbeaf,
huge_gem_object_phys_size(dst) / sizeof(u32));
if (!(dst->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE))
dst->cache_dirty = true;
err = i915_gem_object_copy_blt(src, dst, ce);
if (err)
goto err_unpin;
err = i915_gem_object_wait(dst, 0, MAX_SCHEDULE_TIMEOUT);
if (err)
goto err_unpin;
for (i = 0; i < huge_gem_object_phys_size(dst) / sizeof(u32); i += 17) {
if (!(dst->cache_coherent & I915_BO_CACHE_COHERENT_FOR_READ))
drm_clflush_virt_range(&vaddr[i], sizeof(vaddr[i]));
if (vaddr[i] != val) {
pr_err("vaddr[%u]=%x, expected=%x\n", i,
vaddr[i], val);
err = -EINVAL;
goto err_unpin;
}
}
i915_gem_object_unpin_map(dst);
i915_gem_object_put(src);
i915_gem_object_put(dst);
total <<= 1;
} while (!time_after(jiffies, end));
goto err_flush;
err_unpin:
i915_gem_object_unpin_map(dst);
err_put_dst:
i915_gem_object_put(dst);
err_put_src:
i915_gem_object_put(src);
err_flush:
if (err == -ENOMEM)
err = 0;
intel_context_put(ce);
return err;
}
static int igt_threaded_blt(struct intel_engine_cs *engine,
int (*blt_fn)(void *arg),
unsigned int flags)
#define SINGLE_CTX BIT(0)
{
struct igt_thread_arg *thread;
struct task_struct **tsk;
unsigned int n_cpus, i;
I915_RND_STATE(prng);
int err = 0;
n_cpus = num_online_cpus() + 1;
tsk = kcalloc(n_cpus, sizeof(struct task_struct *), GFP_KERNEL);
if (!tsk)
return 0;
thread = kcalloc(n_cpus, sizeof(struct igt_thread_arg), GFP_KERNEL);
if (!thread)
goto out_tsk;
thread[0].file = mock_file(engine->i915);
if (IS_ERR(thread[0].file)) {
err = PTR_ERR(thread[0].file);
goto out_thread;
}
if (flags & SINGLE_CTX) {
thread[0].ctx = live_context_for_engine(engine, thread[0].file);
if (IS_ERR(thread[0].ctx)) {
err = PTR_ERR(thread[0].ctx);
goto out_file;
}
}
for (i = 0; i < n_cpus; ++i) {
thread[i].engine = engine;
thread[i].file = thread[0].file;
thread[i].ctx = thread[0].ctx;
thread[i].n_cpus = n_cpus;
thread[i].prng =
I915_RND_STATE_INITIALIZER(prandom_u32_state(&prng));
tsk[i] = kthread_run(blt_fn, &thread[i], "igt/blt-%d", i);
if (IS_ERR(tsk[i])) {
err = PTR_ERR(tsk[i]);
break;
}
get_task_struct(tsk[i]);
}
yield(); /* start all threads before we kthread_stop() */
for (i = 0; i < n_cpus; ++i) {
int status;
if (IS_ERR_OR_NULL(tsk[i]))
continue;
status = kthread_stop(tsk[i]);
if (status && !err)
err = status;
put_task_struct(tsk[i]);
}
out_file:
fput(thread[0].file);
out_thread:
kfree(thread);
out_tsk:
kfree(tsk);
return err;
}
static int test_copy_engines(struct drm_i915_private *i915,
int (*fn)(void *arg),
unsigned int flags)
{
struct intel_engine_cs *engine;
int ret;
for_each_uabi_class_engine(engine, I915_ENGINE_CLASS_COPY, i915) {
ret = igt_threaded_blt(engine, fn, flags);
if (ret)
return ret;
}
return 0;
}
static int igt_fill_blt(void *arg)
{
return test_copy_engines(arg, igt_fill_blt_thread, 0);
}
static int igt_fill_blt_ctx0(void *arg)
{
return test_copy_engines(arg, igt_fill_blt_thread, SINGLE_CTX);
}
static int igt_copy_blt(void *arg)
{
return test_copy_engines(arg, igt_copy_blt_thread, 0);
}
static int igt_copy_blt_ctx0(void *arg)
{
return test_copy_engines(arg, igt_copy_blt_thread, SINGLE_CTX);
}
int i915_gem_object_blt_live_selftests(struct drm_i915_private *i915)
{
static const struct i915_subtest tests[] = {
SUBTEST(igt_fill_blt),
SUBTEST(igt_fill_blt_ctx0),
SUBTEST(igt_copy_blt),
SUBTEST(igt_copy_blt_ctx0),
};
if (intel_gt_is_wedged(&i915->gt))
return 0;
return i915_live_subtests(tests, i915);
}
int i915_gem_object_blt_perf_selftests(struct drm_i915_private *i915)
{
static const struct i915_subtest tests[] = {
SUBTEST(perf_fill_blt),
SUBTEST(perf_copy_blt),
};
if (intel_gt_is_wedged(&i915->gt))
return 0;
return i915_live_subtests(tests, i915);
}


@ -25,13 +25,14 @@ static int mock_phys_object(void *arg)
goto out;
}
i915_gem_object_lock(obj, NULL);
if (!i915_gem_object_has_struct_page(obj)) {
i915_gem_object_unlock(obj);
err = -EINVAL;
pr_err("shmem has no struct page\n");
goto out_obj;
}
i915_gem_object_lock(obj, NULL);
err = i915_gem_object_attach_phys(obj, PAGE_SIZE);
i915_gem_object_unlock(obj);
if (err) {


@ -14,6 +14,7 @@ mock_context(struct drm_i915_private *i915,
{
struct i915_gem_context *ctx;
struct i915_gem_engines *e;
struct intel_sseu null_sseu = {};
ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
if (!ctx)
@ -30,15 +31,6 @@ mock_context(struct drm_i915_private *i915,
i915_gem_context_set_persistence(ctx);
mutex_init(&ctx->engines_mutex);
e = default_engines(ctx);
if (IS_ERR(e))
goto err_free;
RCU_INIT_POINTER(ctx->engines, e);
INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL);
mutex_init(&ctx->lut_mutex);
if (name) {
struct i915_ppgtt *ppgtt;
@ -46,25 +38,29 @@ mock_context(struct drm_i915_private *i915,
ppgtt = mock_ppgtt(i915, name);
if (!ppgtt)
goto err_put;
mutex_lock(&ctx->mutex);
__set_ppgtt(ctx, &ppgtt->vm);
mutex_unlock(&ctx->mutex);
goto err_free;
ctx->vm = i915_vm_open(&ppgtt->vm);
i915_vm_put(&ppgtt->vm);
}
mutex_init(&ctx->engines_mutex);
e = default_engines(ctx, null_sseu);
if (IS_ERR(e))
goto err_vm;
RCU_INIT_POINTER(ctx->engines, e);
INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL);
mutex_init(&ctx->lut_mutex);
return ctx;
err_vm:
if (ctx->vm)
i915_vm_close(ctx->vm);
err_free:
kfree(ctx);
return NULL;
err_put:
i915_gem_context_set_closed(ctx);
i915_gem_context_put(ctx);
return NULL;
}
void mock_context_close(struct i915_gem_context *ctx)
@ -80,20 +76,29 @@ void mock_init_contexts(struct drm_i915_private *i915)
struct i915_gem_context *
live_context(struct drm_i915_private *i915, struct file *file)
{
struct drm_i915_file_private *fpriv = to_drm_file(file)->driver_priv;
struct i915_gem_proto_context *pc;
struct i915_gem_context *ctx;
int err;
u32 id;
ctx = i915_gem_create_context(i915, 0);
pc = proto_context_create(i915, 0);
if (IS_ERR(pc))
return ERR_CAST(pc);
ctx = i915_gem_create_context(i915, pc);
proto_context_close(pc);
if (IS_ERR(ctx))
return ctx;
i915_gem_context_set_no_error_capture(ctx);
err = gem_context_register(ctx, to_drm_file(file)->driver_priv, &id);
err = xa_alloc(&fpriv->context_xa, &id, NULL, xa_limit_32b, GFP_KERNEL);
if (err < 0)
goto err_ctx;
gem_context_register(ctx, fpriv, id);
return ctx;
err_ctx:
@ -106,6 +111,7 @@ live_context_for_engine(struct intel_engine_cs *engine, struct file *file)
{
struct i915_gem_engines *engines;
struct i915_gem_context *ctx;
struct intel_sseu null_sseu = {};
struct intel_context *ce;
engines = alloc_engines(1);
@ -124,7 +130,7 @@ live_context_for_engine(struct intel_engine_cs *engine, struct file *file)
return ERR_CAST(ce);
}
intel_context_set_gem(ce, ctx);
intel_context_set_gem(ce, ctx, null_sseu);
engines->engines[0] = ce;
engines->num_engines = 1;
@ -139,11 +145,24 @@ live_context_for_engine(struct intel_engine_cs *engine, struct file *file)
}
struct i915_gem_context *
kernel_context(struct drm_i915_private *i915)
kernel_context(struct drm_i915_private *i915,
struct i915_address_space *vm)
{
struct i915_gem_context *ctx;
struct i915_gem_proto_context *pc;
ctx = i915_gem_create_context(i915, 0);
pc = proto_context_create(i915, 0);
if (IS_ERR(pc))
return ERR_CAST(pc);
if (vm) {
if (pc->vm)
i915_vm_put(pc->vm);
pc->vm = i915_vm_get(vm);
}
ctx = i915_gem_create_context(i915, pc);
proto_context_close(pc);
if (IS_ERR(ctx))
return ctx;


@ -10,6 +10,7 @@
struct file;
struct drm_i915_private;
struct intel_engine_cs;
struct i915_address_space;
void mock_init_contexts(struct drm_i915_private *i915);
@ -25,7 +26,8 @@ live_context(struct drm_i915_private *i915, struct file *file);
struct i915_gem_context *
live_context_for_engine(struct intel_engine_cs *engine, struct file *file);
struct i915_gem_context *kernel_context(struct drm_i915_private *i915);
struct i915_gem_context *kernel_context(struct drm_i915_private *i915,
struct i915_address_space *vm);
void kernel_context_close(struct i915_gem_context *ctx);
#endif /* !__MOCK_CONTEXT_H */


@ -437,20 +437,20 @@ static int frequency_show(struct seq_file *m, void *unused)
max_freq = (IS_GEN9_LP(i915) ? rp_state_cap >> 0 :
rp_state_cap >> 16) & 0xff;
max_freq *= (IS_GEN9_BC(i915) ||
GRAPHICS_VER(i915) >= 10 ? GEN9_FREQ_SCALER : 1);
GRAPHICS_VER(i915) >= 11 ? GEN9_FREQ_SCALER : 1);
seq_printf(m, "Lowest (RPN) frequency: %dMHz\n",
intel_gpu_freq(rps, max_freq));
max_freq = (rp_state_cap & 0xff00) >> 8;
max_freq *= (IS_GEN9_BC(i915) ||
GRAPHICS_VER(i915) >= 10 ? GEN9_FREQ_SCALER : 1);
GRAPHICS_VER(i915) >= 11 ? GEN9_FREQ_SCALER : 1);
seq_printf(m, "Nominal (RP1) frequency: %dMHz\n",
intel_gpu_freq(rps, max_freq));
max_freq = (IS_GEN9_LP(i915) ? rp_state_cap >> 16 :
rp_state_cap >> 0) & 0xff;
max_freq *= (IS_GEN9_BC(i915) ||
GRAPHICS_VER(i915) >= 10 ? GEN9_FREQ_SCALER : 1);
GRAPHICS_VER(i915) >= 11 ? GEN9_FREQ_SCALER : 1);
seq_printf(m, "Max non-overclocked (RP0) frequency: %dMHz\n",
intel_gpu_freq(rps, max_freq));
seq_printf(m, "Max overclocked frequency: %dMHz\n",
@ -500,7 +500,7 @@ static int llc_show(struct seq_file *m, void *data)
min_gpu_freq = rps->min_freq;
max_gpu_freq = rps->max_freq;
if (IS_GEN9_BC(i915) || GRAPHICS_VER(i915) >= 10) {
if (IS_GEN9_BC(i915) || GRAPHICS_VER(i915) >= 11) {
/* Convert GT frequency to 50 MHz units */
min_gpu_freq /= GEN9_FREQ_SCALER;
max_gpu_freq /= GEN9_FREQ_SCALER;
@ -518,7 +518,7 @@ static int llc_show(struct seq_file *m, void *data)
intel_gpu_freq(rps,
(gpu_freq *
(IS_GEN9_BC(i915) ||
GRAPHICS_VER(i915) >= 10 ?
GRAPHICS_VER(i915) >= 11 ?
GEN9_FREQ_SCALER : 1))),
((ia_freq >> 0) & 0xff) * 100,
((ia_freq >> 8) & 0xff) * 100);
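The hunks above multiply the RP capability fields by GEN9_FREQ_SCALER before handing them to intel_gpu_freq(), and llc_show() divides by the same scaler to get back to 50 MHz units. The arithmetic can be sketched as below. This assumes the scaler is 3 and one coarse unit is 50 MHz; both constants and the rounding are assumptions for illustration, not taken from this diff.

```c
#include <assert.h>

#define FREQ_SCALER_SKETCH	3	/* assumed value of GEN9_FREQ_SCALER */
#define FREQ_MULTIPLIER_SKETCH	50	/* assumed MHz per coarse (50 MHz) unit */

/* Coarse 50 MHz units (as read from the capability field) -> fine units. */
static unsigned int cap_to_hw_units(unsigned int cap)
{
	return cap * FREQ_SCALER_SKETCH;
}

/* Fine units -> MHz, rounded to the closest integer. */
static unsigned int hw_units_to_mhz(unsigned int val)
{
	return (val * FREQ_MULTIPLIER_SKETCH + FREQ_SCALER_SKETCH / 2) /
	       FREQ_SCALER_SKETCH;
}
```

Under these assumptions a capability field of 23 becomes 69 fine units, which converts to 1150 MHz.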


@ -42,7 +42,7 @@ int gen8_emit_flush_rcs(struct i915_request *rq, u32 mode)
vf_flush_wa = true;
/* WaForGAMHang:kbl */
if (IS_KBL_GT_STEP(rq->engine->i915, 0, STEP_B0))
if (IS_KBL_GT_STEP(rq->engine->i915, 0, STEP_C0))
dc_flush_wa = true;
}
@ -208,7 +208,7 @@ int gen12_emit_flush_rcs(struct i915_request *rq, u32 mode)
flags |= PIPE_CONTROL_FLUSH_L3;
flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH;
/* Wa_1409600907:tgl */
/* Wa_1409600907:tgl,adl-p */
flags |= PIPE_CONTROL_DEPTH_STALL;
flags |= PIPE_CONTROL_DC_FLUSH_ENABLE;
flags |= PIPE_CONTROL_FLUSH_ENABLE;
@ -279,7 +279,7 @@ int gen12_emit_flush_xcs(struct i915_request *rq, u32 mode)
if (mode & EMIT_INVALIDATE)
aux_inv = rq->engine->mask & ~BIT(BCS0);
if (aux_inv)
cmd += 2 * hweight8(aux_inv) + 2;
cmd += 2 * hweight32(aux_inv) + 2;
cs = intel_ring_begin(rq, cmd);
if (IS_ERR(cs))
@ -313,9 +313,8 @@ int gen12_emit_flush_xcs(struct i915_request *rq, u32 mode)
struct intel_engine_cs *engine;
unsigned int tmp;
*cs++ = MI_LOAD_REGISTER_IMM(hweight8(aux_inv));
for_each_engine_masked(engine, rq->engine->gt,
aux_inv, tmp) {
*cs++ = MI_LOAD_REGISTER_IMM(hweight32(aux_inv));
for_each_engine_masked(engine, rq->engine->gt, aux_inv, tmp) {
*cs++ = i915_mmio_reg_offset(aux_inv_reg(engine));
*cs++ = AUX_INV;
}
@ -506,7 +505,8 @@ gen8_emit_fini_breadcrumb_tail(struct i915_request *rq, u32 *cs)
*cs++ = MI_USER_INTERRUPT;
*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
if (intel_engine_has_semaphores(rq->engine))
if (intel_engine_has_semaphores(rq->engine) &&
!intel_uc_uses_guc_submission(&rq->engine->gt->uc))
cs = emit_preempt_busywait(rq, cs);
rq->tail = intel_ring_offset(rq, cs);
@ -598,7 +598,8 @@ gen12_emit_fini_breadcrumb_tail(struct i915_request *rq, u32 *cs)
*cs++ = MI_USER_INTERRUPT;
*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
if (intel_engine_has_semaphores(rq->engine))
if (intel_engine_has_semaphores(rq->engine) &&
!intel_uc_uses_guc_submission(&rq->engine->gt->uc))
cs = gen12_emit_preempt_busywait(rq, cs);
rq->tail = intel_ring_offset(rq, cs);
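The switch from hweight8() to hweight32() in gen12_emit_flush_xcs() above matters once the engine mask has bits set beyond bit 7. An 8-bit population count drops those bits, so both the ring reservation and the MI_LOAD_REGISTER_IMM count would come out too small. A minimal userspace sketch follows, assuming the GCC/Clang builtin __builtin_popcount as a stand-in for the kernel helpers. The function names and the mask value used for illustration are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for hweight8(): counts only the low 8 bits. */
static unsigned int hweight8_sketch(uint32_t w)
{
	return (unsigned int)__builtin_popcount(w & 0xffu);
}

/* Userspace stand-in for hweight32(): counts all 32 bits. */
static unsigned int hweight32_sketch(uint32_t w)
{
	return (unsigned int)__builtin_popcount(w);
}

/* Mirrors the "2 * hweight(aux_inv) + 2" expression in the hunk. */
static unsigned int aux_inv_dwords(unsigned int engines)
{
	return 2 * engines + 2;
}
```

For a mask of 0x304 (bits 2, 8 and 9), the 8-bit count sees one engine and reserves 4 dwords, while the 32-bit count sees three engines and reserves 8.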


@ -358,6 +358,54 @@ static void gen8_ppgtt_alloc(struct i915_address_space *vm,
&start, start + length, vm->top);
}
static void __gen8_ppgtt_foreach(struct i915_address_space *vm,
struct i915_page_directory *pd,
u64 *start, u64 end, int lvl,
void (*fn)(struct i915_address_space *vm,
struct i915_page_table *pt,
void *data),
void *data)
{
unsigned int idx, len;
len = gen8_pd_range(*start, end, lvl--, &idx);
spin_lock(&pd->lock);
do {
struct i915_page_table *pt = pd->entry[idx];
atomic_inc(&pt->used);
spin_unlock(&pd->lock);
if (lvl) {
__gen8_ppgtt_foreach(vm, as_pd(pt), start, end, lvl,
fn, data);
} else {
fn(vm, pt, data);
*start += gen8_pt_count(*start, end);
}
spin_lock(&pd->lock);
atomic_dec(&pt->used);
} while (idx++, --len);
spin_unlock(&pd->lock);
}
static void gen8_ppgtt_foreach(struct i915_address_space *vm,
u64 start, u64 length,
void (*fn)(struct i915_address_space *vm,
struct i915_page_table *pt,
void *data),
void *data)
{
start >>= GEN8_PTE_SHIFT;
length >>= GEN8_PTE_SHIFT;
__gen8_ppgtt_foreach(vm, i915_vm_to_ppgtt(vm)->pd,
&start, start + length, vm->top,
fn, data);
}
static __always_inline u64
gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
struct i915_page_directory *pdp,
@ -552,6 +600,24 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm,
}
}
static void gen8_ppgtt_insert_entry(struct i915_address_space *vm,
dma_addr_t addr,
u64 offset,
enum i915_cache_level level,
u32 flags)
{
u64 idx = offset >> GEN8_PTE_SHIFT;
struct i915_page_directory * const pdp =
gen8_pdp_for_page_index(vm, idx);
struct i915_page_directory *pd =
i915_pd_entry(pdp, gen8_pd_index(idx, 2));
gen8_pte_t *vaddr;
vaddr = px_vaddr(i915_pt_entry(pd, gen8_pd_index(idx, 1)));
vaddr[gen8_pd_index(idx, 0)] = gen8_pte_encode(addr, level, flags);
clflush_cache_range(&vaddr[gen8_pd_index(idx, 0)], sizeof(*vaddr));
}
static int gen8_init_scratch(struct i915_address_space *vm)
{
u32 pte_flags;
@ -731,8 +797,10 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt)
ppgtt->vm.bind_async_flags = I915_VMA_LOCAL_BIND;
ppgtt->vm.insert_entries = gen8_ppgtt_insert;
ppgtt->vm.insert_page = gen8_ppgtt_insert_entry;
ppgtt->vm.allocate_va_range = gen8_ppgtt_alloc;
ppgtt->vm.clear_range = gen8_ppgtt_clear;
ppgtt->vm.foreach = gen8_ppgtt_foreach;
ppgtt->vm.pte_encode = gen8_pte_encode;


@ -15,28 +15,14 @@
#include "intel_gt_pm.h"
#include "intel_gt_requests.h"
static bool irq_enable(struct intel_engine_cs *engine)
static bool irq_enable(struct intel_breadcrumbs *b)
{
if (!engine->irq_enable)
return false;
/* Caller disables interrupts */
spin_lock(&engine->gt->irq_lock);
engine->irq_enable(engine);
spin_unlock(&engine->gt->irq_lock);
return true;
return intel_engine_irq_enable(b->irq_engine);
}
static void irq_disable(struct intel_engine_cs *engine)
static void irq_disable(struct intel_breadcrumbs *b)
{
if (!engine->irq_disable)
return;
/* Caller disables interrupts */
spin_lock(&engine->gt->irq_lock);
engine->irq_disable(engine);
spin_unlock(&engine->gt->irq_lock);
intel_engine_irq_disable(b->irq_engine);
}
static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
@ -57,7 +43,7 @@ static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
WRITE_ONCE(b->irq_armed, true);
/* Requests may have completed before we could enable the interrupt. */
if (!b->irq_enabled++ && irq_enable(b->irq_engine))
if (!b->irq_enabled++ && b->irq_enable(b))
irq_work_queue(&b->irq_work);
}
@ -76,7 +62,7 @@ static void __intel_breadcrumbs_disarm_irq(struct intel_breadcrumbs *b)
{
GEM_BUG_ON(!b->irq_enabled);
if (!--b->irq_enabled)
irq_disable(b->irq_engine);
b->irq_disable(b);
WRITE_ONCE(b->irq_armed, false);
intel_gt_pm_put_async(b->irq_engine->gt);
@ -259,6 +245,9 @@ static void signal_irq_work(struct irq_work *work)
llist_entry(signal, typeof(*rq), signal_node);
struct list_head cb_list;
if (rq->engine->sched_engine->retire_inflight_request_prio)
rq->engine->sched_engine->retire_inflight_request_prio(rq);
spin_lock(&rq->lock);
list_replace(&rq->fence.cb_list, &cb_list);
__dma_fence_signal__timestamp(&rq->fence, timestamp);
@ -281,7 +270,7 @@ intel_breadcrumbs_create(struct intel_engine_cs *irq_engine)
if (!b)
return NULL;
b->irq_engine = irq_engine;
kref_init(&b->ref);
spin_lock_init(&b->signalers_lock);
INIT_LIST_HEAD(&b->signalers);
@ -290,6 +279,10 @@ intel_breadcrumbs_create(struct intel_engine_cs *irq_engine)
spin_lock_init(&b->irq_lock);
init_irq_work(&b->irq_work, signal_irq_work);
b->irq_engine = irq_engine;
b->irq_enable = irq_enable;
b->irq_disable = irq_disable;
return b;
}
@ -303,9 +296,9 @@ void intel_breadcrumbs_reset(struct intel_breadcrumbs *b)
spin_lock_irqsave(&b->irq_lock, flags);
if (b->irq_enabled)
irq_enable(b->irq_engine);
b->irq_enable(b);
else
irq_disable(b->irq_engine);
b->irq_disable(b);
spin_unlock_irqrestore(&b->irq_lock, flags);
}
@ -325,11 +318,14 @@ void __intel_breadcrumbs_park(struct intel_breadcrumbs *b)
}
}
void intel_breadcrumbs_free(struct intel_breadcrumbs *b)
void intel_breadcrumbs_free(struct kref *kref)
{
struct intel_breadcrumbs *b = container_of(kref, typeof(*b), ref);
irq_work_sync(&b->irq_work);
GEM_BUG_ON(!list_empty(&b->signalers));
GEM_BUG_ON(b->irq_armed);
kfree(b);
}


@ -9,7 +9,7 @@
#include <linux/atomic.h>
#include <linux/irq_work.h>
#include "intel_engine_types.h"
#include "intel_breadcrumbs_types.h"
struct drm_printer;
struct i915_request;
@ -17,7 +17,7 @@ struct intel_breadcrumbs;
struct intel_breadcrumbs *
intel_breadcrumbs_create(struct intel_engine_cs *irq_engine);
void intel_breadcrumbs_free(struct intel_breadcrumbs *b);
void intel_breadcrumbs_free(struct kref *kref);
void intel_breadcrumbs_reset(struct intel_breadcrumbs *b);
void __intel_breadcrumbs_park(struct intel_breadcrumbs *b);
@ -48,4 +48,16 @@ void i915_request_cancel_breadcrumb(struct i915_request *request);
void intel_context_remove_breadcrumbs(struct intel_context *ce,
struct intel_breadcrumbs *b);
static inline struct intel_breadcrumbs *
intel_breadcrumbs_get(struct intel_breadcrumbs *b)
{
kref_get(&b->ref);
return b;
}
static inline void intel_breadcrumbs_put(struct intel_breadcrumbs *b)
{
kref_put(&b->ref, intel_breadcrumbs_free);
}
#endif /* __INTEL_BREADCRUMBS__ */
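The get/put helpers above move breadcrumbs from explicit freeing to reference counting, with the release callback recovering the outer object from the embedded counter. A minimal userspace sketch of that pattern follows. It uses C11 atomics in place of the kernel's kref, and all `sketch_` names (including the `sketch_freed` counter, which exists only so the behaviour can be observed) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for struct kref: just an atomic counter. */
struct sketch_ref {
	atomic_int count;
};

struct sketch_breadcrumbs {
	struct sketch_ref ref;
	int id;
};

/* Demonstration-only counter of how many objects were released. */
static int sketch_freed;

#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Release callback: receives the embedded ref, frees the outer object. */
static void sketch_breadcrumbs_free(struct sketch_ref *ref)
{
	struct sketch_breadcrumbs *b =
		sketch_container_of(ref, struct sketch_breadcrumbs, ref);

	sketch_freed++;
	free(b);
}

static struct sketch_breadcrumbs *sketch_breadcrumbs_create(int id)
{
	struct sketch_breadcrumbs *b = calloc(1, sizeof(*b));

	if (!b)
		return NULL;
	atomic_init(&b->ref.count, 1); /* creator holds the first reference */
	b->id = id;
	return b;
}

static struct sketch_breadcrumbs *
sketch_breadcrumbs_get(struct sketch_breadcrumbs *b)
{
	atomic_fetch_add(&b->ref.count, 1);
	return b;
}

static void sketch_breadcrumbs_put(struct sketch_breadcrumbs *b)
{
	int old = atomic_fetch_sub(&b->ref.count, 1);

	assert(old > 0);
	if (old == 1) /* dropped the last reference */
		sketch_breadcrumbs_free(&b->ref);
}
```

With this shape, callers that previously freed the object directly only drop their own reference, and the object goes away when the last holder calls put.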


@ -7,10 +7,13 @@
#define __INTEL_BREADCRUMBS_TYPES__
#include <linux/irq_work.h>
#include <linux/kref.h>
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/types.h>
#include "intel_engine_types.h"
/*
* Rather than have every client wait upon all user interrupts,
* with the herd waking after every interrupt and each doing the
@ -29,6 +32,7 @@
* the overhead of waking that client is much preferred.
*/
struct intel_breadcrumbs {
struct kref ref;
atomic_t active;
spinlock_t signalers_lock; /* protects the list of signalers */
@ -42,7 +46,10 @@ struct intel_breadcrumbs {
bool irq_armed;
/* Not all breadcrumbs are attached to physical HW */
intel_engine_mask_t engine_mask;
struct intel_engine_cs *irq_engine;
bool (*irq_enable)(struct intel_breadcrumbs *b);
void (*irq_disable)(struct intel_breadcrumbs *b);
};
#endif /* __INTEL_BREADCRUMBS_TYPES__ */


@ -7,28 +7,26 @@
#include "gem/i915_gem_pm.h"
#include "i915_drv.h"
#include "i915_globals.h"
#include "i915_trace.h"
#include "intel_context.h"
#include "intel_engine.h"
#include "intel_engine_pm.h"
#include "intel_ring.h"
static struct i915_global_context {
struct i915_global base;
struct kmem_cache *slab_ce;
} global;
static struct kmem_cache *slab_ce;
static struct intel_context *intel_context_alloc(void)
{
return kmem_cache_zalloc(global.slab_ce, GFP_KERNEL);
return kmem_cache_zalloc(slab_ce, GFP_KERNEL);
}
static void rcu_context_free(struct rcu_head *rcu)
{
struct intel_context *ce = container_of(rcu, typeof(*ce), rcu);
kmem_cache_free(global.slab_ce, ce);
trace_intel_context_free(ce);
kmem_cache_free(slab_ce, ce);
}
void intel_context_free(struct intel_context *ce)
@ -46,6 +44,7 @@ intel_context_create(struct intel_engine_cs *engine)
return ERR_PTR(-ENOMEM);
intel_context_init(ce, engine);
trace_intel_context_create(ce);
return ce;
}
@ -80,7 +79,7 @@ static int intel_context_active_acquire(struct intel_context *ce)
__i915_active_acquire(&ce->active);
if (intel_context_is_barrier(ce))
if (intel_context_is_barrier(ce) || intel_engine_uses_guc(ce->engine))
return 0;
/* Preallocate tracking nodes */
@ -268,6 +267,8 @@ int __intel_context_do_pin_ww(struct intel_context *ce,
GEM_BUG_ON(!intel_context_is_pinned(ce)); /* no overflow! */
trace_intel_context_do_pin(ce);
err_unlock:
mutex_unlock(&ce->pin_mutex);
err_post_unpin:
@ -306,9 +307,9 @@ retry:
return err;
}
void intel_context_unpin(struct intel_context *ce)
void __intel_context_do_unpin(struct intel_context *ce, int sub)
{
if (!atomic_dec_and_test(&ce->pin_count))
if (!atomic_sub_and_test(sub, &ce->pin_count))
return;
CE_TRACE(ce, "unpin\n");
@ -323,6 +324,7 @@ void intel_context_unpin(struct intel_context *ce)
*/
intel_context_get(ce);
intel_context_active_release(ce);
trace_intel_context_do_unpin(ce);
intel_context_put(ce);
}
@ -360,6 +362,12 @@ static int __intel_context_active(struct i915_active *active)
return 0;
}
static int sw_fence_dummy_notify(struct i915_sw_fence *sf,
enum i915_sw_fence_notify state)
{
return NOTIFY_DONE;
}
void
intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
{
@ -371,7 +379,8 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
ce->engine = engine;
ce->ops = engine->cops;
ce->sseu = engine->sseu;
ce->ring = __intel_context_ring_size(SZ_4K);
ce->ring = NULL;
ce->ring_size = SZ_4K;
ewma_runtime_init(&ce->runtime.avg);
@ -383,6 +392,22 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
mutex_init(&ce->pin_mutex);
spin_lock_init(&ce->guc_state.lock);
INIT_LIST_HEAD(&ce->guc_state.fences);
spin_lock_init(&ce->guc_active.lock);
INIT_LIST_HEAD(&ce->guc_active.requests);
ce->guc_id = GUC_INVALID_LRC_ID;
INIT_LIST_HEAD(&ce->guc_id_link);
/*
* Initialize the fence as complete; it is expected to stay complete
* unless a schedule disable is outstanding.
*/
i915_sw_fence_init(&ce->guc_blocked, sw_fence_dummy_notify);
i915_sw_fence_commit(&ce->guc_blocked);
i915_active_init(&ce->active,
__intel_context_active, __intel_context_retire, 0);
}
@ -397,28 +422,17 @@ void intel_context_fini(struct intel_context *ce)
i915_active_fini(&ce->active);
}
static void i915_global_context_shrink(void)
void i915_context_module_exit(void)
{
kmem_cache_shrink(global.slab_ce);
kmem_cache_destroy(slab_ce);
}
static void i915_global_context_exit(void)
int __init i915_context_module_init(void)
{
kmem_cache_destroy(global.slab_ce);
}
static struct i915_global_context global = { {
.shrink = i915_global_context_shrink,
.exit = i915_global_context_exit,
} };
int __init i915_global_context_init(void)
{
global.slab_ce = KMEM_CACHE(intel_context, SLAB_HWCACHE_ALIGN);
if (!global.slab_ce)
slab_ce = KMEM_CACHE(intel_context, SLAB_HWCACHE_ALIGN);
if (!slab_ce)
return -ENOMEM;
i915_global_register(&global.base);
return 0;
}
@ -499,6 +513,26 @@ retry:
return rq;
}
struct i915_request *intel_context_find_active_request(struct intel_context *ce)
{
struct i915_request *rq, *active = NULL;
unsigned long flags;
GEM_BUG_ON(!intel_engine_uses_guc(ce->engine));
spin_lock_irqsave(&ce->guc_active.lock, flags);
list_for_each_entry_reverse(rq, &ce->guc_active.requests,
sched.link) {
if (i915_request_completed(rq))
break;
active = rq;
}
spin_unlock_irqrestore(&ce->guc_active.lock, flags);
return active;
}
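The reverse walk in intel_context_find_active_request() returns the oldest request that has not yet completed. The sketch below shows the same search over a plain array. It assumes the requests are stored oldest first (submission order), which the diff does not state explicitly, and the `sketch_` types are hypothetical stand-ins for the kernel structures. Locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_request {
	int seqno;
	bool completed;
};

/*
 * Walk from the newest request towards the oldest. Stop at the first
 * completed request; the last incomplete one seen before stopping is
 * the oldest request still outstanding. Returns NULL if none is pending.
 */
static const struct sketch_request *
sketch_find_active(const struct sketch_request *requests, size_t count)
{
	const struct sketch_request *active = NULL;

	assert(requests != NULL || count == 0);

	for (size_t i = count; i-- > 0; ) {
		if (requests[i].completed)
			break;
		active = &requests[i];
	}

	return active;
}
```

Walking backwards lets the search stop at the first completed request instead of scanning the whole list, on the assumption that everything older has completed as well.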
#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
#include "selftest_context.c"
#endif


@ -16,6 +16,7 @@
#include "intel_engine_types.h"
#include "intel_ring_types.h"
#include "intel_timeline_types.h"
#include "i915_trace.h"
#define CE_TRACE(ce, fmt, ...) do { \
const struct intel_context *ce__ = (ce); \
@ -30,6 +31,9 @@ void intel_context_init(struct intel_context *ce,
struct intel_engine_cs *engine);
void intel_context_fini(struct intel_context *ce);
void i915_context_module_exit(void);
int i915_context_module_init(void);
struct intel_context *
intel_context_create(struct intel_engine_cs *engine);
@ -69,6 +73,13 @@ intel_context_is_pinned(struct intel_context *ce)
return atomic_read(&ce->pin_count);
}
static inline void intel_context_cancel_request(struct intel_context *ce,
struct i915_request *rq)
{
GEM_BUG_ON(!ce->ops->cancel_request);
return ce->ops->cancel_request(ce, rq);
}
/**
* intel_context_unlock_pinned - Releases the earlier locking of 'pinned' status
* @ce - the context
@ -113,7 +124,32 @@ static inline void __intel_context_pin(struct intel_context *ce)
atomic_inc(&ce->pin_count);
}
void intel_context_unpin(struct intel_context *ce);
void __intel_context_do_unpin(struct intel_context *ce, int sub);
static inline void intel_context_sched_disable_unpin(struct intel_context *ce)
{
__intel_context_do_unpin(ce, 2);
}
static inline void intel_context_unpin(struct intel_context *ce)
{
if (!ce->ops->sched_disable) {
__intel_context_do_unpin(ce, 1);
} else {
/*
* Move ownership of this pin to the scheduling disable, which is
* an async operation. When that operation completes,
* intel_context_sched_disable_unpin() above is called, potentially
* unpinning the context.
*/
while (!atomic_add_unless(&ce->pin_count, -1, 1)) {
if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1) {
ce->ops->sched_disable(ce);
break;
}
}
}
}
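The pin-count handoff in intel_context_unpin() is easier to follow in isolation. The sketch below models it with C11 atomics. `sketch_add_unless()` imitates the kernel's atomic_add_unless(), the `released` and `sched_disable_calls` counters exist only so the behaviour can be observed, and all `sketch_` names are hypothetical. The asynchronous completion is represented by calling `sketch_sched_disable_done()` by hand.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_ctx {
	atomic_int pin_count;
	bool has_sched_disable;  /* models ce->ops->sched_disable != NULL */
	int sched_disable_calls; /* demonstration-only counters */
	int released;
};

/* Add 'add' to *v unless *v == unless; return true if the add happened. */
static bool sketch_add_unless(atomic_int *v, int add, int unless)
{
	int cur = atomic_load(v);

	while (cur != unless) {
		/* On failure 'cur' is reloaded and the check repeats. */
		if (atomic_compare_exchange_weak(v, &cur, cur + add))
			return true;
	}
	return false;
}

/* Drop 'sub' pins; the final release runs only when the count hits zero. */
static void sketch_do_unpin(struct sketch_ctx *ce, int sub)
{
	int old = atomic_fetch_sub(&ce->pin_count, sub);

	assert(old >= sub);
	if (old != sub)
		return;
	ce->released++;
}

/* Called when the (here simulated) async schedule disable completes. */
static void sketch_sched_disable_done(struct sketch_ctx *ce)
{
	sketch_do_unpin(ce, 2);
}

static void sketch_unpin(struct sketch_ctx *ce)
{
	if (!ce->has_sched_disable) {
		sketch_do_unpin(ce, 1);
		return;
	}

	/* Any pin but the last is simply dropped. */
	while (!sketch_add_unless(&ce->pin_count, -1, 1)) {
		int expected = 1;

		/* Last pin: bump 1 -> 2 and hand it to sched_disable. */
		if (atomic_compare_exchange_strong(&ce->pin_count,
						   &expected, 2)) {
			ce->sched_disable_calls++;
			break;
		}
	}
}
```

In this model the last unpin never drops the count to zero itself. It raises the count to 2 and starts the disable, and the completion path subtracts 2, so the release happens only after the disable has finished.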
void intel_context_enter_engine(struct intel_context *ce);
void intel_context_exit_engine(struct intel_context *ce);
@ -175,10 +211,8 @@ int intel_context_prepare_remote_request(struct intel_context *ce,
struct i915_request *intel_context_create_request(struct intel_context *ce);
static inline struct intel_ring *__intel_context_ring_size(u64 sz)
{
return u64_to_ptr(struct intel_ring, sz);
}
struct i915_request *
intel_context_find_active_request(struct intel_context *ce);
static inline bool intel_context_is_barrier(const struct intel_context *ce)
{
@ -220,6 +254,18 @@ static inline bool intel_context_set_banned(struct intel_context *ce)
return test_and_set_bit(CONTEXT_BANNED, &ce->flags);
}
static inline bool intel_context_ban(struct intel_context *ce,
struct i915_request *rq)
{
bool ret = intel_context_set_banned(ce);
trace_intel_context_ban(ce);
if (ce->ops->ban)
ce->ops->ban(ce, rq);
return ret;
}
static inline bool
intel_context_force_single_submission(const struct intel_context *ce)
{


@ -1,63 +0,0 @@
// SPDX-License-Identifier: MIT
/*
* Copyright © 2019 Intel Corporation
*/
#include "i915_active.h"
#include "intel_context.h"
#include "intel_context_param.h"
#include "intel_ring.h"
int intel_context_set_ring_size(struct intel_context *ce, long sz)
{
int err;
if (intel_context_lock_pinned(ce))
return -EINTR;
err = i915_active_wait(&ce->active);
if (err < 0)
goto unlock;
if (intel_context_is_pinned(ce)) {
err = -EBUSY; /* In active use, come back later! */
goto unlock;
}
if (test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) {
struct intel_ring *ring;
/* Replace the existing ringbuffer */
ring = intel_engine_create_ring(ce->engine, sz);
if (IS_ERR(ring)) {
err = PTR_ERR(ring);
goto unlock;
}
intel_ring_put(ce->ring);
ce->ring = ring;
/* Context image will be updated on next pin */
} else {
ce->ring = __intel_context_ring_size(sz);
}
unlock:
intel_context_unlock_pinned(ce);
return err;
}
long intel_context_get_ring_size(struct intel_context *ce)
{
long sz = (unsigned long)READ_ONCE(ce->ring);
if (test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) {
if (intel_context_lock_pinned(ce))
return -EINTR;
sz = ce->ring->size;
intel_context_unlock_pinned(ce);
}
return sz;
}


@ -10,14 +10,10 @@
#include "intel_context.h"
int intel_context_set_ring_size(struct intel_context *ce, long sz);
long intel_context_get_ring_size(struct intel_context *ce);
static inline int
static inline void
intel_context_set_watchdog_us(struct intel_context *ce, u64 timeout_us)
{
ce->watchdog.timeout_us = timeout_us;
return 0;
}
#endif /* INTEL_CONTEXT_PARAM_H */


@ -13,12 +13,14 @@
#include <linux/types.h>
#include "i915_active_types.h"
#include "i915_sw_fence.h"
#include "i915_utils.h"
#include "intel_engine_types.h"
#include "intel_sseu.h"
#define CONTEXT_REDZONE POISON_INUSE
#include "uc/intel_guc_fwif.h"
#define CONTEXT_REDZONE POISON_INUSE
DECLARE_EWMA(runtime, 3, 8);
struct i915_gem_context;
@ -35,16 +37,29 @@ struct intel_context_ops {
int (*alloc)(struct intel_context *ce);
void (*ban)(struct intel_context *ce, struct i915_request *rq);
int (*pre_pin)(struct intel_context *ce, struct i915_gem_ww_ctx *ww, void **vaddr);
int (*pin)(struct intel_context *ce, void *vaddr);
void (*unpin)(struct intel_context *ce);
void (*post_unpin)(struct intel_context *ce);
void (*cancel_request)(struct intel_context *ce,
struct i915_request *rq);
void (*enter)(struct intel_context *ce);
void (*exit)(struct intel_context *ce);
void (*sched_disable)(struct intel_context *ce);
void (*reset)(struct intel_context *ce);
void (*destroy)(struct kref *kref);
/* virtual engine/context interface */
struct intel_context *(*create_virtual)(struct intel_engine_cs **engine,
unsigned int count);
struct intel_engine_cs *(*get_sibling)(struct intel_engine_cs *engine,
unsigned int sibling);
};
struct intel_context {
@ -82,6 +97,7 @@ struct intel_context {
spinlock_t signal_lock; /* protects signals, the list of requests */
struct i915_vma *state;
u32 ring_size;
struct intel_ring *ring;
struct intel_timeline *timeline;
@ -95,6 +111,7 @@ struct intel_context {
#define CONTEXT_BANNED 6
#define CONTEXT_FORCE_SINGLE_SUBMISSION 7
#define CONTEXT_NOPREEMPT 8
#define CONTEXT_LRCA_DIRTY 9
struct {
u64 timeout_us;
@ -136,6 +153,51 @@ struct intel_context {
struct intel_sseu sseu;
u8 wa_bb_page; /* if set, page num reserved for context workarounds */
struct {
/** lock: protects everything in guc_state */
spinlock_t lock;
/**
* sched_state: scheduling state of this context using GuC
* submission
*/
u16 sched_state;
/*
* fences: maintains a list of requests that have a submit
* fence related to GuC submission
*/
struct list_head fences;
} guc_state;
struct {
/** lock: protects everything in guc_active */
spinlock_t lock;
/** requests: active requests on this context */
struct list_head requests;
} guc_active;
/* GuC scheduling state flags that do not require a lock. */
atomic_t guc_sched_state_no_lock;
/* GuC LRC descriptor ID */
u16 guc_id;
/* GuC LRC descriptor reference count */
atomic_t guc_id_ref;
/*
* GuC ID link - in list when unpinned but guc_id still valid in GuC
*/
struct list_head guc_id_link;
/* GuC context blocked fence */
struct i915_sw_fence guc_blocked;
/*
* GuC priority management
*/
u8 guc_prio;
u32 guc_prio_count[GUC_CLIENT_PRIORITY_NUM];
};
#endif /* __INTEL_CONTEXT_TYPES__ */


@ -19,7 +19,9 @@
#include "intel_workarounds.h"
struct drm_printer;
struct intel_context;
struct intel_gt;
struct lock_class_key;
/* Early gen2 devices have a cacheline of just 32 bytes, using 64 is overkill,
* but keeps the logic simple. Indeed, the whole purpose of this macro is just
@ -123,20 +125,6 @@ execlists_active(const struct intel_engine_execlists *execlists)
return active;
}
static inline void
execlists_active_lock_bh(struct intel_engine_execlists *execlists)
{
local_bh_disable(); /* prevent local softirq and lock recursion */
tasklet_lock(&execlists->tasklet);
}
static inline void
execlists_active_unlock_bh(struct intel_engine_execlists *execlists)
{
tasklet_unlock(&execlists->tasklet);
local_bh_enable(); /* restore softirq, and kick ksoftirqd! */
}
struct i915_request *
execlists_unwind_incomplete_requests(struct intel_engine_execlists *execlists);
@ -186,11 +174,12 @@ intel_write_status_page(struct intel_engine_cs *engine, int reg, u32 value)
#define I915_GEM_HWS_PREEMPT_ADDR (I915_GEM_HWS_PREEMPT * sizeof(u32))
#define I915_GEM_HWS_SEQNO 0x40
#define I915_GEM_HWS_SEQNO_ADDR (I915_GEM_HWS_SEQNO * sizeof(u32))
#define I915_GEM_HWS_MIGRATE (0x42 * sizeof(u32))
#define I915_GEM_HWS_SCRATCH 0x80
#define I915_HWS_CSB_BUF0_INDEX 0x10
#define I915_HWS_CSB_WRITE_INDEX 0x1f
#define CNL_HWS_CSB_WRITE_INDEX 0x2f
#define ICL_HWS_CSB_WRITE_INDEX 0x2f
void intel_engine_stop(struct intel_engine_cs *engine);
void intel_engine_cleanup(struct intel_engine_cs *engine);
@ -223,6 +212,9 @@ void intel_engine_get_instdone(const struct intel_engine_cs *engine,
void intel_engine_init_execlists(struct intel_engine_cs *engine);
bool intel_engine_irq_enable(struct intel_engine_cs *engine);
void intel_engine_irq_disable(struct intel_engine_cs *engine);
static inline void __intel_engine_reset(struct intel_engine_cs *engine,
bool stalled)
{
@ -248,17 +240,27 @@ __printf(3, 4)
void intel_engine_dump(struct intel_engine_cs *engine,
struct drm_printer *m,
const char *header, ...);
void intel_engine_dump_active_requests(struct list_head *requests,
struct i915_request *hung_rq,
struct drm_printer *m);
ktime_t intel_engine_get_busy_time(struct intel_engine_cs *engine,
ktime_t *now);
struct i915_request *
intel_engine_find_active_request(struct intel_engine_cs *engine);
intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine);
u32 intel_engine_context_size(struct intel_gt *gt, u8 class);
struct intel_context *
intel_engine_create_pinned_context(struct intel_engine_cs *engine,
struct i915_address_space *vm,
unsigned int ring_size,
unsigned int hwsp,
struct lock_class_key *key,
const char *name);
void intel_engine_destroy_pinned_context(struct intel_context *ce);
void intel_engine_init_active(struct intel_engine_cs *engine,
unsigned int subclass);
#define ENGINE_PHYSICAL 0
#define ENGINE_MOCK 1
#define ENGINE_VIRTUAL 2
@ -277,13 +279,60 @@ intel_engine_has_preempt_reset(const struct intel_engine_cs *engine)
return intel_engine_has_preemption(engine);
}
struct intel_context *
intel_engine_create_virtual(struct intel_engine_cs **siblings,
unsigned int count);
static inline bool
intel_virtual_engine_has_heartbeat(const struct intel_engine_cs *engine)
{
/*
* For non-GuC submission we expect the back-end to look at the
* heartbeat status of the actual physical engine that the work
* has been (or is being) scheduled on, so we should only reach
* here with GuC submission enabled.
*/
GEM_BUG_ON(!intel_engine_uses_guc(engine));
return intel_guc_virtual_engine_has_heartbeat(engine);
}
static inline bool
intel_engine_has_heartbeat(const struct intel_engine_cs *engine)
{
if (!IS_ACTIVE(CONFIG_DRM_I915_HEARTBEAT_INTERVAL))
return false;
return READ_ONCE(engine->props.heartbeat_interval_ms);
if (intel_engine_is_virtual(engine))
return intel_virtual_engine_has_heartbeat(engine);
else
return READ_ONCE(engine->props.heartbeat_interval_ms);
}
static inline struct intel_engine_cs *
intel_engine_get_sibling(struct intel_engine_cs *engine, unsigned int sibling)
{
GEM_BUG_ON(!intel_engine_is_virtual(engine));
return engine->cops->get_sibling(engine, sibling);
}
static inline void
intel_engine_set_hung_context(struct intel_engine_cs *engine,
struct intel_context *ce)
{
engine->hung_ce = ce;
}
static inline void
intel_engine_clear_hung_context(struct intel_engine_cs *engine)
{
intel_engine_set_hung_context(engine, NULL);
}
static inline struct intel_context *
intel_engine_get_hung_context(struct intel_engine_cs *engine)
{
return engine->hung_ce;
}
#endif /* _INTEL_RINGBUFFER_H_ */


@ -35,14 +35,12 @@
#define DEFAULT_LR_CONTEXT_RENDER_SIZE (22 * PAGE_SIZE)
#define GEN8_LR_CONTEXT_RENDER_SIZE (20 * PAGE_SIZE)
#define GEN9_LR_CONTEXT_RENDER_SIZE (22 * PAGE_SIZE)
#define GEN10_LR_CONTEXT_RENDER_SIZE (18 * PAGE_SIZE)
#define GEN11_LR_CONTEXT_RENDER_SIZE (14 * PAGE_SIZE)
#define GEN8_LR_CONTEXT_OTHER_SIZE ( 2 * PAGE_SIZE)
#define MAX_MMIO_BASES 3
struct engine_info {
unsigned int hw_id;
u8 class;
u8 instance;
/* mmio bases table *must* be sorted in reverse graphics_ver order */
@ -54,7 +52,6 @@ struct engine_info {
static const struct engine_info intel_engines[] = {
[RCS0] = {
.hw_id = RCS0_HW,
.class = RENDER_CLASS,
.instance = 0,
.mmio_bases = {
@ -62,7 +59,6 @@ static const struct engine_info intel_engines[] = {
},
},
[BCS0] = {
.hw_id = BCS0_HW,
.class = COPY_ENGINE_CLASS,
.instance = 0,
.mmio_bases = {
@ -70,7 +66,6 @@ static const struct engine_info intel_engines[] = {
},
},
[VCS0] = {
.hw_id = VCS0_HW,
.class = VIDEO_DECODE_CLASS,
.instance = 0,
.mmio_bases = {
@ -80,7 +75,6 @@ static const struct engine_info intel_engines[] = {
},
},
[VCS1] = {
.hw_id = VCS1_HW,
.class = VIDEO_DECODE_CLASS,
.instance = 1,
.mmio_bases = {
@ -89,7 +83,6 @@ static const struct engine_info intel_engines[] = {
},
},
[VCS2] = {
.hw_id = VCS2_HW,
.class = VIDEO_DECODE_CLASS,
.instance = 2,
.mmio_bases = {
@ -97,15 +90,41 @@ static const struct engine_info intel_engines[] = {
},
},
[VCS3] = {
.hw_id = VCS3_HW,
.class = VIDEO_DECODE_CLASS,
.instance = 3,
.mmio_bases = {
{ .graphics_ver = 11, .base = GEN11_BSD4_RING_BASE }
},
},
[VCS4] = {
.class = VIDEO_DECODE_CLASS,
.instance = 4,
.mmio_bases = {
{ .graphics_ver = 12, .base = XEHP_BSD5_RING_BASE }
},
},
[VCS5] = {
.class = VIDEO_DECODE_CLASS,
.instance = 5,
.mmio_bases = {
{ .graphics_ver = 12, .base = XEHP_BSD6_RING_BASE }
},
},
[VCS6] = {
.class = VIDEO_DECODE_CLASS,
.instance = 6,
.mmio_bases = {
{ .graphics_ver = 12, .base = XEHP_BSD7_RING_BASE }
},
},
[VCS7] = {
.class = VIDEO_DECODE_CLASS,
.instance = 7,
.mmio_bases = {
{ .graphics_ver = 12, .base = XEHP_BSD8_RING_BASE }
},
},
[VECS0] = {
.hw_id = VECS0_HW,
.class = VIDEO_ENHANCEMENT_CLASS,
.instance = 0,
.mmio_bases = {
@ -114,13 +133,26 @@ static const struct engine_info intel_engines[] = {
},
},
[VECS1] = {
.hw_id = VECS1_HW,
.class = VIDEO_ENHANCEMENT_CLASS,
.instance = 1,
.mmio_bases = {
{ .graphics_ver = 11, .base = GEN11_VEBOX2_RING_BASE }
},
},
[VECS2] = {
.class = VIDEO_ENHANCEMENT_CLASS,
.instance = 2,
.mmio_bases = {
{ .graphics_ver = 12, .base = XEHP_VEBOX3_RING_BASE }
},
},
[VECS3] = {
.class = VIDEO_ENHANCEMENT_CLASS,
.instance = 3,
.mmio_bases = {
{ .graphics_ver = 12, .base = XEHP_VEBOX4_RING_BASE }
},
},
};
/**
@ -153,8 +185,6 @@ u32 intel_engine_context_size(struct intel_gt *gt, u8 class)
case 12:
case 11:
return GEN11_LR_CONTEXT_RENDER_SIZE;
case 10:
return GEN10_LR_CONTEXT_RENDER_SIZE;
case 9:
return GEN9_LR_CONTEXT_RENDER_SIZE;
case 8:
@ -269,6 +299,8 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
BUILD_BUG_ON(MAX_ENGINE_CLASS >= BIT(GEN11_ENGINE_CLASS_WIDTH));
BUILD_BUG_ON(MAX_ENGINE_INSTANCE >= BIT(GEN11_ENGINE_INSTANCE_WIDTH));
BUILD_BUG_ON(I915_MAX_VCS > (MAX_ENGINE_INSTANCE + 1));
BUILD_BUG_ON(I915_MAX_VECS > (MAX_ENGINE_INSTANCE + 1));
if (GEM_DEBUG_WARN_ON(id >= ARRAY_SIZE(gt->engine)))
return -EINVAL;
@ -294,7 +326,6 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
engine->i915 = i915;
engine->gt = gt;
engine->uncore = gt->uncore;
engine->hw_id = info->hw_id;
guc_class = engine_class_to_guc_class(info->class);
engine->guc_id = MAKE_GUC_ID(guc_class, info->instance);
engine->mmio_base = __engine_mmio_base(i915, info->mmio_bases);
@ -328,9 +359,6 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
if (engine->context_size)
DRIVER_CAPS(i915)->has_logical_contexts = true;
/* Nothing to do here, execute in order of dependencies */
engine->schedule = NULL;
ewma__engine_latency_init(&engine->latency);
seqcount_init(&engine->stats.lock);
@ -445,6 +473,28 @@ void intel_engines_free(struct intel_gt *gt)
}
}
static
bool gen11_vdbox_has_sfc(struct drm_i915_private *i915,
unsigned int physical_vdbox,
unsigned int logical_vdbox, u16 vdbox_mask)
{
/*
* In Gen11, only even-numbered logical VDBOXes are hooked
* up to an SFC (Scaler & Format Converter) unit.
* In Gen12, even-numbered physical instances are always connected
* to an SFC. Odd-numbered physical instances have an SFC only if
* the previous even instance is fused off.
*/
if (GRAPHICS_VER(i915) == 12)
return (physical_vdbox % 2 == 0) ||
!(BIT(physical_vdbox - 1) & vdbox_mask);
else if (GRAPHICS_VER(i915) == 11)
return logical_vdbox % 2 == 0;
MISSING_CASE(GRAPHICS_VER(i915));
return false;
}
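The SFC rule above can be exercised as a pure function. The sketch below reproduces the predicate and adds a simplified version of the caller's loop, which counts logical instances only for VDBOXes present in the mask. The `sketch_` names, the plain integer `graphics_ver` parameter, and the simplified loop (it ignores per-platform engine availability) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same decision as the helper above, with the device reduced to a version. */
static bool sketch_vdbox_has_sfc(int graphics_ver,
				 unsigned int physical_vdbox,
				 unsigned int logical_vdbox,
				 uint16_t vdbox_mask)
{
	assert(graphics_ver == 11 || graphics_ver == 12);

	if (graphics_ver == 12)
		/* Even instance, or odd with the previous even one fused off. */
		return (physical_vdbox % 2 == 0) ||
		       !((1u << (physical_vdbox - 1)) & vdbox_mask);

	/* Gen11: only even logical instances have an SFC. */
	return logical_vdbox % 2 == 0;
}

/* Build the SFC access mask for the VDBOXes enabled in vdbox_mask. */
static uint16_t sketch_sfc_access(int graphics_ver, uint16_t vdbox_mask,
				  unsigned int max_vcs)
{
	uint16_t access = 0;
	unsigned int logical = 0;

	assert(max_vcs <= 16);

	for (unsigned int i = 0; i < max_vcs; i++) {
		if (!(vdbox_mask & (1u << i)))
			continue; /* fused off: no logical index consumed */

		if (sketch_vdbox_has_sfc(graphics_ver, i, logical, vdbox_mask))
			access |= (uint16_t)(1u << i);
		logical++;
	}

	return access;
}
```

The difference between the two generations shows up when instances are fused off: Gen11 keys on the logical position among the surviving engines, while Gen12 keys on the physical position and on whether the even neighbour is present.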
/*
* Determine which engines are fused off in our particular hardware.
* Note that we have a catch-22 situation where we need to be able to access
@ -471,7 +521,14 @@ static intel_engine_mask_t init_engine_mask(struct intel_gt *gt)
if (GRAPHICS_VER(i915) < 11)
return info->engine_mask;
media_fuse = ~intel_uncore_read(uncore, GEN11_GT_VEBOX_VDBOX_DISABLE);
/*
* On newer platforms the fusing register is called 'enable' and has
* enable semantics, while on older platforms it is called 'disable'
* and bits have disable semantics.
*/
media_fuse = intel_uncore_read(uncore, GEN11_GT_VEBOX_VDBOX_DISABLE);
if (GRAPHICS_VER_FULL(i915) < IP_VER(12, 50))
media_fuse = ~media_fuse;
vdbox_mask = media_fuse & GEN11_GT_VDBOX_DISABLE_MASK;
vebox_mask = (media_fuse & GEN11_GT_VEBOX_DISABLE_MASK) >>
@ -489,13 +546,9 @@ static intel_engine_mask_t init_engine_mask(struct intel_gt *gt)
continue;
}
/*
* In Gen11, only even numbered logical VDBOXes are
* hooked up to an SFC (Scaler & Format Converter) unit.
* In TGL each VDBOX has access to an SFC.
*/
if (GRAPHICS_VER(i915) >= 12 || logical_vdbox++ % 2 == 0)
if (gen11_vdbox_has_sfc(i915, i, logical_vdbox, vdbox_mask))
gt->info.vdbox_sfc_access |= BIT(i);
logical_vdbox++;
}
drm_dbg(&i915->drm, "vdbox enable: %04x, instances: %04lx\n",
vdbox_mask, VDBOX_MASK(gt));
@ -585,9 +638,6 @@ void intel_engine_init_execlists(struct intel_engine_cs *engine)
memset(execlists->pending, 0, sizeof(execlists->pending));
execlists->active =
memset(execlists->inflight, 0, sizeof(execlists->inflight));
execlists->queue_priority_hint = INT_MIN;
execlists->queue = RB_ROOT_CACHED;
}
static void cleanup_status_page(struct intel_engine_cs *engine)
@ -714,11 +764,17 @@ static int engine_setup_common(struct intel_engine_cs *engine)
goto err_status;
}
engine->sched_engine = i915_sched_engine_create(ENGINE_PHYSICAL);
if (!engine->sched_engine) {
err = -ENOMEM;
goto err_sched_engine;
}
engine->sched_engine->private_data = engine;
err = intel_engine_init_cmd_parser(engine);
if (err)
goto err_cmd_parser;
intel_engine_init_active(engine, ENGINE_PHYSICAL);
intel_engine_init_execlists(engine);
intel_engine_init__pm(engine);
intel_engine_init_retire(engine);
@ -737,7 +793,9 @@ static int engine_setup_common(struct intel_engine_cs *engine)
return 0;
err_cmd_parser:
intel_breadcrumbs_free(engine->breadcrumbs);
i915_sched_engine_put(engine->sched_engine);
err_sched_engine:
intel_breadcrumbs_put(engine->breadcrumbs);
err_status:
cleanup_status_page(engine);
return err;
@ -775,11 +833,11 @@ static int measure_breadcrumb_dw(struct intel_context *ce)
frame->rq.ring = &frame->ring;
mutex_lock(&ce->timeline->mutex);
spin_lock_irq(&engine->active.lock);
spin_lock_irq(&engine->sched_engine->lock);
dw = engine->emit_fini_breadcrumb(&frame->rq, frame->cs) - frame->cs;
spin_unlock_irq(&engine->active.lock);
spin_unlock_irq(&engine->sched_engine->lock);
mutex_unlock(&ce->timeline->mutex);
GEM_BUG_ON(dw & 1); /* RING_TAIL must be qword aligned */
@ -788,33 +846,13 @@ static int measure_breadcrumb_dw(struct intel_context *ce)
return dw;
}
void
intel_engine_init_active(struct intel_engine_cs *engine, unsigned int subclass)
{
INIT_LIST_HEAD(&engine->active.requests);
INIT_LIST_HEAD(&engine->active.hold);
spin_lock_init(&engine->active.lock);
lockdep_set_subclass(&engine->active.lock, subclass);
/*
* Due to an interesting quirk in lockdep's internal debug tracking,
* after setting a subclass we must ensure the lock is used. Otherwise,
* nr_unused_locks is incremented once too often.
*/
#ifdef CONFIG_DEBUG_LOCK_ALLOC
local_irq_disable();
lock_map_acquire(&engine->active.lock.dep_map);
lock_map_release(&engine->active.lock.dep_map);
local_irq_enable();
#endif
}
static struct intel_context *
create_pinned_context(struct intel_engine_cs *engine,
unsigned int hwsp,
struct lock_class_key *key,
const char *name)
struct intel_context *
intel_engine_create_pinned_context(struct intel_engine_cs *engine,
struct i915_address_space *vm,
unsigned int ring_size,
unsigned int hwsp,
struct lock_class_key *key,
const char *name)
{
struct intel_context *ce;
int err;
@ -825,6 +863,11 @@ create_pinned_context(struct intel_engine_cs *engine,
__set_bit(CONTEXT_BARRIER_BIT, &ce->flags);
ce->timeline = page_pack_bits(NULL, hwsp);
ce->ring = NULL;
ce->ring_size = ring_size;
i915_vm_put(ce->vm);
ce->vm = i915_vm_get(vm);
err = intel_context_pin(ce); /* perma-pin so it is always available */
if (err) {
@ -843,7 +886,7 @@ create_pinned_context(struct intel_engine_cs *engine,
return ce;
}
static void destroy_pinned_context(struct intel_context *ce)
void intel_engine_destroy_pinned_context(struct intel_context *ce)
{
struct intel_engine_cs *engine = ce->engine;
struct i915_vma *hwsp = engine->status_page.vma;
@ -863,8 +906,9 @@ create_kernel_context(struct intel_engine_cs *engine)
{
static struct lock_class_key kernel;
return create_pinned_context(engine, I915_GEM_HWS_SEQNO_ADDR,
&kernel, "kernel_context");
return intel_engine_create_pinned_context(engine, engine->gt->vm, SZ_4K,
I915_GEM_HWS_SEQNO_ADDR,
&kernel, "kernel_context");
}
/**
@ -907,7 +951,7 @@ static int engine_init_common(struct intel_engine_cs *engine)
return 0;
err_context:
destroy_pinned_context(ce);
intel_engine_destroy_pinned_context(ce);
return ret;
}
@ -957,10 +1001,10 @@ int intel_engines_init(struct intel_gt *gt)
*/
void intel_engine_cleanup_common(struct intel_engine_cs *engine)
{
GEM_BUG_ON(!list_empty(&engine->active.requests));
tasklet_kill(&engine->execlists.tasklet); /* flush the callback */
GEM_BUG_ON(!list_empty(&engine->sched_engine->requests));
intel_breadcrumbs_free(engine->breadcrumbs);
i915_sched_engine_put(engine->sched_engine);
intel_breadcrumbs_put(engine->breadcrumbs);
intel_engine_fini_retire(engine);
intel_engine_cleanup_cmd_parser(engine);
@ -969,7 +1013,7 @@ void intel_engine_cleanup_common(struct intel_engine_cs *engine)
fput(engine->default_state);
if (engine->kernel_context)
destroy_pinned_context(engine->kernel_context);
intel_engine_destroy_pinned_context(engine->kernel_context);
GEM_BUG_ON(!llist_empty(&engine->barrier_tasks));
cleanup_status_page(engine);
@ -1105,45 +1149,8 @@ static u32
read_subslice_reg(const struct intel_engine_cs *engine,
int slice, int subslice, i915_reg_t reg)
{
struct drm_i915_private *i915 = engine->i915;
struct intel_uncore *uncore = engine->uncore;
u32 mcr_mask, mcr_ss, mcr, old_mcr, val;
enum forcewake_domains fw_domains;
if (GRAPHICS_VER(i915) >= 11) {
mcr_mask = GEN11_MCR_SLICE_MASK | GEN11_MCR_SUBSLICE_MASK;
mcr_ss = GEN11_MCR_SLICE(slice) | GEN11_MCR_SUBSLICE(subslice);
} else {
mcr_mask = GEN8_MCR_SLICE_MASK | GEN8_MCR_SUBSLICE_MASK;
mcr_ss = GEN8_MCR_SLICE(slice) | GEN8_MCR_SUBSLICE(subslice);
}
fw_domains = intel_uncore_forcewake_for_reg(uncore, reg,
FW_REG_READ);
fw_domains |= intel_uncore_forcewake_for_reg(uncore,
GEN8_MCR_SELECTOR,
FW_REG_READ | FW_REG_WRITE);
spin_lock_irq(&uncore->lock);
intel_uncore_forcewake_get__locked(uncore, fw_domains);
old_mcr = mcr = intel_uncore_read_fw(uncore, GEN8_MCR_SELECTOR);
mcr &= ~mcr_mask;
mcr |= mcr_ss;
intel_uncore_write_fw(uncore, GEN8_MCR_SELECTOR, mcr);
val = intel_uncore_read_fw(uncore, reg);
mcr &= ~mcr_mask;
mcr |= old_mcr & mcr_mask;
intel_uncore_write_fw(uncore, GEN8_MCR_SELECTOR, mcr);
intel_uncore_forcewake_put__locked(uncore, fw_domains);
spin_unlock_irq(&uncore->lock);
return val;
return intel_uncore_read_with_mcr_steering(engine->uncore, reg,
slice, subslice);
}
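The open-coded body removed above follows a save, steer, read, restore sequence on the MCR selector, which is what the new call to `intel_uncore_read_with_mcr_steering()` is meant to encapsulate. A minimal userspace sketch of that sequence follows. Everything in it is hypothetical: the `fake_*` names, the selector field layout and the in-memory "hardware" are stand-ins, and the forcewake and `uncore->lock` handling of the real code is left out.

```c
#include <stdint.h>

/* Hypothetical selector layout: slice in bits 5:4, subslice in bits 2:0. */
#define FAKE_SLICE_SHIFT	4
#define FAKE_SLICE_MASK		(0x3u << FAKE_SLICE_SHIFT)
#define FAKE_SUBSLICE_MASK	0x7u

/* Simulated hardware: one selector register, one value per instance. */
static uint32_t fake_selector;
static uint32_t fake_instances[4][8];

/* A read of the multicast register returns the currently selected instance. */
static uint32_t fake_read_steered(void)
{
	uint32_t slice = (fake_selector & FAKE_SLICE_MASK) >> FAKE_SLICE_SHIFT;
	uint32_t subslice = fake_selector & FAKE_SUBSLICE_MASK;

	return fake_instances[slice][subslice];
}

/* Steer to (slice, subslice), read, then restore the previous steering. */
static uint32_t read_with_steering(uint32_t slice, uint32_t subslice)
{
	const uint32_t mask = FAKE_SLICE_MASK | FAKE_SUBSLICE_MASK;
	uint32_t old = fake_selector;
	uint32_t sel = old;
	uint32_t val;

	/* Replace only the steering fields; other selector bits are kept. */
	sel &= ~mask;
	sel |= (slice << FAKE_SLICE_SHIFT) & FAKE_SLICE_MASK;
	sel |= subslice & FAKE_SUBSLICE_MASK;
	fake_selector = sel;

	val = fake_read_steered();

	/* Put the original steering fields back. */
	sel &= ~mask;
	sel |= old & mask;
	fake_selector = sel;

	return val;
}
```

In the driver the whole sequence runs under a spinlock with forcewake held, so that no other reader observes the temporary steering; the sketch is single-threaded and does not model that.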
/* NB: please notice the memset */
@ -1243,7 +1250,7 @@ static bool ring_is_idle(struct intel_engine_cs *engine)
void __intel_engine_flush_submission(struct intel_engine_cs *engine, bool sync)
{
struct tasklet_struct *t = &engine->execlists.tasklet;
struct tasklet_struct *t = &engine->sched_engine->tasklet;
if (!t->callback)
return;
@ -1283,7 +1290,7 @@ bool intel_engine_is_idle(struct intel_engine_cs *engine)
intel_engine_flush_submission(engine);
/* ELSP is empty, but there are ready requests? E.g. after reset */
if (!RB_EMPTY_ROOT(&engine->execlists.queue.rb_root))
if (!i915_sched_engine_is_empty(engine->sched_engine))
return false;
/* Ring stopped? */
@ -1314,6 +1321,30 @@ bool intel_engines_are_idle(struct intel_gt *gt)
return true;
}
bool intel_engine_irq_enable(struct intel_engine_cs *engine)
{
if (!engine->irq_enable)
return false;
/* Caller disables interrupts */
spin_lock(&engine->gt->irq_lock);
engine->irq_enable(engine);
spin_unlock(&engine->gt->irq_lock);
return true;
}
void intel_engine_irq_disable(struct intel_engine_cs *engine)
{
if (!engine->irq_disable)
return;
/* Caller disables interrupts */
spin_lock(&engine->gt->irq_lock);
engine->irq_disable(engine);
spin_unlock(&engine->gt->irq_lock);
}
void intel_engines_reset_default_submission(struct intel_gt *gt)
{
struct intel_engine_cs *engine;
@ -1349,7 +1380,7 @@ static struct intel_timeline *get_timeline(struct i915_request *rq)
struct intel_timeline *tl;
/*
* Even though we are holding the engine->active.lock here, there
* Even though we are holding the engine->sched_engine->lock here, there
* is no control over the submission queue per-se and we are
* inspecting the active state at a random point in time, with an
* unknown queue. Play safe and make sure the timeline remains valid.
@ -1504,8 +1535,8 @@ static void intel_engine_print_registers(struct intel_engine_cs *engine,
drm_printf(m, "\tExeclist tasklet queued? %s (%s), preempt? %s, timeslice? %s\n",
yesno(test_bit(TASKLET_STATE_SCHED,
&engine->execlists.tasklet.state)),
enableddisabled(!atomic_read(&engine->execlists.tasklet.count)),
&engine->sched_engine->tasklet.state)),
enableddisabled(!atomic_read(&engine->sched_engine->tasklet.count)),
repr_timer(&engine->execlists.preempt),
repr_timer(&engine->execlists.timer));
@ -1529,7 +1560,7 @@ static void intel_engine_print_registers(struct intel_engine_cs *engine,
idx, hws[idx * 2], hws[idx * 2 + 1]);
}
execlists_active_lock_bh(execlists);
i915_sched_engine_active_lock_bh(engine->sched_engine);
rcu_read_lock();
for (port = execlists->active; (rq = *port); port++) {
char hdr[160];
@ -1560,7 +1591,7 @@ static void intel_engine_print_registers(struct intel_engine_cs *engine,
i915_request_show(m, rq, hdr, 0);
}
rcu_read_unlock();
execlists_active_unlock_bh(execlists);
i915_sched_engine_active_unlock_bh(engine->sched_engine);
} else if (GRAPHICS_VER(dev_priv) > 6) {
drm_printf(m, "\tPP_DIR_BASE: 0x%08x\n",
ENGINE_READ(engine, RING_PP_DIR_BASE));
@ -1650,6 +1681,98 @@ static void print_properties(struct intel_engine_cs *engine,
read_ul(&engine->defaults, p->offset));
}
static void engine_dump_request(struct i915_request *rq, struct drm_printer *m, const char *msg)
{
struct intel_timeline *tl = get_timeline(rq);
i915_request_show(m, rq, msg, 0);
drm_printf(m, "\t\tring->start: 0x%08x\n",
i915_ggtt_offset(rq->ring->vma));
drm_printf(m, "\t\tring->head: 0x%08x\n",
rq->ring->head);
drm_printf(m, "\t\tring->tail: 0x%08x\n",
rq->ring->tail);
drm_printf(m, "\t\tring->emit: 0x%08x\n",
rq->ring->emit);
drm_printf(m, "\t\tring->space: 0x%08x\n",
rq->ring->space);
if (tl) {
drm_printf(m, "\t\tring->hwsp: 0x%08x\n",
tl->hwsp_offset);
intel_timeline_put(tl);
}
print_request_ring(m, rq);
if (rq->context->lrc_reg_state) {
drm_printf(m, "Logical Ring Context:\n");
hexdump(m, rq->context->lrc_reg_state, PAGE_SIZE);
}
}
void intel_engine_dump_active_requests(struct list_head *requests,
struct i915_request *hung_rq,
struct drm_printer *m)
{
struct i915_request *rq;
const char *msg;
enum i915_request_state state;
list_for_each_entry(rq, requests, sched.link) {
if (rq == hung_rq)
continue;
state = i915_test_request_state(rq);
if (state < I915_REQUEST_QUEUED)
continue;
if (state == I915_REQUEST_ACTIVE)
msg = "\t\tactive on engine";
else
msg = "\t\tactive in queue";
engine_dump_request(rq, m, msg);
}
}
static void engine_dump_active_requests(struct intel_engine_cs *engine, struct drm_printer *m)
{
struct i915_request *hung_rq = NULL;
struct intel_context *ce;
bool guc;
/*
* No need for an engine->irq_seqno_barrier() before the seqno reads.
* The GPU is still running so requests are still executing and any
* hardware reads will be out of date by the time they are reported.
* But the intention here is just to report an instantaneous snapshot
* so that's fine.
*/
lockdep_assert_held(&engine->sched_engine->lock);
drm_printf(m, "\tRequests:\n");
guc = intel_uc_uses_guc_submission(&engine->gt->uc);
if (guc) {
ce = intel_engine_get_hung_context(engine);
if (ce)
hung_rq = intel_context_find_active_request(ce);
} else {
hung_rq = intel_engine_execlist_find_hung_request(engine);
}
if (hung_rq)
engine_dump_request(hung_rq, m, "\t\thung");
if (guc)
intel_guc_dump_active_requests(engine, hung_rq, m);
else
intel_engine_dump_active_requests(&engine->sched_engine->requests,
hung_rq, m);
}
void intel_engine_dump(struct intel_engine_cs *engine,
struct drm_printer *m,
const char *header, ...)
@ -1694,41 +1817,12 @@ void intel_engine_dump(struct intel_engine_cs *engine,
i915_reset_count(error));
print_properties(engine, m);
drm_printf(m, "\tRequests:\n");
spin_lock_irqsave(&engine->sched_engine->lock, flags);
engine_dump_active_requests(engine, m);
spin_lock_irqsave(&engine->active.lock, flags);
rq = intel_engine_find_active_request(engine);
if (rq) {
struct intel_timeline *tl = get_timeline(rq);
i915_request_show(m, rq, "\t\tactive ", 0);
drm_printf(m, "\t\tring->start: 0x%08x\n",
i915_ggtt_offset(rq->ring->vma));
drm_printf(m, "\t\tring->head: 0x%08x\n",
rq->ring->head);
drm_printf(m, "\t\tring->tail: 0x%08x\n",
rq->ring->tail);
drm_printf(m, "\t\tring->emit: 0x%08x\n",
rq->ring->emit);
drm_printf(m, "\t\tring->space: 0x%08x\n",
rq->ring->space);
if (tl) {
drm_printf(m, "\t\tring->hwsp: 0x%08x\n",
tl->hwsp_offset);
intel_timeline_put(tl);
}
print_request_ring(m, rq);
if (rq->context->lrc_reg_state) {
drm_printf(m, "Logical Ring Context:\n");
hexdump(m, rq->context->lrc_reg_state, PAGE_SIZE);
}
}
drm_printf(m, "\tOn hold?: %lu\n", list_count(&engine->active.hold));
spin_unlock_irqrestore(&engine->active.lock, flags);
drm_printf(m, "\tOn hold?: %lu\n",
list_count(&engine->sched_engine->hold));
spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
drm_printf(m, "\tMMIO base: 0x%08x\n", engine->mmio_base);
wakeref = intel_runtime_pm_get_if_in_use(engine->uncore->rpm);
@ -1785,18 +1879,32 @@ ktime_t intel_engine_get_busy_time(struct intel_engine_cs *engine, ktime_t *now)
return total;
}
static bool match_ring(struct i915_request *rq)
struct intel_context *
intel_engine_create_virtual(struct intel_engine_cs **siblings,
unsigned int count)
{
u32 ring = ENGINE_READ(rq->engine, RING_START);
if (count == 0)
return ERR_PTR(-EINVAL);
return ring == i915_ggtt_offset(rq->ring->vma);
if (count == 1)
return intel_context_create(siblings[0]);
GEM_BUG_ON(!siblings[0]->cops->create_virtual);
return siblings[0]->cops->create_virtual(siblings, count);
}
struct i915_request *
intel_engine_find_active_request(struct intel_engine_cs *engine)
intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine)
{
struct i915_request *request, *active = NULL;
/*
* This search does not work in GuC submission mode. However, the GuC
* will report the hanging context directly to the driver itself. So
* the driver should never get here when in GuC mode.
*/
GEM_BUG_ON(intel_uc_uses_guc_submission(&engine->gt->uc));
/*
* We are called by the error capture, reset and to dump engine
* state at random points in time. In particular, note that neither is
@ -1808,7 +1916,7 @@ intel_engine_find_active_request(struct intel_engine_cs *engine)
* At all other times, we must assume the GPU is still running, but
* we only care about the snapshot of this moment.
*/
lockdep_assert_held(&engine->active.lock);
lockdep_assert_held(&engine->sched_engine->lock);
rcu_read_lock();
request = execlists_active(&engine->execlists);
@ -1826,15 +1934,9 @@ intel_engine_find_active_request(struct intel_engine_cs *engine)
if (active)
return active;
list_for_each_entry(request, &engine->active.requests, sched.link) {
if (__i915_request_is_complete(request))
continue;
if (!__i915_request_has_started(request))
continue;
/* More than one preemptible request may match! */
if (!match_ring(request))
list_for_each_entry(request, &engine->sched_engine->requests,
sched.link) {
if (i915_test_request_state(request) != I915_REQUEST_ACTIVE)
continue;
active = request;

@ -70,12 +70,38 @@ static void show_heartbeat(const struct i915_request *rq,
{
struct drm_printer p = drm_debug_printer("heartbeat");
intel_engine_dump(engine, &p,
"%s heartbeat {seqno:%llx:%lld, prio:%d} not ticking\n",
engine->name,
rq->fence.context,
rq->fence.seqno,
rq->sched.attr.priority);
if (!rq) {
intel_engine_dump(engine, &p,
"%s heartbeat not ticking\n",
engine->name);
} else {
intel_engine_dump(engine, &p,
"%s heartbeat {seqno:%llx:%lld, prio:%d} not ticking\n",
engine->name,
rq->fence.context,
rq->fence.seqno,
rq->sched.attr.priority);
}
}
static void
reset_engine(struct intel_engine_cs *engine, struct i915_request *rq)
{
if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
show_heartbeat(rq, engine);
if (intel_engine_uses_guc(engine))
/*
* GuC itself is toast or GuC's hang detection
* is disabled. Either way, need to find the
* hang culprit manually.
*/
intel_guc_find_hung_context(engine);
intel_gt_handle_error(engine->gt, engine->mask,
I915_ERROR_CAPTURE,
"stopped heartbeat on %s",
engine->name);
}
static void heartbeat(struct work_struct *wrk)
@ -102,6 +128,11 @@ static void heartbeat(struct work_struct *wrk)
if (intel_gt_is_wedged(engine->gt))
goto out;
if (i915_sched_engine_disabled(engine->sched_engine)) {
reset_engine(engine, engine->heartbeat.systole);
goto out;
}
if (engine->heartbeat.systole) {
long delay = READ_ONCE(engine->props.heartbeat_interval_ms);
@ -121,7 +152,7 @@ static void heartbeat(struct work_struct *wrk)
* but all other contexts, including the kernel
* context are stuck waiting for the signal.
*/
} else if (engine->schedule &&
} else if (engine->sched_engine->schedule &&
rq->sched.attr.priority < I915_PRIORITY_BARRIER) {
/*
* Gradually raise the priority of the heartbeat to
@ -136,16 +167,10 @@ static void heartbeat(struct work_struct *wrk)
attr.priority = I915_PRIORITY_BARRIER;
local_bh_disable();
engine->schedule(rq, &attr);
engine->sched_engine->schedule(rq, &attr);
local_bh_enable();
} else {
if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
show_heartbeat(rq, engine);
intel_gt_handle_error(engine->gt, engine->mask,
I915_ERROR_CAPTURE,
"stopped heartbeat on %s",
engine->name);
reset_engine(engine, rq);
}
rq->emitted_jiffies = jiffies;
@ -194,6 +219,25 @@ void intel_engine_park_heartbeat(struct intel_engine_cs *engine)
i915_request_put(fetch_and_zero(&engine->heartbeat.systole));
}
void intel_gt_unpark_heartbeats(struct intel_gt *gt)
{
struct intel_engine_cs *engine;
enum intel_engine_id id;
for_each_engine(engine, gt, id)
if (intel_engine_pm_is_awake(engine))
intel_engine_unpark_heartbeat(engine);
}
void intel_gt_park_heartbeats(struct intel_gt *gt)
{
struct intel_engine_cs *engine;
enum intel_engine_id id;
for_each_engine(engine, gt, id)
intel_engine_park_heartbeat(engine);
}
void intel_engine_init_heartbeat(struct intel_engine_cs *engine)
{
INIT_DELAYED_WORK(&engine->heartbeat.work, heartbeat);

@ -7,6 +7,7 @@
#define INTEL_ENGINE_HEARTBEAT_H
struct intel_engine_cs;
struct intel_gt;
void intel_engine_init_heartbeat(struct intel_engine_cs *engine);
@ -16,6 +17,9 @@ int intel_engine_set_heartbeat(struct intel_engine_cs *engine,
void intel_engine_park_heartbeat(struct intel_engine_cs *engine);
void intel_engine_unpark_heartbeat(struct intel_engine_cs *engine);
void intel_gt_park_heartbeats(struct intel_gt *gt);
void intel_gt_unpark_heartbeats(struct intel_gt *gt);
int intel_engine_pulse(struct intel_engine_cs *engine);
int intel_engine_flush_barriers(struct intel_engine_cs *engine);

@ -275,13 +275,11 @@ static int __engine_park(struct intel_wakeref *wf)
intel_breadcrumbs_park(engine->breadcrumbs);
/* Must be reset upon idling, or we may miss the busy wakeup. */
GEM_BUG_ON(engine->execlists.queue_priority_hint != INT_MIN);
GEM_BUG_ON(engine->sched_engine->queue_priority_hint != INT_MIN);
if (engine->park)
engine->park(engine);
engine->execlists.no_priolist = false;
/* While gt calls i915_vma_parked(), we have to break the lock cycle */
intel_gt_pm_put_async(engine->gt);
return 0;

@ -21,32 +21,20 @@
#include "i915_pmu.h"
#include "i915_priolist_types.h"
#include "i915_selftest.h"
#include "intel_breadcrumbs_types.h"
#include "intel_sseu.h"
#include "intel_timeline_types.h"
#include "intel_uncore.h"
#include "intel_wakeref.h"
#include "intel_workarounds_types.h"
/* Legacy HW Engine ID */
#define RCS0_HW 0
#define VCS0_HW 1
#define BCS0_HW 2
#define VECS0_HW 3
#define VCS1_HW 4
#define VCS2_HW 6
#define VCS3_HW 7
#define VECS1_HW 12
/* Gen11+ HW Engine class + instance */
/* HW Engine class + instance */
#define RENDER_CLASS 0
#define VIDEO_DECODE_CLASS 1
#define VIDEO_ENHANCEMENT_CLASS 2
#define COPY_ENGINE_CLASS 3
#define OTHER_CLASS 4
#define MAX_ENGINE_CLASS 4
#define MAX_ENGINE_INSTANCE 3
#define MAX_ENGINE_INSTANCE 7
#define I915_MAX_SLICES 3
#define I915_MAX_SUBSLICES 8
@ -59,11 +47,13 @@ struct drm_i915_reg_table;
struct i915_gem_context;
struct i915_request;
struct i915_sched_attr;
struct i915_sched_engine;
struct intel_gt;
struct intel_ring;
struct intel_uncore;
struct intel_breadcrumbs;
typedef u8 intel_engine_mask_t;
typedef u32 intel_engine_mask_t;
#define ALL_ENGINES ((intel_engine_mask_t)~0ul)
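The widening of `intel_engine_mask_t` from `u8` to `u32` goes together with the larger engine enum in this diff: once there are more than eight engine IDs, a one-bit-per-engine mask no longer fits in eight bits. A small sketch of the truncation, using a hypothetical local copy of the enum (the position of `RCS0` and `BCS0` ahead of `VCS0` is an assumption, since that part of the enum is outside the hunk) and hypothetical helper names:

```c
#include <stdint.h>

#define BIT(n) (1u << (n))

/* Hypothetical local copy of the engine ID ordering after this change. */
enum engine_id {
	RCS0, BCS0,
	VCS0, VCS1, VCS2, VCS3, VCS4, VCS5, VCS6, VCS7,
	VECS0, VECS1, VECS2, VECS3,
	NUM_ENGINES
};

/* Old mask width: bits 8 and above are silently dropped. */
static uint8_t narrow_mask(enum engine_id id)
{
	return (uint8_t)BIT(id);
}

/* New mask width: every engine ID in the enum has its own bit. */
static uint32_t wide_mask(enum engine_id id)
{
	return BIT(id);
}
```

With this ordering `VECS3` is ID 13, so `narrow_mask(VECS3)` evaluates to 0 while `wide_mask(VECS3)` keeps bit 13 set.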
struct intel_hw_status_page {
@ -100,8 +90,8 @@ struct i915_ctx_workarounds {
struct i915_vma *vma;
};
#define I915_MAX_VCS 4
#define I915_MAX_VECS 2
#define I915_MAX_VCS 8
#define I915_MAX_VECS 4
/*
* Engine IDs definitions.
@ -114,9 +104,15 @@ enum intel_engine_id {
VCS1,
VCS2,
VCS3,
VCS4,
VCS5,
VCS6,
VCS7,
#define _VCS(n) (VCS0 + (n))
VECS0,
VECS1,
VECS2,
VECS3,
#define _VECS(n) (VECS0 + (n))
I915_NUM_ENGINES
#define INVALID_ENGINE ((enum intel_engine_id)-1)
@ -137,11 +133,6 @@ struct st_preempt_hang {
* driver and the hardware state for execlist mode of submission.
*/
struct intel_engine_execlists {
/**
* @tasklet: softirq tasklet for bottom handler
*/
struct tasklet_struct tasklet;
/**
* @timer: kick the current context if its timeslice expires
*/
@ -152,11 +143,6 @@ struct intel_engine_execlists {
*/
struct timer_list preempt;
/**
* @default_priolist: priority list for I915_PRIORITY_NORMAL
*/
struct i915_priolist default_priolist;
/**
* @ccid: identifier for contexts submitted to this engine
*/
@ -191,11 +177,6 @@ struct intel_engine_execlists {
*/
u32 reset_ccid;
/**
* @no_priolist: priority lists disabled
*/
bool no_priolist;
/**
* @submit_reg: gen-specific execlist submission register
* set to the ExecList Submission Port (elsp) register pre-Gen11 and to
@ -238,23 +219,10 @@ struct intel_engine_execlists {
unsigned int port_mask;
/**
* @queue_priority_hint: Highest pending priority.
*
* When we add requests into the queue, or adjust the priority of
* executing requests, we compute the maximum priority of those
* pending requests. We can then use this value to determine if
* we need to preempt the executing requests to service the queue.
* However, since we may have recorded the priority of an inflight
* request that we wanted to preempt but that has since completed, at the
* time of dequeuing the priority hint may no longer match the highest
* available request priority.
* @virtual: Queue of requests on a virtual engine, sorted by priority.
* Each RB entry is a struct i915_priolist containing a list of requests
* of the same priority.
*/
int queue_priority_hint;
/**
* @queue: queue of requests, in priority lists
*/
struct rb_root_cached queue;
struct rb_root_cached virtual;
/**
@ -295,7 +263,6 @@ struct intel_engine_cs {
enum intel_engine_id id;
enum intel_engine_id legacy_idx;
unsigned int hw_id;
unsigned int guc_id;
intel_engine_mask_t mask;
@ -326,15 +293,13 @@ struct intel_engine_cs {
struct intel_sseu sseu;
struct {
spinlock_t lock;
struct list_head requests;
struct list_head hold; /* ready requests, but on hold */
} active;
struct i915_sched_engine *sched_engine;
/* keep a request in reserve for a [pm] barrier under oom */
struct i915_request *request_pool;
struct intel_context *hung_ce;
struct llist_head barrier_tasks;
struct intel_context *kernel_context; /* pinned */
@ -419,6 +384,8 @@ struct intel_engine_cs {
void (*park)(struct intel_engine_cs *engine);
void (*unpark)(struct intel_engine_cs *engine);
void (*bump_serial)(struct intel_engine_cs *engine);
void (*set_default_submission)(struct intel_engine_cs *engine);
const struct intel_context_ops *cops;
@ -447,23 +414,14 @@ struct intel_engine_cs {
*/
void (*submit_request)(struct i915_request *rq);
/*
* Called on signaling of a SUBMIT_FENCE, passing along the signaling
* request down to the bonded pairs.
*/
void (*bond_execute)(struct i915_request *rq,
struct dma_fence *signal);
/*
* Call when the priority on a request has changed and it and its
* dependencies may need rescheduling. Note the request itself may
* not be ready to run!
*/
void (*schedule)(struct i915_request *request,
const struct i915_sched_attr *attr);
void (*release)(struct intel_engine_cs *engine);
/*
* Add / remove request from engine active tracking
*/
void (*add_active_request)(struct i915_request *rq);
void (*remove_active_request)(struct i915_request *rq);
struct intel_engine_execlists execlists;
/*
@ -485,6 +443,7 @@ struct intel_engine_cs {
#define I915_ENGINE_IS_VIRTUAL BIT(5)
#define I915_ENGINE_HAS_RELATIVE_MMIO BIT(6)
#define I915_ENGINE_REQUIRES_CMD_PARSER BIT(7)
#define I915_ENGINE_WANT_FORCED_PREEMPTION BIT(8)
unsigned int flags;
/*

@ -11,6 +11,7 @@
#include "intel_engine.h"
#include "intel_engine_user.h"
#include "intel_gt.h"
#include "uc/intel_guc_submission.h"
struct intel_engine_cs *
intel_engine_lookup_user(struct drm_i915_private *i915, u8 class, u8 instance)
@ -108,13 +109,16 @@ static void set_scheduler_caps(struct drm_i915_private *i915)
for_each_uabi_engine(engine, i915) { /* all engines must agree! */
int i;
if (engine->schedule)
if (engine->sched_engine->schedule)
enabled |= (I915_SCHEDULER_CAP_ENABLED |
I915_SCHEDULER_CAP_PRIORITY);
else
disabled |= (I915_SCHEDULER_CAP_ENABLED |
I915_SCHEDULER_CAP_PRIORITY);
if (intel_uc_uses_guc_submission(&i915->gt.uc))
enabled |= I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP;
for (i = 0; i < ARRAY_SIZE(map); i++) {
if (engine->flags & BIT(map[i].engine))
enabled |= BIT(map[i].sched);

File diff suppressed because it is too large

@ -32,15 +32,7 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
int indent),
unsigned int max);
struct intel_context *
intel_execlists_create_virtual(struct intel_engine_cs **siblings,
unsigned int count);
struct intel_context *
intel_execlists_clone_virtual(struct intel_engine_cs *src);
int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
const struct intel_engine_cs *master,
const struct intel_engine_cs *sibling);
bool
intel_engine_in_execlists_submission_mode(const struct intel_engine_cs *engine);
#endif /* __INTEL_EXECLISTS_SUBMISSION_H__ */

@ -826,13 +826,13 @@ static int ggtt_probe_common(struct i915_ggtt *ggtt, u64 size)
phys_addr = pci_resource_start(pdev, 0) + pci_resource_len(pdev, 0) / 2;
/*
* On BXT+/CNL+ writes larger than 64 bit to the GTT pagetable range
* On BXT+/ICL+ writes larger than 64 bit to the GTT pagetable range
* will be dropped. For WC mappings in general we have 64 byte burst
* writes when the WC buffer is flushed, so we can't use it, but have to
* resort to an uncached mapping. The WC issue is easily caught by the
* readback check when writing GTT PTE entries.
*/
if (IS_GEN9_LP(i915) || GRAPHICS_VER(i915) >= 10)
if (IS_GEN9_LP(i915) || GRAPHICS_VER(i915) >= 11)
ggtt->gsm = ioremap(phys_addr, size);
else
ggtt->gsm = ioremap_wc(phys_addr, size);
@ -1494,7 +1494,7 @@ intel_partial_pages(const struct i915_ggtt_view *view,
if (ret)
goto err_sg_alloc;
iter = i915_gem_object_get_sg_dma(obj, view->partial.offset, &offset, true);
iter = i915_gem_object_get_sg_dma(obj, view->partial.offset, &offset);
GEM_BUG_ON(!iter);
sg = st->sgl;

@ -123,8 +123,10 @@
#define MI_SEMAPHORE_SAD_NEQ_SDD (5 << 12)
#define MI_SEMAPHORE_TOKEN_MASK REG_GENMASK(9, 5)
#define MI_SEMAPHORE_TOKEN_SHIFT 5
#define MI_STORE_DATA_IMM MI_INSTR(0x20, 0)
#define MI_STORE_DWORD_IMM MI_INSTR(0x20, 1)
#define MI_STORE_DWORD_IMM_GEN4 MI_INSTR(0x20, 2)
#define MI_STORE_QWORD_IMM_GEN8 (MI_INSTR(0x20, 3) | REG_BIT(21))
#define MI_MEM_VIRTUAL (1 << 22) /* 945,g33,965 */
#define MI_USE_GGTT (1 << 22) /* g4x+ */
#define MI_STORE_DWORD_INDEX MI_INSTR(0x21, 1)

@ -13,6 +13,7 @@
#include "intel_gt_clock_utils.h"
#include "intel_gt_pm.h"
#include "intel_gt_requests.h"
#include "intel_migrate.h"
#include "intel_mocs.h"
#include "intel_rc6.h"
#include "intel_renderstate.h"
@ -40,8 +41,8 @@ void intel_gt_init_early(struct intel_gt *gt, struct drm_i915_private *i915)
intel_gt_init_timelines(gt);
intel_gt_pm_init_early(gt);
intel_rps_init_early(&gt->rps);
intel_uc_init_early(&gt->uc);
intel_rps_init_early(&gt->rps);
}
int intel_gt_probe_lmem(struct intel_gt *gt)
@ -83,13 +84,73 @@ void intel_gt_init_hw_early(struct intel_gt *gt, struct i915_ggtt *ggtt)
gt->ggtt = ggtt;
}
static const struct intel_mmio_range icl_l3bank_steering_table[] = {
{ 0x00B100, 0x00B3FF },
{},
};
static const struct intel_mmio_range xehpsdv_mslice_steering_table[] = {
{ 0x004000, 0x004AFF },
{ 0x00C800, 0x00CFFF },
{ 0x00DD00, 0x00DDFF },
{ 0x00E900, 0x00FFFF }, /* 0xEA00 - 0xEFFF is unused */
{},
};
static const struct intel_mmio_range xehpsdv_lncf_steering_table[] = {
{ 0x00B000, 0x00B0FF },
{ 0x00D800, 0x00D8FF },
{},
};
static const struct intel_mmio_range dg2_lncf_steering_table[] = {
{ 0x00B000, 0x00B0FF },
{ 0x00D880, 0x00D8FF },
{},
};
static u16 slicemask(struct intel_gt *gt, int count)
{
u64 dss_mask = intel_sseu_get_subslices(&gt->info.sseu, 0);
return intel_slicemask_from_dssmask(dss_mask, count);
}
int intel_gt_init_mmio(struct intel_gt *gt)
{
struct drm_i915_private *i915 = gt->i915;
intel_gt_init_clock_frequency(gt);
intel_uc_init_mmio(&gt->uc);
intel_sseu_info_init(gt);
/*
* An mslice is unavailable only if both the meml3 for the slice is
* disabled *and* all of the DSS in the slice (quadrant) are disabled.
*/
if (HAS_MSLICES(i915))
gt->info.mslice_mask =
slicemask(gt, GEN_DSS_PER_MSLICE) |
(intel_uncore_read(gt->uncore, GEN10_MIRROR_FUSE3) &
GEN12_MEML3_EN_MASK);
if (IS_DG2(i915)) {
gt->steering_table[MSLICE] = xehpsdv_mslice_steering_table;
gt->steering_table[LNCF] = dg2_lncf_steering_table;
} else if (IS_XEHPSDV(i915)) {
gt->steering_table[MSLICE] = xehpsdv_mslice_steering_table;
gt->steering_table[LNCF] = xehpsdv_lncf_steering_table;
} else if (GRAPHICS_VER(i915) >= 11 &&
GRAPHICS_VER_FULL(i915) < IP_VER(12, 50)) {
gt->steering_table[L3BANK] = icl_l3bank_steering_table;
gt->info.l3bank_mask =
~intel_uncore_read(gt->uncore, GEN10_MIRROR_FUSE3) &
GEN10_L3BANK_MASK;
} else if (HAS_MSLICES(i915)) {
MISSING_CASE(INTEL_INFO(i915)->platform);
}
return intel_engines_init_mmio(gt);
}
@ -192,7 +253,7 @@ static void clear_register(struct intel_uncore *uncore, i915_reg_t reg)
intel_uncore_rmw(uncore, reg, 0, 0);
}
static void gen8_clear_engine_error_register(struct intel_engine_cs *engine)
static void gen6_clear_engine_error_register(struct intel_engine_cs *engine)
{
GEN6_RING_FAULT_REG_RMW(engine, RING_FAULT_VALID, 0);
GEN6_RING_FAULT_REG_POSTING_READ(engine);
@ -238,7 +299,7 @@ intel_gt_clear_error_registers(struct intel_gt *gt,
enum intel_engine_id id;
for_each_engine_masked(engine, gt, engine_mask, id)
gen8_clear_engine_error_register(engine);
gen6_clear_engine_error_register(engine);
}
}
@ -572,6 +633,25 @@ static void __intel_gt_disable(struct intel_gt *gt)
GEM_BUG_ON(intel_gt_pm_is_awake(gt));
}
int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
{
long remaining_timeout;
/* If the device is asleep, we have no requests outstanding */
if (!intel_gt_pm_is_awake(gt))
return 0;
while ((timeout = intel_gt_retire_requests_timeout(gt, timeout,
&remaining_timeout)) > 0) {
cond_resched();
if (signal_pending(current))
return -EINTR;
}
return timeout ? timeout : intel_uc_wait_for_idle(&gt->uc,
remaining_timeout);
}
int intel_gt_init(struct intel_gt *gt)
{
int err;
@ -622,10 +702,14 @@ int intel_gt_init(struct intel_gt *gt)
if (err)
goto err_gt;
intel_uc_init_late(&gt->uc);
err = i915_inject_probe_error(gt->i915, -EIO);
if (err)
goto err_gt;
intel_migrate_init(&gt->migrate, gt);
goto out_fw;
err_gt:
__intel_gt_disable(gt);
@ -649,6 +733,7 @@ void intel_gt_driver_remove(struct intel_gt *gt)
{
__intel_gt_disable(gt);
intel_migrate_fini(&gt->migrate);
intel_uc_driver_remove(&gt->uc);
intel_engines_release(gt);
@ -697,6 +782,112 @@ void intel_gt_driver_late_release(struct intel_gt *gt)
intel_engines_free(gt);
}
/**
* intel_gt_reg_needs_read_steering - determine whether a register read
* requires explicit steering
* @gt: GT structure
* @reg: the register to check steering requirements for
* @type: type of multicast steering to check
*
* Determines whether @reg needs explicit steering of a specific type for
* reads.
*
* Returns false if @reg does not belong to a register range of the given
* steering type, or if the default (subslice-based) steering IDs are suitable
* for @type steering too.
*/
static bool intel_gt_reg_needs_read_steering(struct intel_gt *gt,
i915_reg_t reg,
enum intel_steering_type type)
{
const u32 offset = i915_mmio_reg_offset(reg);
const struct intel_mmio_range *entry;
if (likely(!intel_gt_needs_read_steering(gt, type)))
return false;
for (entry = gt->steering_table[type]; entry->end; entry++) {
if (offset >= entry->start && offset <= entry->end)
return true;
}
return false;
}
/**
* intel_gt_get_valid_steering - determines valid IDs for a class of MCR steering
* @gt: GT structure
* @type: multicast register type
* @sliceid: Slice ID returned
* @subsliceid: Subslice ID returned
*
* Determines sliceid and subsliceid values that will steer reads
* of a specific multicast register class to a valid value.
*/
static void intel_gt_get_valid_steering(struct intel_gt *gt,
enum intel_steering_type type,
u8 *sliceid, u8 *subsliceid)
{
switch (type) {
case L3BANK:
GEM_DEBUG_WARN_ON(!gt->info.l3bank_mask); /* should be impossible! */
*sliceid = 0; /* unused */
*subsliceid = __ffs(gt->info.l3bank_mask);
break;
case MSLICE:
GEM_DEBUG_WARN_ON(!gt->info.mslice_mask); /* should be impossible! */
*sliceid = __ffs(gt->info.mslice_mask);
*subsliceid = 0; /* unused */
break;
case LNCF:
GEM_DEBUG_WARN_ON(!gt->info.mslice_mask); /* should be impossible! */
/*
* An LNCF is always present if its mslice is present, so we
* can safely just steer to LNCF 0 in all cases.
*/
*sliceid = __ffs(gt->info.mslice_mask) << 1;
*subsliceid = 0; /* unused */
break;
default:
MISSING_CASE(type);
*sliceid = 0;
*subsliceid = 0;
}
}
/**
* intel_gt_read_register_fw - reads a GT register with support for multicast
* @gt: GT structure
* @reg: register to read
*
* This function will read a GT register. If the register is a multicast
* register, the read will be steered to a valid instance (i.e., one that
* isn't fused off or powered down by power gating).
*
* Returns the value from a valid instance of @reg.
*/
u32 intel_gt_read_register_fw(struct intel_gt *gt, i915_reg_t reg)
{
int type;
u8 sliceid, subsliceid;
for (type = 0; type < NUM_STEERING_TYPES; type++) {
if (intel_gt_reg_needs_read_steering(gt, reg, type)) {
intel_gt_get_valid_steering(gt, type, &sliceid,
&subsliceid);
return intel_uncore_read_with_mcr_steering_fw(gt->uncore,
reg,
sliceid,
subsliceid);
}
}
return intel_uncore_read_fw(gt->uncore, reg);
}
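The steering helpers above combine two small techniques: a sentinel-terminated table of MMIO ranges that is scanned linearly, and a first-set-bit pick from a mask of present instances. A minimal userspace sketch under stated assumptions: `mmio_range`, `reg_needs_steering` and `first_valid_instance` are hypothetical names, the table reuses the offsets of `icl_l3bank_steering_table`, and the GCC/Clang builtin `__builtin_ctz` stands in for the kernel's `__ffs`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct intel_mmio_range. */
struct mmio_range {
	uint32_t start;
	uint32_t end;
};

/* Same offsets as icl_l3bank_steering_table; the zeroed entry ends the table. */
static const struct mmio_range l3bank_table[] = {
	{ 0x00B100, 0x00B3FF },
	{ 0, 0 },
};

/*
 * Returns true if offset lies inside any range of the table.
 * A NULL table means this steering type is not used on the platform.
 */
static bool reg_needs_steering(const struct mmio_range *table, uint32_t offset)
{
	const struct mmio_range *entry;

	if (!table)
		return false;

	/* Walk until the sentinel entry, whose end field is zero. */
	for (entry = table; entry->end; entry++) {
		if (offset >= entry->start && offset <= entry->end)
			return true;
	}

	return false;
}

/* Index of the lowest set bit, i.e. the first instance that is present. */
static unsigned int first_valid_instance(uint32_t mask)
{
	assert(mask != 0); /* an empty mask has no valid instance */
	return (unsigned int)__builtin_ctz(mask);
}
```

For example, offset `0x00B200` matches the table while `0x00B400` does not, and a mask of `0x0C` (instances 2 and 3 present) steers to instance 2.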
void intel_gt_info_print(const struct intel_gt_info *info,
struct drm_printer *p)
{

@ -48,6 +48,8 @@ void intel_gt_driver_release(struct intel_gt *gt);
void intel_gt_driver_late_release(struct intel_gt *gt);
int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
void intel_gt_check_and_clear_faults(struct intel_gt *gt);
void intel_gt_clear_error_registers(struct intel_gt *gt,
intel_engine_mask_t engine_mask);
@ -75,6 +77,14 @@ static inline bool intel_gt_is_wedged(const struct intel_gt *gt)
return unlikely(test_bit(I915_WEDGED, &gt->reset.flags));
}
static inline bool intel_gt_needs_read_steering(struct intel_gt *gt,
enum intel_steering_type type)
{
return gt->steering_table[type];
}
u32 intel_gt_read_register_fw(struct intel_gt *gt, i915_reg_t reg);
void intel_gt_info_print(const struct intel_gt_info *info,
struct drm_printer *p);

@ -24,8 +24,8 @@ static u32 read_reference_ts_freq(struct intel_uncore *uncore)
return base_freq + frac_freq;
}
static u32 gen10_get_crystal_clock_freq(struct intel_uncore *uncore,
u32 rpm_config_reg)
static u32 gen9_get_crystal_clock_freq(struct intel_uncore *uncore,
u32 rpm_config_reg)
{
u32 f19_2_mhz = 19200000;
u32 f24_mhz = 24000000;
@ -128,10 +128,10 @@ static u32 read_clock_frequency(struct intel_uncore *uncore)
} else {
u32 c0 = intel_uncore_read(uncore, RPM_CONFIG0);
if (GRAPHICS_VER(uncore->i915) <= 10)
freq = gen10_get_crystal_clock_freq(uncore, c0);
else
if (GRAPHICS_VER(uncore->i915) >= 11)
freq = gen11_get_crystal_clock_freq(uncore, c0);
else
freq = gen9_get_crystal_clock_freq(uncore, c0);
/*
* Now figure out how the command stream's timestamp

View File

@ -184,7 +184,13 @@ void gen11_gt_irq_reset(struct intel_gt *gt)
intel_uncore_write(uncore, GEN11_BCS_RSVD_INTR_MASK, ~0);
intel_uncore_write(uncore, GEN11_VCS0_VCS1_INTR_MASK, ~0);
intel_uncore_write(uncore, GEN11_VCS2_VCS3_INTR_MASK, ~0);
if (HAS_ENGINE(gt, VCS4) || HAS_ENGINE(gt, VCS5))
intel_uncore_write(uncore, GEN12_VCS4_VCS5_INTR_MASK, ~0);
if (HAS_ENGINE(gt, VCS6) || HAS_ENGINE(gt, VCS7))
intel_uncore_write(uncore, GEN12_VCS6_VCS7_INTR_MASK, ~0);
intel_uncore_write(uncore, GEN11_VECS0_VECS1_INTR_MASK, ~0);
if (HAS_ENGINE(gt, VECS2) || HAS_ENGINE(gt, VECS3))
intel_uncore_write(uncore, GEN12_VECS2_VECS3_INTR_MASK, ~0);
intel_uncore_write(uncore, GEN11_GPM_WGBOXPERF_INTR_ENABLE, 0);
intel_uncore_write(uncore, GEN11_GPM_WGBOXPERF_INTR_MASK, ~0);
@ -218,8 +224,13 @@ void gen11_gt_irq_postinstall(struct intel_gt *gt)
intel_uncore_write(uncore, GEN11_BCS_RSVD_INTR_MASK, ~smask);
intel_uncore_write(uncore, GEN11_VCS0_VCS1_INTR_MASK, ~dmask);
intel_uncore_write(uncore, GEN11_VCS2_VCS3_INTR_MASK, ~dmask);
if (HAS_ENGINE(gt, VCS4) || HAS_ENGINE(gt, VCS5))
intel_uncore_write(uncore, GEN12_VCS4_VCS5_INTR_MASK, ~dmask);
if (HAS_ENGINE(gt, VCS6) || HAS_ENGINE(gt, VCS7))
intel_uncore_write(uncore, GEN12_VCS6_VCS7_INTR_MASK, ~dmask);
intel_uncore_write(uncore, GEN11_VECS0_VECS1_INTR_MASK, ~dmask);
if (HAS_ENGINE(gt, VECS2) || HAS_ENGINE(gt, VECS3))
intel_uncore_write(uncore, GEN12_VECS2_VECS3_INTR_MASK, ~dmask);
/*
* RPS interrupts will get enabled/disabled on demand when RPS itself
* is enabled/disabled.

@ -6,7 +6,6 @@
#include <linux/suspend.h>
#include "i915_drv.h"
#include "i915_globals.h"
#include "i915_params.h"
#include "intel_context.h"
#include "intel_engine_pm.h"
@ -67,8 +66,6 @@ static int __gt_unpark(struct intel_wakeref *wf)
GT_TRACE(gt, "\n");
i915_globals_unpark();
/*
* It seems that the DMC likes to transition between the DC states a lot
* when there are no connected displays (no active power domains) during
@ -116,8 +113,6 @@ static int __gt_park(struct intel_wakeref *wf)
GEM_BUG_ON(!wakeref);
intel_display_power_put_async(i915, POWER_DOMAIN_GT_IRQ, wakeref);
i915_globals_park();
return 0;
}
@ -174,8 +169,6 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
if (intel_gt_is_wedged(gt))
intel_gt_unset_wedged(gt);
intel_uc_sanitize(&gt->uc);
for_each_engine(engine, gt, id)
if (engine->reset.prepare)
engine->reset.prepare(engine);
@ -191,6 +184,8 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
__intel_engine_reset(engine, false);
}
intel_uc_reset(&gt->uc, false);
for_each_engine(engine, gt, id)
if (engine->reset.finish)
engine->reset.finish(engine);
@ -243,6 +238,8 @@ int intel_gt_resume(struct intel_gt *gt)
goto err_wedged;
}
intel_uc_reset_finish(&gt->uc);
intel_rps_enable(&gt->rps);
intel_llc_enable(&gt->llc);

@ -130,7 +130,8 @@ void intel_engine_fini_retire(struct intel_engine_cs *engine)
GEM_BUG_ON(engine->retire);
}
long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout)
long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
long *remaining_timeout)
{
struct intel_gt_timelines *timelines = &gt->timelines;
struct intel_timeline *tl, *tn;
@ -195,24 +196,12 @@ out_active: spin_lock(&timelines->lock);
if (flush_submission(gt, timeout)) /* Wait, there's more! */
active_count++;
if (remaining_timeout)
*remaining_timeout = timeout;
return active_count ? timeout : 0;
}
int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
{
/* If the device is asleep, we have no requests outstanding */
if (!intel_gt_pm_is_awake(gt))
return 0;
while ((timeout = intel_gt_retire_requests_timeout(gt, timeout)) > 0) {
cond_resched();
if (signal_pending(current))
return -EINTR;
}
return timeout;
}
static void retire_work_handler(struct work_struct *work)
{
struct intel_gt *gt =

@ -6,14 +6,17 @@
#ifndef INTEL_GT_REQUESTS_H
#define INTEL_GT_REQUESTS_H
#include <stddef.h>
struct intel_engine_cs;
struct intel_gt;
struct intel_timeline;
long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout);
long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
long *remaining_timeout);
static inline void intel_gt_retire_requests(struct intel_gt *gt)
{
intel_gt_retire_requests_timeout(gt, 0);
intel_gt_retire_requests_timeout(gt, 0, NULL);
}
void intel_engine_init_retire(struct intel_engine_cs *engine);
@ -21,8 +24,6 @@ void intel_engine_add_retire(struct intel_engine_cs *engine,
struct intel_timeline *tl);
void intel_engine_fini_retire(struct intel_engine_cs *engine);
int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
void intel_gt_init_requests(struct intel_gt *gt);
void intel_gt_park_requests(struct intel_gt *gt);
void intel_gt_unpark_requests(struct intel_gt *gt);

@ -24,6 +24,7 @@
#include "intel_reset_types.h"
#include "intel_rc6_types.h"
#include "intel_rps_types.h"
#include "intel_migrate_types.h"
#include "intel_wakeref.h"
struct drm_i915_private;
@ -31,6 +32,33 @@ struct i915_ggtt;
struct intel_engine_cs;
struct intel_uncore;
struct intel_mmio_range {
u32 start;
u32 end;
};
/*
* The hardware has multiple kinds of multicast register ranges that need
* special register steering (and future platforms are expected to add
* additional types).
*
* During driver startup, we initialize the steering control register to
* direct reads to a slice/subslice that are valid for the 'subslice' class
* of multicast registers. If another type of steering does not have any
* overlap in valid steering targets with 'subslice' style registers, we will
* need to explicitly re-steer reads of registers of the other type.
*
* Only the replication types that may need additional non-default steering
* are listed here.
*/
enum intel_steering_type {
L3BANK,
MSLICE,
LNCF,
NUM_STEERING_TYPES
};
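The comment above describes per-type tables of multicast register ranges. A minimal sketch of how such a table could be searched is given below. It assumes the table ends with an all-zero entry and uses made-up offsets; `mmio_range`, `example_table` and `range_table_contains` are illustrative names, not driver symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as struct intel_mmio_range in the hunk above. */
struct mmio_range {
	uint32_t start;
	uint32_t end;
};

/*
 * Hypothetical table: the offsets are invented for illustration, and the
 * all-zero terminator is an assumption of this sketch.
 */
static const struct mmio_range example_table[] = {
	{ 0x00b100, 0x00b3ff },
	{ 0x00d800, 0x00d8ff },
	{ 0, 0 },
};

/* Return true if offset lies inside any inclusive [start, end] range. */
static bool range_table_contains(const struct mmio_range *table,
				 uint32_t offset)
{
	const struct mmio_range *r;

	assert(table != NULL);

	for (r = table; r->end; r++)
		if (offset >= r->start && offset <= r->end)
			return true;

	return false;
}
```

A read of a register inside one of these ranges would be a candidate for explicit re-steering; a read outside every range would use the default steering.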
enum intel_submission_method {
INTEL_SUBMISSION_RING,
INTEL_SUBMISSION_ELSP,
@ -145,8 +173,15 @@ struct intel_gt {
struct i915_vma *scratch;
struct intel_migrate migrate;
const struct intel_mmio_range *steering_table[NUM_STEERING_TYPES];
struct intel_gt_info {
intel_engine_mask_t engine_mask;
u32 l3bank_mask;
u8 num_engines;
/* Media engine access to SFC per instance */
@ -154,6 +189,8 @@ struct intel_gt {
/* Slice/subslice/EU info */
struct sseu_dev_info sseu;
unsigned long mslice_mask;
} info;
};

@ -16,7 +16,19 @@ struct drm_i915_gem_object *alloc_pt_lmem(struct i915_address_space *vm, int sz)
{
struct drm_i915_gem_object *obj;
obj = i915_gem_object_create_lmem(vm->i915, sz, 0);
/*
* To avoid severe over-allocation when dealing with min_page_size
* restrictions, we override that behaviour here by allowing an object
* size and page layout which can be smaller. In practice this should be
* totally fine, since GTT paging structures are not typically inserted
* into the GTT.
*
* Note that we also hit this path for the scratch page, and for this
* case it might need to be 64K, but that should work fine here since we
* used the passed in size for the page size, which should ensure it
* also has the same alignment.
*/
obj = __i915_gem_object_create_lmem_with_ps(vm->i915, sz, sz, 0);
/*
* Ensure all paging structures for this vm share the same dma-resv
* object underneath, with the idea that one object_lock() will lock
@ -414,7 +426,7 @@ static void tgl_setup_private_ppat(struct intel_uncore *uncore)
intel_uncore_write(uncore, GEN12_PAT_INDEX(7), GEN8_PPAT_WB);
}
static void cnl_setup_private_ppat(struct intel_uncore *uncore)
static void icl_setup_private_ppat(struct intel_uncore *uncore)
{
intel_uncore_write(uncore,
GEN10_PAT_INDEX(0),
@ -514,8 +526,8 @@ void setup_private_pat(struct intel_uncore *uncore)
if (GRAPHICS_VER(i915) >= 12)
tgl_setup_private_ppat(uncore);
else if (GRAPHICS_VER(i915) >= 10)
cnl_setup_private_ppat(uncore);
else if (GRAPHICS_VER(i915) >= 11)
icl_setup_private_ppat(uncore);
else if (IS_CHERRYVIEW(i915) || IS_GEN9_LP(i915))
chv_setup_private_ppat(uncore);
else

@ -140,7 +140,6 @@ typedef u64 gen8_pte_t;
enum i915_cache_level;
struct drm_i915_file_private;
struct drm_i915_gem_object;
struct i915_fence_reg;
struct i915_vma;
@ -220,16 +219,6 @@ struct i915_address_space {
struct intel_gt *gt;
struct drm_i915_private *i915;
struct device *dma;
/*
* Every address space belongs to a struct file - except for the global
* GTT that is owned by the driver (and so @file is set to NULL). In
* principle, no information should leak from one context to another
* (or between files/processes etc) unless explicitly shared by the
* owner. Tracking the owner is important in order to free up per-file
* objects along with the file, to aid resource tracking, and to
* assign blame.
*/
struct drm_i915_file_private *file;
u64 total; /* size addr space maps (ex. 2GB for ggtt) */
u64 reserved; /* size addr space reserved */
@ -296,6 +285,13 @@ struct i915_address_space {
u32 flags);
void (*cleanup)(struct i915_address_space *vm);
void (*foreach)(struct i915_address_space *vm,
u64 start, u64 length,
void (*fn)(struct i915_address_space *vm,
struct i915_page_table *pt,
void *data),
void *data);
struct i915_vma_ops vma_ops;
I915_SELFTEST_DECLARE(struct fault_attr fault_attr);

@ -70,7 +70,7 @@ static void set_offsets(u32 *regs,
if (close) {
/* Close the batch; used mainly by live_lrc_layout() */
*regs = MI_BATCH_BUFFER_END;
if (GRAPHICS_VER(engine->i915) >= 10)
if (GRAPHICS_VER(engine->i915) >= 11)
*regs |= BIT(0);
}
}
@ -484,6 +484,47 @@ static const u8 gen12_rcs_offsets[] = {
END
};
static const u8 xehp_rcs_offsets[] = {
NOP(1),
LRI(13, POSTED),
REG16(0x244),
REG(0x034),
REG(0x030),
REG(0x038),
REG(0x03c),
REG(0x168),
REG(0x140),
REG(0x110),
REG(0x1c0),
REG(0x1c4),
REG(0x1c8),
REG(0x180),
REG16(0x2b4),
NOP(5),
LRI(9, POSTED),
REG16(0x3a8),
REG16(0x28c),
REG16(0x288),
REG16(0x284),
REG16(0x280),
REG16(0x27c),
REG16(0x278),
REG16(0x274),
REG16(0x270),
LRI(3, POSTED),
REG(0x1b0),
REG16(0x5a8),
REG16(0x5ac),
NOP(6),
LRI(1, 0),
REG(0x0c8),
END
};
#undef END
#undef REG16
#undef REG
@ -502,7 +543,9 @@ static const u8 *reg_offsets(const struct intel_engine_cs *engine)
!intel_engine_has_relative_mmio(engine));
if (engine->class == RENDER_CLASS) {
if (GRAPHICS_VER(engine->i915) >= 12)
if (GRAPHICS_VER_FULL(engine->i915) >= IP_VER(12, 50))
return xehp_rcs_offsets;
else if (GRAPHICS_VER(engine->i915) >= 12)
return gen12_rcs_offsets;
else if (GRAPHICS_VER(engine->i915) >= 11)
return gen11_rcs_offsets;
@ -522,7 +565,9 @@ static const u8 *reg_offsets(const struct intel_engine_cs *engine)
static int lrc_ring_mi_mode(const struct intel_engine_cs *engine)
{
if (GRAPHICS_VER(engine->i915) >= 12)
if (GRAPHICS_VER_FULL(engine->i915) >= IP_VER(12, 50))
return 0x70;
else if (GRAPHICS_VER(engine->i915) >= 12)
return 0x60;
else if (GRAPHICS_VER(engine->i915) >= 9)
return 0x54;
@ -534,7 +579,9 @@ static int lrc_ring_mi_mode(const struct intel_engine_cs *engine)
static int lrc_ring_gpr0(const struct intel_engine_cs *engine)
{
if (GRAPHICS_VER(engine->i915) >= 12)
if (GRAPHICS_VER_FULL(engine->i915) >= IP_VER(12, 50))
return 0x84;
else if (GRAPHICS_VER(engine->i915) >= 12)
return 0x74;
else if (GRAPHICS_VER(engine->i915) >= 9)
return 0x68;
@ -578,10 +625,16 @@ static int lrc_ring_indirect_offset(const struct intel_engine_cs *engine)
static int lrc_ring_cmd_buf_cctl(const struct intel_engine_cs *engine)
{
if (engine->class != RENDER_CLASS)
return -1;
if (GRAPHICS_VER(engine->i915) >= 12)
if (GRAPHICS_VER_FULL(engine->i915) >= IP_VER(12, 50))
/*
* Note that the CSFE context has a dummy slot for CMD_BUF_CCTL
* simply to match the RCS context image layout.
*/
return 0xc6;
else if (engine->class != RENDER_CLASS)
return -1;
else if (GRAPHICS_VER(engine->i915) >= 12)
return 0xb6;
else if (GRAPHICS_VER(engine->i915) >= 11)
return 0xaa;
@ -600,8 +653,6 @@ lrc_ring_indirect_offset_default(const struct intel_engine_cs *engine)
return GEN12_CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT;
case 11:
return GEN11_CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT;
case 10:
return GEN10_CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT;
case 9:
return GEN9_CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT;
case 8:
@ -845,7 +896,7 @@ int lrc_alloc(struct intel_context *ce, struct intel_engine_cs *engine)
if (IS_ERR(vma))
return PTR_ERR(vma);
ring = intel_engine_create_ring(engine, (unsigned long)ce->ring);
ring = intel_engine_create_ring(engine, ce->ring_size);
if (IS_ERR(ring)) {
err = PTR_ERR(ring);
goto err_vma;
@ -1101,6 +1152,14 @@ setup_indirect_ctx_bb(const struct intel_context *ce,
* bits 55-60: SW counter
* bits 61-63: engine class
*
* On Xe_HP, the upper dword of the descriptor has a new format:
*
* bits 32-37: virtual function number
* bit 38: mbz, reserved for use by hardware
* bits 39-54: SW context ID
* bits 55-57: reserved
* bits 58-63: SW counter
*
* engine info, SW context ID and SW counter need to form a unique number
* (Context ID) per lrc.
*/
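The Xe_HP bit layout listed in the comment above can be illustrated with a small packing helper. This is only a sketch of the documented field positions; `xehp_desc_upper` and its parameter names are hypothetical, and the driver's own descriptor code may be organised differently.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pack the fields of the upper descriptor dword at the bit positions
 * listed in the comment above:
 *
 *   bits 32-37: virtual function number (6 bits)
 *   bit  38:    must be zero
 *   bits 39-54: SW context ID (16 bits)
 *   bits 55-57: reserved
 *   bits 58-63: SW counter (6 bits)
 */
static uint64_t xehp_desc_upper(uint32_t vf, uint32_t sw_ctx_id,
				uint32_t sw_counter)
{
	assert(vf < (1u << 6));
	assert(sw_ctx_id < (1u << 16));
	assert(sw_counter < (1u << 6));

	return ((uint64_t)vf << 32) |
	       ((uint64_t)sw_ctx_id << 39) |
	       ((uint64_t)sw_counter << 58);
}
```

With every field at its maximum value, bit 38 and bits 55-57 stay clear, which matches the "mbz" and "reserved" entries in the layout.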
@ -1387,40 +1446,6 @@ static u32 *gen9_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch)
return batch;
}
static u32 *
gen10_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch)
{
int i;
/*
* WaPipeControlBefore3DStateSamplePattern: cnl
*
* Ensure the engine is idle prior to programming a
* 3DSTATE_SAMPLE_PATTERN during a context restore.
*/
batch = gen8_emit_pipe_control(batch,
PIPE_CONTROL_CS_STALL,
0);
/*
* WaPipeControlBefore3DStateSamplePattern says we need 4 dwords for
* the PIPE_CONTROL followed by 12 dwords of 0x0, so 16 dwords in
* total. However, a PIPE_CONTROL is 6 dwords long, not 4, which is
* confusing. Since gen8_emit_pipe_control() already advances the
* batch by 6 dwords, we advance the other 10 here, completing a
* cacheline. It's not clear if the workaround requires this padding
* before other commands, or if it's just the regular padding we would
* already have for the workaround bb, so leave it here for now.
*/
for (i = 0; i < 10; i++)
*batch++ = MI_NOOP;
/* Pad to end of cacheline */
while ((unsigned long)batch % CACHELINE_BYTES)
*batch++ = MI_NOOP;
return batch;
}
#define CTX_WA_BB_SIZE (PAGE_SIZE)
static int lrc_create_wa_ctx(struct intel_engine_cs *engine)
@ -1473,10 +1498,6 @@ void lrc_init_wa_ctx(struct intel_engine_cs *engine)
case 12:
case 11:
return;
case 10:
wa_bb_fn[0] = gen10_init_indirectctx_bb;
wa_bb_fn[1] = NULL;
break;
case 9:
wa_bb_fn[0] = gen9_init_indirectctx_bb;
wa_bb_fn[1] = NULL;

@ -87,9 +87,10 @@
#define GEN11_CSB_WRITE_PTR_MASK (GEN11_CSB_PTR_MASK << 0)
#define MAX_CONTEXT_HW_ID (1 << 21) /* exclusive */
#define MAX_GUC_CONTEXT_HW_ID (1 << 20) /* exclusive */
#define GEN11_MAX_CONTEXT_HW_ID (1 << 11) /* exclusive */
/* in Gen12 ID 0x7FF is reserved to indicate idle */
#define GEN12_MAX_CONTEXT_HW_ID (GEN11_MAX_CONTEXT_HW_ID - 1)
/* in Xe_HP ID 0xFFFF is reserved to indicate "invalid context" */
#define XEHP_MAX_CONTEXT_HW_ID 0xFFFF
#endif /* _INTEL_LRC_REG_H_ */

@ -0,0 +1,688 @@
// SPDX-License-Identifier: MIT
/*
* Copyright © 2020 Intel Corporation
*/
#include "i915_drv.h"
#include "intel_context.h"
#include "intel_gpu_commands.h"
#include "intel_gt.h"
#include "intel_gtt.h"
#include "intel_migrate.h"
#include "intel_ring.h"
struct insert_pte_data {
u64 offset;
bool is_lmem;
};
#define CHUNK_SZ SZ_8M /* ~1ms at 8GiB/s preemption delay */
static bool engine_supports_migration(struct intel_engine_cs *engine)
{
if (!engine)
return false;
/*
* We need the ability to prevent arbitration (MI_ARB_ON_OFF),
* the ability to write PTE using inline data (MI_STORE_DATA)
* and of course the ability to do the block transfer (blits).
*/
GEM_BUG_ON(engine->class != COPY_ENGINE_CLASS);
return true;
}
static void insert_pte(struct i915_address_space *vm,
struct i915_page_table *pt,
void *data)
{
struct insert_pte_data *d = data;
vm->insert_page(vm, px_dma(pt), d->offset, I915_CACHE_NONE,
d->is_lmem ? PTE_LM : 0);
d->offset += PAGE_SIZE;
}
static struct i915_address_space *migrate_vm(struct intel_gt *gt)
{
struct i915_vm_pt_stash stash = {};
struct i915_ppgtt *vm;
int err;
int i;
/*
* We construct a very special VM for use by all migration contexts,
* it is kept pinned so that it can be used at any time. As we need
* to pre-allocate the page directories for the migration VM, this
* limits us to only using a small number of prepared vma.
*
* To be able to pipeline and reschedule migration operations while
* avoiding unnecessary contention on the vm itself, the PTE updates
* are inline with the blits. All the blits use the same fixed
* addresses, with the backing store redirection being updated on the
* fly. Only 2 implicit vma are used for all migration operations.
*
* We lay the ppGTT out as:
*
* [0, CHUNK_SZ) -> first object
* [CHUNK_SZ, 2 * CHUNK_SZ) -> second object
* [2 * CHUNK_SZ, 2 * CHUNK_SZ + 2 * CHUNK_SZ >> 9] -> PTE
*
* By exposing the dma addresses of the page directories themselves
* within the ppGTT, we are then able to rewrite the PTE prior to use.
* But the PTE update and subsequent migration operation must be atomic,
* i.e. within the same non-preemptible window so that we do not switch
* to another migration context that overwrites the PTE.
*
* TODO: Add support for huge LMEM PTEs
*/
vm = i915_ppgtt_create(gt);
if (IS_ERR(vm))
return ERR_CAST(vm);
if (!vm->vm.allocate_va_range || !vm->vm.foreach) {
err = -ENODEV;
goto err_vm;
}
/*
* Each engine instance is assigned its own chunk in the VM, so
* that we can run multiple instances concurrently
*/
for (i = 0; i < ARRAY_SIZE(gt->engine_class[COPY_ENGINE_CLASS]); i++) {
struct intel_engine_cs *engine;
u64 base = (u64)i << 32;
struct insert_pte_data d = {};
struct i915_gem_ww_ctx ww;
u64 sz;
engine = gt->engine_class[COPY_ENGINE_CLASS][i];
if (!engine_supports_migration(engine))
continue;
/*
* We copy in 8MiB chunks. Each PDE covers 2MiB, so we need
* 4x2 page directories for source/destination.
*/
sz = 2 * CHUNK_SZ;
d.offset = base + sz;
/*
* We need another page directory setup so that we can write
* the 8x512 PTE in each chunk.
*/
sz += (sz >> 12) * sizeof(u64);
err = i915_vm_alloc_pt_stash(&vm->vm, &stash, sz);
if (err)
goto err_vm;
for_i915_gem_ww(&ww, err, true) {
err = i915_vm_lock_objects(&vm->vm, &ww);
if (err)
continue;
err = i915_vm_map_pt_stash(&vm->vm, &stash);
if (err)
continue;
vm->vm.allocate_va_range(&vm->vm, &stash, base, sz);
}
i915_vm_free_pt_stash(&vm->vm, &stash);
if (err)
goto err_vm;
/* Now allow the GPU to rewrite the PTE via its own ppGTT */
d.is_lmem = i915_gem_object_is_lmem(vm->vm.scratch[0]);
vm->vm.foreach(&vm->vm, base, base + sz, insert_pte, &d);
}
return &vm->vm;
err_vm:
i915_vm_put(&vm->vm);
return ERR_PTR(err);
}
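The sizing arithmetic in migrate_vm() can be checked in isolation. The sketch below repeats the two steps (two chunks, plus one 8-byte PTE per 4 KiB page of those chunks) using standalone constants. The `sketch_` names are illustrative, and the 4 KiB page size is an assumption taken from the `>> 12` shift in the code.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_CHUNK_SZ   (8ull << 20)	/* 8 MiB, as CHUNK_SZ (SZ_8M) */
#define SKETCH_PAGE_SHIFT 12		/* 4 KiB pages, an assumption */

/* Number of 4 KiB PTEs needed to map `bytes` of address space. */
static uint64_t sketch_pte_count(uint64_t bytes)
{
	/* Only whole pages make sense here. */
	assert((bytes & ((1ull << SKETCH_PAGE_SHIFT) - 1)) == 0);

	return bytes >> SKETCH_PAGE_SHIFT;
}

/* Size of one engine's window: two chunks plus the PTEs that map them. */
static uint64_t sketch_window_size(void)
{
	uint64_t sz = 2 * SKETCH_CHUNK_SZ;

	sz += sketch_pte_count(sz) * sizeof(uint64_t);
	return sz;
}
```

Under these assumptions the two chunks need 4096 PTEs, so the PTE area is 32 KiB and the whole per-engine window is 16 MiB + 32 KiB.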
static struct intel_engine_cs *first_copy_engine(struct intel_gt *gt)
{
struct intel_engine_cs *engine;
int i;
for (i = 0; i < ARRAY_SIZE(gt->engine_class[COPY_ENGINE_CLASS]); i++) {
engine = gt->engine_class[COPY_ENGINE_CLASS][i];
if (engine_supports_migration(engine))
return engine;
}
return NULL;
}
static struct intel_context *pinned_context(struct intel_gt *gt)
{
static struct lock_class_key key;
struct intel_engine_cs *engine;
struct i915_address_space *vm;
struct intel_context *ce;
engine = first_copy_engine(gt);
if (!engine)
return ERR_PTR(-ENODEV);
vm = migrate_vm(gt);
if (IS_ERR(vm))
return ERR_CAST(vm);
ce = intel_engine_create_pinned_context(engine, vm, SZ_512K,
I915_GEM_HWS_MIGRATE,
&key, "migrate");
/* ce may be an ERR_PTR here, so drop the reference through vm instead */
i915_vm_put(vm);
return ce;
}
int intel_migrate_init(struct intel_migrate *m, struct intel_gt *gt)
{
struct intel_context *ce;
memset(m, 0, sizeof(*m));
ce = pinned_context(gt);
if (IS_ERR(ce))
return PTR_ERR(ce);
m->context = ce;
return 0;
}
static int random_index(unsigned int max)
{
return upper_32_bits(mul_u32_u32(get_random_u32(), max));
}
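random_index() maps a 32-bit random value onto [0, max) by taking the upper half of a 64-bit product instead of using a modulo. A userspace sketch of the same multiply-and-shift step follows; `scale_to_range` is an illustrative name, and the random source is left out so the result is deterministic.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scale a 32-bit value into [0, max): the product rnd * max is below
 * max * 2^32, so its upper 32 bits are always below max.
 */
static uint32_t scale_to_range(uint32_t rnd, uint32_t max)
{
	assert(max > 0);

	return (uint32_t)(((uint64_t)rnd * max) >> 32);
}
```

For example, with max = 4 the input 0x80000000 (half of the 32-bit range) lands on index 2, and the largest input 0xffffffff lands on index 3.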
static struct intel_context *__migrate_engines(struct intel_gt *gt)
{
struct intel_engine_cs *engines[MAX_ENGINE_INSTANCE];
struct intel_engine_cs *engine;
unsigned int count, i;
count = 0;
for (i = 0; i < ARRAY_SIZE(gt->engine_class[COPY_ENGINE_CLASS]); i++) {
engine = gt->engine_class[COPY_ENGINE_CLASS][i];
if (engine_supports_migration(engine))
engines[count++] = engine;
}
return intel_context_create(engines[random_index(count)]);
}
struct intel_context *intel_migrate_create_context(struct intel_migrate *m)
{
struct intel_context *ce;
/*
* We randomly distribute contexts across the engines upon construction,
* as they all share the same pinned vm, and so in order to allow
* multiple blits to run in parallel, we must construct each blit
* to use a different range of the vm for its GTT. This has to be
* known at construction, so we can not use the late greedy load
* balancing of the virtual-engine.
*/
ce = __migrate_engines(m->context->engine->gt);
if (IS_ERR(ce))
return ce;
ce->ring = NULL;
ce->ring_size = SZ_256K;
i915_vm_put(ce->vm);
ce->vm = i915_vm_get(m->context->vm);
return ce;
}
static inline struct sgt_dma sg_sgt(struct scatterlist *sg)
{
dma_addr_t addr = sg_dma_address(sg);
return (struct sgt_dma){ sg, addr, addr + sg_dma_len(sg) };
}
static int emit_no_arbitration(struct i915_request *rq)
{
u32 *cs;
cs = intel_ring_begin(rq, 2);
if (IS_ERR(cs))
return PTR_ERR(cs);
/* Explicitly disable preemption for this request. */
*cs++ = MI_ARB_ON_OFF;
*cs++ = MI_NOOP;
intel_ring_advance(rq, cs);
return 0;
}
static int emit_pte(struct i915_request *rq,
struct sgt_dma *it,
enum i915_cache_level cache_level,
bool is_lmem,
u64 offset,
int length)
{
const u64 encode = rq->context->vm->pte_encode(0, cache_level,
is_lmem ? PTE_LM : 0);
struct intel_ring *ring = rq->ring;
int total = 0;
u32 *hdr, *cs;
int pkt;
GEM_BUG_ON(GRAPHICS_VER(rq->engine->i915) < 8);
/* Compute the page directory offset for the target address range */
offset >>= 12;
offset *= sizeof(u64);
offset += 2 * CHUNK_SZ;
/* Add the per-engine base only after computing the offset into the PTE window */
offset += (u64)rq->engine->instance << 32;
cs = intel_ring_begin(rq, 6);
if (IS_ERR(cs))
return PTR_ERR(cs);
/* Pack as many PTE updates as possible into a single MI command */
pkt = min_t(int, 0x400, ring->space / sizeof(u32) + 5);
pkt = min_t(int, pkt, (ring->size - ring->emit) / sizeof(u32) + 5);
hdr = cs;
*cs++ = MI_STORE_DATA_IMM | REG_BIT(21); /* as qword elements */
*cs++ = lower_32_bits(offset);
*cs++ = upper_32_bits(offset);
do {
if (cs - hdr >= pkt) {
*hdr += cs - hdr - 2;
*cs++ = MI_NOOP;
ring->emit = (void *)cs - ring->vaddr;
intel_ring_advance(rq, cs);
intel_ring_update_space(ring);
cs = intel_ring_begin(rq, 6);
if (IS_ERR(cs))
return PTR_ERR(cs);
pkt = min_t(int, 0x400, ring->space / sizeof(u32) + 5);
pkt = min_t(int, pkt, (ring->size - ring->emit) / sizeof(u32) + 5);
hdr = cs;
*cs++ = MI_STORE_DATA_IMM | REG_BIT(21);
*cs++ = lower_32_bits(offset);
*cs++ = upper_32_bits(offset);
}
*cs++ = lower_32_bits(encode | it->dma);
*cs++ = upper_32_bits(encode | it->dma);
offset += 8;
total += I915_GTT_PAGE_SIZE;
it->dma += I915_GTT_PAGE_SIZE;
if (it->dma >= it->max) {
it->sg = __sg_next(it->sg);
if (!it->sg || sg_dma_len(it->sg) == 0)
break;
it->dma = sg_dma_address(it->sg);
it->max = it->dma + sg_dma_len(it->sg);
}
} while (total < length);
*hdr += cs - hdr - 2;
*cs++ = MI_NOOP;
ring->emit = (void *)cs - ring->vaddr;
intel_ring_advance(rq, cs);
intel_ring_update_space(ring);
return total;
}
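The address that emit_pte() writes to follows from the ppGTT layout described in migrate_vm(): the PTE area starts at 2 * CHUNK_SZ and holds one 8-byte entry per 4 KiB page. The sketch below computes that slot relative to the start of one engine's window, so it deliberately leaves out the per-engine base. `sketch_pte_slot` is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_CHUNK_SZ (8ull << 20)	/* 8 MiB, as CHUNK_SZ (SZ_8M) */

/*
 * Given a page-aligned offset inside the two data chunks, return the
 * window-relative address of the PTE that maps it.
 */
static uint64_t sketch_pte_slot(uint64_t offset)
{
	assert((offset & 0xfff) == 0);
	assert(offset < 2 * SKETCH_CHUNK_SZ);

	offset >>= 12;			/* page index */
	offset *= sizeof(uint64_t);	/* one qword per PTE */
	offset += 2 * SKETCH_CHUNK_SZ;	/* PTE area follows both chunks */

	return offset;
}
```

Offset 0 (the first source page) maps to the first PTE at 16 MiB, and offset CHUNK_SZ (the first destination page) maps to the PTE 16 KiB further on.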
static bool wa_1209644611_applies(int ver, u32 size)
{
u32 height = size >> PAGE_SHIFT;
if (ver != 11)
return false;
return height % 4 == 3 && height <= 8;
}
static int emit_copy(struct i915_request *rq, int size)
{
const int ver = GRAPHICS_VER(rq->engine->i915);
u32 instance = rq->engine->instance;
u32 *cs;
cs = intel_ring_begin(rq, ver >= 8 ? 10 : 6);
if (IS_ERR(cs))
return PTR_ERR(cs);
if (ver >= 9 && !wa_1209644611_applies(ver, size)) {
*cs++ = GEN9_XY_FAST_COPY_BLT_CMD | (10 - 2);
*cs++ = BLT_DEPTH_32 | PAGE_SIZE;
*cs++ = 0;
*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
*cs++ = CHUNK_SZ; /* dst offset */
*cs++ = instance;
*cs++ = 0;
*cs++ = PAGE_SIZE;
*cs++ = 0; /* src offset */
*cs++ = instance;
} else if (ver >= 8) {
*cs++ = XY_SRC_COPY_BLT_CMD | BLT_WRITE_RGBA | (10 - 2);
*cs++ = BLT_DEPTH_32 | BLT_ROP_SRC_COPY | PAGE_SIZE;
*cs++ = 0;
*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
*cs++ = CHUNK_SZ; /* dst offset */
*cs++ = instance;
*cs++ = 0;
*cs++ = PAGE_SIZE;
*cs++ = 0; /* src offset */
*cs++ = instance;
} else {
GEM_BUG_ON(instance);
*cs++ = SRC_COPY_BLT_CMD | BLT_WRITE_RGBA | (6 - 2);
*cs++ = BLT_DEPTH_32 | BLT_ROP_SRC_COPY | PAGE_SIZE;
*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE;
*cs++ = CHUNK_SZ; /* dst offset */
*cs++ = PAGE_SIZE;
*cs++ = 0; /* src offset */
}
intel_ring_advance(rq, cs);
return 0;
}
int
intel_context_migrate_copy(struct intel_context *ce,
struct dma_fence *await,
struct scatterlist *src,
enum i915_cache_level src_cache_level,
bool src_is_lmem,
struct scatterlist *dst,
enum i915_cache_level dst_cache_level,
bool dst_is_lmem,
struct i915_request **out)
{
struct sgt_dma it_src = sg_sgt(src), it_dst = sg_sgt(dst);
struct i915_request *rq;
int err;
GEM_BUG_ON(ce->vm != ce->engine->gt->migrate.context->vm);
*out = NULL;
GEM_BUG_ON(ce->ring->size < SZ_64K);
do {
int len;
rq = i915_request_create(ce);
if (IS_ERR(rq)) {
err = PTR_ERR(rq);
goto out_ce;
}
if (await) {
err = i915_request_await_dma_fence(rq, await);
if (err)
goto out_rq;
if (rq->engine->emit_init_breadcrumb) {
err = rq->engine->emit_init_breadcrumb(rq);
if (err)
goto out_rq;
}
await = NULL;
}
/* The PTE updates + copy must not be interrupted. */
err = emit_no_arbitration(rq);
if (err)
goto out_rq;
len = emit_pte(rq, &it_src, src_cache_level, src_is_lmem, 0,
CHUNK_SZ);
if (len <= 0) {
err = len;
goto out_rq;
}
err = emit_pte(rq, &it_dst, dst_cache_level, dst_is_lmem,
CHUNK_SZ, len);
if (err < 0)
goto out_rq;
if (err < len) {
err = -EINVAL;
goto out_rq;
}
err = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
if (err)
goto out_rq;
err = emit_copy(rq, len);
/* Arbitration is re-enabled between requests. */
out_rq:
if (*out)
i915_request_put(*out);
*out = i915_request_get(rq);
i915_request_add(rq);
if (err || !it_src.sg || !sg_dma_len(it_src.sg))
break;
cond_resched();
} while (1);
out_ce:
return err;
}
static int emit_clear(struct i915_request *rq, int size, u32 value)
{
const int ver = GRAPHICS_VER(rq->engine->i915);
u32 instance = rq->engine->instance;
u32 *cs;
GEM_BUG_ON(size >> PAGE_SHIFT > S16_MAX);
cs = intel_ring_begin(rq, ver >= 8 ? 8 : 6);
if (IS_ERR(cs))
return PTR_ERR(cs);
if (ver >= 8) {
*cs++ = XY_COLOR_BLT_CMD | BLT_WRITE_RGBA | (7 - 2);
*cs++ = BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | PAGE_SIZE;
*cs++ = 0;
*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
*cs++ = 0; /* offset */
*cs++ = instance;
*cs++ = value;
*cs++ = MI_NOOP;
} else {
GEM_BUG_ON(instance);
*cs++ = XY_COLOR_BLT_CMD | BLT_WRITE_RGBA | (6 - 2);
*cs++ = BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | PAGE_SIZE;
*cs++ = 0;
*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
*cs++ = 0;
*cs++ = value;
}
intel_ring_advance(rq, cs);
return 0;
}
int
intel_context_migrate_clear(struct intel_context *ce,
struct dma_fence *await,
struct scatterlist *sg,
enum i915_cache_level cache_level,
bool is_lmem,
u32 value,
struct i915_request **out)
{
struct sgt_dma it = sg_sgt(sg);
struct i915_request *rq;
int err;
GEM_BUG_ON(ce->vm != ce->engine->gt->migrate.context->vm);
*out = NULL;
GEM_BUG_ON(ce->ring->size < SZ_64K);
do {
int len;
rq = i915_request_create(ce);
if (IS_ERR(rq)) {
err = PTR_ERR(rq);
goto out_ce;
}
if (await) {
err = i915_request_await_dma_fence(rq, await);
if (err)
goto out_rq;
if (rq->engine->emit_init_breadcrumb) {
err = rq->engine->emit_init_breadcrumb(rq);
if (err)
goto out_rq;
}
await = NULL;
}
/* The PTE updates + clear must not be interrupted. */
err = emit_no_arbitration(rq);
if (err)
goto out_rq;
len = emit_pte(rq, &it, cache_level, is_lmem, 0, CHUNK_SZ);
if (len <= 0) {
err = len;
goto out_rq;
}
err = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
if (err)
goto out_rq;
err = emit_clear(rq, len, value);
/* Arbitration is re-enabled between requests. */
out_rq:
if (*out)
i915_request_put(*out);
*out = i915_request_get(rq);
i915_request_add(rq);
if (err || !it.sg || !sg_dma_len(it.sg))
break;
cond_resched();
} while (1);
out_ce:
return err;
}
int intel_migrate_copy(struct intel_migrate *m,
struct i915_gem_ww_ctx *ww,
struct dma_fence *await,
struct scatterlist *src,
enum i915_cache_level src_cache_level,
bool src_is_lmem,
struct scatterlist *dst,
enum i915_cache_level dst_cache_level,
bool dst_is_lmem,
struct i915_request **out)
{
struct intel_context *ce;
int err;
*out = NULL;
if (!m->context)
return -ENODEV;
ce = intel_migrate_create_context(m);
if (IS_ERR(ce))
ce = intel_context_get(m->context);
GEM_BUG_ON(IS_ERR(ce));
err = intel_context_pin_ww(ce, ww);
if (err)
goto out;
err = intel_context_migrate_copy(ce, await,
src, src_cache_level, src_is_lmem,
dst, dst_cache_level, dst_is_lmem,
out);
intel_context_unpin(ce);
out:
intel_context_put(ce);
return err;
}
int
intel_migrate_clear(struct intel_migrate *m,
struct i915_gem_ww_ctx *ww,
struct dma_fence *await,
struct scatterlist *sg,
enum i915_cache_level cache_level,
bool is_lmem,
u32 value,
struct i915_request **out)
{
struct intel_context *ce;
int err;
*out = NULL;
if (!m->context)
return -ENODEV;
ce = intel_migrate_create_context(m);
if (IS_ERR(ce))
ce = intel_context_get(m->context);
GEM_BUG_ON(IS_ERR(ce));
err = intel_context_pin_ww(ce, ww);
if (err)
goto out;
err = intel_context_migrate_clear(ce, await, sg, cache_level,
is_lmem, value, out);
intel_context_unpin(ce);
out:
intel_context_put(ce);
return err;
}
void intel_migrate_fini(struct intel_migrate *m)
{
struct intel_context *ce;
ce = fetch_and_zero(&m->context);
if (!ce)
return;
intel_engine_destroy_pinned_context(ce);
}
#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
#include "selftest_migrate.c"
#endif

@ -0,0 +1,65 @@
/* SPDX-License-Identifier: MIT */
/*
* Copyright © 2020 Intel Corporation
*/
#ifndef __INTEL_MIGRATE__
#define __INTEL_MIGRATE__
#include <linux/types.h>
#include "intel_migrate_types.h"
struct dma_fence;
struct i915_request;
struct i915_gem_ww_ctx;
struct intel_gt;
struct scatterlist;
enum i915_cache_level;
int intel_migrate_init(struct intel_migrate *m, struct intel_gt *gt);
struct intel_context *intel_migrate_create_context(struct intel_migrate *m);
int intel_migrate_copy(struct intel_migrate *m,
struct i915_gem_ww_ctx *ww,
struct dma_fence *await,
struct scatterlist *src,
enum i915_cache_level src_cache_level,
bool src_is_lmem,
struct scatterlist *dst,
enum i915_cache_level dst_cache_level,
bool dst_is_lmem,
struct i915_request **out);
int intel_context_migrate_copy(struct intel_context *ce,
struct dma_fence *await,
struct scatterlist *src,
enum i915_cache_level src_cache_level,
bool src_is_lmem,
struct scatterlist *dst,
enum i915_cache_level dst_cache_level,
bool dst_is_lmem,
struct i915_request **out);
int
intel_migrate_clear(struct intel_migrate *m,
struct i915_gem_ww_ctx *ww,
struct dma_fence *await,
struct scatterlist *sg,
enum i915_cache_level cache_level,
bool is_lmem,
u32 value,
struct i915_request **out);
int
intel_context_migrate_clear(struct intel_context *ce,
struct dma_fence *await,
struct scatterlist *sg,
enum i915_cache_level cache_level,
bool is_lmem,
u32 value,
struct i915_request **out);
void intel_migrate_fini(struct intel_migrate *m);
#endif /* __INTEL_MIGRATE__ */

@ -0,0 +1,15 @@
/* SPDX-License-Identifier: MIT */
/*
* Copyright © 2020 Intel Corporation
*/
#ifndef __INTEL_MIGRATE_TYPES__
#define __INTEL_MIGRATE_TYPES__
struct intel_context;
struct intel_migrate {
struct intel_context *context;
};
#endif /* __INTEL_MIGRATE_TYPES__ */

@ -352,7 +352,7 @@ static unsigned int get_mocs_settings(const struct drm_i915_private *i915,
table->size = ARRAY_SIZE(icl_mocs_table);
table->table = icl_mocs_table;
table->n_entries = GEN9_NUM_MOCS_ENTRIES;
} else if (IS_GEN9_BC(i915) || IS_CANNONLAKE(i915)) {
} else if (IS_GEN9_BC(i915)) {
table->size = ARRAY_SIZE(skl_mocs_table);
table->n_entries = GEN9_NUM_MOCS_ENTRIES;
table->table = skl_mocs_table;

@ -62,20 +62,25 @@ static void gen11_rc6_enable(struct intel_rc6 *rc6)
u32 pg_enable;
int i;
/* 2b: Program RC6 thresholds.*/
set(uncore, GEN6_RC6_WAKE_RATE_LIMIT, 54 << 16 | 85);
set(uncore, GEN10_MEDIA_WAKE_RATE_LIMIT, 150);
/*
* With GuCRC, these parameters are set by GuC
*/
if (!intel_uc_uses_guc_rc(&gt->uc)) {
/* 2b: Program RC6 thresholds.*/
set(uncore, GEN6_RC6_WAKE_RATE_LIMIT, 54 << 16 | 85);
set(uncore, GEN10_MEDIA_WAKE_RATE_LIMIT, 150);
set(uncore, GEN6_RC_EVALUATION_INTERVAL, 125000); /* 12500 * 1280ns */
set(uncore, GEN6_RC_IDLE_HYSTERSIS, 25); /* 25 * 1280ns */
for_each_engine(engine, rc6_to_gt(rc6), id)
set(uncore, RING_MAX_IDLE(engine->mmio_base), 10);
set(uncore, GEN6_RC_EVALUATION_INTERVAL, 125000); /* 12500 * 1280ns */
set(uncore, GEN6_RC_IDLE_HYSTERSIS, 25); /* 25 * 1280ns */
for_each_engine(engine, rc6_to_gt(rc6), id)
set(uncore, RING_MAX_IDLE(engine->mmio_base), 10);
set(uncore, GUC_MAX_IDLE_COUNT, 0xA);
set(uncore, GUC_MAX_IDLE_COUNT, 0xA);
set(uncore, GEN6_RC_SLEEP, 0);
set(uncore, GEN6_RC_SLEEP, 0);
set(uncore, GEN6_RC6_THRESHOLD, 50000); /* 50/125ms per EI */
set(uncore, GEN6_RC6_THRESHOLD, 50000); /* 50/125ms per EI */
}
/*
* 2c: Program Coarse Power Gating Policies.
@ -98,11 +103,19 @@ static void gen11_rc6_enable(struct intel_rc6 *rc6)
set(uncore, GEN9_MEDIA_PG_IDLE_HYSTERESIS, 60);
set(uncore, GEN9_RENDER_PG_IDLE_HYSTERESIS, 60);
/* 3a: Enable RC6 */
rc6->ctl_enable =
GEN6_RC_CTL_HW_ENABLE |
GEN6_RC_CTL_RC6_ENABLE |
GEN6_RC_CTL_EI_MODE(1);
/* 3a: Enable RC6
*
* With GuCRC, we do not enable bit 31 of RC_CTL,
* thus allowing GuC to control RC6 entry/exit fully instead.
* We will not set the HW ENABLE and EI bits
*/
if (!intel_guc_rc_enable(&gt->uc.guc))
rc6->ctl_enable = GEN6_RC_CTL_RC6_ENABLE;
else
rc6->ctl_enable =
GEN6_RC_CTL_HW_ENABLE |
GEN6_RC_CTL_RC6_ENABLE |
GEN6_RC_CTL_EI_MODE(1);
pg_enable =
GEN9_RENDER_PG_ENABLE |
@ -126,7 +139,7 @@ static void gen9_rc6_enable(struct intel_rc6 *rc6)
enum intel_engine_id id;
/* 2b: Program RC6 thresholds.*/
if (GRAPHICS_VER(rc6_to_i915(rc6)) >= 10) {
if (GRAPHICS_VER(rc6_to_i915(rc6)) >= 11) {
set(uncore, GEN6_RC6_WAKE_RATE_LIMIT, 54 << 16 | 85);
set(uncore, GEN10_MEDIA_WAKE_RATE_LIMIT, 150);
} else if (IS_SKYLAKE(rc6_to_i915(rc6))) {
@ -513,6 +526,10 @@ static void __intel_rc6_disable(struct intel_rc6 *rc6)
{
struct drm_i915_private *i915 = rc6_to_i915(rc6);
struct intel_uncore *uncore = rc6_to_uncore(rc6);
struct intel_gt *gt = rc6_to_gt(rc6);
/* Take control of RC6 back from GuC */
intel_guc_rc_disable(&gt->uc.guc);
intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
if (GRAPHICS_VER(i915) >= 9)

@ -10,7 +10,7 @@
#include "gem/i915_gem_lmem.h"
#include "gem/i915_gem_region.h"
#include "gem/i915_gem_ttm.h"
#include "intel_region_lmem.h"
#include "gt/intel_gt.h"
static int init_fake_lmem_bar(struct intel_memory_region *mem)
{
@ -158,7 +158,7 @@ intel_gt_setup_fake_lmem(struct intel_gt *gt)
static bool get_legacy_lowmem_region(struct intel_uncore *uncore,
u64 *start, u32 *size)
{
if (!IS_DG1_REVID(uncore->i915, DG1_REVID_A0, DG1_REVID_B0))
if (!IS_DG1_GT_STEP(uncore->i915, STEP_A0, STEP_C0))
return false;
*start = 0;

@ -8,6 +8,7 @@
#include <linux/types.h>
#include "i915_gem.h"
#include "i915_gem_ww.h"
struct i915_request;
struct intel_context;

@ -22,7 +22,6 @@
#include "intel_reset.h"
#include "uc/intel_guc.h"
#include "uc/intel_guc_submission.h"
#define RESET_MAX_RETRIES 3
@ -39,21 +38,6 @@ static void rmw_clear_fw(struct intel_uncore *uncore, i915_reg_t reg, u32 clr)
intel_uncore_rmw_fw(uncore, reg, clr, 0);
}
static void skip_context(struct i915_request *rq)
{
struct intel_context *hung_ctx = rq->context;
list_for_each_entry_from_rcu(rq, &hung_ctx->timeline->requests, link) {
if (!i915_request_is_active(rq))
return;
if (rq->context == hung_ctx) {
i915_request_set_error_once(rq, -EIO);
__i915_request_skip(rq);
}
}
}
static void client_mark_guilty(struct i915_gem_context *ctx, bool banned)
{
struct drm_i915_file_private *file_priv = ctx->file_priv;
@ -88,10 +72,8 @@ static bool mark_guilty(struct i915_request *rq)
bool banned;
int i;
if (intel_context_is_closed(rq->context)) {
intel_context_set_banned(rq->context);
if (intel_context_is_closed(rq->context))
return true;
}
rcu_read_lock();
ctx = rcu_dereference(rq->context->gem_context);
@ -123,11 +105,9 @@ static bool mark_guilty(struct i915_request *rq)
banned = !i915_gem_context_is_recoverable(ctx);
if (time_before(jiffies, prev_hang + CONTEXT_FAST_HANG_JIFFIES))
banned = true;
if (banned) {
if (banned)
drm_dbg(&ctx->i915->drm, "context %s: guilty %d, banned\n",
ctx->name, atomic_read(&ctx->guilty_count));
intel_context_set_banned(rq->context);
}
client_mark_guilty(ctx, banned);
@ -149,6 +129,8 @@ static void mark_innocent(struct i915_request *rq)
void __i915_request_reset(struct i915_request *rq, bool guilty)
{
bool banned = false;
RQ_TRACE(rq, "guilty? %s\n", yesno(guilty));
GEM_BUG_ON(__i915_request_is_complete(rq));
@ -156,13 +138,15 @@ void __i915_request_reset(struct i915_request *rq, bool guilty)
if (guilty) {
i915_request_set_error_once(rq, -EIO);
__i915_request_skip(rq);
if (mark_guilty(rq))
skip_context(rq);
banned = mark_guilty(rq);
} else {
i915_request_set_error_once(rq, -EAGAIN);
mark_innocent(rq);
}
rcu_read_unlock();
if (banned)
intel_context_ban(rq->context, rq);
}
static bool i915_in_reset(struct pci_dev *pdev)
@ -515,8 +499,14 @@ static int gen11_reset_engines(struct intel_gt *gt,
[VCS1] = GEN11_GRDOM_MEDIA2,
[VCS2] = GEN11_GRDOM_MEDIA3,
[VCS3] = GEN11_GRDOM_MEDIA4,
[VCS4] = GEN11_GRDOM_MEDIA5,
[VCS5] = GEN11_GRDOM_MEDIA6,
[VCS6] = GEN11_GRDOM_MEDIA7,
[VCS7] = GEN11_GRDOM_MEDIA8,
[VECS0] = GEN11_GRDOM_VECS,
[VECS1] = GEN11_GRDOM_VECS2,
[VECS2] = GEN11_GRDOM_VECS3,
[VECS3] = GEN11_GRDOM_VECS4,
};
struct intel_engine_cs *engine;
intel_engine_mask_t tmp;
@ -826,6 +816,8 @@ static int gt_reset(struct intel_gt *gt, intel_engine_mask_t stalled_mask)
__intel_engine_reset(engine, stalled_mask & engine->mask);
local_bh_enable();
intel_uc_reset(&gt->uc, true);
intel_ggtt_restore_fences(gt->ggtt);
return err;
@ -850,6 +842,8 @@ static void reset_finish(struct intel_gt *gt, intel_engine_mask_t awake)
if (awake & engine->mask)
intel_engine_pm_put(engine);
}
intel_uc_reset_finish(&gt->uc);
}
static void nop_submit_request(struct i915_request *request)
@ -903,6 +897,7 @@ static void __intel_gt_set_wedged(struct intel_gt *gt)
for_each_engine(engine, gt, id)
if (engine->reset.cancel)
engine->reset.cancel(engine);
intel_uc_cancel_requests(&gt->uc);
local_bh_enable();
reset_finish(gt, awake);
@ -1191,6 +1186,9 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
ENGINE_TRACE(engine, "flags=%lx\n", gt->reset.flags);
GEM_BUG_ON(!test_bit(I915_RESET_ENGINE + engine->id, &gt->reset.flags));
if (intel_engine_uses_guc(engine))
return -ENODEV;
if (!intel_engine_pm_get_if_awake(engine))
return 0;
@ -1201,13 +1199,10 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
"Resetting %s for %s\n", engine->name, msg);
atomic_inc(&engine->i915->gpu_error.reset_engine_count[engine->uabi_class]);
if (intel_engine_uses_guc(engine))
ret = intel_guc_reset_engine(&engine->gt->uc.guc, engine);
else
ret = intel_gt_reset_engine(engine);
ret = intel_gt_reset_engine(engine);
if (ret) {
/* If we fail here, we expect to fallback to a global reset */
ENGINE_TRACE(engine, "Failed to reset, err: %d\n", ret);
ENGINE_TRACE(engine, "Failed to reset %s, err: %d\n", engine->name, ret);
goto out;
}
@ -1341,7 +1336,8 @@ void intel_gt_handle_error(struct intel_gt *gt,
* Try engine reset when available. We fall back to full reset if
* single reset fails.
*/
if (intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
if (!intel_uc_uses_guc_submission(&gt->uc) &&
intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
local_bh_disable();
for_each_engine_masked(engine, gt, engine_mask, tmp) {
BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);

@ -49,6 +49,7 @@ static inline void intel_ring_advance(struct i915_request *rq, u32 *cs)
* intel_ring_begin()).
*/
GEM_BUG_ON((rq->ring->vaddr + rq->ring->emit) != cs);
GEM_BUG_ON(!IS_ALIGNED(rq->ring->emit, 8)); /* RING_TAIL qword align */
}
static inline u32 intel_ring_wrap(const struct intel_ring *ring, u32 pos)

@ -16,6 +16,7 @@
#include "intel_reset.h"
#include "intel_ring.h"
#include "shmem_utils.h"
#include "intel_engine_heartbeat.h"
/* Rough estimate of the typical request size, performing a flush,
* set-context and then emitting the batch.
@ -342,9 +343,9 @@ static void reset_rewind(struct intel_engine_cs *engine, bool stalled)
u32 head;
rq = NULL;
spin_lock_irqsave(&engine->active.lock, flags);
spin_lock_irqsave(&engine->sched_engine->lock, flags);
rcu_read_lock();
list_for_each_entry(pos, &engine->active.requests, sched.link) {
list_for_each_entry(pos, &engine->sched_engine->requests, sched.link) {
if (!__i915_request_is_complete(pos)) {
rq = pos;
break;
@ -399,7 +400,7 @@ static void reset_rewind(struct intel_engine_cs *engine, bool stalled)
}
engine->legacy.ring->head = intel_ring_wrap(engine->legacy.ring, head);
spin_unlock_irqrestore(&engine->active.lock, flags);
spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
}
static void reset_finish(struct intel_engine_cs *engine)
@ -411,16 +412,16 @@ static void reset_cancel(struct intel_engine_cs *engine)
struct i915_request *request;
unsigned long flags;
spin_lock_irqsave(&engine->active.lock, flags);
spin_lock_irqsave(&engine->sched_engine->lock, flags);
/* Mark all submitted requests as skipped. */
list_for_each_entry(request, &engine->active.requests, sched.link)
list_for_each_entry(request, &engine->sched_engine->requests, sched.link)
i915_request_put(i915_request_mark_eio(request));
intel_engine_signal_breadcrumbs(engine);
/* Remaining _unready_ requests will be nop'ed when submitted */
spin_unlock_irqrestore(&engine->active.lock, flags);
spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
}
static void i9xx_submit_request(struct i915_request *request)
@ -586,9 +587,44 @@ static void ring_context_reset(struct intel_context *ce)
clear_bit(CONTEXT_VALID_BIT, &ce->flags);
}
static void ring_context_ban(struct intel_context *ce,
struct i915_request *rq)
{
struct intel_engine_cs *engine;
if (!rq || !i915_request_is_active(rq))
return;
engine = rq->engine;
lockdep_assert_held(&engine->sched_engine->lock);
list_for_each_entry_continue(rq, &engine->sched_engine->requests,
sched.link)
if (rq->context == ce) {
i915_request_set_error_once(rq, -EIO);
__i915_request_skip(rq);
}
}
static void ring_context_cancel_request(struct intel_context *ce,
struct i915_request *rq)
{
struct intel_engine_cs *engine = NULL;
i915_request_active_engine(rq, &engine);
if (engine && intel_engine_pulse(engine))
intel_gt_handle_error(engine->gt, engine->mask, 0,
"request cancellation by %s",
current->comm);
}
static const struct intel_context_ops ring_context_ops = {
.alloc = ring_context_alloc,
.cancel_request = ring_context_cancel_request,
.ban = ring_context_ban,
.pre_pin = ring_context_pre_pin,
.pin = ring_context_pin,
.unpin = ring_context_unpin,
@ -1047,6 +1083,25 @@ static void setup_irq(struct intel_engine_cs *engine)
}
}
static void add_to_engine(struct i915_request *rq)
{
lockdep_assert_held(&rq->engine->sched_engine->lock);
list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
}
static void remove_from_engine(struct i915_request *rq)
{
spin_lock_irq(&rq->engine->sched_engine->lock);
list_del_init(&rq->sched.link);
/* Prevent further __await_execution() registering a cb, then flush */
set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
spin_unlock_irq(&rq->engine->sched_engine->lock);
i915_request_notify_execute_cb_imm(rq);
}
static void setup_common(struct intel_engine_cs *engine)
{
struct drm_i915_private *i915 = engine->i915;
@ -1064,6 +1119,9 @@ static void setup_common(struct intel_engine_cs *engine)
engine->reset.cancel = reset_cancel;
engine->reset.finish = reset_finish;
engine->add_active_request = add_to_engine;
engine->remove_active_request = remove_from_engine;
engine->cops = &ring_context_ops;
engine->request_alloc = ring_request_alloc;

@ -37,6 +37,20 @@ static struct intel_uncore *rps_to_uncore(struct intel_rps *rps)
return rps_to_gt(rps)->uncore;
}
static struct intel_guc_slpc *rps_to_slpc(struct intel_rps *rps)
{
struct intel_gt *gt = rps_to_gt(rps);
return &gt->uc.guc.slpc;
}
static bool rps_uses_slpc(struct intel_rps *rps)
{
struct intel_gt *gt = rps_to_gt(rps);
return intel_uc_uses_guc_slpc(&gt->uc);
}
static u32 rps_pm_sanitize_mask(struct intel_rps *rps, u32 mask)
{
return mask & ~rps->pm_intrmsk_mbz;
@ -167,6 +181,8 @@ static void rps_enable_interrupts(struct intel_rps *rps)
{
struct intel_gt *gt = rps_to_gt(rps);
GEM_BUG_ON(rps_uses_slpc(rps));
GT_TRACE(gt, "interrupts:on rps->pm_events: %x, rps_pm_mask:%x\n",
rps->pm_events, rps_pm_mask(rps, rps->last_freq));
@ -771,6 +787,8 @@ static int gen6_rps_set(struct intel_rps *rps, u8 val)
struct drm_i915_private *i915 = rps_to_i915(rps);
u32 swreq;
GEM_BUG_ON(rps_uses_slpc(rps));
if (GRAPHICS_VER(i915) >= 9)
swreq = GEN9_FREQUENCY(val);
else if (IS_HASWELL(i915) || IS_BROADWELL(i915))
@ -861,6 +879,9 @@ void intel_rps_park(struct intel_rps *rps)
{
int adj;
if (!intel_rps_is_enabled(rps))
return;
GEM_BUG_ON(atomic_read(&rps->num_waiters));
if (!intel_rps_clear_active(rps))
@ -999,7 +1020,7 @@ static void gen6_rps_init(struct intel_rps *rps)
rps->efficient_freq = rps->rp1_freq;
if (IS_HASWELL(i915) || IS_BROADWELL(i915) ||
IS_GEN9_BC(i915) || GRAPHICS_VER(i915) >= 10) {
IS_GEN9_BC(i915) || GRAPHICS_VER(i915) >= 11) {
u32 ddcc_status = 0;
if (sandybridge_pcode_read(i915,
@ -1012,7 +1033,7 @@ static void gen6_rps_init(struct intel_rps *rps)
rps->max_freq);
}
if (IS_GEN9_BC(i915) || GRAPHICS_VER(i915) >= 10) {
if (IS_GEN9_BC(i915) || GRAPHICS_VER(i915) >= 11) {
/* Store the frequency values in 16.66 MHZ units, which is
* the natural hardware unit for SKL
*/
@ -1356,6 +1377,9 @@ void intel_rps_enable(struct intel_rps *rps)
if (!HAS_RPS(i915))
return;
if (rps_uses_slpc(rps))
return;
intel_gt_check_clock_frequency(rps_to_gt(rps));
intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
@ -1829,6 +1853,9 @@ void intel_rps_init(struct intel_rps *rps)
{
struct drm_i915_private *i915 = rps_to_i915(rps);
if (rps_uses_slpc(rps))
return;
if (IS_CHERRYVIEW(i915))
chv_rps_init(rps);
else if (IS_VALLEYVIEW(i915))
@ -1877,10 +1904,17 @@ void intel_rps_init(struct intel_rps *rps)
if (GRAPHICS_VER(i915) >= 8 && GRAPHICS_VER(i915) < 11)
rps->pm_intrmsk_mbz |= GEN8_PMINTR_DISABLE_REDIRECT_TO_GUC;
/* GuC needs ARAT expired interrupt unmasked */
if (intel_uc_uses_guc_submission(&rps_to_gt(rps)->uc))
rps->pm_intrmsk_mbz |= ARAT_EXPIRED_INTRMSK;
}
void intel_rps_sanitize(struct intel_rps *rps)
{
if (rps_uses_slpc(rps))
return;
if (GRAPHICS_VER(rps_to_i915(rps)) >= 6)
rps_disable_interrupts(rps);
}
@ -1936,6 +1970,176 @@ u32 intel_rps_read_actual_frequency(struct intel_rps *rps)
return freq;
}
u32 intel_rps_read_punit_req(struct intel_rps *rps)
{
struct intel_uncore *uncore = rps_to_uncore(rps);
return intel_uncore_read(uncore, GEN6_RPNSWREQ);
}
static u32 intel_rps_get_req(u32 pureq)
{
u32 req = pureq >> GEN9_SW_REQ_UNSLICE_RATIO_SHIFT;
return req;
}
u32 intel_rps_read_punit_req_frequency(struct intel_rps *rps)
{
u32 freq = intel_rps_get_req(intel_rps_read_punit_req(rps));
return intel_gpu_freq(rps, freq);
}
u32 intel_rps_get_requested_frequency(struct intel_rps *rps)
{
if (rps_uses_slpc(rps))
return intel_rps_read_punit_req_frequency(rps);
else
return intel_gpu_freq(rps, rps->cur_freq);
}
u32 intel_rps_get_max_frequency(struct intel_rps *rps)
{
struct intel_guc_slpc *slpc = rps_to_slpc(rps);
if (rps_uses_slpc(rps))
return slpc->max_freq_softlimit;
else
return intel_gpu_freq(rps, rps->max_freq_softlimit);
}
u32 intel_rps_get_rp0_frequency(struct intel_rps *rps)
{
struct intel_guc_slpc *slpc = rps_to_slpc(rps);
if (rps_uses_slpc(rps))
return slpc->rp0_freq;
else
return intel_gpu_freq(rps, rps->rp0_freq);
}
u32 intel_rps_get_rp1_frequency(struct intel_rps *rps)
{
struct intel_guc_slpc *slpc = rps_to_slpc(rps);
if (rps_uses_slpc(rps))
return slpc->rp1_freq;
else
return intel_gpu_freq(rps, rps->rp1_freq);
}
u32 intel_rps_get_rpn_frequency(struct intel_rps *rps)
{
struct intel_guc_slpc *slpc = rps_to_slpc(rps);
if (rps_uses_slpc(rps))
return slpc->min_freq;
else
return intel_gpu_freq(rps, rps->min_freq);
}
static int set_max_freq(struct intel_rps *rps, u32 val)
{
struct drm_i915_private *i915 = rps_to_i915(rps);
int ret = 0;
mutex_lock(&rps->lock);
val = intel_freq_opcode(rps, val);
if (val < rps->min_freq ||
val > rps->max_freq ||
val < rps->min_freq_softlimit) {
ret = -EINVAL;
goto unlock;
}
if (val > rps->rp0_freq)
drm_dbg(&i915->drm, "User requested overclocking to %d\n",
intel_gpu_freq(rps, val));
rps->max_freq_softlimit = val;
val = clamp_t(int, rps->cur_freq,
rps->min_freq_softlimit,
rps->max_freq_softlimit);
/*
* We still need *_set_rps to process the new max_delay and
* update the interrupt limits and PMINTRMSK even though
* frequency request may be unchanged.
*/
intel_rps_set(rps, val);
unlock:
mutex_unlock(&rps->lock);
return ret;
}
int intel_rps_set_max_frequency(struct intel_rps *rps, u32 val)
{
struct intel_guc_slpc *slpc = rps_to_slpc(rps);
if (rps_uses_slpc(rps))
return intel_guc_slpc_set_max_freq(slpc, val);
else
return set_max_freq(rps, val);
}
u32 intel_rps_get_min_frequency(struct intel_rps *rps)
{
struct intel_guc_slpc *slpc = rps_to_slpc(rps);
if (rps_uses_slpc(rps))
return slpc->min_freq_softlimit;
else
return intel_gpu_freq(rps, rps->min_freq_softlimit);
}
static int set_min_freq(struct intel_rps *rps, u32 val)
{
int ret = 0;
mutex_lock(&rps->lock);
val = intel_freq_opcode(rps, val);
if (val < rps->min_freq ||
val > rps->max_freq ||
val > rps->max_freq_softlimit) {
ret = -EINVAL;
goto unlock;
}
rps->min_freq_softlimit = val;
val = clamp_t(int, rps->cur_freq,
rps->min_freq_softlimit,
rps->max_freq_softlimit);
/*
* We still need *_set_rps to process the new min_delay and
* update the interrupt limits and PMINTRMSK even though
* frequency request may be unchanged.
*/
intel_rps_set(rps, val);
unlock:
mutex_unlock(&rps->lock);
return ret;
}
int intel_rps_set_min_frequency(struct intel_rps *rps, u32 val)
{
struct intel_guc_slpc *slpc = rps_to_slpc(rps);
if (rps_uses_slpc(rps))
return intel_guc_slpc_set_min_freq(slpc, val);
else
return set_min_freq(rps, val);
}
/* External interface for intel_ips.ko */
static struct drm_i915_private __rcu *ips_mchdev;
@ -2129,4 +2333,5 @@ EXPORT_SYMBOL_GPL(i915_gpu_turbo_disable);
#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
#include "selftest_rps.c"
#include "selftest_slpc.c"
#endif
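
The soft-limit checks in set_max_freq() above can be illustrated with a minimal userspace sketch. The struct freq_limits type and the helper names below are hypothetical stand-ins, not the driver's types, and the sketch stores the clamped value directly where the driver calls intel_rps_set(); locking and the opcode conversion are left out.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the frequency fields of struct intel_rps. */
struct freq_limits {
	int min_freq;       /* hardware minimum */
	int max_freq;       /* hardware maximum */
	int min_softlimit;  /* user-selected lower bound */
	int max_softlimit;  /* user-selected upper bound */
	int cur_freq;       /* currently requested frequency */
};

static int clamp_int(int v, int lo, int hi)
{
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

/*
 * Reject a new upper soft limit that lies outside the hardware range or
 * below the lower soft limit; otherwise store it and pull the current
 * frequency back inside the new [min_softlimit, max_softlimit] window.
 */
static int set_max_softlimit(struct freq_limits *l, int val)
{
	if (val < l->min_freq || val > l->max_freq || val < l->min_softlimit)
		return -EINVAL;

	l->max_softlimit = val;
	/* Assumption: the driver applies this value through intel_rps_set(). */
	l->cur_freq = clamp_int(l->cur_freq, l->min_softlimit, l->max_softlimit);
	return 0;
}
```

The lower-limit setter would mirror this, with the third check reversed to compare against max_softlimit.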

@ -31,6 +31,16 @@ int intel_gpu_freq(struct intel_rps *rps, int val);
int intel_freq_opcode(struct intel_rps *rps, int val);
u32 intel_rps_get_cagf(struct intel_rps *rps, u32 rpstat1);
u32 intel_rps_read_actual_frequency(struct intel_rps *rps);
u32 intel_rps_get_requested_frequency(struct intel_rps *rps);
u32 intel_rps_get_min_frequency(struct intel_rps *rps);
int intel_rps_set_min_frequency(struct intel_rps *rps, u32 val);
u32 intel_rps_get_max_frequency(struct intel_rps *rps);
int intel_rps_set_max_frequency(struct intel_rps *rps, u32 val);
u32 intel_rps_get_rp0_frequency(struct intel_rps *rps);
u32 intel_rps_get_rp1_frequency(struct intel_rps *rps);
u32 intel_rps_get_rpn_frequency(struct intel_rps *rps);
u32 intel_rps_read_punit_req(struct intel_rps *rps);
u32 intel_rps_read_punit_req_frequency(struct intel_rps *rps);
void gen5_rps_irq_handler(struct intel_rps *rps);
void gen6_rps_irq_handler(struct intel_rps *rps, u32 pm_iir);

@ -139,17 +139,36 @@ static void gen12_sseu_info_init(struct intel_gt *gt)
* Gen12 has Dual-Subslices, which behave similarly to 2 gen11 SS.
* Instead of splitting these, provide userspace with an array
* of DSS to more closely represent the hardware resource.
*
* In addition, the concept of slice has been removed in Xe_HP.
* To be compatible with prior generations, assume a single slice
* across the entire device. Then calculate out the DSS for each
* workload type within that software slice.
*/
intel_sseu_set_info(sseu, 1, 6, 16);
if (IS_DG2(gt->i915) || IS_XEHPSDV(gt->i915))
intel_sseu_set_info(sseu, 1, 32, 16);
else
intel_sseu_set_info(sseu, 1, 6, 16);
s_en = intel_uncore_read(uncore, GEN11_GT_SLICE_ENABLE) &
GEN11_GT_S_ENA_MASK;
/*
* As mentioned above, Xe_HP does not have the concept of a slice.
* Enable one for software backwards compatibility.
*/
if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 50))
s_en = 0x1;
else
s_en = intel_uncore_read(uncore, GEN11_GT_SLICE_ENABLE) &
GEN11_GT_S_ENA_MASK;
dss_en = intel_uncore_read(uncore, GEN12_GT_DSS_ENABLE);
/* one bit per pair of EUs */
eu_en_fuse = ~(intel_uncore_read(uncore, GEN11_EU_DISABLE) &
GEN11_EU_DIS_MASK);
if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 50))
eu_en_fuse = intel_uncore_read(uncore, XEHP_EU_ENABLE) & XEHP_EU_ENA_MASK;
else
eu_en_fuse = ~(intel_uncore_read(uncore, GEN11_EU_DISABLE) &
GEN11_EU_DIS_MASK);
for (eu = 0; eu < sseu->max_eus_per_subslice / 2; eu++)
if (eu_en_fuse & BIT(eu))
eu_en |= BIT(eu * 2) | BIT(eu * 2 + 1);
@ -188,83 +207,6 @@ static void gen11_sseu_info_init(struct intel_gt *gt)
sseu->has_eu_pg = 1;
}
static void gen10_sseu_info_init(struct intel_gt *gt)
{
struct intel_uncore *uncore = gt->uncore;
struct sseu_dev_info *sseu = &gt->info.sseu;
const u32 fuse2 = intel_uncore_read(uncore, GEN8_FUSE2);
const int eu_mask = 0xff;
u32 subslice_mask, eu_en;
int s, ss;
intel_sseu_set_info(sseu, 6, 4, 8);
sseu->slice_mask = (fuse2 & GEN10_F2_S_ENA_MASK) >>
GEN10_F2_S_ENA_SHIFT;
/* Slice0 */
eu_en = ~intel_uncore_read(uncore, GEN8_EU_DISABLE0);
for (ss = 0; ss < sseu->max_subslices; ss++)
sseu_set_eus(sseu, 0, ss, (eu_en >> (8 * ss)) & eu_mask);
/* Slice1 */
sseu_set_eus(sseu, 1, 0, (eu_en >> 24) & eu_mask);
eu_en = ~intel_uncore_read(uncore, GEN8_EU_DISABLE1);
sseu_set_eus(sseu, 1, 1, eu_en & eu_mask);
/* Slice2 */
sseu_set_eus(sseu, 2, 0, (eu_en >> 8) & eu_mask);
sseu_set_eus(sseu, 2, 1, (eu_en >> 16) & eu_mask);
/* Slice3 */
sseu_set_eus(sseu, 3, 0, (eu_en >> 24) & eu_mask);
eu_en = ~intel_uncore_read(uncore, GEN8_EU_DISABLE2);
sseu_set_eus(sseu, 3, 1, eu_en & eu_mask);
/* Slice4 */
sseu_set_eus(sseu, 4, 0, (eu_en >> 8) & eu_mask);
sseu_set_eus(sseu, 4, 1, (eu_en >> 16) & eu_mask);
/* Slice5 */
sseu_set_eus(sseu, 5, 0, (eu_en >> 24) & eu_mask);
eu_en = ~intel_uncore_read(uncore, GEN10_EU_DISABLE3);
sseu_set_eus(sseu, 5, 1, eu_en & eu_mask);
subslice_mask = (1 << 4) - 1;
subslice_mask &= ~((fuse2 & GEN10_F2_SS_DIS_MASK) >>
GEN10_F2_SS_DIS_SHIFT);
for (s = 0; s < sseu->max_slices; s++) {
u32 subslice_mask_with_eus = subslice_mask;
for (ss = 0; ss < sseu->max_subslices; ss++) {
if (sseu_get_eus(sseu, s, ss) == 0)
subslice_mask_with_eus &= ~BIT(ss);
}
/*
* Slice0 can have up to 3 subslices, but there are only 2 in
* slice1/2.
*/
intel_sseu_set_subslices(sseu, s, s == 0 ?
subslice_mask_with_eus :
subslice_mask_with_eus & 0x3);
}
sseu->eu_total = compute_eu_total(sseu);
/*
* CNL is expected to always have a uniform distribution
* of EU across subslices with the exception that any one
* EU in any one subslice may be fused off for die
* recovery.
*/
sseu->eu_per_subslice =
intel_sseu_subslice_total(sseu) ?
DIV_ROUND_UP(sseu->eu_total, intel_sseu_subslice_total(sseu)) :
0;
/* No restrictions on Power Gating */
sseu->has_slice_pg = 1;
sseu->has_subslice_pg = 1;
sseu->has_eu_pg = 1;
}
static void cherryview_sseu_info_init(struct intel_gt *gt)
{
struct sseu_dev_info *sseu = &gt->info.sseu;
@ -592,8 +534,6 @@ void intel_sseu_info_init(struct intel_gt *gt)
bdw_sseu_info_init(gt);
else if (GRAPHICS_VER(i915) == 9)
gen9_sseu_info_init(gt);
else if (GRAPHICS_VER(i915) == 10)
gen10_sseu_info_init(gt);
else if (GRAPHICS_VER(i915) == 11)
gen11_sseu_info_init(gt);
else if (GRAPHICS_VER(i915) >= 12)
@ -759,3 +699,21 @@ void intel_sseu_print_topology(const struct sseu_dev_info *sseu,
}
}
}
u16 intel_slicemask_from_dssmask(u64 dss_mask, int dss_per_slice)
{
u16 slice_mask = 0;
int i;
WARN_ON(sizeof(dss_mask) * 8 / dss_per_slice > 8 * sizeof(slice_mask));
for (i = 0; dss_mask; i++) {
if (dss_mask & GENMASK(dss_per_slice - 1, 0))
slice_mask |= BIT(i);
dss_mask >>= dss_per_slice;
}
return slice_mask;
}
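
As a rough illustration of intel_slicemask_from_dssmask() above, the following userspace sketch folds a DSS mask into a slice mask. It is not the kernel code: low_bits_mask() is a hypothetical replacement for GENMASK(), and an assert stands in for the WARN_ON size check.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's GENMASK(nbits - 1, 0). */
static uint64_t low_bits_mask(unsigned int nbits)
{
	return nbits >= 64 ? UINT64_MAX : (((uint64_t)1 << nbits) - 1);
}

/*
 * Slice i is marked present when any of its dss_per_slice DSS bits is set
 * in dss_mask. The mask is consumed from the low bits upward.
 */
static uint16_t slicemask_from_dssmask(uint64_t dss_mask, int dss_per_slice)
{
	uint16_t slice_mask = 0;
	int i;

	/* 64 / dss_per_slice slices must fit in the 16-bit result. */
	assert(dss_per_slice >= 4 && dss_per_slice <= 64);

	for (i = 0; dss_mask; i++) {
		if (dss_mask & low_bits_mask((unsigned int)dss_per_slice))
			slice_mask |= (uint16_t)(1u << i);
		/* A shift by the full width of the type is undefined in C. */
		dss_mask = dss_per_slice >= 64 ? 0 : dss_mask >> dss_per_slice;
	}

	return slice_mask;
}
```

With four DSS per slice, for example, a DSS mask of 0xF0F has bits set in the first and third groups of four, so the first and third slices are reported.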

@ -15,13 +15,17 @@ struct drm_i915_private;
struct intel_gt;
struct drm_printer;
#define GEN_MAX_SLICES (6) /* CNL upper bound */
#define GEN_MAX_SUBSLICES (8) /* ICL upper bound */
#define GEN_MAX_SLICES (3) /* SKL upper bound */
#define GEN_MAX_SUBSLICES (32) /* XEHPSDV upper bound */
#define GEN_SSEU_STRIDE(max_entries) DIV_ROUND_UP(max_entries, BITS_PER_BYTE)
#define GEN_MAX_SUBSLICE_STRIDE GEN_SSEU_STRIDE(GEN_MAX_SUBSLICES)
#define GEN_MAX_EUS (16) /* TGL upper bound */
#define GEN_MAX_EU_STRIDE GEN_SSEU_STRIDE(GEN_MAX_EUS)
#define GEN_DSS_PER_GSLICE 4
#define GEN_DSS_PER_CSLICE 8
#define GEN_DSS_PER_MSLICE 8
struct sseu_dev_info {
u8 slice_mask;
u8 subslice_mask[GEN_MAX_SLICES * GEN_MAX_SUBSLICE_STRIDE];
@ -104,4 +108,6 @@ void intel_sseu_dump(const struct sseu_dev_info *sseu, struct drm_printer *p);
void intel_sseu_print_topology(const struct sseu_dev_info *sseu,
struct drm_printer *p);
u16 intel_slicemask_from_dssmask(u64 dss_mask, int dss_per_slice);
#endif /* __INTEL_SSEU_H__ */
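
The effect of the new limits on the size of subslice_mask[] can be worked out with a small sketch. BITS_PER_BYTE and DIV_ROUND_UP are re-defined here for userspace with their usual kernel meanings, and the two helper functions are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace re-definitions of the kernel helpers used by the header. */
#define BITS_PER_BYTE 8
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

#define GEN_SSEU_STRIDE(max_entries) DIV_ROUND_UP(max_entries, BITS_PER_BYTE)

/* Bytes needed for one slice's subslice bitmap, times the slice count. */
static size_t subslice_mask_bytes(size_t max_slices, size_t max_subslices)
{
	return max_slices * GEN_SSEU_STRIDE(max_subslices);
}

/* Size with the new limits: 3 slices (SKL), 32 subslices (XEHPSDV). */
static size_t new_subslice_mask_bytes(void)
{
	return subslice_mask_bytes(3, 32);
}

/* Size with the old limits: 6 slices (CNL), 8 subslices (ICL). */
static size_t old_subslice_mask_bytes(void)
{
	return subslice_mask_bytes(6, 8);
}
```

Under these definitions the per-slice stride grows from one byte to four, so the array grows even though the slice bound drops from six to three.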

@ -50,10 +50,10 @@ static void cherryview_sseu_device_status(struct intel_gt *gt,
#undef SS_MAX
}
static void gen10_sseu_device_status(struct intel_gt *gt,
static void gen11_sseu_device_status(struct intel_gt *gt,
struct sseu_dev_info *sseu)
{
#define SS_MAX 6
#define SS_MAX 8
struct intel_uncore *uncore = gt->uncore;
const struct intel_gt_info *info = &gt->info;
u32 s_reg[SS_MAX], eu_reg[2 * SS_MAX], eu_mask[2];
@ -267,8 +267,8 @@ int intel_sseu_status(struct seq_file *m, struct intel_gt *gt)
bdw_sseu_device_status(gt, &sseu);
else if (GRAPHICS_VER(i915) == 9)
gen9_sseu_device_status(gt, &sseu);
else if (GRAPHICS_VER(i915) >= 10)
gen10_sseu_device_status(gt, &sseu);
else if (GRAPHICS_VER(i915) >= 11)
gen11_sseu_device_status(gt, &sseu);
}
i915_print_sseu_info(m, false, HAS_POOLED_EU(i915), &sseu);

@ -150,13 +150,14 @@ static void _wa_add(struct i915_wa_list *wal, const struct i915_wa *wa)
}
static void wa_add(struct i915_wa_list *wal, i915_reg_t reg,
u32 clear, u32 set, u32 read_mask)
u32 clear, u32 set, u32 read_mask, bool masked_reg)
{
struct i915_wa wa = {
.reg = reg,
.clr = clear,
.set = set,
.read = read_mask,
.masked_reg = masked_reg,
};
_wa_add(wal, &wa);
@ -165,7 +166,7 @@ static void wa_add(struct i915_wa_list *wal, i915_reg_t reg,
static void
wa_write_clr_set(struct i915_wa_list *wal, i915_reg_t reg, u32 clear, u32 set)
{
wa_add(wal, reg, clear, set, clear);
wa_add(wal, reg, clear, set, clear, false);
}
static void
@ -200,20 +201,20 @@ wa_write_clr(struct i915_wa_list *wal, i915_reg_t reg, u32 clr)
static void
wa_masked_en(struct i915_wa_list *wal, i915_reg_t reg, u32 val)
{
wa_add(wal, reg, 0, _MASKED_BIT_ENABLE(val), val);
wa_add(wal, reg, 0, _MASKED_BIT_ENABLE(val), val, true);
}
static void
wa_masked_dis(struct i915_wa_list *wal, i915_reg_t reg, u32 val)
{
wa_add(wal, reg, 0, _MASKED_BIT_DISABLE(val), val);
wa_add(wal, reg, 0, _MASKED_BIT_DISABLE(val), val, true);
}
static void
wa_masked_field_set(struct i915_wa_list *wal, i915_reg_t reg,
u32 mask, u32 val)
{
wa_add(wal, reg, 0, _MASKED_FIELD(mask, val), mask);
wa_add(wal, reg, 0, _MASKED_FIELD(mask, val), mask, true);
}
static void gen6_ctx_workarounds_init(struct intel_engine_cs *engine,
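
The masked_reg flag added to wa_add() above marks registers whose upper 16 bits act as a write-enable mask for the lower 16 bits. The sketch below models that convention in userspace. The function names are hypothetical lower-case counterparts of _MASKED_FIELD, _MASKED_BIT_ENABLE and _MASKED_BIT_DISABLE, and apply_masked_write() is an assumed model of the hardware behaviour, not driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 16 bits select which of the lower 16 bits the write may change. */
static uint32_t masked_field(uint32_t mask, uint32_t value)
{
	return (mask << 16) | value;
}

static uint32_t masked_bit_enable(uint32_t bit)
{
	return masked_field(bit, bit);
}

static uint32_t masked_bit_disable(uint32_t bit)
{
	return masked_field(bit, 0);
}

/*
 * Assumed hardware model: only bits named in the upper-half mask are
 * updated; all other bits of the old value are preserved.
 */
static uint32_t apply_masked_write(uint32_t old, uint32_t write)
{
	uint32_t mask = write >> 16;

	return (old & ~mask & 0xffff) | (write & mask);
}
```

Because a masked write carries its own mask, it can be issued without a read-modify-write cycle, which is one plausible reason for recording the distinction per workaround entry.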
@ -514,53 +515,15 @@ static void cfl_ctx_workarounds_init(struct intel_engine_cs *engine,
GEN7_SBE_SS_CACHE_DISPATCH_PORT_SHARING_DISABLE);
}
static void cnl_ctx_workarounds_init(struct intel_engine_cs *engine,
struct i915_wa_list *wal)
{
/* WaForceContextSaveRestoreNonCoherent:cnl */
wa_masked_en(wal, CNL_HDC_CHICKEN0,
HDC_FORCE_CONTEXT_SAVE_RESTORE_NON_COHERENT);
/* WaDisableReplayBufferBankArbitrationOptimization:cnl */
wa_masked_en(wal, COMMON_SLICE_CHICKEN2,
GEN8_SBE_DISABLE_REPLAY_BUF_OPTIMIZATION);
/* WaPushConstantDereferenceHoldDisable:cnl */
wa_masked_en(wal, GEN7_ROW_CHICKEN2, PUSH_CONSTANT_DEREF_DISABLE);
/* FtrEnableFastAnisoL1BankingFix:cnl */
wa_masked_en(wal, HALF_SLICE_CHICKEN3, CNL_FAST_ANISO_L1_BANKING_FIX);
/* WaDisable3DMidCmdPreemption:cnl */
wa_masked_dis(wal, GEN8_CS_CHICKEN1, GEN9_PREEMPT_3D_OBJECT_LEVEL);
/* WaDisableGPGPUMidCmdPreemption:cnl */
wa_masked_field_set(wal, GEN8_CS_CHICKEN1,
GEN9_PREEMPT_GPGPU_LEVEL_MASK,
GEN9_PREEMPT_GPGPU_COMMAND_LEVEL);
/* WaDisableEarlyEOT:cnl */
wa_masked_en(wal, GEN8_ROW_CHICKEN, DISABLE_EARLY_EOT);
}
static void icl_ctx_workarounds_init(struct intel_engine_cs *engine,
struct i915_wa_list *wal)
{
struct drm_i915_private *i915 = engine->i915;
/* WaDisableBankHangMode:icl */
/* Wa_1406697149 (WaDisableBankHangMode:icl) */
wa_write(wal,
GEN8_L3CNTLREG,
intel_uncore_read(engine->uncore, GEN8_L3CNTLREG) |
GEN8_ERRDETBCTRL);
/* Wa_1604370585:icl (pre-prod)
* Formerly known as WaPushConstantDereferenceHoldDisable
*/
if (IS_ICL_REVID(i915, ICL_REVID_A0, ICL_REVID_B0))
wa_masked_en(wal, GEN7_ROW_CHICKEN2,
PUSH_CONSTANT_DEREF_DISABLE);
/* WaForceEnableNonCoherent:icl
* This is not the same workaround as in early Gen9 platforms, where
* lacking this could cause system hangs, but coherency performance
@ -570,23 +533,11 @@ static void icl_ctx_workarounds_init(struct intel_engine_cs *engine,
*/
wa_masked_en(wal, ICL_HDC_MODE, HDC_FORCE_NON_COHERENT);
/* Wa_2006611047:icl (pre-prod)
* Formerly known as WaDisableImprovedTdlClkGating
*/
if (IS_ICL_REVID(i915, ICL_REVID_A0, ICL_REVID_A0))
wa_masked_en(wal, GEN7_ROW_CHICKEN2,
GEN11_TDL_CLOCK_GATING_FIX_DISABLE);
/* Wa_2006665173:icl (pre-prod) */
if (IS_ICL_REVID(i915, ICL_REVID_A0, ICL_REVID_A0))
wa_masked_en(wal, GEN11_COMMON_SLICE_CHICKEN3,
GEN11_BLEND_EMB_FIX_DISABLE_IN_RCC);
/* WaEnableFloatBlendOptimization:icl */
wa_write_clr_set(wal,
GEN10_CACHE_MODE_SS,
0, /* write-only, so skip validation */
_MASKED_BIT_ENABLE(FLOAT_BLEND_OPTIMIZATION_ENABLE));
wa_add(wal, GEN10_CACHE_MODE_SS, 0,
_MASKED_BIT_ENABLE(FLOAT_BLEND_OPTIMIZATION_ENABLE),
0 /* write-only, so skip validation */,
true);
/* WaDisableGPGPUMidThreadPreemption:icl */
wa_masked_field_set(wal, GEN8_CS_CHICKEN1,
@ -631,7 +582,7 @@ static void gen12_ctx_gt_tuning_init(struct intel_engine_cs *engine,
FF_MODE2,
FF_MODE2_TDS_TIMER_MASK,
FF_MODE2_TDS_TIMER_128,
0);
0, false);
}
static void gen12_ctx_workarounds_init(struct intel_engine_cs *engine,
@ -640,15 +591,16 @@ static void gen12_ctx_workarounds_init(struct intel_engine_cs *engine,
gen12_ctx_gt_tuning_init(engine, wal);
/*
* Wa_1409142259:tgl
* Wa_1409347922:tgl
* Wa_1409252684:tgl
* Wa_1409217633:tgl
* Wa_1409207793:tgl
* Wa_1409178076:tgl
* Wa_1408979724:tgl
* Wa_14010443199:rkl
* Wa_14010698770:rkl
* Wa_1409142259:tgl,dg1,adl-p
* Wa_1409347922:tgl,dg1,adl-p
* Wa_1409252684:tgl,dg1,adl-p
* Wa_1409217633:tgl,dg1,adl-p
* Wa_1409207793:tgl,dg1,adl-p
* Wa_1409178076:tgl,dg1,adl-p
* Wa_1408979724:tgl,dg1,adl-p
* Wa_14010443199:tgl,rkl,dg1,adl-p
* Wa_14010698770:tgl,rkl,dg1,adl-s,adl-p
* Wa_1409342910:tgl,rkl,dg1,adl-s,adl-p
*/
wa_masked_en(wal, GEN11_COMMON_SLICE_CHICKEN3,
GEN12_DISABLE_CPS_AWARE_COLOR_PIPE);
@ -668,7 +620,14 @@ static void gen12_ctx_workarounds_init(struct intel_engine_cs *engine,
FF_MODE2,
FF_MODE2_GS_TIMER_MASK,
FF_MODE2_GS_TIMER_224,
0);
0, false);
/*
* Wa_14012131227:dg1
* Wa_1508744258:tgl,rkl,dg1,adl-s,adl-p
*/
wa_masked_en(wal, GEN7_COMMON_SLICE_CHICKEN1,
GEN9_RHWO_OPTIMIZATION_DISABLE);
}
static void dg1_ctx_workarounds_init(struct intel_engine_cs *engine,
@ -703,8 +662,6 @@ __intel_engine_init_ctx_wa(struct intel_engine_cs *engine,
gen12_ctx_workarounds_init(engine, wal);
else if (GRAPHICS_VER(i915) == 11)
icl_ctx_workarounds_init(engine, wal);
else if (IS_CANNONLAKE(i915))
cnl_ctx_workarounds_init(engine, wal);
else if (IS_COFFEELAKE(i915) || IS_COMETLAKE(i915))
cfl_ctx_workarounds_init(engine, wal);
else if (IS_GEMINILAKE(i915))
@ -839,7 +796,7 @@ hsw_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal)
wa_add(wal,
HSW_ROW_CHICKEN3, 0,
_MASKED_BIT_ENABLE(HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE),
0 /* XXX does this reg exist? */);
0 /* XXX does this reg exist? */, true);
/* WaVSRefCountFullforceMissDisable:hsw */
wa_write_clr(wal, GEN7_FF_THREAD_MODE, GEN7_FF_VS_REF_CNT_FFME);
@ -882,30 +839,19 @@ skl_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal)
GEN8_EU_GAUNIT_CLOCK_GATE_DISABLE);
/* WaInPlaceDecompressionHang:skl */
if (IS_SKL_REVID(i915, SKL_REVID_H0, REVID_FOREVER))
if (IS_SKL_GT_STEP(i915, STEP_A0, STEP_H0))
wa_write_or(wal,
GEN9_GAMT_ECO_REG_RW_IA,
GAMT_ECO_ENABLE_IN_PLACE_DECOMPRESS);
}
static void
bxt_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal)
{
gen9_gt_workarounds_init(i915, wal);
/* WaInPlaceDecompressionHang:bxt */
wa_write_or(wal,
GEN9_GAMT_ECO_REG_RW_IA,
GAMT_ECO_ENABLE_IN_PLACE_DECOMPRESS);
}
static void
kbl_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal)
{
gen9_gt_workarounds_init(i915, wal);
/* WaDisableDynamicCreditSharing:kbl */
if (IS_KBL_GT_STEP(i915, 0, STEP_B0))
if (IS_KBL_GT_STEP(i915, 0, STEP_C0))
wa_write_or(wal,
GAMT_CHKN_BIT_REG,
GAMT_CHKN_DISABLE_DYNAMIC_CREDIT_SHARING);
@ -943,98 +889,144 @@ cfl_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal)
GAMT_ECO_ENABLE_IN_PLACE_DECOMPRESS);
}
static void
wa_init_mcr(struct drm_i915_private *i915, struct i915_wa_list *wal)
static void __set_mcr_steering(struct i915_wa_list *wal,
i915_reg_t steering_reg,
unsigned int slice, unsigned int subslice)
{
const struct sseu_dev_info *sseu = &i915->gt.info.sseu;
unsigned int slice, subslice;
u32 l3_en, mcr, mcr_mask;
u32 mcr, mcr_mask;
GEM_BUG_ON(GRAPHICS_VER(i915) < 10);
mcr = GEN11_MCR_SLICE(slice) | GEN11_MCR_SUBSLICE(subslice);
mcr_mask = GEN11_MCR_SLICE_MASK | GEN11_MCR_SUBSLICE_MASK;
/*
* WaProgramMgsrForL3BankSpecificMmioReads: cnl,icl
* L3Banks could be fused off in single slice scenario. If that is
* the case, we might need to program MCR select to a valid L3Bank
* by default, to make sure we correctly read certain registers
* later on (in the range 0xB100 - 0xB3FF).
*
* WaProgramMgsrForCorrectSliceSpecificMmioReads:cnl,icl
* Before any MMIO read into slice/subslice specific registers, MCR
* packet control register needs to be programmed to point to any
* enabled s/ss pair. Otherwise, incorrect values will be returned.
* This means each subsequent MMIO read will be forwarded to a
* specific s/ss combination, but this is OK since these registers
* are consistent across s/ss in almost all cases. In the rare
* occasions, such as INSTDONE, where this value is dependent
* on s/ss combo, the read should be done with read_subslice_reg.
*
* Since GEN8_MCR_SELECTOR contains dual-purpose bits which select both
* to which subslice, or to which L3 bank, the respective mmio reads
* will go, we have to find a common index which works for both
* accesses.
*
* Case where we cannot find a common index fortunately should not
* happen in production hardware, so we only emit a warning instead of
* implementing something more complex that requires checking the range
* of every MMIO read.
*/
wa_write_clr_set(wal, steering_reg, mcr_mask, mcr);
}
if (GRAPHICS_VER(i915) >= 10 && is_power_of_2(sseu->slice_mask)) {
u32 l3_fuse =
intel_uncore_read(&i915->uncore, GEN10_MIRROR_FUSE3) &
GEN10_L3BANK_MASK;
static void __add_mcr_wa(struct drm_i915_private *i915, struct i915_wa_list *wal,
unsigned int slice, unsigned int subslice)
{
drm_dbg(&i915->drm, "MCR slice=0x%x, subslice=0x%x\n", slice, subslice);
drm_dbg(&i915->drm, "L3 fuse = %x\n", l3_fuse);
l3_en = ~(l3_fuse << GEN10_L3BANK_PAIR_COUNT | l3_fuse);
} else {
l3_en = ~0;
}
slice = fls(sseu->slice_mask) - 1;
subslice = fls(l3_en & intel_sseu_get_subslices(sseu, slice));
if (!subslice) {
drm_warn(&i915->drm,
"No common index found between subslice mask %x and L3 bank mask %x!\n",
intel_sseu_get_subslices(sseu, slice), l3_en);
subslice = fls(l3_en);
drm_WARN_ON(&i915->drm, !subslice);
}
subslice--;
if (GRAPHICS_VER(i915) >= 11) {
mcr = GEN11_MCR_SLICE(slice) | GEN11_MCR_SUBSLICE(subslice);
mcr_mask = GEN11_MCR_SLICE_MASK | GEN11_MCR_SUBSLICE_MASK;
} else {
mcr = GEN8_MCR_SLICE(slice) | GEN8_MCR_SUBSLICE(subslice);
mcr_mask = GEN8_MCR_SLICE_MASK | GEN8_MCR_SUBSLICE_MASK;
}
drm_dbg(&i915->drm, "MCR slice/subslice = %x\n", mcr);
wa_write_clr_set(wal, GEN8_MCR_SELECTOR, mcr_mask, mcr);
__set_mcr_steering(wal, GEN8_MCR_SELECTOR, slice, subslice);
}
static void
cnl_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal)
icl_wa_init_mcr(struct drm_i915_private *i915, struct i915_wa_list *wal)
{
wa_init_mcr(i915, wal);
const struct sseu_dev_info *sseu = &i915->gt.info.sseu;
unsigned int slice, subslice;
/* WaInPlaceDecompressionHang:cnl */
wa_write_or(wal,
GEN9_GAMT_ECO_REG_RW_IA,
GAMT_ECO_ENABLE_IN_PLACE_DECOMPRESS);
GEM_BUG_ON(GRAPHICS_VER(i915) < 11);
GEM_BUG_ON(hweight8(sseu->slice_mask) > 1);
slice = 0;
/*
* Although a platform may have subslices, we need to always steer
* reads to the lowest instance that isn't fused off. When Render
* Power Gating is enabled, grabbing forcewake will only power up a
* single subslice (the "minconfig") if there isn't a real workload
* that needs to be run; this means that if we steer register reads to
* one of the higher subslices, we run the risk of reading back 0's or
* random garbage.
*/
subslice = __ffs(intel_sseu_get_subslices(sseu, slice));
/*
* If the subslice we picked above also steers us to a valid L3 bank,
* then we can just rely on the default steering and won't need to
* worry about explicitly re-steering L3BANK reads later.
*/
if (i915->gt.info.l3bank_mask & BIT(subslice))
i915->gt.steering_table[L3BANK] = NULL;
__add_mcr_wa(i915, wal, slice, subslice);
}
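
The selection described in the comments of icl_wa_init_mcr() can be sketched in isolation. This is a minimal userspace illustration, not the driver code: `lowest_set_bit()`, `struct steering_choice` and `pick_default_steering()` are hypothetical names, and plain integers stand in for the sseu and L3 bank masks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Index of the lowest set bit; like the kernel's __ffs(), mask must be non-zero. */
static unsigned int lowest_set_bit(uint32_t mask)
{
	unsigned int bit = 0;

	assert(mask != 0);
	while (!(mask & 1u)) {
		mask >>= 1;
		bit++;
	}
	return bit;
}

/* Hypothetical result type: the chosen subslice and whether L3BANK reads
 * would still need explicit re-steering later. */
struct steering_choice {
	unsigned int subslice;
	bool l3bank_needs_resteer;
};

static struct steering_choice
pick_default_steering(uint32_t subslice_mask, uint32_t l3bank_mask)
{
	struct steering_choice c;

	/* Always steer to the lowest subslice that is not fused off. */
	c.subslice = lowest_set_bit(subslice_mask);

	/* If that index is also a valid L3 bank, default steering covers both. */
	c.l3bank_needs_resteer = !(l3bank_mask & (1u << c.subslice));

	return c;
}
```

With subslices 2 and 3 enabled (mask 0x0c), the sketch picks subslice 2; L3BANK reads need separate steering only when bank 2 is absent from the L3 bank mask.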
static void
xehp_init_mcr(struct intel_gt *gt, struct i915_wa_list *wal)
{
struct drm_i915_private *i915 = gt->i915;
const struct sseu_dev_info *sseu = &gt->info.sseu;
unsigned long slice, subslice = 0, slice_mask = 0;
u64 dss_mask = 0;
u32 lncf_mask = 0;
int i;
/*
* On Xe_HP the steering increases in complexity. There are now several
* more units that require steering and we're not guaranteed to be able
* to find a common setting for all of them. These are:
* - GSLICE (fusable)
* - DSS (sub-unit within gslice; fusable)
* - L3 Bank (fusable)
* - MSLICE (fusable)
* - LNCF (sub-unit within mslice; always present if mslice is present)
*
* We'll do our default/implicit steering based on GSLICE (in the
* sliceid field) and DSS (in the subsliceid field). If we can
* find overlap between the valid MSLICE and/or LNCF values with
* a suitable GSLICE, then we can just re-use the default value and
* skip any explicit steering at runtime.
*
* We only need to look for overlap between GSLICE/MSLICE/LNCF to find
* a valid sliceid value. DSS steering is the only type of steering
* that utilizes the 'subsliceid' bits.
*
* Also note that, even though the steering domain is called "GSlice"
* and it is encoded in the register using the gslice format, the spec
* says that the combined (geometry | compute) fuse should be used to
* select the steering.
*/
/* Find the potential gslice candidates */
dss_mask = intel_sseu_get_subslices(sseu, 0);
slice_mask = intel_slicemask_from_dssmask(dss_mask, GEN_DSS_PER_GSLICE);
/*
* Find the potential LNCF candidates. Either LNCF within a valid
* mslice is fine.
*/
for_each_set_bit(i, &gt->info.mslice_mask, GEN12_MAX_MSLICES)
lncf_mask |= (0x3 << (i * 2));
/*
* Are there any sliceid values that work for both GSLICE and LNCF
* steering?
*/
if (slice_mask & lncf_mask) {
slice_mask &= lncf_mask;
gt->steering_table[LNCF] = NULL;
}
/* How about sliceid values that also work for MSLICE steering? */
if (slice_mask & gt->info.mslice_mask) {
slice_mask &= gt->info.mslice_mask;
gt->steering_table[MSLICE] = NULL;
}
slice = __ffs(slice_mask);
subslice = __ffs(dss_mask >> (slice * GEN_DSS_PER_GSLICE));
WARN_ON(subslice > GEN_DSS_PER_GSLICE);
WARN_ON(dss_mask >> (slice * GEN_DSS_PER_GSLICE) == 0);
__add_mcr_wa(i915, wal, slice, subslice);
/*
* SQIDI ranges are special because they use different steering
* registers than everything else we work with. On XeHP SDV and
* DG2-G10, any value in the steering registers will work fine since
* all instances are present, but DG2-G11 only has SQIDI instances at
* ID's 2 and 3, so we need to steer to one of those. For simplicity
* we'll just steer to a hardcoded "2" since that value will work
* everywhere.
*/
__set_mcr_steering(wal, MCFG_MCR_SELECTOR, 0, 2);
__set_mcr_steering(wal, SF_MCR_SELECTOR, 0, 2);
}
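
The mask narrowing in xehp_init_mcr() is easier to follow with concrete numbers. The sketch below is a standalone approximation under stated assumptions: `DSS_PER_GSLICE` and `MAX_MSLICES` are assumed to be 4, `slicemask_from_dssmask()` is a simplified stand-in for intel_slicemask_from_dssmask(), and `struct xehp_steering` and `xehp_pick_steering()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DSS_PER_GSLICE 4u /* assumption: four DSS per gslice */
#define MAX_MSLICES    4u /* assumption: up to four mslices */

/* Hypothetical result: default sliceid/subsliceid plus which unit types
 * can rely on that default instead of explicit steering. */
struct xehp_steering {
	unsigned int slice;
	unsigned int subslice;
	bool lncf_implicit;
	bool mslice_implicit;
};

/* Index of the lowest set bit; mask must be non-zero. */
static unsigned int lowest_set_bit64(uint64_t mask)
{
	unsigned int bit = 0;

	assert(mask != 0);
	while (!(mask & 1u)) {
		mask >>= 1;
		bit++;
	}
	return bit;
}

/* A gslice is a candidate if at least one of its DSS is enabled. */
static uint32_t slicemask_from_dssmask(uint64_t dss_mask)
{
	const uint64_t group = (1u << DSS_PER_GSLICE) - 1;
	uint32_t slice_mask = 0;
	unsigned int i;

	for (i = 0; dss_mask; i++, dss_mask >>= DSS_PER_GSLICE)
		if (dss_mask & group)
			slice_mask |= 1u << i;

	return slice_mask;
}

static struct xehp_steering
xehp_pick_steering(uint64_t dss_mask, uint32_t mslice_mask)
{
	struct xehp_steering s = { 0 };
	uint32_t slice_mask, lncf_mask = 0;
	unsigned int i;

	assert(dss_mask != 0);
	slice_mask = slicemask_from_dssmask(dss_mask);

	/* Either of the two LNCF instances within a present mslice is valid. */
	for (i = 0; i < MAX_MSLICES; i++)
		if (mslice_mask & (1u << i))
			lncf_mask |= 0x3u << (i * 2);

	/* Narrow the candidates only when an overlap actually exists. */
	if (slice_mask & lncf_mask) {
		slice_mask &= lncf_mask;
		s.lncf_implicit = true;
	}
	if (slice_mask & mslice_mask) {
		slice_mask &= mslice_mask;
		s.mslice_implicit = true;
	}

	s.slice = lowest_set_bit64(slice_mask);
	s.subslice = lowest_set_bit64(dss_mask >> (s.slice * DSS_PER_GSLICE));

	return s;
}
```

For example, with only gslice 1 populated (dss_mask 0xf0) and only mslice 0 present, sliceid 1 is also a valid LNCF id, so LNCF can use the default steering, while MSLICE accesses would still need explicit steering.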
static void
icl_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal)
{
wa_init_mcr(i915, wal);
/* WaInPlaceDecompressionHang:icl */
wa_write_or(wal,
GEN9_GAMT_ECO_REG_RW_IA,
GAMT_ECO_ENABLE_IN_PLACE_DECOMPRESS);
icl_wa_init_mcr(i915, wal);
/* WaModifyGamTlbPartitioning:icl */
wa_write_clr_set(wal,
@ -1057,18 +1049,6 @@ icl_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal)
GEN8_GAMW_ECO_DEV_RW_IA,
GAMW_ECO_DEV_CTX_RELOAD_DISABLE);
/* Wa_1405779004:icl (pre-prod) */
if (IS_ICL_REVID(i915, ICL_REVID_A0, ICL_REVID_A0))
wa_write_or(wal,
SLICE_UNIT_LEVEL_CLKGATE,
MSCUNIT_CLKGATE_DIS);
/* Wa_1406838659:icl (pre-prod) */
if (IS_ICL_REVID(i915, ICL_REVID_A0, ICL_REVID_B0))
wa_write_or(wal,
INF_UNIT_LEVEL_CLKGATE,
CGPSF_CLKGATE_DIS);
/* Wa_1406463099:icl
* Formerly known as WaGamTlbPendError
*/
@ -1078,10 +1058,16 @@ icl_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal)
/* Wa_1607087056:icl,ehl,jsl */
if (IS_ICELAKE(i915) ||
IS_JSL_EHL_REVID(i915, EHL_REVID_A0, EHL_REVID_A0))
IS_JSL_EHL_GT_STEP(i915, STEP_A0, STEP_B0))
wa_write_or(wal,
SLICE_UNIT_LEVEL_CLKGATE,
L3_CLKGATE_DIS | L3_CR2X_CLKGATE_DIS);
/*
* This is not a documented workaround, but rather an optimization
* to reduce sampler power.
*/
wa_write_clr(wal, GEN10_DFR_RATIO_EN_AND_CHICKEN, DFR_DISABLE);
}
/*
@ -1111,10 +1097,13 @@ static void
gen12_gt_workarounds_init(struct drm_i915_private *i915,
struct i915_wa_list *wal)
{
wa_init_mcr(i915, wal);
icl_wa_init_mcr(i915, wal);
/* Wa_14011060649:tgl,rkl,dg1,adls */
/* Wa_14011060649:tgl,rkl,dg1,adl-s,adl-p */
wa_14011060649(i915, wal);
/* Wa_14011059788:tgl,rkl,adl-s,dg1,adl-p */
wa_write_or(wal, GEN10_DFR_RATIO_EN_AND_CHICKEN, DFR_DISABLE);
}
static void
@ -1123,19 +1112,19 @@ tgl_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal)
gen12_gt_workarounds_init(i915, wal);
/* Wa_1409420604:tgl */
if (IS_TGL_UY_GT_STEP(i915, STEP_A0, STEP_A0))
if (IS_TGL_UY_GT_STEP(i915, STEP_A0, STEP_B0))
wa_write_or(wal,
SUBSLICE_UNIT_LEVEL_CLKGATE2,
CPSSUNIT_CLKGATE_DIS);
/* Wa_1607087056:tgl also known as BUG:1409180338 */
if (IS_TGL_UY_GT_STEP(i915, STEP_A0, STEP_A0))
if (IS_TGL_UY_GT_STEP(i915, STEP_A0, STEP_B0))
wa_write_or(wal,
SLICE_UNIT_LEVEL_CLKGATE,
L3_CLKGATE_DIS | L3_CR2X_CLKGATE_DIS);
/* Wa_1408615072:tgl[a0] */
if (IS_TGL_UY_GT_STEP(i915, STEP_A0, STEP_A0))
if (IS_TGL_UY_GT_STEP(i915, STEP_A0, STEP_B0))
wa_write_or(wal, UNSLICE_UNIT_LEVEL_CLKGATE2,
VSUNIT_CLKGATE_DIS_TGL);
}
@ -1146,7 +1135,7 @@ dg1_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal)
gen12_gt_workarounds_init(i915, wal);
/* Wa_1607087056:dg1 */
if (IS_DG1_REVID(i915, DG1_REVID_A0, DG1_REVID_A0))
if (IS_DG1_GT_STEP(i915, STEP_A0, STEP_B0))
wa_write_or(wal,
SLICE_UNIT_LEVEL_CLKGATE,
L3_CLKGATE_DIS | L3_CR2X_CLKGATE_DIS);
@ -1164,10 +1153,18 @@ dg1_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal)
VSUNIT_CLKGATE_DIS_TGL);
}
static void
xehpsdv_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal)
{
xehp_init_mcr(&i915->gt, wal);
}
static void
gt_init_workarounds(struct drm_i915_private *i915, struct i915_wa_list *wal)
{
if (IS_DG1(i915))
if (IS_XEHPSDV(i915))
xehpsdv_gt_workarounds_init(i915, wal);
else if (IS_DG1(i915))
dg1_gt_workarounds_init(i915, wal);
else if (IS_TIGERLAKE(i915))
tgl_gt_workarounds_init(i915, wal);
@ -1175,8 +1172,6 @@ gt_init_workarounds(struct drm_i915_private *i915, struct i915_wa_list *wal)
gen12_gt_workarounds_init(i915, wal);
else if (GRAPHICS_VER(i915) == 11)
icl_gt_workarounds_init(i915, wal);
else if (IS_CANNONLAKE(i915))
cnl_gt_workarounds_init(i915, wal);
else if (IS_COFFEELAKE(i915) || IS_COMETLAKE(i915))
cfl_gt_workarounds_init(i915, wal);
else if (IS_GEMINILAKE(i915))
@ -1184,7 +1179,7 @@ gt_init_workarounds(struct drm_i915_private *i915, struct i915_wa_list *wal)
else if (IS_KABYLAKE(i915))
kbl_gt_workarounds_init(i915, wal);
else if (IS_BROXTON(i915))
bxt_gt_workarounds_init(i915, wal);
gen9_gt_workarounds_init(i915, wal);
else if (IS_SKYLAKE(i915))
skl_gt_workarounds_init(i915, wal);
else if (IS_HASWELL(i915))
@ -1247,8 +1242,9 @@ wa_verify(const struct i915_wa *wa, u32 cur, const char *name, const char *from)
}
static void
wa_list_apply(struct intel_uncore *uncore, const struct i915_wa_list *wal)
wa_list_apply(struct intel_gt *gt, const struct i915_wa_list *wal)
{
struct intel_uncore *uncore = gt->uncore;
enum forcewake_domains fw;
unsigned long flags;
struct i915_wa *wa;
@ -1263,13 +1259,16 @@ wa_list_apply(struct intel_uncore *uncore, const struct i915_wa_list *wal)
intel_uncore_forcewake_get__locked(uncore, fw);
for (i = 0, wa = wal->list; i < wal->count; i++, wa++) {
if (wa->clr)
intel_uncore_rmw_fw(uncore, wa->reg, wa->clr, wa->set);
else
intel_uncore_write_fw(uncore, wa->reg, wa->set);
u32 val, old = 0;
/* open-coded rmw due to steering */
old = wa->clr ? intel_gt_read_register_fw(gt, wa->reg) : 0;
val = (old & ~wa->clr) | wa->set;
if (val != old || !wa->clr)
intel_uncore_write_fw(uncore, wa->reg, val);
if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
wa_verify(wa,
intel_uncore_read_fw(uncore, wa->reg),
wa_verify(wa, intel_gt_read_register_fw(gt, wa->reg),
wal->name, "application");
}
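
The open-coded read-modify-write in wa_list_apply() follows a simple rule that can be shown without hardware access. In this sketch a plain variable stands in for the register, and `struct wa_entry` and `wa_apply_one()` are hypothetical names; the real code reads through intel_gt_read_register_fw() so that steering is respected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced workaround entry: bits to clear and bits to set. */
struct wa_entry {
	uint32_t clr;
	uint32_t set;
};

/*
 * Apply one entry to a fake register. Returns true if a write was issued.
 * With clr == 0 the entry is a plain write (no read needed, e.g. masked
 * registers); otherwise the old value is read and the write is skipped
 * when the register already holds the desired value.
 */
static bool wa_apply_one(uint32_t *reg, const struct wa_entry *wa)
{
	uint32_t old = wa->clr ? *reg : 0;
	uint32_t val = (old & ~wa->clr) | wa->set;

	if (val != old || !wa->clr) {
		*reg = val;
		return true;
	}

	return false;
}
```

Applying the same clear/set entry twice issues a write only the first time, whereas an entry with an empty clear mask is written unconditionally.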
@ -1279,28 +1278,39 @@ wa_list_apply(struct intel_uncore *uncore, const struct i915_wa_list *wal)
void intel_gt_apply_workarounds(struct intel_gt *gt)
{
wa_list_apply(gt->uncore, &gt->i915->gt_wa_list);
wa_list_apply(gt, &gt->i915->gt_wa_list);
}
static bool wa_list_verify(struct intel_uncore *uncore,
static bool wa_list_verify(struct intel_gt *gt,
const struct i915_wa_list *wal,
const char *from)
{
struct intel_uncore *uncore = gt->uncore;
struct i915_wa *wa;
enum forcewake_domains fw;
unsigned long flags;
unsigned int i;
bool ok = true;
fw = wal_get_fw_for_rmw(uncore, wal);
spin_lock_irqsave(&uncore->lock, flags);
intel_uncore_forcewake_get__locked(uncore, fw);
for (i = 0, wa = wal->list; i < wal->count; i++, wa++)
ok &= wa_verify(wa,
intel_uncore_read(uncore, wa->reg),
intel_gt_read_register_fw(gt, wa->reg),
wal->name, from);
intel_uncore_forcewake_put__locked(uncore, fw);
spin_unlock_irqrestore(&uncore->lock, flags);
return ok;
}
bool intel_gt_verify_workarounds(struct intel_gt *gt, const char *from)
{
return wa_list_verify(gt->uncore, &gt->i915->gt_wa_list, from);
return wa_list_verify(gt, &gt->i915->gt_wa_list, from);
}
__maybe_unused
@ -1438,17 +1448,6 @@ static void cml_whitelist_build(struct intel_engine_cs *engine)
cfl_whitelist_build(engine);
}
static void cnl_whitelist_build(struct intel_engine_cs *engine)
{
struct i915_wa_list *w = &engine->whitelist;
if (engine->class != RENDER_CLASS)
return;
/* WaEnablePreemptionGranularityControlByUMD:cnl */
whitelist_reg(w, GEN8_CS_CHICKEN1);
}
static void icl_whitelist_build(struct intel_engine_cs *engine)
{
struct i915_wa_list *w = &engine->whitelist;
@ -1542,7 +1541,7 @@ static void dg1_whitelist_build(struct intel_engine_cs *engine)
tgl_whitelist_build(engine);
/* GEN:BUG:1409280441:dg1 */
if (IS_DG1_REVID(engine->i915, DG1_REVID_A0, DG1_REVID_A0) &&
if (IS_DG1_GT_STEP(engine->i915, STEP_A0, STEP_B0) &&
(engine->class == RENDER_CLASS ||
engine->class == COPY_ENGINE_CLASS))
whitelist_reg_ext(w, RING_ID(engine->mmio_base),
@ -1562,8 +1561,6 @@ void intel_engine_init_whitelist(struct intel_engine_cs *engine)
tgl_whitelist_build(engine);
else if (GRAPHICS_VER(i915) == 11)
icl_whitelist_build(engine);
else if (IS_CANNONLAKE(i915))
cnl_whitelist_build(engine);
else if (IS_COMETLAKE(i915))
cml_whitelist_build(engine);
else if (IS_COFFEELAKE(i915))
@ -1612,8 +1609,8 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
{
struct drm_i915_private *i915 = engine->i915;
if (IS_DG1_REVID(i915, DG1_REVID_A0, DG1_REVID_A0) ||
IS_TGL_UY_GT_STEP(i915, STEP_A0, STEP_A0)) {
if (IS_DG1_GT_STEP(i915, STEP_A0, STEP_B0) ||
IS_TGL_UY_GT_STEP(i915, STEP_A0, STEP_B0)) {
/*
* Wa_1607138336:tgl[a0],dg1[a0]
* Wa_1607063988:tgl[a0],dg1[a0]
@ -1623,7 +1620,7 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
GEN12_DISABLE_POSH_BUSY_FF_DOP_CG);
}
if (IS_TGL_UY_GT_STEP(i915, STEP_A0, STEP_A0)) {
if (IS_TGL_UY_GT_STEP(i915, STEP_A0, STEP_B0)) {
/*
* Wa_1606679103:tgl
* (see also Wa_1606682166:icl)
@ -1633,44 +1630,46 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
GEN7_DISABLE_SAMPLER_PREFETCH);
}
if (IS_ALDERLAKE_S(i915) || IS_DG1(i915) ||
if (IS_ALDERLAKE_P(i915) || IS_ALDERLAKE_S(i915) || IS_DG1(i915) ||
IS_ROCKETLAKE(i915) || IS_TIGERLAKE(i915)) {
/* Wa_1606931601:tgl,rkl,dg1,adl-s */
/* Wa_1606931601:tgl,rkl,dg1,adl-s,adl-p */
wa_masked_en(wal, GEN7_ROW_CHICKEN2, GEN12_DISABLE_EARLY_READ);
/*
* Wa_1407928979:tgl A*
* Wa_18011464164:tgl[B0+],dg1[B0+]
* Wa_22010931296:tgl[B0+],dg1[B0+]
* Wa_14010919138:rkl,dg1,adl-s
* Wa_14010919138:rkl,dg1,adl-s,adl-p
*/
wa_write_or(wal, GEN7_FF_THREAD_MODE,
GEN12_FF_TESSELATION_DOP_GATE_DISABLE);
/*
* Wa_1606700617:tgl,dg1
* Wa_22010271021:tgl,rkl,dg1, adl-s
* Wa_1606700617:tgl,dg1,adl-p
* Wa_22010271021:tgl,rkl,dg1,adl-s,adl-p
* Wa_14010826681:tgl,dg1,rkl,adl-p
*/
wa_masked_en(wal,
GEN9_CS_DEBUG_MODE1,
FF_DOP_CLOCK_GATE_DISABLE);
}
if (IS_ALDERLAKE_S(i915) || IS_DG1_REVID(i915, DG1_REVID_A0, DG1_REVID_A0) ||
if (IS_ALDERLAKE_P(i915) || IS_ALDERLAKE_S(i915) ||
IS_DG1_GT_STEP(i915, STEP_A0, STEP_B0) ||
IS_ROCKETLAKE(i915) || IS_TIGERLAKE(i915)) {
/* Wa_1409804808:tgl,rkl,dg1[a0],adl-s */
/* Wa_1409804808:tgl,rkl,dg1[a0],adl-s,adl-p */
wa_masked_en(wal, GEN7_ROW_CHICKEN2,
GEN12_PUSH_CONST_DEREF_HOLD_DIS);
/*
* Wa_1409085225:tgl
* Wa_14010229206:tgl,rkl,dg1[a0],adl-s
* Wa_14010229206:tgl,rkl,dg1[a0],adl-s,adl-p
*/
wa_masked_en(wal, GEN9_ROW_CHICKEN4, GEN12_DISABLE_TDL_PUSH);
}
if (IS_DG1_REVID(i915, DG1_REVID_A0, DG1_REVID_A0) ||
if (IS_DG1_GT_STEP(i915, STEP_A0, STEP_B0) ||
IS_ROCKETLAKE(i915) || IS_TIGERLAKE(i915)) {
/*
* Wa_1607030317:tgl
@ -1688,8 +1687,9 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
GEN8_RC_SEMA_IDLE_MSG_DISABLE);
}
if (IS_DG1(i915) || IS_ROCKETLAKE(i915) || IS_TIGERLAKE(i915)) {
/* Wa_1406941453:tgl,rkl,dg1 */
if (IS_DG1(i915) || IS_ROCKETLAKE(i915) || IS_TIGERLAKE(i915) ||
IS_ALDERLAKE_S(i915) || IS_ALDERLAKE_P(i915)) {
/* Wa_1406941453:tgl,rkl,dg1,adl-s,adl-p */
wa_masked_en(wal,
GEN10_SAMPLER_MODE,
ENABLE_SMALLPL);
@ -1701,11 +1701,6 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
_3D_CHICKEN3,
_3D_CHICKEN3_AA_LINE_QUALITY_FIX_ENABLE);
/* WaPipelineFlushCoherentLines:icl */
wa_write_or(wal,
GEN8_L3SQCREG4,
GEN8_LQSC_FLUSH_COHERENT_LINES);
/*
* Wa_1405543622:icl
* Formerly known as WaGAPZPriorityScheme
@ -1735,19 +1730,6 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
GEN8_L3SQCREG4,
GEN11_LQSC_CLEAN_EVICT_DISABLE);
/* WaForwardProgressSoftReset:icl */
wa_write_or(wal,
GEN10_SCRATCH_LNCF2,
PMFLUSHDONE_LNICRSDROP |
PMFLUSH_GAPL3UNBLOCK |
PMFLUSHDONE_LNEBLK);
/* Wa_1406609255:icl (pre-prod) */
if (IS_ICL_REVID(i915, ICL_REVID_A0, ICL_REVID_B0))
wa_write_or(wal,
GEN7_SARCHKMD,
GEN7_DISABLE_DEMAND_PREFETCH);
/* Wa_1606682166:icl */
wa_write_or(wal,
GEN7_SARCHKMD,
@ -1947,10 +1929,10 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
* disable bit, which we don't touch here, but it's good
* to keep in mind (see 3DSTATE_PS and 3DSTATE_WM).
*/
wa_add(wal, GEN7_GT_MODE, 0,
_MASKED_FIELD(GEN6_WIZ_HASHING_MASK,
GEN6_WIZ_HASHING_16x4),
GEN6_WIZ_HASHING_16x4);
wa_masked_field_set(wal,
GEN7_GT_MODE,
GEN6_WIZ_HASHING_MASK,
GEN6_WIZ_HASHING_16x4);
}
if (IS_GRAPHICS_VER(i915, 6, 7))
@ -2000,10 +1982,10 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
* disable bit, which we don't touch here, but it's good
* to keep in mind (see 3DSTATE_PS and 3DSTATE_WM).
*/
wa_add(wal,
GEN6_GT_MODE, 0,
_MASKED_FIELD(GEN6_WIZ_HASHING_MASK, GEN6_WIZ_HASHING_16x4),
GEN6_WIZ_HASHING_16x4);
wa_masked_field_set(wal,
GEN6_GT_MODE,
GEN6_WIZ_HASHING_MASK,
GEN6_WIZ_HASHING_16x4);
/* WaDisable_RenderCache_OperationalFlush:snb */
wa_masked_dis(wal, CACHE_MODE_0, RC_OP_FLUSH_ENABLE);
@ -2024,7 +2006,7 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
wa_add(wal, MI_MODE,
0, _MASKED_BIT_ENABLE(VS_TIMER_DISPATCH),
/* XXX bit doesn't stick on Broadwater */
IS_I965G(i915) ? 0 : VS_TIMER_DISPATCH);
IS_I965G(i915) ? 0 : VS_TIMER_DISPATCH, true);
if (GRAPHICS_VER(i915) == 4)
/*
@ -2039,7 +2021,8 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
*/
wa_add(wal, ECOSKPD,
0, _MASKED_BIT_ENABLE(ECO_CONSTANT_BUFFER_SR_DISABLE),
0 /* XXX bit doesn't stick on Broadwater */);
0 /* XXX bit doesn't stick on Broadwater */,
true);
}
static void
@ -2048,7 +2031,7 @@ xcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
struct drm_i915_private *i915 = engine->i915;
/* WaKBLVECSSemaphoreWaitPoll:kbl */
if (IS_KBL_GT_STEP(i915, STEP_A0, STEP_E0)) {
if (IS_KBL_GT_STEP(i915, STEP_A0, STEP_F0)) {
wa_write(wal,
RING_SEMA_WAIT_POLL(engine->mmio_base),
1);
@ -2081,7 +2064,7 @@ void intel_engine_init_workarounds(struct intel_engine_cs *engine)
void intel_engine_apply_workarounds(struct intel_engine_cs *engine)
{
wa_list_apply(engine->uncore, &engine->wa_list);
wa_list_apply(engine->gt, &engine->wa_list);
}
struct mcr_range {
@ -2107,12 +2090,31 @@ static const struct mcr_range mcr_ranges_gen12[] = {
{},
};
static const struct mcr_range mcr_ranges_xehp[] = {
{ .start = 0x4000, .end = 0x4aff },
{ .start = 0x5200, .end = 0x52ff },
{ .start = 0x5400, .end = 0x7fff },
{ .start = 0x8140, .end = 0x815f },
{ .start = 0x8c80, .end = 0x8dff },
{ .start = 0x94d0, .end = 0x955f },
{ .start = 0x9680, .end = 0x96ff },
{ .start = 0xb000, .end = 0xb3ff },
{ .start = 0xc800, .end = 0xcfff },
{ .start = 0xd800, .end = 0xd8ff },
{ .start = 0xdc00, .end = 0xffff },
{ .start = 0x17000, .end = 0x17fff },
{ .start = 0x24a00, .end = 0x24a7f },
{},
};
static bool mcr_range(struct drm_i915_private *i915, u32 offset)
{
const struct mcr_range *mcr_ranges;
int i;
if (GRAPHICS_VER(i915) >= 12)
if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50))
mcr_ranges = mcr_ranges_xehp;
else if (GRAPHICS_VER(i915) >= 12)
mcr_ranges = mcr_ranges_gen12;
else if (GRAPHICS_VER(i915) >= 8)
mcr_ranges = mcr_ranges_gen8;
