When kernel.h is used in the headers it adds a lot into dependency hell,
especially when there are circular dependencies are involved.
Replace kernel.h inclusion with the list of what is really being used.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20211110102423.54282-1-andriy.shevchenko@linux.intel.com
Userspace can severely fragment rb_hole_addr rbtree by manipulating
alignment while allocating buffers. Fragmented rb_hole_addr rbtree
would result in large delays while allocating buffer object for a
userspace application. It takes long time to find suitable hole
because if we fail to find a suitable hole in the first attempt
then we look for neighbouring nodes using rb_prev()/rb_next().
Traversing rbtree using rb_prev()/rb_next() can take really long
time if the tree is fragmented.
This patch improves searches in fragmented rb_hole_addr rbtree by
modifying it to an augmented rbtree which will store an extra field
in drm_mm_node, subtree_max_hole. Each drm_mm_node now stores maximum
hole size for its subtree in drm_mm_node->subtree_max_hole. Using
drm_mm_node->subtree_max_hole, it is possible to eliminate a complete
subtree if that subtree is unable to serve a request hence reducing
number of rb_prev()/rb_next() used.
With this patch applied, 1 million bo allocs on amdgpu took ~8 sec,
compared to 50k bo allocs which took 28 sec without it.
partial test code:
int test_fragmentation(void)
{
int i = 0;
uint32_t minor_version;
uint32_t major_version;
struct amdgpu_bo_alloc_request request = {};
amdgpu_bo_handle vram_handle[MAX_ALLOC] = {};
amdgpu_device_handle device_handle;
request.alloc_size = 4096;
request.phys_alignment = 8192;
request.preferred_heap = AMDGPU_GEM_DOMAIN_VRAM;
int fd = open("/dev/dri/card0", O_RDWR | O_CLOEXEC);
amdgpu_device_initialize(fd, &major_version, &minor_version,
&device_handle);
for (i = 0; i < MAX_ALLOC; i++) {
amdgpu_bo_alloc(device_handle, &request, &vram_handle[i]);
}
for (i = 0; i < MAX_ALLOC; i++)
amdgpu_bo_free(vram_handle[i]);
return 0;
}
v2:
Use RB_DECLARE_CALLBACKS_MAX to maintain subtree_max_hole
v3:
insert_hole_addr() should be static a function
fix return value of next_hole_high_addr()/next_hole_low_addr()
Reported-by: kbuild test robot <lkp@intel.com>
v4:
Fix commit message.
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/364341/
Signed-off-by: Christian König <christian.koenig@amd.com>
A straightforward conversion of assignment and checking of the boolean
state flags (allocated, scanned) into non-atomic bitops. The caller
remains responsible for all locking around the drm_mm and its nodes.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191003210100.22250-4-chris@chris-wilson.co.uk
Searching for an available hole by address is slow, as there no
guarantee that a hole will be available and so we must walk over all
nodes in the rbtree before we determine the search was futile. In many
cases, the caller doesn't strictly care for the highest available hole
and was just opportunistically laying out the address space in a
preferred order. In such cases, the caller can accept any address and
would rather do so then do a slow walk.
To be able to mix search strategies, the caller wants to tell the drm_mm
how long to spend on the search. Without a good guide for what should be
the best split, start with a request to try once at most. That is return
the top-most (or lowest) hole if it fulfils the alignment and size
requirements.
v2: Documentation, by why of example (selftests) and kerneldoc.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180521082131.13744-2-chris@chris-wilson.co.uk
As we keep an rbtree of available holes sorted by their size, we can
very easily determine if there is any hole large enough that might
satisfy the allocation request. This helps when dealing with a highly
fragmented address space and a request for a search by address.
To cache the largest size, we convert into the cached rbtree variant
which tracks the leftmost node for us. However, currently we sorted into
ascending size order so the leftmost node is the smallest, and so to
make it the largest hole we need to invert our sorting.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180521082131.13744-1-chris@chris-wilson.co.uk
drm_mm_insert_node_generic() is a simplified version of
drm_mm_insert_node_in_range(), update comment to reflect correct
function name.
Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171101140445.2798-1-Liviu.Dudau@arm.com
Allow interval trees to quickly check for overlaps to avoid unnecesary
tree lookups in interval_tree_iter_first().
As of this patch, all interval tree flavors will require using a
'rb_root_cached' such that we can have the leftmost node easily
available. While most users will make use of this feature, those with
special functions (in addition to the generic insert, delete, search
calls) will avoid using the cached option as they can do funky things
with insertions -- for example, vma_interval_tree_insert_after().
[jglisse@redhat.com: fix deadlock from typo vm_lock_anon_vma()]
Link: http://lkml.kernel.org/r/20170808225719.20723-1-jglisse@redhat.com
Link: http://lkml.kernel.org/r/20170719014603.19029-12-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Doug Ledford <dledford@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Christian Benvenuti <benve@cisco.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
First slice of drm-misc-next for 4.12:
Core/subsystem-wide:
- link status core patch from Manasi, for signalling link train fail
to userspace. I also had the i915 patch in here, but that had a
small buglet in our CI, so reverted.
- more debugfs_remove removal from Noralf, almost there now (Noralf
said he'll try to follow up with the stragglers).
- drm todo moved into kerneldoc, for better visibility (see
Documentation/gpu/todo.rst), lots of starter tasks in there.
- devm_ of helpers + use it in sti (from Ben Gaignard, acked by Rob
Herring)
- extended framebuffer fbdev support (for fbdev flipping), and vblank
wait ioctl fbdev support (Maxime Ripard)
- misc small things all over, as usual
- add vblank callbacks to drm_crtc_funcs, plus make lots of good use
of this to simplify drivers (Shawn Guo)
- new atomic iterator macros to unconfuse old vs. new state
Small drivers:
- vc4 improvements from Eric
- vc4 kerneldocs (Eric)!
- tons of improvements for dw-mipi-dsi in rockchip from John Keeping
and Chris Zhong.
- MAINTAINERS entries for drivers managed in drm-misc. It's not yet
official, still an experiment, but definitely not complete fail and
better to avoid confusion. We kinda screwed that up with drm-misc a
bit when we started committers last year.
- qxl atomic conversion (Gabriel Krisman)
- bunch of virtual driver polish (qxl, virgl, ...)
- misc tiny patches all over
This is the first time we've done the same merge-window blackout for
drm-misc as we've done for drm-intel for ages, hence why we have a
_lot_ of stuff queued already. But it's still only half of drm-intel
(room to grow!), and the drivers in drm-misc experiment seems to work
at least insofar as that you also get lots of driver updates here
alredy.
* tag 'drm-misc-next-2017-03-06' of git://anongit.freedesktop.org/git/drm-misc: (141 commits)
drm/vc4: Fix OOPSes from trying to cache a partially constructed BO.
drm/vc4: Fulfill user BO creation requests from the kernel BO cache.
Revert "drm/i915: Implement Link Rate fallback on Link training failure"
drm/fb-helper: implement ioctl FBIO_WAITFORVSYNC
drm: Update drm_fbdev_cma_init documentation
drm/rockchip/dsi: add dw-mipi power domain support
drm/rockchip/dsi: fix insufficient bandwidth of some panel
dt-bindings: add power domain node for dw-mipi-rockchip
drm/rockchip/dsi: remove mode_valid function
drm/rockchip/dsi: dw-mipi: correct the coding style
drm/rockchip/dsi: dw-mipi: support RK3399 mipi dsi
dt-bindings: add rk3399 support for dw-mipi-rockchip
drm/rockchip: dw-mipi-dsi: add reset control
drm/rockchip: dw-mipi-dsi: support non-burst modes
drm/rockchip: dw-mipi-dsi: defer probe if panel is not loaded
drm/rockchip: vop: test for P{H,V}SYNC
drm/rockchip: dw-mipi-dsi: use positive check for N{H, V}SYNC
drm/rockchip: dw-mipi-dsi: use specific poll helper
drm/rockchip: dw-mipi-dsi: improve PLL configuration
drm/rockchip: dw-mipi-dsi: properly configure PHY timing
...
Update code that relied on sched.h including various MM types for them.
This will allow us to remove the <linux/mm_types.h> include from <linux/sched.h>.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As we require valid start/end parameters, we can replace the initial
potential NULL with a pointer to the drm_mm.head_node and so reduce the
test on every iteration from a NULL + address comparison to just an
address comparison.
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-26 (-26)
function old new delta
i915_gem_evict_for_node 719 693 -26
(No other users outside of the test harness.)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20170204111913.12416-1-chris@chris-wilson.co.uk
The drm_mm range manager claimed to support top-down insertion, but it
was neither searching for the top-most hole that could fit the
allocation request nor fitting the request to the hole correctly.
In order to search the range efficiently, we create a secondary index
for the holes using either their size or their address. This index
allows us to find the smallest hole or the hole at the bottom or top of
the range efficiently, whilst keeping the hole stack to rapidly service
evictions.
v2: Search for holes both high and low. Rename flags to mode.
v3: Discover rb_entry_safe() and use it!
v4: Kerneldoc for enum drm_mm_insert_mode.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Sean Paul <seanpaul@chromium.org>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Stephen Warren <swarren@wwwdotorg.org>
Cc: Alexandre Courbot <gnurou@gmail.com>
Cc: Eric Anholt <eric@anholt.net>
Cc: Sinclair Yeh <syeh@vmware.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com> # vmwgfx
Reviewed-by: Lucas Stach <l.stach@pengutronix.de> #etnaviv
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20170202210438.28702-1-chris@chris-wilson.co.uk
Added some boilerplate for the structs, documented members where they
are relevant and plenty of markup for hyperlinks all over. And a few
small wording polish.
Note that the intro needs some more love after the DRM_MM_INSERT_*
patch from Chris has landed.
v2: Spelling fixes (Chris).
v3: Use &struct foo instead of &foo structure (Chris).
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1483044517-5770-3-git-send-email-daniel.vetter@ffwll.ch
Including all drivers. I thought about keeping small compat functions
to avoid having to change all drivers. But I really like the
drm_printer idea, so figured spreading it more widely is a good thing.
v2: Review from Chris:
- Natural argument order and better name for drm_mm_print.
- show_mm() macro in the selftest.
Cc: Rob Clark <robdclark@gmail.com>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Jyri Sarha <jsarha@ti.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1483009764-8281-1-git-send-email-daniel.vetter@ffwll.ch
Remove a superfluous helper as drm_mm_insert_node is equivalent to
insert_node_in_range with a range of [0, U64_MAX].
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161222083641.2691-37-chris@chris-wilson.co.uk
Insulate users from changes to the internal hole tracking within
struct drm_mm_node by using an accessor for hole_follows.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
[danvet: resolve conflicts in i915_vma.c]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Using mm->color_adjust makes the eviction scanner much tricker since we
don't know the actual neighbours of the target hole until after it is
created (after scanning is complete). To work out whether we need to
evict the neighbours because they impact upon the hole, we have to then
check the hole afterwards - requiring an extra step in the user of the
eviction scanner when they apply color_adjust.
v2: Massage kerneldoc.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161222083641.2691-34-chris@chris-wilson.co.uk
Since we mandate a strict reverse-order of drm_mm_scan_remove_block()
after drm_mm_scan_add_block() we can further simplify the list
manipulations when generating the temporary scan-hole.
v2: Highlight the games being played with the lists to track the scan
holes without allocation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161222083641.2691-33-chris@chris-wilson.co.uk
For power-of-two alignments, we can avoid the 64bit divide and do a
simple bitwise add instead.
v2: s/alignment_mask/remainder_mask/
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161222083641.2691-32-chris@chris-wilson.co.uk
Compute the minimal required hole during scan and only evict those nodes
that overlap. This enables us to reduce the number of nodes we need to
evict to the bare minimum.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161222083641.2691-31-chris@chris-wilson.co.uk
Doing the check is trivial (low cost in comparison to overall eviction)
and helps simplify the code.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161222083641.2691-29-chris@chris-wilson.co.uk
The scan state occupies a large proportion of the struct drm_mm and is
rarely used and only contains temporary state. That makes it suitable to
moving to its struct and onto the stack of the callers.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
[danvet: Fix up etnaviv to compile, was missing a BUG_ON.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Since commit ea7b1dd448 ("drm: mm: track free areas implicitly"),
to test whether there are any nodes allocated within the range manager,
we merely have to ask whether the node_list is empty.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161222083641.2691-25-chris@chris-wilson.co.uk
The nodes must be removed in the *reverse* order. This is correct in the
overview, but backwards in the function description. Whilst here add
Intel's copyright statement and tweak some formatting.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161222083641.2691-23-chris@chris-wilson.co.uk
In places (e.g. i915.ko), the alignment is exported to userspace as u64
and there now exists hardware for which we can indeed utilize a u64
alignment. As such, we need to keep 64bit integers throughout when
handling alignment.
Testcase: igt/drm_mm/align64
Testcase: igt/gem_exec_alignment
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161222083641.2691-22-chris@chris-wilson.co.uk
Fairly commonly we want to inspect the node list on the struct drm_mm,
which is buried within an embedded node. Bring it to the surface with a
bit of syntatic sugar.
Note this was intended to be split from commit ad579002c8 ("drm: Add
drm_mm_for_each_node_safe()") before being applied, but my timing sucks.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161222083641.2691-3-chris@chris-wilson.co.uk
A complement to drm_mm_for_each_node(), wraps list_for_each_entry_safe()
for walking the list of nodes safe against removal.
Note from Joonas:
"Most of the diff is about __drm_mm_nodes(mm), which could be split into
own patch and keep the R-b's."
But I don't feel like insisting on the resend.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
[danvet: Add note.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161216074718.32500-4-chris@chris-wilson.co.uk
start is being used as both a macro parameter and as a member of struct
drm_mm_node (node->start). This causes a conflict as cpp then tries to
replace node->start with the passed in string for "start". Work just
fine so long as you also happened to using local variables called start!
Fixes: 522e85dd86 ("drm: Define drm_mm_for_each_node_in_range()")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Christian König <christian.koenig@amd.com>.
[danvet: Fixup kerneldoc.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161127111623.11124-1-chris@chris-wilson.co.uk
Some clients would like to iterate over every node within a certain
range. Make a nice little macro for them to hide the mixing of the
rbtree search and linear walk.
v2: Blurb
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161123141118.23876-1-chris@chris-wilson.co.uk
We can use the kernel's stack tracer and depot to record the allocation
site of every drm_mm user. Then on shutdown, as well as warning that
allocated nodes still reside with the drm_mm range manager, we can
display who allocated them to aide tracking down the leak.
v2: Move Kconfig around so it lies underneath the DRM options submenu.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161031090806.20073-1-chris@chris-wilson.co.uk
In addition to the last-in/first-out stack for accessing drm_mm nodes,
we occasionally and in the future often want to find a drm_mm_node by an
address. To do so efficiently we need to track the nodes in an interval
tree - lookups for a particular address will then be O(lg(N)), where N
is the number of nodes in the range manager as opposed to O(N).
Insertion however gains an extra O(lg(N)) step for all nodes
irrespective of whether the interval tree is in use. For future i915
patches, eliminating the linear walk is a significant improvement.
v2: Use generic interval-tree template for u64 and faster insertion.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: David Herrmann <dh.herrmann@gmail.com>
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1470236651-678-1-git-send-email-chris@chris-wilson.co.uk
To make the intention clearer, use list_next_entry instead of list_entry.
Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When backwards is 0, __drm_mm_for_each_hole is same as
drm_mm_for_each_hole. So I rewrite drm_mm_for_each_hole
by using __drm_mm_for_each_hole.
Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The current implementation is limited by the number of addresses that
fit into an unsigned long. This causes problems on 32-bit Tegra where
unsigned long is 32-bit but drm_mm is used to manage an IOVA space of
4 GiB. Given the 32-bit limitation, the range is limited to 4 GiB - 1
(or 4 GiB - 4 KiB for page granularity).
This commit changes the start and size of the range to be an unsigned
64-bit integer, thus allowing much larger ranges to be supported.
[airlied: fix i915 warnings and coloring callback]
Signed-off-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>
fixupo
Clients like i915 need to segregate cache domains within the GTT which
can lead to small amounts of fragmentation. By allocating the uncached
buffers from the bottom and the cacheable buffers from the top, we can
reduce the amount of wasted space and also optimize allocation of the
mappable portion of the GTT to only those buffers that require CPU
access through the GTT.
For other drivers, allocating small bos from one end and large ones
from the other helps improve the quality of fragmentation.
Based on drm_mm work by Chris Wilson.
v3: Changed to use a TTM placement flag
v2: Updated kerneldoc
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Christian König <deathsimple@vodafone.de>
Signed-off-by: Lauri Kasanen <cand@gmx.com>
Signed-off-by: David Airlie <airlied@redhat.com>
While at it do a tiny bit of interface cleanup and convert boolean
return values to bool. With this patch all exported functions and inline
helpers which are part of the drm_mm public interface are documented.
Also drop superflous extern function modifiers since most of drm_mm.h
doesn't use them - more consistent that way.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We used to pre-allocate drm_mm nodes and save them in a linked list for
later usage so we always have spare ones in atomic contexts. However, this
is really racy if multiple threads are in an atomic context at the same
time and we don't have enough spare nodes. Moreover, all remaining users
run in user-context and just lock drm_mm with a spinlock. So we can easily
preallocate the node, take the spinlock and insert the node.
This may have worked well with BKL in place, however, with today's
infrastructure it really doesn't make any sense. Besides, most users can
easily embed drm_mm_node into their objects so no allocation is needed at
all.
Thus, remove the old pre-alloc API and all the helpers that it provides.
Drivers have already been converted and we should not use the old API for
new code, anymore.
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Add a "best_match" flag similar to the drm_mm_search_*() helpers so we
can convert TTM to use them in follow up patches. We can also inline the
non-generic helpers and move them into the header to allow compile-time
optimizations.
To make calls to drm_mm_{search,insert}_node() more readable, this
converts the boolean argument to a flagset. There are pending patches that
add additional flags for top-down allocators and more.
v2:
- use flag parameter instead of boolean "best_match"
- convert *_search_free() helpers to also use flags argument
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We need BUG_ON(), spinlock_t and standard kernel data-types so include the
right headers.
Subject: [drm-intel:drm-intel-nightly 154/166] include/drm/drm_mm.h:67:2:
error: unknown type name 'spinlock_t'
Message-ID: <51f14693.g5HGdcuw2v3m8FOd%fengguang.wu@intel.com>
In case it didn't link to it correctly. Somehow this bug doesn't occur
here on my machine, hmm. But I think fixing drm_mm.h is better than
changing the include-order in drm_vma_manager.h, so this is what I
did.
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Highlights:
- follow-up refactoring after the shared dpll rework that landed in 3.11
- oddball prep cleanups from Ben for ppgtt
- encoder->get_config state tracking infrastructure from Jesse
- used by the experimental fastboot support from Jesse (disabled by
default)
- make the error state file official and add it to our sysfs interface
(Mika)
- drm_mm prep changes from Ben, prepares to embedd the drm_mm_node (which
will be used by the vma rework later on)
- interrupt handling rework, follow up cleanups to the VECS enabling, hpd
storm handling and fifo underrun reporting.
- Big pile of smaller cleanups, code improvements and related stuff.
* tag 'drm-intel-next-2013-07-12' of git://people.freedesktop.org/~danvet/drm-intel: (72 commits)
drm/i915: clear DPLL reg when disabling i9xx dplls
drm/i915: Fix up cpt pixel multiplier enable sequence
drm/i915: clean up vlv ->pre_pll_enable and pll enable sequence
drm/i915: move error state to own compilation unit
drm/i915: Don't attempt to read an unitialized stack value
drm/i915: Use for_each_pipe() when possible
drm/i915: don't enable PM_VEBOX_CS_ERROR_INTERRUPT
drm/i915: unify ring irq refcounts (again)
drm/i915: kill dev_priv->rps.lock
drm/i915: queue work outside spinlock in hsw_pm_irq_handler
drm/i915: streamline hsw_pm_irq_handler
drm/i915: irq handlers don't need interrupt-safe spinlocks
drm/i915: kill lpt pch transcoder->crtc mapping code for fifo underruns
drm/i915: improve GEN7_ERR_INT clearing for fifo underrun reporting
drm/i915: improve SERR_INT clearing for fifo underrun reporting
drm/i915: extract ibx_display_interrupt_update
drm/i915: remove unused members from drm_i915_private
drm/i915: don't frob mm.suspended when not using ums
drm/i915: Fix VLV DP RBR/HDMI/DAC PLL LPF coefficients
drm/i915: WARN if the bios reserved range is bigger than stolen size
...
Conflicts:
drivers/gpu/drm/i915/i915_gem.c
With the previous patch we no longer actually create a node, we simply
find the correct hole and occupy it. This very well could have been
squashed with the last patch, but since I already had David's review, I
figured it's easiest to keep it distinct.
Also update the users in i915. Conveniently this is the only user of the
interface.
CC: David Airlie <airlied@linux.ie>
CC: <dri-devel@lists.freedesktop.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Acked-by: David Airlie <airlied@linux.ie>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
For an upcoming patch where we introduce the i915 VMA, it's ideal to
have the drm_mm_node as part of the VMA struct (ie. it's pre-allocated).
Part of the conversion to VMAs is to kill off obj->gtt_space. Doing this
will break a bunch of code, but amongst them are 2 callers of
drm_mm_create_block(), both related to stolen memory.
It also allows us to embed the drm_mm_node into the object currently
which provides a nice transition over to the new code.
v2: Reordered to do before ripping out obj->gtt_offset.
Some minor cleanups made available because of reordering.
v3: s/continue/break on failed stolen node allocation (David)
Set obj->gtt_space on failed node allocation (David)
Only unref stolen (fix double free) on failed create_stolen (David)
Free node, and NULL it in failed create_stolen (David)
Add back accidentally removed newline (David)
CC: <dri-devel@lists.freedesktop.org>
Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Acked-by: David Airlie <airlied@linux.ie>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
drm/i915 is the only user of the color allocation handling and
switched to insert_node a while ago. So we can ditch this.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>
There is no reason to return "int" as this function never fails.
Furthermore, several drivers (ast, sis) already depend on this.
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The aim of this locking rework is that ioctls which a compositor should be
might call for every frame (set_cursor, page_flip, addfb, rmfb and
getfb/create_handle) should not be able to block on kms background
activities like output detection. And since each EDID read takes about
25ms (in the best case), that always means we'll drop at least one frame.
The solution is to add per-crtc locking for these ioctls, and restrict
background activities to only use the global lock. Change-the-world type
of events (modeset, dpms, ...) need to grab all locks.
Two tricky parts arose in the conversion:
- A lot of current code assumes that a kms fb object can't disappear while
holding the global lock, since the current code serializes fb
destruction with it. Hence proper lifetime management using the already
created refcounting for fbs need to be instantiated for all ioctls and
interfaces/users.
- The rmfb ioctl removes the to-be-deleted fb from all active users. But
unconditionally taking the global kms lock to do so introduces an
unacceptable potential stall point. And obviously changing the userspace
abi isn't on the table, either. Hence this conversion opportunistically
checks whether the rmfb ioctl holds the very last reference, which
guarantees that the fb isn't in active use on any crtc or plane (thanks
to the conversion to the new lifetime rules using proper refcounting).
Only if this is not the case will the code go through the slowpath and
grab all modeset locks. Sane compositors will never hit this path and so
avoid the stall, but userspace relying on these semantics will also not
break.
All these cases are exercised by the newly added subtests for the i-g-t
kms_flip, tested on a machine where a full detect cycle takes around 100
ms. It works, and no frames are dropped any more with these patches
applied. kms_flip also contains a special case to exercise the
above-describe rmfb slowpath.
* 'drm-kms-locking' of git://people.freedesktop.org/~danvet/drm-intel: (335 commits)
drm/fb_helper: check whether fbcon is bound
drm/doc: updates for new framebuffer lifetime rules
drm: don't hold crtc mutexes for connector ->detect callbacks
drm: only grab the crtc lock for pageflips
drm: optimize drm_framebuffer_remove
drm/vmwgfx: add proper framebuffer refcounting
drm/i915: dump refcount into framebuffer debugfs file
drm: refcounting for crtc framebuffers
drm: refcounting for sprite framebuffers
drm: fb refcounting for dirtyfb_ioctl
drm: don't take modeset locks in getfb ioctl
drm: push modeset_lock_all into ->fb_create driver callbacks
drm: nest modeset locks within fpriv->fbs_lock
drm: reference framebuffers which are on the idr
drm: revamp framebuffer cleanup interfaces
drm: create drm_framebuffer_lookup
drm: revamp locking around fb creation/destruction
drm: only take the crtc lock for ->cursor_move
drm: only take the crtc lock for ->cursor_set
drm: add per-crtc locks
...
Daniel writes:
- seqno wrap fixes and debug infrastructure from Mika Kuoppala and Chris
Wilson
- some leftover kill-agp on gen6+ patches from Ben
- hotplug improvements from Damien
- clear fb when allocated from stolen, avoids dirt on the fbcon (Chris)
- Stolen mem support from Chris Wilson, one of the many steps to get to
real fastboot support.
- Some DDI code cleanups from Paulo.
- Some refactorings around lvds and dp code.
- some random little bits&pieces
* tag 'drm-intel-next-2012-12-21' of git://people.freedesktop.org/~danvet/drm-intel: (93 commits)
drm/i915: Return the real error code from intel_set_mode()
drm/i915: Make GSM void
drm/i915: Move GSM mapping into dev_priv
drm/i915: Move even more gtt code to i915_gem_gtt
drm/i915: Make next_seqno debugs entry to use i915_gem_set_seqno
drm/i915: Introduce i915_gem_set_seqno()
drm/i915: Always clear semaphore mboxes on seqno wrap
drm/i915: Initialize hardware semaphore state on ring init
drm/i915: Introduce ring set_seqno
drm/i915: Missed conversion to gtt_pte_t
drm/i915: Bug on unsupported swizzled platforms
drm/i915: BUG() if fences are used on unsupported platform
drm/i915: fixup overlay stolen memory leak
drm/i915: clean up PIPECONF bpc #defines
drm/i915: add intel_dp_set_signal_levels
drm/i915: remove leftover display.update_wm assignment
drm/i915: check for the PCH when setting pch_transcoder
drm/i915: Clear the stolen fb before enabling
drm/i915: Access to snooped system memory through the GTT is incoherent
drm/i915: Remove stale comment about intel_dp_detect()
...
Conflicts:
drivers/gpu/drm/i915/intel_display.c
Avoid clobbering adjacent blocks if they happen to expire earlier and
amalgamate together to form the requested hole.
In passing this fixes a regression from
commit ea7b1dd448
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Fri Feb 18 17:59:12 2011 +0100
drm: mm: track free areas implicitly
which swaps the end address for size (with a potential overflow) and
effectively causes the eviction code to clobber almost all earlier
buffers above the evictee.
v2: Check the original hole not the adjusted as the coloring may confuse
us when later searching for the overlapping nodes. Also make sure that
we do apply the range restriction and color adjustment in the same
order for both scanning, searching and insertion.
v3: Send the version that was actually tested.
Note that this seems to be ducttape of decent quality ot paper over
some of our unbind related gpu hangs reported since 3.7. It is not
fully effective though, and certainly doesn't fix the underlying bug.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
[danvet: Added note plus bugzilla link and tested-by.]
Cc: stable@vger.kernel.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55984
Tested-by: Norbert Preining <preining@logic.at>
Acked-by: Dave Airlie <airlied@gmail.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>