Makes the lookup query underneath the transaction smaller if there are a lot of images referenced directly by tag. We still must do the direct referenced check within the transaction, but this should reduce the scope of the search space a bit.
This prevents us from creating a massive join when there are a large number of tags in the repository, which can result in locking the entire DB for long periods of time. Instead of the join, we just iteratively lookup any images found to be directly referenced by a tag or found as a parent of another image, both of which should be indexed lookups. Once done, we only remove those images and then iterate until the working set stops changing.
We remove the directly referenced images from the join across ancestors, as they will be covered by the first part of the union clause. For some large repositories, this will result in a significantly reduced set of images that have to be joined NxM.
* Add flag to enable trust per repo
* Add api for enabling/disabling trust
* Add new LogEntryKind for changing repo trust settings
Also add tests for repo trust api
* Add `set_trust` method to repository
* Expose new logkind to UI
* Fix registry tests
* Rebase migrations and regen test.db
* Raise downstreamissue if trust metadata can't be removed
* Refactor change_repo_trust
* Add show_if to change_repo_trust endpoint
This will be used in a followup PR to order search results instead of the RAC join. Currently, the join with the RAC table in search results in a lookup of ~600K rows, which causes searching to take ~6s. This PR denormalizes the data we need, as well as allowing us to score based on a wider band (6 months vs the current 1 week).
Previous to this change, repositories were looked up unfiltered in six different queries, and then filtered using the permissions model, which issued a query per repository found, making search incredibly slow. Instead, we now lookup a chunk of repositories unfiltered and then filter them via a single query to the database. By layering the filtering on top of the lookup, each as queries, we can minimize the number of queries necessary, without (at the same time) using a super expensive join.
Other changes:
- Remove the 5 page pre-lookup on V1 search and simply return that there is one more page available, until there isn't. While technically not correct, it is much more efficient, and no one should be using pagination with V1 search anyway.
- Remove the lookup for repos without entries in the RAC table. Instead, we now add a new RAC entry when the repository is created for *the day before*, with count 0, so that it is immediately searchable
- Remove lookup of results with a matching namespace; these aren't very relevant anyway, and it overly complicates sorting
By calling `visibility` instead of `visibility_id`, peewee was issuing a SQL Select statement for the repository, which removes the benefit of the optimization
Add support to GC to invoke a callback with the image+storages removed. Only images whose storage was also removed will be sent to the callback. This will be used by security scanning for its own GC in the followup change.