Old V1 code would occasionally do so, which was never correct, but we never hit it because we didn't (practically) share storage rows before. Now that we explicitly do, we were occasionally de-checksum-ing storage rows on V1 pushes (which are extremely rare as-is). This change makes sure we don't do that, and makes sure we always set a proper digest and checksum on storage rows.
Addresses the slow query:
```
SELECT `candidates` . `id` FROM ( SELECT `t1` . `id` FROM `repositorybuild` AS `t1` WHERE ( ( ( `t1` . `phase` IN (...) ) OR ( `t1` . `started` < ? ) ) AND ( `t1` . `logs_archived` = ? ) ) LIMIT ? ) AS `candidates` ORDER BY `Rand` ( ) LIMIT ? OFFSET ?
```
While the cardinality on `logs_archived` will be low, it should also only be in the `false` state for a very small number of records, so it should make this query very, very fast.
Currently, we only have one (the shared empty layer), but this should make the blob lookups for repositories significantly faster, as we won't need to do the massive join.
- Implement logs model using Elasticsearch with tests
- Implement transition model using both elasticsearch and database model
- Add LOGS_MODEL configuration to choose which to use.
Co-authored-by: Sida Chen <sidchen@redhat.com>
Co-authored-by: Kenny Lee Sin Cheong <kenny.lee@redhat.com>
Schema version 1 manifests contain the tag name, and we have a check to ensure we don't point a tag at a manifest with the wrong name embedded. However, this also means that we cannot retarget to that manifest, which will break the UI once we get rid of legacy images.
This change means we can retarget to those manifests, and the OCI model does the work of rewriting the manifest when necessary.
Some customers are hitting this endpoint rapidly for repositories with many, many tags. This change drops the unnecessary joins, which should reduce database load somewhat.
- Adds the charset: utf-8 to all the manifest responses
- Makes sure we connect to MySQL in utf8mb4 mode, to ensure we can properly read and write 4-byte utf8 strings
- Adds tests for all of the above
We were occasionally trying to compute schema 2 version 1 signatures on the *unicode* representation, which was failing the signature check. This PR adds a new wrapper type called `Bytes`, which all manifests must take in, and which handles the unicodes vs encoded utf-8 stuff in a central location. This PR also adds a test for the manifest that was breaking in production.
This will allow customers to request their usage logs for a repository or an entire namespace, and we can export the logs in a manner that doesn't absolutely destroy the database, with every step along the way timed.