We were occasionally trying to compute schema 2 version 1 signatures on the *unicode* representation, which was failing the signature check. This PR adds a new wrapper type called `Bytes`, which all manifests must take in, and which handles the unicodes vs encoded utf-8 stuff in a central location. This PR also adds a test for the manifest that was breaking in production.
This will allow customers to request their usage logs for a repository or an entire namespace, and we can export the logs in a manner that doesn't absolutely destroy the database, with every step along the way timed.
TagManifests can (apparently, in very rare scenarios) share manifests with the exact same digests, so we need to support that case in the backfill worker. We also need to remove a unique constraint on the manifest column in the mapping table to support this case.
After discussion, we decided the best solution for the missing content checksum problem was to lookup the proper blobs in the repository and, if not present, mark the manifest as broken, as this would reflect the actual issue the user faces if they pull the repository tag today via V2
This will be used in a followup PR to order search results instead of the RAC join. Currently, the join with the RAC table in search results in a lookup of ~600K rows, which causes searching to take ~6s. This PR denormalizes the data we need, as well as allowing us to score based on a wider band (6 months vs the current 1 week).