Before this change, the queue code would check that none of the fields on the item to be claimed had changed between the time when the item was selected and the item is claimed. While this is a safe approach, it also causes quite a bit of lock contention in MySQL, because InnoDB will take a lock on *any* rows examined by the `where` clause of the `update`, even if they will ultimately thrown out due to other clauses (See: http://dev.mysql.com/doc/refman/5.7/en/innodb-locks-set.html: "A ..., an UPDATE, ... generally set record locks on every index record that is scanned in the processing of the SQL statement. It does not matter whether there are WHERE conditions in the statement that would exclude the row. InnoDB does not remember the exact WHERE condition, but only knows which index ranges were scanned").
As a result, we want to minimize the number of fields accessed in the `where` clause on an update to the QueueItem row. To do so, we introduce a new `state_id` column, which is updated on *every change* to the QueueItem rows with a unique, random value. We can then have the queue item claiming code simply check that the `state_id` column has not changed between the retrieval and claiming steps. This minimizes the number of columns being checked to two (`id` and `state_id`), and thus, should significantly reduce lock contention. Note that we can not (yet) reduce to just a single `state_id` column (which should work in theory), because we need to maintain backwards compatibility with existing items in the QueueItem table, which will be given empty `state_id` values when the migration in this change runs.
Also adds a number of tests for other queue operations that we want to make sure operate correctly following this change.
[Delivers #133632501]
If the feature is enabled and recaptcha keys are given in config, then a recaptcha box is displayed in the UI when creating a user and a recaptcha response code *must* be sent with the create API call for it to succeed.
Add support to GC to invoke a callback with the image+storages removed. Only images whose storage was also removed will be sent to the callback. This will be used by security scanning for its own GC in the followup change.
This ensures that even if security scanner pagination sends Old and New layer IDs on different pages, they will properly be handled across the entire notification.
Fixes https://www.pivotaltracker.com/story/show/136133657
Changes the security scanner code to raise exceptions now for non-successful operations. One of the new exceptions raised is MissingParentLayerException, which, when raised, will cause the security worker to perform a full rescan of all parent images for the current layer, before trying once more to scan the current layer. This should allow the system to be "self-healing" in the case where the security scanner engine somehow loses or corrupts a parent layer.
Currently, if a user tries to confirm an invite sent to them on an account with a mismatching email address, we simply redirect to the org (where they get a 403). This change ensures they get the proper error response message, and restyles the error page to be nicer.
Fixes#2227
Fixes https://www.pivotaltracker.com/story/show/136088507
The FakeSecurityScanner mocks out all calls that Quay is expected to make to the security scanner API, and returns faked data that can be adjusted by the calling test case
Following this change, anytime a layer is indexed by the security scanner, we only send notifications out if the layer previously had a security_indexed_engine value of `-1`, thus ensuring it has *never* been indexed previously. This will allow us to change to version of the security scanner upwards, and have all the images be re-indexed, without firing off notifications in a spammy manner.
Adds the missing field on the query_user calls, updates the external auth tests to ensure it is returned properly, and adds new end-to-end tests which call the external auth engines via the *API*, to ensure this doesn't break again
- Switches database schema creation to alembic, which solves the MySQL issue (and makes sure we test migrations as well)
- Adds a few time.sleep(1) to work around MySQL's second-precision issue when adding items to queues and then immediately retrieving them
- Disables the storage proxy tests when running against non-SQLite databases, as it causes failures with the multiple process and multiple transactions
- Changes initdb to support only populating the database, as well as fixing a few small items around the test data when working with non-SQLite data
Note that the test suite doesn't fully verify that each validation succeeds; rather, it ensures that the proper system (storage, security scanning, etc) is called with the configuration and returns at all (usually with an expected error). This should prevent us from forgetting to update these code paths when we change config-based systems. Longer term, we might want to have these tests stand up fake/mock versions of the endpoint services as well, for end-to-end testing.
Instead of having the Swift storage engine try to delete the empty chunk(s) synchronously, we simply queue them and have a worker come along after 30s to delete the empty chunks. This has a few key benefits: it is async (doesn't slow down the push code), helps deal with Swift's eventual consistency (less retries necessary) and is generic for other storage engines if/when they need this as well
When a user now logs in for the first time for any external auth (LDAP, JWT, Keystone, Github, Google, Dex), they will be presented with a confirmation screen that affords them the opportunity to change their Quay-assigned username.
Addresses most of the user issues around #74
Before this change, external auth such as Keystone would fail if a user without an email address tried to login, even if the email feature was disabled.
Amazon S3 does not allow for chunk sizes larger than 5 GB; we currently don't handle that case at all, which is why large uploads are failing. This change ensures that if a storage engine specifies a *maximum* chunk size, we write multiple chunks no larger than that size.
- Fixes a bug which allows for underscores at the beginning of namespaces: Fixes#1849
- Allows dots and dashes for newer Docker clients: Fixes#1188
- Has the UI display better messaging associated with namespace entry
Before this change, if you ended up writing a middle layer whose parent is not in the database, the manifest would fail to rewrite. We now just lookup the parent image in the manifest given to us, ignoring whether it is in the database or not (as it doesn't actually matter if not present; it'll be created if necessary).
This change fixes the build manager ephemeral executor to tell the overall build server to call set_phase when a build never starts. Before this change, we'd properly adjust the queue item, but not the repo build row or the logs, which is why users just saw "Preparing Build Node", with no indicating the node failed to start.
Fixes#1904
Gitlab sends multiple commits in the order reversed from Github. As this only broke recently, I suspect that they may have changed the ordering. This change makes the code order-agnostic to hopefully remove the problem going forward.
Fixes#1900
Gitlab will occasionally send trigger payloads with an empty commit list (and a null checkout_ha) for branches that have been deleted. Properly handle that case.
Also adds some tests to registry tests for V1 stuff.
Note: All *registry* tests currently pass, but as verbs are not yet converted, the verb tests in registry_tests.py currently fail.
Until now, once the heartbeat has expired, we would issue a TTL that is negative, which causes etcd to either raise an exception or simply ignore the expiration (depending on the version of etcd). This change ensures that once the key is expired, it is removed immediately via a set of a TTL of 0. Also adds tests for this case and the normal expiration case.
- Fixes various bugs introduced in the most recent build system commit
- Refactors state management in the build manager to be cleaner and more contained
- Adds back in the mock-based tests, fixed to not use threads and adjusted for the refactoring
- Adds some more simplified unit tests around non-etch related flows
MySQL's date time's appear to have a 1 second threshold, so we need to make sure the queue items added for the tests are available as soon as they are added. Before this change, the available_after was set to `datetime.utcnow()`, and, if the `get` was called within 1 second, then its check would fail.