388 commits

Author SHA1 Message Date
Joseph Schorr
8bc55a5676 Make namespace deletion asynchronous
Instead of deleting a namespace synchronously as before, we now mark the namespace for deletion, disable it, and rename it. A worker then comes along and deletes the namespace in the background. This results in a *significantly* better user experience, as the namespace deletion operation now "completes" in under a second, where before it could take tens of minutes at worst.

Fixes https://jira.coreos.com/browse/QUAY-838
2018-02-27 13:12:51 -05:00
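
The mark, disable, rename, then background-delete flow described in this commit can be sketched roughly as follows. This is only an illustration: `Namespace`, `NamespaceStore`, the `deleted-` prefix, and the in-memory queue are hypothetical names, not the actual Quay data model.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Namespace:
    name: str
    enabled: bool = True
    marked_for_deletion: bool = False


@dataclass
class NamespaceStore:
    # Hypothetical in-memory stand-in for the database and the work queue.
    namespaces: Dict[str, Namespace] = field(default_factory=dict)
    deletion_queue: List[str] = field(default_factory=list)

    def mark_for_deletion(self, name: str) -> str:
        """Fast path, run in the request: mark, disable, rename, enqueue."""
        ns = self.namespaces.pop(name)
        ns.marked_for_deletion = True
        ns.enabled = False
        # Rename to a unique placeholder (assumed purpose: the original
        # name stops resolving right away).
        ns.name = "deleted-" + uuid.uuid4().hex
        self.namespaces[ns.name] = ns
        self.deletion_queue.append(ns.name)
        return ns.name

    def run_deletion_worker_once(self) -> int:
        """Slow path, run in the background: delete everything queued."""
        deleted = 0
        while self.deletion_queue:
            queued_name = self.deletion_queue.pop(0)
            # A real worker would do the expensive cleanup here.
            self.namespaces.pop(queued_name, None)
            deleted += 1
        return deleted
```

The request only pays for the cheap bookkeeping in `mark_for_deletion`; the long-running work happens later in `run_deletion_worker_once`.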
Joseph Schorr
d45161b120 Add a worker to automatically GC expired app specific tokens
Fixes https://jira.coreos.com/browse/QUAY-822
2018-02-12 14:56:01 -05:00
Joseph Schorr
bbdf9e074c Add metrics for tracking when instance key renewal succeeds and fails, as well as when instance key *lookup* fails 2018-02-02 11:14:42 -05:00
Joseph Schorr
05b4a7d457 Add worker to update ipresolver data files every few hours 2017-09-28 14:40:59 -04:00
Joseph Schorr
8a4d583f65 Disable default debug logs for workers
https://coreosdev.atlassian.net/browse/QUAY-771
2017-08-24 14:25:51 -04:00
Joseph Schorr
74e8bc296e Fix bug in service key rotation and fix associated flaky test
We were using `datetime.now` in both the rotation code and the test, but the model uses `utcnow`.
2017-07-28 14:20:11 -04:00
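
A small sketch of why mixing local time and UTC can break an expiration check like the one this fix touches. The four-hour offset, the one-hour rotation window, and the helper names are assumptions made for the illustration, not values from the Quay code.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical host running four hours behind UTC.
LOCAL_TZ = timezone(timedelta(hours=-4))


def local_naive(instant):
    """Naive local wall-clock time, analogous to `datetime.now()`."""
    return instant.astimezone(LOCAL_TZ).replace(tzinfo=None)


def utc_naive(instant):
    """Naive UTC time, analogous to `datetime.utcnow()`."""
    return instant.astimezone(timezone.utc).replace(tzinfo=None)


def needs_rotation(expires_at, now, window=timedelta(hours=1)):
    """A key is due for rotation once `now` is within `window` of expiry."""
    return now >= expires_at - window


instant = datetime(2017, 7, 28, 18, 0, tzinfo=timezone.utc)
# The model stores the expiration as naive UTC, 30 minutes after `instant`.
expires_at = utc_naive(instant) + timedelta(minutes=30)

# Comparing against UTC gives the intended answer ...
rotate_with_utc = needs_rotation(expires_at, utc_naive(instant))
# ... while comparing against local time is off by the UTC offset.
rotate_with_local = needs_rotation(expires_at, local_naive(instant))
```

On a host whose local time equals UTC the two results agree, which is one way such a bug can hide behind a flaky test.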
Joseph Schorr
e7d6e60d97 Update for merge and make additional interface improvements 2017-07-25 17:00:08 -04:00
Joseph Schorr
543cba352b Add end-to-end notification worker tests for all notification methods 2017-07-25 17:00:07 -04:00
Joseph Schorr
ce56031846 Move notifications into its own package 2017-07-25 17:00:06 -04:00
josephschorr
78652de3ee Merge pull request #2766 from coreos-inc/joseph.schorr/QUAY-634/buildlogsarchiver-data-interface
Change buildlogsarchiver to use a data model interface
2017-07-19 16:40:05 -04:00
josephschorr
9bd4cee029 Merge pull request #2765 from coreos-inc/joseph.schorr/QUAY-629/globalprom-data-interface
Switch globalpromstats worker to use a data interface
2017-07-19 16:39:36 -04:00
Joseph Schorr
89fad7568d Fix temp fix PR for notifications 2017-07-16 12:09:30 +03:00
Joseph Schorr
026a0d28df Temporary fix for empty event config JSON 2017-07-15 14:01:27 +03:00
Joseph Schorr
bf3e941d7f Fix notification system to use the new tuple correctly 2017-07-13 22:43:26 +03:00
josephschorr
fdb21aa5dc Merge pull request #2777 from coreos-inc/joseph.schorr/QUAY-618/notificationworker-data-interface
Change notificationworker to use data interface
2017-07-13 00:23:15 +03:00
josephschorr
2206c81a95 Merge pull request #2776 from coreos-inc/joseph.schorr/QUAY-652/servicekeyworker-data-interface
Change service key worker to use a data interface
2017-07-13 00:22:49 +03:00
Joseph Schorr
4ed73d247b yapf 2017-07-12 18:11:51 +03:00
Joseph Schorr
b6f1782642 Change notificationworker to use a data interface 2017-07-12 17:40:45 +03:00
Joseph Schorr
8ec198228c Change notificationworker test to pytest 2017-07-12 17:35:09 +03:00
Joseph Schorr
50c2f1fde8 Move notification worker test under its own package 2017-07-12 17:35:09 +03:00
Joseph Schorr
fbfd78532c Move notification worker to its own package 2017-07-12 17:35:09 +03:00
Joseph Schorr
932db23a5c Change servicekeyworker to use a data interface 2017-07-12 16:37:32 +03:00
Joseph Schorr
0afc222214 Add basic unit test for the servicekeyworker 2017-07-12 16:19:30 +03:00
Joseph Schorr
3b496e2759 Move serverkeyworker into its own package 2017-07-12 15:57:02 +03:00
Joseph Schorr
e2cf2d6f2b Move teamsyncworker into its own package 2017-07-12 15:53:01 +03:00
josephschorr
dc6c6b30fc Merge pull request #2768 from coreos-inc/joseph.schorr/QUAY-653/blobuploadcleanupworker-data-interface
Change blobuploadcleanupworker to use a data interface
2017-07-12 00:32:09 +03:00
josephschorr
96d1fd128d Merge pull request #2757 from coreos-inc/joseph.schorr/QUAY-606/logarchive-georep
Add support for QE customers to enable log rotation
2017-07-12 00:30:04 +03:00
Joseph Schorr
8ded8f573d yapf 2017-07-11 16:58:30 +03:00
Joseph Schorr
bdab367285 Change blobuploadcleanupworker to use a data interface 2017-07-11 16:58:09 +03:00
Joseph Schorr
b2053829f9 Add a basic test for blob upload cleanup 2017-07-11 16:35:10 +03:00
Joseph Schorr
b87415129f Move blobuploadcleanupworker into its own package 2017-07-11 15:38:10 +03:00
Joseph Schorr
c7f1944bd5 yapf 2017-07-11 15:33:48 +03:00
Joseph Schorr
8ba71f7a45 Change buildlogsarchiver to use a data interface 2017-07-11 15:33:28 +03:00
Joseph Schorr
b7a2a4390b Add a basic build logs archiver test 2017-07-11 15:12:34 +03:00
Joseph Schorr
22f088d90a Move buildlogsarchiver worker to its own package 2017-07-11 14:42:18 +03:00
Joseph Schorr
8e179cb865 Switch globalpromstats worker to use a data interface 2017-07-11 14:01:07 +03:00
Joseph Schorr
0629a13da2 Add very basic test for stats worker 2017-07-11 13:57:20 +03:00
Joseph Schorr
13922fd194 Remove unused imports 2017-07-11 13:52:35 +03:00
Joseph Schorr
265520d071 Move globalpromstats worker into its own package 2017-07-11 13:52:15 +03:00
EvB
6e2fad2b9c refactor(queueworker): remove unused function and import
Remove `_close_db_handle` method from `QueueWorker` class. Nowhere
calls this method, so it is safe to remove. This function was the
only place using the `db` imported from `data.model`, so we can
remove that import as well.

Testing: need to look into it
2017-07-10 10:49:39 -04:00
Joseph Schorr
fa21e42ffb Add default location for action log archiving
Prevents the logs from being written to the preferred storage, which would cause them to jump around between locations
2017-07-10 12:37:44 +03:00
josephschorr
a96555511b Merge pull request #2743 from coreos-inc/joseph.schorr/QUAY-663/gcworker-interface
Change GC worker to use new data interface style
2017-06-29 20:54:04 +03:00
Joseph Schorr
27ed3bedcc yapf 2017-06-29 09:43:04 +03:00
Joseph Schorr
138881dab8 yapf format 2017-06-29 09:40:39 +03:00
Joseph Schorr
76c9339453 Rename GC worker package to gc 2017-06-29 09:37:32 +03:00
Joseph Schorr
420a5e5a3a Change GC worker to use data interface 2017-06-28 15:13:11 +03:00
Joseph Schorr
38f1752a2d Move gcworker into its own package 2017-06-28 15:04:10 +03:00
Joseph Schorr
1ddb09ac11 Change security worker to use data interface 2017-06-28 14:50:52 +03:00
Joseph Schorr
ec81148d73 Add super basic security worker test 2017-06-28 14:03:57 +03:00
Joseph Schorr
7b72cf8b27 Small fix for georeplication and add better logs
Previously, if we attempted to georeplicate storage from the existing location and, somehow, that existing location did not exist, we'd still mark the new location as invalid. This is a major problem for storage engines that are not consistent. Now, we first try a back off strategy to find the image in the existing storage and, if the replication fails in any way, we log it.
2017-06-23 17:07:05 -04:00
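
One way to picture the back off lookup mentioned in this commit is below. It is a sketch only: the function name, the delay schedule, and the injectable `sleep` are assumptions, and the real worker may use a different strategy.

```python
import time
from typing import Callable, Sequence


def exists_with_backoff(
    exists: Callable[[], bool],
    delays: Sequence[float] = (0.5, 1.0, 2.0),
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Check `exists()` once, then retry after each delay in `delays`.

    Useful against eventually consistent storage, where an object that
    was just written may not be visible on the first lookup.
    """
    if exists():
        return True
    for delay in delays:
        sleep(delay)
        if exists():
            return True
    return False
```

Only after every attempt returns `False` would the caller treat the source copy as missing and log the failure.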
Antoine Legrand
f0dd2e348b Merge pull request #2551 from coreos-inc/structured-logs
Add log formatter class
2017-06-07 08:22:18 -07:00
Antoine Legrand
3c99928a27 Add log JSON formatter 2017-06-07 00:02:52 +02:00
Kenny Lee Sin Cheong
ad1a0e0840 logger.exception dumps a stack trace by default 2017-06-02 17:21:40 -04:00
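
For reference, a short demonstration of the standard-library behavior this commit relies on: `logger.exception` logs at ERROR level and attaches the active traceback without `exc_info=True` being passed. The logger name and message are arbitrary.

```python
import io
import logging


def log_failure(logger):
    try:
        raise ValueError("boom")
    except ValueError:
        # No exc_info=True needed: logger.exception adds the traceback itself.
        logger.exception("Operation failed")


# Capture the log output in memory so it can be inspected.
stream = io.StringIO()
demo_logger = logging.getLogger("exception_demo")
demo_logger.addHandler(logging.StreamHandler(stream))
demo_logger.propagate = False

log_failure(demo_logger)
output = stream.getvalue()
```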
Kenny Lee Sin Cheong
3302a96f88 Log the APIRequestFailure at ERROR level 2017-06-02 14:49:50 -04:00
Kenny Lee Sin Cheong
b5f8e7e24d Returning from the method instead of calling sleep
Simply returning from the method will give DEFAULT_INDEXING_INTERVAL seconds
before the next scan operation.
2017-06-02 12:28:17 -04:00
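
The idea of returning early instead of sleeping inside the operation might look like the sketch below. `APIRequestFailure` here is a local stand-in for the exception named in these commits, and `index_images` and `scan_batch` are hypothetical names.

```python
import logging

logger = logging.getLogger(__name__)


class APIRequestFailure(Exception):
    """Stand-in for the exception named in the commit messages."""


def index_images(scan_batch):
    """One scheduled pass. Returns True if the batch was scanned."""
    try:
        scan_batch()
    except APIRequestFailure:
        logger.exception("Security scanner unavailable; skipping this pass")
        # Returning hands control back to the scheduler, which already
        # waits the normal interval before calling this method again.
        return False
    return True
```

Because the scheduler owns the interval, the method does not need its own `sleep` call to wait before the next scan.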
Kenny Lee Sin Cheong
203c0b76e0 Raise an APIRequestFailure exception when security scanner is unavailable
Put the worker to sleep for the duration of the default indexing interval
when analyze_layer or get_layer_data raises an APIRequestFailure, i.e.
when the API request fails due to a connection error, timeout, or other
ambiguous error.
2017-05-24 11:04:44 -04:00
Charlton Austin
4dbd1e2eca fix(notification_worker): added in correct exception catching
before we were not catching the correct exception

[TESTING -> locally using docker]

Issue: https://www.pivotaltracker.com/story/show/144646649

- [ ] It works!
- [ ] Comments provide sufficient explanations for the next contributor
- [ ] Tests cover changes and corner cases
- [ ] Follows Quay syntax patterns and format
2017-05-22 11:32:28 -04:00
Charlton Austin
993f2a174c feat(full-stack): disable notifications after 3 failures
This stops notifications from firing over and over again if they are repeatedly failing.

[TESTING -> locally with docker compose, DATABASE MIGRATION -> there is a single migration]

Issue: https://www.pivotaltracker.com/story/show/b144646649n

- [ ] It works!
- [ ] Comments provide sufficient explanations for the next contributor
- [ ] Tests cover changes and corner cases
- [ ] Follows Quay syntax patterns and format
2017-05-19 16:58:46 -04:00
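
A minimal sketch of the failure-count rule from this commit. The threshold of 3 comes from the commit title; the `Notification` fields and the reset-on-success behavior are assumptions, not the actual schema or logic.

```python
from dataclasses import dataclass

MAX_FAILURES = 3  # threshold taken from the commit title


@dataclass
class Notification:
    # Hypothetical fields; the real model is defined by the migration.
    failures: int = 0
    enabled: bool = True


def record_result(notification, succeeded):
    """Update the failure count and disable after too many failures."""
    if succeeded:
        # Assumption: a success resets the counter.
        notification.failures = 0
        return
    notification.failures += 1
    if notification.failures >= MAX_FAILURES:
        notification.enabled = False
```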
Charlton Austin
b40ad361db style(workers): add in line
there should be two lines between functions and other code

Issue: https://www.pivotaltracker.com/story/show/b144646649n

- [ ] It works!
- [ ] Comments provide sufficient explanations for the next contributor
- [ ] Tests cover changes and corner cases
- [ ] Follows Quay syntax patterns and format
2017-05-19 16:58:22 -04:00
josephschorr
8b148bf1d4 Merge pull request #2576 from coreos-inc/full-db-tests-tox
Reenable full database testing locally and in concourse
2017-04-27 18:09:15 -04:00
Joseph Schorr
cc09e8738e Remove extra whitespace 2017-04-24 17:04:09 -04:00
Joseph Schorr
7debd44b54 Switch fixture imports to wildcard in prep for full db test fixes 2017-04-24 16:45:14 -04:00
Jake Moshenko
a159bd3e77 Resolve race condition between multiple log archivers 2017-04-24 13:41:08 -04:00
Joseph Schorr
80693d6b8c Fix NPE bug in RAC worker
We need to return `None`, not `0`, if there are no additional repositories to measure
2017-04-11 15:42:11 -04:00
Joseph Schorr
df3f47c79a Add a RepositorySearchScore table and calculation to the RAC worker
This will be used in a followup PR to order search results instead of the RAC join. Currently, the join with the RAC table in search results in a lookup of ~600K rows, which causes searching to take ~6s. This PR denormalizes the data we need, as well as allowing us to score based on a wider band (6 months vs the current 1 week).
2017-04-10 14:29:02 -04:00
Joseph Schorr
04225f2d25 Add feature flag for team syncing 2017-04-03 11:31:29 -04:00
Joseph Schorr
938730c076 Move sync team into its own module and add tests 2017-04-03 11:31:29 -04:00
Joseph Schorr
eeadeb9383 Initial interfaces and support for team syncing worker 2017-04-03 11:31:29 -04:00
Joseph Schorr
b05ebbf2c0 Have storage replication wait up to 20 minutes before trying again
Copying a file can be a long operation, so make this configurable and far above the default 5 minutes
2017-03-21 16:58:36 -04:00
Antoine Legrand
ec847ce613 Switch from expire to delete redis log_entries 2017-03-17 15:35:47 +01:00
Joseph Schorr
e25c989fef Add a cleanup worker for blob uploads 2017-03-16 13:36:59 -04:00
Jimmy Zelinskie
c6f6204630 workers.securityworker: small fixes
This change adjusts our batch size to coerce to integer after all
floating point math in order to get a more accurate end result. In
addition, we handle the scenario when there are no longer any images in
the database to be scanned when finding the min id.
2017-03-13 18:22:35 -04:00
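
The effect of coercing to an integer only after all floating point math can be seen in a toy calculation. The parameter names and numbers are made up for illustration and are not the worker's actual formula.

```python
def batch_size_truncate_early(id_range, fraction, scale):
    """Truncates mid-calculation, so the lost fraction is multiplied too."""
    return int(id_range * fraction) * scale


def batch_size_truncate_late(id_range, fraction, scale):
    """Does all floating point math first, then coerces once at the end."""
    return int(id_range * fraction * scale)


# 10 * 0.25 = 2.5; truncating here drops 0.5 before scaling by 2.
early = batch_size_truncate_early(10, 0.25, 2)
late = batch_size_truncate_late(10, 0.25, 2)
```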
Jimmy Zelinskie
a780136337 workers.securityworker: revert to image querying 2017-03-10 17:37:40 -05:00
Jimmy Zelinskie
40636d4103 find work based on tag IDs rather than image IDs 2017-03-06 17:09:57 -05:00
Jimmy Zelinskie
904b902295 workers.securityworker: find eligible tag images 2017-03-06 14:37:34 -05:00
Jimmy Zelinskie
b9ac2b7b3b workers.securityworker: simplify min id 2017-03-03 14:51:18 -05:00
Jimmy Zelinskie
4ed0cdda14 securityscanner: add a min image id option
This will enable us to force some instances of the securityworker to
scan only new images.
2017-03-03 13:55:25 -05:00
Jake Moshenko
de7a5c9959 Make the security scanning worker period configurable 2017-02-27 15:02:29 -05:00
Joseph Schorr
407341fe96 Remove images count (which is horribly slow in InnoDB) and add a max gauge 2017-02-23 17:37:28 -05:00
Jake Moshenko
27f5f14f90 Linter fixes 2017-02-22 11:45:38 -05:00
Jake Moshenko
add6b654ae Move the total image count stat back to the prom stat worker 2017-02-22 11:45:38 -05:00
Jake Moshenko
b03e03c389 Read the number of unscanned clair images from the block allocator 2017-02-21 19:13:51 -05:00
Joseph Schorr
5b3212ea0e Change security notification code to use the new stream diff reporters
This ensures that even if security scanner pagination sends Old and New layer IDs on different pages, they will properly be handled across the entire notification.

Fixes https://www.pivotaltracker.com/story/show/136133657
2016-12-20 12:50:19 -05:00
Joseph Schorr
405eca074c Security scanner flow changes and auto-retry
Changes the security scanner code to raise exceptions now for non-successful operations. One of the new exceptions raised is MissingParentLayerException, which, when raised, will cause the security worker to perform a full rescan of all parent images for the current layer, before trying once more to scan the current layer. This should allow the system to be "self-healing" in the case where the security scanner engine somehow loses or corrupts a parent layer.
2016-12-16 15:38:09 -05:00
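
The retry flow described in this commit could be sketched like this. `MissingParentLayerException` is redefined locally as a stand-in, and `scan_with_parent_rescan`, `analyze`, and the parent ordering are assumptions for the example.

```python
class MissingParentLayerException(Exception):
    """Stand-in for the exception named in the commit message."""


def scan_with_parent_rescan(layer, parents, analyze):
    """Scan `layer`; if a parent is missing, rescan all parents and retry once.

    `parents` is assumed to be ordered from the base layer upward.
    """
    try:
        analyze(layer)
    except MissingParentLayerException:
        for parent in parents:
            analyze(parent)
        # Single retry: a second failure propagates to the caller.
        analyze(layer)
```

Limiting the retry to one attempt keeps a permanently broken parent chain from looping forever.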
Joseph Schorr
15041ac5ed Add a fake security scanner class for easier testing
The FakeSecurityScanner mocks out all calls that Quay is expected to make to the security scanner API, and returns faked data that can be adjusted by the calling test case
2016-12-14 17:11:45 -05:00
Charlton Austin
9e25fde3a0 Fixing api usage. 2016-12-07 12:53:07 -05:00
Jimmy Zelinskie
3a7119d499 Merge pull request #2209 from coreos-inc/clair-notification-read
Clair notification read and queue fixes
2016-12-05 19:36:59 -05:00
Joseph Schorr
9f0ce7c634 Have the security worker remove failed notifications from Clair 2016-12-05 19:08:52 -05:00
Jake Moshenko
c263772703 Do not extend processing immediately after taking queue item. 2016-12-05 18:12:14 -05:00
Jake Moshenko
709edd7eb6 Reduce the update period on queue worker metrics. 2016-12-05 18:12:14 -05:00
Quentin Machu
b990a27d50 Increase limit in securitynotificationworker
With https://github.com/coreos/clair/pull/278 and https://github.com/coreos/clair/pull/279, performance of this API call has increased. It has been observed that querying a page of 100 or 1000 layers doesn't noticeably change the execution time. Therefore, making significantly fewer calls will reduce the overall processing time for each notification.
2016-12-04 13:39:34 +01:00
Charlton Austin
7b3d8e3977 Merge pull request #2183 from charltonaustin/metrics_for_unscanned_images
Adding in some metrics around clair sec scan.
2016-12-02 11:50:29 -05:00
Charlton Austin
edd9dcd7f6 Adding in some metrics around clair sec scan. 2016-12-01 16:50:02 -05:00
Joseph Schorr
e6ee538e15 Fix full database test script to not fail randomly
- Switches database schema creation to alembic, which solves the MySQL issue (and makes sure we test migrations as well)
- Adds a few time.sleep(1) to work around MySQL's second-precision issue when adding items to queues and then immediately retrieving them
- Disables the storage proxy tests when running against non-SQLite databases, as it causes failures with the multiple process and multiple transactions
- Changes initdb to support only populating the database, as well as fixing a few small items around the test data when working with non-SQLite data
2016-11-30 18:24:08 -05:00
Joseph Schorr
e29cb34336 Fix Set calls to gauges
Fixes #2150

The proper function is `Set` (not `set`); calling `set` was causing these gauges to not report to Prometheus
2016-11-21 15:27:17 -05:00
Joseph Schorr
5f99448adc Add a chunk cleanup queue for async GC of empty chunks
Instead of having the Swift storage engine try to delete the empty chunk(s) synchronously, we simply queue them and have a worker come along after 30s to delete the empty chunks. This has a few key benefits: it is async (doesn't slow down the push code), helps deal with Swift's eventual consistency (fewer retries necessary), and is generic for other storage engines if/when they need this as well
2016-11-15 15:07:41 -05:00
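
A delayed cleanup queue of the kind described in this commit can be approximated in a few lines. The 30 second default mirrors the commit message; `DelayedQueue` and its methods are hypothetical and stand in for the real work queue.

```python
import heapq


class DelayedQueue:
    """Items become available only after their delay has elapsed."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker so items are never compared directly

    def put(self, item, now, delay=30.0):
        heapq.heappush(self._heap, (now + delay, self._counter, item))
        self._counter += 1

    def pop_ready(self, now):
        """Return every item whose available-at time is <= `now`."""
        ready = []
        while self._heap and self._heap[0][0] <= now:
            ready.append(heapq.heappop(self._heap)[2])
        return ready
```

The push path only calls `put`; a worker polling `pop_ready` does the actual deletes once the storage has had time to settle.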
Jimmy Zelinskie
8b9f9478a4 pylint formatting 2016-10-28 17:12:46 -04:00
Jimmy Zelinskie
a30b358709 add staggered worker startup
Fixes #787
2016-10-28 17:12:39 -04:00
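
Staggered startup is often done by sleeping a random amount before the first run, so that many workers launched together do not all start work at the same moment. The sketch below assumes that approach; the commit does not say how the stagger is computed, and `staggered_start` and its parameters are hypothetical.

```python
import random
import time


def staggered_start(run, max_delay_seconds, rng=random, sleep=time.sleep):
    """Wait a random delay in [0, max_delay_seconds], then call `run`."""
    delay = rng.uniform(0, max_delay_seconds)
    sleep(delay)
    run()
    return delay
```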
Jimmy Zelinskie
2bd1e76267 workers.queuecleanup: s/week/day cleanup frequency 2016-10-20 13:47:07 -04:00
Jimmy Zelinskie
20ef43d5fb workers.queuecleanup: remove direct peewee usage 2016-10-20 13:46:00 -04:00