Antoine Legrand
f0dd2e348b
Merge pull request #2551 from coreos-inc/structured-logs
...
Add log formatter class
2017-06-07 08:22:18 -07:00
Antoine Legrand
3c99928a27
Add log JSON formatter
2017-06-07 00:02:52 +02:00
Kenny Lee Sin Cheong
ad1a0e0840
logger.exception dumps a stack trace by default
2017-06-02 17:21:40 -04:00
Kenny Lee Sin Cheong
3302a96f88
Log the APIRequestFailure at ERROR level
2017-06-02 14:49:50 -04:00
Kenny Lee Sin Cheong
b5f8e7e24d
Returning from the method instead of calling sleep
...
Simply returning from the method will give DEFAULT_INDEXING_INTERVAL seconds
before the next scan operation.
2017-06-02 12:28:17 -04:00
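A minimal sketch of the behaviour described in b5f8e7e24d and 203c0b76e0, assuming a periodic worker whose scheduler reruns the operation every DEFAULT_INDEXING_INTERVAL seconds. The function `index_layers`, the stand-in `APIRequestFailure` class, and the `analyze_layer` callable are hypothetical illustrations, not code taken from the repository:

```python
import logging

logger = logging.getLogger(__name__)


class APIRequestFailure(Exception):
    """Stand-in for the exception raised when the scanner is unreachable."""


def index_layers(layers, analyze_layer):
    """Analyze layers in order; return how many were analyzed before stopping."""
    analyzed = 0
    for layer in layers:
        try:
            analyze_layer(layer)
        except APIRequestFailure:
            # Log at ERROR level and return instead of sleeping here; the
            # (assumed) scheduler waits the indexing interval before rerunning.
            logger.error("Security scanner unavailable; ending this scan early")
            return analyzed
        analyzed += 1
    return analyzed
```

Returning early keeps the back-off policy in one place (the scheduler) rather than duplicating a sleep inside the operation.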
Kenny Lee Sin Cheong
203c0b76e0
Raise an APIRequestFailure exception when security scanner is unavailable
...
Put the worker to sleep for the duration of the default indexing interval
when an APIRequestFailure occurs, that is, when an API request from
analyze_layer or get_layer_data fails due to a connection error, timeout,
or other ambiguous error.
2017-05-24 11:04:44 -04:00
Charlton Austin
4dbd1e2eca
fix(notification_worker): added in correct exception catching
...
Previously, we were not catching the correct exception.
[TESTING -> locally using docker]
Issue: https://www.pivotaltracker.com/story/show/144646649
- [ ] It works!
- [ ] Comments provide sufficient explanations for the next contributor
- [ ] Tests cover changes and corner cases
- [ ] Follows Quay syntax patterns and format
2017-05-22 11:32:28 -04:00
Charlton Austin
993f2a174c
feat(full-stack): disable notifications after 3 failures
...
This stops notifications from firing over and over again if they are repeatedly failing.
[TESTING -> locally with docker compose, DATABASE MIGRATION -> there is a single migration]
Issue: https://www.pivotaltracker.com/story/show/b144646649n
- [ ] It works!
- [ ] Comments provide sufficient explanations for the next contributor
- [ ] Tests cover changes and corner cases
- [ ] Follows Quay syntax patterns and format
2017-05-19 16:58:46 -04:00
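The failure threshold from 993f2a174c could be sketched roughly as follows. The `Notification` dataclass and `record_result` helper are hypothetical in-memory stand-ins (the real change includes a database migration), and resetting the count on success is an assumption, not something the commit message states:

```python
from dataclasses import dataclass

MAX_FAILURES = 3  # threshold taken from the commit subject


@dataclass
class Notification:
    """Hypothetical in-memory stand-in for a repository notification row."""
    failures: int = 0
    enabled: bool = True


def record_result(notification, succeeded):
    """Update the failure count and disable the notification at the threshold."""
    if succeeded:
        notification.failures = 0  # assumption: a success resets the count
        return notification
    notification.failures += 1
    if notification.failures >= MAX_FAILURES:
        notification.enabled = False
    return notification
```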
Charlton Austin
b40ad361db
style(workers): add in line
...
there should be two lines between functions and other code
Issue: https://www.pivotaltracker.com/story/show/b144646649n
- [ ] It works!
- [ ] Comments provide sufficient explanations for the next contributor
- [ ] Tests cover changes and corner cases
- [ ] Follows Quay syntax patterns and format
2017-05-19 16:58:22 -04:00
josephschorr
8b148bf1d4
Merge pull request #2576 from coreos-inc/full-db-tests-tox
...
Reenable full database testing locally and in concourse
2017-04-27 18:09:15 -04:00
Joseph Schorr
cc09e8738e
Remove extra whitespace
2017-04-24 17:04:09 -04:00
Joseph Schorr
7debd44b54
Switch fixture imports to wildcard in prep for full db test fixes
2017-04-24 16:45:14 -04:00
Jake Moshenko
a159bd3e77
Resolve race condition between multiple log archivers
2017-04-24 13:41:08 -04:00
Joseph Schorr
80693d6b8c
Fix NPE bug in RAC worker
...
We need to return `None`, not `0` if there are no additional repositories to measure
2017-04-11 15:42:11 -04:00
Joseph Schorr
df3f47c79a
Add a RepositorySearchScore table and calculation to the RAC worker
...
This will be used in a followup PR to order search results instead of the RAC join. Currently, the join with the RAC table in search results in a lookup of ~600K rows, which causes searching to take ~6s. This PR denormalizes the data we need, as well as allowing us to score based on a wider band (6 months vs the current 1 week).
2017-04-10 14:29:02 -04:00
Joseph Schorr
04225f2d25
Add feature flag for team syncing
2017-04-03 11:31:29 -04:00
Joseph Schorr
938730c076
Move sync team into its own module and add tests
2017-04-03 11:31:29 -04:00
Joseph Schorr
eeadeb9383
Initial interfaces and support for team syncing worker
2017-04-03 11:31:29 -04:00
Joseph Schorr
b05ebbf2c0
Have storage replication wait up to 20 minutes before trying again
...
Copying a file can be a long operation, so make this configurable and far above the default 5 minutes
2017-03-21 16:58:36 -04:00
Antoine Legrand
ec847ce613
Switch from expire to delete redis log_entries
2017-03-17 15:35:47 +01:00
Joseph Schorr
e25c989fef
Add a cleanup worker for blob uploads
2017-03-16 13:36:59 -04:00
Jimmy Zelinskie
c6f6204630
workers.securityworker: small fixes
...
This change adjusts our batch size to coerce to integer after all
floating point math in order to get a more accurate end result. In
addition, we handle the scenario when there are no longer any images in
the database to be scanned when finding the min id.
2017-03-13 18:22:35 -04:00
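To illustrate the two fixes in c6f6204630, here is a small sketch. The function names and parameters are hypothetical; it only shows why truncating to an integer before the remaining floating point math can give a smaller result than truncating at the end, and how an empty image table might be handled when looking up the minimum ID:

```python
def batch_size_early(id_range, workers, multiplier):
    # Truncating right after the division discards the fractional part
    # before the multiplication can use it.
    return int(id_range / workers) * multiplier


def batch_size_late(id_range, workers, multiplier):
    # Coerce to int only after all floating point math is done.
    return int(id_range / workers * multiplier)


def min_image_id(image_ids):
    # Return None rather than failing when no images are left to scan.
    return min(image_ids) if image_ids else None
```

With `id_range=10`, `workers=4`, and `multiplier=2`, the early version yields 4 while the late version yields 5.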
Jimmy Zelinskie
a780136337
workers.securityworker: revert to image querying
2017-03-10 17:37:40 -05:00
Jimmy Zelinskie
40636d4103
find work based on tag IDs rather than image IDs
2017-03-06 17:09:57 -05:00
Jimmy Zelinskie
904b902295
workers.securityworker: find eligible tag images
2017-03-06 14:37:34 -05:00
Jimmy Zelinskie
b9ac2b7b3b
workers.securityworker: simplify min id
2017-03-03 14:51:18 -05:00
Jimmy Zelinskie
4ed0cdda14
securityscanner: add a min image id option
...
This will enable us to force some instances of the securityworker to
scan only new images.
2017-03-03 13:55:25 -05:00
Jake Moshenko
de7a5c9959
Make the security scanning worker period configurable
2017-02-27 15:02:29 -05:00
Joseph Schorr
407341fe96
Remove images count (which is horribly slow in InnoDB) and add a max gauge
2017-02-23 17:37:28 -05:00
Jake Moshenko
27f5f14f90
Linter fixes
2017-02-22 11:45:38 -05:00
Jake Moshenko
add6b654ae
Move the total image count stat back to the prom stat worker
2017-02-22 11:45:38 -05:00
Jake Moshenko
b03e03c389
Read the number of unscanned clair images from the block allocator
2017-02-21 19:13:51 -05:00
Joseph Schorr
5b3212ea0e
Change security notification code to use the new stream diff reporters
...
This ensures that even if security scanner pagination sends Old and New layer IDs on different pages, they will properly be handled across the entire notification.
Fixes https://www.pivotaltracker.com/story/show/136133657
2016-12-20 12:50:19 -05:00
Joseph Schorr
405eca074c
Security scanner flow changes and auto-retry
...
Changes the security scanner code to raise exceptions now for non-successful operations. One of the new exceptions raised is MissingParentLayerException, which, when raised, will cause the security worker to perform a full rescan of all parent images for the current layer, before trying once more to scan the current layer. This should allow the system to be "self-healing" in the case where the security scanner engine somehow loses or corrupts a parent layer.
2016-12-16 15:38:09 -05:00
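The retry flow described in 405eca074c might look roughly like this. The `scan_with_retry` function and the `analyze` callable are hypothetical, and the exception class is a local stand-in for the one named in the commit message:

```python
class MissingParentLayerException(Exception):
    """Stand-in for the exception named in the commit message."""


def scan_with_retry(layer, parents, analyze):
    """Analyze `layer`; on a missing parent, rescan all parents and retry once."""
    try:
        analyze(layer)
    except MissingParentLayerException:
        # Assumption: `parents` is ordered from the base image upward.
        for parent in parents:
            analyze(parent)
        # Try once more; a second failure propagates to the caller.
        analyze(layer)
```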
Joseph Schorr
15041ac5ed
Add a fake security scanner class for easier testing
...
The FakeSecurityScanner mocks out all calls that Quay is expected to make to the security scanner API, and returns faked data that can be adjusted by the calling test case
2016-12-14 17:11:45 -05:00
Charlton Austin
9e25fde3a0
Fixing api usage.
2016-12-07 12:53:07 -05:00
Jimmy Zelinskie
3a7119d499
Merge pull request #2209 from coreos-inc/clair-notification-read
...
Clair notification read and queue fixes
2016-12-05 19:36:59 -05:00
Joseph Schorr
9f0ce7c634
Have the security worker remove failed notifications from Clair
2016-12-05 19:08:52 -05:00
Jake Moshenko
c263772703
Do not extend processing immediately after taking queue item.
2016-12-05 18:12:14 -05:00
Jake Moshenko
709edd7eb6
Reduce the update period on queue worker metrics.
2016-12-05 18:12:14 -05:00
Quentin Machu
b990a27d50
Increase limit in securitynotificationworker
...
With https://github.com/coreos/clair/pull/278 and https://github.com/coreos/clair/pull/279, the performance of this API call has improved. Querying a page of 100 or 1000 layers has been observed to make no noticeable difference to execution time, so making significantly fewer calls will reduce the overall processing time for each notification.
2016-12-04 13:39:34 +01:00
Charlton Austin
7b3d8e3977
Merge pull request #2183 from charltonaustin/metrics_for_unscanned_images
...
Adding in some metrics around clair sec scan.
2016-12-02 11:50:29 -05:00
Charlton Austin
edd9dcd7f6
Adding in some metrics around clair sec scan.
2016-12-01 16:50:02 -05:00
Joseph Schorr
e6ee538e15
Fix full database test script to not fail randomly
...
- Switches database schema creation to alembic, which solves the MySQL issue (and makes sure we test migrations as well)
- Adds a few time.sleep(1) to work around MySQL's second-precision issue when adding items to queues and then immediately retrieving them
- Disables the storage proxy tests when running against non-SQLite databases, as they cause failures with multiple processes and multiple transactions
- Changes initdb to support only populating the database, as well as fixing a few small items around the test data when working with non-SQLite data
2016-11-30 18:24:08 -05:00
Joseph Schorr
e29cb34336
Fix Set calls to gauges
...
Fixes #2150
The proper function is `Set` (not `set`), which was causing these gauges to not report to Prometheus
2016-11-21 15:27:17 -05:00
Joseph Schorr
5f99448adc
Add a chunk cleanup queue for async GC of empty chunks
...
Instead of having the Swift storage engine try to delete the empty chunk(s) synchronously, we simply queue them and have a worker come along after 30s to delete the empty chunks. This has a few key benefits: it is async (doesn't slow down the push code), helps deal with Swift's eventual consistency (fewer retries necessary), and is generic for other storage engines if/when they need this as well
2016-11-15 15:07:41 -05:00
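The delayed-cleanup idea in 5f99448adc can be sketched with a small in-memory queue. `DelayedQueue` is a hypothetical stand-in for the real work queue; only the 30-second delay comes from the commit message, and time is passed in explicitly to keep the example deterministic:

```python
import heapq
import itertools


class DelayedQueue:
    """Hypothetical in-memory queue whose items become available after a delay."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker for equal timestamps

    def put(self, item, now, delay=30):
        # Store the time at which the item becomes available to a worker.
        heapq.heappush(self._heap, (now + delay, next(self._counter), item))

    def get_ready(self, now):
        """Pop and return every item whose delay has elapsed at time `now`."""
        ready = []
        while self._heap and self._heap[0][0] <= now:
            ready.append(heapq.heappop(self._heap)[2])
        return ready
```

The push path only calls `put`, so it never waits on the storage engine; a separate worker polls `get_ready` and deletes whatever has become available.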
Jimmy Zelinskie
8b9f9478a4
pylint formatting
2016-10-28 17:12:46 -04:00
Jimmy Zelinskie
a30b358709
add staggered worker startup
...
Fixes #787
2016-10-28 17:12:39 -04:00
Jimmy Zelinskie
2bd1e76267
workers.queuecleanup: s/week/day cleanup frequency
2016-10-20 13:47:07 -04:00
Jimmy Zelinskie
20ef43d5fb
workers.queuecleanup: remove direct peewee usage
2016-10-20 13:46:00 -04:00
Joseph Schorr
30af8aef1a
Add a worker for reporting global stats to Prometheus
...
Fixes #1789
2016-09-12 16:19:19 -04:00
josephschorr
5c64646629
Merge pull request #1778 from coreos-inc/redlock
...
Fix locking via RedLock
2016-08-29 16:12:01 -04:00
Joseph Schorr
aa7c87d765
Fix locking via RedLock
...
Fixes #1777
2016-08-29 16:06:26 -04:00
Joseph Schorr
08a3b70b56
Extend processing before processing security notifications
...
Makes sure queue items don't expire during processing
Fixes #1776
2016-08-29 13:08:38 -04:00
Jake Moshenko
a113f548db
Accidentally forgot a line in the gc worker.
2016-08-02 10:44:53 -04:00
Jake Moshenko
05e2773fa7
Get rid of remaining slow query for garbage collection.
2016-08-01 18:22:38 -04:00
Joseph Schorr
b8d2570725
Don't raise an error on duplicate placements
...
This can happen if two pushes are racing on the same storage.
2016-07-19 16:44:05 -04:00
Joseph Schorr
5cd793331e
Fix storage replication for CAS and add tests
2016-07-12 13:46:06 -04:00
Joseph Schorr
3b994431eb
Auto expire the build status and logs in redis
2016-06-20 13:53:13 -04:00
Jake Moshenko
a1cf12e460
Add a sitemap.txt for popular public repos
...
and reference it from the robots.txt
2016-06-17 14:34:20 -04:00
Joseph Schorr
8887f09ba8
Use the instance service key for registry JWT signing
2016-06-07 11:58:10 -04:00
Joseph Schorr
dd0dd39bf0
Fix the queue cleanup worker to delete the items that have expired, not unexpired
2016-06-03 22:14:14 -04:00
Joseph Schorr
5746b42c69
Add a cleanup worker for the queue item table
...
Fixes #784
2016-06-02 15:00:44 -04:00
josephschorr
ec492bb683
Merge pull request #1323 from coreos-inc/secworkerreturn
...
Move security notification work into its own method to allow for retu…
2016-06-02 13:59:25 -04:00
Jake Moshenko
9221a515de
Use the registry API for security scanning
...
when the storage engine doesn't support direct download url
2016-05-04 18:04:06 -04:00
Joseph Schorr
73fa593d02
Various small fixes in prep for QE release
2016-05-04 15:20:27 -04:00
Jimmy Zelinskie
f842545b3e
rename config values to remove "Quay" (#1431)
2016-05-03 13:11:21 -04:00
Evan Cordell
489752a0b7
Only refresh current instance service key
2016-04-29 14:10:33 -04:00
Evan Cordell
a6f6a114c2
service key worker to refresh automatic keys
2016-04-29 14:10:33 -04:00
Jimmy Zelinskie
128b0cd38c
logrotateworker: archive every 24 hours
2016-04-18 13:02:30 -04:00
Jimmy Zelinskie
ef65822410
logrotateworker: perf optimizations
...
This removes our needless transaction, only calculates the cutoff date
once, removes the logs generator, and uses a tested optimal
MIN_LOGS_PER_ROTATION.
2016-04-15 16:51:17 -04:00
Jimmy Zelinskie
3d190b786f
userfiles: make handler optional
2016-04-15 13:56:07 -04:00
Jimmy Zelinskie
c7c52e6c74
logrotateworker: save to storage via userfiles
2016-04-14 13:29:29 -04:00
Joseph Schorr
d62ec22fc9
Move security notification work into its own method to allow for return values
...
Fixes #1302
Fixes #1304
2016-03-31 14:08:33 -04:00
Joseph Schorr
dc8f9713f8
Change logs worker to use a global lock in the inner loop and move storage out of the transaction
2016-03-24 14:09:48 -04:00
Joseph Schorr
aa5587c93c
Fixes and added tests for the security notification worker
...
Fixes #1301
- Ensures that the worker uses pagination properly
- Ensures that the worker handles failure as expected
- Moves marking the notification as read to after the worker processes it
- Increases the number of layers requested to 100
2016-03-18 20:28:06 -04:00
Quentin Machu
5b7d6b0638
Merge pull request #1275 from Quentin-M/min_id_once
...
Compute min_id only once during securityworker's lifetime
2016-03-04 14:02:47 -05:00
Quentin Machu
54153c9b80
Compute min_id only once during securityworker's lifetime
2016-03-04 14:02:28 -05:00
Jimmy Zelinskie
b5d904f373
Merge pull request #1218 from jzelinskie/logrotate5ever
...
vastly simplify log rotation
2016-03-04 13:48:21 -05:00
Quentin Machu
888f976e8d
Use a feature flag to toggle security notifications
2016-03-01 15:54:18 -05:00
Joseph Schorr
f498e92d58
Implement against new Clair paginated notification system
2016-02-25 15:58:42 -05:00
Joseph Schorr
c0374d71c9
Refactor the security worker and API calls and add a bunch of tests
2016-02-25 12:29:41 -05:00
Quentin Machu
e5da33578c
Adapt security worker for Clair v1.0 (except notifications)
2016-02-19 17:44:14 -05:00
Quentin Machu
f62a05f6d7
various securityworker fixes
2016-02-09 21:25:07 -05:00
Quentin Machu
1d2b31a581
Mark layers that Clair can't extract as failed
2016-02-09 18:24:35 -05:00
Jimmy Zelinskie
ee705fe7a9
vastly simplify log rotation
2016-02-09 18:20:14 -05:00
Quentin Machu
13c10ba7b1
Double the securityworker indexing interval
2016-02-09 14:49:10 -05:00
Joseph Schorr
ab166c4448
Delete the image diff feature
...
Fixes #1077
2015-12-23 13:08:01 -05:00
Jimmy Zelinskie
f439ad7804
Merge pull request #618 from jzelinskie/logsworker
...
add a log rotation worker
2015-12-16 17:25:50 -05:00
Jimmy Zelinskie
e1f955a3f6
add a log rotation worker
...
Fixes #609.
2015-12-16 17:22:28 -05:00
Joseph Schorr
c888a8b3be
Make GC timeout configurable
2015-12-16 15:45:02 -05:00
Jake Moshenko
2f626f2691
Unify the database connection lifecycle across all workers
2015-12-04 15:51:53 -05:00
Joseph Schorr
544fa40a5f
Add a base class for a global worker that locks via Redis
2015-11-24 16:18:45 -05:00
Silas Sewell
1162814734
securityworker: mark children we can't analyze
...
This allows us to differentiate between images that are queued and those we
can't analyze in constant time.
2015-11-19 11:22:15 -05:00
Quentin Machu
88e85cded0
Fix security worker (again?)
2015-11-18 19:45:09 -05:00
Quentin Machu
7e9faa6c54
Add missing import
2015-11-18 17:39:27 -05:00
Quentin Machu
605ed1fc77
Refactor security worker
2015-11-18 14:38:32 -05:00
Jake Moshenko
0459c3bc54
Merge remote-tracking branch 'upstream/master' into python-registry-v2
2015-11-16 14:22:54 -05:00
Joseph Schorr
6412e145dd
Fix key error
2015-11-13 13:16:33 -05:00
Jimmy Zelinskie
09ce33e0dc
fix case where query broke on empty list
2015-11-13 12:35:18 -05:00
Joseph Schorr
927a0b639c
Add check for empty locations list
2015-11-13 12:23:02 -05:00
Joseph Schorr
030c69d7d2
Further merge fixes
2015-11-12 22:00:28 -05:00
Joseph Schorr
7816b0c657
Merge master into vulnerability-tool
2015-11-12 21:52:47 -05:00
Joseph Schorr
25b8b7590f
Fix all the things!
2015-11-12 20:55:41 -05:00
Jimmy Zelinskie
37ce84f6af
tiny fixes to securityworker
2015-11-12 17:18:04 -05:00
Jimmy Zelinskie
f6a34c5d06
refactor securityworker
...
Fixes #772.
2015-11-12 16:03:10 -05:00
Jake Moshenko
ab340e20ea
Merge remote-tracking branch 'upstream/master' into python-registry-v2
2015-11-11 16:41:40 -05:00
Joseph Schorr
ca7d736db2
Only send vulnerability events if the minimum priority is greater than or equal to that specified
...
Fixes #770
2015-11-10 16:05:55 -05:00
Jimmy Zelinskie
8e2868737b
rename secscan_endpoint and move db close to API
2015-11-10 15:22:31 -05:00
Jimmy Zelinskie
da31714fb5
specify securityworker skip message
2015-11-10 15:22:30 -05:00
Jimmy Zelinskie
52962b3732
close db connections when calling out to clair
2015-11-10 15:22:30 -05:00
Jimmy Zelinskie
d651ea4b48
initial security notification worker
2015-11-10 15:22:30 -05:00
Quentin Machu
16c364a90c
Rename secscan_endpoint where required, fix index and indentation
2015-11-09 15:18:42 -05:00
Joseph Schorr
2d2662f53f
Fix deleting repos and images under MySQL
...
MySQL doesn't handle constraints at the end of transactions, so deleting images currently fails. This removes the constraint and just leaves parent_id as an int
2015-11-09 14:42:05 -05:00
Quentin Machu
7dbe15e339
Remove checksum from Clair's worker and adjust line length
2015-11-09 14:31:24 -05:00
Joseph Schorr
b408cfd2cc
Ready for demo
2015-11-09 12:51:05 -05:00
Joseph Schorr
7fa4fe08e7
Fix worker
2015-11-09 12:50:39 -05:00
Joseph Schorr
407eaae137
WIP: Towards sec demo
2015-11-09 12:50:39 -05:00
Quentin Machu
37118423a5
Add support for Quay's vulnerability tool
2015-11-09 12:49:19 -05:00
Jake Moshenko
c2fcf8bead
Merge remote-tracking branch 'upstream/phase4-11-07-2015' into python-registry-v2
2015-11-06 18:18:29 -05:00
Quentin Machu
af4511455f
Remove .distinct() from these queries
2015-11-06 15:22:18 -05:00
Quentin Machu
3677947521
Add support for Quay's vulnerability tool
2015-11-06 15:22:18 -05:00
Quentin Machu
1b41200e49
Fix PostgreSQL compatibility and parent omission in securityworker
2015-11-06 15:22:18 -05:00
Quentin Machu
f59e35cc81
Add support for Quay's vulnerability tool
2015-11-06 15:22:18 -05:00
Jake Moshenko
9da64f3aba
Stop writing to deprecated columns for image data.
2015-10-24 14:45:15 -04:00
Jake Moshenko
e7a6176594
Merge remote-tracking branch 'upstream/v2-phase4' into python-registry-v2
2015-10-22 16:59:28 -04:00
Jake Moshenko
ce94931540
Stop writing to deprecated columns for image data.
2015-10-22 12:14:39 -04:00
josephschorr
8e7b20a0d7
Merge pull request #675 from coreos-inc/distinctgc
...
Reduce GC work time and make sure to use distinct query
2015-10-21 12:01:26 -04:00
Silas Sewell
fd96f7c1e3
Merge pull request #667 from coreos-inc/error-georeplication-local-storage
...
workers.storagereplication: error on LocalStorage
2015-10-20 20:29:24 -04:00
Silas Sewell
03f5fe6143
workers.storagereplication: error on LocalStorage
...
Ensure we don't start when LocalStorage is in the config.
Fixes #502
2015-10-20 19:04:31 -04:00
Joseph Schorr
4e5c8a9281
Reduce GC work time and make sure to use distinct query
2015-10-20 18:13:29 -04:00
Joseph Schorr
5941f3937c
Enable async GC for all
...
Fixes #569
2015-10-19 14:22:41 -04:00
Jimmy Zelinskie
7c82e0b5b3
move UseThenDisconnect into queueworker
...
This makes the tests pass while maintaining the same behavior.
2015-09-21 13:34:12 -04:00
Joseph Schorr
96d5bbb155
Fix exceptions raised by the diffs worker
...
Fixes #465
2015-09-10 14:12:16 -04:00
Joseph Schorr
3ee4147117
Switch the build logs archiver to a more performant query
...
Fixes #459
2015-09-09 13:59:45 -04:00
Joseph Schorr
724b1607d7
Add automatic storage replication
...
Adds a worker to automatically replicate data between storages and update the database accordingly
2015-09-01 14:53:32 -04:00
Matt Jibson
7407bca728
Correct fix for notification get repo
...
The fix in #366 was wrong. Not sure how I tested it and it worked.
2015-08-17 17:54:33 -04:00
Matt Jibson
132bc4491b
Fix notification worker's use of get repo notification
2015-08-14 15:42:31 -04:00
Joseph Schorr
c3d7ef2ec4
Only start workers once setup is complete on the registry
...
Fixes #326
2015-08-07 13:44:14 -04:00
Joseph Schorr
14f511bb5a
Make sure to set a default for Raven client
...
Fixes #327
2015-08-07 13:03:38 -04:00
Joseph Schorr
572d6ba53c
Fix broken tests
2015-07-29 14:21:29 -04:00
Joseph Schorr
ac0cca2d90
Switch to a unified worker system
...
- Handles logging
- Handles reporting to Sentry
- Removes old code around serving a web endpoint (unused now)
2015-07-28 17:26:12 -04:00
Joseph Schorr
70de107268
Make GC of repositories fully async for whitelisted namespaces
...
This change adds a worker to conduct GC on repositories with garbage every 10s.
Fixes #144
2015-07-28 15:30:04 -04:00
Jake Moshenko
3efaa255e8
Accidental refactor, split out legacy.py into separate submodules and update all call sites.
2015-07-17 11:56:15 -04:00
Jake Moshenko
acbcc2e206
Start of a v2 API.
2015-07-17 11:50:41 -04:00
Joseph Schorr
6eaf1dbb3f
Make the repositoryactioncount worker disconnect from the DB between runs
2015-04-22 17:11:08 -04:00
Joseph Schorr
657ba576a8
Make sure to import app so that the DB proxy gets properly initialized
2015-04-13 14:25:09 -04:00
Joseph Schorr
3f1e8f3c27
Add a RepositoryActionCount table so we can use it (instead of LogEntry) when scoring repo search results
2015-04-13 13:31:07 -04:00
Joseph Schorr
3872d29de9
Add a transaction around the extend_processing call
2015-01-29 18:40:41 -05:00
Jake Moshenko
11562a74de
Remove the old builder infrastructure.
2015-01-29 11:03:23 -05:00