Commit graph

59 commits

Author SHA1 Message Date
Jimmy Zelinskie
64d0c5b675 data.queue: fix race condition
It's possible that multiple consumers will acquire a queue item if they
race on an expired item. To mitigate this, we check that the
processing_expires time hasn't been changed since we last read.
2016-07-14 15:34:22 -04:00
Jimmy Zelinskie
609f4fccd8 data.queue: simplify put method 2016-07-14 15:34:22 -04:00
Joseph Schorr
713ba3abaf Further updates to the Prometheus client code 2016-07-01 14:16:51 -04:00
Jake Moshenko
668a8edc50 Refactor prometheus integration
Move prometheus to SaaS and make it a plugin
Move static callers to use metrics_queue plugin
Change local-docker to support different quay clone dirnames
Change prom_aggregator to use logrus
2016-07-01 14:16:50 -04:00
Matt Jibson
3d9acf2fff Use prometheus as a metric backend
This entails writing a metric aggregation program since each worker has its
own memory, and thus own metrics because of python gunicorn. The python
client is a simple wrapper that makes web requests to it.
2016-07-01 14:16:50 -04:00
Jimmy Zelinskie
1f488acf12 data.queue: move name matching clause 2016-05-31 15:44:11 -04:00
Jimmy Zelinskie
26300d3c8e data.queue: lint 2016-05-27 14:51:19 -04:00
Jimmy Zelinskie
8a5aa65d74 data.queue: limiting before order by rand 2016-05-27 14:44:30 -04:00
Jimmy Zelinskie
44b56ae2cf queue: explicitly declare ordering requirement
This change defaults the ordering requirement of queue items to be off
and only enables it for the build manager. This should make the queries
for getting queueitems significantly faster for every other use case.
2016-05-27 14:44:30 -04:00
Joseph Schorr
f498e92d58 Implement against new Clair paginated notification system 2016-02-25 15:58:42 -05:00
Joseph Schorr
01723d5546 Catch other cases where the queue item has been removed
Fixes #1096
2015-12-22 15:58:51 -05:00
Matt Jibson
a994b367da Refactor queue locking to not use select for update
The test suggests this works.

fixes #622
2015-11-03 11:32:28 -05:00
Matt Jibson
87cc3289a0 Remove transaction from metric reporting 2015-10-06 01:28:43 -04:00
Matt Jibson
4da66c1219 Move the metric put outside the transaction 2015-09-21 13:37:49 -04:00
Jimmy Zelinskie
2ff77df946 Merge pull request #518 from jzelinskie/fixmysqlssl
move UseThenDisconnect into queueworker
2015-09-21 13:35:35 -04:00
Jimmy Zelinskie
7c82e0b5b3 move UseThenDisconnect into queueworker
This makes the tests pass while maintaining the same behavior.
2015-09-21 13:34:12 -04:00
Jimmy Zelinskie
0de17627d5 Merge pull request #517 from jzelinskie/fixmysqlssl
close connections after getting queue metrics
2015-09-21 12:28:23 -04:00
Jimmy Zelinskie
98d6262a7f close connections after getting queue metrics 2015-09-21 12:21:39 -04:00
Matt Jibson
bba1557437 Monitor queue adds and EC2 node starts
fixes #157
see #304
2015-09-18 16:21:16 -04:00
Matt Jibson
39dc4c7d8d Monitor various sizes for queues
see #304
2015-09-14 15:57:08 -04:00
Joseph Schorr
96d5bbb155 Fix exceptions raised by the diffs worker
Fixes #465
2015-09-10 14:12:16 -04:00
Matt Jibson
fc671f3dde Fix test_queue.py tests
This restores the reporter class as was before the metrics changes.
2015-08-17 17:22:46 -04:00
Matt Jibson
cfb6e884f2 Refactor metric collection
This change adds a generic queue onto which metrics can be pushed. A
separate module removes metrics from the queue and adds them to Cloudwatch.
Since these are now separate ideas, we can easily change the consumer from
Cloudwatch to anything else.

This change maintains near feature parity (the only change is there is now
just one queue instead of two - not a big deal).
2015-08-12 12:15:52 -04:00
Joseph Schorr
d480a204f5 Revert change to queue 2015-08-05 15:27:33 -04:00
Jake Moshenko
ed62339f89 Improve the performance of queue candidate queries. 2015-08-04 18:20:54 -04:00
Joseph Schorr
5f605b7cc8 Fix queue handling to remove the dependency from repobuild, and have a cancel method 2015-02-23 13:38:01 -05:00
Joseph Schorr
524705b88c Get dashboard working and upgrade bootstrap. Note: the bootstrap fixes will be coming in the followup CL 2015-02-17 19:15:54 -05:00
Joseph Schorr
f84d1bad45 Handle internal errors in a better fashion: If a build would be marked as internal error, only do so if there are retries remaining. Otherwise, we mark it as failed (since it won't be rebuilt anyway) 2015-02-12 16:19:44 -05:00
Jake Moshenko
ce7033489b Hopefully fix the deadlock in the queue. 2015-02-03 14:50:01 -05:00
Jake Moshenko
64750e31fc Add the ability to select for update within transactions to fix some write after read hazards. Fix a bug in extend_processing. 2015-01-30 16:32:13 -05:00
Joseph Schorr
3872d29de9 Add a transaction around the extend_processing call 2015-01-29 18:40:41 -05:00
Jake Moshenko
cc70225043 Generalize the ephemeral build managers so that any manager may manage a builder spawned by any other manager. 2014-12-31 11:33:56 -05:00
Joseph Schorr
dbac8c7e3d Fix build code:
- Fix issue with the queue_item in extend processing
  - Add the new compiled docker binary with the lxc volume fix
2014-12-04 17:49:39 +01:00
Joseph Schorr
72d613614d Merge branch 'bagger' 2014-12-01 12:48:59 -05:00
Jimmy Zelinskie
716d7a737b Strip whitespace from ALL the things. 2014-11-24 16:07:38 -05:00
Joseph Schorr
b8e873b00b Add support to the build system for tracking if/when the build manager crashes and make sure builds are restarted within a few minutes 2014-11-21 14:27:06 -05:00
Jake Moshenko
f4681f2c18 Merge branch 'master' into nomenclature
Conflicts:
	test/data/test.db
2014-11-17 17:59:59 -05:00
Joseph Schorr
c06f57a6e7 Make sure builders close the db handle when no work comes in and make the metrics transaction smaller in scope 2014-10-24 11:40:02 -04:00
Jake Moshenko
e8b3d1cc4a Phase 4 of the namespace to user migration: actually remove the column from the db and remove the dependence on serialized namespaces in the workers and queues 2014-10-01 14:23:46 -04:00
Joseph Schorr
f23038c6ee Update the worker code to better handle exceptions, fix the utcdate issue and make sure we send the proper retry. Also updates notification workers to send JobExceptions rather than returning true or false 2014-09-22 12:52:57 -04:00
Jake Moshenko
0b6552d6cc Fix the metrics so they are usable for scaling the workers down and up. Switch all datetimes which touch the database from now to utcnow. Fix the worker Dockerfile. 2014-05-23 14:16:26 -04:00
Jake Moshenko
f4c488f9b6 Fix the queue query for old jobs which won't run. 2014-05-22 13:50:06 -04:00
Jake Moshenko
f6726bd0a4 Merge branch 'ldapper'
Conflicts:
	Dockerfile
	app.py
	data/database.py
	endpoints/index.py
	test/data/test.db
2014-05-22 12:13:41 -04:00
Jake Moshenko
d14798de1d Add a queue capacity reporter plugin to the queue. Move the queue definitions to app. Add a cloudwatch reporter to the dockerfile build queue. 2014-05-21 19:50:37 -04:00
Jake Moshenko
8a3af93b8c Improve the builder response to being terminated or dying. 2014-05-06 18:46:19 -04:00
jakedt
4b8217d4ad Add config to allow for setting the queue names at runtime. Fix a bug in the data model. 2014-04-11 19:23:57 -04:00
jakedt
61a6db236f Finish the implementation of local userfiles. Strip charsets from mimetypes in the build worker. Add canonical name ordering to the build queue. Port all queues to the canonical naming version. 2014-04-11 18:34:47 -04:00
jakedt
13dea98499 Prepare the build worker to support multiple tags and subdirectories. Change the build database config to accept a job config object instead of breaking out the parameters into independent blocks. 2014-02-24 16:11:23 -05:00
jakedt
e7064f1191 Fix the tests and the one bug that it highlighted. 2014-02-16 18:59:24 -05:00
yackob03
e1a4efe35c Change the build queue identifier to disambiguate between old and new build jobs. 2014-02-12 12:51:30 -05:00