vbatts/quay - Git - Batts Cloud

Archived

Author	SHA1	Message	Date
Joseph Schorr	a8bc4bf697	Send the correct phase when setting the phase from job_complete	2016-09-30 21:26:45 +02:00
Joseph Schorr	f50bb8a1ce	Add missing call to set_phase when a build doesn't start This change fixes the build manager ephemeral executor to tell the overall build server to call set_phase when a build never starts. Before this change, we'd properly adjust the queue item, but not the repo build row or the logs, which is why users just saw "Preparing Build Node", with no indication that the node failed to start. Fixes #1904	2016-09-30 14:54:49 +02:00
josephschorr	ad4efba802	Merge pull request #1830 from coreos-inc/superuser-dashboard Add prometheus stats to enable better dashboarding	2016-09-26 17:19:22 +02:00
Joseph Schorr	1571b2867a	Add executor name to the build metric	2016-09-16 16:26:04 -04:00
Joseph Schorr	f9f60b9faf	Fix some issues around state in the build managers - Make sure to cleanup the job if the executor could not be started - Change the setup leeway to further ensure there isn't any crossover between the queue item timing out and the cleanup of the jobs - Make the lock used for marking jobs as internal error extremely long, but also based on the execution ID. This should ensure we don't get duplicates while allowing different executions to be handled properly. - Make sure to invoke the callback update for the queue before we run off to etcd; should reduce certain timeouts Hopefully Fixes #1836	2016-09-15 14:37:45 -04:00
Joseph Schorr	818ea38dac	Add repo-specific reporting of repository builds	2016-09-09 15:36:54 -04:00
Joseph Schorr	2fe896ba6a	Restore retries of jobs not started and add some leeway to the processing time	2016-08-30 13:57:26 -04:00
Joseph Schorr	713ba3abaf	Further updates to the Prometheus client code	2016-07-01 14:16:51 -04:00
Matt Jibson	3d9acf2fff	Use prometheus as a metric backend This entails writing a metric aggregation program since each worker has its own memory, and thus own metrics because of python gunicorn. The python client is a simple wrapper that makes web requests to it.	2016-07-01 14:16:50 -04:00
Jimmy Zelinskie	7d356c451b	buildman: fix misspell	2016-06-03 15:42:14 -04:00
Jimmy Zelinskie	44b56ae2cf	queue: explicitly declare ordering requirement This change defaults the ordering requirement of queue items to be off and only enables it for the build manager. This should make the queries for getting queueitems significantly faster for every other use case.	2016-05-27 14:44:30 -04:00
Matt Jibson	87cc3289a0	Remove transaction from metric reporting	2015-10-06 01:28:43 -04:00
Matt Jibson	cfb6e884f2	Refactor metric collection This change adds a generic queue onto which metrics can be pushed. A separate module removes metrics from the queue and adds them to Cloudwatch. Since these are now separate ideas, we can easily change the consumer from Cloudwatch to anything else. This change maintains near feature parity (the only change is there is now just one queue instead of two - not a big deal).	2015-08-12 12:15:52 -04:00
Joseph Schorr	52fa9aad5b	Fix etcd watching Etcd can miss events on watches if they are occurring fast enough, so if we can get an exception indicating that we've missed an index, we reset the state of our local tracking structures by re-reading the full list and starting a new watch at HEAD	2015-06-25 21:22:39 -04:00
Jake Moshenko	79f1181a63	Switch build-scheduled to an official build phase.	2015-06-10 16:19:51 -04:00
Jake Moshenko	884fedd229	Improve the log messages in the buildman.	2015-06-10 16:19:51 -04:00
Jake Moshenko	d31e25d5cd	Allow the individual build manager types to specify how long the queue should wait before retrying a job that fails to schedule.	2015-06-10 16:19:50 -04:00
Joseph Schorr	24ce0decd9	Requeue build jobs after the work check timeout + some additional padding. This ensures that if a build somehow gets wedged, other builds can continue to be picked up.	2015-06-09 20:43:48 -04:00
Jimmy Zelinskie	8589871f43	buildman: rm unused imports	2015-03-09 13:04:16 -04:00
Joseph Schorr	d973f9df45	Reenable metrics until we know they are the problem	2015-02-25 16:00:46 -05:00
Joseph Schorr	bdb84f1c20	Merge branch 'master' of github.com:coreos-inc/quay	2015-02-25 16:00:17 -05:00
Joseph Schorr	4551b3a957	Remove the boto timeout set (doesn't work anyway) and add some better logging to the scheduler	2015-02-25 16:00:14 -05:00
Jimmy Zelinskie	090a198afc	temporarily comment out metrics	2015-02-25 15:29:35 -05:00
Joseph Schorr	5dd78f76c7	Add additional logging, timeouts, and exception checks	2015-02-25 15:15:22 -05:00
Jimmy Zelinskie	346d6b933a	buildman: initialize queuemetrics asynchronously	2015-02-25 13:55:18 -05:00
Joseph Schorr	390f8df4ad	Make sure the build manager dies on an unhandled schedule exception	2015-02-25 12:19:21 -05:00
Joseph Schorr	afe7e14254	Add better exception handling and logging to the ephemeral build manager	2015-02-25 12:09:14 -05:00
Jimmy Zelinskie	47f8cb77c4	Merge pull request #11 from coreos-inc/nimbus CloudWatch for build job status	2015-02-18 17:17:28 -05:00
Jimmy Zelinskie	f53dea46b7	buildman: address PR #11 comments	2015-02-18 14:13:36 -05:00
Joseph Schorr	524705b88c	Get dashboard working and upgrade bootstrap. Note: the bootstrap fixes will be coming in the followup CL	2015-02-17 19:15:54 -05:00
Jimmy Zelinskie	5790d7d8cc	buildman: build_metrics call correct method	2015-02-17 17:03:12 -05:00
Jimmy Zelinskie	85edb651e2	buildserver: remove pylint comments	2015-02-17 15:32:25 -05:00
Jimmy Zelinskie	d70c95e42e	buildreporter: move reporting into server callback	2015-02-17 15:31:53 -05:00
Jimmy Zelinskie	b8d9ef0fe9	buildman: remove old create_task for queue metrics	2015-02-17 14:18:32 -05:00
Jimmy Zelinskie	935db5c766	buildman: clarify queue metrics from job state metrics	2015-02-17 12:23:08 -05:00
Joseph Schorr	fbdbc21eb1	Merge branch 'master' into quark	2015-02-13 16:24:53 -05:00
Jimmy Zelinskie	6a3d269574	buildman: update metrics task	2015-02-13 11:25:29 -05:00
Joseph Schorr	ae8bb5fc13	Add preparing build node status item and change the build status colors to be variations on a blue color	2015-02-12 16:38:43 -05:00
Joseph Schorr	a1938593a9	Better handling of retries on build errors	2015-02-03 16:29:47 -05:00
Joseph Schorr	0875d3dce1	Merge branch 'master' of https://github.com/coreos-inc/quay	2015-01-29 18:40:49 -05:00
Joseph Schorr	3872d29de9	Add a transaction around the extend_processing call	2015-01-29 18:40:41 -05:00
Jake Moshenko	fb533a1f4c	Merge branch 'master' of github.com:coreos-inc/quay	2015-01-29 18:40:24 -05:00
Jake Moshenko	63d23a04c0	Make the loop pause when we run out of builder capacity.	2015-01-29 18:40:01 -05:00
Joseph Schorr	838bfe23b1	Remove retries update in the extend processing call and make sure it is under a transaction	2015-01-29 18:33:17 -05:00
Joseph Schorr	ce3f8b438c	Fix pull credentials bug, fix job details parse bug and add some better logging	2015-01-29 18:01:42 -05:00
Joseph Schorr	7ee00b83cb	Switch to using a CloseForLongOperation around the sleep	2015-01-29 14:50:07 -05:00
Joseph Schorr	cf35da30bc	Make sure to not hold DB connections open in the new build manager	2015-01-29 14:40:24 -05:00
Jake Moshenko	2e86417329	Allow the buildman server to die if an uncaught exception terminates the scheduler process.	2015-01-29 10:56:57 -05:00
Joseph Schorr	d359c849cd	Add the build worker and job count information to the charts	2015-01-28 17:12:33 -05:00
Jake Moshenko	dd7664328c	Make the build manager ports configurable.	2015-01-05 15:09:03 -05:00

68 commits