57 commits

Author SHA1 Message Date
Matt Jibson
87cc3289a0 Remove transaction from metric reporting 2015-10-06 01:28:43 -04:00
Matt Jibson
cfb6e884f2 Refactor metric collection
This change adds a generic queue onto which metrics can be pushed. A
separate module removes metrics from the queue and adds them to Cloudwatch.
Since these are now separate ideas, we can easily change the consumer from
Cloudwatch to anything else.

This change maintains near feature parity (the only change is there is now
just one queue instead of two - not a big deal).
2015-08-12 12:15:52 -04:00
Joseph Schorr
52fa9aad5b Fix etcd watching
Etcd can miss events on watches if they occur fast enough. If we get an exception indicating that we've missed an index, we reset the state of our local tracking structures by re-reading the *full* list and starting a new watch at HEAD.
2015-06-25 21:22:39 -04:00
Jake Moshenko
79f1181a63 Switch build-scheduled to an official build phase. 2015-06-10 16:19:51 -04:00
Jake Moshenko
884fedd229 Improve the log messages in the buildman. 2015-06-10 16:19:51 -04:00
Jake Moshenko
d31e25d5cd Allow the individual build manager types to specify how long the queue should wait before retrying a job that fails to schedule. 2015-06-10 16:19:50 -04:00
Joseph Schorr
24ce0decd9 Requeue build jobs after the work check timeout + some additional padding. This ensures that if a build somehow gets wedged, other builds can continue to be picked up. 2015-06-09 20:43:48 -04:00
Jimmy Zelinskie
8589871f43 buildman: rm unused imports 2015-03-09 13:04:16 -04:00
Joseph Schorr
d973f9df45 Reenable metrics until we know they are the problem 2015-02-25 16:00:46 -05:00
Joseph Schorr
bdb84f1c20 Merge branch 'master' of github.com:coreos-inc/quay 2015-02-25 16:00:17 -05:00
Joseph Schorr
4551b3a957 Remove the boto timeout set (doesn't work anyway) and add some better logging to the scheduler 2015-02-25 16:00:14 -05:00
Jimmy Zelinskie
090a198afc temporarily comment out metrics 2015-02-25 15:29:35 -05:00
Joseph Schorr
5dd78f76c7 Add additional logging, timeouts, and exception checks 2015-02-25 15:15:22 -05:00
Jimmy Zelinskie
346d6b933a buildman: initialize queuemetrics asynchronously 2015-02-25 13:55:18 -05:00
Joseph Schorr
390f8df4ad Make sure the build manager dies on an unhandled schedule exception 2015-02-25 12:19:21 -05:00
Joseph Schorr
afe7e14254 Add better exception handling and logging to the ephemeral build manager 2015-02-25 12:09:14 -05:00
Jimmy Zelinskie
47f8cb77c4 Merge pull request #11 from coreos-inc/nimbus
CloudWatch for build job status
2015-02-18 17:17:28 -05:00
Jimmy Zelinskie
f53dea46b7 buildman: address PR #11 comments 2015-02-18 14:13:36 -05:00
Joseph Schorr
524705b88c Get dashboard working and upgrade bootstrap. Note: the bootstrap fixes will be coming in the followup CL 2015-02-17 19:15:54 -05:00
Jimmy Zelinskie
5790d7d8cc buildman: build_metrics call correct method 2015-02-17 17:03:12 -05:00
Jimmy Zelinskie
85edb651e2 buildserver: remove pylint comments 2015-02-17 15:32:25 -05:00
Jimmy Zelinskie
d70c95e42e buildreporter: move reporting into server callback 2015-02-17 15:31:53 -05:00
Jimmy Zelinskie
b8d9ef0fe9 buildman: remove old create_task for queue metrics 2015-02-17 14:18:32 -05:00
Jimmy Zelinskie
935db5c766 buildman: clarify queue metrics from job state metrics 2015-02-17 12:23:08 -05:00
Joseph Schorr
fbdbc21eb1 Merge branch 'master' into quark 2015-02-13 16:24:53 -05:00
Jimmy Zelinskie
6a3d269574 buildman: update metrics task 2015-02-13 11:25:29 -05:00
Joseph Schorr
ae8bb5fc13 Add preparing build node status item and change the build status colors to be variations on a blue color 2015-02-12 16:38:43 -05:00
Joseph Schorr
a1938593a9 Better handling of retries on build errors 2015-02-03 16:29:47 -05:00
Joseph Schorr
0875d3dce1 Merge branch 'master' of https://github.com/coreos-inc/quay 2015-01-29 18:40:49 -05:00
Joseph Schorr
3872d29de9 Add a transaction around the extend_processing call 2015-01-29 18:40:41 -05:00
Jake Moshenko
fb533a1f4c Merge branch 'master' of github.com:coreos-inc/quay 2015-01-29 18:40:24 -05:00
Jake Moshenko
63d23a04c0 Make the loop pause when we run out of builder capacity. 2015-01-29 18:40:01 -05:00
Joseph Schorr
838bfe23b1 Remove retries update in the extend processing call and make sure it is under a transaction 2015-01-29 18:33:17 -05:00
Joseph Schorr
ce3f8b438c Fix pull credentials bug, fix job details parse bug and add some better logging 2015-01-29 18:01:42 -05:00
Joseph Schorr
7ee00b83cb Switch to using a CloseForLongOperation around the sleep 2015-01-29 14:50:07 -05:00
Joseph Schorr
cf35da30bc Make sure to not hold DB connections open in the new build manager 2015-01-29 14:40:24 -05:00
Jake Moshenko
2e86417329 Allow the buildman server to die if an uncaught exception terminates the scheduler process. 2015-01-29 10:56:57 -05:00
Joseph Schorr
d359c849cd Add the build worker and job count information to the charts 2015-01-28 17:12:33 -05:00
Jake Moshenko
dd7664328c Make the build manager ports configurable. 2015-01-05 15:09:03 -05:00
Jake Moshenko
cc70225043 Generalize the ephemeral build managers so that any manager may manage a builder spawned by any other manager. 2014-12-31 11:33:56 -05:00
Jake Moshenko
34bf92673b Add support for adjusting etcd ttl on job_heartbeat. Switch the heartbeat method to a coroutine. 2014-12-22 17:24:44 -05:00
Jake Moshenko
e53b6b0e21 Merge remote-tracking branch 'origin/master' into ephemeral 2014-12-22 12:14:59 -05:00
Jake Moshenko
12ee8e0fc0 Switch a few of the buildman methods to coroutines in order to support network calls in methods. Add a test for the ephemeral build manager. 2014-12-22 12:14:16 -05:00
Jake Moshenko
2d7e844753 First implementation of ephemeral build lifecycle manager. 2014-12-16 13:41:30 -05:00
Jimmy Zelinskie
33f12c58ba Add active worker count to buildmanager logs. 2014-12-16 13:37:40 -05:00
Jimmy Zelinskie
09cc4ba4c1 LOGGER -> logger.
While logger may be a global variable, it is not constant. Let the
linters complain!
2014-11-30 17:48:38 -05:00
Joseph Schorr
9d675b51ed - Change SSL to only be enabled via an environment variable. Nginx will be terminating SSL for the ER.
- Add the missing dependencies to the requirements.txt
- Change the builder ports to non-standard locations
- Add the /b1/socket and /b1/controller endpoints in nginx, to map to the build manager
- Have the build manager start automatically.
2014-11-25 18:08:18 -05:00
Joseph Schorr
04fc6d82a5 Add support for SSL if the certificate is found in the config directory 2014-11-25 16:36:21 -05:00
Joseph Schorr
660a640de6 Better organize the source file structure of the build manager and change it to choose a lifecycle manager based on the config 2014-11-25 16:14:44 -05:00
Joseph Schorr
b8e873b00b Add support to the build system for tracking if/when the build manager crashes and make sure builds are restarted within a few minutes 2014-11-21 14:27:06 -05:00
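
Illustration for cfb6e884f2 (Refactor metric collection): the commit message describes a generic queue onto which metrics are pushed, with a separate module that removes them and forwards them to Cloudwatch. The following is a minimal sketch of that producer/consumer split, not the code from the commit. The names MetricQueue, ListSink, consume_once, and the (name, value, unit) tuple shape are all hypothetical, and the Cloudwatch call is replaced by an in-memory stand-in.

```python
import queue


class MetricQueue:
    """Hypothetical generic queue that producers push metrics onto."""

    def __init__(self):
        self._queue = queue.Queue()

    def put(self, name, value, unit="Count"):
        # Producers only know about the queue, not about the consumer.
        self._queue.put((name, value, unit))

    def drain(self, max_items=20):
        # Remove up to max_items metrics without blocking.
        items = []
        while len(items) < max_items:
            try:
                items.append(self._queue.get_nowait())
            except queue.Empty:
                break
        return items


class ListSink:
    """Stand-in consumer (assumption): a Cloudwatch sender, or any other
    backend, would expose the same send(batch) method."""

    def __init__(self):
        self.sent = []

    def send(self, batch):
        self.sent.extend(batch)


def consume_once(metric_queue, sink, max_items=20):
    """Move one batch of metrics from the queue to the sink."""
    batch = metric_queue.drain(max_items)
    if batch:
        sink.send(batch)
    return len(batch)
```

Because the consumer only depends on send(batch), swapping Cloudwatch for another backend means replacing the sink and leaving the producers untouched, which is the separation the commit message points at.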