Joseph Schorr
713ba3abaf
Further updates to the Prometheus client code
2016-07-01 14:16:51 -04:00
Matt Jibson
3d9acf2fff
Use prometheus as a metric backend
...
This entails writing a metric aggregation program since each worker has its
own memory, and thus own metrics because of python gunicorn. The python
client is a simple wrapper that makes web requests to it.
2016-07-01 14:16:50 -04:00
Jimmy Zelinskie
7d356c451b
buildman: fix misspell
2016-06-03 15:42:14 -04:00
Jimmy Zelinskie
44b56ae2cf
queue: explicitly declare ordering requirement
...
This change defaults the ordering requirement of queue items to be off
and only enables it for the build manager. This should make the queries
for getting queueitems significantly faster for every other use case.
2016-05-27 14:44:30 -04:00
Matt Jibson
87cc3289a0
Remove transaction from metric reporting
2015-10-06 01:28:43 -04:00
Matt Jibson
cfb6e884f2
Refactor metric collection
...
This change adds a generic queue onto which metrics can be pushed. A
separate module removes metrics from the queue and adds them to Cloudwatch.
Since these are now separate ideas, we can easily change the consumer from
Cloudwatch to anything else.
This change maintains near feature parity (the only change is there is now
just one queue instead of two - not a big deal).
2015-08-12 12:15:52 -04:00
Joseph Schorr
52fa9aad5b
Fix etcd watching
...
Etcd can miss events on watches if they are occurring fast enough, so if we can get an exception indicating that we've missed an index, we reset the state of our local tracking structures by re-reading the *full* list and starting a new watch at HEAD
2015-06-25 21:22:39 -04:00
Jake Moshenko
79f1181a63
Switch build-scheduled to an official build phase.
2015-06-10 16:19:51 -04:00
Jake Moshenko
884fedd229
Improve the log messages in the buildman.
2015-06-10 16:19:51 -04:00
Jake Moshenko
d31e25d5cd
Allow the individual build manager types to specify how long the queue should wait before retring a job that fails to schedule.
2015-06-10 16:19:50 -04:00
Joseph Schorr
24ce0decd9
Requeue build jobs after the work check timeout + some additional padding. This ensures that if a build somehow gets wedged, other builds can continue to be picked up.
2015-06-09 20:43:48 -04:00
Jimmy Zelinskie
8589871f43
buildman: rm unused imports
2015-03-09 13:04:16 -04:00
Joseph Schorr
d973f9df45
Reenable metrics until we know they are the problem
2015-02-25 16:00:46 -05:00
Joseph Schorr
bdb84f1c20
Merge branch 'master' of github.com:coreos-inc/quay
2015-02-25 16:00:17 -05:00
Joseph Schorr
4551b3a957
Remove the boto timeout set (doesn't work anyway) and add some better logging to the scheduler
2015-02-25 16:00:14 -05:00
Jimmy Zelinskie
090a198afc
temporarily comment out metrics
2015-02-25 15:29:35 -05:00
Joseph Schorr
5dd78f76c7
Add additional logging, timeouts, and exception checks
2015-02-25 15:15:22 -05:00
Jimmy Zelinskie
346d6b933a
buildman: initialize queuemetrics asynchronously
2015-02-25 13:55:18 -05:00
Joseph Schorr
390f8df4ad
Make sure the build manager dies on an unhandled schedule exception
2015-02-25 12:19:21 -05:00
Joseph Schorr
afe7e14254
Add better exception handling and logging to the ephemeral build manager
2015-02-25 12:09:14 -05:00
Jimmy Zelinskie
47f8cb77c4
Merge pull request #11 from coreos-inc/nimbus
...
CloudWatch for build job status
2015-02-18 17:17:28 -05:00
Jimmy Zelinskie
f53dea46b7
buildman: address PR #11 comments
2015-02-18 14:13:36 -05:00
Joseph Schorr
524705b88c
Get dashboard working and upgrade bootstrap. Note: the bootstrap fixes will be coming in the followup CL
2015-02-17 19:15:54 -05:00
Jimmy Zelinskie
5790d7d8cc
buildman: build_metrics call correct method
2015-02-17 17:03:12 -05:00
Jimmy Zelinskie
85edb651e2
buildserver: remove pylint comments
2015-02-17 15:32:25 -05:00
Jimmy Zelinskie
d70c95e42e
buildreporter: move reporting into server callback
2015-02-17 15:31:53 -05:00
Jimmy Zelinskie
b8d9ef0fe9
buildman: remove old create_task for queue metrics
2015-02-17 14:18:32 -05:00
Jimmy Zelinskie
935db5c766
buildman: clarify queue metrics from job state metrics
2015-02-17 12:23:08 -05:00
Joseph Schorr
fbdbc21eb1
Merge branch 'master' into quark
2015-02-13 16:24:53 -05:00
Jimmy Zelinskie
6a3d269574
buildman: update metrics task
2015-02-13 11:25:29 -05:00
Joseph Schorr
ae8bb5fc13
Add preparing build node status item and change the build status colors to be variations on a blue color
2015-02-12 16:38:43 -05:00
Joseph Schorr
a1938593a9
Better handling of retries on build errors
2015-02-03 16:29:47 -05:00
Joseph Schorr
0875d3dce1
Merge branch 'master' of https://github.com/coreos-inc/quay
2015-01-29 18:40:49 -05:00
Joseph Schorr
3872d29de9
Add a transaction around the extend_processing call
2015-01-29 18:40:41 -05:00
Jake Moshenko
fb533a1f4c
Merge branch 'master' of github.com:coreos-inc/quay
2015-01-29 18:40:24 -05:00
Jake Moshenko
63d23a04c0
Make the loop pause when we run out of builder capacity.
2015-01-29 18:40:01 -05:00
Joseph Schorr
838bfe23b1
Remove retries update in the extend processing call and make sure it is under a transaction
2015-01-29 18:33:17 -05:00
Joseph Schorr
ce3f8b438c
Fix pull credentials bug, fix job details parse bug and add some better logging
2015-01-29 18:01:42 -05:00
Joseph Schorr
7ee00b83cb
Switch to using a CloseForLongOperation around the sleep
2015-01-29 14:50:07 -05:00
Joseph Schorr
cf35da30bc
Make sure to not hold DB connections open in the new build manager
2015-01-29 14:40:24 -05:00
Jake Moshenko
2e86417329
Allow the buildman server to die if an uncaught exception terminates the scheduler process.
2015-01-29 10:56:57 -05:00
Joseph Schorr
d359c849cd
Add the build worker and job count information to the charts
2015-01-28 17:12:33 -05:00
Jake Moshenko
dd7664328c
Make the build manager ports configurable.
2015-01-05 15:09:03 -05:00
Jake Moshenko
cc70225043
Generalize the ephemeral build managers so that any manager may manage a builder spawned by any other manager.
2014-12-31 11:33:56 -05:00
Jake Moshenko
34bf92673b
Add support for adjusting etcd ttl on job_heartbeat. Switch the heartbeat method to a coroutine.
2014-12-22 17:24:44 -05:00
Jake Moshenko
e53b6b0e21
Merge remote-tracking branch 'origin/master' into ephemeral
2014-12-22 12:14:59 -05:00
Jake Moshenko
12ee8e0fc0
Switch a few of the buildman methods to coroutines in order to support network calls in methods. Add a test for the ephemeral build manager.
2014-12-22 12:14:16 -05:00
Jake Moshenko
2d7e844753
First implementation of ephemeral build lifecycle manager.
2014-12-16 13:41:30 -05:00
Jimmy Zelinskie
33f12c58ba
Add active worker count to buildmanager logs.
2014-12-16 13:37:40 -05:00
Jimmy Zelinskie
09cc4ba4c1
LOGGER -> logger.
...
While logger may be a global variable, it is not constant. Let the
linters complain!
2014-11-30 17:48:38 -05:00