Matt Jibson
3d9acf2fff
Use prometheus as a metric backend
This entails writing a metric aggregation program, since each gunicorn worker
is a separate Python process with its own memory and therefore its own
metrics. The Python client is a simple wrapper that makes web requests to it.
2016-07-01 14:16:50 -04:00
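Because every gunicorn worker is its own process, a per-process metrics registry would only see that worker's counts. The following is a minimal sketch of the split the commit describes. It assumes the aggregator accepts JSON over HTTP POST: the client builds one small request per update, and the aggregator sums updates that share a name and label set. `AGGREGATOR_URL`, the payload shape, `build_metric_request`, `send_metric`, and `CounterAggregator` are all hypothetical and are not the project's actual API.

```python
import json
import urllib.request

# Hypothetical aggregator address; the commit does not give the real one.
AGGREGATOR_URL = "http://localhost:9092/metrics"


def build_metric_request(url, name, value, labels=None):
    """Build (but do not send) a POST request carrying one metric update as JSON."""
    payload = {"name": name, "value": value, "labels": labels or {}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_metric(name, value, labels=None, timeout=1.0):
    """Called from any gunicorn worker; the aggregator does the combining."""
    request = build_metric_request(AGGREGATOR_URL, name, value, labels)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


class CounterAggregator:
    """Aggregator side: sums counter updates arriving from many worker processes."""

    def __init__(self):
        self._totals = {}

    @staticmethod
    def _key(name, labels):
        # Same name + same label set => same time series.
        return (name, tuple(sorted(labels.items())))

    def add(self, payload):
        key = self._key(payload["name"], payload["labels"])
        self._totals[key] = self._totals.get(key, 0) + payload["value"]

    def total(self, name, labels=None):
        return self._totals.get(self._key(name, labels or {}), 0)
```

Keeping the aggregation in one long-lived process means the scrape target sees a single consistent total, regardless of which worker handled a given request.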
Joseph Schorr
1173192739
Move channel back, as it is referenced by generate_cloud_config
2016-06-22 17:25:06 -04:00
Joseph Schorr
61695eb439
Allow the build node AMI to be overridden in config
2016-06-22 15:13:54 -04:00
josephschorr
20a6fdc73f
Merge pull request #1557 from jzelinskie/buildargs
buildman: mark missing buildargs as failure
2016-06-20 14:40:17 -04:00
Jimmy Zelinskie
871c1634ed
buildman: mark missing buildargs as failure
2016-06-17 18:33:54 -04:00
Joseph Schorr
7292524d69
Add a CloudWatch metric when we fail to start a build via EC2
Fixes #1555
2016-06-17 16:19:57 -04:00
Jimmy Zelinskie
5298452fa7
builder cloudconfig: shut down server after 3 hours (#1554)
2016-06-17 16:03:40 -04:00
Joseph Schorr
f9469a84b3
Make the size of the build node HDD configurable
Fixes #1520
2016-06-06 11:35:10 -04:00
Jimmy Zelinskie
7d356c451b
buildman: fix misspelling
2016-06-03 15:42:14 -04:00
Jimmy Zelinskie
44b56ae2cf
queue: explicitly declare ordering requirement
This change turns the ordering requirement for queue items off by default
and enables it only for the build manager. This should make the queries
for fetching queue items significantly faster for every other use case.
2016-05-27 14:44:30 -04:00
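A minimal sketch of an opt-in ordering requirement, using an in-memory SQLite table. The `queueitem` columns, the `get_available_items` helper, and ordering by `id` are assumptions for illustration. Only callers that pass `ordering_required=True` (the build manager, per the commit) pay for the `ORDER BY`; everyone else lets the database return matching rows in whatever order is cheapest.

```python
import sqlite3


def get_available_items(conn, queue_name, limit, ordering_required=False):
    """Fetch up to `limit` available items from a queue.

    Hypothetical schema: queueitem(id, queue_name, body, available).
    Sorting is only requested when the caller needs strict ordering.
    """
    sql = "SELECT id, body FROM queueitem WHERE queue_name = ? AND available = 1"
    if ordering_required:
        sql += " ORDER BY id"
    sql += " LIMIT ?"
    return conn.execute(sql, (queue_name, limit)).fetchall()
```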
Jimmy Zelinskie
79aa78906a
buildman: refresh and add Evan's key to builders
2016-05-24 14:05:39 -04:00
Joseph Schorr
5262535945
Boto error_code is a string, not the HTTP status code
2015-12-23 15:12:01 -05:00
Jimmy Zelinskie
601b99a083
buildman: add git checkout failure
2015-12-16 14:49:37 -05:00
Joseph Schorr
773e73861f
Change error into info in build manager
Fixes #1046
2015-12-09 14:30:14 -05:00
josephschorr
c06e5cc9c7
Merge pull request #1002 from coreos-inc/buildertagexc
Add timeout and failure if an EC2 instance could not be found when ta…
2015-12-09 14:28:31 -05:00
Joseph Schorr
946e5fabc0
Add timeout and failure if an EC2 instance could not be found when tagging
Fixes #994
2015-12-09 14:28:19 -05:00
Joseph Schorr
edd9a03af5
Catch additional key not found exception
Fixes #806
2015-12-01 12:29:58 -05:00
Joseph Schorr
fbc4927544
Only use exception logging for internal errors on builds
Fixes #993
2015-11-30 14:30:55 -05:00
Jake Moshenko
c4b637521c
Remove Matt Jibson's public key
2015-11-23 18:18:42 -05:00
Matt Jibson
2325328bbd
Update mjibson ssh key
2015-11-06 15:34:52 -05:00
Jimmy Zelinskie
e973289397
Revert "Revert "Merge pull request #682 from jzelinskie/revertrevert""
This reverts commit 278bc736e3.
2015-10-23 15:26:33 -04:00
Jimmy Zelinskie
278bc736e3
Revert "Merge pull request #682 from jzelinskie/revertrevert"
This reverts commit 627ad25c9c, reversing
changes made to 31c392fecc.
2015-10-22 16:02:07 -04:00
Jimmy Zelinskie
46b2f10d7f
check for VPC subnet ID before using builder VPC
This means you can use legacy networking machines by simply changing the
instance type and removing the specified 'EC2_VPC_SUBNET_ID' from the
executor config.
2015-10-22 14:50:54 -04:00
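A sketch of the conditional VPC placement, assuming the executor config is a plain dict. Only `EC2_VPC_SUBNET_ID` comes from the commit; the other config keys (`BUILDER_AMI`, `EC2_INSTANCE_TYPE`), the `build_launch_kwargs` helper, and the output key names are hypothetical.

```python
def build_launch_kwargs(executor_config):
    """Assemble keyword arguments for starting a builder instance.

    The subnet is only set when EC2_VPC_SUBNET_ID is present, so removing it
    from the config (and picking a compatible instance type) launches the
    builder with legacy, non-VPC networking.
    """
    kwargs = {
        # Hypothetical config keys, included only to give the dict context.
        "image_id": executor_config["BUILDER_AMI"],
        "instance_type": executor_config["EC2_INSTANCE_TYPE"],
    }
    subnet_id = executor_config.get("EC2_VPC_SUBNET_ID")
    if subnet_id:
        kwargs["subnet_id"] = subnet_id
    return kwargs
```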
Jimmy Zelinskie
39cfe77d42
Revert "Merge pull request #557 from coreos-inc/revert-migration"
This reverts commit c4f938898a, reversing
changes made to 7ad2522dbe.
2015-10-21 15:29:57 -04:00
Joseph Schorr
0f37e66cc8
Better error handling for the build manager
Fixes #604
2015-10-13 11:40:07 -04:00
Matt Jibson
87cc3289a0
Remove transaction from metric reporting
2015-10-06 01:28:43 -04:00
Joseph Schorr
752d05dedb
Add exception logging to the build manager
Fixes #547
2015-09-30 15:49:35 -04:00
Joseph Schorr
2d3092b826
Make build system resistant to Redis being broken
Fixes #549
2015-09-30 15:15:10 -04:00
Silas Sewell
9000169b53
Revert "Merge pull request #491 from jakedt/migratebackp2"
This reverts commit 7ad2522dbe, reversing
changes made to a0b191ffa1.
2015-09-28 16:09:22 -04:00
josephschorr
7ad2522dbe
Merge pull request #491 from jakedt/migratebackp2
Migrate image data back phase 2
2015-09-26 15:11:46 -04:00
Matt Jibson
bba1557437
Monitor queue adds and EC2 node starts
fixes #157
see #304
2015-09-18 16:21:16 -04:00
Jake Moshenko
8baacd2741
Migrate old data to new locations, read only new.
2015-09-17 15:47:13 -04:00
Jimmy Zelinskie
cb6b6c4091
buildman: add silas keys to builders
2015-09-09 16:53:19 -04:00
Jimmy Zelinskie
0365831015
add barakmich, quentin, mjibson keys to builders
Fixes coreos-inc/quay-policies#38
2015-08-27 11:42:53 -04:00
Jimmy Zelinskie
239f76d39f
Merge pull request #368 from coreos-inc/buildarchive
Allow builds to be started with an external archive URL
2015-08-17 17:09:14 -04:00
Joseph Schorr
f092c00621
Allow builds to be started with an external archive URL
Fixes #114
2015-08-17 17:01:49 -04:00
Matt Jibson
cfb6e884f2
Refactor metric collection
This change adds a generic queue onto which metrics can be pushed. A
separate module removes metrics from the queue and sends them to CloudWatch.
Since these are now separate concerns, we can easily change the consumer from
CloudWatch to anything else.
This change maintains near feature parity (the only difference is that there
is now one queue instead of two, which is not a big deal).
2015-08-12 12:15:52 -04:00
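A minimal sketch of the producer/consumer split, using the standard library `queue` module. `MetricQueue`, `MetricSender`, and the default batch size are hypothetical. The sink is any callable, so a CloudWatch publisher could be swapped for another backend without touching the code that records metrics.

```python
import queue


class MetricQueue:
    """Producer side: code anywhere pushes (name, value) pairs here."""

    def __init__(self):
        self._queue = queue.Queue()

    def put(self, name, value):
        self._queue.put((name, value))

    def drain(self, max_items):
        """Remove up to max_items metrics without blocking."""
        items = []
        while len(items) < max_items:
            try:
                items.append(self._queue.get_nowait())
            except queue.Empty:
                break
        return items


class MetricSender:
    """Consumer side: moves batches from the queue to a pluggable sink."""

    def __init__(self, metric_queue, sink, batch_size=20):
        self._queue = metric_queue
        self._sink = sink  # e.g. a function that publishes to CloudWatch
        self._batch_size = batch_size

    def flush_once(self):
        """Send at most one batch; return how many metrics were sent."""
        batch = self._queue.drain(self._batch_size)
        if batch:
            self._sink(batch)
        return len(batch)
```

In a real deployment `flush_once` would run in a loop on a background thread or worker; that scheduling is left out here.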
Jake Moshenko
18100be481
Refactor the util directory to use subpackages.
2015-08-03 16:04:19 -04:00
Jimmy Zelinskie
7dbcbe4706
Merge pull request #234 from coreos-inc/morespace
Increase the HD size on the build nodes
2015-07-27 15:35:45 -04:00
Jake Moshenko
3efaa255e8
Accidental refactor, split out legacy.py into separate submodules and update all call sites.
2015-07-17 11:56:15 -04:00
Joseph Schorr
04cc471585
Increase the HD size on the build nodes
Fixes #228
2015-07-14 15:20:17 +03:00
Joseph Schorr
d842881608
Don't set build_status to None, as it might still be used later
2015-07-14 12:49:03 +03:00
Joseph Schorr
e06435fee4
Record phase information and make better error messages on pull failure
2015-06-30 18:04:44 +03:00
Joseph Schorr
6655c7f745
Add exception handling that doesn't log the read-timeout exception
Note: This is a *hack* and needs to be replaced with proper code ASAP
2015-06-25 23:35:29 -04:00
Joseph Schorr
6e6610f31a
Switch to a 30s maximum timeout
2015-06-25 23:08:49 -04:00
Joseph Schorr
bead839abd
Make sure build components timeout if the initial connection fails
2015-06-25 22:13:01 -04:00
Joseph Schorr
ecebc06343
Update comment now that restarter is abstracted
2015-06-25 21:53:42 -04:00
Joseph Schorr
9f5f71398c
Abstract out the concept of a restart function
2015-06-25 21:40:50 -04:00
Joseph Schorr
52fa9aad5b
Fix etcd watching
Etcd can miss events on watches if they occur fast enough, so if we get an exception indicating that we've missed an index, we reset the state of our local tracking structures by re-reading the *full* list and starting a new watch at HEAD
2015-06-25 21:22:39 -04:00
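A sketch of the recovery strategy against a hypothetical client interface: `list_all()` returns the current keys plus the etcd index of that read, and `watch(index)` returns the next `(key, value, index)` event or raises `MissedIndexError` when that index is no longer available. These names are stand-ins, not the python-etcd API.

```python
class MissedIndexError(Exception):
    """Hypothetical stand-in for the client's 'event index cleared' error."""


class WatchTracker:
    """Keeps a local copy of etcd keys in sync by following a watch."""

    def __init__(self, client):
        self._client = client
        self.state = {}
        self._next_index = None
        self.resync()

    def resync(self):
        # Re-read the *full* list and restart the watch just past that read (HEAD).
        items, head_index = self._client.list_all()
        self.state = dict(items)
        self._next_index = head_index + 1

    def step(self):
        """Apply one watch event, or rebuild state if we fell behind."""
        try:
            key, value, index = self._client.watch(self._next_index)
        except MissedIndexError:
            self.resync()
            return
        if value is None:  # assumption: None signals a deletion
            self.state.pop(key, None)
        else:
            self.state[key] = value
        self._next_index = index + 1
```

Rebuilding from a full read trades one larger request for the guarantee that no missed event leaves the local view permanently stale.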
Jimmy Zelinskie
1195e3ec7c
buildman: rm coroutine decorator from subscribers
Python isn't able to figure out that these are generators and properly
handle them.
2015-06-24 17:38:29 -04:00