Matt Jibson
cfb6e884f2
Refactor metric collection
This change adds a generic queue onto which metrics can be pushed. A
separate module removes metrics from the queue and adds them to Cloudwatch.
Since these are now separate ideas, we can easily change the consumer from
Cloudwatch to anything else.
This change maintains near feature parity (the only difference is that there is now
just one queue instead of two, which is not a big deal).
2015-08-12 12:15:52 -04:00
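The producer/consumer split described in this commit can be sketched in Python as follows. This is a minimal illustration that assumes metrics are simple (name, value) pairs. `MetricQueue` and `send_batch` are hypothetical names, not the code from this commit, and the sink callable stands in for CloudWatch or any other backend.

```python
import queue
from typing import Callable, List, Tuple

# Assumption: a metric is a (name, value) pair; the real payload shape is not shown in the log.
Metric = Tuple[str, float]


class MetricQueue:
    """Generic queue onto which producers push metrics (hypothetical name)."""

    def __init__(self) -> None:
        self._queue: "queue.Queue[Metric]" = queue.Queue()

    def put(self, name: str, value: float) -> None:
        self._queue.put((name, value))

    def drain(self) -> List[Metric]:
        """Remove and return every metric currently queued, without blocking."""
        items: List[Metric] = []
        while True:
            try:
                items.append(self._queue.get_nowait())
            except queue.Empty:
                return items


def send_batch(metric_queue: MetricQueue, sink: Callable[[List[Metric]], None]) -> int:
    """Consumer: drain the queue and hand the batch to whatever sink is configured."""
    batch = metric_queue.drain()
    if batch:
        sink(batch)
    return len(batch)
```

Because the consumer only sees a callable sink, replacing CloudWatch with another backend means passing a different function, which is the decoupling the commit message describes.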
Jake Moshenko
18100be481
Refactor the util directory to use subpackages.
2015-08-03 16:04:19 -04:00
Jimmy Zelinskie
7dbcbe4706
Merge pull request #234 from coreos-inc/morespace
Increase the HD size on the build nodes
2015-07-27 15:35:45 -04:00
Jake Moshenko
3efaa255e8
Accidental refactor: split out legacy.py into separate submodules and update all call sites.
2015-07-17 11:56:15 -04:00
Joseph Schorr
04cc471585
Increase the HD size on the build nodes
Fixes #228
2015-07-14 15:20:17 +03:00
Joseph Schorr
d842881608
Don't set the build_status to None, as it might still be used later
2015-07-14 12:49:03 +03:00
Joseph Schorr
e06435fee4
Record phase information and make better error messages on pull failure
2015-06-30 18:04:44 +03:00
Joseph Schorr
6655c7f745
Add exception handling that doesn't log the read-timeout exception
Note: This is a *hack* and needs to be replaced with proper code ASAP
2015-06-25 23:35:29 -04:00
Joseph Schorr
6e6610f31a
Switch to a 30s maximum timeout
2015-06-25 23:08:49 -04:00
Joseph Schorr
bead839abd
Make sure build components timeout if the initial connection fails
2015-06-25 22:13:01 -04:00
Joseph Schorr
ecebc06343
Update comment now that restarter is abstracted
2015-06-25 21:53:42 -04:00
Joseph Schorr
9f5f71398c
Abstract out the concept of a restart function
2015-06-25 21:40:50 -04:00
Joseph Schorr
52fa9aad5b
Fix etcd watching
Etcd can miss events on watches if they occur fast enough. If we get an exception indicating that we've missed an index, we reset the state of our local tracking structures by re-reading the *full* list and starting a new watch at HEAD.
2015-06-25 21:22:39 -04:00
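The recovery strategy in this commit (resynchronize from a full read whenever a watch reports a missed index) can be sketched as below. This assumes a hypothetical `WatchClient` interface and `MissedIndexError` exception rather than the API of any particular etcd client library; `KeyTracker` is likewise an illustrative name.

```python
from typing import Dict, Optional, Protocol, Tuple


class MissedIndexError(Exception):
    """Hypothetical: raised when the requested watch index is no longer available."""


class WatchClient(Protocol):
    # Hypothetical interface standing in for a real etcd client.
    def read_all(self, prefix: str) -> Tuple[Dict[str, str], int]: ...

    def watch(self, prefix: str, index: int) -> Tuple[str, Optional[str], int]: ...


class KeyTracker:
    """Keeps a local mirror of every key under a prefix."""

    def __init__(self, client: WatchClient, prefix: str) -> None:
        self.client = client
        self.prefix = prefix
        self.keys: Dict[str, str] = {}
        self.next_index = 0
        self.resync()

    def resync(self) -> None:
        # Re-read the full list, then resume watching just past the current head.
        self.keys, head_index = self.client.read_all(self.prefix)
        self.next_index = head_index + 1

    def step(self) -> None:
        try:
            key, value, modified_index = self.client.watch(self.prefix, self.next_index)
        except MissedIndexError:
            # Events were lost; throw away local state and start over from a snapshot.
            self.resync()
            return
        if value is None:
            # In this sketch a value of None signals that the key was deleted or expired.
            self.keys.pop(key, None)
        else:
            self.keys[key] = value
        # Ask for the next index in order, with no gaps.
        self.next_index = modified_index + 1
```

Restarting the watch at head + 1 after the full read means missed events are already reflected in the fresh snapshot, assuming the snapshot and the head index come from the same read.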
Jimmy Zelinskie
1195e3ec7c
buildman: rm coroutine decorator from subscribers
Python isn't able to figure out that these are generators and properly
handle them.
2015-06-24 17:38:29 -04:00
josephschorr
2ade08468d
Merge pull request #168 from coreos-inc/etcdindex
Fix ephemeral build manager to ask for watches in index order with no gaps
2015-06-23 17:12:18 -04:00
Joseph Schorr
b4c39e8ec0
Fix ephemeral build manager to ask for watches in index order with no gaps
2015-06-23 17:11:46 -04:00
Jimmy Zelinskie
18aa7b6c1e
buildcomponent: use consistent trollius imports
2015-06-23 17:03:26 -04:00
Jimmy Zelinskie
197f3b9b85
buildman: fix ER failing to heartbeat
2015-06-22 18:12:20 -04:00
Jimmy Zelinskie
82287926ab
Merge pull request #140 from coreos-inc/eventinfo
Add more build information to the events and have better messaging
2015-06-17 16:49:59 -04:00
Joseph Schorr
c2dc1c9b75
Handle case where etcd key is already removed on job complete
2015-06-17 15:02:58 -04:00
Jimmy Zelinskie
177b96e965
builder: add missing 'yield from' coroutine
2015-06-17 14:16:27 -04:00
Jimmy Zelinskie
59aba93514
builder: update heartbeat timestamp on log message
2015-06-17 14:16:27 -04:00
Joseph Schorr
9b974f6b80
Add more build information to the events and have better messaging
Fixes #79
2015-06-16 23:16:36 -04:00
Jake Moshenko
c435f5c127
Add a comment about why we are taking a lock when terminating a builder machine.
2015-06-10 16:19:51 -04:00
Jake Moshenko
f767fc4d03
Track whether builders ever came online in etcd. Mark builds which never successfully heartbeated as incomplete.
2015-06-10 16:19:51 -04:00
Jake Moshenko
79f1181a63
Switch build-scheduled to an official build phase.
2015-06-10 16:19:51 -04:00
Jake Moshenko
884fedd229
Improve the log messages in the buildman.
2015-06-10 16:19:51 -04:00
Jake Moshenko
d31e25d5cd
Allow the individual build manager types to specify how long the queue should wait before retrying a job that fails to schedule.
2015-06-10 16:19:50 -04:00
Jimmy Zelinskie
b7303665a2
Merge pull request #111 from coreos-inc/incompletefix
Requeue build jobs after the work check timeout + some additional padding.
2015-06-09 20:44:40 -04:00
Joseph Schorr
24ce0decd9
Requeue build jobs after the work check timeout + some additional padding. This ensures that if a build somehow gets wedged, other builds can continue to be picked up.
2015-06-09 20:43:48 -04:00
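The requeue rule in this commit amounts to giving each dequeued job a processing deadline longer than the work check timeout. A minimal in-memory sketch, assuming a queue where a taken item becomes visible again once its deadline passes; `JobQueue`, `WORK_CHECK_TIMEOUT`, and `PADDING` are hypothetical names, and the numbers are placeholders rather than the values used in this commit.

```python
from dataclasses import dataclass
from typing import List, Optional

# Placeholder values in seconds; the real timeouts are not given in the log.
WORK_CHECK_TIMEOUT = 10
PADDING = 60


@dataclass(eq=False)  # compare items by identity so remove() drops the exact item
class QueueItem:
    body: str
    available_after: float = 0.0


class JobQueue:
    """In-memory queue where a taken job reappears if it is not completed in time."""

    def __init__(self, processing_timeout: float) -> None:
        self.items: List[QueueItem] = []
        self.processing_timeout = processing_timeout

    def put(self, body: str, now: float) -> None:
        self.items.append(QueueItem(body, available_after=now))

    def get(self, now: float) -> Optional[QueueItem]:
        for item in self.items:
            if item.available_after <= now:
                # Hide the job until its deadline; if the build wedges, another
                # worker can pick it up once the deadline passes.
                item.available_after = now + self.processing_timeout
                return item
        return None

    def complete(self, item: QueueItem) -> None:
        self.items.remove(item)
```

Constructing the queue with `JobQueue(WORK_CHECK_TIMEOUT + PADDING)` gives a live builder time to report before its job is handed out again.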
Joseph Schorr
f82831bff6
Log the etcd exception so we can debug this issue
2015-06-09 20:33:55 -04:00
Jimmy Zelinskie
7f4dd7d42f
triggers: backwards compatible schema for metadata
2015-06-02 16:05:17 -04:00
Jimmy Zelinskie
e01bdd4ab0
triggers: metadata.commit_sha -> metadata.commit
This resolves an issue where the custom-git trigger's public facing
schema was not the same as the internal metadata schema. Instead of
breaking users, we rework the internal metadata schema to be the same as
the custom-git JSON schema. This commit also updates everything that
used `metadata.commit_sha` including the test database.
2015-06-02 15:32:28 -04:00
Joseph Schorr
5589bfc6d5
- Have the heartbeat fail to update if the worker has timed out
- Add additional build component logging for tracking down problems in the future
2015-05-22 15:24:14 -04:00
Jimmy Zelinskie
db05db6295
cloudconfig: flatten logentries container
2015-05-20 16:34:16 -04:00
Joseph Schorr
598fc6ec46
Add the error code to the worker error logged to redis
2015-05-18 15:01:48 -04:00
Joseph Schorr
91b464d0de
Switch build manager to always just WARN on boto
2015-05-18 12:34:26 -04:00
Jimmy Zelinskie
86f400fdf5
buildman: fix btrfs mounting in worker cloudconfig
2015-05-13 17:40:35 -04:00
Jimmy Zelinskie
6a5cecebc5
buildman: create and mount btrfs volume for docker
There are numerous issues with overlayfs that actually aren't present with
btrfs. Btrfs seems to have long-running issues, but our builders are
ephemeral. Example issue: https://github.com/docker/docker/issues/10180
2015-05-12 17:42:34 -04:00
Jimmy Zelinskie
9f31bdd571
buildman: add new io.quay.builder.gitfailure error
2015-05-11 15:25:22 -04:00
Jimmy Zelinskie
15fdae6688
buildman: show base error for buildpack failures
Whereas before these were reserved only for S3 errors, users need these
specifics to debug custom-git configurations.
2015-05-11 14:18:48 -04:00
Joseph Schorr
31260d50f5
Rename the new images method to a slightly better name
2015-04-24 16:37:37 -04:00
Joseph Schorr
e70343d849
Faster cache lookup by removing a join with the ImagePlacementTable, removing the extra loop to add the locations and filtering the images looked up by the base image
2015-04-24 16:22:19 -04:00
Jimmy Zelinskie
02498d72ba
almost all PR discussion fixes
2015-04-21 18:04:25 -04:00
Jimmy Zelinskie
ba2cb08904
Merge branch 'master' into git
2015-04-16 17:38:35 -04:00
Jake Moshenko
b10fd4ff22
Tell the journal on the builders to listen on the proper socket.
2015-03-27 16:31:35 -04:00
Jake Moshenko
6eead7c860
Add logentries reporting to the ephemeral builders.
2015-03-27 15:28:08 -04:00
Jake Moshenko
0349f3f1a3
Handle the case where YAML config returns a list, not a tuple.
2015-03-26 14:53:56 -04:00
Jimmy Zelinskie
cd1b003ca6
buildcomponent: handle builds without resource_key
2015-03-23 15:46:23 -04:00
Jimmy Zelinskie
d29c8d60c7
trigger: pass trigger into manual_start & handle_trigger_request
2015-03-23 12:14:47 -04:00
Jimmy Zelinskie
b851986cf5
add git_url to metadata, add git to buildargs
2015-03-19 18:09:27 -04:00
Jimmy Zelinskie
b35f6ed25c
buildman: add git_key buildconfig parameter
2015-03-16 13:18:18 -04:00
Jimmy Zelinskie
4c8814866c
buildman: add git_url to build_config
2015-03-13 14:58:05 -04:00
Jimmy Zelinskie
8589871f43
buildman: rm unused imports
2015-03-09 13:04:16 -04:00
Jake Moshenko
5c68e52fce
Really really fix the exception handling.
2015-02-27 17:33:46 -05:00
Jake Moshenko
cf5bc6f0be
Properly catch multiple exceptions.
2015-02-27 17:32:10 -05:00
Jake Moshenko
857c3e2959
Start catching etcd key errors as well.
2015-02-27 17:10:15 -05:00
Joseph Schorr
d973f9df45
Reenable metrics until we know they are the problem
2015-02-25 16:00:46 -05:00
Joseph Schorr
bdb84f1c20
Merge branch 'master' of github.com:coreos-inc/quay
2015-02-25 16:00:17 -05:00
Joseph Schorr
4551b3a957
Remove the boto timeout set (doesn't work anyway) and add some better logging to the scheduler
2015-02-25 16:00:14 -05:00
Jimmy Zelinskie
090a198afc
temporarily comment out metrics
2015-02-25 15:29:35 -05:00
Jimmy Zelinskie
db79ad2dde
unused import
2015-02-25 15:26:36 -05:00
Joseph Schorr
5dd78f76c7
Add additional logging, timeouts, and exception checks
2015-02-25 15:15:22 -05:00
Jimmy Zelinskie
328de0201f
Merge branch 'master' of github.com:coreos-inc/quay
2015-02-25 13:56:05 -05:00
Jimmy Zelinskie
346d6b933a
buildman: initialize queuemetrics asynchronously
2015-02-25 13:55:18 -05:00
Joseph Schorr
2eaec092f0
Handle the case where we cannot write the tags on the build nodes
2015-02-25 13:47:36 -05:00
Joseph Schorr
390f8df4ad
Make sure the build manager dies on an unhandled schedule exception
2015-02-25 12:19:21 -05:00
Joseph Schorr
afe7e14254
Add better exception handling and logging to the ephemeral build manager
2015-02-25 12:09:14 -05:00
Joseph Schorr
b7901d2adb
Add trigger metadata (which includes the SHA) and the built image_id to the event data
2015-02-24 15:13:51 -05:00
Jimmy Zelinskie
47f8cb77c4
Merge pull request #11 from coreos-inc/nimbus
CloudWatch for build job status
2015-02-18 17:17:28 -05:00
Jimmy Zelinskie
9ab3554226
buildreporter: does not execute in a coroutine!
2015-02-18 17:11:45 -05:00
Jimmy Zelinskie
0d38e0b00b
metrics: use config['name'] to get metric conf
2015-02-18 16:05:36 -05:00
Jimmy Zelinskie
f53dea46b7
buildman: address PR #11 comments
2015-02-18 14:13:36 -05:00
Joseph Schorr
524705b88c
Get dashboard working and upgrade bootstrap. Note: the bootstrap fixes will be coming in the followup CL
2015-02-17 19:15:54 -05:00
Jimmy Zelinskie
5790d7d8cc
buildman: build_metrics call correct method
2015-02-17 17:03:12 -05:00
Jimmy Zelinskie
1a71925125
buildreporter: remove unused logging
2015-02-17 17:02:37 -05:00
Jimmy Zelinskie
85edb651e2
buildserver: remove pylint comments
2015-02-17 15:32:25 -05:00
Jimmy Zelinskie
d70c95e42e
buildreporter: move reporting into server callback
2015-02-17 15:31:53 -05:00
Jimmy Zelinskie
25fc999d50
buildreporter: handle app=None
2015-02-17 15:30:09 -05:00
Jimmy Zelinskie
b8d9ef0fe9
buildman: remove old create_task for queue metrics
2015-02-17 14:18:32 -05:00
Jimmy Zelinskie
935db5c766
buildman: clarify queue metrics from job state metrics
2015-02-17 12:23:08 -05:00
Jimmy Zelinskie
ffb897dfe6
buildman: add job status logging to managers
2015-02-17 12:22:23 -05:00
Jimmy Zelinskie
ca0d2b1721
buildreporter: getattr method
2015-02-17 12:21:22 -05:00
Jimmy Zelinskie
0a00453024
buildreporter: rm pylint comments
2015-02-17 12:20:46 -05:00
Jimmy Zelinskie
0e7418ffce
buildman: add BuildMetrics and BuildReporter
2015-02-17 10:56:09 -05:00
Joseph Schorr
fbdbc21eb1
Merge branch 'master' into quark
2015-02-13 16:24:53 -05:00
Jimmy Zelinskie
6a3d269574
buildman: update metrics task
2015-02-13 11:25:29 -05:00
Joseph Schorr
ae8bb5fc13
Add preparing build node status item and change the build status colors to be variations on a blue color
2015-02-12 16:38:43 -05:00
Joseph Schorr
f84d1bad45
Handle internal errors in a better fashion: If a build would be marked as internal error, only do so if there are retries remaining. Otherwise, we mark it as failed (since it won't be rebuilt anyway)
2015-02-12 16:19:44 -05:00
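The retry-aware status rule from this commit, sketched as a small Python function. The phase names and the `retries_remaining` parameter are assumptions for illustration; the commit does not show the actual identifiers.

```python
from enum import Enum


class BuildPhase(Enum):
    # Hypothetical phase names standing in for the real build phases.
    INTERNAL_ERROR = "internal_error"
    FAILED = "failed"


def phase_for_internal_error(retries_remaining: int) -> BuildPhase:
    """Choose the phase to record when a build hits an internal error."""
    if retries_remaining > 0:
        # The job will be picked up again, so report a transient internal error.
        return BuildPhase.INTERNAL_ERROR
    # No retries left: the build won't be rebuilt, so report it as failed.
    return BuildPhase.FAILED
```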
Joseph Schorr
f107b50a46
Merge branch 'master' into ackbar
2015-02-12 12:04:45 -05:00
Joseph Schorr
f796c281d5
Remove support for v0.2
2015-02-11 17:12:53 -05:00
Joseph Schorr
e1a15464a1
Fix typo, add some logging and fix command comparison
2015-02-11 16:02:36 -05:00
Joseph Schorr
893ae46dec
Add an ImageTree class and change to searching *all applicable* branches when looking for the best cache tag.
2015-02-10 21:46:58 -05:00
Joseph Schorr
98b4f62ef7
Switch to using a squashed image for the build workers
2015-02-10 15:43:01 -05:00
Joseph Schorr
045614c6c8
Merge branch 'master' into ackbar
2015-02-09 17:16:42 -05:00
Joseph Schorr
6b9464c999
Add support for 0.3 (the new builder version)
2015-02-09 16:59:21 -05:00
Joseph Schorr
9f1ec9d47d
Fix loading of partial caching under a tag
2015-02-09 16:29:15 -05:00
Joseph Schorr
b0e315c332
Fix issues in cache comment comparison
2015-02-09 15:48:36 -05:00
Joseph Schorr
9b0e43514b
Fix typos
2015-02-09 14:53:18 -05:00
Joseph Schorr
384d0eba6f
Fix cache command argument
2015-02-09 14:12:24 -05:00
Joseph Schorr
6cb1212da6
Add logging
2015-02-09 13:54:14 -05:00
Joseph Schorr
4310f47dee
Some code cleanup in the cached tag determination code
2015-02-09 12:16:43 -05:00
Joseph Schorr
0065ac8503
Add back in the cache checking code and remove the old 0.1 build pack code
2015-02-09 12:13:40 -05:00
Joseph Schorr
48949627e0
Merge master in delta
2015-02-09 12:07:43 -05:00
Joseph Schorr
9dfe523615
Merge master changes
2015-02-05 13:11:16 -05:00
Jimmy Zelinskie
c7c5377285
Add my key back to the ephemeral builder machines.
2015-02-05 12:51:02 -05:00
Joseph Schorr
5fedd74399
Remove Jake's key
2015-02-04 21:31:26 -05:00
Jake Moshenko
a952d0b1ce
Merge branch 'master' of github.com:coreos-inc/quay
2015-02-04 11:59:27 -05:00
Jake Moshenko
5b8d65991e
Update the space on the builder nodes because it's cheap.
2015-02-04 11:58:58 -05:00
Joseph Schorr
9ffb53cd47
Add support for v2 of the build worker, which performs the Dockerfile parsing on its own. Note that this version is backwards compatible with v1-beta of the build worker, so it should be pushed first. Also note that this version is temporary until such time as we get the caching branches merged.
2015-02-03 21:05:18 -05:00
Joseph Schorr
a1938593a9
Better handling of retries on build errors
2015-02-03 16:29:47 -05:00
Joseph Schorr
3bf5e93f06
Remove log statement
2015-02-03 16:06:23 -05:00
Joseph Schorr
d709e0f64a
Fix the new notifications code to work
2015-02-03 13:08:38 -05:00
Joseph Schorr
07e85324e9
- Add build notifications back in
- Fix spelling mistake
- Add the sha output as part of the build script
2015-02-03 13:01:42 -05:00
Joseph Schorr
361fb33574
- Add a small build script
- Take in the build worker branch name from config
- Add additional logging (to be removed after we figure out the problem)
2015-02-03 12:48:41 -05:00
Jake Moshenko
2215ec6669
Associate a public IP with the network interfaces on our VPC instances.
2015-02-02 15:28:40 -05:00
Jake Moshenko
db8493f254
Update the executor template to use VPC instances.
2015-02-02 14:55:34 -05:00
Jake Moshenko
3687419ab3
Change a typo to an enum
2015-02-02 12:24:32 -05:00
Jake Moshenko
a4b0c8698d
Allow the key prefixes in etcd to be configurable.
2015-02-02 12:00:19 -05:00
Joseph Schorr
0875d3dce1
Merge branch 'master' of https://github.com/coreos-inc/quay
2015-01-29 18:40:49 -05:00
Joseph Schorr
3872d29de9
Add a transaction around the extend_processing call
2015-01-29 18:40:41 -05:00
Jake Moshenko
fb533a1f4c
Merge branch 'master' of github.com:coreos-inc/quay
2015-01-29 18:40:24 -05:00
Jake Moshenko
8e85ff63f1
Add everyone's ssh keys to the ephemeral build workers.
2015-01-29 18:40:17 -05:00
Jake Moshenko
63d23a04c0
Make the loop pause when we run out of builder capacity.
2015-01-29 18:40:01 -05:00
Joseph Schorr
838bfe23b1
Remove retries update in the extend processing call and make sure it is under a transaction
2015-01-29 18:33:17 -05:00
Joseph Schorr
a6fa08c19c
Change returns to trollius returns
2015-01-29 18:21:32 -05:00
Joseph Schorr
0e5f6dc17d
Fix typo in timed out
2015-01-29 18:13:31 -05:00
Joseph Schorr
60eae43ae4
Add the date time to the log entries
2015-01-29 18:05:05 -05:00
Joseph Schorr
ce3f8b438c
Fix pull credentials bug, fix job details parse bug and add some better logging
2015-01-29 18:01:42 -05:00
Joseph Schorr
7ee00b83cb
Switch to using a CloseForLongOperation around the sleep
2015-01-29 14:50:07 -05:00
Joseph Schorr
cf35da30bc
Make sure to not hold DB connections open in the new build manager
2015-01-29 14:40:24 -05:00
Jake Moshenko
2e86417329
Allow the buildman server to die if an uncaught exception terminates the scheduler process.
2015-01-29 10:56:57 -05:00
Jake Moshenko
c308794063
Fix the enterprise manager to use the new coroutine based interface.
2015-01-29 10:56:18 -05:00
Joseph Schorr
d359c849cd
Add the build worker and job count information to the charts
2015-01-28 17:12:33 -05:00
Jake Moshenko
0ddfd07749
Use the tiny registry-build-worker image. Bind mount in the root certificates so that Quay SSL certificates can be validated.
2015-01-27 14:12:47 -05:00
Jake Moshenko
ef0806bd9d
Make the logs for the build manager more bearable.
2015-01-26 15:27:39 -05:00
Joseph Schorr
be6701b310
Have the builder not start and stop, over and over, if not enabled
2015-01-26 14:13:55 -05:00
Jake Moshenko
86852da4ba
Catch exceptions when ELB times out a connection to etcd.
2015-01-23 11:29:38 -05:00
Jake Moshenko
725808a4f8
Make the logs from the build manager more useful.
2015-01-23 11:29:15 -05:00
Jake Moshenko
265aeabf60
We need to tell the etcd client which protocol to use.
2015-01-22 16:59:04 -05:00
Jake Moshenko
f2471a86f6
Fix the python requirements. Add the ability to map in etcd client certs and ca.
2015-01-22 10:53:23 -05:00
Jake Moshenko
fc757fecad
Tag the EC2 instances with the build uuid.
2015-01-05 15:35:14 -05:00
Jake Moshenko
dd7664328c
Make the build manager ports configurable.
2015-01-05 15:09:03 -05:00
Jake Moshenko
8037962716
Change the severity of a log message which is actually expected in the happy case.
2015-01-05 14:44:54 -05:00
Jake Moshenko
f58b09a064
Remove the loop argument from the call to build_component_ready.
2015-01-05 13:08:25 -05:00
Jake Moshenko
320ae63ccd
Handle the case where there are no realms registered.
2015-01-05 12:23:54 -05:00
Jake Moshenko
b33ee1a474
Register existing builders to watch their expirations.
2015-01-05 11:21:36 -05:00
Jake Moshenko
a9839021af
When the etcd key tracking realms is first created the action is create, not set.
2014-12-31 11:46:02 -05:00
Jake Moshenko
cc70225043
Generalize the ephemeral build managers so that any manager may manage a builder spawned by any other manager.
2014-12-31 11:33:56 -05:00
Jake Moshenko
ccb19571d6
Try lowering the sleep on the shutdown timeout to avoid the service dispatch timeout built into systemd.
2014-12-23 17:42:47 -05:00