Joseph Schorr
6655c7f745
Add exception handling that doesn't log the read-timeout exception
Note: This is a *hack* and needs to be replaced with proper code ASAP
2015-06-25 23:35:29 -04:00
Joseph Schorr
6e6610f31a
Switch to a 30s maximum timeout
2015-06-25 23:08:49 -04:00
Joseph Schorr
ecebc06343
Update comment now that restarter is abstracted
2015-06-25 21:53:42 -04:00
Joseph Schorr
9f5f71398c
Abstract out the concept of a restart function
2015-06-25 21:40:50 -04:00
Joseph Schorr
52fa9aad5b
Fix etcd watching
Etcd can miss events on watches if they occur fast enough, so if we get an exception indicating that we've missed an index, we reset the state of our local tracking structures by re-reading the *full* list and starting a new watch at HEAD
2015-06-25 21:22:39 -04:00
Joseph Schorr
b4c39e8ec0
Fix ephemeral build manager to ask for watches in index order with no gaps
2015-06-23 17:11:46 -04:00
Joseph Schorr
c2dc1c9b75
Handle case where etcd key is already removed on job complete
2015-06-17 15:02:58 -04:00
Jake Moshenko
c435f5c127
Add a comment about why we are taking a lock when terminating a builder machine.
2015-06-10 16:19:51 -04:00
Jake Moshenko
f767fc4d03
Track whether builders ever came online in etcd. Mark builds which never successfully heartbeated as incomplete.
2015-06-10 16:19:51 -04:00
Jake Moshenko
884fedd229
Improve the log messages in the buildman.
2015-06-10 16:19:51 -04:00
Jake Moshenko
d31e25d5cd
Allow the individual build manager types to specify how long the queue should wait before retrying a job that fails to schedule.
2015-06-10 16:19:50 -04:00
Joseph Schorr
f82831bff6
Log the etcd exception so we can debug this issue
2015-06-09 20:33:55 -04:00
Jake Moshenko
6eead7c860
Add logentries reporting to the ephemeral builders.
2015-03-27 15:28:08 -04:00
Jake Moshenko
0349f3f1a3
Handle the case where YAML config returns a list not a tuple.
2015-03-26 14:53:56 -04:00
Jimmy Zelinskie
8589871f43
buildman: rm unused imports
2015-03-09 13:04:16 -04:00
Jake Moshenko
5c68e52fce
Really really fix the exception handling.
2015-02-27 17:33:46 -05:00
Jake Moshenko
cf5bc6f0be
Properly catch multiple exceptions.
2015-02-27 17:32:10 -05:00
Jake Moshenko
857c3e2959
Start catching etcd key errors as well.
2015-02-27 17:10:15 -05:00
Joseph Schorr
4551b3a957
Remove the boto timeout set (doesn't work anyway) and add some better logging to the scheduler
2015-02-25 16:00:14 -05:00
Joseph Schorr
5dd78f76c7
Add additional logging, timeouts, and exception checks
2015-02-25 15:15:22 -05:00
Joseph Schorr
2eaec092f0
Handle the case where we cannot write the tags on the build nodes
2015-02-25 13:47:36 -05:00
Joseph Schorr
afe7e14254
Add better exception handling and logging to the ephemeral build manager
2015-02-25 12:09:14 -05:00
Joseph Schorr
524705b88c
Get dashboard working and upgrade bootstrap. Note: the bootstrap fixes will be coming in the follow-up CL
2015-02-17 19:15:54 -05:00
Joseph Schorr
98b4f62ef7
Switch to using a squashed image for the build workers
2015-02-10 15:43:01 -05:00
Jake Moshenko
5b8d65991e
Update the space on the builder nodes because it's cheap.
2015-02-04 11:58:58 -05:00
Joseph Schorr
361fb33574
- Add a small build script
- Take in the build worker branch name from config
- Add additional logging (to be removed after we figure out the problem)
2015-02-03 12:48:41 -05:00
Jake Moshenko
2215ec6669
Associate a public IP with the network interfaces on our VPC instances.
2015-02-02 15:28:40 -05:00
Jake Moshenko
db8493f254
Update the executor template to use VPC instances.
2015-02-02 14:55:34 -05:00
Jake Moshenko
a4b0c8698d
Allow the key prefixes in etcd to be configurable.
2015-02-02 12:00:19 -05:00
Jake Moshenko
c308794063
Fix the enterprise manager to use the new coroutine based interface.
2015-01-29 10:56:18 -05:00
Jake Moshenko
ef0806bd9d
Make the logs for the build manager more bearable.
2015-01-26 15:27:39 -05:00
Jake Moshenko
86852da4ba
Catch exceptions when ELB times out a connection to etcd.
2015-01-23 11:29:38 -05:00
Jake Moshenko
265aeabf60
We need to tell the etcd client which protocol to use.
2015-01-22 16:59:04 -05:00
Jake Moshenko
f2471a86f6
Fix the python requirements. Add the ability to map in etcd client certs and ca.
2015-01-22 10:53:23 -05:00
Jake Moshenko
fc757fecad
Tag the EC2 instances with the build uuid.
2015-01-05 15:35:14 -05:00
Jake Moshenko
8037962716
Change the severity of a log message which is actually expected in the happy case.
2015-01-05 14:44:54 -05:00
Jake Moshenko
f58b09a064
Remove the loop argument from the call to build_component_ready.
2015-01-05 13:08:25 -05:00
Jake Moshenko
320ae63ccd
Handle the case where there are no realms registered.
2015-01-05 12:23:54 -05:00
Jake Moshenko
b33ee1a474
Register existing builders to watch their expirations.
2015-01-05 11:21:36 -05:00
Jake Moshenko
a9839021af
When the etcd key tracking realms is first created the action is create, not set.
2014-12-31 11:46:02 -05:00
Jake Moshenko
cc70225043
Generalize the ephemeral build managers so that any manager may manage a builder spawned by any other manager.
2014-12-31 11:33:56 -05:00
Jake Moshenko
ec87e37d8c
EC2 terminate_instances does not take a force flag.
2014-12-23 17:17:53 -05:00
Jake Moshenko
cece94e1da
We want to terminate instances, not stop them.
2014-12-23 16:20:42 -05:00
Jake Moshenko
3ce64b4a7f
We must yield from stop_builder.
2014-12-23 16:12:10 -05:00
Jake Moshenko
8e16fbf59b
The root device on CoreOS is /dev/xvda.
2014-12-23 15:41:58 -05:00
Jake Moshenko
2f2a88825d
Try using SSD for root volumes.
2014-12-23 15:35:21 -05:00
Jake Moshenko
723fb27671
Calls to the ec2 service must be async, and responses must be wrapped as well.
2014-12-23 14:54:58 -05:00
Jake Moshenko
2ed9b3d243
Disable the etcd timeout on watch calls to prevent them from disconnecting the client.
2014-12-23 14:54:34 -05:00
Jake Moshenko
4e22e22ba1
We have to serialize our build data before sending it to etcd.
2014-12-23 14:09:04 -05:00
Jake Moshenko
709e571b78
Handle read timeouts from etcd when watching a key.
2014-12-23 12:13:49 -05:00