Jake Moshenko
f0ef4347e5
Make the redis client use AsyncWrapper and coroutines
...
Change all log messages to be synchronous
2016-11-18 15:59:14 -05:00
Charlton Austin
96173485f8
Merge pull request #2041 from charltonaustin/add_cancel_to_building_build
...
Adding in the behavior for cancelling a build while it is being built.
2016-11-18 11:02:37 -05:00
Charlton Austin
fd7c566d31
Adding in cancel for a build that is building.
2016-11-16 17:40:24 -05:00
Brad Ison
2c59bd9ee5
Set builder hostnames to build UUID
2016-11-15 12:35:48 -08:00
Charlton Austin
83e8d62bea
Merge pull request #2085 from charltonaustin/move_ephemeral_binary
...
Moving the binary location.
2016-11-08 11:42:31 -05:00
Brad Ison
a8c0376c06
Set imagePullPolicy to IfNotPresent for k8s builder
2016-11-07 17:20:40 -05:00
Joseph Schorr
c98472e9f3
Debug log all cases where we mark a build as incomplete in the queue
...
Should help us narrow down why some builds are falling back
2016-11-07 16:13:52 -05:00
Joseph Schorr
ef41e57aad
Add executor-specific setup time support
...
This will allow us to make the setup time TTL for k8s-based builds much lower (on the order of a minute), which means faster timeouts and fallbacks (which is a better user experience).
2016-11-07 15:45:15 -05:00
Charlton Austin
7a2dca9c53
Moving the binary location.
2016-11-04 15:53:43 -04:00
Charlton Austin
bba51787b5
Adding in a new location for the default popen executor.
2016-11-04 14:22:26 -04:00
Joseph Schorr
9f9d32548b
Standardize the internal error logs for better tracking
2016-10-31 13:47:24 -04:00
Charlton Austin
0c2fec8314
Fixing the build
2016-10-27 15:10:03 -04:00
Charlton Austin
2147005d2c
Adding a method of cancelling a build based on etcd message.
2016-10-25 12:50:58 -04:00
Brad Ison
779f0f1b54
Add emptyDir volume to builder pods to mask secrets
...
This adds an empty volume on a tmpfs to builder pods and mounts it over
the directory Kubernetes uses for secrets, which should prevent pods
from having access to the default service account.
2016-10-05 14:27:07 -04:00
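The masking approach described in 779f0f1b54 can be sketched roughly as below. This is an illustration only, not the code from the commit: the helper name `mask_service_account` and the volume name `secrets-mask` are hypothetical, and the mount path is assumed to be the standard Kubernetes service account directory.

```python
import copy

# Assumption: the standard location where Kubernetes mounts service account secrets.
SERVICE_ACCOUNT_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"


def mask_service_account(pod_spec, volume_name="secrets-mask"):
    """Return a copy of pod_spec with a tmpfs emptyDir mounted over the secrets dir.

    Hypothetical helper: pod_spec is a plain dict shaped like a Kubernetes pod spec.
    """
    spec = copy.deepcopy(pod_spec)
    # An emptyDir with medium "Memory" is backed by tmpfs.
    spec.setdefault("volumes", []).append(
        {"name": volume_name, "emptyDir": {"medium": "Memory"}}
    )
    # Mount the empty volume over the secrets directory in every container.
    for container in spec.get("containers", []):
        container.setdefault("volumeMounts", []).append(
            {"name": volume_name, "mountPath": SERVICE_ACCOUNT_DIR}
        )
    return spec
```

Mounting an empty volume over the directory hides whatever would otherwise appear there, so the builder container sees no service account token.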
Brad Ison
087dca3482
Only set memory request on Kubernetes builds
...
This removes the absolute limits on Kubernetes builds for now (KVM
will still limit resources) and only sets the memory request as a hint
to the scheduler.
2016-10-04 20:42:51 -04:00
Evan Cordell
3542255db8
buildman: let metric data live longer in etcd
2016-10-04 15:06:46 -04:00
Brad Ison
febf3751c0
Merge pull request #1937 from coreos-inc/k8s-resource-limits
...
Fix kubernetes resource limits
2016-10-04 14:11:46 -04:00
Brad Ison
94a0fee63f
Merge pull request #1916 from coreos-inc/k8s-generate-name
...
Add a dash to generated k8s job names
2016-10-04 11:56:33 -04:00
Brad Ison
cee7c4be96
Fix kubernetes resource limits
2016-10-04 11:56:06 -04:00
Evan Cordell
943a20f042
buildman: linter fixes
2016-10-04 11:44:31 -04:00
Evan Cordell
f3091c6424
Fix the metrics
2016-10-03 17:53:40 -04:00
Evan Cordell
42ebb0a6c3
Record metrics in a separate etcd record
2016-10-03 16:11:37 -04:00
Evan Cordell
d99c206b47
Fix build time metric
2016-10-01 17:25:13 -04:00
Brad Ison
d8aa22103e
Add a dash to generated k8s job names
2016-10-01 14:02:28 -04:00
Evan Cordell
07e23a34ed
Fix metrics
2016-09-30 13:45:45 -04:00
Evan Cordell
68c5384473
Fixes prometheus start metric
2016-09-30 13:09:03 -04:00
josephschorr
fa4588c7d9
Merge pull request #1908 from coreos-inc/fix-build-phase
...
Add missing call to set_phase when a build doesn't start
2016-09-30 17:52:39 +02:00
josephschorr
0c2b4ed9c1
Merge pull request #1897 from coreos-inc/hash-executor-whitelist
...
Add hash-based staged rollout to build executors
2016-09-30 17:52:19 +02:00
Joseph Schorr
f50bb8a1ce
Add missing call to set_phase when a build doesn't start
...
This change fixes the build manager ephemeral executor to tell the overall build server to call set_phase when a build never starts. Before this change, we'd properly adjust the queue item, but not the repo build row or the logs, which is why users just saw "Preparing Build Node", with no indication that the node failed to start.
Fixes #1904
2016-09-30 14:54:49 +02:00
Joseph Schorr
51a519f653
Add hash-based staged rollout to build executors
...
Fixes #1882
2016-09-29 22:48:42 +02:00
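The log does not describe how 51a519f653 implements the rollout, so the following is only a generic sketch of hash-based staged rollout. The function name `in_rollout`, the choice of SHA-256, and the use of a namespace string as the key are all assumptions made for illustration.

```python
import hashlib


def in_rollout(namespace, percentage):
    """Return True if namespace falls inside the first `percentage` of 100 buckets.

    Hypothetical helper: hashing keeps the decision stable for a given
    namespace, so raising the percentage only ever adds namespaces.
    """
    digest = hashlib.sha256(namespace.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return bucket < percentage
```

Because the bucket depends only on the key, a namespace admitted at 10% stays admitted at 20%, which is what makes the rollout staged rather than random per build.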
Evan Cordell
832ee89923
Add duration metric collector decorator (#1885)
...
Track time-to-start for builders
Track time-to-build for builders
Track ec2 builder fallbacks
Track build time
2016-09-29 15:44:06 -04:00
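A duration-collecting decorator of the kind named in 832ee89923 might look like the sketch below. It is synchronous and reports to a plain callback; the names `duration_collector` and `record` are hypothetical, and the actual commit may report to a metrics backend and wrap coroutines instead.

```python
import functools
import time


def duration_collector(record):
    """Build a decorator that reports how long each call takes.

    Hypothetical sketch: `record` is any callable accepting (name, seconds).
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                return func(*args, **kwargs)
            finally:
                # Report the duration even when the wrapped call raises.
                record(func.__name__, time.monotonic() - start)
        return wrapper
    return decorator
```

Recording in a `finally` clause means failed builds are timed as well as successful ones.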
Brad Ison
593c3eb9c7
Set dnsPolicy to Default on k8s build jobs
...
This prevents the builder pods from having resolv.conf pointed at the
kube-dns service, which they won't have access to.
2016-09-29 11:22:11 -04:00
Brad Ison
631ad0422d
Default to 4GB memory for k8s builders
2016-09-29 11:20:49 -04:00
josephschorr
ad4efba802
Merge pull request #1830 from coreos-inc/superuser-dashboard
...
Add prometheus stats to enable better dashboarding
2016-09-26 17:19:22 +02:00
Joseph Schorr
1571b2867a
Add executor name to the build metric
2016-09-16 16:26:04 -04:00
Joseph Schorr
f9f60b9faf
Fix some issues around state in the build managers
...
- Make sure to cleanup the job if the executor could not be started
- Change the setup leeway to further ensure there isn't any crossover between the queue item timing out and the cleanup of the jobs
- Make the lock used for marking jobs as internal error extremely long, but also based on the execution ID. This should ensure we don't get duplicates while allowing different executions to be handled properly.
- Make sure to invoke the callback update for the queue before we run off to etcd; should reduce certain timeouts
Hopefully Fixes #1836
2016-09-15 14:37:45 -04:00
Brad Ison
2a1cf2bfd1
Always pull latest image in k8s builds
2016-09-08 15:00:12 -04:00
Joseph Schorr
e67b95ae04
Change log level of an expected log message
2016-08-31 17:25:54 -04:00
Joseph Schorr
e17e0e4172
Add log for when the job key is written
2016-08-30 14:08:56 -04:00
Joseph Schorr
292abb5395
Better handling and logging of exceptions in build manager
...
Also increases the setup timeout for EC2
2016-08-30 13:52:36 -04:00
Joseph Schorr
cd2d0341a7
Fix k8s builder to use the declared volume size
...
Fixes #1773
2016-08-29 15:16:28 -04:00
Joseph Schorr
bc670611ef
Increase the timeout on the atomic lock
...
Some nodes were still performing the action twice when falling outside of the 30s window
2016-08-23 12:50:38 -04:00
Joseph Schorr
3112388004
Fix multiple reporting of incomplete
2016-08-17 16:01:28 -04:00
Joseph Schorr
5e1a117ff3
Delete the job first to prevent Kubernetes from starting another pod
2016-08-16 16:33:43 -04:00
Joseph Schorr
742e153133
Fix watch of the jobs key in the build manager
2016-08-16 15:43:09 -04:00
Joseph Schorr
313d65a6a4
Make sure the etcd watch coroutines get called
2016-08-16 13:02:27 -04:00
Joseph Schorr
d78361b041
Cleanup old executions that never start
...
Fixes #1727
2016-08-15 16:54:02 -04:00
Joseph Schorr
c29f9ccc7f
Fix TTL on heartbeat in etcd
...
Until now, once the heartbeat had expired, we would issue a negative TTL, which causes etcd to either raise an exception or simply ignore the expiration (depending on the version of etcd). This change ensures that once the key is expired, it is removed immediately by setting a TTL of 0. Also adds tests for this case and the normal expiration case.
2016-08-03 11:15:03 -04:00
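The TTL handling described in c29f9ccc7f can be illustrated with a small sketch. The helper name `heartbeat_ttl` is hypothetical and this is not the build manager's actual code; it only shows clamping the remaining time to zero so that a negative TTL is never issued.

```python
from datetime import datetime, timedelta


def heartbeat_ttl(expiration, now):
    """Return a non-negative TTL in whole seconds for a heartbeat key.

    Hypothetical helper: an already-expired key yields 0, so the key is
    removed immediately instead of being written with a negative TTL.
    """
    remaining = int((expiration - now).total_seconds())
    return max(0, remaining)
```

For a key that expires 30 seconds from now the helper returns 30; for a key that expired 5 seconds ago it returns 0.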
Joseph Schorr
428a7cb435
Fix decreased setup timeout on ephemeral build manager
2016-07-22 13:35:38 -04:00
Joseph Schorr
392242d20b
Another fix for the record keeping in buildman
...
Adds some more mocked tests as well
2016-07-22 12:01:30 -04:00