Commit graph

138 commits

Author SHA1 Message Date
Brad Ison
2c59bd9ee5 Set builder hostnames to build UUID 2016-11-15 12:35:48 -08:00
Charlton Austin
83e8d62bea Merge pull request #2085 from charltonaustin/move_ephemeral_binary
Moving the binary location.
2016-11-08 11:42:31 -05:00
Brad Ison
a8c0376c06 Set imagePullPolicy to IfNotPresent for k8s builder 2016-11-07 17:20:40 -05:00
Joseph Schorr
c98472e9f3 Debug log all cases where we mark a build as incomplete in the queue
Should help us narrow down why some builds are falling back
2016-11-07 16:13:52 -05:00
Joseph Schorr
ef41e57aad Add executor-specific setup time support
This will allow us to make the setup time TTL for k8s-based builds much lower (on the order of a minute), which means faster timeouts and fallbacks (which is a better user experience).
2016-11-07 15:45:15 -05:00
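A minimal sketch of the idea in this commit, assuming a hypothetical per-executor `SETUP_TIME` config key and a hypothetical global default. None of these names are confirmed by the log, and the build manager's real configuration may look different.

```python
# Hypothetical illustration: the key name, default value, and executor
# names below are assumptions, not taken from the repository.
DEFAULT_SETUP_TIME_SECONDS = 300  # assumed global fallback


def setup_time_for(executor_config, default=DEFAULT_SETUP_TIME_SECONDS):
    """Return the setup-time TTL (in seconds) for a single executor.

    An executor that declares its own value overrides the default, so a
    fast-starting executor can time out and fall back sooner.
    """
    return int(executor_config.get("SETUP_TIME", default))


executors = {
    "kubernetes": {"SETUP_TIME": 60},  # about a minute, as the message suggests
    "ec2": {},                         # no override, uses the default
}
setup_times = {name: setup_time_for(cfg) for name, cfg in executors.items()}
```

Keeping the override optional means executors without an explicit value behave as they did before.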
Charlton Austin
7a2dca9c53 Moving the binary location. 2016-11-04 15:53:43 -04:00
Charlton Austin
bba51787b5 Adding in a new location for the default popen executor. 2016-11-04 14:22:26 -04:00
Joseph Schorr
9f9d32548b Standardize the internal error logs for better tracking 2016-10-31 13:47:24 -04:00
Charlton Austin
0c2fec8314 Fixing the build 2016-10-27 15:10:03 -04:00
Charlton Austin
2147005d2c Adding a method of cancelling a build based on etcd message. 2016-10-25 12:50:58 -04:00
Brad Ison
779f0f1b54 Add emptyDir volume to builder pods to mask secrets
This adds an empty volume on a tmpfs to builder pods and mounts it over
the directory Kubernetes uses for secrets, which should prevent pods
from having access to the default service account.
2016-10-05 14:27:07 -04:00
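A rough sketch of what such a pod change could look like, written as a Python dict transformation. The helper name and volume name are hypothetical, and the mount path is the standard Kubernetes service account location, which is assumed here rather than confirmed by the commit.

```python
import copy

# Standard location where Kubernetes mounts the service account token.
# Assumed to be the directory the commit message refers to.
SERVICE_ACCOUNT_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"


def mask_service_account(pod_spec, volume_name="secrets-mask"):
    """Return a copy of pod_spec with a tmpfs emptyDir mounted over the
    service account directory in every container (hypothetical helper)."""
    spec = copy.deepcopy(pod_spec)
    # emptyDir with medium "Memory" is backed by tmpfs.
    spec.setdefault("volumes", []).append(
        {"name": volume_name, "emptyDir": {"medium": "Memory"}}
    )
    for container in spec.get("containers", []):
        container.setdefault("volumeMounts", []).append(
            {"name": volume_name, "mountPath": SERVICE_ACCOUNT_DIR}
        )
    return spec


original = {"containers": [{"name": "builder", "image": "example/builder"}]}
masked = mask_service_account(original)
```

The input spec is copied rather than mutated, so the same template can be reused for pods that should keep the default mount.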
Brad Ison
087dca3482 Only set memory request on Kubernetes builds
This removes the absolute limits on Kubernetes builds for now (KVM
will still limit resources) and only sets the memory request as a hint
to the scheduler.
2016-10-04 20:42:51 -04:00
Evan Cordell
3542255db8 buildman: let metric data live longer in etcd 2016-10-04 15:06:46 -04:00
Brad Ison
febf3751c0 Merge pull request #1937 from coreos-inc/k8s-resource-limits
Fix kubernetes resource limits
2016-10-04 14:11:46 -04:00
Brad Ison
94a0fee63f Merge pull request #1916 from coreos-inc/k8s-generate-name
Add a dash to generated k8s job names
2016-10-04 11:56:33 -04:00
Brad Ison
cee7c4be96 Fix kubernetes resource limits 2016-10-04 11:56:06 -04:00
Evan Cordell
943a20f042 buildman: linter fixes 2016-10-04 11:44:31 -04:00
Evan Cordell
f3091c6424 Fix the metrics 2016-10-03 17:53:40 -04:00
Evan Cordell
42ebb0a6c3 Record metrics in a separate etcd record 2016-10-03 16:11:37 -04:00
Evan Cordell
d99c206b47 Fix build time metric 2016-10-01 17:25:13 -04:00
Brad Ison
d8aa22103e Add a dash to generated k8s job names 2016-10-01 14:02:28 -04:00
Evan Cordell
07e23a34ed Fix metrics 2016-09-30 13:45:45 -04:00
Evan Cordell
68c5384473 Fixes prometheus start metric 2016-09-30 13:09:03 -04:00
josephschorr
fa4588c7d9 Merge pull request #1908 from coreos-inc/fix-build-phase
Add missing call to set_phase when a build doesn't start
2016-09-30 17:52:39 +02:00
josephschorr
0c2b4ed9c1 Merge pull request #1897 from coreos-inc/hash-executor-whitelist
Add hash-based staged rollout to build executors
2016-09-30 17:52:19 +02:00
Joseph Schorr
f50bb8a1ce Add missing call to set_phase when a build doesn't start
This change fixes the build manager ephemeral executor to tell the overall build server to call set_phase when a build never starts. Before this change, we'd properly adjust the queue item, but not the repo build row or the logs, which is why users just saw "Preparing Build Node", with no indication that the node failed to start.

Fixes #1904
2016-09-30 14:54:49 +02:00
Joseph Schorr
51a519f653 Add hash-based staged rollout to build executors
Fixes #1882
2016-09-29 22:48:42 +02:00
Evan Cordell
832ee89923 Add duration metric collector decorator (#1885)
Track time-to-start for builders
Track time-to-build for builders
Track ec2 builder fallbacks
Track build time
2016-09-29 15:44:06 -04:00
Brad Ison
593c3eb9c7 Set dnsPolicy to Default on k8s build jobs
This prevents the builder pods from having resolv.conf pointed at the
kube-dns service, which they won't have access to.
2016-09-29 11:22:11 -04:00
Brad Ison
631ad0422d Default to 4GB memory for k8s builders 2016-09-29 11:20:49 -04:00
josephschorr
ad4efba802 Merge pull request #1830 from coreos-inc/superuser-dashboard
Add prometheus stats to enable better dashboarding
2016-09-26 17:19:22 +02:00
Joseph Schorr
1571b2867a Add executor name to the build metric 2016-09-16 16:26:04 -04:00
Joseph Schorr
f9f60b9faf Fix some issues around state in the build managers
- Make sure to cleanup the job if the executor could not be started
- Change the setup leeway to further ensure there isn't any crossover between the queue item timing out and the cleanup of the jobs
- Make the lock used for marking jobs as internal error extremely long, but also based on the execution ID. This should ensure we don't get duplicates while allowing different executions to be handled properly.
- Make sure to invoke the callback update for the queue before we run off to etcd; should reduce certain timeouts

Hopefully Fixes #1836
2016-09-15 14:37:45 -04:00
Brad Ison
2a1cf2bfd1 Always pull latest image in k8s builds 2016-09-08 15:00:12 -04:00
Joseph Schorr
e67b95ae04 Change log level of an expected log message 2016-08-31 17:25:54 -04:00
Joseph Schorr
e17e0e4172 Add log for when the job key is written 2016-08-30 14:08:56 -04:00
Joseph Schorr
292abb5395 Better handling and logging of exceptions in build manager
Also increases the setup timeout for EC2
2016-08-30 13:52:36 -04:00
Joseph Schorr
cd2d0341a7 Fix k8s builder to use the declared volume size
Fixes #1773
2016-08-29 15:16:28 -04:00
Joseph Schorr
bc670611ef Increase the timeout on the atomic lock
Some nodes were still performing the action twice when falling outside of the 30s window
2016-08-23 12:50:38 -04:00
Joseph Schorr
3112388004 Fix multiple reporting of incomplete 2016-08-17 16:01:28 -04:00
Joseph Schorr
5e1a117ff3 Delete the job first to prevent Kubernetes from starting another pod 2016-08-16 16:33:43 -04:00
Joseph Schorr
742e153133 Fix watch of the jobs key in the build manager 2016-08-16 15:43:09 -04:00
Joseph Schorr
313d65a6a4 Make sure the etcd watch coroutines get called 2016-08-16 13:02:27 -04:00
Joseph Schorr
d78361b041 Cleanup old executions that never start
Fixes #1727
2016-08-15 16:54:02 -04:00
Joseph Schorr
c29f9ccc7f Fix TTL on heartbeat in etcd
Until now, once the heartbeat had expired, we would issue a negative TTL, which causes etcd to either raise an exception or simply ignore the expiration (depending on the version of etcd). This change ensures that once the key is expired, it is removed immediately by setting a TTL of 0. Also adds tests for this case and the normal expiration case.
2016-08-03 11:15:03 -04:00
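The clamping described in this commit can be sketched as follows. The function name and signature are hypothetical; only the behavior (never emit a negative TTL, use 0 once expired) comes from the message.

```python
import time


def heartbeat_ttl(expiration_timestamp, now=None):
    """Return a non-negative TTL in whole seconds (hypothetical helper).

    Once the expiration is in the past the result is 0 rather than a
    negative number, which the commit message says removes the key
    immediately.
    """
    now = time.time() if now is None else now
    return max(0, int(expiration_timestamp - now))
```

Passing `now` explicitly makes both the expired case and the normal case easy to test without waiting on the clock.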
Joseph Schorr
428a7cb435 Fix decreased setup timeout on ephemeral build manager 2016-07-22 13:35:38 -04:00
Joseph Schorr
392242d20b Another fix for the record keeping in buildman
Adds some more mocked tests as well
2016-07-22 12:01:30 -04:00
Joseph Schorr
68baa51d55 Fix cross-manager handling of realm components 2016-07-21 15:47:25 -04:00
Joseph Schorr
4420b1bac9 Add temporary back-compat shims for the build manager 2016-07-20 13:41:01 -04:00
Joseph Schorr
2c1880b944 Bug fixes, refactoring and "new" tests for the build manager
- Fixes various bugs introduced in the most recent build system commit
- Refactors state management in the build manager to be cleaner and more contained
- Adds back in the mock-based tests, fixed to not use threads and adjusted for the refactoring
- Adds some more simplified unit tests around non-etcd-related flows
2016-07-18 13:46:48 -04:00