154 commits

Author SHA1 Message Date
Joseph Schorr
2d6a6a1f6c Add a timeout to various operations against etcd in the build manager when it cannot connect to etcd
This ensures that the build managers don't simply sit there thrashing against a nonexistent cluster, which drives up CPU usage on our production nodes and takes them out of service

Addresses https://jira.coreos.com/browse/QUAY-990
2018-07-08 12:25:33 +03:00
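A minimal sketch of the timeout idea described in commit 2d6a6a1f6c, assuming an asyncio-style client. The names `call_with_timeout`, `EtcdUnavailable`, `fast_read`, and `slow_read` are hypothetical and are not taken from the build manager's code; the actual change may bound its etcd calls differently.

```python
import asyncio


class EtcdUnavailable(Exception):
    """Hypothetical error raised when an etcd call does not finish in time."""


async def call_with_timeout(make_call, timeout_seconds):
    """Await the coroutine returned by make_call(), giving up after timeout_seconds."""
    try:
        return await asyncio.wait_for(make_call(), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        # Fail fast instead of retrying forever against an unreachable cluster.
        raise EtcdUnavailable(
            "etcd did not respond within %ss" % timeout_seconds
        ) from None


async def fast_read():
    # Stand-in for an etcd read that answers immediately.
    return "value"


async def slow_read():
    # Stand-in for an etcd read against a cluster that never answers.
    await asyncio.sleep(10)
    return "value"
```

With this shape, a caller can catch `EtcdUnavailable` and back off, rather than spinning on the connection.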
Joseph Schorr
400a5db719 Add additional metrics on executor start and failure
This will allow us to register a pager if one of the executors starts failing consistently
2017-11-27 11:52:37 +02:00
Joseph Schorr
ddb1ed7441 Also delete the job key when expiring a job
Otherwise, we can't requeue the job
2017-10-11 15:55:35 -04:00
Joseph Schorr
c799367ac4 Make sure expired startup marks build jobs incomplete immediately
Currently, we wait for the job to expire, which can take a very long time. This also adds more logging in an attempt to track down the root cause
2017-10-11 14:56:19 -04:00
Antoine Legrand
cdb3722c17 Use $QUAYPATH and $QUAYDIR in conf and init files 2017-07-05 16:23:54 +02:00
Charlton Austin
e30cd931d1 feat(buildtrigger): allow users to specify dockerfile
Previously, users could only specify the folder, and the
  dockerfile had to be named "Dockerfile". This allows
  users to specify the file itself, which can be called
  "Dockerfile" or <some name>.Dockerfile
2017-03-06 21:20:17 -05:00
EvB
cedce6f98b fix(buildman/ephemeral): remove exception log on noncritical error 2017-02-09 11:32:41 -08:00
Joseph Schorr
b407f88a26 Remove unnecessary CloudWatch metrics
They are spamming the API and costing us a lot of money
2017-02-01 13:08:21 -05:00
Evan Cordell
dd5f7cbe6c Fix the ephemeral build metrics 2016-12-13 18:28:04 -05:00
Charlton Austin
c6be12e31e Adding in a cancel method to the build component so we can properly clean up the job task. 2016-12-06 13:37:49 -05:00
Charlton Austin
8157c9cf33 Fixing the ttl on etcd. 2016-12-02 13:27:47 -05:00
Charlton Austin
0c7a2e4645 Removing realm key from etcd. 2016-12-02 11:37:56 -05:00
Charlton Austin
8ec14ac3bd Adding in a delete of the etcd key for cancelled jobs. 2016-12-01 16:03:54 -05:00
Jake Moshenko
f0ef4347e5 Make the redis client use AsyncWrapper and coroutines
Change all log messages to be synchronous
2016-11-18 15:59:14 -05:00
Charlton Austin
96173485f8 Merge pull request #2041 from charltonaustin/add_cancel_to_building_build
Adding in the behavior for cancelling a build while it is being built.
2016-11-18 11:02:37 -05:00
Charlton Austin
fd7c566d31 Adding in cancel for a build that is building. 2016-11-16 17:40:24 -05:00
Brad Ison
2c59bd9ee5 Set builder hostnames to build UUID 2016-11-15 12:35:48 -08:00
Charlton Austin
83e8d62bea Merge pull request #2085 from charltonaustin/move_ephemeral_binary
Moving the binary location.
2016-11-08 11:42:31 -05:00
Brad Ison
a8c0376c06 Set imagePullPolicy to IfNotPresent for k8s builder 2016-11-07 17:20:40 -05:00
Joseph Schorr
c98472e9f3 Debug log all cases where we mark a build as incomplete in the queue
Should help us narrow down why some builds are falling back
2016-11-07 16:13:52 -05:00
Joseph Schorr
ef41e57aad Add executor-specific setup time support
This will allow us to make the setup time TTL for k8s-based builds much lower (on the order of a minute), which means faster timeouts and fallbacks (which is a better user experience).
2016-11-07 15:45:15 -05:00
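One way to picture the executor-specific setup time from commit ef41e57aad is a per-executor override with a global fallback. This is only a sketch: the `SETUP_TIME` and `EXECUTOR` keys, the default of 500 seconds, and the function name are assumptions for illustration, not values confirmed by the commit.

```python
DEFAULT_SETUP_TIME_SECONDS = 500  # hypothetical global default


def setup_time_for(executor_config, default=DEFAULT_SETUP_TIME_SECONDS):
    """Return the setup-time TTL in seconds for one executor.

    An executor-specific value wins; otherwise the global default applies.
    """
    return int(executor_config.get("SETUP_TIME", default))


# Hypothetical configs: a Kubernetes executor with a one-minute TTL and an
# EC2 executor that keeps the default.
k8s_executor = {"EXECUTOR": "kubernetes", "SETUP_TIME": 60}
ec2_executor = {"EXECUTOR": "ec2"}
```

A shorter TTL for the Kubernetes executor means a build that never starts there is noticed, and can fall back, sooner.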
Charlton Austin
7a2dca9c53 Moving the binary location. 2016-11-04 15:53:43 -04:00
Charlton Austin
bba51787b5 Adding in a new location for the default popen executor. 2016-11-04 14:22:26 -04:00
Joseph Schorr
9f9d32548b Standardize the internal error logs for better tracking 2016-10-31 13:47:24 -04:00
Charlton Austin
0c2fec8314 Fixing the build 2016-10-27 15:10:03 -04:00
Charlton Austin
2147005d2c Adding a method of cancelling a build based on etcd message. 2016-10-25 12:50:58 -04:00
Brad Ison
779f0f1b54 Add emptyDir volume to builder pods to mask secrets
This adds an empty volume on a tmpfs to builder pods and mounts it over
the directory Kubernetes uses for secrets, which should prevent pods
from having access to the default service account.
2016-10-05 14:27:07 -04:00
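The masking approach in commit 779f0f1b54 can be sketched as a transformation on a pod spec expressed as a Python dict. The helper name, the volume name, and the choice of mount path are assumptions: the path below is the default location of the service account token in Kubernetes pods, but the commit does not state which directory it mounts over.

```python
import copy

# Default mount point of the service account token in Kubernetes pods
# (assumed here to be the directory the commit refers to).
SECRETS_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"


def mask_service_account(pod_spec, volume_name="secrets-mask"):
    """Return a copy of pod_spec with a tmpfs emptyDir mounted over SECRETS_DIR."""
    spec = copy.deepcopy(pod_spec)
    # medium "Memory" asks Kubernetes to back the emptyDir with tmpfs.
    spec.setdefault("volumes", []).append(
        {"name": volume_name, "emptyDir": {"medium": "Memory"}}
    )
    for container in spec.get("containers", []):
        container.setdefault("volumeMounts", []).append(
            {"name": volume_name, "mountPath": SECRETS_DIR}
        )
    return spec
```

The function copies its input so that the caller's original spec is left untouched.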
Brad Ison
087dca3482 Only set memory request on Kubernetes builds
This removes the absolute limits on Kubernetes builds for now (KVM
will still limit resources) and only sets the memory request as a hint
to the scheduler.
2016-10-04 20:42:51 -04:00
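As an illustration of what commit 087dca3482 describes, a container `resources` block can carry a memory request and no `limits` key at all. The helper name and the `4Gi` value are illustrative assumptions; the commit itself does not give a figure.

```python
def builder_resources(memory="4Gi"):
    """Build a container resources block with a memory request and no limits.

    The request is a scheduling hint only; without a "limits" entry,
    Kubernetes does not cap the container's memory.
    """
    return {"requests": {"memory": memory}}


# Hypothetical builder container using the block above.
builder_container = {
    "name": "builder",
    "resources": builder_resources(),
}
```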
Evan Cordell
3542255db8 buildman: let metric data live longer in etcd 2016-10-04 15:06:46 -04:00
Brad Ison
febf3751c0 Merge pull request #1937 from coreos-inc/k8s-resource-limits
Fix kubernetes resource limits
2016-10-04 14:11:46 -04:00
Brad Ison
94a0fee63f Merge pull request #1916 from coreos-inc/k8s-generate-name
Add a dash to generated k8s job names
2016-10-04 11:56:33 -04:00
Brad Ison
cee7c4be96 Fix kubernetes resource limits 2016-10-04 11:56:06 -04:00
Evan Cordell
943a20f042 buildman: linter fixes 2016-10-04 11:44:31 -04:00
Evan Cordell
f3091c6424 Fix the metrics 2016-10-03 17:53:40 -04:00
Evan Cordell
42ebb0a6c3 Record metrics in a separate etcd record 2016-10-03 16:11:37 -04:00
Evan Cordell
d99c206b47 Fix build time metric 2016-10-01 17:25:13 -04:00
Brad Ison
d8aa22103e Add a dash to generated k8s job names 2016-10-01 14:02:28 -04:00
Evan Cordell
07e23a34ed Fix metrics 2016-09-30 13:45:45 -04:00
Evan Cordell
68c5384473 Fixes prometheus start metric 2016-09-30 13:09:03 -04:00
josephschorr
fa4588c7d9 Merge pull request #1908 from coreos-inc/fix-build-phase
Add missing call to set_phase when a build doesn't start
2016-09-30 17:52:39 +02:00
josephschorr
0c2b4ed9c1 Merge pull request #1897 from coreos-inc/hash-executor-whitelist
Add hash-based staged rollout to build executors
2016-09-30 17:52:19 +02:00
Joseph Schorr
f50bb8a1ce Add missing call to set_phase when a build doesn't start
This change fixes the build manager ephemeral executor to tell the overall build server to call set_phase when a build never starts. Before this change, we'd properly adjust the queue item, but not the repo build row or the logs, which is why users just saw "Preparing Build Node", with no indication that the node failed to start.

Fixes #1904
2016-09-30 14:54:49 +02:00
Joseph Schorr
51a519f653 Add hash-based staged rollout to build executors
Fixes #1882
2016-09-29 22:48:42 +02:00
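A hash-based staged rollout, as named in commit 51a519f653, is commonly built by hashing a stable key into a fixed number of buckets and enabling the feature for buckets below a threshold. The sketch below follows that general pattern; the choice of SHA-256, the 100 buckets, the function names, and the idea of keying on a namespace are all assumptions, not details from the commit.

```python
import hashlib


def rollout_bucket(key):
    """Map a key (for example, a namespace name) to a stable bucket in 0..99."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def in_rollout(key, percent):
    """Return True if key falls inside the first `percent` buckets."""
    return rollout_bucket(key) < percent
```

Because the bucket depends only on the key, raising `percent` only ever adds keys to the rollout; a key that was already included stays included.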
Evan Cordell
832ee89923 Add duration metric collector decorator (#1885)
Track time-to-start for builders
Track time-to-build for builders
Track ec2 builder fallbacks
Track build time
2016-09-29 15:44:06 -04:00
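The decorator in commit 832ee89923 is not shown here, but the general shape of a duration-collecting decorator can be sketched as follows. The `duration_collector` name, the `record` callback signature, and the metric name are hypothetical; the real implementation may report to Prometheus or another backend in a different way.

```python
import functools
import time


def duration_collector(record, metric_name):
    """Decorate a function so that its wall-clock duration is passed to record().

    record is any callable taking (metric_name, seconds).
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the duration even if the wrapped function raises.
                record(metric_name, time.monotonic() - start)
        return wrapper
    return decorator


# Usage with an in-memory collector standing in for a metrics client.
samples = []


@duration_collector(lambda name, seconds: samples.append((name, seconds)),
                    "builder_time_to_start")
def start_builder():
    return "started"
```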
Brad Ison
593c3eb9c7 Set dnsPolicy to Default on k8s build jobs
This prevents the builder pods from having resolv.conf pointed at the
kube-dns service, which they won't have access to.
2016-09-29 11:22:11 -04:00
Brad Ison
631ad0422d Default to 4GB memory for k8s builders 2016-09-29 11:20:49 -04:00
josephschorr
ad4efba802 Merge pull request #1830 from coreos-inc/superuser-dashboard
Add prometheus stats to enable better dashboarding
2016-09-26 17:19:22 +02:00
Joseph Schorr
1571b2867a Add executor name to the build metric 2016-09-16 16:26:04 -04:00
Joseph Schorr
f9f60b9faf Fix some issues around state in the build managers
- Make sure to clean up the job if the executor could not be started
- Change the setup leeway to further ensure there isn't any crossover between the queue item timing out and the cleanup of the jobs
- Make the lock used for marking jobs as internal error extremely long, but also based on the execution ID. This should ensure we don't get duplicates while allowing different executions to be handled properly.
- Make sure to invoke the callback update for the queue before we run off to etcd; should reduce certain timeouts

Hopefully Fixes #1836
2016-09-15 14:37:45 -04:00
Brad Ison
2a1cf2bfd1 Always pull latest image in k8s builds 2016-09-08 15:00:12 -04:00