Joseph Schorr
cd1b4042ae
Move SSH keys and other hard-coded config out of the build worker template
...
In addition to removing our public keys from git, this will also allow additional customization
Fixes https://jira.coreos.com/browse/QUAY-1035
2018-09-05 17:36:01 -04:00
Joseph Schorr
2d6a6a1f6c
Add a timeout to various operations against etcd in the build manager when it cannot connect to etcd
...
This will ensure that the build managers don't simply sit there thrashing against a nonexistent cluster, which drives up the CPU on our production nodes and takes them out of service
Addresses https://jira.coreos.com/browse/QUAY-990
2018-07-08 12:25:33 +03:00
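A minimal sketch of the idea in the commit above: bound each etcd call with a timeout so an unreachable cluster produces a quick failure instead of endless retries. This is not the build manager's actual code. It assumes modern asyncio, and `EtcdUnavailable`, `with_timeout`, and `fake_etcd_read` are hypothetical names used only for illustration.

```python
import asyncio


class EtcdUnavailable(Exception):
    """Hypothetical error raised when an etcd call does not finish in time."""


async def with_timeout(operation, timeout_seconds):
    # Wrap any awaitable etcd operation so it cannot block forever.
    try:
        return await asyncio.wait_for(operation, timeout=timeout_seconds)
    except asyncio.TimeoutError:
        raise EtcdUnavailable(
            "etcd operation timed out after %ss" % timeout_seconds
        ) from None


async def fake_etcd_read(delay):
    # Stand-in for a real etcd read; only simulates latency.
    await asyncio.sleep(delay)
    return "value"


def demo():
    # A fast call completes; a slow call is cut off by the timeout.
    fast = asyncio.run(with_timeout(fake_etcd_read(0.01), 1.0))
    try:
        asyncio.run(with_timeout(fake_etcd_read(1.0), 0.05))
        slow = "completed"
    except EtcdUnavailable:
        slow = "timed out"
    return fast, slow
```

The caller can then decide whether to back off, mark the job incomplete, or surface the failure, rather than spinning on the connection.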
Joseph Schorr
400a5db719
Add additional metrics on executor start and failure
...
This will allow us to register a pager if one of the executors starts failing consistently
2017-11-27 11:52:37 +02:00
Joseph Schorr
ddb1ed7441
Also delete the job key when expiring a job
...
Otherwise, we can't requeue the job
2017-10-11 15:55:35 -04:00
Joseph Schorr
c799367ac4
Make sure expired startup marks build jobs incomplete immediately
...
Currently, we wait for the job to expire, which can take a very long time. We also add even more logs, in an attempt to track down the root cause
2017-10-11 14:56:19 -04:00
Antoine Legrand
cdb3722c17
Use $QUAYPATH and $QUAYDIR in conf and init files
2017-07-05 16:23:54 +02:00
Charlton Austin
e30cd931d1
feat(buildtrigger): allow user to specify dockerfile
...
Previously, users could only specify the folder, and the
dockerfile had to be named "Dockerfile". This allows
users to specify the file, which can be called
"Dockerfile" or <some name>.Dockerfile
2017-03-06 21:20:17 -05:00
EvB
cedce6f98b
fix(buildman/ephemeral): remove exception log on noncritical error
2017-02-09 11:32:41 -08:00
Joseph Schorr
b407f88a26
Remove unnecessary CloudWatch metrics
...
They are spamming the API and costing us a lot of money
2017-02-01 13:08:21 -05:00
Evan Cordell
dd5f7cbe6c
Fix the ephemeral build metrics
2016-12-13 18:28:04 -05:00
Charlton Austin
c6be12e31e
Adding in a cancel method to the build component so we can properly clean up the job task.
2016-12-06 13:37:49 -05:00
Charlton Austin
8157c9cf33
Fixing the ttl on etcd.
2016-12-02 13:27:47 -05:00
Charlton Austin
0c7a2e4645
Removing realm key from etcd.
2016-12-02 11:37:56 -05:00
Charlton Austin
8ec14ac3bd
Adding in a delete of the etcd key for cancelled jobs.
2016-12-01 16:03:54 -05:00
Jake Moshenko
f0ef4347e5
Make the redis client use AsyncWrapper and coroutines
...
Change all log messages to be synchronous
2016-11-18 15:59:14 -05:00
Charlton Austin
96173485f8
Merge pull request #2041 from charltonaustin/add_cancel_to_building_build
...
Adding in the behavior for cancelling a build while it is being built.
2016-11-18 11:02:37 -05:00
Charlton Austin
fd7c566d31
Adding in cancel for a build that is building.
2016-11-16 17:40:24 -05:00
Brad Ison
2c59bd9ee5
Set builder hostnames to build UUID
2016-11-15 12:35:48 -08:00
Charlton Austin
83e8d62bea
Merge pull request #2085 from charltonaustin/move_ephemeral_binary
...
Moving the binary location.
2016-11-08 11:42:31 -05:00
Brad Ison
a8c0376c06
Set imagePullPolicy to IfNotPresent for k8s builder
2016-11-07 17:20:40 -05:00
Joseph Schorr
c98472e9f3
Debug log all cases where we mark a build as incomplete in the queue
...
Should help us narrow down why some builds are falling back
2016-11-07 16:13:52 -05:00
Joseph Schorr
ef41e57aad
Add executor-specific setup time support
...
This will allow us to make the setup time TTL for k8s-based builds much lower (on the order of a minute), which means faster timeouts and fallbacks (which is a better user experience).
2016-11-07 15:45:15 -05:00
Charlton Austin
7a2dca9c53
Moving the binary location.
2016-11-04 15:53:43 -04:00
Charlton Austin
bba51787b5
Adding in a new location for the default popen executor.
2016-11-04 14:22:26 -04:00
Joseph Schorr
9f9d32548b
Standardize the internal error logs for better tracking
2016-10-31 13:47:24 -04:00
Charlton Austin
0c2fec8314
Fixing the build
2016-10-27 15:10:03 -04:00
Charlton Austin
2147005d2c
Adding a method of cancelling a build based on etcd message.
2016-10-25 12:50:58 -04:00
Brad Ison
779f0f1b54
Add emptyDir volume to builder pods to mask secrets
...
This adds an empty volume on a tmpfs to builder pods and mounts it over
the directory Kubernetes uses for secrets, which should prevent pods
from having access to the default service account.
2016-10-05 14:27:07 -04:00
Brad Ison
087dca3482
Only set memory request on Kubernetes builds
...
This removes the absolute limits on Kubernetes builds for now (KVM
will still limit resources) and only sets the memory request as a hint
to the scheduler.
2016-10-04 20:42:51 -04:00
Evan Cordell
3542255db8
buildman: let metric data live longer in etcd
2016-10-04 15:06:46 -04:00
Brad Ison
febf3751c0
Merge pull request #1937 from coreos-inc/k8s-resource-limits
...
Fix kubernetes resource limits
2016-10-04 14:11:46 -04:00
Brad Ison
94a0fee63f
Merge pull request #1916 from coreos-inc/k8s-generate-name
...
Add a dash to generated k8s job names
2016-10-04 11:56:33 -04:00
Brad Ison
cee7c4be96
Fix kubernetes resource limits
2016-10-04 11:56:06 -04:00
Evan Cordell
943a20f042
buildman: linter fixes
2016-10-04 11:44:31 -04:00
Evan Cordell
f3091c6424
Fix the metrics
2016-10-03 17:53:40 -04:00
Evan Cordell
42ebb0a6c3
Record metrics in a separate etcd record
2016-10-03 16:11:37 -04:00
Evan Cordell
d99c206b47
Fix build time metric
2016-10-01 17:25:13 -04:00
Brad Ison
d8aa22103e
Add a dash to generated k8s job names
2016-10-01 14:02:28 -04:00
Evan Cordell
07e23a34ed
Fix metrics
2016-09-30 13:45:45 -04:00
Evan Cordell
68c5384473
Fixes prometheus start metric
2016-09-30 13:09:03 -04:00
josephschorr
fa4588c7d9
Merge pull request #1908 from coreos-inc/fix-build-phase
...
Add missing call to set_phase when a build doesn't start
2016-09-30 17:52:39 +02:00
josephschorr
0c2b4ed9c1
Merge pull request #1897 from coreos-inc/hash-executor-whitelist
...
Add hash-based staged rollout to build executors
2016-09-30 17:52:19 +02:00
Joseph Schorr
f50bb8a1ce
Add missing call to set_phase when a build doesn't start
...
This change fixes the build manager ephemeral executor to tell the overall build server to call set_phase when a build never starts. Before this change, we'd properly adjust the queue item, but not the repo build row or the logs, which is why users just saw "Preparing Build Node", with no indication that the node failed to start.
Fixes #1904
2016-09-30 14:54:49 +02:00
Joseph Schorr
51a519f653
Add hash-based staged rollout to build executors
...
Fixes #1882
2016-09-29 22:48:42 +02:00
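The commit above gives no detail on how the hash-based staged rollout works, so the following is only a sketch of one common approach, not the executor's actual logic. It assumes the rollout is keyed on a namespace string and controlled by a percentage; `in_rollout` and both of its parameters are hypothetical.

```python
import hashlib


def in_rollout(namespace, percentage):
    # Hash the namespace into a stable bucket in [0, 100). The same
    # namespace always lands in the same bucket, so raising the
    # percentage only ever adds namespaces to the rollout.
    digest = hashlib.sha256(namespace.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percentage
```

A stable hash is used instead of a random draw so that a given namespace gets a consistent executor across builds while the rollout percentage is being increased.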
Evan Cordell
832ee89923
Add duration metric collector decorator ( #1885 )
...
Track time-to-start for builders
Track time-to-build for builders
Track ec2 builder fallbacks
Track build time
2016-09-29 15:44:06 -04:00
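A duration-collecting decorator of the kind named in the commit above could look roughly like this. It is a self-contained sketch that stores timings in memory. The real change reports to a metrics backend, and `DurationCollector`, `timed`, and the metric name `builder_time_to_start` are hypothetical.

```python
import functools
import time


class DurationCollector:
    """Hypothetical in-memory collector; a real one would export metrics."""

    def __init__(self):
        self.observations = {}

    def record(self, name, seconds):
        self.observations.setdefault(name, []).append(seconds)

    def timed(self, name):
        # Decorator factory: records how long each call to func takes,
        # including calls that raise.
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                start = time.monotonic()
                try:
                    return func(*args, **kwargs)
                finally:
                    self.record(name, time.monotonic() - start)
            return wrapper
        return decorator


collector = DurationCollector()


@collector.timed("builder_time_to_start")
def start_builder():
    # Stand-in for the work of starting a builder.
    time.sleep(0.01)
    return "started"
```

Recording in a `finally` block means failed starts are timed as well, which matters when the goal is to alert on executors that fail consistently.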
Brad Ison
593c3eb9c7
Set dnsPolicy to Default on k8s build jobs
...
This prevents the builder pods from having resolv.conf pointed at the
kube-dns service, which they won't have access to.
2016-09-29 11:22:11 -04:00
Brad Ison
631ad0422d
Default to 4GB memory for k8s builders
2016-09-29 11:20:49 -04:00
josephschorr
ad4efba802
Merge pull request #1830 from coreos-inc/superuser-dashboard
...
Add prometheus stats to enable better dashboarding
2016-09-26 17:19:22 +02:00
Joseph Schorr
1571b2867a
Add executor name to the build metric
2016-09-16 16:26:04 -04:00
Joseph Schorr
f9f60b9faf
Fix some issues around state in the build managers
...
- Make sure to cleanup the job if the executor could not be started
- Change the setup leeway to further ensure there isn't any crossover between the queue item timing out and the cleanup of the jobs
- Make the lock used for marking jobs as internal error extremely long, but also based on the execution ID. This should ensure we don't get duplicates while allowing different executions to be handled properly.
- Make sure to invoke the callback update for the queue before we run off to etcd; should reduce certain timeouts
Hopefully Fixes #1836
2016-09-15 14:37:45 -04:00