15 commits

Author SHA1 Message Date
Joseph Schorr
a94f657cb7 Add health check for node disk space
If a node runs out of disk space, nginx can no longer swap, and this can cause issues with large pushes

Fixes https://jira.coreos.com/browse/QUAY-1047
2018-09-05 17:57:22 -04:00
Sam Chow
dbce986af6 Set up reroutes when complete, fix gunicorn check 2018-08-31 15:17:48 -04:00
Joseph Schorr
bbdf9e074c Add metrics for tracking when instance key renewal succeeds and fails, as well as when instance key *lookup* fails 2018-02-02 11:14:42 -05:00
Joseph Schorr
c1cc52f58b Add a health check for the instance key
If the key expires or disappears, the node will now go unhealthy, taking it out of service and preventing downtime
2018-02-02 11:14:00 -05:00
Joseph Schorr
e91b83e1be Add instance health checks for all gunicorn workers
Fixes https://jira.coreos.com/browse/QS-121
2018-01-16 11:29:40 -05:00
Joseph Schorr
4ad3682b9c Make health check failures report their reasons
Note that we add a new block with expanded service info, to avoid breaking compatibility with existing callers of the health endpoint
2017-07-19 16:17:02 +03:00
Joseph Schorr
e44a503bd0 Add status check for auth endpoint 2017-07-19 16:17:02 +03:00
Joseph Schorr
7b1dfbb256 yapf 2017-07-11 13:48:55 +03:00
Joseph Schorr
4853634c2f Switch health to use a data interface 2017-07-11 13:48:25 +03:00
Joseph Schorr
310eded8e6 Add a configuration flag for external TLS termination
This is necessary to ensure that we use the correct scheme when conducting health checks, setting cookies, etc.

Fixes #1865
2016-09-22 18:28:57 -04:00
Joseph Schorr
974ab6c42c Add missing arg to validate call and add logging 2016-08-03 11:13:27 -04:00
Joseph Schorr
c30b8dd1ad Add storage validation to the status endpoint
Fixes #1659
2016-08-01 13:02:26 -04:00
Joseph Schorr
c518874ded I hate Redis!
- Remove redis check from our health endpoint in prod entirely
- Have the redis check have a maximum timeout of 1 second
2015-10-22 14:24:42 -04:00
Jake Moshenko
3efaa255e8 Accidental refactor, split out legacy.py into separate submodules and update all call sites. 2015-07-17 11:56:15 -04:00
Joseph Schorr
b74b7de197 Clean up the health checking code and move the endpoints to /health/instance and /health/endtoend. 2015-01-20 16:53:05 -05:00