28 commits

Author SHA1 Message Date
Joseph Schorr
a94f657cb7 Add health check for node disk space
If a node runs out of disk space, nginx can no longer swap, and this can cause issues with large pushes

Fixes https://jira.coreos.com/browse/QUAY-1047
2018-09-05 17:57:22 -04:00
Sam Chow
dbce986af6 Setup reroutes when complete, fix gunicorn check 2018-08-31 15:17:48 -04:00
Joseph Schorr
bbdf9e074c Add metrics for tracking when instance key renewal succeeds and fails, as well as when instance key *lookup* fails 2018-02-02 11:14:42 -05:00
Joseph Schorr
c1cc52f58b Add a health check for the instance key
If the key expires or disappears, the node will now go unhealthy, taking it out of service and preventing downtime
2018-02-02 11:14:00 -05:00
Joseph Schorr
e91b83e1be Add instance health checks for all gunicorn workers
Fixes https://jira.coreos.com/browse/QS-121
2018-01-16 11:29:40 -05:00
Joseph Schorr
b7d6bb12fa Hide extended health check information behind superuser permission or a session property
Also adds an endpoint that (when specified with the proper secret) sets the session property
2017-07-19 16:17:02 +03:00
Joseph Schorr
4ad3682b9c Make health check failures report their reasons
Note that we add a new block with expanded service info, to avoid breaking compatibility with existing callers of the health endpoint
2017-07-19 16:17:02 +03:00
Joseph Schorr
e44a503bd0 Add status check for auth endpoint 2017-07-19 16:17:02 +03:00
Joseph Schorr
7b1dfbb256 yapf 2017-07-11 13:48:55 +03:00
Joseph Schorr
4853634c2f Switch health to use a data interface 2017-07-11 13:48:25 +03:00
Joseph Schorr
310eded8e6 Add a configuration flag for external TLS termination
This is necessary to ensure that we use the correct scheme when conducting health checks, setting cookies, etc.

Fixes #1865
2016-09-22 18:28:57 -04:00
Joseph Schorr
974ab6c42c Add missing arg to validate call and add logging 2016-08-03 11:13:27 -04:00
Joseph Schorr
c30b8dd1ad Add storage validation to the status endpoint
Fixes #1659
2016-08-01 13:02:26 -04:00
Joseph Schorr
d1699e75b7 Add missing constructor argument 2016-07-06 16:17:02 -04:00
Joseph Schorr
7fddc61b8f Add instance key ID to the health check endpoint
Fixes #1429
2016-07-05 14:14:22 -04:00
Joseph Schorr
45fe46d619 Add RDSAwareHealthCheck as alias for ProductionHealthCheck 2016-03-25 15:25:42 -04:00
Joseph Schorr
e03058cf6f Add missing arg 2015-10-22 15:57:34 -04:00
Joseph Schorr
c518874ded I hate Redis!
- Remove redis check from our health endpoint in prod entirely
- Have the redis check have a maximum timeout of 1 second
2015-10-22 14:24:42 -04:00
Joseph Schorr
fd3a21fba9 Add Kubernetes configuration provider which writes config to a secret
Fixes #145
2015-09-10 12:19:59 -04:00
Jake Moshenko
3efaa255e8 Accidental refactor, split out legacy.py into separate submodules and update all call sites. 2015-07-17 11:56:15 -04:00
Joseph Schorr
e23f1e9ded Fix the DB health check
Make sure to search for the proper DB identifier
2015-05-20 17:40:43 -04:00
Joseph Schorr
1cce87b136 Add is_testing info and mirror the moved endpoints so we can migrate safely. 2015-01-20 16:58:29 -05:00
Joseph Schorr
b74b7de197 Clean up the health checking code and move the endpoints to /health/instance and /health/endtoend. 2015-01-20 16:53:05 -05:00
Joseph Schorr
93708d0131 Add the registry value to the other returned health value 2015-01-14 23:41:30 -05:00
Joseph Schorr
a4de476a85 Have the health check also ping the registry endpoint to make sure it is functional. 2015-01-14 23:39:58 -05:00
Jimmy Zelinskie
716d7a737b Strip whitespace from ALL the things. 2014-11-24 16:07:38 -05:00
Joseph Schorr
aed7e67a17 Clarify the health checking logic and remove the accidental inclusion of the override 2014-11-02 15:42:59 -05:00
Joseph Schorr
98602a2d0c Add a new configurable health check, to make sure production instances are not taken down by Redis or non-local DB issues 2014-11-02 15:06:17 -05:00