I am running a cluster on Google Kubernetes Engine and I am currently trying to switch from using an Ingress with external load balancing (and NodePort services) to an Ingress with container-native load balancing (and ClusterIP services), following this documentation: Container-native load balancing
To communicate with my services I am using the following ingress configuration that used to work just fine when using NodePort services instead of ClusterIP:
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: mw-ingress
  annotations:
    kubernetes.io/ingress.global-static-ip-name: mw-cluster-ip
    networking.gke.io/managed-certificates: mw-certificate
    kubernetes.io/ingress.allow-http: "false"
spec:
  rules:
  - http:
      paths:
      - path: /*
        backend:
          serviceName: billing-frontend-service
          servicePort: 80
      - path: /auth/api/*
        backend:
          serviceName: auth-service
          servicePort: 8083
Following the documentation, instead of using a readinessProbe in the container deployment as a health check, I switched to ClusterIP services in combination with a BackendConfig. For each deployment I am using a service like this:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: auth
  name: auth-service
  namespace: default
  annotations:
    cloud.google.com/backend-config: '{"default": "auth-hc-config"}'
spec:
  type: ClusterIP
  selector:
    app: auth
  ports:
  - port: 8083
    protocol: TCP
    targetPort: 8083
And a Backend config:
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: auth-hc-config
spec:
  healthCheck:
    checkIntervalSec: 10
    port: 8083
    type: http
    requestPath: /auth/health
As a reference, this is what the readinessProbe used to look like before:
readinessProbe:
  failureThreshold: 3
  httpGet:
    path: /auth/health
    port: 8083
    scheme: HTTP
  periodSeconds: 10
Now to the actual problem. I deploy the containers and services first and they seem to startup just fine. The ingress however does not seem to pick up the health checks properly and shows this in the Cloud console:
Error during sync: error running backend syncing routine: error ensuring health check: googleapi: Error 400: Invalid value for field 'resource.httpHealthCheck': ''. HTTP healthCheck missing., invalid
The cluster as well as the node pool are running GKE version 1.17.6-gke.11 so the annotation cloud.google.com/neg: '{"ingress": true}' is not necessary. I have checked and the service is annotated correctly:
Annotations: cloud.google.com/backend-config: {"default": "auth-hc-config"}
cloud.google.com/neg: {"ingress":true}
cloud.google.com/neg-status: {"network_endpoint_groups":{"8083":"k8s1-2078beeb-default-auth-service-8083-16a14039"},"zones":["europe-west3-b"]}
I have already tried to re-create the cluster and the node-pool with no effect. Any ideas on how to resolve this? Am I missing an additional health check somewhere?
I found my issue: apparently the BackendConfig's type attribute is case-sensitive. Once I changed it from http to HTTP and recreated the Ingress, it worked.
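For reference, the working BackendConfig is the same as above with only the type value corrected (the rest of the fields are unchanged from the original config):

```yaml
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: auth-hc-config
spec:
  healthCheck:
    checkIntervalSec: 10
    port: 8083
    type: HTTP          # must be uppercase; lowercase "http" is rejected
    requestPath: /auth/health
```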
I’ve already seen this question; AFAIK I’m doing everything in the answers there.
Using GKE, I’ve deployed a GCP HTTP(S) load balancer-based ingress for a kubernetes cluster containing two almost identical deployments: production and development instances of the same application.
I set up a dedicated port on each pod template to use for health checks by the load balancer so that they are not impacted by redirects from the root path on the primary HTTP port. However, the health checks are consistently failing.
From these docs I added a readinessProbe parameter to my deployments, which the load balancer seems to be ignoring completely.
I’ve verified that the server on :p-ready (9292; the dedicated health check port) is running correctly using the following (in separate terminals):
➜ kubectl port-forward deployment/d-an-server p-ready
➜ curl http://localhost:9292/ -D -
HTTP/1.1 200 OK
content-length: 0
date: Wed, 26 Feb 2020 01:21:55 GMT
What have I missed?
A couple notes on the below configs:
The ${...} variables below are filled by the build script as part of deployment.
The second service (s-an-server-dev) is almost an exact duplicate of the first (with its own deployment), just with -dev suffixes on the names and labels.
Deployment
apiVersion: "apps/v1"
kind: "Deployment"
metadata:
  name: "d-an-server"
  namespace: "default"
  labels:
    app: "a-an-server"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: "a-an-server"
  template:
    metadata:
      labels:
        app: "a-an-server"
    spec:
      containers:
      - name: "c-an-server-app"
        image: "gcr.io/${PROJECT_ID}/an-server-app:${SHORT_SHA}"
        ports:
        - name: "p-http"
          containerPort: 8080
        - name: "p-ready"
          containerPort: 9292
        readinessProbe:
          httpGet:
            path: "/"
            port: "p-ready"
          initialDelaySeconds: 30
Service
apiVersion: "v1"
kind: "Service"
metadata:
  name: "s-an-server"
  namespace: "default"
spec:
  ports:
  - port: 8080
    targetPort: "p-http"
    protocol: "TCP"
    name: "sp-http"
  selector:
    app: "a-an-server"
  type: "NodePort"
Ingress
apiVersion: "networking.k8s.io/v1beta1"
kind: "Ingress"
metadata:
  name: "primary-ingress"
  annotations:
    kubernetes.io/ingress.global-static-ip-name: "primary-static-ipv4"
    networking.gke.io/managed-certificates: "appname-production-cert,appname-development-cert"
spec:
  rules:
  - host: "appname.example.com"
    http:
      paths:
      - backend:
          serviceName: "s-an-server"
          servicePort: "sp-http"
  - host: "dev.appname.example.com"
    http:
      paths:
      - backend:
          serviceName: "s-an-server-dev"
          servicePort: "sp-http-dev"
I think what's happening here is that the GKE ingress is not informed of port 9292 at all. You are referring to sp-http in the Ingress, which maps to port 8080.
You need to make sure of the following:
1. The service's targetPort field must point to the pod port's containerPort value or name.
2. The readiness probe must be exposed on the port matching the servicePort specified in the Ingress.
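A minimal sketch of the fix, under the assumption that the health endpoint is also served on the main HTTP port: move the readinessProbe onto the port that sp-http actually targets, so the load balancer-derived health check hits a port it knows about. The /ready path is an assumption, not from the original manifests.

```yaml
# Hedged sketch: probe the port the Service exposes (p-http -> 8080),
# not the dedicated :9292 port the Ingress never sees.
readinessProbe:
  httpGet:
    path: "/ready"   # assumed path served on :8080 without redirects
    port: "p-http"
  initialDelaySeconds: 30
```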
After I create a very basic Ingress (yaml below) with no special definition of health checks (nor any in other Kubernetes objects), the Ingress creates a GCP Load Balancer.
Why does this LB have two health checks ("backend services") defined against different nodePorts, one against root path / and one against /healthz? I would expect to see only one.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress1
spec:
  rules:
  - host: example.com
    http:
      paths:
      - backend:
          serviceName: myservice
          servicePort: 80
Reason for / health check
One of the limitations of the GKE ingress controller is the following:
For the GKE ingress controller to use your readinessProbes as health checks, the Pods for an Ingress must exist at the time of Ingress creation. If your replicas are scaled to 0 or Pods don't exist when the ingress is created, the default health check using / applies.
Because of this, two health checks were created.
There are a lot of caveats for the health check from the readiness probe to work:
- The pod's containerPort field must be defined
- The service's targetPort field must point to the pod port's containerPort value or name. Note that targetPort defaults to the port value if not defined
- The pods must exist at the time of Ingress creation
- The readiness probe must be exposed on the port matching the servicePort specified in the Ingress
- The readiness probe cannot have special requirements like headers
- The probe timeouts are translated to GCE health check timeouts
Based on the above, it makes sense to have a default fallback health check using /.
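As an illustrative sketch (all names, images, and ports here are assumptions, not from the question), a container and Service pair that satisfies these caveats: the containerPort is defined and named, the targetPort references it by name, and the probe sits on that same port with no special requirements.

```yaml
# Container fragment: named port, plain httpGet probe on that port.
containers:
- name: web
  image: example/web:latest   # placeholder image
  ports:
  - name: http
    containerPort: 8080
  readinessProbe:
    httpGet:
      path: /                 # no custom headers or other extras
      port: http
---
# Service: targetPort points at the containerPort by name.
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: NodePort
  selector:
    app: web
  ports:
  - port: 80
    targetPort: http
```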
Reason for /healthz health check
As per this FAQ all GCE URL maps require at least one default backend, which handles all requests that don't match a host/path. In Ingress, the default backend is optional, since the resource is cross-platform and not all platforms require a default backend. If you don't specify one in your yaml, the GCE ingress controller will inject the default-http-backend Service that runs in the kube-system namespace as the default backend for the GCE HTTP lb allocated for that Ingress resource.
Some caveats concerning the default backend:
- It is the only Backend Service that doesn't directly map to a user-specified NodePort Service
- It's created when the first Ingress is created, and deleted when the last Ingress is deleted, since we don't want to waste quota if the user is not going to need L7 load balancing through Ingress
- It has an HTTP health check pointing at /healthz, not the default /, because / serves a 404 by design
So the GKE ingress controller deploys a default backend service using the YAML below and sets up a /healthz health check for that service.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: l7-default-backend
  namespace: kube-system
  labels:
    k8s-app: glbc
    kubernetes.io/name: "GLBC"
    kubernetes.io/cluster-service: "true"
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: glbc
  template:
    metadata:
      labels:
        k8s-app: glbc
        name: glbc
    spec:
      containers:
      - name: default-http-backend
        # Any image is permissible as long as:
        # 1. It serves a 404 page at /
        # 2. It serves 200 on a /healthz endpoint
        image: k8s.gcr.io/defaultbackend-amd64:1.5
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 30
          timeoutSeconds: 5
        ports:
        - containerPort: 8080
        resources:
          limits:
            cpu: 10m
            memory: 20Mi
          requests:
            cpu: 10m
            memory: 20Mi
---
apiVersion: v1
kind: Service
metadata:
  # This must match the --default-backend-service argument of the l7 lb
  # controller and is required because GCE mandates a default backend.
  name: default-http-backend
  namespace: kube-system
  labels:
    k8s-app: glbc
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: "GLBCDefaultBackend"
spec:
  # The default backend must be of type NodePort.
  type: NodePort
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
    name: http
  selector:
    k8s-app: glbc
When I create a GCE ingress, Google Load Balancer does not set the health check from the readiness probe. According to the docs (Ingress GCE health checks) it should pick it up.
Expose an arbitrary URL as a readiness probe on the pods backing the Service.
Any ideas why?
Deployment:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: frontend-prod
  labels:
    app: frontend-prod
spec:
  selector:
    matchLabels:
      app: frontend-prod
  replicas: 3
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: frontend-prod
    spec:
      imagePullSecrets:
      - name: regcred
      containers:
      - image: app:latest
        readinessProbe:
          httpGet:
            path: /healthcheck
            port: 3000
          initialDelaySeconds: 15
          periodSeconds: 5
        name: frontend-prod-app
      - env:
        - name: PASSWORD_PROTECT
          value: "1"
        image: nginx:latest
        readinessProbe:
          httpGet:
            path: /health
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 5
        name: frontend-prod-nginx
Service:
apiVersion: v1
kind: Service
metadata:
  name: frontend-prod
  labels:
    app: frontend-prod
spec:
  type: NodePort
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP
    name: http
  selector:
    app: frontend-prod
Ingress:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: frontend-prod-ingress
  annotations:
    kubernetes.io/ingress.global-static-ip-name: frontend-prod-ip
spec:
  tls:
  - secretName: testsecret
  backend:
    serviceName: frontend-prod
    servicePort: 80
So apparently, you need to include the container port in the PodSpec. This does not seem to be documented anywhere. E.g.:
spec:
  containers:
  - name: nginx
    image: nginx:1.7.9
    ports:
    - containerPort: 80
Thanks, Brian! https://github.com/kubernetes/ingress-gce/issues/241
This is now possible in the latest GKE (I am on 1.14.10-gke.27, not sure if that matters)
Define a readinessProbe on your container in your Deployment.
Recreate your Ingress.
The health check will point to the path in readinessProbe.httpGet.path of the Deployment yaml config.
Update by Jonathan Lin below: This has been fixed very recently. Define a readinessProbe on the Deployment. Recreate your Ingress. It will pick up the health check path from the readinessProbe.
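A minimal sketch of such a readinessProbe (the path and port here are illustrative assumptions, not from the question); once the Ingress is recreated, the generated health check adopts the probe's path:

```yaml
readinessProbe:
  httpGet:
    path: /healthz   # the Ingress-created health check picks up this path
    port: 8080       # must match the port the Service routes to
```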
GKE Ingress health check path is currently not configurable. You can go to http://console.cloud.google.com (UI) and visit the Load Balancers list to see the health check it uses.
Currently the health check for an Ingress is GET / on each backend: specified on the Ingress. So all your apps behind a GKE Ingress must return HTTP 200 OK to GET / requests.
That said, the health checks you specified on your Pods are still being used, by the kubelet, to make sure your Pod is actually functioning and healthy.
Google has recently added support for a CRD that can configure your Backend Services along with health checks:
apiVersion: cloud.google.com/v1beta1
kind: BackendConfig
metadata:
  name: backend-config
  namespace: prod
spec:
  healthCheck:
    checkIntervalSec: 30
    port: 8080
    type: HTTP # case-sensitive
    requestPath: /healthcheck
See here.
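Note that a BackendConfig only takes effect once a Service references it via an annotation (the same pattern as the auth-service example earlier in this thread); the service name below is illustrative:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service   # hypothetical service name
  annotations:
    # "default" applies the BackendConfig to all ports of this Service
    cloud.google.com/backend-config: '{"default": "backend-config"}'
```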
Another reason why the Google Cloud Load Balancer does not pick up the GCE health check configuration from the Kubernetes Pod readiness probe could be that the service is configured as "selectorless" (the selector attribute is empty and you manage endpoints directly).
This is the case with e.g. kube-lego: see https://github.com/jetstack/kube-lego/issues/68#issuecomment-303748457 and https://github.com/jetstack/kube-lego/issues/68#issuecomment-327457982.
The original question does have a selector specified in the service, so this hint doesn't apply there; it serves visitors who have the same problem with a different cause.
I am trying to set up an Ingress in GCE Kubernetes, but when I visit the IP address and path combination defined in the Ingress, I keep getting a 502 error.
Here is what I get when I run: kubectl describe ing --namespace dpl-staging
Name: dpl-identity
Namespace: dpl-staging
Address: 35.186.221.153
Default backend: default-http-backend:80 (10.0.8.5:8080)
TLS:
dpl-identity terminates
Rules:
Host Path Backends
---- ---- --------
*
/api/identity/* dpl-identity:4000 (<none>)
Annotations:
https-forwarding-rule: k8s-fws-dpl-staging-dpl-identity--5fc40252fadea594
https-target-proxy: k8s-tps-dpl-staging-dpl-identity--5fc40252fadea594
url-map: k8s-um-dpl-staging-dpl-identity--5fc40252fadea594
backends: {"k8s-be-31962--5fc40252fadea594":"HEALTHY","k8s-be-32396--5fc40252fadea594":"UNHEALTHY"}
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
15m 15m 1 {loadbalancer-controller } Normal ADD dpl-staging/dpl-identity
15m 15m 1 {loadbalancer-controller } Normal CREATE ip: 35.186.221.153
15m 6m 4 {loadbalancer-controller } Normal Service no user specified default backend, using system default
I think the problem is dpl-identity:4000 (<none>). Shouldn't I see the IP address of the dpl-identity service instead of <none>?
Here is my service description: kubectl describe svc --namespace dpl-staging
Name: dpl-identity
Namespace: dpl-staging
Labels: app=dpl-identity
Selector: app=dpl-identity
Type: NodePort
IP: 10.3.254.194
Port: http 4000/TCP
NodePort: http 32396/TCP
Endpoints: 10.0.2.29:8000,10.0.2.30:8000
Session Affinity: None
No events.
Also, here is the result of executing: kubectl describe ep -n dpl-staging dpl-identity
Name: dpl-identity
Namespace: dpl-staging
Labels: app=dpl-identity
Subsets:
Addresses: 10.0.2.29,10.0.2.30
NotReadyAddresses: <none>
Ports:
Name Port Protocol
---- ---- --------
http 8000 TCP
No events.
Here is my deployment.yaml:
apiVersion: v1
kind: Secret
metadata:
  namespace: dpl-staging
  name: dpl-identity
type: Opaque
data:
  tls.key: <base64 key>
  tls.crt: <base64 crt>
---
apiVersion: v1
kind: Service
metadata:
  namespace: dpl-staging
  name: dpl-identity
  labels:
    app: dpl-identity
spec:
  type: NodePort
  ports:
  - port: 4000
    targetPort: 8000
    protocol: TCP
    name: http
  selector:
    app: dpl-identity
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  namespace: dpl-staging
  name: dpl-identity
  labels:
    app: dpl-identity
  annotations:
    kubernetes.io/ingress.allow-http: "false"
spec:
  tls:
  - secretName: dpl-identity
  rules:
  - http:
      paths:
      - path: /api/identity/*
        backend:
          serviceName: dpl-identity
          servicePort: 4000
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  namespace: dpl-staging
  name: dpl-identity
  labels:
    app: dpl-identity
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: dpl-identity
    spec:
      containers:
      - image: gcr.io/munpat-container-engine/dpl/identity:0.4.9
        name: dpl-identity
        ports:
        - containerPort: 8000
          name: http
        volumeMounts:
        - name: dpl-identity
          mountPath: /data
      volumes:
      - name: dpl-identity
        secret:
          secretName: dpl-identity
Your backend k8s-be-32396--5fc40252fadea594 is showing as "UNHEALTHY".
Ingress will not forward traffic while the backend is UNHEALTHY; this results in the 502 error you are seeing.
It is being marked as UNHEALTHY because it is not passing its health check. You can check the health check settings for k8s-be-32396--5fc40252fadea594 to see if they are appropriate for your pod; it may be polling a URI or port that is not returning a 200 response. You can find these settings under Compute Engine > Health Checks.
If they are correct, then there are many steps between your browser and the container that could be passing traffic incorrectly. You could try kubectl exec -it PODID -- bash (or ash if you are using Alpine) and then curl localhost to see if the container is responding as expected. If it is, and the health checks are also configured correctly, this narrows the issue down to your service; you could then try changing the service from NodePort to LoadBalancer and see if hitting the service IP directly from your browser works.
I was having the same issue. It turns out I had to wait a few minutes for the ingress to validate the service health. If you are in the same situation and have done all the steps like readinessProbe and livenessProbe, just ensure your ingress is pointing to a NodePort service, and wait a few minutes until the yellow warning icon turns green. Also, check the logs in Stackdriver to get a better idea of what's going on. My readinessProbe and livenessProbe are on /login, for the gce class, so I don't think it has to be on /healthz.
The issue is indeed a health check, and it seemed "random" for my apps, where I used name-based virtual hosts to reverse proxy requests from the ingress via domains to two separate backend services. Both were secured using Let's Encrypt and kube-lego. My solution was to standardize the path for health checks for all services sharing an ingress, and to declare the readinessProbe and livenessProbe configs in my deployment.yml file.
I faced this issue with Google Cloud cluster node version 1.7.8 and found this issue that closely resembled what I experienced:
* https://github.com/jetstack/kube-lego/issues/27
I'm using gce and kube-lego, and my backend service health checks were on / while kube-lego's is on /healthz. It appears that differing health check paths with gce ingress might be the cause, so it may be worth updating backend services to match the /healthz pattern so all use the same path (or, as one commenter in the GitHub issue stated, updating kube-lego to pass on /).
I had the same problem, and it persisted after I enabled livenessProbe as well as readinessProbe.
It turned out this was due to basic auth. I had added basic auth to the livenessProbe and readinessProbe, but it turns out the GCE HTTP(S) load balancer doesn't have a configuration option for that.
There seem to be a few other kinds of issues too, e.g. setting the container port to 8080 and the service port to 80 didn't work with the GKE ingress controller (though I couldn't clearly identify what the problem was). Broadly, it looks to me like there is very little visibility, and running your own ingress controller is a better option in that respect.
I picked Traefik for my project; it worked out of the box, and I'd like to enable its Let's Encrypt integration. The only change I had to make to the Traefik manifests was tweaking the service object to disable access to the UI from outside the cluster and to expose my app through an external load balancer (GCE TCP LB). Also, Traefik is more native to Kubernetes. I tried Heptio Contour, but something didn't work out of the box (I'll give it another go when the new version comes out).
I had the same issue. It turned out that the pod itself was running OK, which I tested via port-forwarding and accessing the health-check URL.
Port-Forward can be activated in console as follows:
$ kubectl port-forward <pod-name> local-port:pod-port
So if the pod is running OK and the ingress still shows an unhealthy state, there might be an issue with your service configuration. In my case my app selector was incorrect, causing the selection of a non-existent pod. Interestingly, this isn't shown as an error or alert in the Google console.
Definition of the pods:
#pod-definition.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: <pod-name>
  namespace: <namespace>
spec:
  selector:
    matchLabels:
      app: **<pod-name>**
  template:
    metadata:
      labels:
        app: <pod-name>
    spec:
      #spec-definition follows
#service.yaml
apiVersion: v1
kind: Service
metadata:
  name: <name-of-service-here>
  namespace: <namespace>
spec:
  type: NodePort
  selector:
    app: **<pod-name>**
  ports:
  - protocol: TCP
    port: 8080
    targetPort: 8080
    name: <port-name-here>
The "Limitations" section of the kubernetes documentation states that:
All Kubernetes services must serve a 200 page on '/', or whatever custom value you've specified through GLBC's --health-check-path argument.
https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/cluster-loadbalancing/glbc#limitations
I solved the problem by doing the following:
1. Remove the service from the ingress definition
2. Deploy the ingress: kubectl apply -f ingress.yaml
3. Add the service back to the ingress definition
4. Deploy the ingress again
Essentially, I followed Roy's advice and tried to turn it off and on again.
Logs can be read from Stackdriver Logging; in my case, it was a backend_timeout error. After increasing the default timeout (30s) via BackendConfig, it stopped returning 502 even under load.
More on:
https://cloud.google.com/kubernetes-engine/docs/how-to/configure-backend-service#creating_a_backendconfig
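A sketch of such a BackendConfig raising the backend timeout (the name and value here are illustrative; timeoutSec is the documented field for this):

```yaml
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: my-backendconfig   # hypothetical name; reference it from the
                           # Service's cloud.google.com/backend-config annotation
spec:
  timeoutSec: 120          # raised from the 30s default
```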
I fixed this issue by adding the following readiness and liveness probes, with successThreshold: 1 and failureThreshold: 3. I also kept initialDelaySeconds at 70 because sometimes an application responds a bit late; it may vary per application.
NOTE: Also ensure that the path in httpGet exists in your application (in my case /api/books); otherwise GCP pings the /healthz path and doesn't guarantee to return 200 OK.
readinessProbe:
  httpGet:
    path: /api/books
    port: 80
  periodSeconds: 5
  successThreshold: 1
  failureThreshold: 3
  initialDelaySeconds: 70
  timeoutSeconds: 60
livenessProbe:
  httpGet:
    path: /api/books
    port: 80
  initialDelaySeconds: 70
  periodSeconds: 5
  successThreshold: 1
  failureThreshold: 3
  timeoutSeconds: 60
I was able to sort it out after struggling a lot and trying many things.
Keep learning & sharing.
I had the same issue when I was using the wrong image: the request couldn't be satisfied because the configurations were different.