GCP load balancer backend status unknown - kubernetes

I'm flabbergasted.
I have a staging and production environment. Both environments have the same deployments, services, ingress, firewall rules, and both serve a 200 on /.
However, after bringing up the staging environment and provisioning the same ingress, the staging service fails with "Some backend services are in UNKNOWN state". Production is still live.
Both the frontend and backend pods are ready on GKE. I've manually tested the health checks and they pass when I visit /.
I see nothing in the logs or gcp docs pointing in the right direction. What could I have possibly broken?
ingress.yaml:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: fanout-ingress
  annotations:
    kubernetes.io/ingress.global-static-ip-name: "STATIC-IP"
spec:
  backend:
    serviceName: frontend
    servicePort: 8080
  tls:
  - hosts:
    - <DOMAIN>
    secretName: staging-tls
  rules:
  - host: <DOMAIN>
    http:
      paths:
      - path: /*
        backend:
          serviceName: frontend
          servicePort: 8080
      - path: /backend/*
        backend:
          serviceName: backend
          servicePort: 8080
frontend.yaml:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: frontend
  name: frontend
  namespace: default
spec:
  ports:
  - nodePort: 30664
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    app: frontend
  type: NodePort
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  generation: 15
  labels:
    app: frontend
  name: frontend
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  selector:
    matchLabels:
      app: frontend
  minReadySeconds: 5
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - image: <our-image>
        name: frontend
        ports:
        - containerPort: 8080
          protocol: TCP
        readinessProbe:
          httpGet:
            path: /
            port: 8080
          initialDelaySeconds: 60
          periodSeconds: 30
          timeoutSeconds: 3
        livenessProbe:
          httpGet:
            path: /
            port: 8080
          initialDelaySeconds: 60
          periodSeconds: 30
          timeoutSeconds: 3

Yesterday even this guide https://cloud.google.com/kubernetes-engine/docs/tutorials/http-balancer
didn't work. I don't know what happened, but even after waiting 30+ minutes the ingress was still reporting the backends in UNKNOWN state.
After 24 hours things look much better: the L7 HTTP ingress works, but with a big delay before the backends are reported healthy.

I think this is a bug. I created a new cluster and couldn't reproduce. If anyone hits this again, I would suggest trying a new cluster.

If it started happening after altering the scalability settings of your cluster:
Deleting and re-creating the Ingress resource might help - in my case it fixed it almost immediately.
Steps I followed:
kubectl delete ingress <faulty_ingress>
kubectl apply -f <my_ingress.yaml>

What worked for me was deleting and recreating the BackendConfig.
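If it helps, that cycle is just a delete and re-apply; a rough sketch, assuming your BackendConfig is kept in backend-config.yaml (the name, namespace, and file name are placeholders for your own):

kubectl delete backendconfig <backend-config-name> -n <namespace>
kubectl apply -f backend-config.yaml

The ingress controller should then provision a fresh health check from the re-created BackendConfig within a few minutes.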

Are you still experiencing this issue?
I tried to reproduce it by following the Google public documentation on Setting up HTTP Load Balancing with Ingress to deploy:
a web app using the sample web application container image that listens on an HTTP server on port 8080.
However, it seems to be working now. So if you are still having this issue, please consider filing a public issue against kubernetes/ingress-gce using the Google issue-tracking tool. Include as many details as possible, including steps to reproduce, so that the issue gets better visibility as well as more sampling.
Please note:
The Issue Tracker User Content and Conduct Policy details the types of information that are inappropriate to submit to Issue Tracker, which includes things like sensitive personal information and spam. Please do not submit inappropriate content to Issue Tracker.
Repro output of $ kubectl describe ing:
sunny#test-dev:~$ kubectl describe ing basic-ingress
Name: basic-ingress
Namespace: default
Address: xx.xxx.xxx.228
Default backend: web:8080 (10.8.2.6:8080)
Rules:
Host Path Backends
---- ---- --------
* * web:8080 (10.8.2.6:8080)
Annotations:
target-proxy: k8s-tp-default-basic-ingress--f5636f071d87exxx
url-map: k8s-um-default-basic-ingress--f5636f071d87exxx
backends: {"k8s-be-31544--f5636f071d87exxx":"HEALTHY"}
forwarding-rule: k8s-fw-default-basic-ingress--f5636f071d87exxx
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Service 7m (x376 over 2d) loadbalancer-controller default backend set to web:31544

Related

BackendConfig with Ingress gives UNHEALTHY backend

I am learning how to use an Ingress to expose my application on GKE v1.19.
I followed the tutorial in the GKE docs for Service, Ingress, and BackendConfig to get to the following setup. However, my backend services still become UNHEALTHY after some time. My aim is to override the default "/" health check path for the ingress controller.
I have the same health checks defined in my deployment.yaml file under livenessProbe and readinessProbe, and they seem to work fine since the Pod enters the Running state. I have also tried to curl the endpoint and it returns a 200 status.
I have no clue why my services are marked as unhealthy despite being accessible directly from the NodePort service I defined. Any advice or help would be appreciated. Thank you.
I will add my yaml files below:
deployment.yaml
....
livenessProbe:
  httpGet:
    path: /api
    port: 3100
  initialDelaySeconds: 180
readinessProbe:
  httpGet:
    path: /api
    port: 3100
  initialDelaySeconds: 180
.....
backendconfig.yaml
apiVersion: cloud.google.com/v1beta1
kind: BackendConfig
metadata:
  name: backend-config
  namespace: ns1
spec:
  healthCheck:
    checkIntervalSec: 30
    port: 3100
    type: HTTP # case-sensitive
    requestPath: /api
service.yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    cloud.google.com/backend-config: '{"default": "backend-config"}'
  name: service-ns1
  namespace: ns1
  labels:
    app: service-ns1
spec:
  type: NodePort
  ports:
  - protocol: TCP
    port: 3100
    targetPort: 3100
  selector:
    app: service-ns1
ingress.yaml
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: ns1-ingress
  namespace: ns1
  annotations:
    kubernetes.io/ingress.global-static-ip-name: ns1-ip
    networking.gke.io/managed-certificates: ns1-cert
    kubernetes.io/ingress.allow-http: "false"
spec:
  rules:
  - http:
      paths:
      - path: /api/*
        backend:
          serviceName: service-ns1
          servicePort: 3100
The ideal time to use a BackendConfig is when the serving Pods for your Service contain multiple containers, when you are using the Anthos Ingress controller, or when you need control over the port used for the load balancer's health checks; in those cases you should use a BackendConfig CRD to define the health check parameters. Refer to [1].
When a backend service's health check parameters are inferred from a serving Pod's readiness probe, GKE does not keep the readiness probe and the health check synchronized, so any changes you make to the readiness probe will not be copied to the health check of the corresponding backend service on the load balancer, as per [2].
In your scenario, the backend is healthy when it uses the path '/' but unhealthy when it uses the path '/api', so there might be some misconfiguration in your Ingress.
I would suggest adding the annotation ingress.kubernetes.io/rewrite-target: /api
so the path mentioned in spec.path will be rewritten to /api before the request is sent to the backend service.
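As a sketch of where that suggested annotation would sit, reusing the ns1-ingress metadata from the question (only the last annotation is new; the rules stay as posted):

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: ns1-ingress
  namespace: ns1
  annotations:
    kubernetes.io/ingress.global-static-ip-name: ns1-ip
    networking.gke.io/managed-certificates: ns1-cert
    kubernetes.io/ingress.allow-http: "false"
    ingress.kubernetes.io/rewrite-target: /api   # the annotation suggested above
spec:
  ...                                            # rules unchanged from the question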

GKE Ingress with container-native load balancing does not detect health check (Invalid value for field 'resource.httpHealthCheck')

I am running a cluster on Google Kubernetes Engine and I am currently trying to switch from using an Ingress with external load balancing (and NodePort services) to an ingress with container-native load balancing (and ClusterIP services) following this documentation: Container native load balancing
To communicate with my services I am using the following ingress configuration that used to work just fine when using NodePort services instead of ClusterIP:
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: mw-ingress
  annotations:
    kubernetes.io/ingress.global-static-ip-name: mw-cluster-ip
    networking.gke.io/managed-certificates: mw-certificate
    kubernetes.io/ingress.allow-http: "false"
spec:
  rules:
  - http:
      paths:
      - path: /*
        backend:
          serviceName: billing-frontend-service
          servicePort: 80
      - path: /auth/api/*
        backend:
          serviceName: auth-service
          servicePort: 8083
Now, following the documentation, instead of using a readinessProbe as part of the container deployment as the health check, I switched to ClusterIP services in combination with a BackendConfig. For each deployment I am using a service like this:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: auth
  name: auth-service
  namespace: default
  annotations:
    cloud.google.com/backend-config: '{"default": "auth-hc-config"}'
spec:
  type: ClusterIP
  selector:
    app: auth
  ports:
  - port: 8083
    protocol: TCP
    targetPort: 8083
And a Backend config:
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: auth-hc-config
spec:
  healthCheck:
    checkIntervalSec: 10
    port: 8083
    type: http
    requestPath: /auth/health
As a reference, this is what the readinessProbe used to look like before:
readinessProbe:
  failureThreshold: 3
  httpGet:
    path: /auth/health
    port: 8083
    scheme: HTTP
  periodSeconds: 10
Now to the actual problem. I deploy the containers and services first and they seem to start up just fine. The ingress, however, does not seem to pick up the health checks properly and shows this in the Cloud console:
Error during sync: error running backend syncing routine: error ensuring health check: googleapi: Error 400: Invalid value for field 'resource.httpHealthCheck': ''. HTTP healthCheck missing., invalid
The cluster as well as the node pool are running GKE version 1.17.6-gke.11 so the annotation cloud.google.com/neg: '{"ingress": true}' is not necessary. I have checked and the service is annotated correctly:
Annotations: cloud.google.com/backend-config: {"default": "auth-hc-config"}
cloud.google.com/neg: {"ingress":true}
cloud.google.com/neg-status: {"network_endpoint_groups":{"8083":"k8s1-2078beeb-default-auth-service-8083-16a14039"},"zones":["europe-west3-b"]}
I have already tried to re-create the cluster and the node-pool with no effect. Any ideas on how to resolve this? Am I missing an additional health check somewhere?
I found my issue. Apparently the BackendConfig's type attribute is case-sensitive. Once I changed it from http to HTTP it worked after I recreated the ingress.
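For reference, the corrected BackendConfig from the question, with only the type value changed:

apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: auth-hc-config
spec:
  healthCheck:
    checkIntervalSec: 10
    port: 8083
    type: HTTP        # was "http"; the field is case-sensitive
    requestPath: /auth/health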

GCP HTTP(S) load balancer ignoring GKE readinessProbe specification

I’ve already seen this question; AFAIK I’m doing everything in the answers there.
Using GKE, I’ve deployed a GCP HTTP(S) load balancer-based ingress for a kubernetes cluster containing two almost identical deployments: production and development instances of the same application.
I set up a dedicated port on each pod template to use for health checks by the load balancer so that they are not impacted by redirects from the root path on the primary HTTP port. However, the health checks are consistently failing.
From these docs I added a readinessProbe parameter to my deployments, which the load balancer seems to be ignoring completely.
I’ve verified that the server on :p-ready (9292; the dedicated health check port) is running correctly using the following (in separate terminals):
➜ kubectl port-forward deployment/d-an-server p-ready
➜ curl http://localhost:9292/ -D -
HTTP/1.1 200 OK
content-length: 0
date: Wed, 26 Feb 2020 01:21:55 GMT
What have I missed?
A couple notes on the below configs:
The ${...} variables below are filled by the build script as part of deployment.
The second service (s-an-server-dev) is almost an exact duplicate of the first (with its own deployment), just with -dev suffixes on the names and labels.
Deployment
apiVersion: "apps/v1"
kind: "Deployment"
metadata:
name: "d-an-server"
namespace: "default"
labels:
app: "a-an-server"
spec:
replicas: 1
selector:
matchLabels:
app: "a-an-server"
template:
metadata:
labels:
app: "a-an-server"
spec:
containers:
- name: "c-an-server-app"
image: "gcr.io/${PROJECT_ID}/an-server-app:${SHORT_SHA}"
ports:
- name: "p-http"
containerPort: 8080
- name: "p-ready"
containerPort: 9292
readinessProbe:
httpGet:
path: "/"
port: "p-ready"
initialDelaySeconds: 30
Service
apiVersion: "v1"
kind: "Service"
metadata:
name: "s-an-server"
namespace: "default"
spec:
ports:
- port: 8080
targetPort: "p-http"
protocol: "TCP"
name: "sp-http"
selector:
app: "a-an-server"
type: "NodePort"
Ingress
apiVersion: "networking.k8s.io/v1beta1"
kind: "Ingress"
metadata:
name: "primary-ingress"
annotations:
kubernetes.io/ingress.global-static-ip-name: "primary-static-ipv4"
networking.gke.io/managed-certificates: "appname-production-cert,appname-development-cert"
spec:
rules:
- host: "appname.example.com"
http:
paths:
- backend:
serviceName: "s-an-server"
servicePort: "sp-http"
- host: "dev.appname.example.com"
http:
paths:
- backend:
serviceName: "s-an-server-dev"
servicePort: "sp-http-dev"
I think what's happening here is that the GKE ingress is not informed of port 9292 at all. You are referencing sp-http in the Ingress, which refers to port 8080.
You need to make sure of the following (see the sketch after this list):
1. The Service's targetPort field must point to the Pod port's containerPort value or name.
2. The readiness probe must be exposed on the port matching the servicePort specified in the Ingress.
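As a sketch of what point 2 implies for the posted Deployment (this assumes your app can also answer health requests on the serving port), the probe would move to the named port the Ingress actually routes to:

readinessProbe:
  httpGet:
    path: "/"
    port: "p-http"       # the containerPort (8080) backing sp-http, which the Ingress uses
  initialDelaySeconds: 30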

GCE Ingress not picking up health check from readiness probe

When I create a GCE ingress, Google Load Balancer does not set the health check from the readiness probe. According to the docs (Ingress GCE health checks) it should pick it up.
Expose an arbitrary URL as a readiness probe on the pods backing the Service.
Any ideas why?
Deployment:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: frontend-prod
  labels:
    app: frontend-prod
spec:
  selector:
    matchLabels:
      app: frontend-prod
  replicas: 3
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: frontend-prod
    spec:
      imagePullSecrets:
      - name: regcred
      containers:
      - image: app:latest
        readinessProbe:
          httpGet:
            path: /healthcheck
            port: 3000
          initialDelaySeconds: 15
          periodSeconds: 5
        name: frontend-prod-app
      - env:
        - name: PASSWORD_PROTECT
          value: "1"
        image: nginx:latest
        readinessProbe:
          httpGet:
            path: /health
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 5
        name: frontend-prod-nginx
Service:
apiVersion: v1
kind: Service
metadata:
  name: frontend-prod
  labels:
    app: frontend-prod
spec:
  type: NodePort
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP
    name: http
  selector:
    app: frontend-prod
Ingress:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: frontend-prod-ingress
  annotations:
    kubernetes.io/ingress.global-static-ip-name: frontend-prod-ip
spec:
  tls:
  - secretName: testsecret
  backend:
    serviceName: frontend-prod
    servicePort: 80
So apparently, you need to include the container port in the PodSpec. This does not seem to be documented anywhere.
e.g.
spec:
  containers:
  - name: nginx
    image: nginx:1.7.9
    ports:
    - containerPort: 80
Thanks, Brian! https://github.com/kubernetes/ingress-gce/issues/241
This is now possible in the latest GKE (I am on 1.14.10-gke.27, not sure if that matters)
Define a readinessProbe on your container in your Deployment.
Recreate your Ingress.
The health check will point to the path in readinessProbe.httpGet.path of the Deployment yaml config.
Update, per Jonathan Lin's answer: This has been fixed very recently. Define a readinessProbe on the Deployment. Recreate your Ingress. It will pick up the health check path from the readinessProbe.
GKE Ingress health check path is currently not configurable. You can go to http://console.cloud.google.com (UI) and visit Load Balancers list to see the health check it uses.
Currently the health check for an Ingress is GET / on each backend: specified on the Ingress. So all your apps behind a GKE Ingress must return HTTP 200 OK to GET / requests.
That said, the health checks you specified on your Pods are still being used by the kubelet to make sure your Pod is actually functioning and healthy.
Google has recently added support for a CRD that can configure your backend services, including their health checks:
apiVersion: cloud.google.com/v1beta1
kind: BackendConfig
metadata:
  name: backend-config
  namespace: prod
spec:
  healthCheck:
    checkIntervalSec: 30
    port: 8080
    type: HTTP # case-sensitive
    requestPath: /healthcheck
See here.
Another reason why the Google Cloud load balancer does not pick up the GCE health check configuration from the Kubernetes Pod readiness probe could be that the service is configured as "selectorless" (the selector attribute is empty and you manage endpoints directly).
This is the case with e.g. kube-lego: see https://github.com/jetstack/kube-lego/issues/68#issuecomment-303748457 and https://github.com/jetstack/kube-lego/issues/68#issuecomment-327457982.
The original question does have a selector specified in the service, so this hint doesn't apply there; it serves visitors who have the same problem with a different cause.
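For context, a selectorless Service looks roughly like the following minimal sketch (the names and IP are hypothetical): the spec has no selector and you maintain a matching Endpoints object by hand, so there are no selected Pods whose readiness probes the ingress controller could inspect.

apiVersion: v1
kind: Service
metadata:
  name: example-selectorless   # hypothetical name
spec:
  type: NodePort
  ports:
  - port: 8080
    targetPort: 8080
---
apiVersion: v1
kind: Endpoints
metadata:
  name: example-selectorless   # must match the Service name
subsets:
- addresses:
  - ip: 10.0.0.12              # endpoint managed by hand (hypothetical IP)
  ports:
  - port: 8080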

Kubernetes Ingress (GCE) keeps returning 502 error

I am trying to set up an Ingress in GCE Kubernetes, but when I visit the IP address and path combination defined in the Ingress, I keep getting a 502 error.
Here is what I get when I run: kubectl describe ing --namespace dpl-staging
Name: dpl-identity
Namespace: dpl-staging
Address: 35.186.221.153
Default backend: default-http-backend:80 (10.0.8.5:8080)
TLS:
dpl-identity terminates
Rules:
Host Path Backends
---- ---- --------
*
/api/identity/* dpl-identity:4000 (<none>)
Annotations:
https-forwarding-rule: k8s-fws-dpl-staging-dpl-identity--5fc40252fadea594
https-target-proxy: k8s-tps-dpl-staging-dpl-identity--5fc40252fadea594
url-map: k8s-um-dpl-staging-dpl-identity--5fc40252fadea594
backends: {"k8s-be-31962--5fc40252fadea594":"HEALTHY","k8s-be-32396--5fc40252fadea594":"UNHEALTHY"}
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
15m 15m 1 {loadbalancer-controller } Normal ADD dpl-staging/dpl-identity
15m 15m 1 {loadbalancer-controller } Normal CREATE ip: 35.186.221.153
15m 6m 4 {loadbalancer-controller } Normal Service no user specified default backend, using system default
I think the problem is dpl-identity:4000 (<none>). Shouldn't I see the IP address of the dpl-identity service instead of <none>?
Here is my service description: kubectl describe svc --namespace dpl-staging
Name: dpl-identity
Namespace: dpl-staging
Labels: app=dpl-identity
Selector: app=dpl-identity
Type: NodePort
IP: 10.3.254.194
Port: http 4000/TCP
NodePort: http 32396/TCP
Endpoints: 10.0.2.29:8000,10.0.2.30:8000
Session Affinity: None
No events.
Also, here is the result of executing: kubectl describe ep -n dpl-staging dpl-identity
Name: dpl-identity
Namespace: dpl-staging
Labels: app=dpl-identity
Subsets:
Addresses: 10.0.2.29,10.0.2.30
NotReadyAddresses: <none>
Ports:
Name Port Protocol
---- ---- --------
http 8000 TCP
No events.
Here is my deployment.yaml:
apiVersion: v1
kind: Secret
metadata:
  namespace: dpl-staging
  name: dpl-identity
type: Opaque
data:
  tls.key: <base64 key>
  tls.crt: <base64 crt>
---
apiVersion: v1
kind: Service
metadata:
  namespace: dpl-staging
  name: dpl-identity
  labels:
    app: dpl-identity
spec:
  type: NodePort
  ports:
  - port: 4000
    targetPort: 8000
    protocol: TCP
    name: http
  selector:
    app: dpl-identity
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  namespace: dpl-staging
  name: dpl-identity
  labels:
    app: dpl-identity
  annotations:
    kubernetes.io/ingress.allow-http: "false"
spec:
  tls:
  - secretName: dpl-identity
  rules:
  - http:
      paths:
      - path: /api/identity/*
        backend:
          serviceName: dpl-identity
          servicePort: 4000
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  namespace: dpl-staging
  name: dpl-identity
  labels:
    app: dpl-identity
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: dpl-identity
    spec:
      containers:
      - image: gcr.io/munpat-container-engine/dpl/identity:0.4.9
        name: dpl-identity
        ports:
        - containerPort: 8000
          name: http
        volumeMounts:
        - name: dpl-identity
          mountPath: /data
      volumes:
      - name: dpl-identity
        secret:
          secretName: dpl-identity
Your backend k8s-be-32396--5fc40252fadea594 is showing as "UNHEALTHY".
The Ingress will not forward traffic if the backend is UNHEALTHY; this results in the 502 error you are seeing.
It is being marked as UNHEALTHY because it is not passing its health check. Check the health check settings for k8s-be-32396--5fc40252fadea594 to see whether they are appropriate for your pod; it may be polling a URI or port that is not returning a 200 response. You can find these settings under Compute Engine > Health Checks (or from the CLI, as sketched below).
If they are correct, then there are many steps between your browser and the container that could be passing traffic incorrectly. You could try kubectl exec -it PODID -- bash (or ash if you are using Alpine) and then try curl-ing localhost to see if the container is responding as expected. If it is, and the health checks are also configured correctly, that narrows the issue down to your service; you could then try changing the service from a NodePort type to a LoadBalancer and see whether hitting the service IP directly from your browser works.
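For reference, a rough sketch of those two checks from the CLI (the pod name, namespace, and port are placeholders; depending on the cluster version the health checks may appear under the legacy http-health-checks resource instead):

gcloud compute health-checks list
gcloud compute health-checks describe <health-check-name>

kubectl exec -it <pod-name> -n <namespace> -- bash   # or sh/ash on Alpine-based images
curl -i http://localhost:<container-port>/           # run inside the pod; a 200 here means the container itself is fine (assumes curl exists in the image)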
I was having the same issue. It turns out I had to wait a few minutes before the ingress validated the service health. If someone has the same problem and has done all the steps like readinessProbe and livenessProbe, just ensure your ingress is pointing to a Service of type NodePort, and wait a few minutes until the yellow warning icon turns into a green one. Also, check the logs on Stackdriver to get a better idea of what's going on. My readinessProbe and livenessProbe are on /login, for the gce class, so I don't think it has to be on /healthz.
The issue is indeed a health check, and it seemed "random" for my apps where I used name-based virtual hosts to reverse-proxy requests from the ingress via domains to two separate backend services. Both were secured using Let's Encrypt and kube-lego. My solution was to standardize the path used for health checks across all services sharing an ingress, and to declare the readinessProbe and livenessProbe configs in my deployment.yml file.
I faced this issue with Google Cloud cluster node version 1.7.8 and found this issue that closely resembled what I experienced:
* https://github.com/jetstack/kube-lego/issues/27
I'm using gce and kube-lego, and my backend service health checks were on / while kube-lego's is on /healthz. It appears that differing health check paths with the gce ingress might be the cause, so it may be worth updating backend services to match the /healthz pattern so all use the same path (or, as one commenter in the GitHub issue stated, updating kube-lego to pass on /).
I had the same problem, and it persisted after I enabled the livenessProbe as well as the readinessProbe.
It turned out this was to do with basic auth. I had added basic auth to the livenessProbe and the readinessProbe, but it turns out the GCE HTTP(S) load balancer doesn't have a configuration option for that.
There seem to be a few other kinds of issues too; for example, setting the container port to 8080 and the service port to 80 didn't work with the GKE ingress controller (though I couldn't clearly identify what the problem was). Broadly, it looks to me like there is very little visibility, and running your own ingress controller is a better option with respect to visibility.
I picked Traefik for my project; it worked out of the box, and I'd like to enable its Let's Encrypt integration. The only change I had to make to the Traefik manifests was tweaking the service object to disable access to the UI from outside the cluster and to expose my app through an external load balancer (GCE TCP LB). Also, Traefik is more native to Kubernetes. I tried Heptio Contour, but something didn't work out of the box (I'll give it another go when the new version comes out).
I had the same issue. It turned out that the pod itself was running ok, which I tested via port-forwarding and accessing the health-check URL.
Port-forwarding can be activated from the console as follows:
$ kubectl port-forward <pod-name> <local-port>:<pod-port>
So if the pod is running ok and the ingress still shows an unhealthy state, there might be an issue with your service configuration. In my case the app selector was incorrect, causing the selection of a non-existent pod. Interestingly, this isn't shown as an error or alert in the Google console. (A quick check for this is sketched after the manifests below.)
Definition of the pods:
#pod-definition.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: <pod-name>
  namespace: <namespace>
spec:
  selector:
    matchLabels:
      app: <pod-name>
  template:
    metadata:
      labels:
        app: <pod-name>
    spec:
      #spec-definition follows

#service.yaml
apiVersion: v1
kind: Service
metadata:
  name: <name-of-service-here>
  namespace: <namespace>
spec:
  type: NodePort
  selector:
    app: <pod-name>   # must match the Pod label set in the Deployment above
  ports:
  - protocol: TCP
    port: 8080
    targetPort: 8080
    name: <port-name-here>
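One quick way to catch this kind of selector mismatch (a rough sketch; the service name and namespace are placeholders) is to check whether the Service has any endpoints at all:

kubectl get endpoints <name-of-service-here> -n <namespace>
# An empty ENDPOINTS column means the selector matches no running Pods,
# so the load balancer has nothing to health-check or send traffic to.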
The "Limitations" section of the kubernetes documentation states that:
All Kubernetes services must serve a 200 page on '/', or whatever custom value you've specified through GLBC's --health-check-path argument.
https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/cluster-loadbalancing/glbc#limitations
I solved the problem by:
1. Removing the service from the ingress definition
2. Deploying the ingress: kubectl apply -f ingress.yaml
3. Adding the service back to the ingress definition
4. Deploying the ingress again
Essentially, I followed Roy's advice and tried turning it off and on again.
Logs can be read from Stackdriver Logging; in my case it was a backend_timeout error. After increasing the default timeout (30s) via a BackendConfig (see the sketch below), it stopped returning 502s even under load.
More on:
https://cloud.google.com/kubernetes-engine/docs/how-to/configure-backend-service#creating_a_backendconfig
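For illustration, a minimal BackendConfig of that kind might look like the sketch below (the name and the 60s value are placeholders); it is attached to the Service through the cloud.google.com/backend-config annotation as described in the linked guide:

apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: timeout-config        # hypothetical name
spec:
  timeoutSec: 60              # raises the backend service timeout from the 30s default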
I fixed this issue after adding the following readiness and liveness probes with successThreshold: 1 and failureThreshold: 3. I also set initialDelaySeconds to 70 because sometimes an application responds a bit late; it may vary per application.
NOTE: Also ensure that the path in httpGet exists in your application (in my case /api/books); otherwise GCP pings the /healthz path, which is not guaranteed to return 200 OK.
readinessProbe:
  httpGet:
    path: /api/books
    port: 80
  periodSeconds: 5
  successThreshold: 1
  failureThreshold: 3
  initialDelaySeconds: 70
  timeoutSeconds: 60
livenessProbe:
  httpGet:
    path: /api/books
    port: 80
  initialDelaySeconds: 70
  periodSeconds: 5
  successThreshold: 1
  failureThreshold: 3
  timeoutSeconds: 60
I was able to sort it out after struggling a lot and trying many things.
Keep learning and sharing.
I had the same issue when I was using a wrong image; the request couldn't be satisfied because the configurations were different.