GCE Ingress not picking up health check from readiness probe - kubernetes

When I create a GCE ingress, Google Load Balancer does not set the health check from the readiness probe. According to the docs (Ingress GCE health checks) it should pick it up.
Expose an arbitrary URL as a readiness probe on the pods backing the Service.
Any ideas why?
Deployment:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: frontend-prod
  labels:
    app: frontend-prod
spec:
  selector:
    matchLabels:
      app: frontend-prod
  replicas: 3
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: frontend-prod
    spec:
      imagePullSecrets:
      - name: regcred
      containers:
      - image: app:latest
        readinessProbe:
          httpGet:
            path: /healthcheck
            port: 3000
          initialDelaySeconds: 15
          periodSeconds: 5
        name: frontend-prod-app
      - env:
        - name: PASSWORD_PROTECT
          value: "1"
        image: nginx:latest
        readinessProbe:
          httpGet:
            path: /health
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 5
        name: frontend-prod-nginx
Service:
apiVersion: v1
kind: Service
metadata:
  name: frontend-prod
  labels:
    app: frontend-prod
spec:
  type: NodePort
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP
    name: http
  selector:
    app: frontend-prod
Ingress:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: frontend-prod-ingress
  annotations:
    kubernetes.io/ingress.global-static-ip-name: frontend-prod-ip
spec:
  tls:
  - secretName: testsecret
  backend:
    serviceName: frontend-prod
    servicePort: 80

So apparently, you need to include the container port on the PodSpec.
Does not seem to be documented anywhere.
e.g.
spec:
  containers:
  - name: nginx
    image: nginx:1.7.9
    ports:
    - containerPort: 80
Thanks, Brian! https://github.com/kubernetes/ingress-gce/issues/241
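Applied to the nginx container from the question, the fix might look like the fragment below. This is only a sketch: the container name, image, and probe values are copied from the question's Deployment, and the only addition is the ports entry, on the assumption that the ingress controller needs a declared containerPort to match the probe against.

```yaml
# Fragment of the Deployment's pod template (spec.template.spec).
containers:
- name: frontend-prod-nginx
  image: nginx:latest
  ports:
  - containerPort: 80        # added: declare the port the Service targets
  readinessProbe:
    httpGet:
      path: /health
      port: 80               # same port as the declared containerPort
    initialDelaySeconds: 5
    periodSeconds: 5
```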

This is now possible in the latest GKE (I am on 1.14.10-gke.27, not sure if that matters)
Define a readinessProbe on your container in your Deployment.
Recreate your Ingress.
The health check will point to the path in readinessProbe.httpGet.path of the Deployment yaml config.

Update by Jonathan Lin below: This has been fixed very recently. Define a readinessProbe on the Deployment. Recreate your Ingress. It will pick up the health check path from the readinessProbe.
GKE Ingress health check path is currently not configurable. You can go to http://console.cloud.google.com (UI) and visit Load Balancers list to see the health check it uses.
Currently the health check for an Ingress is GET / on each backend: specified on the Ingress. So all your apps behind a GKE Ingress must return HTTP 200 OK to GET / requests.
That said, the health checks you specified on your Pods are still being used by the kubelet to make sure your Pod is actually functioning and healthy.

Google has recently added support for CRD that can configure your Backend Services along with healthchecks:
apiVersion: cloud.google.com/v1beta1
kind: BackendConfig
metadata:
  name: backend-config
  namespace: prod
spec:
  healthCheck:
    checkIntervalSec: 30
    port: 8080
    type: HTTP #case-sensitive
    requestPath: /healthcheck
See here.
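A BackendConfig only takes effect once a Service references it. A minimal sketch, assuming the cloud.google.com/backend-config Service annotation is available on your GKE version (older versions used a beta-prefixed key), and using a hypothetical Service name and label that are not from the original answer:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service            # hypothetical name
  namespace: prod             # same namespace as the BackendConfig
  annotations:
    # "default" applies the named BackendConfig to every port of this Service.
    cloud.google.com/backend-config: '{"default": "backend-config"}'
spec:
  type: NodePort
  selector:
    app: my-app               # hypothetical label
  ports:
  - port: 80
    targetPort: 8080
```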

Another reason why Google Cloud Load Balancer does not pick up the GCE health check configuration from the Kubernetes Pod readiness probe could be that the Service is configured as "selectorless" (the selector attribute is empty and you manage endpoints directly).
This is the case with e.g. kube-lego: see https://github.com/jetstack/kube-lego/issues/68#issuecomment-303748457 and https://github.com/jetstack/kube-lego/issues/68#issuecomment-327457982.
The original question does specify a selector in the Service, so this hint doesn't apply there. It is meant for visitors who have the same problem with a different cause.
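For reference, a selectorless Service looks roughly like the sketch below. The names, ports, and IP are hypothetical. With no selector there are no selected Pods, so presumably the controller has no readinessProbe to read.

```yaml
# A Service without a selector: Kubernetes does not create Endpoints for it.
apiVersion: v1
kind: Service
metadata:
  name: external-backend      # hypothetical name
spec:
  type: NodePort
  ports:
  - port: 80
    targetPort: 8080
---
# Endpoints managed by hand; the name must match the Service name.
apiVersion: v1
kind: Endpoints
metadata:
  name: external-backend
subsets:
- addresses:
  - ip: 10.0.0.12             # hypothetical backend IP
  ports:
  - port: 8080
```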

Related

Health checks for service returning 301 after updating deployment

We recently updated the deployment of a dropwizard service deployed using Docker and Kubernetes.
It was working correctly before: the readiness probe produced health check pings to the internal cluster IP that returned 200s. Since the update, the health check pings return a 301 and the service is considered down.
I've noticed that the healthcheck is now Default kubernetes L7 Loadbalancing health check for NEG. (port is set to 80) where it was previously Default kubernetes L7 Loadbalancing health check. where the port was configurable.
The kube file is deployed via CircleCI but the readiness probe is:
kind: Deployment
metadata:
  name: pes-${CIRCLE_BRANCH}
  namespace: ${GKE_NAMESPACE_NAME}
  annotations:
    reloader.stakater.com/auto: 'true'
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ***
  template:
    metadata:
      labels:
        app: ***
    spec:
      containers:
      - name: ***
        image: ***
        envFrom:
        - configMapRef:
            name: ***
        - secretRef:
            name: ***
        command: ['./gradlew', 'run']
        resources: {}
        ports:
        - name: pes
          containerPort: 5000
        readinessProbe:
          httpGet:
            path: /api/healthcheck
            port: pes
          initialDelaySeconds: 15
          timeoutSeconds: 30
---
apiVersion: v1
kind: Service
metadata:
  name: ***
  namespace: ${GKE_NAMESPACE_NAME}
spec:
  ports:
  - name: pes
    port: 5000
    targetPort: pes
    protocol: TCP
  selector:
    app: ***
  type: LoadBalancer
Any ideas on how this needs to be configured in GCP?
I have a feeling that the new deployment has changed from legacy health check to non legacy but no idea what else needs to be set up for it to work. Does the kube file handle creating firewall rules or does that need to be done manually?
Reading the docs at https://cloud.google.com/load-balancing/docs/health-check-concepts?hl=en
EDIT:
The issue is now resolved. After the GKE version was updated, it creates a NEG health check by default. We disabled this by adding the annotation below to the Service in the deployment file.
metadata:
  annotations:
    cloud.google.com/neg: '{"ingress":false}'
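In context, the annotation sits in the Service's metadata. The sketch below reuses the Service from the question. The name and label values are placeholders for the redacted ones, not the real values.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: pes-service                 # placeholder for the redacted name
  namespace: ${GKE_NAMESPACE_NAME}  # filled in by the CI pipeline
  annotations:
    # Annotation from the resolution above: turn off NEG for ingress.
    cloud.google.com/neg: '{"ingress":false}'
spec:
  ports:
  - name: pes
    port: 5000
    targetPort: pes
    protocol: TCP
  selector:
    app: pes                        # placeholder for the redacted label
  type: LoadBalancer
```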

Hostname of pods in same statefulset can not be resolved

I am configuring a statefulset deploying 2 Jira DataCenter nodes. The statefulset results in 2 pods. Everything seems fine until the 2 pods try to connect to each other. They do this using their short hostnames, jira-0 and jira-1.
The jira-1 pod reports UnknownHostException when connecting to jira-0. The hostname can not be resolved.
I read about adding a headless service which I didn't have yet. After adding that I can resolve the FQDN but still no luck for the short name.
Then I read this page: DNS for Services and Pods and added:
dnsConfig:
  searches:
  - jira.default.svc.cluster.local
That solves my issue but I think it shouldn't be necessary to add this?
Some extra info:
Cluster on AKS with CoreDNS
Kubernetes v1.19.9
Network plugin: Kubenet
Network policy: none
My full yaml file:
apiVersion: v1
kind: Service
metadata:
  name: jira
  labels:
    app: jira
spec:
  clusterIP: None
  selector:
    app: jira
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: jira
spec:
  serviceName: jira
  replicas: 0
  selector:
    matchLabels:
      app: jira
  template:
    metadata:
      labels:
        app: jira
    spec:
      containers:
      - name: jira
        image: atlassian/jira-software:8.12.2-jdk11
        readinessProbe:
          httpGet:
            path: /jira/status
            port: 8080
          initialDelaySeconds: 120
          periodSeconds: 10
        livenessProbe:
          httpGet:
            path: /jira/
            port: 8080
          initialDelaySeconds: 600
          periodSeconds: 10
        envFrom:
        - configMapRef:
            name: jira-config
        ports:
        - containerPort: 8080
      dnsConfig:
        searches:
        - jira.default.svc.cluster.local
That solves my issue but I think it shouldn't be necessary to add this?
From the StatefulSet documentation:
StatefulSets currently require a Headless Service to be responsible for the network identity of the Pods. You are responsible for creating this Service.
The example above will create three Pods named web-0,web-1,web-2. A StatefulSet can use a Headless Service to control the domain of its Pods.
The pod identity will be a subdomain of the governing Service, so in your case the names are:
jira-0.jira.default.svc.cluster.local
jira-1.jira.default.svc.cluster.local
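One way to confirm that these records exist is a throwaway Pod that performs the lookup. This is a sketch, not part of the original answer: the Pod name and the busybox image are assumptions, and the result can be read with kubectl logs dns-check. From a Pod in the same namespace, the shorter form jira-0.jira should also resolve through the default search path, whereas the bare jira-0 does not.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dns-check             # hypothetical name
  namespace: default
spec:
  restartPolicy: Never
  containers:
  - name: dns-check
    image: busybox:1.28       # assumption: any image that ships nslookup works
    # Resolve the per-Pod record published by the headless Service "jira".
    command: ["nslookup", "jira-0.jira.default.svc.cluster.local"]
```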

GCP HTTP(S) load balancer ignoring GKE readinessProbe specification

I’ve already seen this question; AFAIK I’m doing everything in the answers there.
Using GKE, I’ve deployed a GCP HTTP(S) load balancer-based ingress for a kubernetes cluster containing two almost identical deployments: production and development instances of the same application.
I set up a dedicated port on each pod template to use for health checks by the load balancer so that they are not impacted by redirects from the root path on the primary HTTP port. However, the health checks are consistently failing.
From these docs I added a readinessProbe parameter to my deployments, which the load balancer seems to be ignoring completely.
I’ve verified that the server on :p-ready (9292; the dedicated health check port) is running correctly using the following (in separate terminals):
➜ kubectl port-forward deployment/d-an-server p-ready
➜ curl http://localhost:9292/ -D -
HTTP/1.1 200 OK
content-length: 0
date: Wed, 26 Feb 2020 01:21:55 GMT
What have I missed?
A couple notes on the below configs:
The ${...} variables below are filled by the build script as part of deployment.
The second service (s-an-server-dev) is almost an exact duplicate of the first (with its own deployment), just with -dev suffixes on the names and labels.
Deployment
apiVersion: "apps/v1"
kind: "Deployment"
metadata:
  name: "d-an-server"
  namespace: "default"
  labels:
    app: "a-an-server"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: "a-an-server"
  template:
    metadata:
      labels:
        app: "a-an-server"
    spec:
      containers:
      - name: "c-an-server-app"
        image: "gcr.io/${PROJECT_ID}/an-server-app:${SHORT_SHA}"
        ports:
        - name: "p-http"
          containerPort: 8080
        - name: "p-ready"
          containerPort: 9292
        readinessProbe:
          httpGet:
            path: "/"
            port: "p-ready"
          initialDelaySeconds: 30
Service
apiVersion: "v1"
kind: "Service"
metadata:
  name: "s-an-server"
  namespace: "default"
spec:
  ports:
  - port: 8080
    targetPort: "p-http"
    protocol: "TCP"
    name: "sp-http"
  selector:
    app: "a-an-server"
  type: "NodePort"
Ingress
apiVersion: "networking.k8s.io/v1beta1"
kind: "Ingress"
metadata:
  name: "primary-ingress"
  annotations:
    kubernetes.io/ingress.global-static-ip-name: "primary-static-ipv4"
    networking.gke.io/managed-certificates: "appname-production-cert,appname-development-cert"
spec:
  rules:
  - host: "appname.example.com"
    http:
      paths:
      - backend:
          serviceName: "s-an-server"
          servicePort: "sp-http"
  - host: "dev.appname.example.com"
    http:
      paths:
      - backend:
          serviceName: "s-an-server-dev"
          servicePort: "sp-http-dev"
I think what's happening here is that the GKE ingress is never informed of port 9292. The Ingress refers to sp-http, which maps to port 8080.
You need to make sure of the following:
1. The Service's targetPort field must point to the Pod port's containerPort value or name.
2. The readiness probe must be exposed on the port matching the servicePort specified in the Ingress.
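Following those two points, the probe would move to the serving port. A sketch under the assumption that the application can answer 200 without a redirect on some path of port 8080. The /healthz path is hypothetical and not from the question.

```yaml
# Fragment of the pod template for d-an-server (spec.template.spec).
containers:
- name: "c-an-server-app"
  image: "gcr.io/${PROJECT_ID}/an-server-app:${SHORT_SHA}"
  ports:
  - name: "p-http"
    containerPort: 8080
  readinessProbe:
    httpGet:
      path: "/healthz"        # hypothetical path; must return 200, not a redirect
      port: "p-http"          # the port behind servicePort "sp-http"
    initialDelaySeconds: 30
```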

Health Checks in GKE in GCloud resets after I change it from HTTP to TCP

I'm working on a Kubernetes cluster where I am directing traffic from a GCloud Ingress to my Services. One of the service endpoints fails its health check as HTTP but passes it as TCP.
When I change the health check options inside GCloud to be TCP, the health checks pass, and my endpoint works, but after a few minutes, the health check on GCloud resets for that port back to HTTP and health checks fail again, giving me a 502 response on my endpoint.
I don't know if it's a bug inside Google Cloud or something I'm doing wrong in Kubernetes. I have pasted my YAML configuration here:
namespace
apiVersion: v1
kind: Namespace
metadata:
  name: parity
  labels:
    name: parity
storageclass
apiVersion: storage.k8s.io/v1
metadata:
  name: classic-ssd
  namespace: parity
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
  zones: us-central1-a
reclaimPolicy: Retain
secret
apiVersion: v1
kind: Secret
metadata:
  name: tls-secret
  namespace: ingress-nginx
data:
  tls.crt: ./config/redacted.crt
  tls.key: ./config/redacted.key
statefulset
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: parity
  namespace: parity
  labels:
    app: parity
spec:
  replicas: 3
  selector:
    matchLabels:
      app: parity
  serviceName: parity
  template:
    metadata:
      name: parity
      labels:
        app: parity
    spec:
      containers:
      - name: parity
        image: "etccoop/parity:latest"
        imagePullPolicy: Always
        args:
        - "--chain=classic"
        - "--jsonrpc-port=8545"
        - "--jsonrpc-interface=0.0.0.0"
        - "--jsonrpc-apis=web3,eth,net"
        - "--jsonrpc-hosts=all"
        ports:
        - containerPort: 8545
          protocol: TCP
          name: rpc-port
        - containerPort: 443
          protocol: TCP
          name: https
        readinessProbe:
          tcpSocket:
            port: 8545
          initialDelaySeconds: 650
        livenessProbe:
          tcpSocket:
            port: 8545
          initialDelaySeconds: 650
        volumeMounts:
        - name: parity-config
          mountPath: /parity-config
          readOnly: true
        - name: parity-data
          mountPath: /parity-data
      volumes:
      - name: parity-config
        secret:
          secretName: parity-config
  volumeClaimTemplates:
  - metadata:
      name: parity-data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: "classic-ssd"
      resources:
        requests:
          storage: 50Gi
service
apiVersion: v1
kind: Service
metadata:
  labels:
    app: parity
  name: parity
  namespace: parity
  annotations:
    cloud.google.com/app-protocols: '{"my-https-port":"HTTPS","my-http-port":"HTTP"}'
spec:
  selector:
    app: parity
  ports:
  - name: default
    protocol: TCP
    port: 80
    targetPort: 80
  - name: rpc-endpoint
    port: 8545
    protocol: TCP
    targetPort: 8545
  - name: https
    port: 443
    protocol: TCP
    targetPort: 443
  type: LoadBalancer
ingress
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress-parity
  namespace: parity
  annotations:
    #nginx.ingress.kubernetes.io/rewrite-target: /
    kubernetes.io/ingress.global-static-ip-name: cluster-1
spec:
  tls:
    secretName: tls-classic
    hosts:
    - www.redacted.com
  rules:
  - host: www.redacted.com
    http:
      paths:
      - path: /
        backend:
          serviceName: web
          servicePort: 8080
      - path: /rpc
        backend:
          serviceName: parity
          servicePort: 8545
Issue
I've redacted hostnames and such, but this is my basic configuration. I've also run a hello-app container from this documentation here for debugging: https://cloud.google.com/kubernetes-engine/docs/tutorials/hello-app
Which is what the endpoint for ingress on / points to on port 8080 for the hello-app service. That works fine and isn't the issue, but just mentioned here for clarification.
So, the issue here is that, after creating my cluster with GKE and my ingress LoadBalancer on Google Cloud (the cluster-1 global static ip name in the Ingress file), and then creating the Kubernetes configuration in the files above, the Health-Check fails for the /rpc endpoint on Google Cloud when I go to Google Compute Engine -> Health Check -> Specific Health-Check for the /rpc endpoint.
When I edit that Health-Check to not use HTTP Protocol and instead use TCP Protocol, health-checks pass for the /rpc endpoint and I can curl it just fine after and it returns me the correct response.
The issue is that a few minutes after that, the same Health-Check goes back to HTTP protocol even though I edited it to be TCP, and then the health-checks fail and I get a 502 response when I curl it again.
I am not sure if there's a way to attach the Google Cloud Health Check configuration to my Kubernetes Ingress prior to creating the Ingress in kubernetes. Also not sure why it's being reset, can't tell if it's a bug on Google Cloud or something I'm doing wrong in Kubernetes. If you notice on my statefulset deployment, I have specified livenessProbe and readinessProbe to use TCP to check the port 8545.
The delay of 650 seconds was due to this ticket issue here which was solved by increasing the delay to greater than 600 seconds (to avoid mentioned race conditions): https://github.com/kubernetes/ingress-gce/issues/34
I really am not sure why the Google Cloud health-check is resetting back to HTTP after I've specified it to be TCP. Any help would be appreciated.
I found a solution where I added a new container for health check on my stateful set on /healthz endpoint, and configured the health check of the ingress to check that endpoint on the 8080 port assigned by kubernetes as an HTTP type of health-check, which made it work.
It's not immediately obvious why the reset happens when it's TCP.
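The workaround described above could be sketched roughly as follows. The answer does not show its manifest, so the sidecar's name and image are hypothetical stand-ins for whatever serves HTTP 200 on /healthz. Only the path and port 8080 come from the answer.

```yaml
# Fragment of the StatefulSet's pod template (spec.template.spec).
containers:
- name: parity
  image: "etccoop/parity:latest"
  ports:
  - containerPort: 8545
    protocol: TCP
    name: rpc-port
- name: healthz                            # hypothetical sidecar container
  image: example.com/healthz-sidecar:1.0   # hypothetical image answering 200 on /healthz
  ports:
  - containerPort: 8080
    protocol: TCP
    name: healthz
  readinessProbe:
    httpGet:
      path: /healthz
      port: 8080
```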

Kubernetes endpoints empty, can I restart the pods?

I have a situation where I have zero endpoints available for one service. To test this, I specially crafted a yaml descriptor that uses a simple node server to set and retrieve the ready/live status for a pod:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nodejs-deployment
  labels:
    app: nodejs
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nodejs
  template:
    metadata:
      labels:
        app: nodejs
    spec:
      containers:
      - name: nodejs
        image: nodejs_server
        ports:
        - containerPort: 8080
        livenessProbe:
          httpGet:
            path: /is_alive
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 3
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /is_ready
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 3
          periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: nodejs-service
  labels:
    app: nodejs
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: nodejs
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: nodejs-ingress
spec:
  backend:
    serviceName: nodejs-service
    servicePort: 80
The node server has methods to set and retrieve the liveness and readiness.
When the app starts I can see that 3 replicas are created and that their status is ready. I then manually set the readiness of one of them to false [from outside the ingress]. That pod is correctly removed from the endpoints, so no traffic is routed to it [that's OK, as this is the expected behavior]. When I set the ready status to false for all pods, the endpoints list is empty [still the expected behavior].
At that point I cannot set ready=true from outside the ingress, as traffic is not routed to any pod. Is there a way to trigger a restart of a pod when it has not become ready after n tries or n seconds? Or when the endpoints list is empty?
Well, that is perfectly normal and expected behaviour. What you can do, on the side, is forward traffic from localhost to a particular pod with kubectl port-forward. That way you can access the pod directly, without ingresses etc., and set its readiness back to OK. If you want a restart when the pod has not been ready for too long, just use the same endpoint for the liveness probe, but have it trigger after more tries.
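The last suggestion might look like the fragment below. It is a sketch: the failureThreshold values are illustrative assumptions, chosen so that the liveness probe tolerates more consecutive failures than the readiness probe before the kubelet restarts the container.

```yaml
# Fragment of the Deployment's pod template (spec.template.spec).
containers:
- name: nodejs
  image: nodejs_server
  ports:
  - containerPort: 8080
  readinessProbe:
    httpGet:
      path: /is_ready
      port: 8080
    periodSeconds: 10
    failureThreshold: 3       # pod leaves the endpoints after 3 failed checks
  livenessProbe:
    httpGet:
      path: /is_ready         # same endpoint as the readiness probe
      port: 8080
    periodSeconds: 10
    failureThreshold: 12      # container is restarted after 12 failed checks
```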