Kubernetes health check outside container

Can I do a liveness- or readiness-style health check from outside the container? In other words, can I stop traffic to Pods and restart containers when the application is not accessible?

An HTTP liveness probe or a TCP liveness probe can be used to check whether the application running inside the container is reachable from the outside world:
pods/probe/http-liveness.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-http
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/liveness
    args:
    - /server
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
        httpHeaders:
        - name: X-Custom-Header
          value: Awesome
      initialDelaySeconds: 3
      periodSeconds: 3
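A TCP liveness probe follows the same pattern; here is a minimal sketch (the port is illustrative and must match a port your container actually listens on):

livenessProbe:
  tcpSocket:
    port: 8080             # the kubelet attempts a TCP connection to this port
  initialDelaySeconds: 15
  periodSeconds: 20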
See this piece of documentation on configuring probes. Does that answer your question?

Related

Readiness probe based on service

I have 2 pods and my application is clustered, i.e. each pod synchronizes with the other pod in order to come up. In my example the pods are appod1 and appod2, and the synchronization port is 8080.
I want DNS to resolve these pod hostnames through the service, but I want to block traffic coming from outside appod1 and appod2.
I can use a readiness probe, but then the service has no endpoints and I can't resolve the IP of the 2nd pod. And if I can't resolve the IP of the 2nd pod from pod1, I can't complete the configuration of these pods.
E.g.
App Statefulset definition
app1_sts.yaml
===
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    cluster: appcluster
  name: app1
  namespace: app
spec:
  selector:
    matchLabels:
      cluster: appcluster
  serviceName: app1cluster
  template:
    metadata:
      labels:
        cluster: appcluster
    spec:
      containers:
      - name: app1-0
        image: localhost/linux:8
        imagePullPolicy: Always
        securityContext:
          privileged: false
        command: [/usr/sbin/init]
        ports:
        - containerPort: 8080
          name: appport
        readinessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 120
          periodSeconds: 30
          failureThreshold: 20
        env:
        - name: container
          value: "true"
        - name: applist
          value: "app2-0"
app2_sts.yaml
====
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    cluster: appcluster
  name: app2
  namespace: app
spec:
  selector:
    matchLabels:
      cluster: appcluster
  serviceName: app2cluster
  template:
    metadata:
      labels:
        cluster: appcluster
    spec:
      containers:
      - name: app2-0
        image: localhost/linux:8
        imagePullPolicy: Always
        securityContext:
          privileged: false
        command: [/usr/sbin/init]
        ports:
        - containerPort: 8080
          name: appport
        readinessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 120
          periodSeconds: 30
          failureThreshold: 20
        env:
        - name: container
          value: "true"
        - name: applist
          value: "app1-0"
Create the StatefulSets and check name resolution
[root@oper01 onprem]# kubectl get all -n app
NAME         READY   STATUS    RESTARTS   AGE
pod/app1-0   0/1     Running   0          8s
pod/app2-0   0/1     Running   0          22s

NAME                    READY   AGE
statefulset.apps/app1   0/1     49s
statefulset.apps/app2   0/1     22s

kubectl exec -i -t app1-0 /bin/bash -n app

[root@app1-0 ~]# nslookup app2-0
Server:    10.96.0.10
Address:   10.96.0.10#53

** server can't find app2-0: NXDOMAIN

[root@app1-0 ~]# nslookup app1-0
Server:    10.96.0.10
Address:   10.96.0.10#53

** server can't find app1-0: NXDOMAIN

[root@app1-0 ~]#
I understand the behavior of the readiness probe, and I am using it because it ensures the service does not resolve to the app pods while port 8080 is down. However, I cannot work out how to complete the configuration: the app pods need to resolve each other, and they need each other's hostnames and IPs to configure themselves, yet DNS resolution only happens once the service has endpoints. Is there a better way to handle this situation?
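One option worth evaluating here (a sketch, not part of the original question) is to let the governing headless Service publish DNS records for Pods that are not yet Ready, using the publishNotReadyAddresses field. The Service name, selector, and port below are assumptions based on the StatefulSets above:

apiVersion: v1
kind: Service
metadata:
  name: app1cluster                # matches serviceName in app1_sts.yaml
  namespace: app
spec:
  clusterIP: None                  # headless service: per-pod DNS records
  publishNotReadyAddresses: true   # publish records even while pods are not Ready
  selector:
    cluster: appcluster
  ports:
  - name: appport
    port: 8080
    targetPort: 8080

With this in place, the per-pod DNS name app1-0.app1cluster.app.svc.cluster.local should resolve even before the readiness probe passes.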

Re-route traffic in kubernetes to a working pod

I'm not sure whether such a question has been asked before, so pardon me if I couldn't find it.
I have a cluster based on 3 nodes, my application consists of a frontend and a backend with each running 2 replicas:
front1 - running on node1
front2 - running on node2
be1 - node1
be2 - node2
Both FE pods are served behind frontend-service
Both BE pods are served behind be-service
When I shut down node-2, the application stopped, and in my UI I could see application errors.
I checked the logs and found that my application attempted to reach the backend Service, which failed to respond since be2 wasn't running and the scheduler had not yet terminated the existing pod.
Only when the node was terminated and removed from the cluster were the pods rescheduled to the 3rd node, and the application came back online.
I know a service mesh could help by removing unresponsive pods from the traffic; however, I don't want to implement one yet, and I am trying to understand the best way to route traffic to the healthy pods quickly and easily. Five minutes of downtime is a lot of time.
Here's my be deployment spec:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: backend
  name: backend
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: backend
  strategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: backend
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-Application
                operator: In
                values:
                - "true"
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - backend
            topologyKey: kubernetes.io/hostname
      containers:
      - env:
        - name: SSL_ENABLED
          value: "false"
        image: quay.io/something:latest
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /liveness
            port: 16006
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 20
          successThreshold: 1
          timeoutSeconds: 10
        name: backend
        ports:
        - containerPort: 16006
          protocol: TCP
        - containerPort: 8457
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readiness
            port: 16006
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        resources:
          limits:
            cpu: 1500m
            memory: 8500Mi
          requests:
            cpu: 6m
            memory: 120Mi
      dnsPolicy: ClusterFirst
Here's my backend service:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: identity
  name: backend
  namespace: default
spec:
  clusterIP: 10.233.34.115
  ports:
  - name: tcp
    port: 16006
    protocol: TCP
    targetPort: 16006
  - name: internal-http-rpc
    port: 8457
    protocol: TCP
    targetPort: 8457
  selector:
    app: backend
  sessionAffinity: None
  type: ClusterIP
This is a community wiki answer. Feel free to expand it.
As already mentioned by @TomerLeibovich, the main issue here was the probe configuration:
Probes have a number of fields that you can use to more precisely control the behavior of liveness and readiness checks:
initialDelaySeconds: Number of seconds after the container has started before liveness or readiness probes are initiated. Defaults to 0 seconds. Minimum value is 0.
periodSeconds: How often (in seconds) to perform the probe. Default to 10 seconds. Minimum value is 1.
timeoutSeconds: Number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1.
successThreshold: Minimum consecutive successes for the probe to be considered successful after having failed. Defaults to 1. Must be 1 for liveness and startup Probes. Minimum value is 1.
failureThreshold: When a probe fails, Kubernetes will try failureThreshold times before giving up. Giving up in case of liveness probe means restarting the container. In case of readiness probe the Pod will be marked Unready. Defaults to 3. Minimum value is 1.
Plus the proper Pod eviction configuration:
The kubelet needs to preserve node stability when available compute resources are low. This is especially important when dealing with incompressible compute resources, such as memory or disk space. If such resources are exhausted, nodes become unstable.
Changing the failure threshold to 1 instead of 3 and reducing the pod-eviction timeout solved the issue, as the Pod is now evicted sooner.
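For illustration only (the exact values are a judgment call rather than a recommendation), the readiness probe from the backend Deployment above with a tighter failure threshold would look like this:

readinessProbe:
  failureThreshold: 1        # drop the pod from the endpoints after a single failed check
  httpGet:
    path: /readiness
    port: 16006
    scheme: HTTP
  initialDelaySeconds: 10
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 5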
EDIT:
The other possible solution in this scenario is to label the other nodes for the backend app, to make sure each backend pod can be scheduled on a different node. In your current situation, the one pod that was deployed on the removed node was dropped from the endpoints and the application became unresponsive.
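For instance, assuming the third node is named node3 (a placeholder), it could be labeled so that it satisfies the Deployment's nodeAffinity requirement:

kubectl label node node3 node-Application=true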
Also, the workaround for triggering pod eviction from the unhealthy node is to add tolerations to deployment.spec.template.spec:

tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 60

instead of using the default value of tolerationSeconds: 300.
You can find more information in this documentation.

Kubernetes ClusterIP service initial delay or liveness

I have a Kubernetes deployment on GCP and a ClusterIP service to discover the pods in this deployment. The deployment contains multiple replica pods that come and go based on our horizontal pod autoscaler configuration (based on CPU utilization).
Now, when a new replica pod is created, it takes some time before the application can serve requests. But the ClusterIP service already starts routing requests to the new pod before the application is ready, which causes those requests to fail.
ClusterIP service yaml:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: service-name
    tier: backend
    environment: "dev"
    creator: internal
  name: service-name
spec:
  clusterIP: None
  ports:
  - name: https
    protocol: TCP
    port: 7070
    targetPort: 7070
  selector:
    app: dep-name
    tier: "backend"
    environment: "dev"
    creator: "ME"
  type: ClusterIP
How can the ClusterIP be told to start distributing requests to the new pod after the application starts? Can there be any initial delay or liveness probe set for this purpose?
Kubernetes provides readiness probes for this. With readiness probes, Kubernetes will not send traffic to a pod until the probe succeeds. When updating a deployment, it will also leave the old replica(s) running until the probes have succeeded on the new replica. That means that if your new pods are broken in some way, they will never see traffic, and your old pods will continue to serve all traffic for the deployment.
You need to update the deployment file with the following readiness probe:
readinessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy
  initialDelaySeconds: 5
  periodSeconds: 5
If your application has an HTTP health endpoint, you can configure the readiness probe in HTTP mode as well.
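For example, a minimal sketch of an HTTP readiness probe, assuming the application exposes a health endpoint at /healthz on port 7070 (both values are placeholders to adjust for your app):

readinessProbe:
  httpGet:
    path: /healthz        # hypothetical health endpoint
    port: 7070            # matches the targetPort of the service above
  initialDelaySeconds: 5
  periodSeconds: 5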
For more information on how to use readiness probes, refer to:
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/#define-readiness-probes
You should have a readiness probe as defined in the documentation at
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/#define-readiness-probes.
As described in the documentation, you should be able to configure the behavior using initialDelaySeconds and periodSeconds.
Your current behavior is probably because, without a readiness probe, the service considers the containers in the pod ready as soon as they have started. You can define your readiness checks like the example below, taken from the documentation.
apiVersion: v1
kind: Pod
metadata:
  name: goproxy
  labels:
    app: goproxy
spec:
  containers:
  - name: goproxy
    image: k8s.gcr.io/goproxy:0.1
    ports:
    - containerPort: 8080
    readinessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 20

Pod does not communicate with other pod through service

I have 2 pods: a server pod and a client pod (basically the client hits port 8090 to interact with the server). I have created a service (which in turn creates an endpoint) but the client pod cannot reach that endpoint and therefore it crashes:
Error :Error in client :rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp :8090: connect: connection refused")
The client pod tries to access port 8090 on its host network. What I want is for the client, whenever it hits port 8090 through the service, to connect to the server.
I just cannot understand how to connect these two pods and therefore need help.
server pod:
apiVersion: v1
kind: Pod
metadata:
  name: server-pod
  labels:
    app: grpc-app
spec:
  containers:
  - name: server-pod
    image: image
    ports:
    - containerPort: 8090
client pod:
apiVersion: v1
kind: Pod
metadata:
  name: client-pod
  labels:
    app: grpc-app
spec:
  hostNetwork: true
  containers:
  - name: client-pod
    image: image
Service:
apiVersion: v1
kind: Service
metadata:
  name: server
  labels:
    app: grpc-app
spec:
  type: ClusterIP
  ports:
  - port: 8090
    targetPort: 8090
    protocol: TCP
  selector:
    app: grpc-app
Your service is selecting both the client and the server. You should change the labels so that the server has something like app: grpc-server and the client has app: grpc-client. The service selector should be app: grpc-server so that it exposes only the server pod. Then, in your client app, connect to server:8090. You should also remove hostNetwork: true.
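A minimal sketch of that change (the label values follow the suggestion above and are otherwise arbitrary):

apiVersion: v1
kind: Pod
metadata:
  name: server-pod
  labels:
    app: grpc-server       # label only the server with this value
spec:
  containers:
  - name: server-pod
    image: image
    ports:
    - containerPort: 8090
---
apiVersion: v1
kind: Service
metadata:
  name: server
spec:
  type: ClusterIP
  selector:
    app: grpc-server       # select only the server pod
  ports:
  - port: 8090
    targetPort: 8090
    protocol: TCP

The client pod would keep its own label (for example app: grpc-client), drop hostNetwork: true, and dial server:8090.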
One thing that I feel is going wrong is that the server is not yet ready to accept connections while your client is already trying to reach it, hence the connection refused error. I faced a similar problem a few days back. What I did was add readiness and liveness probes to the YAML config file. Kubernetes provides liveness and readiness probes that are used to check the health of your containers. These probes can check certain files in your containers, check a TCP socket, or make HTTP requests.
A sample looks like this:
spec:
  containers:
  - name: imagename
    image: image
    ports:
    - containerPort: 19500
      name: http
    readinessProbe:
      httpGet:
        path: /health
        port: http
      initialDelaySeconds: 120
      periodSeconds: 5
    livenessProbe:
      httpGet:
        path: /health
        port: http
        scheme: HTTP
      initialDelaySeconds: 120
      timeoutSeconds: 5
This way, Kubernetes checks whether your application is ready to accept connections before routing traffic to it.

How to put healthcheck in a deploy manifest?

I'm still learning Kubernetes and, having started with pods, I'm moving on to Deployment configuration.
On pods I like to put health checks; here's an example using Spring Boot's Actuator:
livenessProbe:
  httpGet:
    path: /actuator/health
    port: 8080
  initialDelaySeconds: 60
  timeoutSeconds: 1
  periodSeconds: 30
  failureThreshold: 3
The problem is that the above configuration only works for pods. How can I use these probes in my Deployment?
The Deployment will create a ReplicaSet, and the ReplicaSet will maintain your Pods.
Liveness and readiness probes are configured at the container level, and a Pod is considered ready when all of its containers are ready.
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
The Spring Actuator health check API is part of your application, which is bundled in a container.
Kubernetes checks the liveness and readiness probes of each container in a Pod; if a liveness probe keeps failing after the configured number of attempts, Kubernetes kills that container and starts a new one, while a failing readiness probe removes the Pod from the Service endpoints.
Setting a probe at a deployment level wouldn't make sense since you can potentially have multiple pods running under the same deployment and you wouldn't want to kill healthy pods if one of your pods is not healthy.
A deployment descriptor using the same pod configuration would be something like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: liveness-deployment
  labels:
    app: liveness
spec:
  replicas: 3
  selector:
    matchLabels:
      app: liveness
  template:
    metadata:
      labels:
        app: liveness
    spec:
      containers:
      - name: liveness
        image: k8s.gcr.io/busybox
        args:
        - /bin/sh
        - -c
        - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
        livenessProbe:
          exec:
            command:
            - cat
            - /tmp/healthy
          initialDelaySeconds: 5
          periodSeconds: 5