What happens when Kubernetes liveness-probe returns false?
Does Kubernetes restart that pod immediately?
First, please note that livenessProbe concerns containers in the pod, not the pod itself. So if you have multiple containers in one pod, only the affected container will be restarted.
It's worth noting, that there is parameter failureThreshold, which is set by default to 3. So, after 3 failed probes a container will be restarted:
failureThreshold: When a probe fails, Kubernetes will try failureThreshold times before giving up. Giving up in case of liveness probe means restarting the container. In case of readiness probe the Pod will be marked Unready. Defaults to 3. Minimum value is 1.
Ok, we have information that a container is restarted after 3 failed probes - but what does it mean to restart?
I found a good article about how Kubernetes terminates a pods - Kubernetes best practices: terminating with grace. Seems for container restart caused by liveness probe it's similar - I will share my experience below.
So basically when container is being terminated by liveness probe steps are:
if there is a PreStop hook, it will be executed
SIGTERM signal is sent to the container
Kubernetes waits for grace period
After grace period, SIGKILL signal is sent to a pod
So... if an app in your container is catching the SIGTERM signal properly, then the container will shut-down and will be started again. Typically it's happening pretty fast (as I tested for the NGINX image) - almost immediately.
Situation is different when SIGTERM is not supported by your application. It means after terminationGracePeriodSeconds period the SIGKILL signal is sent, it means the container will be forcibly removed.
Example below (modified example from this doc) + I set failureThreshold: 1
I have following pod definition:
apiVersion: v1
kind: Pod
metadata:
labels:
test: liveness
name: liveness-exec
spec:
containers:
- name: liveness
image: nginx
livenessProbe:
exec:
command:
- cat
- /tmp/healthy
periodSeconds: 10
failureThreshold: 1
Of course there is no /tmp/healthy file, so livenessProbe will fail. The NGINX image is properly catching the SIGTERM signal, so the container will be restarted almost immediately (for every failed probe). Let's check it:
user#shell:~/liveness-test-short $ kubectl get pods
NAME READY STATUS RESTARTS AGE
liveness-exec 0/1 CrashLoopBackOff 3 36s
So after ~30 sec the container is already restarted a few times and it's status is CrashLoopBackOff as expected. I created the same pod without livenessProbe and I measured the time need to shutdown it:
user#shell:~/liveness-test-short $ time kubectl delete pod liveness-exec
pod "liveness-exec" deleted
real 0m1.474s
So it's pretty fast.
The similar example but I added sleep 3000 command:
...
image: nginx
args:
- /bin/sh
- -c
- sleep 3000
...
Let's apply it and check...
user#shell:~/liveness-test-short $ kubectl get pods
NAME READY STATUS RESTARTS AGE
liveness-exec 1/1 Running 5 3m37s
So after ~4 min there are only 5 restarts. Why? Because we need to wait for full terminationGracePeriodSeconds period (default is 30 seconds) for every restart. Let's measure time needed to shutdown:
user#shell:~/liveness-test-short $ time kubectl delete pod liveness-exec
pod "liveness-exec" deleted
real 0m42.418s
It's much longer.
To sum up:
What happens when Kubernetes liveness-probe return false? Does Kubernetes restart that pod immediately?
The short answer is: by default no. Why?
Kubernetes will restart a container in a pod after failureThreshold times. By default it is 3 times - so after 3 failed probes.
Depends on your configuration of the container, time needed for container termination could be very differential
You can adjust both failureThreshold and terminationGracePeriodSeconds period parameters, so the container will be restarted immediately after every failed probe.
I have an API that recently started receiving more traffic, about 1.5x. That also lead to a doubling in the latency:
This surprised me since I had setup autoscaling of both nodes and pods as well as GKE internal loadbalancing.
My external API passes the request to an internal server which uses a lot of CPU. And looking at my VM instances it seems like all of the traffic got sent to one of my two VM instances (a.k.a. Kubernetes nodes):
With loadbalancing I would have expected the CPU usage to be more evenly divided between the nodes.
Looking at my deployment there is one pod on the first node:
And two pods on the second node:
My service config:
$ kubectl describe service model-service
Name: model-service
Namespace: default
Labels: app=model-server
Annotations: networking.gke.io/load-balancer-type: Internal
Selector: app=model-server
Type: LoadBalancer
IP Families: <none>
IP: 10.3.249.180
IPs: 10.3.249.180
LoadBalancer Ingress: 10.128.0.18
Port: rest-api 8501/TCP
TargetPort: 8501/TCP
NodePort: rest-api 30406/TCP
Endpoints: 10.0.0.145:8501,10.0.0.152:8501,10.0.1.135:8501
Port: grpc-api 8500/TCP
TargetPort: 8500/TCP
NodePort: grpc-api 31336/TCP
Endpoints: 10.0.0.145:8500,10.0.0.152:8500,10.0.1.135:8500
Session Affinity: None
External Traffic Policy: Cluster
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal UpdatedLoadBalancer 6m30s (x2 over 28m) service-controller Updated load balancer with new hosts
The fact that Kubernetes started a new pod seems like a clue that Kubernetes autoscaling is working. But the pods on the second VM do not receive any traffic. How can I make GKE balance the load more evenly?
Update Nov 2:
Goli's answer leads me to think that it has something to do with the setup of the model service. The service exposes both a REST API and a GRPC API but the GRPC API is the one that receives traffic.
There is a corresponding forwarding rule for my service:
$ gcloud compute forwarding-rules list --filter="loadBalancingScheme=INTERNAL"
NAME REGION IP_ADDRESS IP_PROTOCOL TARGET
aab8065908ed4474fb1212c7bd01d1c1 us-central1 10.128.0.18 TCP us-central1/backendServices/aab8065908ed4474fb1212c7bd01d1c1
Which points to a backend service:
$ gcloud compute backend-services describe aab8065908ed4474fb1212c7bd01d1c1
backends:
- balancingMode: CONNECTION
group: https://www.googleapis.com/compute/v1/projects/questions-279902/zones/us-central1-a/instanceGroups/k8s-ig--42ce3e0a56e1558c
connectionDraining:
drainingTimeoutSec: 0
creationTimestamp: '2021-02-21T20:45:33.505-08:00'
description: '{"kubernetes.io/service-name":"default/model-service"}'
fingerprint: lA2-fz1kYug=
healthChecks:
- https://www.googleapis.com/compute/v1/projects/questions-279902/global/healthChecks/k8s-42ce3e0a56e1558c-node
id: '2651722917806508034'
kind: compute#backendService
loadBalancingScheme: INTERNAL
name: aab8065908ed4474fb1212c7bd01d1c1
protocol: TCP
region: https://www.googleapis.com/compute/v1/projects/questions-279902/regions/us-central1
selfLink: https://www.googleapis.com/compute/v1/projects/questions-279902/regions/us-central1/backendServices/aab8065908ed4474fb1212c7bd01d1c1
sessionAffinity: NONE
timeoutSec: 30
Which has a health check:
$ gcloud compute health-checks describe k8s-42ce3e0a56e1558c-node
checkIntervalSec: 8
creationTimestamp: '2021-02-21T20:45:18.913-08:00'
description: ''
healthyThreshold: 1
httpHealthCheck:
host: ''
port: 10256
proxyHeader: NONE
requestPath: /healthz
id: '7949377052344223793'
kind: compute#healthCheck
logConfig:
enable: true
name: k8s-42ce3e0a56e1558c-node
selfLink: https://www.googleapis.com/compute/v1/projects/questions-279902/global/healthChecks/k8s-42ce3e0a56e1558c-node
timeoutSec: 1
type: HTTP
unhealthyThreshold: 3
List of my pods:
kubectl get pods
NAME READY STATUS RESTARTS AGE
api-server-deployment-6747f9c484-6srjb 2/2 Running 3 3d22h
label-server-deployment-6f8494cb6f-79g9w 2/2 Running 4 38d
model-server-deployment-55c947cf5f-nvcpw 0/1 Evicted 0 22d
model-server-deployment-55c947cf5f-q8tl7 0/1 Evicted 0 18d
model-server-deployment-766946bc4f-8q298 1/1 Running 0 4d5h
model-server-deployment-766946bc4f-hvwc9 0/1 Evicted 0 6d15h
model-server-deployment-766946bc4f-k4ktk 1/1 Running 0 7h3m
model-server-deployment-766946bc4f-kk7hs 1/1 Running 0 9h
model-server-deployment-766946bc4f-tw2wn 0/1 Evicted 0 7d15h
model-server-deployment-7f579d459d-52j5f 0/1 Evicted 0 35d
model-server-deployment-7f579d459d-bpk77 0/1 Evicted 0 29d
model-server-deployment-7f579d459d-cs8rg 0/1 Evicted 0 37d
How do I A) confirm that this health check is in fact showing 2/3 backends as unhealthy? And B) configure the health check to send traffic to all of my backends?
Update Nov 5:
After finding that several pods had gotten evicted in the past because of too little RAM, I migrated the pods to a new nodepool. The old nodepool VMs had 4 CPU and 4GB memory, the new ones have 2 CPU and 8GB memory. That seems to have resolved the eviction/memory issues, but the loadbalancer still only sends traffic to one pod at a time.
Pod 1 on node 1:
Pod 2 on node 2:
It seems like the loadbalancer is not splitting the traffic at all but just randomly picking one of the GRPC modelservers and sending 100% of traffic there. Is there some configuration that I missed which caused this behavior? Is this related to me using GRPC?
Turns out the answer is that you cannot loadbalance gRPC requests using a GKE loadbalancer.
A GKE loadbalancer (as well as Kubernetes' default loadbalancer) picks a new backend every time a new TCP connection is formed. For regular HTTP 1.1 requests each request gets a new TCP connection and the loadbalancer works fine. For gRPC (which is based on HTTP 2), the TCP connection is only setup once and all requests are multiplexed on the same connection.
More details in this blog post.
To enable gRPC loadbalancing I had to:
Install Linkerd
curl -fsL https://run.linkerd.io/install | sh
linkerd install | kubectl apply -f -
Inject the Linkerd proxy in both the receiving and sending pods:
kubectl apply -f api_server_deployment.yaml
kubectl apply -f model_server_deployment.yaml
After realizing that Linkerd would not work together with the GKE loadbalancer, I exposed the receiving deployment as a ClusterIP service instead.
kubectl expose deployment/model-server-deployment
Pointed the gRPC client to the ClusterIP service IP address I just created, and redeployed the client.
kubectl apply -f api_server_deployment.yaml
Google Cloud provides health checks to determine if backends respond to traffic.Health checks connect to backends on a configurable, periodic basis. Each connection attempt is called a probe. Google Cloud records the success or failure of each probe.
Based on a configurable number of sequential successful or failed probes, an overall health state is computed for each backend. Backends that respond successfully for the configured number of times are considered healthy.
Backends that fail to respond successfully for a separately configurable number of times are unhealthy.
The overall health state of each backend determines eligibility to receive new requests or connections.So one of the chances of instance not getting requests can be that your instance is unhealthy. Refer to this documentation for creating health checks .
You can configure the criteria that define a successful probe. This is discussed in detail in the section How health checks work.
Edit1:
The Pod is evicted from the node due to lack of resources, or the node fails. If a node fails, Pods on the node are automatically scheduled for deletion.
So to know the exact reason for pods getting evicted Run
kubectl describe pod <pod name> and look for the node name of this pod. Followed by kubectl describe node <node-name> that will show what type of resource cap the node is hitting under Conditions: section.
From my experience this happens when the host node runs out of disk space.
Also after starting the pod you should run kubectl logs <pod-name> -f and see the logs for more detailed information.
Refer this documentation for more information on eviction.
I have an app running on three pods behind a loadbalancer, all set up with Kubernetes. My problem is that when I take pods down or update them, this results in a couple of 503s before the loadbalancer notices the pod is unavailable and stops sending traffic to it. Is there any way to inform the loadbalancer directly that it should stop sending traffic to a pod? So we can avoid the 503s on pod update
You need to keep in mind if the pods are down the loadbalancer will still be redirecting the traffic to the designated service ports, and as no pod is servicing those ports.
Hence you should use rolling update mechanism in kubernetes which gives zero down time to the deployment. link
Since there are 3 pods running behind a Load balancer, I believe you must be using Deployment/Statefulset to manage them.
If by updating the pods you mean updating docker image version running in the pod then you can make use of Update strategies in Deployment to do rolling update. This will update your pods with zero downtime.
Additionally you can also make use of startup, readiness and liveness probe to only direct traffic to the pod when the pod is ready/live to serve traffic.
You should implement probes for pods. Read Configure Liveness, Readiness and Startup Probes.
There are readinessProbe and LivenessProbes which you can make use of. In your case I think you can make use of readinessProbe, only when your readinessProbe will pass, kubernetes will start sending traffic to your Pod.
For example
apiVersion: v1
Kind: Pod
metadata:
name: my-nginx-pod
spec:
containers:
- name: my-web-server
image: nginx
readinessProbe:
httpGet:
path: /login
port: 3000
in this above example, the nginx Pod will only receive traffic in case it passed the readinessProbe.
You can find more about probes here https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
Is there a way to tell Kubernetes to just destroy a pod and create a new one if the liveness probe fails? What I see from logs now: my node js application is just restarted and runs in the same pod.
The liveness probe is defined in my YAML specification as follows:
livenessProbe:
httpGet:
path: /app/check/status
port: 3000
httpHeaders:
- name: Accept
value: application/x-www-form-urlencoded
initialDelaySeconds: 60
periodSeconds: 60
Disclaimer:
I am fully aware that recreating a pod if a liveness prove fails is probably not the best idea and a right way would be to get a notification that something is going on.
So liveness and readiness probes are defined in containers not pods so if you have 1 container in your pod and you specify restartPolicy to Never. Then your pod will go into a Failed state and will be scrapped at some point based on the terminated-pod-gc-threshold value.
If you have more than one container in your pod it becomes tricker because of your other container(s) running making the pod still be in Running status. You can build your own automation or try Pod Readiness which is still in alpha as of this writing.
I have been creating pods with type:deployment but I see that some documentation uses type:pod, more specifically the documentation for multi-container pods:
apiVersion: v1
kind: Pod
metadata:
name: ""
labels:
name: ""
namespace: ""
annotations: []
generateName: ""
spec:
? "// See 'The spec schema' for details."
: ~
But to create pods I can just use a deployment type:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: ""
spec:
replicas: 3
template:
metadata:
labels:
app: ""
spec:
containers:
etc
I noticed the pod documentation says:
The create command can be used to create a pod directly, or it can
create a pod or pods through a Deployment. It is highly recommended
that you use a Deployment to create your pods. It watches for failed
pods and will start up new pods as required to maintain the specified
number. If you don’t want a Deployment to monitor your pod (e.g. your
pod is writing non-persistent data which won’t survive a restart, or
your pod is intended to be very short-lived), you can create a pod
directly with the create command.
Note: We recommend using a Deployment to create pods. You should use
the instructions below only if you don’t want to create a Deployment.
But this raises the question of what kind:pod is good for? Can you somehow reference pods in a deployment? I didn't see a way. It looks like what you get with pods is some extra metadata but none of the deployment options such as replica or a restart policy. What good is a pod that doesn't persist data, survives a restart? I think I'd be able to create a multi-container pod with a deployment as well.
Radek's answer is very good, but I would like to pitch in from my experience, you will almost never use an object with the kind pod, because that doesn't make any sense in practice.
Because you need a deployment object - or other Kubernetes API objects like a replication controller or replicaset - that needs to keep the replicas (pods) alive (that's kind of the point of using kubernetes).
What you will use in practice for a typical application are:
Deployment object (where you will specify your apps container/containers) that will host your app's container with some other specifications.
Service object (that is like a grouping object and gives it a so-called virtual IP (cluster IP) for the pods that have a certain label - and those pods are basically the app containers that you deployed with the former deployment object).
You need to have the service object because the pods from the deployment object can be killed, scaled up and down, and you can't rely on their IP addresses because they will not be persistent.
So you need an object like a service, that gives those pods a stable IP.
Just wanted to give you some context around pods, so you know how things work together.
Hope that clears a few things for you, not long ago I was in your shoes :)
Both Pod and Deployment are full-fledged objects in the Kubernetes API. Deployment manages creating Pods by means of ReplicaSets. What it boils down to is that Deployment will create Pods with spec taken from the template. It is rather unlikely that you will ever need to create Pods directly for a production use-case.
Kubernetes has three Object Types you should know about:
Pods - runs one or more closely related containers
Services - sets up networking in a Kubernetes cluster
Deployment - Maintains a set of identical pods, ensuring that they have the correct config and that the right number of them exist.
Pods:
Runs a single set of containers
Good for one-off dev purposes
Rarely used directly in production
Deployment:
Runs a set of identical pods
Monitors the state of each pod, updating as necessary
Good for dev
Good for production
And I would agree with other answers, forget about Pods and just use Deployment. Why? Look at the second bullet point, it monitors the state of each pod, updating as necessary.
So, instead of struggling with error messages such as this one:
Forbidden: pod updates may not change fields other than spec.containers[*].image
So just refactor or completely recreate your Pod into a Deployment that creates a pod to do what you need done. With Deployment you can change any piece of configuration you want to and you need not worry about seeing that error message.
Pod is container instance.
That is the output of replicas: 3
Think of one deployment can have many running instances(replica).
//deployment.yaml
apiVersion: apps/v1beta2
kind: Deployment
metadata:
name: tomcat-deployment222
spec:
selector:
matchLabels:
app: tomcat
replicas: 3
template:
metadata:
labels:
app: tomcat
spec:
containers:
- name: tomcat
image: tomcat:9.0
ports:
- containerPort: 8080
I want to add some informations from Kubernetes In Action book, so you can see all picture and connect relation between Kubernetes resources like Pod, Deployment and ReplicationController(ReplicaSet)
Pods
are the basic deployable unit in Kubernetes. But in real-world use cases, you want your deployments to stay up and running automatically and remain healthy without any manual intervention. For this the recommended approach is to use a Deployment, which under the hood create a ReplicaSet.
A ReplicaSet, as the name implies, is a set of replicas (Pods) maintained with their Revision history.
(ReplicaSet extends an older object called ReplicationController -- which is exactly the same but without the Revision history.)
A ReplicaSet constantly monitors the list of running pods and makes sure the running number of pods matching a certain specification always matches the desired number.
Removing a pod from the scope of the ReplicationController comes in handy
when you want to perform actions on a specific pod. For example, you might
have a bug that causes your pod to start behaving badly after a specific amount
of time or a specific event.
A Deployment
is a higher-level resource meant for deploying applications and updating them declaratively.
When you create a Deployment, a ReplicaSet resource is created underneath (eventually more of them). ReplicaSets replicate and manage pods, as well. When using a Deployment, the actual pods are created and managed by the Deployment’s ReplicaSets, not by the Deployment directly
Let’s think about what has happened. By changing the pod template in your Deployment resource, you’ve updated your app to a newer version—by changing a single field!
Finally, Roll back a Deployment either to the previous revision or to any earlier revision so easy with Deployment resource.
These images are from Kubernetes In Action book, too.
Pod is a collection of containers and basic object of Kuberntes. All containers of pod lie in same node.
Not suitable for production
No rolling updates
Deployment is a kind of controller in Kubernetes.
Controllers use a Pod Template that you provide to create the Pods for which it is responsible.
Deployment creates a ReplicaSet which in turn make sure that,
CurrentReplicas is always same as desiredReplicas .
Advantages :
You can rollout and rollback your changes using deployment
Monitors the state of each pod
Best suitable for production
Supports rolling updates
In Kubernetes we can deploy our workloads using different type of API objects like Pods, Deployment, ReplicaSet, ReplicationController and StatefulSets.
Out of those Pods are the smallest deployable unit in Kubernetes. Any workload/application that runs in Kubernetes, has to run inside a container part of a Pod. A Pod could run multiple containers (meaning multiple applications) within it. A Pod is a wrapper on top of one/many running containers. Using a Pod, kubernetes could control, monitor, operate the containers.
Now using stand alone Pods we can't do lot of things. We can't change configurations, volumes inside Pods. We can't restart the Pod if one is down.
So there is another API Object called Deployment comes into picture which maintains the desired state (how many instances, how much compute resource application uses) of the application. The Deployment maintaines multiple instances of same application by running multiple Pods. Deployments unlike Pods are mutable. Deployments uses another API Object called ReplicaSet to maintain the desired state. Deployments through ReplicaSet spawns another Pod if one is down.
So Pod runs applications in containers. Deployments run Pods and maintains desired state of the application.
Try to avoid Pods and implement Deployments instead for managing containers as objects of kind Pod will not be rescheduled (or self healed) in the event of a node failure or pod termination.
A Deployment is generally preferable because it defines a ReplicaSet to ensure that the desired number of Pods is always available and specifies a strategy to replace Pods, such as RollingUpdate.
May be this example will be helpful for beginners !!
1) Listing PODs
controlplane $ kubectl -n my-namespace get pods
NAME READY STATUS RESTARTS AGE
mysql 1/1 Running 0 92s
webapp-mysql-75dfdf859f-9c54j 1/1 Running 0 92s
2) Deleting web-app pode - which is created using deployment
controlplane $ kubectl -n my-namespace delete pod webapp-mysql-75dfdf859f-9c54j
pod "webapp-mysql-75dfdf859f-9c54j" deleted
3) Listing PODs ( You can see, it is recreated automatically)
controlplane $ kubectl -n my-namespace get pods
NAME READY STATUS RESTARTS AGE
mysql 1/1 Running 0 2m42s
webapp-mysql-75dfdf859f-mqrcx 1/1 Running 0 45s
4) Deleting mysql POD whcih is created directly ( with out deployment)
controlplane $ kubectl -n my-namespace delete pod mysql
pod "mysql" deleted
5) Listing PODs ( You can see mysql POD is lost for ever )
controlplane $ kubectl -n my-namespace get pods
NAME READY STATUS RESTARTS AGE
webapp-mysql-75dfdf859f-mqrcx 1/1 Running 0 76s
In kubernetes Pods are the smallest deployable units. Every time when we create a kubernetes object like Deployments, replica-sets, statefulsets, daemonsets it creates pod.
As mentioned above deployments create pods based on desired state mentioned in your deployment object. So for example you want 5 replicas of a application, you mentioned replicas: 5 in your deployment manifest. Now deployment controller is responsible to create 5 identical replicas (no less, no more) of given application with all metadata like RBAC policy, networks policy, labels, annotations, health check, resource quotas, taint/tolerations and others and associate with each pods it creates.
There are some cases when you wants to create pod, for example if you are running a test sidecar where you don't need to run application forever, you don't need multiple replicas, and you run application when you wants to execute in that case pod is suitable. For example helm test, which is a pod definition that specifies a container with a given command to run.
I am also a beginner in k8s so correct me if I am wrong.
We know that a pod is created when we create a deployment. What I observed is that if you see the YAML file of the deployment, you can see its kind:deployment. But if you see the YAML file of the pod, you see its kind:pod.