How to kick off the dead replicas of a Kubernetes Deployment

Now we have deployed services as Kubernetes Deployments with multiple replicas. When a server crashes, Kubernetes migrates its containers to another available server, which takes about 3-5 minutes.
While migrating, the client can still access the Deployment's Service because we have other running replicas. But sometimes requests fail because the load balancer redirects them to dead or still-migrating containers.
It would be great if Kubernetes could kick off the dead replicas automatically and add them back once they are running on other servers. Otherwise, we need to set up a load balancer like HAProxy to do the same job across multiple Deployment instances.

You need to configure health checking to have properly working load balancing for a Service. Please have a read of:
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/
The kubelet uses readiness probes to know when a Container is ready to start accepting traffic. A Pod is considered ready when all of its Containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.
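For example, a minimal readiness probe sketch (the /healthz path and port 8080 are placeholders for whatever health endpoint your service actually exposes):

readinessProbe:
  httpGet:
    path: /healthz        # hypothetical health endpoint
    port: 8080
  initialDelaySeconds: 5  # give the app time to start
  periodSeconds: 10       # check every 10 seconds
  failureThreshold: 3     # 3 consecutive failures mark the pod unready

With this in place, a pod that stops answering the probe is removed from the Service's endpoints until it passes again, without being restarted.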

1. kubelet
--node-status-update-frequency duration
Specifies how often the kubelet posts node status to the master. Note: be cautious when changing this value; it must work with nodeMonitorGracePeriod in the node controller. (default 10s)
2. controller-manager
--node-monitor-grace-period duration
Amount of time which we allow running Node to be unresponsive before marking it unhealthy. Must be N times more than kubelet's nodeStatusUpdateFrequency, where N means number of retries allowed for kubelet to post node status. (default 40s)
--pod-eviction-timeout duration
The grace period for deleting pods on failed nodes. (default 5m0s)
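To illustrate how these settings fit together, here is a hedged sketch (the specific durations are examples, not recommendations; keep node-monitor-grace-period several times larger than nodeStatusUpdateFrequency):

# kubelet config file (KubeletConfiguration)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
nodeStatusUpdateFrequency: 10s    # how often the kubelet reports node status

# kube-controller-manager flags (e.g. in its static pod manifest):
#   --node-monitor-grace-period=40s   # 4 retries at 10s each before NotReady
#   --pod-eviction-timeout=1m         # evict pods from failed nodes sooner than the 5m default

Lowering pod-eviction-timeout shortens the window during which pods remain scheduled on a dead node.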

Related

Kubernetes worker node went down, what will happen to the pod?

I have set up an EKS cluster in AWS with 2 worker nodes and configured autoscaling on those nodes with 3 as the desired capacity.
Sometimes a worker node goes down due to "an EC2 health check indicating it has been terminated or stopped," which results in my pod being restarted. I have not enabled any replicas for the pods; there is only one right now.
I just want to know: how can my service (pod) be highly available even when a worker node goes down or restarts?
If you have only one pod for your service, then your service is NOT highly available. It is a single point of failure. If that pod dies or is restarted, as has happened here, then during the time the pod is being restarted, your service is dead.
You need, at a bare minimum, TWO pods for a service to be highly available, and they should be on different nodes (you can force Kubernetes to schedule the pods on different nodes using pod anti-affinity: https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/), so that if one node goes down, as in your example, it takes out only one pod, leaving the other pod(s) to handle requests until the lost pod can be rescheduled.
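A minimal sketch of such an anti-affinity rule, assuming your pods carry a hypothetical app: my-service label:

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: my-service
      topologyKey: kubernetes.io/hostname  # at most one matching replica per node

This goes under the pod template's spec; with it, the scheduler refuses to place two matching replicas on the same node.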

Are Kubernetes liveness probe failures voluntary or involuntary disruptions?

I have an application deployed to Kubernetes that depends on an outside application. Sometimes the connection between these 2 goes to an invalid state, and that can only be fixed by restarting my application.
To do automatic restarts, I have configured a liveness probe that will verify the connection.
This has been working great, however, I'm afraid that if that outside application goes down (such that the connection error isn't just due to an invalid pod state), all of my pods will immediately restart, and my application will become completely unavailable. I want it to remain running so that functionality not depending on the bad service can continue.
I'm wondering if a pod disruption budget would prevent this scenario, as it limits the number of pods that can be down due to a "voluntary" disruption. However, the K8s docs don't state whether liveness probe failures are a voluntary disruption. Are they?
I would say, according to the documentation:
Voluntary and involuntary disruptions
Pods do not disappear until someone (a person or a controller) destroys them, or there is an unavoidable hardware or system software error.
We call these unavoidable cases involuntary disruptions to an application. Examples are:
a hardware failure of the physical machine backing the node
cluster administrator deletes VM (instance) by mistake
cloud provider or hypervisor failure makes VM disappear
a kernel panic
the node disappears from the cluster due to cluster network partition
eviction of a pod due to the node being out-of-resources.
Except for the out-of-resources condition, all these conditions should be familiar to most users; they are not specific to Kubernetes.
We call other cases voluntary disruptions. These include both actions initiated by the application owner and those initiated by a Cluster Administrator. Typical application owner actions include:
deleting the deployment or other controller that manages the pod
updating a deployment's pod template causing a restart
directly deleting a pod (e.g. by accident)
Cluster administrator actions include:
Draining a node for repair or upgrade.
Draining a node from a cluster to scale the cluster down (learn about Cluster Autoscaling).
Removing a pod from a node to permit something else to fit on that node.
-- Kubernetes.io: Docs: Concepts: Workloads: Pods: Disruptions
So your example is quite different, and to my knowledge it is neither a voluntary nor an involuntary disruption.
Also taking a look on another Kubernetes documentation:
Pod lifetime
Like individual application containers, Pods are considered to be relatively ephemeral (rather than durable) entities. Pods are created, assigned a unique ID (UID), and scheduled to nodes where they remain until termination (according to restart policy) or deletion. If a Node dies, the Pods scheduled to that node are scheduled for deletion after a timeout period.
Pods do not, by themselves, self-heal. If a Pod is scheduled to a node that then fails, the Pod is deleted; likewise, a Pod won't survive an eviction due to a lack of resources or Node maintenance. Kubernetes uses a higher-level abstraction, called a controller, that handles the work of managing the relatively disposable Pod instances.
-- Kubernetes.io: Docs: Concepts: Workloads: Pods: Pod lifecycle: Pod lifetime
Container probes
The kubelet can optionally perform and react to three kinds of probes on running containers (focusing on a livenessProbe):
livenessProbe: Indicates whether the container is running. If the liveness probe fails, the kubelet kills the container, and the container is subjected to its restart policy. If a Container does not provide a liveness probe, the default state is Success.
-- Kubernetes.io: Docs: Concepts: Workloads: Pods: Pod lifecycle: Container probes
When should you use a liveness probe?
If the process in your container is able to crash on its own whenever it encounters an issue or becomes unhealthy, you do not necessarily need a liveness probe; the kubelet will automatically perform the correct action in accordance with the Pod's restartPolicy.
If you'd like your container to be killed and restarted if a probe fails, then specify a liveness probe, and specify a restartPolicy of Always or OnFailure.
-- Kubernetes.io: Docs: Concepts: Workloads: Pods: Pod lifecycle: When should you use a liveness probe
Based on that information, it would be better to create a custom liveness probe which considers both internal process health checks and the external dependency's (liveness) health check. In the first scenario your container should stop/terminate the process, unlike the second case with the external dependency.
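One way to express that split, as a sketch (the endpoints are hypothetical, assuming the app exposes separate local and dependency-aware health checks):

# liveness looks only at the process itself; failure restarts the container
livenessProbe:
  httpGet:
    path: /healthz/live   # hypothetical internal-only check
    port: 8080
# readiness also checks the external dependency; a dependency outage then
# only removes the pod from the Service instead of restarting it
readinessProbe:
  httpGet:
    path: /healthz/ready  # hypothetical check that includes the dependency
    port: 8080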
Answering the following question:
I'm wondering if a pod disruption budget would prevent this scenario.
In this particular scenario, a PDB will not help.
I reckon giving more visibility to the comment I made, with additional resources on the matter, could prove useful to other community members:
Blog.risingstack.com: Designing microservices architecture for failure
Loft.sh: Blog: Kubernetes readiness probes examples common pitfalls: External dependencies
Cloud.google.com: Architecture: Scalable and resilient apps: Resilience designing to withstand failures
Testing with PodDisruptionBudget: the pods will still restart at the same time.
Example: https://github.com/AlphaWong/PodDisruptionBudgetAndPodProbe
So yes, like @Dawid Kruk said, you should create a customized probe like the following:
# something like this
livenessProbe:
  exec:
    command:
    - /bin/sh
    - -c
    # generate a random number for sleep
    - 'SLEEP_TIME=$(shuf -i 2-40 -n 1); sleep $SLEEP_TIME; curl -L --max-time 5 -f nginx2.default.svc.cluster.local'
  initialDelaySeconds: 10
  # think about the gap between each call
  periodSeconds: 30
  # it is required after k8s v1.12
  timeoutSeconds: 90
I'm wondering if a pod disruption budget would prevent this scenario.
Yes, it will prevent it.
As you stated, when a pod goes down (or a node fails), nothing can stop pods from becoming unavailable. However, certain services require that a minimum number of pods keep running at all times.
There could be another way (a Stateful resource), but a PodDisruptionBudget is one of the simplest Kubernetes resources available.
Note: You can also use a percentage instead of an absolute number in the minAvailable field. For example, you could state that 60% of all pods with the app=run-always label need to be running at all times.
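A minimal PodDisruptionBudget sketch along those lines, using the app=run-always label from the note above:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: run-always-pdb
spec:
  minAvailable: "60%"   # at least 60% of matching pods must stay up
  selector:
    matchLabels:
      app: run-always

The percentage is evaluated against the number of pods matched by the selector, and it constrains voluntary disruptions such as node drains and evictions.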

Kubernetes scaling pods by number of active connections

I have a Kubernetes cluster that runs some legacy containers (Windows containers).
To simplify, let's say that the container can handle at most 5 requests at a time, something like:
handleRequest() {
    requestLock(semaphore_Of_5)
    sleep(2s)
    return "result"
}
So the CPU is not spiking. I need to scale based on the number of active connections.
I can see from the documentation https://kubernetes.io/docs/concepts/services-networking/service/#proxy-mode-iptables
You can use Pod readiness probes to verify that backend Pods are working OK, so that kube-proxy in iptables mode only sees backends that test out as healthy. Doing this means you avoid having traffic sent via kube-proxy to a Pod that’s known to have failed.
So there is a mechanism to make pods available (or unavailable) for routing new requests, but it is the livenessProbe that actually marks the pod as unhealthy and subject to the restart policy. But my pods are just busy; they don't need restarting.
How can I increase the nr of pods in this case ?
You can enable HPA for the deployment and autoscale on a number-of-requests metric:
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/#autoscaling-on-metrics-not-related-to-kubernetes-objects
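As a hedged sketch, an HPA on a custom per-pod metric could look like the following (the active_connections metric name and the resource names are hypothetical; such a metric has to be exposed through a custom metrics adapter, e.g. prometheus-adapter):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: legacy-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: legacy-app        # hypothetical deployment name
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Pods
    pods:
      metric:
        name: active_connections   # hypothetical custom metric
      target:
        type: AverageValue
        averageValue: "4"          # scale out above ~4 connections per pod

With 5 concurrent requests as the per-pod limit, targeting an average of 4 leaves some headroom before pods saturate.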
I would also recommend configuring the liveness probe's failureThreshold and timeoutSeconds, and checking if that helps:
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#configure-probes

Cover for unhealthy pods

I am running multiple pods with Python/Gunicorn serving web requests. At times, requests get really slow (up to 60s), which blocks all workers and makes the livenessProbe fail.
In some instances, all pods are blocked in this state and are restarted at the same time (graceful shutdown takes up to 60s). This means that no pod is available to take new requests.
Is there a way of telling k8s to cover for pods that it is restarting? For example starting a new pod when other pods are unhealthy.
You can have an ingress or a load balancer at the L7 layer which routes traffic to a Kubernetes Service; the Service can have multiple backend pods (selected by the pods' labels and the Service's label selector) spread across different deployments running on different nodes. The ingress controller or load balancer can health-check the backends and stop routing traffic to unhealthy pods. This topology increases the overall availability and resiliency of the application.
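A minimal sketch of that routing layer, assuming a hypothetical Service named web-service listening on port 80:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
spec:
  rules:
  - host: example.com     # placeholder host
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-service   # hypothetical Service spanning the pods
            port:
              number: 80

The Service's readiness-based endpoint list is what actually keeps traffic away from unhealthy pods; the ingress controller's own health checks add another layer on top.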

Kubernetes - which pod receives request from load balancer?

I have a load balancer Service for a deployment with 3 pods. When I do a rolling update (changing the image) with the following command:
kubectl set image deployment/<deployment-name> contname=<image-name>
and hit the service continuously, it gives a few connection refused errors in between. I want to check which pods those are related to. In other words, is it possible to see which request is served by which pod (without going inside the pods and checking their logs)? Also, is this because of a race condition, as in a pod might have just been terminated as a request arrived (almost simultaneously, resulting in no response)?
Have you configured liveness and readiness probes for your Pods? The Service will not serve traffic to a Pod unless it thinks it is healthy, but without health checks it won't know for certain whether the Pod is ready.
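As a sketch of how those pieces fit together for a zero-downtime rolling update (the probe endpoint and thresholds are placeholders), the Deployment could combine a readiness probe with a strategy that never removes a serving pod before its replacement is ready:

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # keep all old pods serving until new ones are ready
      maxSurge: 1         # bring up one extra pod at a time
  template:
    spec:
      containers:
      - name: contname
        image: image-name   # placeholder, as in the question
        readinessProbe:
          httpGet:
            path: /healthz  # hypothetical health endpoint
            port: 8080

This is a partial Deployment spec; with it, the Service only adds a new pod to its endpoints once the probe passes, which removes most of the connection-refused window during image rollouts.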