Configure Kubernetes StatefulSet to start pods first, restart failed containers after start? - kubernetes

Basic info
Hi, I'm encountering a problem with Kubernetes StatefulSets. I'm trying to spin up a StatefulSet with 3 replicas.
These replicas/pods each have a container which pings a container in the other pods based on their network ID.
The container requires a response from all the pods; if it does not get a response, the container fails. In my situation I need all 3 pods/replicas for my setup to work.
Problem description
What happens is the following: Kubernetes starts 2 pods rather fast. However, since I need 3 pods for a fully functional cluster, the first 2 pods keep crashing because the 3rd is not up yet.
For some reason Kubernetes opts to keep restarting both pods instead of adding the 3rd pod so that my cluster can function.
I've seen my setup run properly after about 15 minutes because Kubernetes added the 3rd pod by then.
Question
So, my question:
Does anyone know a way to delay restarting failed containers until the desired number of pods/replicas has been started?

I've since found out the cause of this.
StatefulSets launch pods in a specific order, and if one of the pods fails to become Running and Ready, the next one is not launched.
You can add podManagementPolicy: "Parallel" to the StatefulSet spec to launch the pods without waiting for the previous ones to be Running.
See this documentation.
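For reference, a minimal sketch of where that field sits; the names, image, and replica count here are placeholders, not anything from the original setup:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-app
spec:
  serviceName: my-app             # headless Service providing the per-pod network IDs
  replicas: 3
  podManagementPolicy: Parallel   # launch (and terminate) pods without waiting on ordinal order
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: my-app:latest      # placeholder image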

I think a better way to deal with your problem is to leverage a liveness probe, as described in the documentation, rather than delay the restart (the restart back-off is not configurable in the YAML).
Your pods respond to the liveness probe right after they start, to let Kubernetes know they are alive; this prevents them from being restarted. Meanwhile, your pods keep pinging the others until they are all up, and only when all of them are started do they serve external requests. This is similar to bootstrapping a ZooKeeper ensemble.
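A sketch of what that could look like in the StatefulSet's pod template; the endpoints, port, and timings are assumptions about your app, not anything Kubernetes mandates: a liveness endpoint that answers as soon as the process is up (so the pod is not restarted while it waits for its peers), and a readiness endpoint that only answers once all peers respond (so external traffic waits for the full ensemble):

containers:
- name: my-app
  image: my-app:latest            # placeholder image
  livenessProbe:
    httpGet:
      path: /healthz              # assumed endpoint: succeeds as soon as the process is up
      port: 8080
    initialDelaySeconds: 5
    periodSeconds: 10
  readinessProbe:
    httpGet:
      path: /ready                # assumed endpoint: succeeds only once all peers respond
      port: 8080
    periodSeconds: 5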

Related

How to prevent or fix Kubernetes pod getting stuck in containerCreating occasionally

I'm running AWS EKS, running on Fargate, and using Kubernetes to orchestrate multiple cron jobs. I spin roughly 1000 pods up and down over the course of a day.
Very seldom (once every 3 weeks) one of the pods gets stuck in ContainerCreating and just hangs there, and because I have concurrency disabled, that particular job never runs. The fix is simply terminating the job or the pod and having it restart, but this is a manual intervention.
Is there a way to get a pod to terminate or restart, if it takes too long to create?
The reason for the pod getting stuck varies quite a bit. A solution would need to be general. It can be a time based solution as all the pods are running the same code with different configurations so the startup time is relatively consistent.
Sadly, there is no mechanism to stop a Job if it fails at image pulling or container creation. I also tried to do what you are trying to achieve.
You can set a backoffLimit inside your template, but it only limits retries of containers that fail while running, not pods stuck in ContainerCreating.
What you can do is write a script that describes each pod in the namespace, parses the output, and restarts any pod that is stuck in ContainerCreating (see the sketch below).
Or try to debug/trace what is causing this: run kubectl describe pod to get more info while your pod is in ContainerCreating.
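As a rough sketch (not a definitive implementation), that reaper could itself run as a CronJob. Everything here is an assumption: the namespace, the 15-minute threshold, the image (it needs kubectl and GNU date), and the pod-reaper service account, which would need RBAC permission to list and delete pods:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: stuck-pod-reaper              # hypothetical name
spec:
  schedule: "*/10 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: pod-reaper      # assumed SA with list/delete pod rights
          restartPolicy: Never
          containers:
          - name: reaper
            image: bitnami/kubectl:latest     # any image with kubectl and GNU date
            command:
            - /bin/sh
            - -c
            - |
              # List pod name, waiting reason and creation time for every pod,
              # then delete the ones stuck in ContainerCreating for > 15 minutes.
              kubectl get pods -n my-namespace -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.containerStatuses[0].state.waiting.reason}{" "}{.metadata.creationTimestamp}{"\n"}{end}' \
              | while read name reason created; do
                  if [ "$reason" = "ContainerCreating" ]; then
                    age=$(( $(date +%s) - $(date -d "$created" +%s) ))
                    [ "$age" -gt 900 ] && kubectl delete pod "$name" -n my-namespace
                  fi
                done

Deleting the stuck pod lets the owning Job create a replacement, which matches the manual fix described in the question.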

Is it possible to schedule a pod to run for, say, 24 hours and then remove the deployment/statefulset, or do I need to use jobs?

We have a bunch of pods running in a dev environment. The pods are auto-provisioned by an application on every business action. The problem is that across various namespaces they are accumulating and eating the available resources in EKS.
Is there a way, without Jenkins or Kubernetes Jobs, to simply put some parameter in the pod manifest to tell it to self-destruct after, say, 24 hours?
Add this to your pod.spec:
activeDeadlineSeconds: 86400
After the deadline your Pod will be stopped for good with the status DeadlineExceeded.
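A minimal full-manifest sketch; the pod name and image are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: short-lived-worker          # placeholder name
spec:
  activeDeadlineSeconds: 86400      # 24 hours; after this the pod is killed and marked DeadlineExceeded
  restartPolicy: Never
  containers:
  - name: worker
    image: my-worker:latest         # placeholder image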
If I understood your situation properly, you would like to scale your cluster down in order to save resources.
Kubernetes has the ability to autoscale your application in a cluster. Literally, it means that Kubernetes can start additional pods when the load is increasing and terminate excess pods when the load is decreasing.
It is possible to downscale the application to zero pods, but in this case you will have a delay serving the first request while a pod is starting.
This functionality relies on performance metrics. On the practical side, it means that autoscaling doesn't happen instantly, because it takes some time for the metrics to reach the configured threshold.
The Kubernetes feature in question, the HPA (Horizontal Pod Autoscaler), is described in this document.
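For reference, a minimal HPA manifest sketch; the target name, replica bounds, and the 70% CPU threshold are placeholders, and it assumes the autoscaling/v2 API plus a metrics server in the cluster:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                    # placeholder target
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70      # scale out once average CPU crosses 70%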
In case you are running your cluster on GCP or GKE, you can go further and automatically start additional nodes for your cluster when you need more computing capacity, and shut nodes down when they are no longer running application pods.
More information about this functionality can be found by following the link.
Last but not least, you can use a tool like Ansible to manage all your Kubernetes assets (it can create/manage deployments via playbooks); a minimal playbook sketch follows the links below.
If you decide to give it a try, you might find this information useful:
Creating a Container cluster in GKE
70% cheaper Kubernetes cluster on AWS
How to build a Kubernetes Horizontal Pod Autoscaler using custom metrics
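And if the Ansible route appeals to you, a minimal playbook sketch; it assumes the kubernetes.core collection is installed and that a deployment.yaml manifest exists next to the playbook (both are assumptions):

- hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Ensure the application Deployment exists
      kubernetes.core.k8s:
        state: present
        src: deployment.yaml        # hypothetical manifest file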

Kubernetes Deployment with Zero Down Time

As a learner of Kubernetes concepts, how they work, and how to deploy with them, I have a couple of cases which I don't know how to achieve. I am looking for advice or some guidelines on how to achieve them.
I am using the Google Cloud Platform. The current flow is described below: a push to the Google Source Repository triggers Cloud Build, which creates a Docker image and pushes the image to the running cluster nodes.
Case 1: I want traffic routed to the new pods only once they are up and running, and the old pods killed only after each has completed its in-flight requests. Zero downtime is what I'm looking to achieve.
Case 2: What happens if a running pod's disk usage reaches 100%, or, in the Debian case, the inode count reaches full capacity? Will Kubernetes create new pods to compensate?
Case 3: How do I manage pod-to-database connection limits?
Like the other answer says, use liveness and readiness probes. Basically, a new pod is added to the Service pool and only starts serving traffic after its readiness probe has passed; the old pod is removed from the Service pool, drained, and then terminated. This happens in a rolling fashion, one pod at a time.
This really depends on the capacity of your cluster and the ability to schedule pods given the limits set on their containers. For more about setting up limits for containers, refer to here. In terms of the inode limit, if you reach it on a node, the kubelet won't be able to run any more pods on that node. The kubelet eviction manager also has a mechanism that evicts the pods using the most inodes. You can also configure your eviction thresholds on the kubelet.
This would be more of a limitation at the OS level combined with your stateful application's configuration. You can keep this configuration in a ConfigMap; for example, for something like MySQL the option would be max_connections (a minimal sketch follows).
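A minimal sketch for the MySQL case; the names and the connection cap are placeholders. The official MySQL image reads extra config from /etc/mysql/conf.d, so the ConfigMap would typically be mounted there:

apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql-config              # hypothetical name
data:
  connections.cnf: |
    [mysqld]
    max_connections = 250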
I can answer case 1 since I've done it myself.
Use Deployments with readinessProbes & livenessProbes, as sketched below.
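A hedged sketch for case 1, with placeholder names, image, port, and probe paths: the rolling update replaces one pod at a time, the Service only routes to a new pod once its readiness probe passes, and terminationGracePeriodSeconds gives old pods time to finish in-flight requests:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1           # at most one pod down during the rollout
      maxSurge: 1                 # at most one extra pod above the desired count
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      terminationGracePeriodSeconds: 60
      containers:
      - name: my-app
        image: my-app:latest      # placeholder image
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          periodSeconds: 5
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 10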

Kubernetes (K8s) - Can you check what time a pod restarts at?

I am trying to create a system health check. I would like to be able to determine how often the pods are restarting. Since the pods have liveness probes, they may continuously restart if something is wrong. Is there any way to check the time a pod restarted at, instead of just the number of restarts and the total time the pod has been alive?
You may want to use a monitoring tool, like cAdvisor/Heapster/Grafana/Prometheus.
Another option you have is to deploy the kube-state-metrics container.
Good articles to read: Monitoring Kubernetes and Get Kubernetes Cluster Metrics with Prometheus.
Also, read a similar question on stackoverflow.

Graceful termination of kubernetes pods

We have an application with 4 pods running behind a load balancer. We want to try a rolling update, but we are not sure what happens when a pod goes down. The documentation is unclear, particularly this quote from Termination of Pods:
Pod is removed from endpoints list for service, and are no longer considered part of the set of running pods for replication controllers. Pods that shutdown slowly can continue to serve traffic as load balancers (like the service proxy) remove them from their rotations.
So, if someone can guide us on the following questions :
1.) When a pod is shutting down, can it still serve new requests? Or does the load balancer not consider it?
2.) Does it complete the requests it is processing until the grace period is exhausted, and then kill the container even if a process is still running?
3.) Also, this mentions replication controllers; what we have is a Deployment (which uses ReplicaSets), so will there be any difference?
We went through this question but the answers are conflicting without any source: Does a Kubernetes rolling-update gracefully remove pods from a service load balancer
1) When a Pod is shutting down, its state changes to Terminating and it is no longer considered by the load balancer, as described in the Pod termination docs.
2) Yes. You might want to look at the pod.spec.terminationGracePeriodSeconds setting to gain some control; you'll find details in the API documentation, and a small sketch follows point 3.
3) No. The ReplicaSet and the Deployment take care of creating Pods; there's no difference when it comes to the shutdown behaviour of the Pods.
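For point 2, a hedged sketch of the relevant pod spec fields. The 60-second grace period and the preStop sleep are assumptions, not values from the docs; the sleep is a common pattern to keep serving briefly while endpoints update, and it assumes the image ships a sleep binary:

spec:
  terminationGracePeriodSeconds: 60   # time allowed to finish in-flight requests after SIGTERM
  containers:
  - name: my-app
    image: my-app:latest              # placeholder image
    lifecycle:
      preStop:
        exec:
          command: ["sleep", "10"]    # runs before SIGTERM, while the pod is removed from rotation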