Pods are deleted regularly - Kubernetes

I have a kubernetes cluster, created and managed from stackpoint.io.
The cluster is made of 1 master and 2 nodes, all running CoreOS.
I created 5 deployments with 1 replica each and 5 services pointing to these deployments.
5 pods are created from these deployments.
My problem is that between 24 and 36 hours after the pods are created, they are automatically deleted and recreated.
Because these apps rely on each other to work, this means about 3-5 minutes of downtime before everything works correctly again.
I assume there is some configuration I don't know about that controls this behavior.
I tested containers with the latest tag as well as pinned tags, changed the image pull policy (imagePullPolicy) from Always to IfNotPresent, and tried an Ubuntu-based cluster instead of CoreOS. In every configuration the pods are still deleted and recreated.
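To find out what is actually deleting them, the cluster events and the owning objects usually record the reason (eviction, node reboot or replacement, rollout, and so on). A few checks worth running, with placeholder names:

    kubectl get events --sort-by=.metadata.creationTimestamp
    kubectl describe pod <pod-name>               # the Events section shows why the pod was killed
    kubectl describe deployment <deployment-name>
    kubectl get nodes                             # a rebooted or replaced node recreates every pod it hosted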

Related

How to manage auto-scale pods in Kubernetes OpenShift deployments?

I want to create 6 pods using Kubernetes with OpenShift. My scenario is the following:
Each pod is a different task, and each one gets data from a different remote database.
Only two pods should run at a time; when those two pods are finished, they go down and the next two pods start working.
Is this possible?
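If the six tasks can share one container image and pick their work from an index or an environment variable, this maps onto a Kubernetes/OpenShift Job with completions and parallelism; a minimal sketch, with the name and image as placeholders:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: fetch-remote-data          # hypothetical name
    spec:
      completions: 6                   # six pods in total
      parallelism: 2                   # never more than two running at once
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: worker
            image: registry.example.com/db-fetcher:latest   # placeholder image

With parallelism: 2 the Job starts the next pods only as the previous ones finish. If the six tasks need genuinely different pod specs, the same throttling has to be done with six separate Jobs instead.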

How to migrate a StatefulSet to different nodes?

I have a Kubernetes cluster of 3 nodes in Amazon EKS. It's running 3 pods of CockroachDB in a StatefulSet. Now I want to use a different instance type for all nodes of my cluster.
So my plan was this:
1. Add 1 new node to the cluster, increase the replicas in my StatefulSet to 4 and wait for the new CockroachDB pod to fully sync.
2. Decommission and stop one of the old CockroachDB nodes.
3. Decrease the replicas of the StatefulSet back to 3 to get rid of one of the old pods.
4. Repeat steps 1-3 two more times.
Obviously, that doesn't work, because a StatefulSet deletes its most recently created pods first when scaling down (highest ordinal first), so my new pod gets deleted instead of one of the old ones.
I guess I could just create a new StatefulSet and make it use existing PVs, but that doesn't seem like the best solution for me. Is there any other way to do the migration?
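In kubectl terms, the scale steps and the point where they break look like this (StatefulSet name assumed to be the usual cockroachdb):

    kubectl scale statefulset cockroachdb --replicas=4   # step 1: the new pod is cockroachdb-3
    # step 2: decommission the CockroachDB node on the old machine
    #         (cockroach node decommission, run against any live node)
    kubectl scale statefulset cockroachdb --replicas=3   # step 3: always removes the highest
                                                         #         ordinal, i.e. the new cockroachdb-3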
You can consider making a copy of your ASG's current launch template, upgrading the instance type in the copy, pointing your ASG to the new launch template, and then performing an ASG instance refresh. With a cluster of 3 nodes and a minimum healthy percentage of 90%, only 1 instance will be replaced at a time. The affected pod on the drained node will be pending for 5-10 minutes and then be redeployed on the new node. This way you do not need to scale up the StatefulSet unnecessarily.
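A rough AWS CLI equivalent of that sequence (the launch template ID, ASG name and instance type are placeholders):

    # 1. new launch template version with the new instance type
    aws ec2 create-launch-template-version \
      --launch-template-id lt-0123456789abcdef0 \
      --source-version 1 \
      --launch-template-data '{"InstanceType":"m5.xlarge"}'

    # 2. point the ASG at the new version
    aws autoscaling update-auto-scaling-group \
      --auto-scaling-group-name eks-workers \
      --launch-template LaunchTemplateId=lt-0123456789abcdef0,Version='$Latest'

    # 3. rolling replacement, one instance at a time with 3 nodes and 90% min healthy
    aws autoscaling start-instance-refresh \
      --auto-scaling-group-name eks-workers \
      --preferences '{"MinHealthyPercentage": 90}'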

Does Kubernetes support hundreds of pods in "Terminating" status for a week?

Our worker service is a long-running service. When a scale-in or a deployment happens, we expect the pods to finish their existing work (staying alive for up to 1 week) and then exit.
What I have tried: I created a Deployment with 10 pods, set terminationGracePeriodSeconds = 604800, and then scaled down to 1 replica; that works well.
The question is that our service will have hundreds of pods, so in the worst case hundreds of pods will be in Terminating status, running for 7 days before they exit. Is this workable in the Kubernetes world, or are there any potential issues?
Thanks for any comments.
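For reference, the only non-default part of the Deployment is the pod-level terminationGracePeriodSeconds field; a minimal sketch with placeholder names and image:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: worker
    spec:
      replicas: 10
      selector:
        matchLabels:
          app: worker
      template:
        metadata:
          labels:
            app: worker
        spec:
          terminationGracePeriodSeconds: 604800   # up to 7 days between SIGTERM and SIGKILL
          containers:
          - name: worker
            image: registry.example.com/worker:latest   # must handle SIGTERM and exit once its work is done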
Google starts and destroys over 7 billion pods per week. This is the purpose they were made for. As long as you are persisting the necessary data to disk, Kubernetes will replicate the state of the pod exactly as you have configured. A paper has also been published on how this scale may be achieved.

Deployment with ReplicaSet & node shutdown

We have a deployment with replicas: 1
We deploy it in a 3-agent-node k8s cluster (k8s 1.8.13) and it gets deployed to a node (say agent node-0). When I shut down node-0, the rs does not get rescheduled (it's been more than an hour now).
I have checked that the selector labels are correct and that we have plenty of capacity in the cluster (we also don't specify resource requests). I also checked that our node selectors only match agent nodes, and there are 2 other agent nodes available.
Is there any special treatment of this shutdown scenario that k8s does?
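A few checks that usually narrow this down (the pod name is a placeholder; on 1.8 the relevant knob is the controller-manager's --pod-eviction-timeout, which defaults to 5m0s):

    kubectl get nodes                                         # is node-0 reported NotReady?
    kubectl get pods -o wide                                  # which node is the pod still bound to?
    kubectl describe pod <pod-name>                           # Events show any eviction/scheduling attempts
    kubectl get events --sort-by=.metadata.creationTimestamp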
It's the pod that gets rescheduled, not the ReplicaSet. If you were doing rolling updates based on the image version, for example, then every time a new image became available the controller manager would take the number of desired and available pods in the old ReplicaSet down to 0 and create a new ReplicaSet.
But when you shut down a node and the pod gets rescheduled while keeping the same ReplicaSet, your cluster is working fine.
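That distinction is visible directly in kubectl once the pod has been moved:

    kubectl get rs            # the ReplicaSet object is unchanged, DESIRED/CURRENT still 1
    kubectl get pods -o wide  # the pod reappears under a new name on a surviving node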

GKE cluster v1.2.4 - route recreation

I've created a 3-node GKE cluster which is running fine, but I have noticed a few times that my components are not able to reach the API server for 3 or 4 minutes.
I recently had the same problem again on a fresh new cluster, so I decided to look a bit closer. In the Compute Engine Operations section I noticed that, at the same time, the 3 routes had been removed and then recreated 4 minutes later... This task had been scheduled by a #cloudservices.gserviceaccount.com address, so from the cluster directly, I suppose.
What is causing this behavior, forcing the routes to be deleted and recreated seemingly at random?
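To pin down which operations touched the routes and under which account, the same log can be queried from the CLI; a sketch (project and zone flags omitted):

    gcloud compute operations list --sort-by=~insertTime --limit=50
    gcloud compute operations list --filter="targetLink~routes"
    gcloud compute routes list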
The apiserver may become unreachable if it gets temporarily overloaded or if it is upgraded or repaired. This should be unrelated to routes being removed and recreated, although it's possible that the node manager does not behave correctly when it is restarted.