I am new to Kubernetes and I am facing some issues.
The project codebase in bitbucket and in each commit, there are pipelines in bitbucket which build a pod in the Kubernetes cluster. So the pods do some tasks and terminate after the task got completed. When the commits are high cluster fails due to a large number of pods. So I am trying to find a solution to queue it in the Kubernetes cluster so the pods will use all the resources of my cluster after the termination of the pods it will run the other pods in the queue and so on. Any help?
You can set pod resource requirements so that when you create a pod and requirements cannot be satisfied, scheduler will put it in pending state and schedule it as soon as there are available resources.
The only downfall of this solution is that it does not guarantee that pods will be created in the same order as api request sent to create them.
Related
I run a Kubernetes job with thousands of workers following the pattern described in Coarse Parallel Processing Using a Work Queue. I use the Python client for the Kubernetes API to define the job programmatically. The cluster does not scale automatically. The available resources are unknown at the time of programming.
The goal is to use all available resources of the cluster for my job. I have tried to optimise the .spec.parallelism setting. If I set .spec.parallelism and .spec.completions to the same value, all pods for the job are started at the beginning, but most of them could not be scheduled due to resource requirements (e.g. insufficient CPU). When the first pods are finished, the resources are free and more pods are scheduled. But after some time (2.4 hours on my cluster) Kubernetes gives up scheduling the remaining pods and marks them as failed, which eventually causes the whole job to fail.
Is there a pattern for a job on a Kubernetes cluster to use all available resources?
I'm wondering if there is an option to execute the task from POD? So when I have POD which is, for example, listening for some requests and on request will delegate the task to other POD(worker POD). This worker POD is not alive when there are no jobs to do and if there is more than one job to do more PODs will be created. After the job is done "worker PODs" will be stopped. So worker POD is live during one task then killed and when new task arrived new worker POD is started. I hope that I described it properly. Do you know if this is possible in Kubernetes? Worker POD start may be done by, for example, rest call from the main POD.
There are few ways to achieve this behavior.
Pure Kubernetes Way
This solution requires ServiceAccount configuration. Please take a loot at Kubernetes documentation Configure Service Accounts for Pods.
Your application service/pod can handle different custom task. Applying specific serviceaccount to your pod in order to perform specific task is the best practice. In kubernetes using service account with predefined/specified RBAC rules allows you to handle this task almost out of the box.
The main concept is to configure specific RBAC authorization rules for specific service account by giving different permission (get,list,watch,create) to different Kubernetes resources like (pod,jobs).
In this scenario working pod is waiting for incoming request, after it receives specific request it can perform specific task against kubernetes api.
This can be extend i.e. by using sidecar container inside your working pod. More details about sidecar concept can be found in this article.
Create custom controller
Another way to achieve your goal is to use Custom Controller.
Example presented in Extending the Kubernetes Controller article is showing how custom controller watch kubernetes api in order to instrument underling worker pod (watching kubernetes configuration for any changes and then deletes corresponding pods). In your setup, such controller could watch your api for waiting/not processed request and perform additional task like kubernetes job creation inside k8s cluster.
Using existing solution like Job Processing Using a Work Queue.
RabbitMQ on Kubernetes
Coarse Parallel Processing Using a Work Queue
Kubernetes Message Queue
Keda
Suppose we have 4 nodes eks cluster in ec2-autoscaling min with 4 nodes.
A kubernetes application stack deployed on the same with one pod- one node. Now traffic increases HPA triggered on eks level.
Now total pods are 8 pods ,two pods - on one node. Also triggered auto-scaling. Now total nodes are 6 nodes.
Its observed all pods remain in current state. Post autscaling also.
Is there a direct and simpler way?
Some of already running pods should automatically launch on the additional nodes (detect it and reschedule itself on the recently added idle worker/nodes (i:e non-utilized - by using force eviction of pods)
Thanks in Advance.
One easy way is to delete all those pods by selector using below command and let the deployment recreate those pods in the cluster
kubectl delete po -l key=value
There could be other possibilities. would be glad to know from others
Take a look at the Descheduler. This project runs as a Kubernetes Job that aims at killing pods when it thinks the cluster is unbalanced.
The LowNodeUtilization strategy seems to fit your case:
This strategy finds nodes that are under utilized and evicts pods, if
possible, from other nodes in the hope that recreation of evicted pods
will be scheduled on these underutilized nodes.
Another option is to apply a little of chaos engineering manually, forcing a Rolling Update on your deployment, and hopefully, the scheduler will fix the balance problem when pods are recreated.
You can use the kubectl rollout restart my-deployment. It's way better than simply deleting the pods with kubectl delete pod, as the rollout will ensure availability during the "rebalancing" (although deleting the pods altogether increases your chances for a better rebalance).
We are still in a design phase to move away from monolithic architecture towards Microservices with Docker and Kubernetes. We did some basic research on Docker and Kubernetes and got some understanding. We still have couple of open question considering we will be creating K8s cluster with multiple Linux hosts (due to some reason we can't think about Cloud right now) .
Consider a scenario where we have K8s Cluster spanning over multiple linux hosts (5+).
1) If one of the linux worker node crashes and once we bring it back, does enabling kubelet as part of systemctl in advance will be sufficient to bring up required K8s jobs so that it be detected by master again?
2) I believe once worker node is crashed (X pods), after the pod eviction timeout master will reschedule those X pods into some other healthy node(s). Once the node is UP it won't do any deployment of X pods as master already scheduled to other node but will be ready to accept new requests from Master.
Is this correct ?
Yes, should be the default behavior, check your Cluster deployment tool.
Yes, Kubernetes handles these things automatically for Deployments. For StatefulSets (with local volumes) and DaemonSets things can be node specific and Kubernetes will wait for the node to come back.
Better to create a test environment and see/test the failure scenarios
As a leaner of Kubernetes concepts, their working, and deployment with it. I have a couple of cases which I don't know how to achieve. I am looking for advice or some guideline to achieve it.
I am using the Google Cloud Platform. The current running flow is described below. A push to the google source repository triggers Cloud Build which creates a docker image and pushes the image to the running cluster nodes.
Case 1: Now I want that when new pods are up and running. Then traffic is routed to the new pods. Kill old pod but after each pod complete their running request. Zero downtime is what I'm looking to achieve.
Case 2: What will happen if the space of running pod reaches 100 and in the Debian case that the inode count reaches full capacity. Will kubernetes create new pods to manage?
Case 3: How to manage pod to database connection limits?
Like the other answer use Liveness and Readiness probes. Basically, a new pod is added to the service pool then it will only serve traffic after the readiness probe has passed. The old pod is removed from the Service pool, then drained and then terminated. This happens on a rolling fashion one pod at a time.
This really depends on the capacity of your cluster and the ability to schedule pods depending on the limits for the containers in them. For more about setting up limits for containers refer to here. In terms of the inode limit, if you reach it on a node, the kubelet won't be able to run any more pods on that node. The kubelet eviction manager also has a mechanism in where evicts some pods using the most inodes. You can also configure your eviction thresholds on the kubelet.
This would be more a limitation at the OS level combined your stateful application configuration. You can keep this configuration in a ConfigMap. And for example in something for MySql the option would be max_connections.
I can answer case 1 since Ive done it myself.
Use Deployments with readinessProbes & livelinessProbes