Kubernetes with Slurm, is this the correct setup?

I saw that some people run Kubernetes coexisting with Slurm, and I was curious: why would you need Kubernetes with Slurm? What is the main difference between Kubernetes and Slurm?

Slurm is an open-source job scheduling system for large and small Linux clusters. It is mainly used as a workload manager/job scheduler, mostly in HPC (High Performance Computing) and sometimes in Big Data.
Kubernetes is an orchestration system for Docker containers, using the concepts of "labels" and "pods" to group containers into logical units. It was mainly created to run microservices, and AFAIK Kubernetes currently does not support Slurm.
Slurm, as a job scheduler, has more scheduling options than Kubernetes, but K8s is a container orchestration system, not only a job scheduler. For example, Kubernetes supports array jobs, while Slurm supports both parallel and array jobs. If you want to dive into scheduling, check this article.
As I mentioned before, Kubernetes is more focused on container orchestration, and Slurm is focused on job/workload scheduling.
The only thing that comes to my mind is that someone needed a very customized cluster using WLM-Operator + K8s + Slurm + Singularity to execute HPC/Big Data jobs.
Slurm Workload Manager is used by many of the world's supercomputers to optimize the locality of task assignments on parallel computers.
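For the array-job comparison above, here is a minimal sketch (my own illustration, not from the answer) of Kubernetes' closest analogue, an Indexed Job, created with the official Python client. The job name and image are placeholders, and Indexed Jobs require Kubernetes 1.21 or newer:

```python
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside a pod
batch = client.BatchV1Api()

job = client.V1Job(
    metadata=client.V1ObjectMeta(name="array-demo"),  # hypothetical name
    spec=client.V1JobSpec(
        completions=10,             # like a Slurm array of 10 tasks
        parallelism=4,              # at most 4 tasks run concurrently
        completion_mode="Indexed",  # each pod gets JOB_COMPLETION_INDEX
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[client.V1Container(
                    name="worker",
                    image="busybox",
                    command=["sh", "-c", "echo running task $JOB_COMPLETION_INDEX"],
                )],
            )
        ),
    ),
)
batch.create_namespaced_job(namespace="default", body=job)
```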

Related

Kubernetes Job should use all available resources

I run a Kubernetes job with thousands of workers following the pattern described in Coarse Parallel Processing Using a Work Queue. I use the Python client for the Kubernetes API to define the job programmatically. The cluster does not scale automatically. The available resources are unknown at the time of programming.
The goal is to use all available resources of the cluster for my job. I have tried to optimise the .spec.parallelism setting. If I set .spec.parallelism and .spec.completions to the same value, all pods for the job are started at the beginning, but most of them cannot be scheduled due to resource requirements (e.g. insufficient CPU). When the first pods finish, resources are freed and more pods are scheduled. But after some time (2.4 hours on my cluster) Kubernetes gives up scheduling the remaining pods and marks them as failed, which eventually causes the whole job to fail.
Is there a pattern for a job on a Kubernetes cluster to use all available resources?
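For reference, a minimal sketch of the coarse work-queue setup the question describes, defined with the Python client; the image and numbers are hypothetical, and capping parallelism below what the cluster can actually schedule avoids the pile-up of unschedulable pods described above:

```python
from kubernetes import client, config

config.load_kube_config()
batch = client.BatchV1Api()

job = client.V1Job(
    metadata=client.V1ObjectMeta(name="queue-workers"),  # hypothetical name
    spec=client.V1JobSpec(
        completions=5000,   # total work items to finish
        parallelism=50,     # concurrency cap; keep below cluster capacity
        backoff_limit=20,   # pod retries tolerated before the Job fails
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="OnFailure",
                containers=[client.V1Container(
                    name="worker",
                    image="my-registry/worker:latest",  # hypothetical image
                    resources=client.V1ResourceRequirements(
                        requests={"cpu": "500m", "memory": "256Mi"},
                    ),
                )],
            )
        ),
    ),
)
batch.create_namespaced_job(namespace="default", body=job)
```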

Can we spin off a Kubernetes CronJob automatically and dynamically? How can we do it in AWS EKS or Azure AKS, based on queues or notifications?

For my microservice-based application, I am designing a component as follows:
The task we want to execute is periodic in nature. For it, I planned to make use of Kubernetes CronJobs. It executes the job every hour, and this works perfectly fine.
In a few scenarios, I want to execute this task on demand (instead of waiting for the next hourly window). For example, if the next job time is 2:00 pm, I want to execute it early, say at 1:20 pm.
There is a related question - How can I trigger a Kubernetes Scheduled Job manually?
But I am not looking for a manual way of achieving it or for explicitly calling kubectl commands. Is there a way to do it automatically, based on events/queues?
Our application is deployed on AWS EKS and Azure AKS. Can I integrate the k8s clusters to read from some queues/pub-subs (e.g. aws-sqs, aws-sns) and do it dynamically?
Your help would be immensely appreciated!
If your application is running on Kubernetes and you don't want to migrate it to serverless functions but would rather keep everything inside the Kubernetes cluster, you can use Knative.
Scale to Zero With Knative
Knative is a serverless platform built on top of Kubernetes. It provides higher-level abstractions for common application use cases.
One key feature is its ability to run generic (micro)service-based applications as serverless with the help of built-in scale-to-zero support. Knative has introduced its own autoscaler, the Knative Pod Autoscaler (KPA), which supports scale to zero for any service that uses non-CPU-based scaling metrics.
Updating your microservice to run with Knative involves only minor changes, and you can keep running it on Kubernetes.
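Independent of Knative, here is a minimal event-driven sketch of what the question asks for: poll an SQS queue and, on each message, create a one-off Job that reuses the existing CronJob's job template. The queue URL, CronJob name, and namespace are hypothetical, and on clusters older than 1.21 the CronJob API lives in batch/v1beta1 rather than batch/v1:

```python
import boto3
from kubernetes import client, config

config.load_incluster_config()  # assumes this runs inside the EKS/AKS cluster
batch = client.BatchV1Api()
sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/run-now"  # hypothetical

def trigger_from_cronjob(name, namespace="default"):
    """Create an on-demand Job that reuses the CronJob's pod template."""
    cron = batch.read_namespaced_cron_job(name, namespace)
    job = client.V1Job(
        metadata=client.V1ObjectMeta(generate_name=f"{name}-manual-"),
        spec=cron.spec.job_template.spec,
    )
    batch.create_namespaced_job(namespace, job)

while True:
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1,
                               WaitTimeSeconds=20)  # long poll
    for msg in resp.get("Messages", []):
        trigger_from_cronjob("hourly-task")  # hypothetical CronJob name
        sqs.delete_message(QueueUrl=QUEUE_URL,
                           ReceiptHandle=msg["ReceiptHandle"])
```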

Kubernetes parallel computing

I want to know: does Kubernetes have any parallel computing implementation?
A long time ago I used OpenHPC or OpenMosix for parallel computation cluster systems.
Can Kubernetes replace these services?
If your answer is NO, then what does the word cluster mean when you talk about Kubernetes?
Kubernetes and HPC / HTC are not yet integrated, but some attempts can be observed.
In the Kubernetes, Containers and HPC article you can find a comparison between HPC and Kubernetes, with similarities and differences.
The main differences are the workload types they focus on. While HPC workload managers are focused on running distributed memory jobs and support high-throughput scenarios, Kubernetes is primarily built for orchestrating containerized microservice applications.
If you are eager to find more information, you can read some specialist books like Seamlessly Managing HPC Workloads Through Kubernetes.
Regarding the second part:
If your answer is NO, then what does the word cluster mean when you talk about Kubernetes?
You can find many definitions on the internet; however, one of the easiest to understand is in the Red Hat documentation.
A Kubernetes cluster is a set of node machines for running containerized applications. If you’re running Kubernetes, you’re running a cluster.
At a minimum, a cluster contains a control plane and one or more compute machines, or nodes. The control plane is responsible for maintaining the desired state of the cluster, such as which applications are running and which container images they use. Nodes actually run the applications and workloads.
The cluster is the heart of Kubernetes’ key advantage: the ability to schedule and run containers across a group of machines, be they physical or virtual, on premises or in the cloud. Kubernetes containers aren’t tied to individual machines. Rather, they’re abstracted across the cluster.
In addition, you can also find useful information in the official Kubernetes documentation, like What is Kubernetes? and Kubernetes Concepts.
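To make the definition above concrete, here is a minimal sketch (an illustration, not from the answer) that uses the official Python client to enumerate the node machines that make up a cluster:

```python
from kubernetes import client, config

config.load_kube_config()  # reads your local kubeconfig

# A cluster is a set of node machines; the control plane schedules
# containerized workloads across them.
for node in client.CoreV1Api().list_node().items:
    labels = node.metadata.labels or {}
    roles = [key.split("/")[-1] for key in labels
             if key.startswith("node-role.kubernetes.io/")]
    print(node.metadata.name, roles or ["worker"])
```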

How to best run Apache Airflow tasks on a Kubernetes cluster?

What we want to achieve:
We would like to use Airflow to manage our machine learning and data pipeline while using Kubernetes to manage the resources and schedule the jobs. What we would like to achieve is for Airflow to orchestrate the workflow (e.g. various task dependencies, re-running jobs upon failure) and Kubernetes to orchestrate the infrastructure (e.g. cluster autoscaling and assignment of individual jobs to nodes). In other words, Airflow will tell the Kubernetes cluster what to do and Kubernetes decides how to distribute the work. At the same time, we would also want Airflow to be able to monitor the status of individual tasks. For example, if we have 10 tasks spread across a cluster of 5 nodes, Airflow should be able to communicate with the cluster and report something like: 3 "small tasks" are done, 1 "small task" has failed and will be scheduled to re-run, and the remaining 6 "big tasks" are still running.
Questions:
Our understanding is that Airflow has no Kubernetes operator, see the open issue at https://issues.apache.org/jira/browse/AIRFLOW-1314. That being said, we don't want Airflow to manage resources like service accounts, env variables, cluster creation, etc., but simply to send tasks to an existing Kubernetes cluster and let Airflow know when a job is done. An alternative would be to use Apache Mesos, but it looks less flexible and less straightforward compared to Kubernetes.
I guess we could use Airflow's bash_operator to run kubectl, but this does not seem like the most elegant solution.
Any thoughts? How do you deal with that?
Airflow has both a Kubernetes Executor and a Kubernetes Operator.
You can use the Kubernetes Operator to send tasks (in the form of Docker images) from Airflow to Kubernetes via whichever Airflow executor you prefer.
Based on your description, though, I believe you are looking for the KubernetesExecutor to schedule all your tasks against your Kubernetes cluster. As you can see from the source code, it has a much tighter integration with Kubernetes.
This also means you do not have to worry about building the Docker images ahead of time, as is required with the Kubernetes Operator.
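For illustration, here is a minimal sketch of a DAG task using the KubernetesPodOperator; the DAG id, namespace, and image are hypothetical, and the import path and some argument names vary by Airflow version (older releases shipped the operator under airflow.contrib.operators):

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
    KubernetesPodOperator,
)

with DAG(
    dag_id="ml_pipeline",             # hypothetical DAG id
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    train = KubernetesPodOperator(
        task_id="train_model",
        name="train-model",
        namespace="airflow",                   # assumed namespace
        image="my-registry/train:latest",      # hypothetical image
        cmds=["python", "train.py"],
        get_logs=True,                 # stream pod logs back into Airflow
        is_delete_operator_pod=True,   # clean up the pod when the task ends
    )
```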

Kubernetes Architecture - Kubernetes Cluster Management and initializing Nodes [closed]

I am trying to change my deployment scenario from Docker to Kubernetes. I have explored the architecture of Kubernetes: cluster, nodes, pods, services, replica sets/controllers, kubernetes-cni, kubectl, etc. Now I need to begin with deployment into a Kubernetes cluster. While exploring, I found documentation and discussions about creating a single node and master on the same machine, or in VMs. I also found the kubespray and minikube documentation for cluster creation.
Here are my points of confusion about getting hands-on with Kubernetes:
For creating and working with Kubernetes, why is there a variation like a single node and master on the same machine or in VMs? Why is there this deviation in cluster setup?
How can I decide whether I should run the node and master on the same machine or use VMs for different nodes?
How do Minikube and Kubespray provide different methodologies for Kubernetes architecture, since Kubernetes is the product of one single source, Google?
If I install kubeadm, kubernetes-cni and kubelet on my Ubuntu 16.04 machine, can I initiate nodes on the same machine?
How can I clear up these confusions?
The taxonomy of concepts and terms is very complicated, and the documentation is still pretty sparse.
1. For creating and working with Kubernetes,
why is there a variation like a single node and master on the same machine or in VMs?
Why is there this deviation in cluster setup?
The deviation is to support many distinct use cases: container workload developers working on their laptops, who need what amounts to a fake cluster without a lot of operational ceremony; Kubernetes ops folks learning and testing on small but real clusters; and real production workloads for installations of varying sizes.
For the first case, for container workload development, there is a piece of software called minikube, which is like a distribution of Kubernetes that automates creating a single virtual machine (using VirtualBox or other desktop-class virtual machine tooling) preconfigured to run a combined Kubernetes master and node, sufficient to run real Kubernetes workloads, but on a laptop.
For the second case, for non-production purposes, the master and worker functions can be run on a single machine, or a single master machine can be used with a small number of worker machines.
A production Kubernetes cluster will usually have 3, 5, or 7 master machines (VMs or bare-metal machines). Multiple masters are needed to maintain quorum for etcd, where Kubernetes stores all runtime state, in the case of machine failures. 3 master machines allow 1 master machine to fail without disrupting the cluster; 5 masters will tolerate 2 master machine failures, etc.
This number of masters can support a large number of worker machines (dozens to hundreds) running the container workloads. In a production environment, one would not want to run client workloads on master machines.
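The fault-tolerance numbers above follow from etcd's majority-quorum rule; a quick sketch of the arithmetic:

```python
# With n masters, etcd needs a majority of floor(n/2) + 1 members up,
# so the cluster tolerates n - (n // 2 + 1) = (n - 1) // 2 failures.
for n in (1, 3, 5, 7):
    quorum = n // 2 + 1
    print(f"{n} masters: quorum {quorum}, tolerates {n - quorum} failure(s)")
# 1 masters: quorum 1, tolerates 0 failure(s)
# 3 masters: quorum 2, tolerates 1 failure(s)
# 5 masters: quorum 3, tolerates 2 failure(s)
# 7 masters: quorum 4, tolerates 3 failure(s)
```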
2. How can I decide whether I should run the node and master
on the same machine, or do I need to use VMs for different nodes?
See above- for development, use minikube. For production, plan to use multiple redundant masters if you are running the cluster yourself, or use a cloud provider's managed kubernetes offering.
3. How do Minikube and Kubespray provide different methodologies
for Kubernetes architecture?
Minikube is for development only. Kubespray is one of many tools that provides some automation help when building a production cluster. Kubespray's distinguishing feature is the use of Ansible for machine setup and automation. This may or may not be desirable, depending on your comfort with and interest in Ansible and/or its competitors.
4. Why are there so many options when Kubernetes is the product of a
single source, Google?
Kubernetes certainly originated at Google, but now there are hundreds or more engineers across many companies, including Microsoft, Amazon, Red Hat, Oracle, and tons of tiny companies, actively working on it. It is a remarkable project.
5. If I install kubeadm, kubernetes-cni and kubelet on my Ubuntu 16.04 machine,
can I initiate nodes on the same machine?
Kubeadm is a setup tool, not a production runtime tool, but yes, you can run containers on the same machine as the bits that are needed for a Kubernetes master. In addition to etcd, the kubelet, the apiserver, and the controller manager, you need to run Docker as well, since the kubelet talks to Docker to schedule containers. I would advise NOT running anything else on this machine: improper configuration can cause problems with the machine serving as master/worker, so any other work on it would be lost.
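As one concrete follow-up to running workloads on the same machine: kubeadm taints the master so regular pods are not scheduled there, and that taint has to be removed first. Here is a minimal sketch with the official Python client (the node name is hypothetical; kubectl taint achieves the same thing):

```python
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

node_name = "my-node"  # hypothetical; check `kubectl get nodes` for yours
node = v1.read_node(node_name)

# Drop the kubeadm control-plane taints so regular workloads can schedule
# here; newer versions use the control-plane key instead of master.
keep = [t for t in (node.spec.taints or [])
        if t.key not in ("node-role.kubernetes.io/master",
                         "node-role.kubernetes.io/control-plane")]
v1.patch_node(node_name, {"spec": {"taints": keep}})
```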