kubernetes - How do I get one job to work with multiple nodes? - kubernetes

Currently, when I create and run deployment, I only work on one node.
I want to work on one task at the same time using Kubernetes.
I want all nodes to work like one computer.

Kubernetes is about managing containers and scheduling them to run across a cluster, not about “jobs” per se. Have a look at MapReduce and Apache Spark.

First you need to understand more about Kubernetes and why your understanding might be a bit misleading for you concept. Kubernetes is an container orchestration tool that automates many of the manual processes involved in deploying, managing, and scaling containerized applications.
In other words, you can cluster together groups of hosts running Linux containers, and K8s helps you manage those clusters. To process some kind of job, data you will need a software that runs on kubernetes.
The next step that you might want to look into is distributed computing concept and distributed computing model called MapReduce.
MapReduce was introduce by Google to meet the demand of large set of users for its applications. Its used to write write scalable applications that can do parallel processing to process a large amount of data on a large cluster of commodity hardware servers. Hadoop is software that has adopted MapReduce and is capable of running it`s programs in various languages (Python, Ruby, C++).
Take a look on this medium article about distributed computing system based on MapReduce and Kubernetes.

Related

Web application deployment approach using Google Cloud - GKE

I deploying a python + tensorflow + flask application using a fully managed Google Cloud Run Service (1 vCPUs and 4 GB Ram).
System works fine but it is really slow, so I am evaluating ways of making it fast (it needs to run 20-30 times faster than what is doing now)
What would be the best approach?
To use a Kubernetes Cluster with one or two powerful machines
To use a Kubernetes Cluster with 3-5 weaker machines
To forget about Kubernets/Docker and run everything on single powerfull VM
Something else maybe?
For now I don't expect to have more than 10 users at a time but I want to be able to scale it up eventually.
You might want to evaluate according to your use case
Per this article, Fully managed Cloud Run is an ideal serverless platform for stateless containerized microservices that don’t require Kubernetes features like namespaces, co-location of containers in pods (sidecars) or node allocation and management.
GKE is a great choice if you are looking for a container orchestration platform that offers advanced scalability and configuration flexibility.
You mentioned you are looking the cheaper/easier method to develop, but this will probably not be as scalable, efficient or manageable, you might want to take a closer look at all cloud compute options in GCP to see what could benefit your use case the most.
You mentioned your use case is CPU intensive, so you might want to leverage the high CPU machine types, these might be used directly by creating a VM, creating an instance group or using them in other services like GKE or App Engine

Best way to setup ELK in production

Is it a good practice to setup Elasticsearch, logstash and kiban on 3 different servers, with each server having RAM of 8GB.
Or
Setup ELK on 1 single machine with higher memory of 16GB.
The machine needs to be highly available.
Can anyone suggest or share inputs
it depends on your task and situation. normally it is good practice to setup Elasticsearch, logstash and kiban on 3 different server. or if you data if more so you have to make a cluster of elastic search or may have more than one server of logstash .
filefeats will be on all the data(log) server .
there are an example of handling 25000 logs per secoung
https://engineering.viki.com/blog/2015/log-processing-at-scale-elk-cluster-at-25k-events-per-second/
Its slightly more complicated than explained here,
Any distributed component would try to offer features with sharded or partioned way. In a similar way the Elastic Search at ELK which is based out of Master Slave model and maintains the data at ES data nodes. This means one needs to set up a cluster of nodes for Elastic search itself for its various components such as ES Master, ES data and ES client.
The next level if the system is required at production grade which requires Multi master setup with minimum 3 master nodes.
This would be the beginning of ELK.
If one needs to run such a complex system in a limited resources, then Containerizing the ELK components and running them in a container orchestration framework is the recommended option. Kubernetes/Docker swarm are the options to run ELK cluster based on the dockerized instances of ELK. Again these orchestration frameworks also require multimaster setup , but that would be fair as one would have lot more components in a cloud environment and all of them could be controlled under these orchestration frameworks.

What is the canonical way to deploy Scala/Akka microservices?

We are going to end up with dozens of these microservices (most are Akka-based), and I'm unsure how to best manage their deployment. Specifically, they are built to be independent of each other and as specialized and distributed as possible.
My question stems from the fact that all of them are too small for their own individual JVMs; even if we were to host them on AWS nano instances, we'll still end up with about 40 machines if you factor in redundancy, and such a high number is simply not needed. Three medium size instances could (and do) easily handle the entire workload.
Currently, I just group them into "container" applications, somewhat randomly, and then run these container applications on larger JVMs.
However, there has to be a better way. I am not aware of any application servers for Akka where you can just "deploy actors", so I wanted to get some insight on how others run Akka microservices in production (and specifically how to manage deployment).
This is probably not limited to Scala and Akka, but most other platforms have dedicated app servers where you deploy these things.
IMHO, the canonical way is to use a service orchestration tool, and that would indeed run them in individual processes, each with their own JVM.
That's the only way you get the decoupling, isolation, resilience you want with microservices, only this way you'll be able to deploy, update, stop, start them individually.
You're saying:
My question stems from the fact that all of them are too small for
their own individual JVMs; even if we were to host them on AWS nano
instances
You seem to treat JVM and Amazon VMs as equivalent, but that's not the case. You can have multiple JVM processes on a single virtual machine.
I suggest you have a look at service orchestration tools such as
Lightbend Production Suite / Service Orchestration
or Kubernetes
These are just examples, there are others. Note that this tool category will give you a lot of features you'll sooner or later need anyway, such as easy scaling, log consolidation, service lookup, health checks / service failure handling etc.

Apache Mesos vs Google Kubernetes

What's the difference between Apache's Mesos and Google's Kubernetes
I read the accepted answers but I'm still confused what the differences are.
If Kubernetes is a cluster management then what does Mesos do (I understand what it does from watching bunch of videos but I suppose I'm more confused how those two work together)?
From reading both Kubernetes and Marathon are "framework" sitting on top of Mesos?
What is Mesos responsible for and what are Kubernetes/Marathon responsible for and how do they work with each other?
EDIT:
I think the better question is When would I want to use Kubernetes on top of Mesos vs just running Mesos alone?
Mesos is another abstraction layer. It simply abstracts underlying hardware so the software that want to run on the top of it could only define required resources without having to know any other information.
Kubernetes could do similar thing but without abstraction provided by Mesos you can't run other frameworks (e.g., Spark or Cassandra) on same machine without manually dividing it between those frameworks.
Apache Mesos is a resource manager that shares resources (CPU shares, RAM, disk, ports) across a cluster of machines in a fair way. By sharing, I mean it offers these resources to so called framework schedulers (such as Marathon) and thereby has a clear separation of concerns in terms of resource management and scheduling decisions (which is implemented, depending on the job type, for example long-running or batch, by the framework scheduler). See also the Mesos architecture for further details.

Messaging between torque jobs in a cluster

So, I need to submit computation intensive jobs (deep neural network training) to a torque cluster that lease computation time on, and I need to exchange a few megs of large float arrays every few minutes between the active nodes, as the nodes need to be working on the most recent version of the neural network in order to train it well.
I was wondering if there were any good communication options, at least to tell each active job its sisters jobs' ips so it can connect to them by tcp. The nodes don't have access to the internet, and we can't have daemons working on the job submitting server.
The only options that I see would be:
some message passing option on Torque (I'm am fairly noob at torque)
the very error prone option of using files to communicate, which I hate.
a way to query the ips of the active nodes from the server.
There are a variety of ways to exchange information between nodes on a cluster, depending on the architecture of the cluster. Torque is a resource manager, so if the job is being submitted to the cluster using a batch script there are a few environment variables that should be able to give you the hostnames or IP addresses of the nodes being used on a job.
The exact syntax for finding the IP addresses and/or hostnames will depend on the scheduler/workload manager being used with Torque on your cluster. This link has documentation for the PBS Works workload manager.
Parallel communication between nodes can be achieved in a variety of ways and will be partly dependent on the hardware available in the cluster. Using MPI is one of the most common ways to parallelize code for use on a cluster and many implementations support multiple high-performance fabrics/interconnect systems like Infiniband. Some useful introductions to the different types of parallelism can be found here.
As an alternative to MPI Remote Direct Memory Access(RDMA) can be used to pass and access information between nodes. If the cluster has Infiniband network adapters looking into the IB-Verbs API from the vendor would be an additional option for passing data between nodes.