We are planning to create a few new microservices as part of our platform.
Currently all our microservices are Java-Spring based, and we use Docker/Kubernetes to scale.
We are now planning to evaluate Scala-Akka for creating microservices.
Any pointers on how we can scale a Scala-Akka microservice would be a great help.
We found (in some blogs) that Scala-Akka microservices are also being deployed as Docker containers. Is that the only way to scale Scala-Akka services, or are there other options as well?
Also, Akka provides components that can help with scaling, as mentioned in the following blog:
https://www.datio.com/iaas/building-a-docker-container-orchestrator-with-akka/
Which is the better way to scale Scala-Akka microservices?
Thanks
Anuj
Akka has akka-management, which provides two ways to discover services on Kubernetes: through DNS and through the Kubernetes API.
It really works.
https://doc.akka.io/docs/akka-management/current/discovery/kubernetes.html
CONTEXT: I have been learning Kubernetes and trying to get some hands-on experience. I have been using AKS to abstract away the complexity of dealing with the control plane (and because I have a free student Azure account). I am deploying a NodeJS app that connects to a MongoDB database. So far the deployment has been successful, but I am using MongoDB Atlas and connecting to it.
Based on the little I have learned about StatefulSets, the MongoDB Atlas service seems a lot easier and more convenient, but my question is: when would it be a better idea to deploy a StatefulSet with the MongoDB database (running in a pod)? What's more cost-effective? More easily scalable?
I realize the questions might be a little vague, but I am just getting started with Kubernetes.
Disclaimer: this is not a production application, just something simple I am using to learn K8s.
The official docs use a StatefulSet, and that makes sense. Generally, database applications are deployed as StatefulSets, because the nodes (MongoDB nodes, not Kubernetes nodes) can be out of sync with each other, and that would create data inconsistencies between them.
You can deploy MongoDB as a Deployment; I have seen it done. But most clients connect using a connection string (a string of multiple node addresses), and since Kubernetes exposes StatefulSets through headless services, you should be okay.
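To make the connection-string point concrete, here is a minimal Python sketch of how the per-pod DNS names of a StatefulSet behind a headless service can be assembled into a MongoDB connection string. The names `mongo`, `default`, `rs0` and `app` are placeholders I made up, and `cluster.local` is assumed to be the cluster domain; adjust them to your setup.

```python
from typing import List


def statefulset_hosts(statefulset: str, service: str, namespace: str,
                      replicas: int, port: int = 27017,
                      cluster_domain: str = "cluster.local") -> List[str]:
    """Build the stable DNS name of each pod in a StatefulSet.

    Pods are named <statefulset>-<ordinal>, and a headless service gives
    each one a record under <service>.<namespace>.svc.<cluster_domain>.
    """
    return [
        f"{statefulset}-{i}.{service}.{namespace}.svc.{cluster_domain}:{port}"
        for i in range(replicas)
    ]


def mongodb_uri(hosts: List[str], replica_set: str, database: str = "") -> str:
    """Join the host list into a standard mongodb:// connection string."""
    return f"mongodb://{','.join(hosts)}/{database}?replicaSet={replica_set}"


# Placeholder names: a 3-replica StatefulSet "mongo" behind headless service "mongo".
hosts = statefulset_hosts("mongo", "mongo", "default", replicas=3)
uri = mongodb_uri(hosts, replica_set="rs0", database="app")
```

Because the pod names stay the same across restarts, the string does not need to change when a pod is rescheduled, which is what a plain Deployment cannot offer.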
For learning purposes, I advise you to deploy your MongoDB in a StatefulSet. Then you can learn how it works and what problems you could encounter with this Kubernetes object.
For a production application, I advise never deploying a database in a StatefulSet if you don't need to. A StatefulSet comes with a lot of problems that you might not want to manage.
Sometimes company rules prohibit hosting data on an external company's storage.
To decide whether you need to put your database in a StatefulSet, the questions I try to answer are:
Should my DB be hosted on premise (for privacy)?
Should my DB be scalable?
Should my DB be updated frequently?
You can find a list of pros and cons in the documentation.
I am deploying a Python + TensorFlow + Flask application using a fully managed Google Cloud Run service (1 vCPU and 4 GB RAM).
The system works fine but it is really slow, so I am evaluating ways of making it faster (it needs to run 20-30 times faster than it does now).
What would be the best approach?
To use a Kubernetes Cluster with one or two powerful machines
To use a Kubernetes Cluster with 3-5 weaker machines
To forget about Kubernetes/Docker and run everything on a single powerful VM
Something else maybe?
For now I don't expect to have more than 10 users at a time but I want to be able to scale it up eventually.
You might want to evaluate according to your use case
Per this article, Fully managed Cloud Run is an ideal serverless platform for stateless containerized microservices that don’t require Kubernetes features like namespaces, co-location of containers in pods (sidecars) or node allocation and management.
GKE is a great choice if you are looking for a container orchestration platform that offers advanced scalability and configuration flexibility.
You mentioned you are looking for the cheaper/easier method to develop, but this will probably not be as scalable, efficient, or manageable. You might want to take a closer look at all the cloud compute options in GCP to see what could benefit your use case the most.
You mentioned your use case is CPU intensive, so you might want to leverage the high-CPU machine types. These can be used directly by creating a VM, by creating an instance group, or through other services such as GKE or App Engine.
There are scenarios where you want to run a cluster of microservices in high availability but want just one of them to execute a specific operation (consuming from a queue, polling a database).
What are the best practices for this use case? Should one use ZooKeeper as a registry, or are there other suitable technologies?
There are a couple of technologies for service registration and discovery. Please see if the following articles help:
StackShare's comparison of Consul vs. ZooKeeper vs. Eureka
A nice paper for service-discovery and guide on how to make the choice
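One common way to get "only one instance does the work" on top of such a registry is a lease (leader election): each instance tries to take a named lease with a time-to-live, and only the holder runs the task. Below is a rough Python sketch of that idea. `InMemoryLeaseStore` and `run_if_leader` are hypothetical names of mine, and the store is a single-process stand-in for a real coordination service such as ZooKeeper, Consul, or etcd, so this shows the logic only, not a distributed implementation.

```python
import threading
import time
from typing import Callable, Dict, Tuple


class InMemoryLeaseStore:
    """Single-process stand-in for a coordination service (assumption)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._lock = threading.Lock()
        self._leases: Dict[str, Tuple[str, float]] = {}  # name -> (holder, expires_at)

    def try_acquire(self, name: str, holder: str, ttl: float) -> bool:
        """Take or renew the lease if it is free, expired, or already ours."""
        with self._lock:
            now = self._clock()
            current = self._leases.get(name)
            if current is None or current[1] <= now or current[0] == holder:
                self._leases[name] = (holder, now + ttl)
                return True
            return False


def run_if_leader(store: InMemoryLeaseStore, instance_id: str,
                  task: Callable[[], None],
                  lease_name: str = "queue-consumer", ttl: float = 10.0) -> bool:
    """Run the task only if this instance currently holds the lease."""
    if store.try_acquire(lease_name, instance_id, ttl):
        task()
        return True
    return False
```

Each replica would call `run_if_leader` on a timer shorter than the TTL. If the current holder dies, its lease expires and another replica takes over, so the operation keeps running without two instances executing it at once (as long as the task finishes within the TTL).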
What's the difference between Apache's Mesos and Google's Kubernetes?
I read the accepted answers but I'm still confused about what the differences are.
If Kubernetes is a cluster manager, then what does Mesos do? (I understand what it does from watching a bunch of videos, but I suppose I'm more confused about how the two work together.)
From my reading, both Kubernetes and Marathon are "frameworks" sitting on top of Mesos?
What is Mesos responsible for and what are Kubernetes/Marathon responsible for and how do they work with each other?
EDIT:
I think the better question is When would I want to use Kubernetes on top of Mesos vs just running Mesos alone?
Mesos is another abstraction layer. It simply abstracts the underlying hardware so that software running on top of it only has to define the resources it requires, without having to know anything else.
Kubernetes could do a similar thing, but without the abstraction provided by Mesos you can't run other frameworks (e.g., Spark or Cassandra) on the same machines without manually dividing them between those frameworks.
Apache Mesos is a resource manager that shares resources (CPU shares, RAM, disk, ports) across a cluster of machines in a fair way. By sharing, I mean it offers these resources to so-called framework schedulers (such as Marathon) and thereby has a clear separation of concerns between resource management and scheduling decisions (which are implemented, depending on the job type, for example long-running or batch, by the framework scheduler). See also the Mesos architecture for further details.
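The split between "Mesos offers resources" and "the framework decides what to run" can be illustrated with a toy Python model. This is not the Mesos API: `Offer`, `Task`, `FrameworkScheduler` and `offer_round` are names I invented, and the fixed offer order here is a simplification of whatever fairness policy the real resource manager applies.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Offer:
    agent: str
    cpus: float
    mem_mb: int


@dataclass(frozen=True)
class Task:
    name: str
    cpus: float
    mem_mb: int


class FrameworkScheduler:
    """Toy framework (think Marathon or Spark): decides which tasks to launch."""

    def __init__(self, name: str, pending: List[Task]) -> None:
        self.name = name
        self.pending = list(pending)

    def resource_offer(self, offer: Offer) -> List[Task]:
        """Scheduling decision: take whatever pending tasks fit in the offer."""
        launched, cpus, mem = [], offer.cpus, offer.mem_mb
        for task in list(self.pending):
            if task.cpus <= cpus and task.mem_mb <= mem:
                launched.append(task)
                self.pending.remove(task)
                cpus -= task.cpus
                mem -= task.mem_mb
        return launched


def offer_round(agents: Dict[str, Tuple[float, int]],
                frameworks: List[FrameworkScheduler]) -> List[Tuple[str, str, str]]:
    """Toy resource manager: offer each agent's free resources to every
    framework in turn and re-offer whatever is left to the next one."""
    placements = []
    for agent, (cpus, mem) in agents.items():
        for fw in frameworks:
            for task in fw.resource_offer(Offer(agent, cpus, mem)):
                placements.append((fw.name, task.name, agent))
                cpus -= task.cpus
                mem -= task.mem_mb
    return placements


marathon = FrameworkScheduler("marathon", [Task("web", 1.0, 1024), Task("api", 2.0, 2048)])
spark = FrameworkScheduler("spark", [Task("executor", 2.0, 4096), Task("small", 1.0, 1024)])
placements = offer_round({"agent-1": (4.0, 8192)}, [marathon, spark])
```

The resource manager never looks at what the tasks are; it only tracks what is free and who it was offered to. That is why several frameworks can share the same machines without anyone partitioning them by hand.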
I'm new to Akka Cluster; however, as I understand its documentation, I need to know at least one "seed node" to join an existing cluster.
So when using clusters with OpenShift I would need to know if the current gear is the first node - then I would create a new cluster - or if there are already some other gears around - I would need to know at least one of their IPs to join them.
Is this possible with OpenShift cloud? (I'm using the DIY cartridge, so customizing the start-up script wouldn't be a problem. However, I can't find any environment variable that provides the relevant data.)
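The decision I have in mind for the start-up script looks roughly like the Python sketch below. It assumes some way of obtaining the addresses of the other gears, which is exactly the part I am missing, so `known_peers` is a hypothetical input; the system name `app`, the port 2551, and the `akka` protocol prefix are placeholders too (older remoting setups use `akka.tcp`).

```python
from typing import Iterable, List, Tuple

Address = Tuple[str, int]  # (host, port)


def format_address(system: str, address: Address, protocol: str = "akka") -> str:
    """Render an address in the form used for Akka seed-node entries."""
    host, port = address
    return f"{protocol}://{system}@{host}:{port}"


def choose_seed_nodes(system: str, self_address: Address,
                      known_peers: Iterable[Address],
                      protocol: str = "akka") -> List[str]:
    """If no other node is known, seed with ourselves (form a new cluster);
    otherwise use the known peers as seed nodes and join them."""
    peers = sorted(set(known_peers) - {self_address})
    if not peers:
        return [format_address(system, self_address, protocol)]
    return [format_address(system, peer, protocol) for peer in peers]


first = choose_seed_nodes("app", ("10.0.0.5", 2551), [])
later = choose_seed_nodes(
    "app", ("10.0.0.5", 2551),
    [("10.0.0.7", 2551), ("10.0.0.6", 2551), ("10.0.0.5", 2551)],
)
```

One caveat with this approach: if two gears start at the same moment and neither sees the other yet, both would form their own cluster, so the peer lookup would need to be reliable.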
DIY gears on OpenShift Online do not scale. And if you are spinning up separate applications for each of the nodes in your cluster, you are going to (probably) run into inter-gear communication issues. You might need to create your own akka cartridge (http://docs.openshift.org/origin-m4/oo_cartridge_developers_guide.html), then you can set your own scaling options. You might check out this cartridge (https://github.com/smarterclayton/openshift-redis-cart) which supports scaling and might give you some ideas about how to implement yours.