Is it possible to autoscale Akka with Kubernetes?

I need an Akka cluster to run multiple CPU-intensive jobs. I cannot predict how much CPU power I need: sometimes the load is high, while at other times there isn't much load. I guess autoscaling is a good option, meaning, for example, that I should be able to specify that I need a minimum of 2 and a maximum of 10 actors. The cluster should scale up or down, with a cool-off period, as the load goes up or down. Is there a way to do that?
I am guessing that one could make a Docker image of the codebase and autoscale it using Kubernetes. Is that possible? Is there a native Akka solution?
Thanks

If you consider a project like hseeberger/constructr and its issue 179, a native Akka solution should be based on akka/akka-management:
This repository contains interfaces to inspect, interact and manage various Parts of Akka, primarily Akka Cluster. Future additions may extend these concepts to other parts of Akka.
There is a demo for Kubernetes.
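Once each Akka node runs as a pod, Kubernetes' Horizontal Pod Autoscaler can provide the behaviour the question asks for: a minimum of 2, a maximum of 10, and a cool-off period. The HPA documentation gives its core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below applies that rule, clamps the result to the bounds, and delays scale-down until a cool-off period has passed. It is a simplified illustration only. The function name, its parameters and the 300-second default are assumptions, and the real HPA also applies a tolerance band and a stabilization window.

```python
import math

def next_replicas(current, cpu_utilization, target_utilization, now, last_change,
                  min_replicas=2, max_replicas=10, cool_off_seconds=300.0):
    """Replica count to run next: HPA-style proportional rule, bounds, cool-off.

    cpu_utilization / target_utilization: observed and target CPU, e.g. 90 and 50 (%).
    now / last_change: timestamps (seconds) of this decision and of the last resize.
    """
    # Core rule from the HPA docs: scale in proportion to metric / target.
    raw = math.ceil(current * cpu_utilization / target_utilization)
    # Clamp to the configured bounds (minimum 2, maximum 10 in the question).
    proposed = max(min_replicas, min(max_replicas, raw))
    # Scale up at once, but only scale down after the cool-off period has passed.
    if proposed < current and now - last_change < cool_off_seconds:
        return current
    return proposed
```

For example, 2 nodes at 90% CPU against a 50% target become ceil(3.6) = 4 nodes. In an actual HPA object, the bounds are `spec.minReplicas` and `spec.maxReplicas`, and the cool-off corresponds to `spec.behavior.scaleDown.stabilizationWindowSeconds`. Note that Kubernetes scales Akka nodes (pods), not individual actors. akka-management's Cluster Bootstrap module is what lets newly started pods find and join the existing cluster.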

Related

Guessing Kubernetes resource limits for deployments

Is there any way we can correctly guess the resource limits we need to set for deployments running on Kubernetes clusters?
Yes, you can guess that a single-threaded application most likely won't need more than 1 CPU.
For any other program: no, there is no easy way to guess it. Every application is different and reacts differently under different workloads.
The easiest way to figure out how many resources it needs is to run it and measure it.
Run some benchmarks/profilers and see how the application behaves. Then make decisions based on that.
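A crude version of "run it and measure it" compares the CPU time a workload consumes with the wall-clock time it takes. The ratio is the average number of cores used, which can be converted into a Kubernetes CPU quantity in millicores. This is a sketch under stated assumptions. `busy_loop` is a hypothetical stand-in for your real job, and the 1.5× headroom factor is an arbitrary assumption, not a recommendation.

```python
import math
import time

def measure_cpu(workload, *args):
    """Run `workload` once; return (average cores used, wall-clock seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user+system CPU time of this process, all threads
    workload(*args)
    cpu_used = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu_used / wall, wall

def to_millicores(cores, headroom=1.5):
    """Turn a measured core count into a Kubernetes CPU quantity (assumed headroom)."""
    return f"{math.ceil(cores * headroom * 1000)}m"

def busy_loop(seconds):
    """Hypothetical single-threaded, CPU-bound job used as the measured workload."""
    deadline = time.perf_counter() + seconds
    count = 0
    while time.perf_counter() < deadline:
        count += 1
    return count
```

Calling `measure_cpu(busy_loop, 0.5)` should report close to 1 core, which matches the rule of thumb for single-threaded programs. A multi-threaded service has to be measured under realistic load, ideally over many runs, rather than with a single call like this.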

How to do load balancing on remote actors in Akka?

I am working in a distributed environment where I have to set up actors on remote systems. I want to distribute the load among all the remote actors. Can anyone suggest the best way to balance load in a cluster? In my current scenario, one remote system runs 10 actors. For example, say I have 3 systems, each running 10 actors, and I want to balance the load among all 30 actors.
A good way to distribute work is to have the workers pull it, instead of centralising the decision and pushing work to them. Pushing can overload the worker nodes if work comes in at a higher rate than they can actually process.
There is a sample project and tutorial showing worker actors pulling work here: https://developer.lightbend.com/guides/akka-distributed-workers-scala/
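The pull idea itself fits in a few lines. A master holds the pending work, and each worker asks for the next item only when it is idle, so no worker is ever handed more than it can process. In this sketch, Python threads stand in for the 30 remote actors (3 systems × 10 workers). `Master`, `request_work` and `report` are hypothetical names, not the API of the Lightbend sample.

```python
import queue
import threading

class Master:
    """Holds pending work; workers pull from it instead of being pushed to."""
    def __init__(self, jobs):
        self._pending = queue.Queue()
        for job in jobs:
            self._pending.put(job)
        self._lock = threading.Lock()
        self.results = {}  # job -> (worker name, result)

    def request_work(self):
        """Called by an idle worker; returns the next job, or None when drained."""
        try:
            return self._pending.get_nowait()
        except queue.Empty:
            return None

    def report(self, worker, job, result):
        with self._lock:
            self.results[job] = (worker, result)

def worker(name, master, compute):
    # Ask for work only when idle: a busy worker never receives more jobs.
    while (job := master.request_work()) is not None:
        master.report(name, job, compute(job))

def run(jobs, systems=3, workers_per_system=10, compute=lambda n: n * n):
    master = Master(jobs)
    threads = [threading.Thread(target=worker,
                                args=(f"system{s}-worker{w}", master, compute))
               for s in range(systems) for w in range(workers_per_system)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return master.results
```

In a real cluster, the request and the reply travel as messages between nodes. The master must also re-queue work held by a worker whose node disappears, which this sketch omits.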

How to monitor (micro)services?

I have a set of services. Every service contains some components.
Some of them are stateless, some are stateful, some are synchronous, and some are asynchronous.
I have used different approaches to monitoring and alerting:
log-based alerting and metrics gathering, New Relic, and a home-grown solution.
Basically, I am currently looking for a way to generalize and aggregate the important metrics for all services in a single place. One thing I want is to monitor products rather than separate services.
As an end result, I picture a single dashboard with a small number of widgets. Looking at those widgets, I would be able to say for sure whether the services are usable by the end customer.
Perhaps someone can recommend an approach or methodology, or point me to some best practices.
I like what you're trying to achieve! A service is not production-ready unless it's thoroughly monitored.
I believe what you're describing falls under the topics of health checking and metrics.
... I would be able to say for sure, if services are usable to end-customer.
That, however, will require a little of both ;-) To ensure you're currently fulfilling your SLA, you have to make sure that your services are all a) running and b) performing as required. For both problems, I suggest looking at the StatsD toolchain. Initially developed by Etsy, it has become the de facto standard for gathering metrics.
To ensure all your services are running, we're relying on Kubernetes. It takes our description of what should run, what should be reachable from outside, etc., and hosts that on our infrastructure. It also makes sure that, should things die, they will be restarted. It helps with things like autoscaling as well! Awesome tooling, and kudos to Google!
The way it ensures that is with health checks. There are multiple ways to check that a service instance started by Kubernetes is alive and kicking: namely HTTP calls, TCP connection checks, and commands or scripts run inside the container, which let you plug in any other check you need. If Kubernetes detects an unhealthy instance, then depending on the kind of probe it either stops routing traffic to it or restarts it.
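For the HTTP variant, the service only needs an endpoint that returns a success status when healthy and an error status otherwise. Kubernetes counts any status from 200 up to 399 as a successful HTTP probe. Below is a minimal sketch using Python's standard library. The `/healthz` path and the `is_healthy` function are assumptions: you would point the probe's `path` at whatever endpoint you choose and replace the check with your own logic.

```python
import http.server
import threading

def is_healthy():
    """Hypothetical check: replace with real checks (cluster membership, DB, queues...)."""
    return True

class HealthHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/healthz":
            self.send_error(404)
            return
        status = 200 if is_healthy() else 503
        body = b"ok" if status == 200 else b"unhealthy"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep frequent probe requests out of the logs

def start_health_server(port=8080):
    """Serve /healthz on a background thread; port 0 picks a free port."""
    server = http.server.ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A liveness probe on this path lets the kubelet restart the container when checks keep failing. A readiness probe on a similar path removes the pod from its Service's endpoints instead, which suits temporary problems.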
Now, to make sure all your services perform as expected, you'll need to gather some metrics. For all of our services (and all individual endpoints), we gather a few metrics via StatsD, like:
Requests/sec
Number of errors returned (404, etc.)
Response times (average, median, percentiles, depending on the service's SLA)
Payload size (average)
Sometimes the number of concurrent requests per endpoint and the number of instances currently running
General metrics like the host's current CPU and memory usage and uptime
We gather a lot more metrics, but that's about the bottom line. Since StatsD has become more of a "protocol specification" than a concrete product, there is a myriad of collectors, frontends, and backends to choose from. They help you visualize your system's state, and many of them can alert you when a metric, or some combination of metrics, goes beyond its threshold.
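StatsD's wire format is plain text over UDP, one metric per line in the form `name:value|type`. Here `c` is a counter, `ms` a timer and `g` a gauge. The StatsD server aggregates timers into averages and percentiles itself, so the client only sends raw timings. Below is a minimal client sketch covering the request, error and response-time metrics listed above. The metric names, the `shop.` prefix and the `handle_request` wrapper are hypothetical, and 8125 is the conventional StatsD port.

```python
import socket
import time

class StatsdClient:
    """Tiny fire-and-forget StatsD client (UDP, text protocol)."""
    def __init__(self, host="127.0.0.1", port=8125, prefix=""):
        self.addr = (host, port)
        self.prefix = prefix
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def _send(self, name, value, kind):
        line = f"{self.prefix}{name}:{value}|{kind}"
        try:
            self.sock.sendto(line.encode("ascii"), self.addr)
        except OSError:
            pass  # metrics must never break the service itself
        return line

    def incr(self, name, count=1):
        return self._send(name, count, "c")

    def timing(self, name, millis):
        return self._send(name, int(millis), "ms")

    def gauge(self, name, value):
        return self._send(name, value, "g")

def handle_request(stats, endpoint, handler):
    """Wrap a request handler: count it, time it, and count errors by status."""
    start = time.perf_counter()
    status = handler()
    stats.timing(f"{endpoint}.response_time", (time.perf_counter() - start) * 1000)
    stats.incr(f"{endpoint}.requests")
    if status >= 400:
        stats.incr(f"{endpoint}.errors.{status}")
    return status
```

Because UDP is fire-and-forget, a missing or slow StatsD server never blocks request handling. In exchange, some metric packets may be lost.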
Let me know if this was helpful!
There are at least 3 types of things you will need to monitor: the host where the service is deployed, the component itself, and the SLAs. Some of them depend on the software stack you're using as well as on the architecture.
With that said, you could for example use Nagios to monitor the hardware where the services are deployed, and Splunk for the service metrics/SLAs as well as for any errors that might occur. If you have a more sophisticated support structure, you can also send SNMP packets when something goes wrong; these would be your triggers. Without knowing how your infrastructure/services are set up, it is hard to go into deeper detail.
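Nagios monitors a host by running small plugins whose exit code carries the result: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN, with a one-line status message on stdout. Below is a sketch of a CPU-load check in that style. The per-CPU thresholds are assumed values, and `os.getloadavg()` is only available on Unix-like systems.

```python
import os

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # Nagios plugin exit codes

def check_load(load1, cpus, warn_per_cpu=0.8, crit_per_cpu=1.5):
    """Classify the 1-minute load average relative to the CPU count (assumed thresholds)."""
    per_cpu = load1 / cpus
    if per_cpu >= crit_per_cpu:
        return CRITICAL, f"CRITICAL - load {load1:.2f} on {cpus} CPUs"
    if per_cpu >= warn_per_cpu:
        return WARNING, f"WARNING - load {load1:.2f} on {cpus} CPUs"
    return OK, f"OK - load {load1:.2f} on {cpus} CPUs"

def main():
    """Print the status line and return the exit code Nagios expects."""
    try:
        load1, _, _ = os.getloadavg()
    except (AttributeError, OSError):  # not available on this platform
        print("UNKNOWN - load average not available")
        return UNKNOWN
    code, message = check_load(load1, os.cpu_count() or 1)
    print(message)
    return code
```

To use this as an actual plugin, the script's entry point would call `sys.exit(main())` so that Nagios sees the exit code.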

How does Akka 2.4.x work in a cluster application when I lose one of my nodes?

My application has a set of actors, each one doing some heavy computation and each one executing different business logic. At the end, each actor sends its result back to the supervisor, which in turn persists the data.
My intention is to have them distributed across 3 nodes to split/balance the workload, as well as to make the system highly available by allowing one of the machines to "die".
There is no need to share state among the machines.
How does Akka solve for this scenario?
Is it an Akka cluster that I need?
Are there any examples that fall in this domain?
To share state between instances, you can use Sharding and PersistentActor.
You can play the Reactive Missile Defend project to visualise what happens if a node goes down.
There are nice talks, "Sharding with Akka. From theory to production" (JDD2015) and "Beat Aliens with Akka Cluster" (Scala eXchange), showing how to use distributed actors (with Cluster and Sharding) and how they behave when one of the nodes is turned off.
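The idea behind Cluster Sharding can be sketched in a few lines. Each entity id maps to a fixed shard: Akka's hash-based message extractor uses the absolute value of the id's hash code modulo the number of shards. Shards are spread over the nodes. When a node leaves, only its shards move to the surviving nodes, where their entities are restarted and, with PersistentActor, recover their state from the journal. This Python sketch illustrates that idea only. `ShardAllocator`, the shard count of 30 and the "least-loaded node wins" rule are assumptions, not Akka's allocation code.

```python
import zlib

NUM_SHARDS = 30  # assumed value for this sketch

def shard_of(entity_id, num_shards=NUM_SHARDS):
    """Map an entity id to a shard with a hash that is stable across processes."""
    # Python's built-in hash() of str is randomized per process, so use CRC32.
    return zlib.crc32(entity_id.encode("utf-8")) % num_shards

class ShardAllocator:
    """Tracks which node hosts each shard and moves shards off departed nodes."""
    def __init__(self, nodes, num_shards=NUM_SHARDS):
        self.nodes = list(nodes)
        # Initial round-robin spread of shards over the nodes.
        self.location = {s: self.nodes[s % len(self.nodes)] for s in range(num_shards)}

    def node_for(self, entity_id):
        return self.location[shard_of(entity_id, len(self.location))]

    def shards_on(self, node):
        return [s for s, n in self.location.items() if n == node]

    def node_left(self, node):
        """Reassign the departed node's shards, each to the least-loaded survivor."""
        self.nodes.remove(node)
        if not self.nodes:
            raise RuntimeError("no nodes left to host shards")
        moved = self.shards_on(node)
        for shard in moved:
            target = min(self.nodes, key=lambda n: len(self.shards_on(n)))
            self.location[shard] = target
        return moved
```

With 3 nodes and 30 shards, losing one node moves only its 10 shards. Every other entity keeps its location, and the two survivors end up with 15 shards each.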

Scala + Akka: How to develop a Multi-Machine Highly Available Cluster

We're developing a server system in Scala + Akka for a game that will serve clients in Android, iPhone, and Second Life. There are parts of this server that need to be highly available, running on multiple machines. If one of those servers dies (of, say, hardware failure), the system needs to keep running. I think I want the clients to have a list of machines they will try to connect with, similar to how Cassandra works.
The multi-node examples I've seen so far with Akka seem to me to be centered around the idea of scalability, rather than high availability (at least with regard to hardware). The multi-node examples seem to always have a single point of failure. For example there are load balancers, but if I need to reboot one of the machines that have load balancers, my system will suffer some downtime.
Are there any examples that show this type of hardware fault tolerance for Akka? Or, do you have any thoughts on good ways to make this happen?
So far, the best answer I've been able to come up with is to study the Erlang OTP docs, meditate on them, and try to figure out how to put my system together using the building blocks available in Akka.
But if there are resources, examples, or ideas on how to share state between multiple machines in a way that if one of them goes down things keep running, I'd sure appreciate them, because I'm concerned I might be re-inventing the wheel here. Maybe there is a multi-node STM container that automatically keeps the shared state in sync across multiple nodes? Or maybe this is so easy to make that the documentation doesn't bother showing examples of how to do it, or perhaps I haven't been thorough enough in my research and experimentation yet. Any thoughts or ideas will be appreciated.
HA and load management are very important aspects of scalability and are available as part of the AkkaSource commercial offering.
If you're listing multiple potential hosts in your clients already, then those can effectively become load balancers.
You could offer a host suggestion service that recommends which machine the client should connect to (based on current load, or whatever). The client can then pin to that host until the connection fails.
If the host suggestion service is not there, the client can simply pick a random host from its internal list, trying hosts until it connects.
Ideally, on first start-up, the client will connect to the host suggestion service and not only get directed to an appropriate host, but also receive a list of other potential hosts. This list can be updated every time the client connects.
If the host suggestion service is down on the client's first attempt (unlikely, but...), you can pre-deploy a list of hosts with the client install, so it can start randomly selecting hosts from the very beginning if it has to.
Make sure that your list of hosts contains actual host names, not IPs. That gives you more flexibility long term (i.e. you'll "always have" host1.example.com, host2.example.com, etc., even if you move infrastructure and change IPs).
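The client-side logic described above fits in one function: ask the suggestion service first, fall back to a shuffled copy of the cached or pre-deployed host list, and pin to whichever host answers. In this sketch, `suggest` and `connect` are hypothetical callables standing in for the suggestion-service call and the real connection attempt. The host names are the placeholders from the answer.

```python
import random

DEFAULT_HOSTS = ["host1.example.com", "host2.example.com", "host3.example.com"]

def pick_host(suggest, connect, known_hosts=DEFAULT_HOSTS, rng=random):
    """Return (host, connection, refreshed host list) or raise ConnectionError.

    suggest() -> (preferred_host, host_list); raises OSError if the service is down.
    connect(host) -> connection object; raises OSError if the host is unreachable.
    """
    hosts = list(known_hosts)
    preferred = None
    try:
        preferred, fresh_list = suggest()
        hosts = list(fresh_list) or hosts  # refresh the cached list on each success
    except OSError:
        pass  # service unavailable: fall back to the cached/pre-deployed list
    candidates = [h for h in hosts if h != preferred]
    rng.shuffle(candidates)  # random order spreads clients across hosts
    if preferred is not None:
        candidates.insert(0, preferred)  # try the suggested host first
    for host in candidates:
        try:
            return host, connect(host), hosts
        except OSError:
            continue  # unreachable: try the next host
    raise ConnectionError("no reachable host")
```

The client would store the returned host list and pass it back in as `known_hosts` on its next start, so the list stays current between installs.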
You could take a look at how RedDwarf and its fork DimDwarf are built. They are both horizontally scalable, crash-only game app servers, and DimDwarf is partly written in Scala (the new messaging functionality). Their approach and architecture should match your needs quite well :)
2 cents..
"how to share state between multiple machines in a way that if one of them goes down things keep running"
Don't share state between machines; instead, partition state across machines. I don't know your domain, so I don't know if this will work. But essentially, if you assign certain aggregates (in DDD terms) to certain nodes, you can keep those aggregates in memory (actor, agent, etc.) while they are being used. To do this, you will need something like ZooKeeper to coordinate which nodes handle which aggregates. In the event of a failure, you can bring the aggregate up on a different node.
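The coordination piece can be sketched as an ownership registry. A node claims an aggregate before loading it, and the claim is tied to that node's session. In ZooKeeper this would typically be an ephemeral znode, which disappears when the session expires. When a node fails, its claims vanish, so another node can take those aggregates over. The in-memory `OwnershipRegistry` and the `route` helper below are hypothetical stand-ins, not ZooKeeper's client API.

```python
import threading

class OwnershipRegistry:
    """In-memory stand-in for a coordinator such as ZooKeeper (hypothetical API)."""
    def __init__(self):
        self._owners = {}  # aggregate id -> owning node
        self._lock = threading.Lock()

    def claim(self, aggregate_id, node):
        """Atomically claim an aggregate; returns the actual owner (maybe another node)."""
        with self._lock:
            return self._owners.setdefault(aggregate_id, node)

    def owner(self, aggregate_id):
        with self._lock:
            return self._owners.get(aggregate_id)

    def session_expired(self, node):
        """Drop every claim held by a failed node, like ephemeral nodes vanishing."""
        with self._lock:
            orphaned = [a for a, n in self._owners.items() if n == node]
            for a in orphaned:
                del self._owners[a]
            return orphaned

def route(registry, aggregate_id, this_node):
    """Handle locally if we own (or can claim) the aggregate, else forward to the owner."""
    owner = registry.claim(aggregate_id, this_node)
    return ("local", owner) if owner == this_node else ("forward", owner)
```

The "forward" case is where remoting comes in: the receiving node passes the request on to the actor that holds the aggregate on the owning node.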
Furthermore, if you use an event-sourcing model to build your aggregates, it becomes almost trivial to keep real-time copies (slaves) of your aggregates on other nodes, with those nodes listening for events and maintaining their own copies.
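Event sourcing makes replicas cheap because an aggregate's state is just a fold over its events. Any node that receives the same event stream can maintain an identical copy, and a node taking over after a failure can rebuild the state by replaying the log. Below is a minimal sketch. The `PlayerAccount` aggregate, its event types and the in-process subscriber list are hypothetical. In Akka, the persisting side would be a PersistentActor writing to a journal.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Deposited:
    amount: int

@dataclass(frozen=True)
class Withdrawn:
    amount: int

@dataclass
class PlayerAccount:
    """Hypothetical aggregate whose state is derived only from events."""
    balance: int = 0
    version: int = 0

    def apply(self, event):
        if isinstance(event, Deposited):
            self.balance += event.amount
        elif isinstance(event, Withdrawn):
            self.balance -= event.amount
        self.version += 1

@dataclass
class Primary:
    """Owning node: validates commands, appends events, publishes them to replicas."""
    state: PlayerAccount = field(default_factory=PlayerAccount)
    log: list = field(default_factory=list)
    subscribers: list = field(default_factory=list)

    def deposit(self, amount):
        self._emit(Deposited(amount))

    def withdraw(self, amount):
        if amount > self.state.balance:
            raise ValueError("insufficient funds")  # commands are checked; events are facts
        self._emit(Withdrawn(amount))

    def _emit(self, event):
        self.log.append(event)
        self.state.apply(event)
        for replica in self.subscribers:
            replica.apply(event)  # real-time copy maintained on another node

def rebuild(events):
    """Recover an aggregate on a new node by replaying its event log."""
    state = PlayerAccount()
    for event in events:
        state.apply(event)
    return state
```

Only events that passed validation reach the log and the replicas, so a rejected command leaves every copy untouched.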
By using Akka, we get remoting between nodes almost for free. This means that whichever node handles a request that needs to interact with an aggregate/entity on another node can do so with RemoteActors.
What I have outlined here is very general but gives an approach to distributed fault-tolerance with Akka and ZooKeeper. It may or may not help. I hope it does.
All the best,
Andy