Akka -- Deploy two ActorSystems on the same host

I'm writing this as a follow-up to PlayFramework -- Look up actors in another local ActorSystem, but this time targeting the question specifically at the Akka crowd.
The question is simple: Does it make sense to deploy two ActorSystems on the same host (not just on the same host but even on the same JVM), given that there appears to be no way to simply lookup the other system through system.actorSelection unless you remote to localhost?
In other words, since system1.actorSelection("akka://system2/user/my-actor") does not work, but system1.actorSelection("akka.tcp://system2@127.0.0.1:2552/user/my-actor") does, why even consider deploying two systems?
I suspect you're going to ask about a use case, so here's one for you. Assume I have a complex real-time system using Akka and that this system is deployed as autonomous agents on any number of machines. Ideally, I'd like fine-grained control of the resources I allocate to this system, and I'd like it to be somewhat isolated. Furthermore, assume that I want to write a small control interface (e.g., a REST API) with the specific purpose of providing input to and monitoring the real-time system. Naturally, I would make that control system another ActorSystem which interacts with the first. It makes sense, right? I don't want these actors running in the same ActorSystem as the real-time processing (for isolation, practicality, separate logging, not polluting resource monitoring, and supervision -- it would add one more branch to the hierarchy -- etc.). That control ActorSystem would never be deployed on a separate machine, since it goes hand in hand with the real-time system. Yet the only way for these two systems to communicate is through loopback TCP.
Is what I'm suggesting not the proper/intended way to do things? Am I missing something? Is there a way to do this that I haven't considered? Does my use case even call for using Akka?
Thanks in advance for your input!

Instead of having two separate actor systems, you could have a top-level actor for each of the branches and run each branch on a dedicated dispatcher. Each top-level actor will have its own error kernel as well. Having two actor systems mostly makes sense when they are not related; since yours communicate, I would not separate them.
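A rough sketch of that layout with classic (untyped) Akka; the supervisor classes and the control-dispatcher entry in application.conf are made up for illustration:

```scala
import akka.actor.{Actor, ActorSystem, Props}
import com.typesafe.config.ConfigFactory

// application.conf would define the extra dispatcher, e.g.:
// control-dispatcher {
//   type = Dispatcher
//   executor = "fork-join-executor"
//   fork-join-executor { parallelism-min = 1, parallelism-max = 2 }
// }

class RealTimeSupervisor extends Actor {
  def receive = { case msg => () } // supervise real-time children here
}

class ControlSupervisor extends Actor {
  def receive = { case msg => () } // supervise REST/control children here
}

object SingleSystemLayout extends App {
  val system = ActorSystem("app", ConfigFactory.load())

  // Two top-level branches, each forming its own error kernel.
  val realTime = system.actorOf(Props[RealTimeSupervisor], "real-time")

  // The control branch runs on its own dispatcher, so it cannot
  // starve (or be starved by) the real-time branch.
  val control = system.actorOf(
    Props[ControlSupervisor].withDispatcher("control-dispatcher"),
    "control")

  // A purely local lookup now works, with no remoting involved:
  system.actorSelection("/user/real-time") ! "ping"
}
```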

Related

Multiple actor systems for an application

This article talks about how we should not create 'too' many actor systems. But the docs say:
An ActorSystem is a heavyweight structure that will allocate 1…N Threads, so create one per logical application.
I am unable to understand what is the real issue here with using multiple actor systems in an application. Also, is it possible for actors from different actor system to message each other?
There is no issue with using multiple systems. There is a potential issue with creating too many of them. The reason is that with an ActorSystem comes some non-negligible overhead - mainly because each one would allocate its own fork-join pool.
I recommend you read this blogpost for more info.
Actors from different ActorSystems can message each other, but AFAIK this needs to happen through remoting. This counts as yet another reason why system segregation doesn't really make sense as a local pattern.
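For reference, a minimal sketch of what that remoting wiring looks like with classic Akka remoting; the system names, ports, and actor path are illustrative:

```scala
import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

object TwoSystemsOneJvm extends App {
  // Each system gets remoting enabled and its own loopback port.
  def remoteConfig(port: Int) = ConfigFactory.parseString(s"""
    akka.actor.provider = "akka.remote.RemoteActorRefProvider"
    akka.remote.netty.tcp.hostname = "127.0.0.1"
    akka.remote.netty.tcp.port = $port
  """).withFallback(ConfigFactory.load())

  val system1 = ActorSystem("system1", remoteConfig(2552))
  val system2 = ActorSystem("system2", remoteConfig(2553))

  // Even though both systems live in one JVM, the lookup still
  // travels over loopback TCP:
  val selection = system1.actorSelection(
    "akka.tcp://system2@127.0.0.1:2553/user/my-actor")
}
```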

Azure Service Fabric reliable actors vs reliable services

I am new to Azure Service Fabric and the biggest questions I have are
When should I use reliable actors? Give me practical examples please.
When should I use reliable services? Give me practical examples please.
Taking a look at the differences:
State analogy: Actors work on a single instance of an object graph; services usually have state for multiple callers.
Scope: Actors can't work alone, because of their size (more like objects).
Life-cycle: Actors are only active when used, so more of them will fit on your available server resources.
Concurrency: Actors enforce single-threaded access.
State: Actors just modify the aggregate; services work on sets, so they often use transactions on sets for ACID behavior.
Communication: Actors communicate through channels provided by the platform; services may choose otherwise.
Access: Actors in the cluster can't be reached from the outside by default; you'll probably need a Service that provides access.
Some examples of when to use an actor:
For every user of your mobile app you could have one actor.
For every thermostat that sends information to your application you could have one actor.
For every customer of your e-commerce site, you could have one shopping-basket actor.
Create a service in the cases you are probably used to: a reliable service provides a service to multiple users at once, for example a weather service.
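To make the actor-per-entity idea concrete, here is a minimal sketch of the shopping-basket example written in the actor style with Akka/Scala rather than Reliable Actors; all names and messages are invented:

```scala
import akka.actor.{Actor, ActorSystem, Props}

case class AddItem(sku: String)
case object GetItems

// One actor per customer: the basket state lives inside the actor
// and is only ever touched by one message at a time.
class ShoppingBasket extends Actor {
  private var items = List.empty[String]
  def receive = {
    case AddItem(sku) => items ::= sku
    case GetItems     => sender() ! items
  }
}

object BasketDemo extends App {
  val system = ActorSystem("shop")
  // The actor name carries the entity identity: one basket per customer.
  val basket = system.actorOf(Props[ShoppingBasket], "basket-customer-42")
  basket ! AddItem("book-123")
}
```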
I don't mean to use a word to define itself, but use Reliable Actors only if you've determined your problem fits the actor design pattern. Actors are a design pattern much like many of the Gang of Four design patterns. If your problem fits one of the patterns, use it. If it doesn't, it's best not to try to shoehorn your problem into the wrong pattern.
In Service Fabric, Reliable Actors are an implementation of the Virtual Actor pattern. It has certain rules of operation and caveats that go with them. This is a good doc to read to get an idea of how the Reliable Actor framework works and whether or not it meets your requirements: https://azure.microsoft.com/en-us/documentation/articles/service-fabric-reliable-actors-platform/
Reliable Actors are in fact just a framework built on top of Reliable Services, so all the same scaling, partitioning, and distribution rules apply.

Akka.Net work queues

I have an existing distributed computing framework built on top of MassTransit and RabbitMQ. There is essentially a manager which responds with work based on requests. Each worker will take a certain number of items based on the physical machine's specs. The worker then sends completion messages when done. It works rather well and seems to be highly scalable since the only link is the service bus.
I recently evaluated Akka.Net in order to see if that would be a simpler system to implement the same pattern. After looking at it I was somewhat confused at what exactly it is used for. It seems that if I wanted to do something similar the manager would have to know about each worker ahead of time and directly send it work.
I believe I am missing something because that model doesn't seem to scale well.
Service buses like MassTransit are built as reliable messaging services; ensuring message delivery is the primary concern there.
Actor frameworks also use messages, but this is the only similarity. Messaging is only a means to an end, and it's not as reliable as in the case of service buses. Actor frameworks are more oriented toward building high-performance, easily distributed system topologies, centered around actors as the primary unit of work. Conceptually an actor is close to the Active Record pattern (though this is a great simplification). Actors are also very lightweight: you can have millions of them living in the memory of the executing machine.
When it comes to performance, Akka.NET is able to send over 30 million messages/sec on a single VM (tested on 8 cores) - a lot more than any service bus, but the characteristics also differ significantly.
On the JVM we know that Akka clusters can grow to 2400 machines. Unfortunately we were not able to test what the .NET implementation's limits are.
You have to decide what you really need: a messaging library, an actor framework, or a combination of both.
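On the scaling concern in the question: the manager does not have to know the workers ahead of time. A common idiom in Akka-style frameworks is work pulling, where workers register themselves and ask for work at their own pace. A minimal sketch in Akka/Scala (the same shape applies in Akka.NET); all message and class names are invented:

```scala
import akka.actor.{Actor, ActorRef}

case object RegisterWorker
case object WorkAvailable
case object GiveMeWork
case class Work(payload: String)

// The manager keeps a queue; workers announce themselves and pull
// work, so new machines can join at any time without reconfiguration.
class Manager extends Actor {
  private var workers = Set.empty[ActorRef]
  private var queue   = List.empty[Work]

  def receive = {
    case RegisterWorker =>
      workers += sender()
    case w: Work =>
      queue :+= w
      workers.foreach(_ ! WorkAvailable)
    case GiveMeWork => queue match {
      case head :: tail =>
        queue = tail
        sender() ! head
      case Nil => // nothing to hand out right now
    }
  }
}

class Worker(manager: ActorRef) extends Actor {
  manager ! RegisterWorker
  def receive = {
    case WorkAvailable => manager ! GiveMeWork
    case Work(payload) =>
      // ... process `payload`, then ask for the next item
      manager ! GiveMeWork
  }
}
```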
I agree with @Horusiath's answer. In addition, I'd say that in most cases you can replace a service bus with the messaging system of an actor model like Akka, but they are not in the same class.
Messaging is just one thing that Akka provides, and while it's a great feature, I wouldn't say it's the main one. When analyzing it as an alternative, you must first look at the benefits of the model itself and then check whether the messaging capabilities are good enough for your use case. You can still use a dedicated external service bus to distribute messages across different clusters and keep Akka.NET exchanging messages inside each cluster, for example.
But the point is that if you decide to use Akka.net, you won't be using it only for messaging.

Scala + Akka: How to develop a Multi-Machine Highly Available Cluster

We're developing a server system in Scala + Akka for a game that will serve clients on Android, iPhone, and Second Life. Parts of this server need to be highly available, running on multiple machines. If one of those servers dies (from, say, hardware failure), the system needs to keep running. I think I want the clients to have a list of machines they will try to connect to, similar to how Cassandra works.
The multi-node examples I've seen so far with Akka seem to me to be centered around the idea of scalability, rather than high availability (at least with regard to hardware). The multi-node examples seem to always have a single point of failure. For example there are load balancers, but if I need to reboot one of the machines that have load balancers, my system will suffer some downtime.
Are there any examples that show this type of hardware fault tolerance for Akka? Or, do you have any thoughts on good ways to make this happen?
So far, the best answer I've been able to come up with is to study the Erlang OTP docs, meditate on them, and try to figure out how to put my system together using the building blocks available in Akka.
But if there are resources, examples, or ideas on how to share state between multiple machines in a way that if one of them goes down things keep running, I'd sure appreciate them, because I'm concerned I might be re-inventing the wheel here. Maybe there is a multi-node STM container that automatically keeps the shared state in sync across multiple nodes? Or maybe this is so easy to make that the documentation doesn't bother showing examples of how to do it, or perhaps I haven't been thorough enough in my research and experimentation yet. Any thoughts or ideas will be appreciated.
HA and load management is a very important aspect of scalability and is available as a part of the AkkaSource commercial offering.
If you're listing multiple potential hosts in your clients already, then those can effectively become load balancers.
You could offer a host suggestion service that recommends to the client which machine it should connect to (based on current load, or whatever); the client can then pin to that host until the connection fails.
If the host suggestion service is not there, then the client can simply pick a random host from its internal list, trying them until it connects.
Ideally, on first start-up the client will connect to the host suggestion service and not only get directed to an appropriate host, but also receive a list of other potential hosts. This list can routinely be updated every time the client connects.
If the host suggestion service is down on the client's first attempt (unlikely, but...), then you can pre-deploy a list of hosts in the client install so it can start randomly selecting hosts from the very beginning if it has to.
Make sure that your list of hosts contains actual host names, not IPs; that gives you more flexibility long term (i.e., you'll "always have" host1.example.com, host2.example.com, etc. even if you move infrastructure and change IPs).
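A small sketch of that client-side selection logic in Scala; connect is a stand-in for whatever transport the client actually uses, and the host list is the pre-deployed fallback described above:

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Random, Success, Try}

object HostPicker {
  // Pre-deployed fallback list, refreshed from the host suggestion
  // service whenever a connection succeeds.
  val knownHosts = List(
    "host1.example.com", "host2.example.com", "host3.example.com")

  // Stand-in for opening a real connection to `host`.
  def connect(host: String): Try[String] = Try(host)

  // Try hosts in (shuffled) order until one answers.
  @tailrec
  def connectToAny(hosts: List[String]): Option[String] = hosts match {
    case Nil => None
    case head :: tail => connect(head) match {
      case Success(conn) => Some(conn)
      case Failure(_)    => connectToAny(tail)
    }
  }

  def main(args: Array[String]): Unit =
    println(connectToAny(Random.shuffle(knownHosts)))
}
```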
You could take a look at how RedDwarf and its fork DimDwarf are built. They are both horizontally scalable, crash-only game app servers, and DimDwarf is partly written in Scala (the new messaging functionality). Their approach and architecture should match your needs quite well :)
2 cents..
"how to share state between multiple machines in a way that if one of them goes down things keep running"
Don't share state between machines; instead, partition state across machines. I don't know your domain, so I don't know if this will work, but essentially, if you assign certain aggregates (in DDD terms) to certain nodes, you can keep those aggregates in memory (as an actor, agent, etc.) while they are being used. In order to do this you will need something like ZooKeeper to coordinate which nodes handle which aggregates. In the event of failure, you can bring the aggregate up on a different node.
Furthermore, if you use an event-sourcing model to build your aggregates, it becomes almost trivial to have real-time copies (slaves) of your aggregates on other nodes, with those nodes listening for events and maintaining their own copies.
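A tiny sketch of that event-sourced replication idea in Scala (the event and actor names are invented): since applying events is deterministic, any node that receives the same event stream ends up with the same aggregate state.

```scala
import akka.actor.Actor

sealed trait Event
case class ItemAdded(id: String)   extends Event
case class ItemRemoved(id: String) extends Event

// A slave keeps a live copy of an aggregate simply by listening to
// the master's event stream and applying each event in order.
class AggregateReplica extends Actor {
  private var items = Set.empty[String]

  def receive = {
    case ItemAdded(id)   => items += id
    case ItemRemoved(id) => items -= id
  }
}
```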
By using Akka, we get remoting between nodes almost for free. This means that whichever node handles a request that might need to interact with an Aggregate/Entity on another node can do so with RemoteActors.
What I have outlined here is very general but gives an approach to distributed fault-tolerance with Akka and ZooKeeper. It may or may not help. I hope it does.
All the best,
Andy

Maintaining state between two machines

We have two industrial controllers that are used to control critical systems. The idea is that on failure of one controller, the other controller will automatically take over. To ensure the swap-over is seamless, the standby controller must mirror the state of the online controller at all times.
We have a solution, but it is poorly coded and documented. The question is: is there a common design pattern that implements such a system, or open-source software that achieves something similar, that could be used to create a generic solution, one usable for controllers or PCs and extensible to allow any number of controllers to act as standbys?
One approach is "cache coherence". Commercial products -- Tangosol, for example -- do this.
Another approach is a light-weight version of an Enterprise Service Bus (ESB) or Service Oriented Architecture (SOA). Almost all the SOA vendors have products for this. I'd start with Tibco, which has a lightweight component set that you can use for this.
Since SOA isn't that hard, you can roll your own using the HTTP protocol so one controller can POST status to its shadow controllers.
There is a difference between failover and transparent failover. Do you really have requirements for transparent failover? If so, you're going to end up paying for it (in both cost and complexity).
That being said, take a look at this post on Buddy Replication for an elegant solution to the problem.
There is the standard Master-Slave pattern used by almost all DBMSs that support clustering, distributed architectures, and replication (http://en.wikipedia.org/wiki/Database_replication).
So, very basically, in your situation you could have the master machine maintaining state and the slave sitting there doing nothing except updating its own state from that of the master. If the master goes down, the slave sees that the master is no longer there and takes over control of state; the master is only used again once it has updated its own state from that of the slave (which maintained state while the master was inactive).
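A bare-bones Scala illustration of that master/slave idea; the messages and the way master failure is detected are invented for the sketch (in practice you would use heartbeats or a failure detector):

```scala
import akka.actor.Actor

case class StateUpdate(snapshot: Map[String, String])
case object MasterDown

// The standby does nothing but mirror the master's state until it is
// told the master is gone, then takes over with the last state it saw.
class StandbyController extends Actor {
  private var mirrored = Map.empty[String, String]

  def passive: Receive = {
    case StateUpdate(snapshot) => mirrored = snapshot
    case MasterDown            => context.become(active)
  }

  def active: Receive = {
    case command => // control the system, using `mirrored` as the state
  }

  def receive = passive
}
```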
The traditional approach taken in controlling realtime critical systems is to run the two units in lockstep. Tandem have been building some very impressive fault-tolerant machines using this technique for years.
However, lockstep is very much a hardware-level solution; I don't think you could implement classic lockstep purely at the software level. Or at least, not straightforwardly. Maybe using state machines synchronised by exchange of vector clocks, or something equally propeller-headed?
There is an analogous situation with the space shuttle computers. In that situation, they used 5 computers and if one machine was late or different from the others, it was (in essence) voted off of the island.
In your situation, how do you determine which controller has gone bad? Is the determining machine also considered for single-point failure?
What level of communications are available between the two controllers? Shared memory, Ethernet, or something even slower?
How fast does state information change between the two?
Is it possible to feed identical information to both controllers and would both controllers calculate the same state transitions?
Maybe a shared SQLite database or something similar?