Logical Actor Paths vs Physical Actor Paths - scala

In the Akka documentation (https://doc.akka.io/docs/akka/current/general/addressing.html) the definitions of each are
Logical Actor Paths: The unique path obtained by following the parental supervision links towards the root guardian is called the logical actor path. This path matches exactly the creation ancestry of an actor, so it is completely deterministic as soon as the actor system’s remoting configuration (and with it the address component of the path) is set.
Physical Actor Paths: While the logical actor path describes the functional location within one actor system, configuration-based remote deployment means that an actor may be created on a different network host than its parent, i.e. within a different actor system. In this case, following the actor path from the root guardian up entails traversing the network, which is a costly operation. Therefore, each actor also has a physical path, starting at the root guardian of the actor system where the actual actor object resides. Using this path as sender reference when querying other actors will let them reply directly to this actor, minimizing delays incurred by routing.
My question is: how can an actor and its parent exist in different actor systems? Could someone shed light on how to understand the physical path? My understanding of actor systems, from reading the Akka documentation (https://doc.akka.io/docs/akka/current/general/actor-systems.html), is that each actor system starts with a root actor, followed by its child actors, then its grandchild actors. So every actor's parent by definition resides in the same actor system. Maybe my understanding of the definition of an actor system is off?

First of all, it is important to note that Akka was explicitly designed with Location Transparency in mind. It is meant to run on a cluster of several "nodes" (i.e. different JVM instances, running either on different physical machines or inside different virtual machines) with minimal or even no changes in code. For example, you can configure Akka to create some Actors on remote machines, or you can do the same from code.
The Akka documentation does not split "Actor System" into "logical" and "physical" ones. In the article you reference, the thing called "Actor System" is what one might call a "Physical Actor System", i.e. something running inside a single JVM. With remote deployment, however, an Actor in one Actor System can create a remote Actor in another JVM process, i.e. in a different Actor System. That is when the distinction between "logical path" and "physical path" becomes relevant.
Hope this clarifies the documentation a bit.

What is wrong with that statement? Actors can be distributed, meaning a child can be co-located on the same host as its parent or live on a completely different host. Depending on where the child is, its path looks like one of the following:
"akka://my-sys/user/service-a/worker1" // purely local
"akka.tcp://my-sys@host.example.com:5678/user/service-b" // remote
If you are concerned about remote supervision, it works the same way as local supervision. Have a look at the documentation here:
https://doc.akka.io/docs/akka/2.5.4/scala/remoting.html#watching-remote-actors
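To make the difference between the two path forms quoted above concrete, here is a minimal sketch that splits such a string into its parts. It uses only the Scala standard library; `ActorPathParts` and `PathParser` are hypothetical names for this illustration, not Akka classes, and the regular expression only covers the simple shapes shown in this answer.

```scala
// Toy parser for Akka-style actor path strings (illustration only, not Akka API).
final case class ActorPathParts(
    protocol: String,                // e.g. "akka" or "akka.tcp"
    system: String,                  // actor system name
    hostPort: Option[(String, Int)], // present only when the address is remote
    elements: List[String]           // e.g. List("user", "service-a", "worker1")
) {
  def isLocal: Boolean = hostPort.isEmpty
}

object PathParser {
  // protocol://system[@host:port]/elem1/elem2/...
  private val Pattern = """^([\w.]+)://([^@/]+)(?:@([^:/]+):(\d+))?/(.*)$""".r

  def parse(path: String): Option[ActorPathParts] = path match {
    case Pattern(protocol, system, host, port, rest) =>
      // The host and port groups are null when the optional part is absent.
      val hostPort = Option(host).map(h => (h, port.toInt))
      val elements = rest.split('/').filter(_.nonEmpty).toList
      Some(ActorPathParts(protocol, system, hostPort, elements))
    case _ => None
  }
}
```

Calling `PathParser.parse` on the purely local path yields no host and port, while the remote one carries the address of the system that actually hosts the actor.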

It is important to understand the difference between logical actor path and a physical actor path.
Performance of your actor-based distributed system may depend on that.
Remote deployment means that an actor may be created on a different
network host than its parent, i.e. within a different actor system. In
this case, following the actor path from the root guardian up entails
traversing the network, which is a costly operation. Therefore, each
actor also has a physical path, starting at the root guardian of the
actor system where the actual actor object resides. Using this path as
sender reference when querying other actors will let them reply
directly to this actor, minimizing delays incurred by routing.
https://getakka.net/articles/concepts/addressing.html
Notice that the logical path defines the supervision hierarchy of an actor, while the physical path shows where the actor is deployed. A physical actor path never spans multiple actor systems.
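As a rough illustration of that distinction, the sketch below models actors as plain nodes with a parent link and a host. It is a toy model, not Akka code: `Node`, `Paths`, and the host names are assumptions made up for this example. The logical path comes from following parent links, while the host of the node itself tells you where the actor physically resides.

```scala
// Toy model of supervision ancestry versus physical location (not Akka API).
final case class Node(name: String, host: String, parent: Option[Node])

object Paths {
  // Logical path: follow the parent (supervision) links up to the top,
  // regardless of which host each ancestor lives on.
  def logicalPath(n: Node): String = n.parent match {
    case Some(p) => logicalPath(p) + "/" + n.name
    case None    => "/" + n.name
  }

  // The node itself followed by all of its ancestors.
  def ancestry(n: Node): List[Node] =
    n :: n.parent.map(ancestry).getOrElse(Nil)

  // True when walking the supervision links would leave the node's own host,
  // which is the case for a remote-deployed child.
  def crossesNetwork(n: Node): Boolean =
    ancestry(n).map(_.host).distinct.size > 1
}
```

For example, with `val user = Node("user", "hostA", None)`, `val service = Node("service", "hostA", Some(user))` and `val worker = Node("worker", "hostB", Some(service))`, the worker's logical path is `/user/service/worker` even though it resides on `hostB`.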

Related

Akka cluster-sharding: moving actor shards based on communication patterns

I am building an open-source distributed economic simulation platform using Akka (especially the remote and cluster packages). A key bottleneck in such simulations is the fact that communication patterns between actors evolve over the course of the simulation and often actors will end up sending loads of messages over the wire between nodes in the cluster.
I am looking for a mechanism to detect actors on some node that are communicating heavily with actors on some other node and move them to that other node. Is this possible using existing Akka cluster sharding functionality? Perhaps this is what Roland Kuhn meant by "automatic actor tree partitioning" in his answer to this SO question.
Moving shards around according to your own logic is doable by implementing a custom ShardAllocationStrategy.
You just have to extend ShardAllocationStrategy and implement these two methods:
def allocateShard(
    requester: ActorRef,
    shardId: ShardId,
    currentShardAllocations: Map[ActorRef, immutable.IndexedSeq[ShardId]]): Future[ActorRef]

def rebalance(
    currentShardAllocations: Map[ActorRef, immutable.IndexedSeq[ShardId]],
    rebalanceInProgress: Set[ShardId]): Future[Set[ShardId]]
The first one determines which region will be chosen when allocating a new shard, and provides you the shards already allocated. The second one is called regularly and lets you control which shards to rebalance to another region (for example, if they became too unbalanced).
Both functions return a Future, which means that you can even query another actor to get the information you need (for example, an actor that has the affinity information between your actors).
For the affinity itself, I think you have to implement something yourself. For example, actors could collect statistics about their sender nodes and post that regularly to a cluster singleton that would determine which actors should be moved to the same node.
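The decision logic that such a strategy might delegate to can be sketched without any Akka types. This is only an assumption about how the affinity data could be shaped: regions are plain strings standing in for the region `ActorRef`s, `traffic` is a hypothetical map of message counts between a shard and a region, and the `Future` wrapping required by the two methods above is left out.

```scala
// Affinity-based placement logic in plain Scala (hypothetical helper, not Akka API).
object Affinity {
  type ShardId = String
  type Region  = String // stand-in for the shard region's ActorRef

  // traffic((shard, region)) = messages exchanged between the shard and
  // entities hosted on that region, e.g. as collected by a cluster singleton.
  def preferredRegion(
      shard: ShardId,
      traffic: Map[(ShardId, Region), Long],
      regions: Set[Region],
      fallback: Region): Region = {
    val scored = regions.toList
      .map(r => r -> traffic.getOrElse((shard, r), 0L))
      .filter(_._2 > 0)
    // With no recorded traffic, keep the fallback region.
    if (scored.isEmpty) fallback else scored.maxBy(_._2)._1
  }

  // Shards whose preferred region differs from where they currently live;
  // this is the kind of set a rebalance implementation could return.
  def shardsToMove(
      current: Map[Region, Seq[ShardId]],
      traffic: Map[(ShardId, Region), Long]): Set[ShardId] =
    current.flatMap { case (region, shards) =>
      shards.filter(s => preferredRegion(s, traffic, current.keySet, region) != region)
    }.toSet
}
```

A real strategy would also have to bound how many shards move per rebalance round, since every move stops and restarts the entities in that shard.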

Use akka actors to traverse directory tree

I'm new to the actor model and was trying to write a simple example. I want to traverse a directory tree using Scala and Akka. The program should find all files and perform an arbitrary (but fast) operation on each file.
How can I model recursion using actors?
How do I gracefully stop the actor system when the traversal is finished?
How can I control the number of actors to protect against running out of memory?
Is there a way to keep the mailboxes of the actors from growing too big?
What would be different if the file operation took a long time to execute?
Any help would be appreciated!
Actors are workers. They take work in and give results back, or they supervise other workers. In general, you want your actors to have a single responsibility.
In theory, you could have an actor that processes a directory's contents, working on each file, or spawning an actor for each directory encountered. This would be bad, as long file-processing time would stall the system.
There are several methods for stopping the actor system gracefully. The Akka documentation mentions several of them.
You could have a supervisor actor that queues up requests for actors, spawns actors while the count is below a threshold, and decrements the count when actors finish. This is the job of a supervisor actor. The supervisor actor could sit to one side while it monitors, or it could also dispatch work. Akka has actor models that implement both of these approaches.
Yes, there are several ways to control the size of a mailbox. Read the documentation.
The file operation can block other processing if you do it the wrong way, such as a naive, recursive traversal.
The first thing to note is there are two types of work: traversing the file hierarchy and processing an individual file. As your first implementation try, create two actors, actor A and actor B. Actor A will traverse the file system, and send messages to actor B with the path to files to process. When actor A is done, it sends an "all done" indicator to actor B and terminates. When actor B processes the "all done" indicator, it terminates. This is a basic implementation that you can use to learn how to use the actors.
Everything else is a variation on this. Next variation might be creating two actor B's with a shared mailbox. Shutdown is a little more involved but still straightforward. The next variation is to create a dispatcher actor which farms out work to one or more actor B's. The next variation uses multiple actor A's to traverse the file system, with a supervisor to control how many actors get created.
If you follow this development plan, you will have learned a lot about how to use Akka, and can answer all of your questions.
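The message protocol of that first two-actor variant can be tried out without Akka. The sketch below deliberately swaps actors for a plain thread and a bounded `ArrayBlockingQueue` acting as the mailbox, so it only illustrates the protocol (one message per file, then an "all done" marker) and the effect of a bounded mailbox; `TraversalSketch` and its message types are hypothetical names.

```scala
import java.io.File
import java.util.concurrent.ArrayBlockingQueue

// Plain-thread stand-in for "actor A" (traverser) and "actor B" (processor).
object TraversalSketch {
  sealed trait Msg
  final case class ProcessFile(file: File) extends Msg
  case object AllDone extends Msg

  // "Actor A": walks the tree, sends one message per file, then AllDone.
  def traverse(root: File, mailbox: ArrayBlockingQueue[Msg]): Unit = {
    def walk(f: File): Unit =
      if (f.isDirectory)
        Option(f.listFiles).getOrElse(Array.empty[File]).foreach(walk)
      else if (f.isFile)
        mailbox.put(ProcessFile(f)) // blocks while the mailbox is full
    walk(root)
    mailbox.put(AllDone)
  }

  // "Actor B": handles one message at a time until it sees AllDone.
  def processAll(mailbox: ArrayBlockingQueue[Msg])(op: File => Unit): Int = {
    var count = 0
    var done = false
    while (!done) {
      mailbox.take() match {
        case ProcessFile(f) => op(f); count += 1
        case AllDone        => done = true
      }
    }
    count
  }

  // Runs the traverser on its own thread and the processor on the caller's.
  def run(root: File, capacity: Int = 64)(op: File => Unit): Int = {
    val mailbox  = new ArrayBlockingQueue[Msg](capacity)
    val producer = new Thread(() => traverse(root, mailbox))
    producer.start()
    val processed = processAll(mailbox)(op)
    producer.join()
    processed
  }
}
```

Because `put` blocks when the queue is full, a slow file operation slows the traversal down instead of letting the backlog grow without limit, which is the same concern behind the mailbox-size question above.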

Starting Actors on-demand by identifier in Akka

I'm currently implementing a system that receives inbound messages from an external monitoring system. I'm translating these messages into more concise 'events', and I'm using these to alter the state of 'Managed System' objects. Akka Actors seemed like a good use case for encapsulating mutable state in concurrent applications.
The managed systems are identified by a name (99% of the time this is a hostname). Whenever a proper event is received, the system routes the message to the correct actor based on the name property. At first I used to use actorSelection and the complete paths of said actors, but that was very ugly, and I saw several people advise against relying on the fully qualified name of an actor to deliver message.
So I've set up a simple EventBus, which is great as I can now simply do:
eventBus.subscribe(subscriber1, "/managedSystem01")
eventBus.subscribe(subscriber2, "/managedSystem02")
eventBus.publish(MonitoringEvent("/managedSystem01", MonitoringMessage("managedSystem01", "N", "CPU_LOAD_HIGH", true)))
eventBus.publish(MonitoringEvent("/managedSystem02", MonitoringMessage("managedSystem02", "Y", "DISK_USAGE_HIGH", true)))
Of course, I now have the issue that, should I receive an event that concerns a managed system for which I've not spawned an actor yet (this is entirely possible; unfortunately I cannot get an absolute list of managed systems), the message will be routed to the dead-letter mailbox.
Ideally I don't want this to happen. When it is unable to address a specific actor, I want to spawn a new one dynamically.
I suppose that, theoretically, I could subscribe to DeadLetter messages but:
That sounds a little 'hacky', since those messages are essentially reserved for the system
Is it even possible to recover the original message (in my case, the MonitoringMessage) that is sent to the DeadLetter mailbox?
Alternatively is there a way to check if there are ZERO subscribers to a certain "topic"?
What you describe ("send to Actor by some identifier, if it does not exist buffer until it gets created and then deliver to that newly on-demand created Actor") is implemented in Akka as Cluster Sharding.
While it is designed primarily for sharding load (work) across a cluster, you could use it locally as well, since your requirement is essentially a scaled down (to one node) version of problem that it solves. It takes care of starting new Actors if they don't exist for a given identifier etc, so you'd simply subscribe the shard-region to the events and it'll take care of creating the actors for you.
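The core idea, "look the handler up by identifier and create it on first use", can be sketched in plain Scala. This is a simplified stand-in and not how Cluster Sharding is implemented: `Registry` plays the role of the shard region, `ManagedSystem` the role of the per-system actor, and the field names of `MonitoringMessage` are assumptions, since the question only shows positional arguments.

```scala
import scala.collection.mutable

// Create-on-first-message routing by identifier (illustration only, not Akka API).
object OnDemand {
  // Field names are hypothetical; the question only shows positional values.
  final case class MonitoringMessage(system: String, ack: String, code: String, active: Boolean)

  // Stand-in for the per-system actor: owns the mutable state of one system.
  final class ManagedSystem(val name: String) {
    private var events = Vector.empty[MonitoringMessage]
    def handle(m: MonitoringMessage): Unit = events :+= m
    def received: Vector[MonitoringMessage] = events
  }

  // Stand-in for the shard region: never drops a message for an unknown
  // system, it creates the handler first and then delivers.
  final class Registry {
    private val systems = mutable.Map.empty[String, ManagedSystem]

    def deliver(m: MonitoringMessage): Unit =
      systems.getOrElseUpdate(m.system, new ManagedSystem(m.system)).handle(m)

    def known: Set[String] = systems.keySet.toSet
    def get(name: String): Option[ManagedSystem] = systems.get(name)
  }
}
```

In an actor-based version the registry itself would be an actor, so the map is only ever touched from one message handler at a time; this plain class is not thread-safe.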

In akka 2.x, is root actor supervised by someone else?

Reading the Akka doc (http://doc.akka.io/docs/akka/2.2.3/AkkaScala.pdf), it states in section
2.2.1 Hierarchical Structure
The only prerequisite is to know that each actor has exactly one supervisor,
which is the actor that created it.
But at the top of the hierarchy tree the parent actor has no supervisor ?
It is very well explained in akka doc (see The Top-Level Supervisors section), a little excerpt from it:
The root guardian is the grand-parent of all so-called “top-level”
actors and supervises all the special actors mentioned in Top-Level
Scopes for Actor Paths using the SupervisorStrategy.stoppingStrategy,
whose purpose is to terminate the child upon any type of Exception.
All other throwables will be escalated … but to whom? Since every real
actor has a supervisor, the supervisor of the root guardian cannot be
a real actor. And because this means that it is “outside of the
bubble”, it is called the “bubble-walker”. This is a synthetic
ActorRef which in effect stops its child upon the first sign of
trouble and sets the actor system’s isTerminated status to true as
soon as the root guardian is fully terminated (all children
recursively stopped).

Akka - How many instances of an actor should you create?

I'm new to the Akka framework and I'm building an HTTP server application on top of Netty + Akka.
My idea so far is to create an actor for each type of request. E.g. I would have an actor for a POST to /my-resource and another actor for a GET to /my-resource.
Where I'm confused is how I should go about actor creation? Should I:
Create a new actor for every request (by this I mean for every request should I do a TypedActor.newInstance() of the appropriate actor)? How expensive is it to create a new actor?
Create one instance of each actor on server start up and use that actor instance for every request? I've read that an actor can only process one message at a time, so couldn't this be a bottleneck?
Do something else?
Thanks for any feedback.
Well, you create an Actor for each instance of mutable state that you want to manage.
In your case, that might be just one actor if my-resource is a single object and you want to treat each request serially - that easily ensures that you only return consistent states between modifications.
If (more likely) you manage multiple resources, one actor per resource instance is usually ideal unless you run into many thousands of resources. While you can also run per-request actors, you'll end up with a strange design if you don't think about the state those requests are accessing - e.g. if you just create one Actor per POST request, you'll find yourself worrying how to keep them from concurrently modifying the same resource, which is a clear indication that you've defined your actors wrongly.
I usually have fairly trivial request/reply actors whose main purpose it is to abstract the communication with external systems. Their communication with the "instance" actors is then normally limited to one request/response pair to perform the actual action.
If you are using Akka, you can create an actor per request. Akka is extremely slim on resources and you can create literally millions of actors on a pretty ordinary JVM heap. Also, they only consume CPU, stack, and threads when they actually do something.
A year ago I made a comparison between the resource consumption of the thread-based and event-based standard actors. And Akka is even better than the event-based ones.
One of the big points of Akka in my opinion is that it allows you to design your system as "one actor per usage" where earlier actor systems often forced you to do "use only actors for shared services" due to resource overhead.
I would recommend that you go for option 1.
Options 1) and 2) both have their drawbacks. So let's use option 3): routing (Akka 2.0+).
A Router is an element that acts as a load balancer, routing requests to other Actors which perform the task needed.
Akka provides different Router implementations with different logic to route a message (for example SmallestMailboxPool or RoundRobinPool; before Akka 2.3 these were named SmallestMailboxRouter and RoundRobinRouter).
Every Router may have several children and its task is to supervise their Mailbox to further decide where to route the received message.
//This will create 5 instances of the actor ExampleActor
//managed and supervised by a RoundRobinRouter
ActorRef roundRobinRouter = getContext().actorOf(
Props.create(ExampleActor.class).withRouter(new RoundRobinRouter(5)),"router");
This procedure is well explained in this blog.
It's quite a reasonable option, but whether it's suitable depends on specifics of your request handling.
Yes, of course it could.
For many cases the best thing to do would be to just have one actor responding to every request (or perhaps one actor per type of request), but the only thing this actor does is to forward the task to another actor (or spawn a Future) which will actually do the job.
For scaling up the serial requests handling, add a master actor (Supervisor) which in turn will delegate to the worker actors (Children) (round-robin fashion).
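The round-robin delegation mentioned above boils down to cycling through a fixed set of workers. A minimal sketch of just that selection logic, in plain Scala rather than an Akka router, could look like this; `RoundRobin` is a hypothetical helper and the routees can be any values standing in for worker references.

```scala
import java.util.concurrent.atomic.AtomicLong

// Thread-safe round-robin selection over a fixed set of routees.
final class RoundRobin[A](routees: IndexedSeq[A]) {
  require(routees.nonEmpty, "need at least one routee")

  private val next = new AtomicLong(0)

  // Returns the routees in order, wrapping around after the last one.
  // floorMod keeps the index non-negative even if the counter overflows.
  def pick(): A =
    routees(Math.floorMod(next.getAndIncrement(), routees.size.toLong).toInt)
}
```

With `new RoundRobin(Vector("w1", "w2", "w3"))`, successive `pick()` calls cycle through the three workers in order, so no single worker becomes the serial bottleneck described in the question.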