In Akka 2.x, is the root actor supervised by someone else?

Reading the Akka docs (http://doc.akka.io/docs/akka/2.2.3/AkkaScala.pdf), section 2.2.1, Hierarchical Structure, states:
The only prerequisite is to know that each actor has exactly one supervisor,
which is the actor that created it.
But what about the top of the hierarchy tree? Does the topmost parent actor have no supervisor?

This is explained well in the Akka docs (see the Top-Level Supervisors section). A short excerpt:
The root guardian is the grand-parent of all so-called “top-level”
actors and supervises all the special actors mentioned in Top-Level
Scopes for Actor Paths using the SupervisorStrategy.stoppingStrategy,
whose purpose is to terminate the child upon any type of Exception.
All other throwables will be escalated … but to whom? Since every real
actor has a supervisor, the supervisor of the root guardian cannot be
a real actor. And because this means that it is “outside of the
bubble”, it is called the “bubble-walker”. This is a synthetic
ActorRef which in effect stops its child upon the first sign of
trouble and sets the actor system’s isTerminated status to true as
soon as the root guardian is fully terminated (all children
recursively stopped).
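To make the hierarchy concrete, here is a minimal sketch, assuming the classic actor API of Akka 2.4 or later (where system.terminate() exists). The actor class, the "parent?" message, and the names "my-sys" and "top" are made up for illustration. It asks a top-level actor for its parent's path, which should be the user guardian sitting directly below the root guardian.

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Await
import scala.concurrent.duration._

// Hypothetical actor that reports the path of its own parent (its supervisor).
class WhoIsMyParent extends Actor {
  def receive: Receive = {
    case "parent?" => sender() ! context.parent.path.toString
  }
}

object GuardianDemo {
  def parentPathOfTopLevelActor(): String = {
    val system = ActorSystem("my-sys")
    implicit val timeout: Timeout = Timeout(3.seconds)
    try {
      // system.actorOf creates a "top-level" actor, i.e. a child of the user guardian.
      val top = system.actorOf(Props(classOf[WhoIsMyParent]), "top")
      Await.result((top ? "parent?").mapTo[String], 3.seconds)
    } finally Await.result(system.terminate(), 10.seconds)
  }

  def main(args: Array[String]): Unit =
    println(parentPathOfTopLevelActor()) // akka://my-sys/user
}
```

So even an actor created directly on the system has a real supervisor (/user), whose own parent is the root guardian described in the excerpt.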

Related

Logical Actor Paths vs Physical Actor Paths

In the Akka documentation (https://doc.akka.io/docs/akka/current/general/addressing.html) the definitions of each are
Logical Actor Paths: The unique path obtained by following the parental supervision links towards the root guardian is called the logical actor path. This path matches exactly the creation ancestry of an actor, so it is completely deterministic as soon as the actor system’s remoting configuration (and with it the address component of the path) is set.
Physical Actor Paths: While the logical actor path describes the functional location within one actor system, configuration-based remote deployment means that an actor may be created on a different network host than its parent, i.e. within a different actor system. In this case, following the actor path from the root guardian up entails traversing the network, which is a costly operation. Therefore, each actor also has a physical path, starting at the root guardian of the actor system where the actual actor object resides. Using this path as sender reference when querying other actors will let them reply directly to this actor, minimizing delays incurred by routing.
My question is: how is it possible for an actor and its parent to exist in different actor systems? Would someone please shed light on how to understand the physical path? My understanding of an actor system, based on the Akka documentation (https://doc.akka.io/docs/akka/current/general/actor-systems.html), is that each actor system starts with a root actor, then its child actors, then its grandchild actors. So every actor's parent by definition resides in the same actor system. Maybe my understanding of the definition of an actor system is off?
First of all, it is important to note that Akka was explicitly designed with Location Transparency in mind. It is meant to run on a cluster of several different "nodes" (i.e. different JVM instances, running either on different physical machines or inside different virtual machines) with minimal or even no changes to the code. For example, you can configure Akka to create some actors on remote machines, or you can do the same from code. The Akka documentation does not split "Actor System" into "logical" and "physical" ones. In the article you reference, "Actor System" is actually what one might call a "physical actor system", i.e. something running inside a single JVM. But with remote deployment, an actor in one actor system can create a remote actor in another JVM process, i.e. in a different actor system. This is where the distinction between "logical path" and "physical path" becomes relevant.
Hope this clarifies the documentation a bit.
What is wrong with that statement? Actors can be distributed, meaning a child can be co-located on the same host as its parent or live on a completely different host. Depending on where the child is, its path looks like one of the following:
"akka://my-sys/user/service-a/worker1" // purely local
"akka.tcp://my-sys@host.example.com:5678/user/service-b" // remote
If you are concerned about remote supervision, it works the same way as local supervision. Have a look at the documentation here:
https://doc.akka.io/docs/akka/2.5.4/scala/remoting.html#watching-remote-actors
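As a small illustration of the two path forms, the sketch below parses them with ActorPath.fromString from the classic Akka API and inspects the address part. Note that a remote address has the form system@host:port. The object name PathDemo is made up for this example, and no running actor system is needed to parse a path.

```scala
import akka.actor.ActorPath

object PathDemo {
  // A purely local path: the address carries no host or port.
  val local: ActorPath =
    ActorPath.fromString("akka://my-sys/user/service-a/worker1")

  // A remote path: protocol, system name, host, and port are all in the address.
  val remote: ActorPath =
    ActorPath.fromString("akka.tcp://my-sys@host.example.com:5678/user/service-b")

  def main(args: Array[String]): Unit = {
    println(local.address.hasLocalScope)   // true
    println(remote.address.host)           // Some(host.example.com)
    println(remote.address.port)           // Some(5678)
    println(remote.elements.mkString("/")) // user/service-b
  }
}
```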
It is important to understand the difference between a logical actor path and a physical actor path.
Performance of your actor-based distributed system may depend on that.
Remote deployment means that an actor may be created on a different
network host than its parent, i.e. within a different actor system. In
this case, following the actor path from the root guardian up entails
traversing the network, which is a costly operation. Therefore, each
actor also has a physical path, starting at the root guardian of the
actor system where the actual actor object resides. Using this path as
sender reference when querying other actors will let them reply
directly to this actor, minimizing delays incurred by routing.
https://getakka.net/articles/concepts/addressing.html
Notice that the logical path defines the supervision hierarchy for an actor, while the physical path shows where the actor is deployed. A physical actor path never spans multiple actor systems.

When should an Actor be created in the Actor System vs Actor Context?

In Akka when should I create an Actor using system.actorOf() vs context.actorOf()?
I know context.actorOf() creates a child actor, but when should one actor be a child of another vs top level?
You should avoid creating actors directly under the system (i.e. as top-level actors). It is usually a better strategy to create new actors as children of your own actor (via context) and to group them hierarchically.
That way you get finer-grained control over the life cycle of your actors, which means you can control dynamically how many instances of each type of actor exist at any time.
http://doc.akka.io/docs/akka/2.4/scala/actors.html
http://getakka.net/docs/Actor%20lifecycle
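A minimal sketch of that advice, assuming the classic API of Akka 2.4 or later: only one top-level actor is created with system.actorOf, and it creates its own children with context.actorOf. The Manager and Worker classes, the "children" message, and all actor names are hypothetical.

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Await
import scala.concurrent.duration._

// Hypothetical child actor; it does no real work in this sketch.
class Worker extends Actor {
  def receive: Receive = Actor.emptyBehavior
}

// Hypothetical parent: it owns its workers and therefore supervises them.
class Manager extends Actor {
  // context.actorOf makes each worker a child of this Manager.
  (0 until 2).foreach(i => context.actorOf(Props(classOf[Worker]), s"worker-$i"))

  def receive: Receive = {
    case "children" =>
      sender() ! context.children.map(_.path.toString).toList.sorted
  }
}

object HierarchyDemo {
  def childPaths(): List[String] = {
    val system = ActorSystem("my-sys")
    implicit val timeout: Timeout = Timeout(3.seconds)
    try {
      // The only top-level actor in this sketch.
      val manager = system.actorOf(Props(classOf[Manager]), "manager")
      Await.result((manager ? "children").mapTo[List[String]], 3.seconds)
    } finally Await.result(system.terminate(), 10.seconds)
  }

  def main(args: Array[String]): Unit = childPaths().foreach(println)
}
```

Stopping or restarting the Manager also takes care of its workers, which is the life-cycle control the answer refers to.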

Use akka actors to traverse directory tree

I'm new to the actor model and was trying to write a simple example. I want to traverse a directory tree using Scala and Akka. The program should find all files and perform an arbitrary (but fast) operation on each file.
How can I model recursion using actors?
How do I gracefully stop the actor system when the traversal is finished?
How can I control the number of actors to protect against running out of memory?
Is there a way to keep the actors' mailboxes from growing too big?
What would be different if the file operation took a long time to execute?
Any help would be appreciated!
Actors are workers. They take work in and give results back, or they supervise other workers. In general, you want your actors to have a single responsibility.
In theory, you could have an actor that processes a directory's contents, working on each file, or spawning an actor for each directory encountered. This would be bad, as long file-processing time would stall the system.
There are several methods for stopping the actor system gracefully. The Akka documentation mentions several of them.
You could have a supervisor actor that queues up requests for actors, spawns a new actor while the count is below a threshold, and decrements the count when an actor finishes. This is the job of a supervisor actor. The supervisor could sit to one side and only monitor, or it could also dispatch the work. Akka has actor models that implement both of these approaches.
Yes, there are several ways to control the size of a mailbox. Read the documentation.
The file operation can block other processing if you do it the wrong way, such as a naive, recursive traversal.
The first thing to note is that there are two types of work: traversing the file hierarchy and processing an individual file. As a first implementation, create two actors, actor A and actor B. Actor A traverses the file system and sends messages to actor B with the paths of the files to process. When actor A is done, it sends an "all done" indicator to actor B and terminates. When actor B processes the "all done" indicator, it terminates. This is a basic implementation that you can use to learn how to use actors.
Everything else is a variation on this. Next variation might be creating two actor B's with a shared mailbox. Shutdown is a little more involved but still straightforward. The next variation is to create a dispatcher actor which farms out work to one or more actor B's. The next variation uses multiple actor A's to traverse the file system, with a supervisor to control how many actors get created.
If you follow this development plan, you will have learned a lot about how to use Akka, and can answer all of your questions.
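The basic two-actor version described above might be sketched as follows. This is an illustration under assumptions, not a definitive design: it uses the classic API of Akka 2.4 or later, all class, message, and actor names are made up, the per-file "operation" is just a counter, and a Promise is used so the caller knows when it is safe to shut the system down. The "all done" marker is processed after every file because Akka preserves message order between a given sender and receiver.

```scala
import java.nio.file.{Files, Path, Paths}
import akka.actor.{Actor, ActorRef, ActorSystem, Props}
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._

// Hypothetical messages exchanged between actor A and actor B.
case class ProcessFile(path: Path)
case object AllDone

// Actor B: handles one file per message, reports the total on AllDone, then stops.
class FileProcessor(done: Promise[Int]) extends Actor {
  private var processed = 0
  def receive: Receive = {
    case ProcessFile(_) =>
      processed += 1 // placeholder for the fast per-file operation
    case AllDone =>
      done.success(processed)
      context.stop(self)
  }
}

// Actor A: walks the tree, sends every regular file to actor B, then signals completion.
class Traverser(processor: ActorRef) extends Actor {
  def receive: Receive = {
    case root: Path =>
      val stream = Files.walk(root) // blocking I/O, acceptable for a first sketch
      try {
        val it = stream.iterator()
        while (it.hasNext) {
          val p = it.next()
          if (Files.isRegularFile(p)) processor ! ProcessFile(p)
        }
      } finally stream.close()
      processor ! AllDone
      context.stop(self)
  }
}

object TraverseDemo {
  def countFiles(root: Path): Int = {
    val system = ActorSystem("traverse")
    val done = Promise[Int]()
    try {
      val processor = system.actorOf(Props(classOf[FileProcessor], done), "actor-b")
      val traverser = system.actorOf(Props(classOf[Traverser], processor), "actor-a")
      traverser ! root
      Await.result(done.future, 1.minute)
    } finally Await.result(system.terminate(), 10.seconds) // graceful shutdown
  }

  def main(args: Array[String]): Unit =
    println(countFiles(Paths.get(args.headOption.getOrElse("."))))
}
```

The later variations (several actor B's, a dispatcher, several traversers) change who receives ProcessFile, but the completion signal and shutdown step stay the same in spirit.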

Akka actorSelection vs actorOf Difference

Is there a difference between these two? When I do:
context.actorSelection(actorNameString)
I get an ActorSelection reference, which I can resolve using resolveOne to get back a Future[ActorRef]. But with actorOf, I get an ActorRef immediately. Are there any other vital differences besides this?
What are the use cases in which I would want the ActorRef wrapped in a Future?
actorOf is used to create new actors by supplying their Props objects.
actorSelection is a "pointer" to a path in the actor tree. Calling resolveOne gives you the ActorRef of an already existing actor under that path, but that ActorRef takes time to resolve, hence the Future.
Here's more detailed explanation:
http://doc.akka.io/docs/akka/snapshot/general/addressing.html
An actor reference designates a single actor and the life-cycle of the reference matches that actor’s life-cycle; an actor path represents a name which may or may not be inhabited by an actor and the path itself does not have a life-cycle, it never becomes invalid. You can create an actor path without creating an actor, but you cannot create an actor reference without creating corresponding actor.
In either case, there is a cost associated with producing an ActorRef.
Creating top-level user actors with system.actorOf is comparatively expensive, because it involves the error-kernel initialization, which is itself costly. Creating a child ActorRef from within an actor is cheap, which makes it suitable for a one-actor-per-task design. Still, if an application creates a new set of actors for every request and never cleans them up, it may run out of memory even though Akka actors are cheap. Another advantage of actorOf is that it returns immediately, as you mentioned.
In abstract terms, actorSelection with resolveOne looks up the actor tree and produces an ActorRef in a Future, because the lookup is not immediate, especially on remote systems. In return, it lets you reuse existing actors. The Future abstracts over the time spent waiting for the ActorRef to resolve.
Here is a brief summary of ActorOf vs. ActorSelection; I hope it helps:
https://getakka.net/articles/concepts/addressing.html
Actor references may be looked up using the ActorSystem.ActorSelection
method. The selection can be used for communicating with said actor
and the actor corresponding to the selection is looked up when
delivering each message.
In addition to ActorSystem.actorSelection there is also
ActorContext.ActorSelection, which is available inside any actor as
Context.ActorSelection. This yields an actor selection much like its
twin on ActorSystem, but instead of looking up the path starting from
the root of the actor tree it starts out on the current actor.
Summary: ActorOf vs. ActorSelection
ActorOf only ever creates a new actor, and it creates it as a direct
child of the context on which this method is invoked (which may be any
actor or actor system). ActorSelection only ever looks up existing
actors when messages are delivered, i.e. does not create actors, or
verify existence of actors when the selection is created.
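The difference can be seen in a short sketch, assuming the classic Scala API of Akka 2.4 or later (the Akka.NET quote above uses capitalized method names for the same concepts). The Echo actor and the paths "/user/echo" and "/user/nobody" are hypothetical.

```scala
import akka.actor.{Actor, ActorRef, ActorSystem, Props}
import akka.util.Timeout
import scala.concurrent.Await
import scala.concurrent.duration._
import scala.util.Try

// Hypothetical actor that sends every message back to its sender.
class Echo extends Actor {
  def receive: Receive = { case msg => sender() ! msg }
}

object SelectionDemo {
  // Returns (selection resolved to the created actor, lookup of an empty path failed).
  def run(): (Boolean, Boolean) = {
    val system = ActorSystem("my-sys")
    implicit val timeout: Timeout = Timeout(3.seconds)
    try {
      // actorOf creates a new actor and returns its ActorRef immediately.
      val created: ActorRef = system.actorOf(Props(classOf[Echo]), "echo")

      // actorSelection only describes a path; resolveOne asks whoever lives there
      // to identify itself, which is why the result arrives in a Future.
      val resolved: ActorRef =
        Await.result(system.actorSelection("/user/echo").resolveOne(), 3.seconds)

      // Building a selection for an uninhabited path succeeds; only resolving it fails.
      val missing =
        Try(Await.result(system.actorSelection("/user/nobody").resolveOne(), 3.seconds))

      (resolved == created, missing.isFailure)
    } finally Await.result(system.terminate(), 10.seconds)
  }

  def main(args: Array[String]): Unit = println(run()) // (true,true)
}
```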

Uniqueness of persistenceId in akka-persistence

I'm using the scala api for akka-persistence to persist a group of actor instances that are organized into a tree. Each node in the tree is a persistent actor and is named based on the path to that node from a 'root' node. The persistenceId is set to the name. For example the root node actor has persistenceId 'root'. The next node down has persistenceId 'root-europe'. Another actor might have persistenceId 'root-europe-italy'.
The state in each actor includes a list of the names of its children. E.g. the 'root' actor maintains a list of 'europe', 'asia' etc as part of its state.
I have implemented snapshotting for this system. When the root is triggered to snapshot, it does so and then tells each child to do the same.
The problem arises during snapshot recovery. When I re-create an actor with persistenceId = 'root' (by passing in the name as a constructor parameter), the SnapshotOffer event received by that actor is wrong. It is, for example, 'root-europe-italy....'. This seems like a contradiction of the contract for persistence, where the persistenceId identifies the actor state to be recovered. I got around this problem by reversing the persistenceId of node actors (e.g. 'italy-europe-root') so this seems to be something related to the way files are retrieved by the persistence module. Note that I tried other approaches first, for example I used a variety of separators between the node names, or no separator at all.
Has anyone else experienced this problem, or can an akka-persistence developer help me understand why this might have happened?
BTW: I am using the built-in file-based snapshot storage for now.
Thanks.
OK - so the issue was with Akka, and has now been resolved. See the related ticket to find out when the patch is released.