Akka actor Kill/restart behavior - scala

I am confused by behavior I am seeing in Akka. Briefly, I have a set of actors performing scientific calculations (star formation simulation). They have some state. When an error occurs such that one or more enter an invalid state, I want to restart the whole set to start over. I also want to do this if a single calc (over the entire set) takes too long (there is no way to predict in advance how long it may run).
So, there is the set of Simulation actors at the bottom of the tree, then a Director above them (that creates them via a Router, and sends them messages via that Router as well). There is one more Director level above that to create Directors on different machines and collect results from them all.
I handle the timeout case by using the Akka Scheduler to create a one-time timeout event, in the local Director, when the simulation is started. When the Director gets this event, if all its Simulation actors have not finished, it does this:
    children ! Broadcast(Kill)
where children is the Router that owns/created them - this sends a Kill to all the children (SimulActors).
What I thought would occur is that all the child actors would be restarted. However, their preRestart() hook method is never called. I see the Kill message received, but that's it.
I must be missing something fundamental here. I have read the Akka docs on this topic and I have to say I find them less than clear (especially the page on Supervisors). I would really appreciate either a thorough explanation of the Kill/restart process, or just some other references (Google wasn't very helpful).

Note
If the child of a router terminates, the router will not automatically
spawn a new child. In the event that all children of a router have
terminated the router will terminate itself.
Taken from the akka docs.

I would consider using a supervision strategy. Akka has built-in behavior for applying one decision to all children of a supervisor (the all-for-one strategy), and you choose the directive to apply, e.g. restart.
I think a more idiomatic way to do this would be to have the actors throw an exception if they are not done after a period of time, and have the supervisor handle it via its supervision strategy.
You could throw a "not done" exception from the child and then define the behaviour like so:
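    import akka.actor.AllForOneStrategy
    import akka.actor.SupervisorStrategy.{Restart, Stop}
    import scala.concurrent.duration._

    // maxNrOfRetries = 0 would forbid restarts entirely, so allow a few.
    override val supervisorStrategy =
      AllForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1.minute) {
        case _: NotDoneException => Stop    // NotDoneException: your own exception type
        case _: Exception        => Restart // applied to all children, not only the failed one
      }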
It's important to understand that a restart means stopping the old actor instance and creating a new, separate instance of the actor class; the ActorRef stays the same.
References:
http://doc.akka.io/docs/akka/snapshot/scala/fault-tolerance.html
http://doc.akka.io/docs/akka/snapshot/general/supervision.html
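One detail that may explain the missing preRestart() call: Kill makes the target actor throw an ActorKilledException, and Akka's default supervisor decider stops an actor on that exception instead of restarting it. The following is a minimal sketch, assuming classic Akka 2.4 or later and a pool router as in the question. Worker, Director, the "timeout" message and the latch are hypothetical names used only for illustration.

```scala
import akka.actor.{Actor, ActorKilledException, ActorSystem, Kill, OneForOneStrategy, Props}
import akka.actor.SupervisorStrategy.Restart
import akka.routing.{Broadcast, RoundRobinPool}
import java.util.concurrent.{CountDownLatch, TimeUnit}

object KillRestartSketch {
  val poolSize = 3
  // Counted down once per restarted worker so the caller can observe the restarts.
  val restarted = new CountDownLatch(poolSize)

  class Worker extends Actor {
    override def preRestart(reason: Throwable, message: Option[Any]): Unit = {
      restarted.countDown()
      super.preRestart(reason, message)
    }
    def receive: Receive = { case _ => () } // simulation work would go here
  }

  class Director extends Actor {
    // The pool router supervises its routees, so the strategy is set on the pool.
    private val strategy = OneForOneStrategy() {
      case _: ActorKilledException => Restart // the default decider would Stop here
      case _: Exception            => Restart
    }
    private val children = context.actorOf(
      RoundRobinPool(poolSize, supervisorStrategy = strategy).props(Props(new Worker)),
      "workers")

    def receive: Receive = {
      case "timeout" => children ! Broadcast(Kill)
    }
  }

  /** Returns true if every worker went through preRestart within five seconds. */
  def run(): Boolean = {
    val system = ActorSystem("kill-restart-sketch")
    try {
      system.actorOf(Props(new Director), "director") ! "timeout"
      restarted.await(5, TimeUnit.SECONDS)
    } finally system.terminate()
  }
}
```

With the ActorKilledException case removed from the decider, the same run should leave the latch untouched, which matches the behaviour described in the question.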

Related

Use akka actors to traverse directory tree

I'm new to the actor model and was trying to write a simple example. I want to traverse a directory tree using Scala and Akka. The program should find all files and perform an arbitrary (but fast) operation on each file.
1. How can I model recursion using actors?
2. How do I gracefully stop the actor system when the traversal is finished?
3. How can I control the number of actors to protect against running out of memory?
4. Is there a way to keep the mailboxes of the actors from growing too big?
5. What will be different if the file operation takes a long time to execute?
Any help would be appreciated!
1. Actors are workers. They take work in and give results back, or they supervise other workers. In general, you want each actor to have a single responsibility. In theory, you could have an actor that processes a directory's contents, working on each file, or spawning an actor for each directory encountered. This would be bad, as a long file-processing time would stall the system.
2. There are several methods for stopping the actor system gracefully. The Akka documentation mentions several of them.
3. You could have a supervisor actor that queues up requests for actors, spawns a new actor while the count is below a threshold, and decrements the count when an actor finishes. The supervisor could sit to one side while it monitors, or it could also dispatch work. Akka has actor models that implement both of these approaches.
4. Yes, there are several ways to control the size of a mailbox. Read the documentation.
5. The file operation can block other processing if you do it the wrong way, such as a naive, recursive traversal.
The first thing to note is that there are two types of work: traversing the file hierarchy and processing an individual file. As a first implementation, create two actors, actor A and actor B. Actor A will traverse the file system and send messages to actor B with the paths of the files to process. When actor A is done, it sends an "all done" indicator to actor B and terminates. When actor B processes the "all done" indicator, it terminates. This is a basic implementation that you can use to learn how to use the actors.
Everything else is a variation on this. Next variation might be creating two actor B's with a shared mailbox. Shutdown is a little more involved but still straightforward. The next variation is to create a dispatcher actor which farms out work to one or more actor B's. The next variation uses multiple actor A's to traverse the file system, with a supervisor to control how many actors get created.
If you follow this development plan, you will have learned a lot about how to use Akka, and can answer all of your questions.
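The two-actor starting point described above could look roughly like this. It is a sketch assuming classic Akka 2.4 or later; Walker (actor A), FileProcessor (actor B), the message types and the counter that stands in for the real per-file operation are all hypothetical names.

```scala
import akka.actor.{Actor, ActorRef, ActorSystem, Props}
import java.io.File
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.Await
import scala.concurrent.duration._

object TraversalSketch {
  final case class Traverse(root: File)
  final case class ProcessFile(file: File)
  case object AllDone

  // Actor B: handles one file per message and shuts the system down on AllDone.
  class FileProcessor(processed: AtomicInteger) extends Actor {
    def receive: Receive = {
      case ProcessFile(_) => processed.incrementAndGet() // placeholder for the real work
      case AllDone        => context.system.terminate()
    }
  }

  // Actor A: walks the tree, sends every file to B, then signals completion.
  class Walker(processor: ActorRef) extends Actor {
    def receive: Receive = {
      case Traverse(root) =>
        walk(root)
        processor ! AllDone
        context.stop(self)
    }

    private def walk(f: File): Unit =
      if (f.isDirectory) Option(f.listFiles).foreach(_.foreach(walk)) // listFiles may return null
      else processor ! ProcessFile(f)
  }

  /** Traverses `root` and returns the number of files handed to actor B. */
  def countFiles(root: File): Int = {
    val processed = new AtomicInteger(0)
    val system = ActorSystem("traversal-sketch")
    val processor = system.actorOf(Props(new FileProcessor(processed)), "processor")
    system.actorOf(Props(new Walker(processor)), "walker") ! Traverse(root)
    Await.result(system.whenTerminated, 30.seconds)
    processed.get
  }
}
```

The shutdown relies on Akka's ordering guarantee for a single sender and receiver pair: AllDone is sent last by the walker, so actor B sees it only after every ProcessFile message from that walker.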

Num of actor instance

I'm new to akka-actor and confused with some problems:
1. When I create an ActorSystem and use actorOf(Props(classOf[AX], ...)) to create an actor in the main method, how many instances of my actor AX are there?
2. If the answer to Q1 is just one, does this mean that whatever data structure I create in the AX actor class's definition will only be touched by one thread, so I need not be concerned about concurrency problems?
3. What if one of my actor's actions (one case in the receive method) is a time-consuming task that takes quite a long time to finish? Will my single actor instance stop responding until it finishes that task?
4. If the answer to Q3 is yes, what am I supposed to do to keep my actor responsive? Should I start another thread and send a message back to the actor when the task finishes? Is there a best practice I should follow?
1. Yes, the actor system creates only one actor instance each time you call the 'actorOf' method. However, with a Router you can create one router that spreads the load over any number of actors, so in that case multiple instances are constructed. 'Normally', though, actorOf creates just one instance.
2. Yes, within an actor you do not have to worry about concurrency, because Akka guarantees that any actor processes only one message at a time. You must take care not to mutate the state of the actor from code outside the actor, so whenever you expose actor state, always do so using an immutable class. Case classes are excellent for this. Also beware of modifying the actor state when completing a Future from inside the actor: since the Future runs on its own thread, you could have a concurrency issue when the Future completes while the actor is processing the next message.
   The actor executes on one thread at a time, but this might be a different thread each time the actor executes.
3. Akka is a highly concurrent and distributed framework: everything is asynchronous and non-blocking, and your application code should be too. Scala and Akka provide several ways to do this. Whenever you have a time-consuming task within an actor, you can delegate it to another actor dedicated to that purpose, use Futures, or use Scala's 'async/await/blocking'. Wrapping code in 'blocking' hints to the runtime that a blocking action is taking place, so it may start additional threads to prevent thread starvation. The Scala Concurrent programming book is an excellent guide to learn this stuff. Also look at the concurrent package ScalaDocs and Neophyte's Guide to Scala.
   If the actor really has to wait for the time-consuming task to complete, then yes, your actor can only respond when that's finished. But this is a very 'request-response' way of thinking. Try to get away from this. The actor could also respond immediately, indicating that the task has started, and send an additional message once the task has been completed.
   With time-consuming tasks, always be sure to use a different thread pool so the ActorSystem will not be blocked because all of its available threads are used up by time-consuming tasks. For Futures you can provide a separate ExecutionContext (do not use the ActorSystem's dispatcher for this!), and via Akka's configuration you can also configure certain actors to run on a different thread pool.
4. See 3.
Success!
1. One instance (if you declare a router in your props, then possibly more than one).
2. Yes. This is one of the advantages of actors.
3. Yes. An Actor will process messages sequentially.
4. You can use scala.concurrent.Future (do not use actor state in the future) or delegate the work to a child actor (the main actor can manage the state and can respond to messages). Future or child actor depends on the use case.
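The advice to run time-consuming Futures on their own thread pool can be sketched with the Scala standard library alone. BlockingPoolSketch, slowSquare and the pool size of 4 are assumptions made for illustration, not anything prescribed by Akka.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}

object BlockingPoolSketch {
  // Dedicated pool for slow work, kept separate from the threads that run the actors.
  private val pool = Executors.newFixedThreadPool(4)
  implicit val blockingEc: ExecutionContext = ExecutionContext.fromExecutorService(pool)

  // Stand-in for a time-consuming task: it takes only immutable input and
  // returns a value instead of touching any actor state.
  def slowSquare(n: Int): Future[Int] = Future {
    Thread.sleep(100)
    n * n
  }

  def shutdown(): Unit = pool.shutdown()
}
```

Inside an actor, one common way to consume such a result without touching state from the Future's thread is to send it back as a message, for example with pipeTo from akka.pattern.pipe, so that it is processed like any other message.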

Stateless Akka actors

I am just starting with Akka and I am trying to split some messy functions in more manageable pieces, each of which would be carried out by a different actor.
My task seems exactly what is suitable for the actor model. I have a tree of objects, which are saved in a database. Each node has some attributes; let's concentrate on just one and call it wealth. The wealth of the children depends on the wealth of the parent. When one computes the wealth on a node, this should trigger a similar computation on the children. I want to collect all the updated instances of nodes and save them at the same time in the DB.
This seems easy: an actor just performs the computation and then starts another actor for each child of the current node. When all children send back a message with the results of the computation, the actor collects them, adds the current node and sends a message to the actor above him.
The problem is that I do not know a way to be sure that one has received a message from all the children. A trivial way would be to use a counter and increase it whenever a message from a child arrives.
But what happens if two independent parts of the system require such a computation to be performed and the same actor is reused? The actor will spawn twice as many children and the counting will not be reliable anymore. What I would need is to make sure that the same actor is not reused from the external system, but new actors are generated each time the computation is triggered.
Is this even possible? If not, is there some automatic mechanism in Akka to make sure that every child has performed its task?
Am I just using the actor model in a situation that is not suitable here? Essentially I am doing nothing more than could be done with functions - the actors themselves are stateless, but they allow me to simplify and parallelize the computation.
From the way you describe things, I think what you really want is a recursive function that returns a Future[Tree[NodeWealth]]. The function would spawn a Future every time it is called and call itself for each child in the hierarchy; at the end, it would compose the Futures from those calls into a single result Future. When the recursive function returns the Future[Tree[NodeWealth]], you can compose it with another Future that updates your DB. Take a look at the Akka documentation on Futures, and pay particular attention to the section "Composing Futures".
The composition of futures should allow you to avoid state and implement this easily.
Another option would be to use actors and futures, and use the ask pattern on child actors, compose the resulting futures into a single future which you return to the sender (parent actor). This is essentially the same thing, just intertwined with actors. I would probably only go this route if you are already representing your nodes as actors for some other reason.
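The recursive composition described in this answer can be sketched with plain scala.concurrent Futures. The Node and NodeWealth types and the rule "wealth = parent wealth times a per-node factor" are invented for the example, and the result is a flat list instead of a tree because the stated goal is to save all updated nodes in one go.

```scala
import scala.concurrent.{ExecutionContext, Future}

object WealthSketch {
  // Hypothetical model: each node carries a factor applied to its parent's wealth.
  final case class Node(id: String, factor: Double, children: List[Node] = Nil)
  final case class NodeWealth(id: String, wealth: Double)

  // Spawns a Future for this node, then recurses into the children.
  // Future.sequence completes only when every child subtree has completed,
  // so no counter of received replies is needed.
  def compute(node: Node, parentWealth: Double)(
      implicit ec: ExecutionContext): Future[List[NodeWealth]] =
    Future(NodeWealth(node.id, parentWealth * node.factor)).flatMap { own =>
      Future
        .sequence(node.children.map(child => compute(child, own.wealth)))
        .map(subtrees => own :: subtrees.flatten)
    }
}
```

Each call works only on its own arguments, so two independent computations over the same tree cannot interfere with each other. The returned Future can then be mapped or flatMapped into the single database write.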

Akka - How many instances of an actor should you create?

I'm new to the Akka framework and I'm building an HTTP server application on top of Netty + Akka.
My idea so far is to create an actor for each type of request. E.g. I would have an actor for a POST to /my-resource and another actor for a GET to /my-resource.
Where I'm confused is how I should go about actor creation? Should I:
1. Create a new actor for every request (by this I mean: for every request, should I do a TypedActor.newInstance() of the appropriate actor)? How expensive is it to create a new actor?
2. Create one instance of each actor on server start-up and use that actor instance for every request? I've read that an actor can only process one message at a time, so couldn't this be a bottleneck?
3. Do something else?
Thanks for any feedback.
Well, you create an Actor for each instance of mutable state that you want to manage.
In your case, that might be just one actor if my-resource is a single object and you want to treat each request serially - that easily ensures that you only return consistent states between modifications.
If (more likely) you manage multiple resources, one actor per resource instance is usually ideal unless you run into many thousands of resources. While you can also run per-request actors, you'll end up with a strange design if you don't think about the state those requests are accessing - e.g. if you just create one Actor per POST request, you'll find yourself worrying how to keep them from concurrently modifying the same resource, which is a clear indication that you've defined your actors wrongly.
I usually have fairly trivial request/reply actors whose main purpose it is to abstract the communication with external systems. Their communication with the "instance" actors is then normally limited to one request/response pair to perform the actual action.
If you are using Akka, you can create an actor per request. Akka is extremely slim on resources and you can create literally millions of actors on a pretty ordinary JVM heap. Also, they will only consume CPU/stack/threads when they actually do something.
A year ago I made a comparison between the resource consumption of the thread-based and event-based standard actors. And Akka is even better than the event-based ones.
One of the big points of Akka in my opinion is that it allows you to design your system as "one actor per usage" where earlier actor systems often forced you to do "use only actors for shared services" due to resource overhead.
I would recommend that you go for option 1.
Options 1) and 2) both have their drawbacks, so let's use option 3): routing (Akka 2.0+).
A Router is an element that acts as a load balancer, routing requests to other actors that perform the actual task.
Akka provides different Router implementations with different routing logic (for example SmallestMailboxPool or RoundRobinPool).
Every Router may have several children; its task is to decide which child each incoming message is routed to, in some implementations by inspecting the children's mailboxes.
    // This will create 5 instances of the actor ExampleActor,
    // managed and supervised by a round-robin pool router.
    // (RoundRobinPool replaced the older RoundRobinRouter in Akka 2.3.)
    ActorRef roundRobinRouter = getContext().actorOf(
        new RoundRobinPool(5).props(Props.create(ExampleActor.class)), "router");
This procedure is well explained in this blog.
1. It's quite a reasonable option, but whether it's suitable depends on the specifics of your request handling.
2. Yes, of course it could.
3. For many cases the best thing to do would be to have just one actor responding to every request (or perhaps one actor per type of request), where the only thing this actor does is forward the task to another actor (or spawn a Future) which will actually do the job.
For scaling up the serial requests handling, add a master actor (Supervisor) which in turn will delegate to the worker actors (Children) (round-robin fashion).

An Actor "queue"?

In Java, to write a library that makes requests to a server, I usually implement some sort of dispatcher (not unlike the one found here in the Twitter4J library: http://github.com/yusuke/twitter4j/blob/master/twitter4j-core/src/main/java/twitter4j/internal/async/DispatcherImpl.java) to limit the number of connections, to perform asynchronous tasks, etc.
The idea is that N number of threads are created. A "Task" is queued and all threads are notified, and one of the threads, when it's ready, will pop an item from the queue, do the work, and then return to a waiting state. If all the threads are busy working on a Task, then the Task is just queued, and the next available thread will take it.
This keeps the max number of connections to N, and allows at most N Tasks to be operating at the same time.
I'm wondering what kind of system I can create with Actors that will accomplish the same thing? Is there a way to have N number of Actors, and when a new message is ready, pass it off to an Actor to handle it - and if all Actors are busy, just queue the message?
The Akka framework is designed to solve this kind of problem, and is exactly what you're looking for.
Look through the documentation: there are lots of highly configurable dispatchers (event-based, thread-based, load-balanced, work-stealing, etc.) that manage actors' mailboxes and allow them to work in conjunction. You may also find this blog post interesting.
E.g. this code instantiates new Work Stealing Dispatcher based on the fixed thread pool, that fulfils load balancing among the actors it supervises:
    val workStealingDispatcher = Dispatchers.newExecutorBasedEventDrivenWorkStealingDispatcher("pooled-dispatcher")
    workStealingDispatcher
      .withNewThreadPoolWithLinkedBlockingQueueWithUnboundedCapacity
      .setCorePoolSize(16)
      .buildThreadPool
Actor that uses the dispatcher:
    class MyActor extends Actor {
      messageDispatcher = workStealingDispatcher
      def receive = {
        case _ =>
      }
    }
Now, if you start 2+ instances of the actor, the dispatcher will balance the load between the mailboxes (queues) of the actors: an actor that has too many messages in its mailbox will "donate" some to actors that have nothing to do.
Well, you have to look at how actors are scheduled, as actors are not usually 1-to-1 with threads. The idea behind actors is that you may have many of them, but the actual number of threads will be limited to something reasonable. They are not supposed to be long-running either, but rather to respond quickly to the messages they receive. In short, the architecture of that code seems to be wholly at odds with how one would design an actor system.
Still, each working actor may send a message to a Queue actor asking for the next task, and then loop back to react. This Queue actor would receive either queueing messages, or dequeuing messages. It could be designed like this:
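    import scala.actors.Actor._
    import scala.collection.mutable.Queue
    // Body of the Queue actor (scala.actors API); Enqueue and Dequeue are
    // message case classes you define yourself.
    val q: Queue[AnyRef] = new Queue[AnyRef]
    loop {
      react {
        case Enqueue(d) => q enqueue d
        // A Dequeue that arrives while the queue is empty stays in the mailbox
        // until a later Enqueue makes this case match.
        case Dequeue(a) if q.nonEmpty => a ! q.dequeue()
      }
    }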
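For comparison, the behaviour the question asks for (at most N tasks running at once, the rest waiting in a queue) can also be sketched without actors, using a fixed-size thread pool behind scala.concurrent Futures. This is only an illustration built on the standard library; BoundedWorkersSketch, runAll and the sleep that stands in for a server request are hypothetical.

```scala
import java.util.concurrent.Executors
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object BoundedWorkersSketch {
  /** Runs `taskCount` tasks on a pool of `n` threads and returns the highest
    * number of tasks that were observed running at the same time. */
  def runAll(n: Int, taskCount: Int): Int = {
    val pool = Executors.newFixedThreadPool(n) // excess tasks wait in the pool's queue
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    val running = new AtomicInteger(0)
    val peak = new AtomicInteger(0)

    val tasks = (1 to taskCount).map { _ =>
      Future {
        val now = running.incrementAndGet()
        peak.synchronized { if (now > peak.get) peak.set(now) }
        Thread.sleep(20) // stand-in for a request to the server
        running.decrementAndGet()
      }
    }

    try {
      Await.result(Future.sequence(tasks), 30.seconds)
      peak.get
    } finally pool.shutdown()
  }
}
```

Calling runAll(3, 12) should never report a peak above 3, because the pool has only three threads and the remaining tasks stay queued until a thread becomes free.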