Simple explanation of Akka Actors - scala

Is the following statement correct, otherwise how can it be improved?
When an Akka actor is sent a message, a job is submitted to the executor.
When there is a free thread, it calls the job which obtains a lock on
the actor (assuming it can, otherwise another job is taken). The
receive method of the actor is then called and once it completes,
the job is discarded and the thread returned to the pool. The cycle is then repeated.
All the complicated stuff related to concurrent threads is
handled by Akka, freeing the programmer to concentrate on
solving business problems.

More accurate would be:
When a message is sent to an actor, it is placed in this actor's queue called mailbox. At the same time there can be hundreds or thousands of actors having pending messages in their mailboxes. Akka, using limited number of worker threads, selects a subset of such actors and calls their receive method with each and every message from mailbox in chronological order.
More than one thread never handles the same actor. Also Akka may decide to interrupt processing of messages from a mailbox and select different actor to remain fair and avoid starvation. Because one thread is needed for each receive invocation, this method should never block, wait or sleep.

Related

Who monitors mailbox of each Actors?

I know actors are persistent, They are always alive sitting idle as long as there is nothing to do(no messages in the mailbox), So who monitors the mailbox ?
Is it the Actor itself ? but it doesn't have a thread assigned from dispatcher until its having a message.?
Or, is there any background daemon running monitoring mailboxes of each actors?
As #Sarvesh Kumar Singh mentioned, actor is not by default persistent. You can create persistent actor by extending PersistentActor: Akka Persistence.
In fact, when a actor(cell) is created in the actor system hierarchy, its mailbox(Mailbox is a subclass of Runnable) will be attached to the dispatcher, when the actor is started, the dispatcher will then execute the mailbox runnable(defined in the run method) by polling message from the mailbox queue, if there's a message, it will trigger the defined actor receive method to actually handle the message. Message looping(during this run, the messages in the queue will the be handled into batch, if handling this batch of batch doesn't pass the limit of dispatcher throughput configuration) is also done in the run method by a recursive call of processMailBox defined in mailbox. At the end of each round, the mail box will attach itself to the dispatcher thread pool for execution if there's messages the actor mail box is scheduled for execution the the loop continues. If not, the loop is broken, the next run will be scheduled when there's message sent to the actor, at this moment, message will be dispatched via the dispatcher attached with the actor cell when you call actorRef ! message, this will make the dispatcher to schedule the message handling, check this out: Dispatch::sendMessage and Dispatcher::dispatch. As explained in Dispatchers, dispatcher's throughput can be configured to determine the maximum number of messages to be processed per actor before the thread jumps to the next actor.

Num of actor instance

I'm new to akka-actor and confused with some problems:
when I create an actorSystem, and use actorOf(Props(classOf[AX], ...)) to create actor in main method, how many instances are there for my actor AX?
If the answer to Q1 was just one, does this mean whatever data-structure I created in the AX actor class's definition will only appear in one thread and I should not concern about concurrency problems?
What if one of my actor's action (one case in receive method) is a time consuming task and would take quite long time to finish? Will my single Actor instance not responding until it finish that task?
If the answer to Q3 is right, what I am supposed to do to prevent my actor from not responding? Should I start another thread and send another message back to it until finish the task? Is there a best practice there I should follow?
yes, the actor system will only create 1 actor instance for each time you call the 'actorOf' method. However, when using a Router it is possible to create 1 router which spreads the load to any number of actors. So in that case it is possible to construct multiple instances, but 'normally' using actorOf just creates 1 instance.
Yes, within an actor you do not have to worry about concurrency because Akka guarantees that any actor only processes 1 message at the time. You must take care not to somehow mutate the state of the actor from code outside the actor. So whenever exposing the actor state, always do this using an immutable class. Case classes are excellent for this. But also be ware of modifying the actor state when completing a Future from inside the actor. Since the Future runs on it's own thread you could have a concurrency issue when the Future completes and the actor is processing a next message at the same time.
The actor executes on 1 thread at the time, but this might be a different thread each time the actor executes.
Akka is a highly concurrent and distributed framework, everything is asynchronous and non-blocking and you must do the same within your application. Scala and Akka provide several solutions to do this. Whenever you have a time consuming task within an actor you might either delegate the time consuming task to another actor just for this purpose, use Futures or use Scala's 'async/await/blocking'. When using 'blocking' you give a hint to the compiler/runtime a blocking action is done and the runtime might start additional thread to prevent thread starvation. The Scala Concurrent programming book is an excellent guide to learn this stuff. Also look at the concurrent package ScalaDocs and Neophyte's Guide to Scala.
If the actor really has to wait for the time consuming task to complete, then yes, your actor can only respond when that's finished. But this is a very 'request-response' way of thinking. Try to get away from this. The actor could also respond immediately indicating the task has started and send an additional message once the task has been completed.
With time consuming tasks always be sure to use a different threadpool so the ActorSystem will not be blocked because all of it's available threads are used up by time consuming tasks. For Future's you can provide a separate ExecutionContext (do not use the ActorSystem's Dispatch context for this!), but via Akka's configuration you can also configure certain actors to run on a different thread pool.
See 3.
Success!
one instance (if you declare a router in your props then (maybe) more than one)
Yes. This is one of the advantages of actors.
Yes. An Actor will process messages sequentially.
You can use scala.concurrent.Future (do not use actor state in the future) or delegate the work to a child actor (the main actor can manage the state and can respond to messages). Future or child-actor depends on use case.

Where to put calculation executed regularly that updates actor's internal state?

I am learning Scala and Akka.
In the problem I am trying to solve I want an actor to be reading a real-time data stream and perform a certain calculation that would update its state.
Every 3 seconds I am sending a request through a Scheduler for the actor to return to its state.
While I have pretty much everything implemented, with my actor having a broadcaster and receiver and the function to update the state right. I am not entirely sure how to do it, I could potentially put the calculations always running in a separate thread inside the actor but I would like to now if there is a more elegant way to make this in scala.
I would suggest to divide the work between two actors. The parent actor would manage child worker actor and would track the state. It sends a message to the child worker actor to trigger data processing.
The child worker actor processes the data stream - don't forget to wrap the processing into a Future so that it doesn't block the actor from processing messages. It also periodically sends messages to the master with current state. So the child worker is stateless, it sends notifications when its state changes.
If you want to know the current state of the work overall, you ask the master. In principle, you can merge this into one actor which sends the status message to itself. I wouldn't update the state directly to avoid concurrency issues. The reason is that the data processing work running in the Future can possible run on a different thread than message processing.

Akka: what is the reason of processing messages one at a time in an Actor?

It is said:
Akka ensures that each instance of an actor runs in its own lightweight thread and that messages are processed one at a time.
Can you please explain what is the reason of processing messages one at a time in an Actor?
This way we can guarantee thread safety inside an Actor.
Because an actor will only ever handle one message at any given time, we can guarantee that accessing the actor's local state is safe to access, even though the Actor itself may be switching Threads which it is executing on. Akka guarantees that the state written while handling message M1 are visible to the Actor once it handles M2, even though it may now be running on a different thread (normally guaranteeing this kind of safety comes at a huge cost, Akka handles this for you).
It also originates from the original Actor model description, which is an concurrency abstraction, described as actors who can only one by one handle messages and respond to these by performing one of these actions: send other messages, change it's behaviour or create new actors.

How does I/O work in Akka?

How does the actor model (in Akka) work when you need to perform I/O (ie. a database operation)?
It is my understanding that a blocking operation will throw an exception (and essentially ruin all concurrency due to the evented nature of Netty, which Akka uses). Hence I would have to use a Future or something similar - however I don't understand the concurrency model.
Can 1 actor be processing multiple message simultaneously?
If an actor makes a blocking call in a future (ie. future.get()) does that block only the current actor's execution; or will it prevent execution on all actors until the blocking call has completed?
If it blocks all execution, how does using a future assist concurrency (ie. wouldn't invoking blocking calls in a future still amount to creating an actor and executing the blocking call)?
What is the best way to deal with a multi-staged process (ie. read from the database; call a blocking webservice; read from the database; write to the database) where each step is dependent on the last?
The basic context is this:
I'm using a Websocket server which will maintain thousands of sessions.
Each session has some state (ie. authentication details, etc);
The Javascript client will send a JSON-RPC message to the server, which will pass it to the appropriate session actor, which will execute it and return a result.
Execution of the RPC call will involve some I/O and blocking calls.
There will be a large number of concurrent requests (each user will be making a significant amount of requests over the WebSocket connection and there will be a lot of users).
Is there a better way to achieve this?
Blocking operations do not throw exceptions in Akka. You can do blocking calls from an Actor (which you probably want to minimize, but thats another story).
no, 1 actor instance cannot.
It will not block any other actors. You can influence this by using a specific Dispatcher. Futures use the default dispatcher (the global event driven one normally) so it runs on a thread in a pool. You can choose which dispatcher you want to use for your actors (per actor, or for all). I guess if you really wanted to create a problem you might be able to pass exactly the same (thread based) dispatcher to futures and actors, but that would take some intent from your part. I guess if you have a huge number of futures blocking indefinitely and the executorservice has been configured to a fixed amount of threads, you could blow up the executorservice. So a lot of 'ifs'. a f.get blocks only if the Future has not completed yet. It will block the 'current thread' of the Actor from which you call it (if you call it from an Actor, which is not necessary by the way)
you do not necessarily have to block. you can use a callback instead of f.get. You can even compose Futures without blocking. check out talk by Viktor on 'the promising future of akka' for more details: http://skillsmatter.com/podcast/scala/talk-by-viktor-klang
I would use async communication between the steps (if the steps are meaningful processes on their own), so use an actor for every step, where every actor sends a oneway message to the next, possibly also oneway messages to some other actor that will not block which can supervise the process. This way you could create chains of actors, of which you could make many, in front of it you could put a load balancing actor, so that if one actor blocks in one chain another of the same type might not in the other chain. That would also work for your 'context' question, pass of workload to local actors, chain them up behind a load balancing actor.
As for netty (and I assume you mean Remote Actors, because this is the only thing that netty is used for in Akka), pass of your work as soon as possible to a local actor or a future (with callback) if you are worried about timing or preventing netty to do it's job in some way.
Blocking operations will generally not throw exceptions, but waiting on a future (for example by using !! or !!! send methods) can throw a time out exception. That's why you should stick with fire-and-forget as much as possible, use a meaningful time-out value and prefer callbacks when possible.
An akka actor cannot explicitly process several messages in a row, but you can play with the throughput value via the config file. The actor will then process several message (i.e. its receive method will be called several times sequentially) if its message queue it's not empty: http://akka.io/docs/akka/1.1.3/scala/dispatchers.html#id5
Blocking operations inside an actor will not "block" all actors, but if you share threads among actors (recommended usage), one of the threads of the dispatcher will be blocked until operations resume. So try composing futures as much as possible and beware of the time-out value).
3 and 4. I agree with Raymond answers.
What Raymond and paradigmatic said, but also, if you want to avoid starving the thread pool, you should wrap any blocking operations in scala.concurrent.blocking.
It's of course best to avoid blocking operations, but sometimes you need to use a library that blocks. If you wrap said code in blocking, it will let the execution context know you may be blocking this thread so it can allocate another one if needed.
The problem is worse than paradigmatic describes since if you have several blocking operations you may end up blocking all threads in the thread pool and have no free threads. You could end up with deadlock if all your threads are blocked on something that won't happen until another actor/future gets scheduled.
Here's an example:
import scala.concurrent.blocking
...
Future {
val image = blocking { load_image_from_potentially_slow_media() }
val enhanced = image.enhance()
blocking {
if (oracle.queryBetter(image, enhanced)) {
write_new_image(enhanced)
}
}
enhanced
}
Documentation is here.