Custom thread logic in Akka - scala

I have special code that must run when a pool thread starts executing, and other code that must run when it finishes.
That is, I need to call initialize() before a thread starts executing actor code and cleanup() after it, in order to initialize thread-specific resources (database connections, for example) and clean them up (close any connection that is still open).
It would be great to do this at thread scope. I'm thinking of doing it in a trait mixed into all actors, but then the initialization happens per actor. I think I would get better performance by doing it per thread.
Any suggestions will be appreciated!
Thanks

Especially for your cleanup code you will have trouble because there is no hook which you could use. I would recommend using the Actor life-cycle to model your resource life-cycle, i.e. create one DB connection when you start the actor and close it in postStop. Then instead of using a ThreadLocal database handle you send your DB queries to the (pool of) actors. Do not worry about threads yourself, that is Akka’s job.
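The lifecycle-based approach recommended above could look roughly like the following. This is a minimal sketch, assuming Akka classic (2.6-style API) is on the classpath. FakeConnection, DbWorker, Query, the pool size of 3, and the opened/closed counters are hypothetical names made up for illustration; a real worker would open a real database connection instead.

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.pattern.ask
import akka.routing.RoundRobinPool
import akka.util.Timeout
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.Await
import scala.concurrent.duration._

object DbActorSketch {
  // Hypothetical stand-in for a real database connection.
  final class FakeConnection {
    @volatile private var open = true
    def query(sql: String): String = {
      require(open, "connection closed")
      s"result of [$sql]"
    }
    def close(): Unit = { open = false }
  }

  // Counters used only to observe the lifecycle hooks in this sketch.
  val opened = new AtomicInteger(0)
  val closed = new AtomicInteger(0)

  final case class Query(sql: String)

  class DbWorker extends Actor {
    private var conn: FakeConnection = _

    // Runs once per actor instance (and again after a restart).
    override def preStart(): Unit = {
      conn = new FakeConnection
      opened.incrementAndGet()
    }

    // Runs when the actor stops (and before a restart).
    override def postStop(): Unit = {
      conn.close()
      closed.incrementAndGet()
    }

    def receive: Receive = {
      case Query(sql) => sender() ! conn.query(sql)
    }
  }

  def run(): String = {
    val system = ActorSystem("db-sketch")
    implicit val timeout: Timeout = Timeout(3.seconds)
    try {
      // A pool of three workers, each owning its own connection.
      val pool = system.actorOf(RoundRobinPool(3).props(Props(new DbWorker)), "db-pool")
      Await.result((pool ? Query("SELECT 1")).mapTo[String], 3.seconds)
    } finally {
      Await.result(system.terminate(), 10.seconds)
    }
  }
}
```

Callers never see a thread or a connection here; they only send Query messages to the pool, and each worker cleans up its own resource when it stops.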

Related

Does an actor guarantee the same execution thread?

I'm working with SQLite, so I need to guarantee the thread my calls execute on, but I don't want to use the main thread. I could subclass Thread, but that introduces a host of issues when trying to create async methods and execute blocks of code in the thread's main loop.
If I used an actor instead of a Thread subclass, would all the work within that actor be guaranteed to run on the same thread? I don't see that defined anywhere in the documentation, so I'm guessing no.
You asked:
Does an actor guarantee the same execution thread?
No, it does not. (Neither does GCD serial queue, for that matter.)
But SQLite does not care from which thread you call it. It only cares that you don't call it from different threads simultaneously.
So, you do not have "to guarantee the thread my calls execute on", but merely to ensure that you never have two threads interacting with the same connection at the same time. This is precisely the assurance that actor-isolated functions provide.
So, do not worry about which thread the actor happens to use. Only make sure that multiple threads never access the connection simultaneously.

Play Framework + JDBC + Futures

Assuming I obtain a JDBC connection through injection, like so:
class SqlQuery @Inject()(db: Database) extends Controller { /* .... */ }
And assuming that the pool of connections is large enough, for example 100: is it possible to create a Future to avoid blocking when running the SQL statement (similar to Slick futures)? Or does the large number of connections in the pool already mean that the SQL statement will not block?
Using futures is not synonymous with non-blocking. Futures allow you to execute code on another thread, or some type of executor, in general. However, the code you execute can still block.
JDBC is a blocking API. This means that when you execute a query through JDBC, the calling thread is blocked while it waits for a response from the database. Another term for this is synchronous. A non-blocking, or asynchronous, API accepts the response asynchronously, freeing the calling thread from actively waiting for it. Reactive Slick uses its own driver to accept responses from the database asynchronously, which means the calling thread can be freed as soon as the query is dispatched to the database.
The difference between the two is this:
Imagine your application has a database connection pool of size 100 and a fixed thread pool of size 10. Then, let's say you wrap all of your JDBC calls in futures. Let's also say that your SqlQuery controller has a method that makes several JDBC calls at the same time. These queries run in parallel only until the thread pool is exhausted, which means at most 10 queries can run at any given moment. While the calling thread is not blocked by the JDBC calls, the threads executing them are. With enough queries running in parallel, the thread pool becomes exhausted and it no longer matters how many connections are in the pool. You could deal with this by making your thread pool larger, or by using a fork-join pool that expands as needed, but this can incur performance costs from creating new threads and from context switching. After all, your CPU is limited.
Using an asynchronous database driver like reactive slick would not block your limited pool of threads, and you would be able to run as many queries concurrently as you had connections in the pool (100 in this example). Saving threads from being blocked means saving CPU time that would otherwise be spent just waiting for responses, which means you can use it to continue to handle other requests, etc.
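The thread pool limit described above can be observed with a small experiment. This is only a sketch using the Scala standard library: fakeJdbcQuery is a hypothetical stand-in that sleeps instead of calling a database, and the pool size and task count are arbitrary parameters.

```scala
import java.util.concurrent.Executors
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object BlockingPoolSketch {
  /** Runs `tasks` fake blocking queries on a fixed pool of `threads` threads and
    * returns the highest number of queries that were in flight at the same time. */
  def maxConcurrent(threads: Int, tasks: Int): Int = {
    val executor = Executors.newFixedThreadPool(threads)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(executor)
    val inFlight = new AtomicInteger(0)
    val peak = new AtomicInteger(0)

    // Hypothetical stand-in for a blocking JDBC call.
    def fakeJdbcQuery(): Unit = {
      val now = inFlight.incrementAndGet()
      peak.updateAndGet(p => math.max(p, now)) // remember the highest value seen
      Thread.sleep(50)                         // the thread is held while "waiting"
      inFlight.decrementAndGet()
      ()
    }

    try {
      // Wrapping each call in a Future frees the caller, not the pool thread.
      val all = Future.sequence(List.fill(tasks)(Future(fakeJdbcQuery())))
      Await.result(all, 30.seconds)
      peak.get()
    } finally {
      executor.shutdown()
    }
  }
}
```

Calling BlockingPoolSketch.maxConcurrent(10, 100) never reports more than 10 queries in flight, no matter how many connections a real pool would offer.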

Akka Actor preStart() & postStop() methods behaviors?

Say I have an Actor for database access. Is an Actor a singleton instance that handles all clients, or are there multiple instances for multiple clients? Are the Actor preStart() and postStop() methods called only once for all instances, or each time a new Actor instance is created? Is it good practice to put database initialisation code inside preStart() and the code that returns the connection inside postStop()?
Thanks
This is kind of like asking if an object is a singleton. If you only ever create one of the database Actor it will behave as a singleton, but in general Actors are not singletons.
Even if you did just create one, you still need to think about when it might be restarted by the actor system or supervisor.
[Update]
The lifecycle methods are called for every Actor - they are independent entities.
If you are creating an Actor to handle database requests / data access I'd probably have a single Actor that has singleton semantics, but internally it could create and supervise as many or as few Actors that actually deal with the database calls. This would allow you to handle the initialisation and cleanup of the database in a single place (the top level Actor), and allow you to scale internally (if needed) by creating more Actors to handle requests and supervise them to properly handle errors.
As a side note, there's probably plenty of prior art for this scenario, so I'd recommend doing a bit of research into how others handle it. You should also check how the database driver itself handles threading, as you might just be building lots of accidental complexity.

Num of actor instance

I'm new to akka-actor and confused with some problems:
1. When I create an actorSystem and use actorOf(Props(classOf[AX], ...)) to create an actor in the main method, how many instances of my actor AX are there?
2. If the answer to Q1 is just one, does this mean that whatever data structure I create in the AX actor class's definition will only be touched by one thread, so I need not be concerned about concurrency problems?
3. What if one of my actor's actions (one case in the receive method) is a time-consuming task that takes quite a long time to finish? Will my single Actor instance stop responding until it finishes that task?
4. If the answer to Q3 is yes, what am I supposed to do to keep my actor responsive? Should I start another thread and send a message back to the actor once the task finishes? Is there a best practice I should follow?
1. Yes, the actor system creates only 1 actor instance each time you call the 'actorOf' method. However, when using a Router it is possible to create 1 router which spreads the load over any number of actors. So in that case it is possible to construct multiple instances, but 'normally' using actorOf just creates 1 instance.
2. Yes, within an actor you do not have to worry about concurrency, because Akka guarantees that any actor processes only 1 message at a time. You must take care not to mutate the state of the actor from code outside the actor. So whenever you expose actor state, always do so using an immutable class. Case classes are excellent for this. Also beware of modifying the actor state when completing a Future from inside the actor. Since the Future runs on its own thread, you could have a concurrency issue when the Future completes while the actor is processing the next message.
   The actor executes on 1 thread at a time, but this might be a different thread each time the actor executes.
3. Akka is a highly concurrent and distributed framework: everything is asynchronous and non-blocking, and you must do the same within your application. Scala and Akka provide several ways to do this. Whenever you have a time-consuming task within an actor, you can delegate it to another actor created just for this purpose, use Futures, or use Scala's 'async/await/blocking'. When using 'blocking' you give the runtime a hint that a blocking operation is about to run, and the runtime might start additional threads to prevent thread starvation. The Scala Concurrent programming book is an excellent guide for learning this. Also look at the concurrent package ScalaDocs and the Neophyte's Guide to Scala.
   If the actor really has to wait for the time-consuming task to complete, then yes, your actor can only respond once that has finished. But this is a very 'request-response' way of thinking. Try to get away from it. The actor could also respond immediately, indicating that the task has started, and send an additional message once the task has completed.
   With time-consuming tasks, always be sure to use a different thread pool, so that the ActorSystem is not blocked because all of its available threads are used up by time-consuming tasks. For Futures you can provide a separate ExecutionContext (do not use the ActorSystem's dispatcher for this!), and via Akka's configuration you can also configure certain actors to run on a different thread pool.
4. See 3.
Success!
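As an illustration of running certain actors on a different thread pool through configuration, here is a minimal sketch, assuming Akka classic (2.6-style API) and its bundled Typesafe Config library are on the classpath. The dispatcher id blocking-io-dispatcher, the pool size of 8, and the SlowWorker and WhichThread names are assumptions chosen for this example; in a real application the dispatcher block would normally live in application.conf rather than in a string.

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.pattern.ask
import akka.util.Timeout
import com.typesafe.config.ConfigFactory
import scala.concurrent.Await
import scala.concurrent.duration._

object DispatcherSketch {
  // Dispatcher id and pool size are assumptions made for this sketch.
  private val config = ConfigFactory.parseString(
    """
      |blocking-io-dispatcher {
      |  type = Dispatcher
      |  executor = "thread-pool-executor"
      |  thread-pool-executor {
      |    fixed-pool-size = 8
      |  }
      |  throughput = 1
      |}
    """.stripMargin).withFallback(ConfigFactory.load())

  case object WhichThread

  class SlowWorker extends Actor {
    def receive: Receive = {
      case WhichThread =>
        Thread.sleep(50) // stands in for a slow, blocking call
        sender() ! Thread.currentThread().getName
    }
  }

  /** Returns the name of the thread that processed the message. */
  def run(): String = {
    val system = ActorSystem("dispatcher-sketch", config)
    implicit val timeout: Timeout = Timeout(3.seconds)
    try {
      // Only this actor runs on the dedicated pool; everything else keeps the default dispatcher.
      val worker = system.actorOf(
        Props(new SlowWorker).withDispatcher("blocking-io-dispatcher"), "slow-worker")
      Await.result((worker ? WhichThread).mapTo[String], 3.seconds)
    } finally {
      Await.result(system.terminate(), 10.seconds)
    }
  }
}
```

The returned thread name should contain the dispatcher id, which is a quick way to check that the slow actor is not occupying the default dispatcher's threads.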
1. One instance (if you declare a router in your props, then possibly more than one).
2. Yes. This is one of the advantages of actors.
3. Yes. An Actor will process messages sequentially.
4. You can use scala.concurrent.Future (do not use actor state in the future) or delegate the work to a child actor (the main actor can manage the state and can respond to messages). Whether to use a Future or a child actor depends on the use case.
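One common way to combine these suggestions is to run the slow work in a Future on a dedicated pool and pipe the result back to the actor as a message, so that actor state is only ever touched inside receive. The following is a minimal sketch, assuming Akka classic (2.6-style API) on the classpath. Coordinator, Start, Done, GetCompleted, slowSquare, and the pool size of 4 are hypothetical names and values made up for illustration.

```scala
import akka.actor.{Actor, ActorRef, ActorSystem, Props}
import akka.pattern.{ask, pipe}
import akka.util.Timeout
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object SlowTaskSketch {
  final case class Start(n: Int)
  final case class Done(n: Int, result: Int, replyTo: ActorRef)
  case object GetCompleted

  // Hypothetical time-consuming task.
  def slowSquare(n: Int): Int = { Thread.sleep(100); n * n }

  class Coordinator(blockingEc: ExecutionContext) extends Actor {
    import context.dispatcher // used only for the cheap map/pipeTo callbacks

    private var completed = Map.empty[Int, Int] // touched only inside receive

    def receive: Receive = {
      case Start(n) =>
        val replyTo = sender() // capture now; sender() must not be called inside the Future
        Future(slowSquare(n))(blockingEc) // slow work runs on the dedicated pool
          .map(result => Done(n, result, replyTo))
          .pipeTo(self) // the result comes back as an ordinary message
      case Done(n, result, replyTo) =>
        completed += (n -> result) // safe: we are back on the actor
        replyTo ! result
      case GetCompleted =>
        sender() ! completed
    }
  }

  def run(): (Int, Map[Int, Int]) = {
    val executor = Executors.newFixedThreadPool(4) // pool size is an arbitrary assumption
    val blockingEc = ExecutionContext.fromExecutor(executor)
    val system = ActorSystem("slow-task-sketch")
    implicit val timeout: Timeout = Timeout(5.seconds)
    try {
      val coordinator = system.actorOf(Props(new Coordinator(blockingEc)), "coordinator")
      val square = Await.result((coordinator ? Start(3)).mapTo[Int], 5.seconds)
      val state = Await.result((coordinator ? GetCompleted).mapTo[Map[Int, Int]], 5.seconds)
      (square, state)
    } finally {
      Await.result(system.terminate(), 10.seconds)
      executor.shutdown()
    }
  }
}
```

While slowSquare is running, the Coordinator stays free to process other messages, because the only work it does itself is starting the Future and handling the Done message.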

How does I/O work in Akka?

How does the actor model (in Akka) work when you need to perform I/O (ie. a database operation)?
It is my understanding that a blocking operation will throw an exception (and essentially ruin all concurrency due to the evented nature of Netty, which Akka uses). Hence I would have to use a Future or something similar - however I don't understand the concurrency model.
1. Can 1 actor be processing multiple messages simultaneously?
2. If an actor makes a blocking call on a future (i.e. future.get()), does that block only the current actor's execution, or will it prevent execution on all actors until the blocking call has completed?
3. If it blocks all execution, how does using a future assist concurrency (i.e. wouldn't invoking blocking calls in a future still amount to creating an actor and executing the blocking call)?
4. What is the best way to deal with a multi-staged process (i.e. read from the database; call a blocking webservice; read from the database; write to the database) where each step depends on the previous one?
The basic context is this:
I'm using a Websocket server which will maintain thousands of sessions.
Each session has some state (ie. authentication details, etc);
The Javascript client will send a JSON-RPC message to the server, which will pass it to the appropriate session actor, which will execute it and return a result.
Execution of the RPC call will involve some I/O and blocking calls.
There will be a large number of concurrent requests (each user will be making a significant amount of requests over the WebSocket connection and there will be a lot of users).
Is there a better way to achieve this?
Blocking operations do not throw exceptions in Akka. You can make blocking calls from an Actor (which you probably want to minimize, but that's another story).
1. No, 1 actor instance cannot.
2. It will not block any other actors. You can influence this by using a specific Dispatcher. Futures use the default dispatcher (normally the global event-driven one), so they run on a thread in a pool. You can choose which dispatcher you want to use for your actors (per actor, or for all). I guess if you really wanted to create a problem you might be able to pass exactly the same (thread-based) dispatcher to futures and actors, but that would take some intent on your part. I guess if you have a huge number of futures blocking indefinitely and the executor service has been configured with a fixed number of threads, you could exhaust the executor service. So a lot of 'ifs'. Note that f.get blocks only if the Future has not completed yet. It will block the 'current thread' of the Actor from which you call it (if you call it from an Actor, which is not necessary, by the way).
3. You do not necessarily have to block. You can use a callback instead of f.get. You can even compose Futures without blocking. Check out the talk by Viktor on 'the promising future of akka' for more details: http://skillsmatter.com/podcast/scala/talk-by-viktor-klang
4. I would use async communication between the steps (if the steps are meaningful processes on their own), so use an actor for every step, where every actor sends a one-way message to the next, and possibly also one-way messages to some other, non-blocking actor that supervises the process. This way you can create chains of actors, of which you can make many, and put a load-balancing actor in front of them, so that if one actor blocks in one chain, another of the same type might not in another chain. That would also work for your 'context' question: pass off the workload to local actors and chain them up behind a load-balancing actor.
As for netty (and I assume you mean Remote Actors, because this is the only thing that netty is used for in Akka), pass off your work as soon as possible to a local actor or a future (with a callback) if you are worried about timing or about preventing netty from doing its job in some way.
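Composing Futures without blocking, as mentioned in the answer above, could look like this for a multi-staged process. This is a minimal sketch using only the Scala standard library. The four step functions are hypothetical placeholders that return immediately; real steps would wrap a database driver or an HTTP client.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object PipelineSketch {
  // Hypothetical steps; each one depends on the result of the previous one.
  def readUser(id: Int)(implicit ec: ExecutionContext): Future[String] =
    Future(s"user-$id")
  def callService(user: String)(implicit ec: ExecutionContext): Future[Int] =
    Future(user.length)
  def readQuota(score: Int)(implicit ec: ExecutionContext): Future[Int] =
    Future(score * 10)
  def writeResult(user: String, quota: Int)(implicit ec: ExecutionContext): Future[String] =
    Future(s"$user:$quota")

  // The for-comprehension chains the steps with flatMap/map; no thread waits between steps.
  def process(id: Int)(implicit ec: ExecutionContext): Future[String] =
    for {
      user  <- readUser(id)
      score <- callService(user)
      quota <- readQuota(score)
      saved <- writeResult(user, quota)
    } yield saved

  // Await is used here only to obtain a value for demonstration purposes.
  def run(): String =
    Await.result(process(7)(ExecutionContext.global), 5.seconds)
}
```

Inside an actor, the resulting Future would typically be handled with a callback or sent back as a message rather than awaited.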
Blocking operations will generally not throw exceptions, but waiting on a future (for example by using the !! or !!! send methods) can throw a timeout exception. That's why you should stick with fire-and-forget as much as possible, use a meaningful timeout value, and prefer callbacks when possible.
1. An Akka actor cannot explicitly process several messages in a row, but you can play with the throughput value via the config file. The actor will then process several messages (i.e. its receive method will be called several times sequentially) if its message queue is not empty: http://akka.io/docs/akka/1.1.3/scala/dispatchers.html#id5
2. Blocking operations inside an actor will not "block" all actors, but if you share threads among actors (the recommended usage), one of the dispatcher's threads will be blocked until the operation completes. So try composing futures as much as possible, and beware of the timeout value.
3 and 4. I agree with Raymond's answers.
What Raymond and paradigmatic said, but also, if you want to avoid starving the thread pool, you should wrap any blocking operations in scala.concurrent.blocking.
It's of course best to avoid blocking operations, but sometimes you need to use a library that blocks. If you wrap said code in blocking, it will let the execution context know you may be blocking this thread so it can allocate another one if needed.
The problem is worse than paradigmatic describes since if you have several blocking operations you may end up blocking all threads in the thread pool and have no free threads. You could end up with deadlock if all your threads are blocked on something that won't happen until another actor/future gets scheduled.
Here's an example:
    import scala.concurrent.{Future, blocking}
    ...
    Future {
      // Loading may be slow, so mark it as blocking.
      val image = blocking { load_image_from_potentially_slow_media() }
      val enhanced = image.enhance()
      // The oracle query and the write may block as well.
      blocking {
        if (oracle.queryBetter(image, enhanced)) {
          write_new_image(enhanced)
        }
      }
      enhanced
    }
See the Scaladoc for scala.concurrent.blocking for details.
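The starvation scenario described above can be reproduced with a small experiment. This is a sketch using only the Scala standard library, assuming the default global ExecutionContext with its default settings: it starts more tasks than the pool has threads, and every task waits until all of them have started. The task count (available processors plus 2) is an arbitrary choice.

```scala
import java.util.concurrent.CountDownLatch
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object BlockingHintSketch {
  /** Starts more tasks than the global pool has threads by default.
    * Every task waits until all tasks have started, so completion depends on
    * the pool adding threads in response to the `blocking` hint. */
  def run(): Int = {
    val n = Runtime.getRuntime.availableProcessors() + 2
    val latch = new CountDownLatch(n)
    val tasks = List.fill(n) {
      Future {
        latch.countDown()
        blocking { latch.await() } // tells the pool this thread is about to block
        1
      }
    }
    Await.result(Future.sequence(tasks), 30.seconds).sum
  }
}
```

Without the blocking wrapper, the first tasks would be expected to hold every pool thread while waiting for tasks that can never be scheduled, which is the deadlock described above.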