How to organize concurrent request processing in Scala? - scala

Suppose I have a server in Scala, which processes incoming client requests. I have a function def process(req: Request): Response to process requests. Now I would like to process only K requests concurrently and keep M requests in queue.
In Java I would probably create a ThreadPoolExecutor with K threads and a queue of size M. Now I wonder how to do that in Scala with Actors/Futures etc.

If you must have def process(req: Request): Response), then I think your Scala solution may turn out to be similar to Java. If you can have def process(req: Request): Future[Response], the it opens other possibilities.
When using futures, you provide (implicitly or explicitly) an execution context that can be constructed from a Java executor. So you would be able to choose your thread pool size and queue size that way. The benefit of using futures is that you can compose them with map and flatMap and a few other combinators. See http://docs.scala-lang.org/overviews/core/futures.html for more information.
With actors, you have another model of concurrency where you can create K actors. You can have a router that dispatches each request to the actors. Each actor is independently processes the request and sends the Response to whatever needs it when processing is complete. The nice part about actors is that each actor being independent does not typically share anything with other actors, so you don't have to synchronize the code. See http://doc.akka.io/docs/akka/2.1.4/general/terminology.html for more information.
Overall I think Scala can use anything that Java has, and provides more mechanisms to do more complex things.

Related

When to create an Akka Actor

I have a REST service which services only one POST request. I want to use an actor to process the request. However I don't know if I should create one actor and derive all the requests using this actor or should I create an actor every time I get a request. What are the pros and cons of these choices.
Also, how is it parallel execution when I create one actor and use that actor to process all my requests. It certainly looks like sequential execution. I would want to understand this as well.
If you use one Actor requests are queued inside the actor mail box and are processed one by one by the actor. This is sequential and not recommended.
Thats why it is said
One actor is no actor.
Create a manager Actor which manages other actors. As actors are quite cheap you can create one actor for every request without any problem.
Do db interactions and other heavy computation using a future and direct results of the future to request handling actor using pipeTo pattern.
Use actors only to divide and distribute work and use Futures to do compute intensive work.
I would create an actor per request and use the "tell" pattern to delegate the work to the newly created actor. If the REST framework you use supports completing the request from another actor (Spray, Akka-HTTP does), then you can complete the request from this new actor. This way your request handling actor is free to handle the next request.
I find this a wonderful resource that explains the pros & cons of ask & tell and per-request-actors. It can be helpful to you.
I agree with what #pamu said. Actors are cheap. But be mindful that if ever you are gonna use a singleton Actor, do not make it stateful it will cause trouble.
And if you are gonna use Futures to do intensive work (which you should do). Make sure you give them specific ExecutionContext / Dispatcher. Using the global dispatcher or ExecutionContext is not good.
Or in each api you have, create a certain dispatcher to control the # of Actors that will work on that kind of endpoint / api.
For example you have "/get/transactions"
specify a dispatcher that would only spawn this # of thread. For this api.
The advantage of this is you can control the # of threads and resources your app uses. When it comes to dealing with heavy traffic. This is a good practice.

Akka - How many instances of an actor should you create?

I'm new to the Akka framework and I'm building an HTTP server application on top of Netty + Akka.
My idea so far is to create an actor for each type of request. E.g. I would have an actor for a POST to /my-resource and another actor for a GET to /my-resource.
Where I'm confused is how I should go about actor creation? Should I:
Create a new actor for every request (by this I mean for every request should I do a TypedActor.newInstance() of the appropriate actor)? How expensive is it to create a new actor?
Create one instance of each actor on server start up and use that actor instance for every request? I've read that an actor can only process one message at a time, so couldn't this be a bottle neck?
Do something else?
Thanks for any feedback.
Well, you create an Actor for each instance of mutable state that you want to manage.
In your case, that might be just one actor if my-resource is a single object and you want to treat each request serially - that easily ensures that you only return consistent states between modifications.
If (more likely) you manage multiple resources, one actor per resource instance is usually ideal unless you run into many thousands of resources. While you can also run per-request actors, you'll end up with a strange design if you don't think about the state those requests are accessing - e.g. if you just create one Actor per POST request, you'll find yourself worrying how to keep them from concurrently modifying the same resource, which is a clear indication that you've defined your actors wrongly.
I usually have fairly trivial request/reply actors whose main purpose it is to abstract the communication with external systems. Their communication with the "instance" actors is then normally limited to one request/response pair to perform the actual action.
If you are using Akka, you can create an actor per request. Akka is extremely slim on resources and you can create literarily millions of actors on an pretty ordinary JVM heap. Also, they will only consume cpu/stack/threads when they actually do something.
A year ago I made a comparison between the resource consumption of the thread-based and event-based standard actors. And Akka is even better than the event-base.
One of the big points of Akka in my opinion is that it allows you to design your system as "one actor per usage" where earlier actor systems often forced you to do "use only actors for shared services" due to resource overhead.
I would recommend that you go for option 1.
Options 1) or 2) have both their drawbacks. So then, let's use options 3) Routing (Akka 2.0+)
Router is an element which act as a load balancer, routing the requests to other Actors which will perform the task needed.
Akka provides different Router implementations with different logic to route a message (for example SmallestMailboxPool or RoundRobinPool).
Every Router may have several children and its task is to supervise their Mailbox to further decide where to route the received message.
//This will create 5 instances of the actor ExampleActor
//managed and supervised by a RoundRobinRouter
ActorRef roundRobinRouter = getContext().actorOf(
Props.create(ExampleActor.class).withRouter(new RoundRobinRouter(5)),"router");
This procedure is well explained in this blog.
It's quite a reasonable option, but whether it's suitable depends on specifics of your request handling.
Yes, of course it could.
For many cases the best thing to do would be to just have one actor responding to every request (or perhaps one actor per type of request), but the only thing this actor does is to forward the task to another actor (or spawn a Future) which will actually do the job.
For scaling up the serial requests handling, add a master actor (Supervisor) which in turn will delegate to the worker actors (Children) (round-robin fashion).

How does I/O work in Akka?

How does the actor model (in Akka) work when you need to perform I/O (ie. a database operation)?
It is my understanding that a blocking operation will throw an exception (and essentially ruin all concurrency due to the evented nature of Netty, which Akka uses). Hence I would have to use a Future or something similar - however I don't understand the concurrency model.
Can 1 actor be processing multiple message simultaneously?
If an actor makes a blocking call in a future (ie. future.get()) does that block only the current actor's execution; or will it prevent execution on all actors until the blocking call has completed?
If it blocks all execution, how does using a future assist concurrency (ie. wouldn't invoking blocking calls in a future still amount to creating an actor and executing the blocking call)?
What is the best way to deal with a multi-staged process (ie. read from the database; call a blocking webservice; read from the database; write to the database) where each step is dependent on the last?
The basic context is this:
I'm using a Websocket server which will maintain thousands of sessions.
Each session has some state (ie. authentication details, etc);
The Javascript client will send a JSON-RPC message to the server, which will pass it to the appropriate session actor, which will execute it and return a result.
Execution of the RPC call will involve some I/O and blocking calls.
There will be a large number of concurrent requests (each user will be making a significant amount of requests over the WebSocket connection and there will be a lot of users).
Is there a better way to achieve this?
Blocking operations do not throw exceptions in Akka. You can do blocking calls from an Actor (which you probably want to minimize, but thats another story).
no, 1 actor instance cannot.
It will not block any other actors. You can influence this by using a specific Dispatcher. Futures use the default dispatcher (the global event driven one normally) so it runs on a thread in a pool. You can choose which dispatcher you want to use for your actors (per actor, or for all). I guess if you really wanted to create a problem you might be able to pass exactly the same (thread based) dispatcher to futures and actors, but that would take some intent from your part. I guess if you have a huge number of futures blocking indefinitely and the executorservice has been configured to a fixed amount of threads, you could blow up the executorservice. So a lot of 'ifs'. a f.get blocks only if the Future has not completed yet. It will block the 'current thread' of the Actor from which you call it (if you call it from an Actor, which is not necessary by the way)
you do not necessarily have to block. you can use a callback instead of f.get. You can even compose Futures without blocking. check out talk by Viktor on 'the promising future of akka' for more details: http://skillsmatter.com/podcast/scala/talk-by-viktor-klang
I would use async communication between the steps (if the steps are meaningful processes on their own), so use an actor for every step, where every actor sends a oneway message to the next, possibly also oneway messages to some other actor that will not block which can supervise the process. This way you could create chains of actors, of which you could make many, in front of it you could put a load balancing actor, so that if one actor blocks in one chain another of the same type might not in the other chain. That would also work for your 'context' question, pass of workload to local actors, chain them up behind a load balancing actor.
As for netty (and I assume you mean Remote Actors, because this is the only thing that netty is used for in Akka), pass of your work as soon as possible to a local actor or a future (with callback) if you are worried about timing or preventing netty to do it's job in some way.
Blocking operations will generally not throw exceptions, but waiting on a future (for example by using !! or !!! send methods) can throw a time out exception. That's why you should stick with fire-and-forget as much as possible, use a meaningful time-out value and prefer callbacks when possible.
An akka actor cannot explicitly process several messages in a row, but you can play with the throughput value via the config file. The actor will then process several message (i.e. its receive method will be called several times sequentially) if its message queue it's not empty: http://akka.io/docs/akka/1.1.3/scala/dispatchers.html#id5
Blocking operations inside an actor will not "block" all actors, but if you share threads among actors (recommended usage), one of the threads of the dispatcher will be blocked until operations resume. So try composing futures as much as possible and beware of the time-out value).
3 and 4. I agree with Raymond answers.
What Raymond and paradigmatic said, but also, if you want to avoid starving the thread pool, you should wrap any blocking operations in scala.concurrent.blocking.
It's of course best to avoid blocking operations, but sometimes you need to use a library that blocks. If you wrap said code in blocking, it will let the execution context know you may be blocking this thread so it can allocate another one if needed.
The problem is worse than paradigmatic describes since if you have several blocking operations you may end up blocking all threads in the thread pool and have no free threads. You could end up with deadlock if all your threads are blocked on something that won't happen until another actor/future gets scheduled.
Here's an example:
import scala.concurrent.blocking
...
Future {
val image = blocking { load_image_from_potentially_slow_media() }
val enhanced = image.enhance()
blocking {
if (oracle.queryBetter(image, enhanced)) {
write_new_image(enhanced)
}
}
enhanced
}
Documentation is here.

Using Actors to exploit cores

I'm new to Scala in general and Actors in particular and my problem is so basic, the online resources I have found don't cover it.
I have a CPU-intensive, easily parallelized algorithm that will be run on an n-core machine (I don't know n). How do I implement this in Actors so that all available cores address the problem?
The first way I thought of was to simple break the problem into m pieces (where m is some medium number like 10,000) and create m Actors, one for each piece, give each Actor its little piece and let 'em go.
Somehow, this struck me as inefficient. Zillions of Actors just hanging around, waiting for some CPU love, pointlessly switching contexts...
Then I thought, make some smaller number of Actors, and feed each one several pieces. The problem was, there's no reason to expect the pieces are the same size, so one core might get bogged down, with many of its tasks still queued, while other cores are idle.
I noodled around with a Supervisor that knew which Actors were busy, and eventually realized that this has to be a solved problem. There must be a standard pattern (maybe even a standard library) for dealing with this very generic issue. Any suggestions?
Take a look at the Akka library, which includes an implementaton of actors. The Dispatchers Module gives you more options for limiting actors to cpu threads (HawtDispatch-based event-driven) and/or balancing the workload (Work-stealing event-based).
Generally, there're 2 kinds of actors: those that are tied to threads (one thread per actor), and those that share 1+ thread, working behind a scheduler/dispatcher that allocates resources (= possibility to execute a task/handle incoming message against controlled thread-pool or a single thread).
I assume, you use second type of actors - event-driven actors, because you mention that you run 10k of them. No matter how many event-driven actors you have (thousands or millions), all of them will be fighting for the small thread pool to handle the message. Therefore, you will even have a worse performance dividing your task queue into that huge number of portions - scheduler will either try to handle messages sent to 10k actors against a fixed thread pool (which is slow), or will allocate new threads in the pool (if the pool is not bounded), which is dangerous (in the worst case, there will be started 10k threads to handle messages).
Event-driven actors are good for short-time (ideally, non-blocking) tasks. If you're dealing with CPU-intensive tasks I'd limit number of threads in the scheduler/dispatcher pool (when you use event-driven actors) or actors themselves (when you use thread-based actors) to the number of cores to achieve the best performance.
If you want this to be done automatically (adjust number of threads in dispatcher pool to the number of cores), you should use HawtDisaptch (or it's Akka implementation), as it was proposed earlier:
The 'HawtDispatcher' uses the
HawtDispatch threading library which
is a Java clone of libdispatch. All
actors with this type of dispatcher
are executed on a single system wide
fixed sized thread pool. The number of
of threads will match the number of
cores available on your system. The
dispatcher delivers messages to the
actors in the order that they were
producer at the sender.
You should look into Futures I think. In fact, you probably need a threadpool which simply queues threads when a max number of threads has been reached.
Here is a small example involving futures: http://blog.tackley.net/2010/01/scala-futures.html
I would also suggest that you don't pay too much attention to the context switching since you really can't do anything but rely on the underlying implementation. Of course a rule of thumb would be to keep the active threads around the number of physical cores, but as I noted above this could be handled by a threadpool with a fifo-queue.
NOTE that I don't know if Actors in general or futures are implemented with this kind of pool.
For thread pools, look at this: http://www.scala-lang.org/api/current/scala/concurrent/ThreadPoolRunner.html
and maybe this: http://www.scala-lang.org/api/current/scala/actors/scheduler/ResizableThreadPoolScheduler.html
Good luck
EDIT
Check out this piece of code using futures:
import scala.actors.Futures._
object FibFut {
def fib(i: Int): Int = if (i < 2) 1 else fib(i - 1) + fib(i - 2)
def main(args: Array[String]) {
val fibs = for (i <- 0 to 42) yield future { fib(i) }
for (future <- fibs) println(future())
}
}
It showcases a very good point about futures, namely that you decide in which order to receive the results (as opposed to the normal mailbox-system which employs a fifo-system i.e. the fastest actor sends his result first).
For any significant project, I generally have a supervisor actor, a collection of worker actors each of which can do any work necessary, and a large number of pieces of work to do. Even though I do this fairly often, I've never put it in a (personal) library because the operations end up being so different each time, and the overhead is pretty small compared to the whole coding project.
Be aware of actor starvation if you end up utilizing the general actor threadpool. I ended up simply using my own algorithm-task-owned threadpool to handle the parallelization of a long-running, concurrent task.
The upcoming Scala 2.9 is expected to include parallel data structures which should automatically handle this for some uses. While it does not use Actors, it may be something to consider for your problem.
While this feature was originally slated for 2.8, it has been postponed until the next major release.
A presentation from the last ScalaDays is here:
http://days2010.scala-lang.org/node/138/140

An Actor "queue"?

In Java, to write a library that makes requests to a server, I usually implement some sort of dispatcher (not unlike the one found here in the Twitter4J library: http://github.com/yusuke/twitter4j/blob/master/twitter4j-core/src/main/java/twitter4j/internal/async/DispatcherImpl.java) to limit the number of connections, to perform asynchronous tasks, etc.
The idea is that N number of threads are created. A "Task" is queued and all threads are notified, and one of the threads, when it's ready, will pop an item from the queue, do the work, and then return to a waiting state. If all the threads are busy working on a Task, then the Task is just queued, and the next available thread will take it.
This keeps the max number of connections to N, and allows at most N Tasks to be operating at the same time.
I'm wondering what kind of system I can create with Actors that will accomplish the same thing? Is there a way to have N number of Actors, and when a new message is ready, pass it off to an Actor to handle it - and if all Actors are busy, just queue the message?
Akka Framework is designed to solve this kind of problems, and is exactly what you're looking for.
Look thru this docu - there're lots of highly configurable dispathers (event-based, thread-based, load-balanced, work-stealing, etc.) that manage actors mailboxes, and allow them to work in conjunction. You may also find interesting this blog post.
E.g. this code instantiates new Work Stealing Dispatcher based on the fixed thread pool, that fulfils load balancing among the actors it supervises:
val workStealingDispatcher = Dispatchers.newExecutorBasedEventDrivenWorkStealingDispatcher("pooled-dispatcher")
workStealingDispatcher
.withNewThreadPoolWithLinkedBlockingQueueWithUnboundedCapacity
.setCorePoolSize(16)
.buildThreadPool
Actor that uses the dispatcher:
class MyActor extends Actor {
messageDispatcher = workStealingDispatcher
def receive = {
case _ =>
}
}
Now, if you start 2+ instances of the actor, dispatcher will balance the load between the mailboxes (queues) of the actors (actor that has too much messages in the mailbox will "donate" some to the actors that has nothing to do).
Well, you have to see about the actors scheduler, as actors are not usually 1-to-1 with threads. The idea behind actors is that you may have many of them, but the actual number of threads will be limited to something reasonable. They are not supposed to be long running either, but rather quickly answering to messages they receive. In short, the architecture of that code seems to be wholly at odds with how one would design an actor system.
Still, each working actor may send a message to a Queue actor asking for the next task, and then loop back to react. This Queue actor would receive either queueing messages, or dequeuing messages. It could be designed like this:
val q: Queue[AnyRef] = new Queue[AnyRef]
loop {
react {
case Enqueue(d) => q enqueue d
case Dequeue(a) if q.nonEmpty => a ! (q dequeue)
}
}