Scala: Why are Actors lightweight? - scala

What makes actors so lightweight?
I'm not even sure how they work. Aren't they separate threads?

When they say lightweight they mean that each actor is not mapped to a single thread.
JVM offers shared memory threads with
locks as the primary form of
concurrency abstractions. But shared
memory threads are quite heavyweight
and incur severe performance penalties
from context switching overheads. For
an actor implementation based on a
one-to-one mapping with JVM threads,
the process payload per Scala actor
will not be as lightweight that we can
spawn a million instances of an actor
for a specific computation. Hence
Scala actors have been designed as
lightweight event objects, which get
scheduled and executed on an
underlying worker thread pool that
gets automatically resized when all
threads block on long running
operations. In fact, Scala implements
a unified model of actors - thread
based and event based. Scala actors
offer two form of suspension
mechanisms - a full stack frame
suspension(implemented as receive) and
a suspension based on a continuation
closure (implemented as react). In
case of event based actors, a wait on
react is represented by a continuation
closure, i.e. a closure that captures
the rest of the actor's computation.
When the suspended actor receives a
message that matches one of the
patterns specified in the actor, the
continuation is executed by scheduling
the task to one of the worker threads
from the underlying thread pool. The
paper "Actors that Unify Threads and
Events" by Haller and Odersky
discusses the details of the
implementation.
Source

Important Reference Actors that Unify Threads and Events
I don't think we should strengthen that actor is that lightweight.
firstly thread based actors are actor per thread so not lightweight at all.
event based actors are the point where we start to feel actors are light weight. it is light weight because it does not have working thread wait and switched to another , working thread just switch from a piece of data work to another piece of data work, thus keep spinning on effective calculations.

Related

Num of actor instance

I'm new to akka-actor and confused with some problems:
when I create an actorSystem, and use actorOf(Props(classOf[AX], ...)) to create actor in main method, how many instances are there for my actor AX?
If the answer to Q1 was just one, does this mean whatever data-structure I created in the AX actor class's definition will only appear in one thread and I should not concern about concurrency problems?
What if one of my actor's action (one case in receive method) is a time consuming task and would take quite long time to finish? Will my single Actor instance not responding until it finish that task?
If the answer to Q3 is right, what I am supposed to do to prevent my actor from not responding? Should I start another thread and send another message back to it until finish the task? Is there a best practice there I should follow?
yes, the actor system will only create 1 actor instance for each time you call the 'actorOf' method. However, when using a Router it is possible to create 1 router which spreads the load to any number of actors. So in that case it is possible to construct multiple instances, but 'normally' using actorOf just creates 1 instance.
Yes, within an actor you do not have to worry about concurrency because Akka guarantees that any actor only processes 1 message at the time. You must take care not to somehow mutate the state of the actor from code outside the actor. So whenever exposing the actor state, always do this using an immutable class. Case classes are excellent for this. But also be ware of modifying the actor state when completing a Future from inside the actor. Since the Future runs on it's own thread you could have a concurrency issue when the Future completes and the actor is processing a next message at the same time.
The actor executes on 1 thread at the time, but this might be a different thread each time the actor executes.
Akka is a highly concurrent and distributed framework, everything is asynchronous and non-blocking and you must do the same within your application. Scala and Akka provide several solutions to do this. Whenever you have a time consuming task within an actor you might either delegate the time consuming task to another actor just for this purpose, use Futures or use Scala's 'async/await/blocking'. When using 'blocking' you give a hint to the compiler/runtime a blocking action is done and the runtime might start additional thread to prevent thread starvation. The Scala Concurrent programming book is an excellent guide to learn this stuff. Also look at the concurrent package ScalaDocs and Neophyte's Guide to Scala.
If the actor really has to wait for the time consuming task to complete, then yes, your actor can only respond when that's finished. But this is a very 'request-response' way of thinking. Try to get away from this. The actor could also respond immediately indicating the task has started and send an additional message once the task has been completed.
With time consuming tasks always be sure to use a different threadpool so the ActorSystem will not be blocked because all of it's available threads are used up by time consuming tasks. For Future's you can provide a separate ExecutionContext (do not use the ActorSystem's Dispatch context for this!), but via Akka's configuration you can also configure certain actors to run on a different thread pool.
See 3.
Success!
one instance (if you declare a router in your props then (maybe) more than one)
Yes. This is one of the advantages of actors.
Yes. An Actor will process messages sequentially.
You can use scala.concurrent.Future (do not use actor state in the future) or delegate the work to a child actor (the main actor can manage the state and can respond to messages). Future or child-actor depends on use case.

Actors (scala/akka): is it implied that the receive method will be accessed in a threadsafe manner?

I assume that the messages will be received and processed in a threadsafe manner. However, I have been reading (some) akka/scala docs but I didn't encounter the keyword 'threadsafe' yet.
It is probably because the actor model assumes that each actor instance processes its own mailbox sequentially. That means it should never happen, that two or more concurrent threads execute single actor instance's code. Technically you could create a method in an actor's class (because it is still an object) and call it from multiple threads concurrently, but this would be a major departure from the actor's usage rules and you would do it "at your own risk", because then you would lose all thread-safety guarantees of that model.
This is also one of the reasons, why Akka introduced a concept of ActorRef - a handle, that lets you communicate with the actor through message passing, but not by calling its methods directly.
I think we have it pretty well documented: http://doc.akka.io/docs/akka/2.3.9/general/jmm.html
Actors are 'Treadsafe'. The Actor System (AKKA), provides each actor with its own 'light-weight thread'. Meaning that this is not a tread, but the AKKA system will give the impression that an Actor is always running in it's own thread to the developer. This means that any operations performed as a result of acting on a message are, for all purposes, thread safe.
However, you should not undermine AKKA by using mutable messages or public state. If you develop you actors to be stand alone units of functionality, then they will be threadsafe.
See also:
http://doc.akka.io/docs/akka/2.3.12/general/actors.html#State
and
http://doc.akka.io/docs/akka/2.3.12/general/jmm.html for a more indepth study of the AKKA memory model and how it manages 'tread' issues.

How does I/O work in Akka?

How does the actor model (in Akka) work when you need to perform I/O (ie. a database operation)?
It is my understanding that a blocking operation will throw an exception (and essentially ruin all concurrency due to the evented nature of Netty, which Akka uses). Hence I would have to use a Future or something similar - however I don't understand the concurrency model.
Can 1 actor be processing multiple message simultaneously?
If an actor makes a blocking call in a future (ie. future.get()) does that block only the current actor's execution; or will it prevent execution on all actors until the blocking call has completed?
If it blocks all execution, how does using a future assist concurrency (ie. wouldn't invoking blocking calls in a future still amount to creating an actor and executing the blocking call)?
What is the best way to deal with a multi-staged process (ie. read from the database; call a blocking webservice; read from the database; write to the database) where each step is dependent on the last?
The basic context is this:
I'm using a Websocket server which will maintain thousands of sessions.
Each session has some state (ie. authentication details, etc);
The Javascript client will send a JSON-RPC message to the server, which will pass it to the appropriate session actor, which will execute it and return a result.
Execution of the RPC call will involve some I/O and blocking calls.
There will be a large number of concurrent requests (each user will be making a significant amount of requests over the WebSocket connection and there will be a lot of users).
Is there a better way to achieve this?
Blocking operations do not throw exceptions in Akka. You can do blocking calls from an Actor (which you probably want to minimize, but thats another story).
no, 1 actor instance cannot.
It will not block any other actors. You can influence this by using a specific Dispatcher. Futures use the default dispatcher (the global event driven one normally) so it runs on a thread in a pool. You can choose which dispatcher you want to use for your actors (per actor, or for all). I guess if you really wanted to create a problem you might be able to pass exactly the same (thread based) dispatcher to futures and actors, but that would take some intent from your part. I guess if you have a huge number of futures blocking indefinitely and the executorservice has been configured to a fixed amount of threads, you could blow up the executorservice. So a lot of 'ifs'. a f.get blocks only if the Future has not completed yet. It will block the 'current thread' of the Actor from which you call it (if you call it from an Actor, which is not necessary by the way)
you do not necessarily have to block. you can use a callback instead of f.get. You can even compose Futures without blocking. check out talk by Viktor on 'the promising future of akka' for more details: http://skillsmatter.com/podcast/scala/talk-by-viktor-klang
I would use async communication between the steps (if the steps are meaningful processes on their own), so use an actor for every step, where every actor sends a oneway message to the next, possibly also oneway messages to some other actor that will not block which can supervise the process. This way you could create chains of actors, of which you could make many, in front of it you could put a load balancing actor, so that if one actor blocks in one chain another of the same type might not in the other chain. That would also work for your 'context' question, pass of workload to local actors, chain them up behind a load balancing actor.
As for netty (and I assume you mean Remote Actors, because this is the only thing that netty is used for in Akka), pass of your work as soon as possible to a local actor or a future (with callback) if you are worried about timing or preventing netty to do it's job in some way.
Blocking operations will generally not throw exceptions, but waiting on a future (for example by using !! or !!! send methods) can throw a time out exception. That's why you should stick with fire-and-forget as much as possible, use a meaningful time-out value and prefer callbacks when possible.
An akka actor cannot explicitly process several messages in a row, but you can play with the throughput value via the config file. The actor will then process several message (i.e. its receive method will be called several times sequentially) if its message queue it's not empty: http://akka.io/docs/akka/1.1.3/scala/dispatchers.html#id5
Blocking operations inside an actor will not "block" all actors, but if you share threads among actors (recommended usage), one of the threads of the dispatcher will be blocked until operations resume. So try composing futures as much as possible and beware of the time-out value).
3 and 4. I agree with Raymond answers.
What Raymond and paradigmatic said, but also, if you want to avoid starving the thread pool, you should wrap any blocking operations in scala.concurrent.blocking.
It's of course best to avoid blocking operations, but sometimes you need to use a library that blocks. If you wrap said code in blocking, it will let the execution context know you may be blocking this thread so it can allocate another one if needed.
The problem is worse than paradigmatic describes since if you have several blocking operations you may end up blocking all threads in the thread pool and have no free threads. You could end up with deadlock if all your threads are blocked on something that won't happen until another actor/future gets scheduled.
Here's an example:
import scala.concurrent.blocking
...
Future {
val image = blocking { load_image_from_potentially_slow_media() }
val enhanced = image.enhance()
blocking {
if (oracle.queryBetter(image, enhanced)) {
write_new_image(enhanced)
}
}
enhanced
}
Documentation is here.

Using Actors to exploit cores

I'm new to Scala in general and Actors in particular and my problem is so basic, the online resources I have found don't cover it.
I have a CPU-intensive, easily parallelized algorithm that will be run on an n-core machine (I don't know n). How do I implement this in Actors so that all available cores address the problem?
The first way I thought of was to simple break the problem into m pieces (where m is some medium number like 10,000) and create m Actors, one for each piece, give each Actor its little piece and let 'em go.
Somehow, this struck me as inefficient. Zillions of Actors just hanging around, waiting for some CPU love, pointlessly switching contexts...
Then I thought, make some smaller number of Actors, and feed each one several pieces. The problem was, there's no reason to expect the pieces are the same size, so one core might get bogged down, with many of its tasks still queued, while other cores are idle.
I noodled around with a Supervisor that knew which Actors were busy, and eventually realized that this has to be a solved problem. There must be a standard pattern (maybe even a standard library) for dealing with this very generic issue. Any suggestions?
Take a look at the Akka library, which includes an implementaton of actors. The Dispatchers Module gives you more options for limiting actors to cpu threads (HawtDispatch-based event-driven) and/or balancing the workload (Work-stealing event-based).
Generally, there're 2 kinds of actors: those that are tied to threads (one thread per actor), and those that share 1+ thread, working behind a scheduler/dispatcher that allocates resources (= possibility to execute a task/handle incoming message against controlled thread-pool or a single thread).
I assume, you use second type of actors - event-driven actors, because you mention that you run 10k of them. No matter how many event-driven actors you have (thousands or millions), all of them will be fighting for the small thread pool to handle the message. Therefore, you will even have a worse performance dividing your task queue into that huge number of portions - scheduler will either try to handle messages sent to 10k actors against a fixed thread pool (which is slow), or will allocate new threads in the pool (if the pool is not bounded), which is dangerous (in the worst case, there will be started 10k threads to handle messages).
Event-driven actors are good for short-time (ideally, non-blocking) tasks. If you're dealing with CPU-intensive tasks I'd limit number of threads in the scheduler/dispatcher pool (when you use event-driven actors) or actors themselves (when you use thread-based actors) to the number of cores to achieve the best performance.
If you want this to be done automatically (adjust number of threads in dispatcher pool to the number of cores), you should use HawtDisaptch (or it's Akka implementation), as it was proposed earlier:
The 'HawtDispatcher' uses the
HawtDispatch threading library which
is a Java clone of libdispatch. All
actors with this type of dispatcher
are executed on a single system wide
fixed sized thread pool. The number of
of threads will match the number of
cores available on your system. The
dispatcher delivers messages to the
actors in the order that they were
producer at the sender.
You should look into Futures I think. In fact, you probably need a threadpool which simply queues threads when a max number of threads has been reached.
Here is a small example involving futures: http://blog.tackley.net/2010/01/scala-futures.html
I would also suggest that you don't pay too much attention to the context switching since you really can't do anything but rely on the underlying implementation. Of course a rule of thumb would be to keep the active threads around the number of physical cores, but as I noted above this could be handled by a threadpool with a fifo-queue.
NOTE that I don't know if Actors in general or futures are implemented with this kind of pool.
For thread pools, look at this: http://www.scala-lang.org/api/current/scala/concurrent/ThreadPoolRunner.html
and maybe this: http://www.scala-lang.org/api/current/scala/actors/scheduler/ResizableThreadPoolScheduler.html
Good luck
EDIT
Check out this piece of code using futures:
import scala.actors.Futures._
object FibFut {
def fib(i: Int): Int = if (i < 2) 1 else fib(i - 1) + fib(i - 2)
def main(args: Array[String]) {
val fibs = for (i <- 0 to 42) yield future { fib(i) }
for (future <- fibs) println(future())
}
}
It showcases a very good point about futures, namely that you decide in which order to receive the results (as opposed to the normal mailbox-system which employs a fifo-system i.e. the fastest actor sends his result first).
For any significant project, I generally have a supervisor actor, a collection of worker actors each of which can do any work necessary, and a large number of pieces of work to do. Even though I do this fairly often, I've never put it in a (personal) library because the operations end up being so different each time, and the overhead is pretty small compared to the whole coding project.
Be aware of actor starvation if you end up utilizing the general actor threadpool. I ended up simply using my own algorithm-task-owned threadpool to handle the parallelization of a long-running, concurrent task.
The upcoming Scala 2.9 is expected to include parallel data structures which should automatically handle this for some uses. While it does not use Actors, it may be something to consider for your problem.
While this feature was originally slated for 2.8, it has been postponed until the next major release.
A presentation from the last ScalaDays is here:
http://days2010.scala-lang.org/node/138/140

How to limit concurrency when using actors in Scala?

I'm coming from Java, where I'd submit Runnables to an ExecutorService backed by a thread pool. It's very clear in Java how to set limits to the size of the thread pool.
I'm interested in using Scala actors, but I'm unclear on how to limit concurrency.
Let's just say, hypothetically, that I'm creating a web service which accepts "jobs". A job is submitted with POST requests, and I want my service to enqueue the job then immediately return 202 Accepted — i.e. the jobs are handled asynchronously.
If I'm using actors to process the jobs in the queue, how can I limit the number of simultaneous jobs that are processed?
I can think of a few different ways to approach this; I'm wondering if there's a community best practice, or at least, some clearly established approaches that are somewhat standard in the Scala world.
One approach I've thought of is having a single coordinator actor which would manage the job queue and the job-processing actors; I suppose it could use a simple int field to track how many jobs are currently being processed. I'm sure there'd be some gotchyas with that approach, however, such as making sure to track when an error occurs so as to decrement the number. That's why I'm wondering if Scala already provides a simpler or more encapsulated approach to this.
BTW I tried to ask this question a while ago but I asked it badly.
Thanks!
I'd really encourage you to have a look at Akka, an alternative Actor implementation for Scala.
http://www.akkasource.org
Akka already has a JAX-RS[1] integration and you could use that in concert with a LoadBalancer[2] to throttle how many actions can be done in parallell:
[1] http://doc.akkasource.org/rest
[2] http://github.com/jboner/akka/blob/master/akka-patterns/src/main/scala/Patterns.scala
You can override the system properties actors.maxPoolSize and actors.corePoolSize which limit the size of the actor thread pool and then throw as many jobs at the pool as your actors can handle. Why do you think you need to throttle your reactions?
You really have two problems here.
The first is keeping the thread pool used by actors under control. That can be done by setting the system property actors.maxPoolSize.
The second is runaway growth in the number of tasks that have been submitted to the pool. You may or may not be concerned with this one, however it is fully possible to trigger failure conditions such as out of memory errors and in some cases potentially more subtle problems by generating too many tasks too fast.
Each worker thread maintains a dequeue of tasks. The dequeue is implemented as an array that the worker thread will dynamically enlarge up to some maximum size. In 2.7.x the queue can grow itself quite large and I've seen that trigger out of memory errors when combined with lots of concurrent threads. The max dequeue size is smaller 2.8. The dequeue can also fill up.
Addressing this problem requires you control how many tasks you generate, which probably means some sort of coordinator as you've outlined. I've encountered this problem when the actors that initiate a kind of data processing pipeline are much faster than ones later in the pipeline. In order control the process I usually have the actors later in the chain ping back actors earlier in the chain every X messages, and have the ones earlier in the chain stop after X messages and wait for the ping back. You could also do it with a more centralized coordinator.