Processing concurrently in Scala - scala

As in my own answer to my own question, I have the situation whereby I am processing a large number of events which arrive on a queue. Each event is handled in exactly the same manner and each even can be handled independently of all other events.
My program takes advantage of the Scala concurrency framework and many of the processes involved are modelled as Actors. As Actors process their messages sequentially, they are not well-suited to this particular problem (even though my other actors are performing actions which are sequential). As I want Scala to "control" all thread creation (which I assume is the point of it having a concurrency system in the first place) it seems I have 2 choices:
Send the events to a pool of event processors, which I control
get my Actor to process them concurrently by some other mechanism
I would have thought that #1 negates the point of using the actors subsystem: how many processor actors should I create? being one obvious question. These things are supposedly hidden from me and solved by the subsystem.
My answer was to do the following:
val eventProcessor = actor {
loop {
react {
case MyEvent(x) =>
//I want to be able to handle multiple events at the same time
//create a new actor to handle it
actor {
//processing code here
process(x)
}
}
}
}
Is there a better approach? Is this incorrect?
edit: A possibly better approach is:
val eventProcessor = actor {
loop {
react {
case MyEvent(x) =>
//Pass processing to the underlying ForkJoin framework
Scheduler.execute(process(e))
}
}
}

This seems like a duplicate of another question. So I'll duplicate my answer
Actors process one message at a time. The classic pattern to process multiple messages is to have one coordinator actor front for a pool of consumer actors. If you use react then the consumer pool can be large but will still only use a small number of JVM threads. Here's an example where I create a pool of 10 consumers and one coordinator to front for them.
import scala.actors.Actor
import scala.actors.Actor._
case class Request(sender : Actor, payload : String)
case class Ready(sender : Actor)
case class Result(result : String)
case object Stop
def consumer(n : Int) = actor {
loop {
react {
case Ready(sender) =>
sender ! Ready(self)
case Request(sender, payload) =>
println("request to consumer " + n + " with " + payload)
// some silly computation so the process takes awhile
val result = ((payload + payload + payload) map {case '0' => 'X'; case '1' => "-"; case c => c}).mkString
sender ! Result(result)
println("consumer " + n + " is done processing " + result )
case Stop => exit
}
}
}
// a pool of 10 consumers
val consumers = for (n <- 0 to 10) yield consumer(n)
val coordinator = actor {
loop {
react {
case msg # Request(sender, payload) =>
consumers foreach {_ ! Ready(self)}
react {
// send the request to the first available consumer
case Ready(consumer) => consumer ! msg
}
case Stop =>
consumers foreach {_ ! Stop}
exit
}
}
}
// a little test loop - note that it's not doing anything with the results or telling the coordinator to stop
for (i <- 0 to 1000) coordinator ! Request(self, i.toString)
This code tests to see which consumer is available and sends a request to that consumer. Alternatives are to just randomly assign to consumers or to use a round robin scheduler.
Depending on what you are doing, you might be better served with Scala's Futures. For instance, if you don't really need actors then all of the above machinery could be written as
import scala.actors.Futures._
def transform(payload : String) = {
val result = ((payload + payload + payload) map {case '0' => 'X'; case '1' => "-"; case c => c}).mkString
println("transformed " + payload + " to " + result )
result
}
val results = for (i <- 0 to 1000) yield future(transform(i.toString))

If the events can all be handled independently, why are they on a queue? Knowing nothing else about your design, this seems like an unnecessary step. If you could compose the process function with whatever is firing those events, you could potentially obviate the queue.
An actor essentially is a concurrent effect equipped with a queue. If you want to process multiple messages simultaneously, you don't really want an actor. You just want a function (Any => ()) to be scheduled for execution at some convenient time.
Having said that, your approach is reasonable if you want to stay within the actors library and if the event queue is not within your control.
Scalaz makes a distinction between Actors and concurrent Effects. While its Actor is very light-weight, scalaz.concurrent.Effect is lighter still. Here's your code roughly translated to the Scalaz library:
val eventProcessor = effect (x => process x)
This is with the latest trunk head, not yet released.

This sounds like a simple consumer/producer problem. I'd use a queue with a pool of consumers. You could probably write this with a few lines of code using java.util.concurrent.

The purpose of an actor (well, one of them) is to ensure that the state within the actor can only be accessed by a single thread at a time. If the processing of a message doesn't depend on any mutable state within the actor, then it would probably be more appropriate to just submit a task to a scheduler or a thread pool to process. The extra abstraction that the actor provides is actually getting in your way.
There are convenient methods in scala.actors.Scheduler for this, or you could use an Executor from java.util.concurrent.

Actors are much more lightweight than threads, and as such one other option is to use actor objects like Runnable objects you are used to submitting to a Thread Pool. The main difference is you do not need to worry about the ThreadPool - the thread pool is managed for you by the actor framework and is mostly a configuration concern.
def submit(e: MyEvent) = actor {
// no loop - the actor exits immediately after processing the first message
react {
case MyEvent(x) =>
process(x)
}
} ! e // immediately send the new actor a message
Then to submit a message, say this:
submit(new MyEvent(x))
, which corresponds to
eventProcessor ! new MyEvent(x)
from your question.
Tested this pattern successfully with 1 million messages sent and received in about 10 seconds on a quad-core i7 laptop.
Hope this helps.

Related

Why can't Actors complete all work although I created 10000 of them?

I create 10000 actors and send a message to each, but it seems that the akka system can't complete all the work.
when I check the thread state, they are all in TIMED_WATIING.
My code:
class Reciver extends Actor {
val log = Logging(context.system, this)
var num = 0
def receive = {
case a: Int => log.info(s"[${self.path}] receive $a, num is $num")
Thread.sleep(2000)
log.info(s"[${self.path}] processing $a, num is $num")
num = a
}
}
object ActorSyncOrAsync extends App {
val system = ActorSystem("mysys")
for (i <- 0 to 10000) {
val actor = system.actorOf(Props[Reciver])
actor ! i
}
println("main thread send request complete")
}
You should remove Thread.sleep or (if you're using default thread-pool) surround it with:
scala.concurrent.blocking {
Thread.sleep(2000)
}
scala.concurrent.blocking marks the computation to have a managed blocking, which means that it tells the pool that computation is not taking CPU resources but just waits for some result or timeout. You should be careful with this however. So, basically, this advice works if you're using Thread.sleep for debugging purposes or just to emulate some activity - no Thread.sleep (even surrounded by blocking) should take place in production code.
Explanation:
When some fixed pool is used (including fork-join as it doesn't steal work from threads blocked by Thread.sleep) - there is only POOL_SIZE threads (it equals to the number of cores in your system by default) is used for computation. Everything else is going to be queued.
So, let's say 4 cores, 2 seconds per task, 10000 tasks - it's gonna take 2*10000/4 = 5000 seconds.
The general advice is to not block (including Thread.sleep) inside your actors: Blocking needs careful management. If you need to delay some action it's better to use Scheduler (as #Lukasz mentioned): http://doc.akka.io/docs/akka/2.4.4/scala/scheduler.html

How to run futures on the current actor's dispatcher in Akka

Akka's documentation warns:
When using future callbacks, such as onComplete, onSuccess, and onFailure, inside actors you need to carefully avoid closing over the containing actor’s reference, i.e. do not call methods or access mutable state on the enclosing actor from within the callback
It seems to me that if I could get the Future which wants to access the mutable state to run on the same dispatcher that arranges for mutual exclusion of threads handling actor messages then this issue could be avoided. Is that possible? (Why not?)
The ExecutionContext provided by context.dispatcher is not tied to the actor messages dispatcher, but what if it were? i.e.
class MyActorWithSafeFutures {
implicit def safeDispatcher = context.dispatcherOnMessageThread
var successCount = 0
var failureCount = 0
override def receive: Receive = {
case MakeExternalRequest(req) =>
val response: Future[Response] = someClient.makeRequest(req)
response.onComplete {
case Success(_) => successCount += 1
case Failure(_) => failureCount += 1
}
response pipeTo sender()
}
}
}
Is there any way to do that in Akka?
(I know that I could convert the above example to do something like self ! IncrementSuccess, but this question is about mutating actor state from Futures, rather than via messages.)
It looks like I might be able to implement this myself, using code like the following:
class MyActorWithSafeFutures {
implicit val executionContext: ExecutionContextExecutor = new ExecutionContextExecutor {
override def execute(runnable: Runnable): Unit = {
self ! runnable
}
override def reportFailure(cause: Throwable): Unit = {
throw new Error("Unhandled throwable", cause)
}
}
override def receive: Receive = {
case runnable: Runnable => runnable.run()
... other cases here
}
}
Would that work? Why doesn't Akka offer that - is there some huge drawback I'm not seeing?
(See https://github.com/jducoeur/Requester for a library which does just this in a limited way -- for Asks only, not for all Future callbacks.)
Your actor is executing its receive under one of the dispatcher's threads, and you want to spin off a Future that's firmly attached to this particular thread? In that case the system can't reuse this thread to run a different actor, because that would mean the thread was unavailable when you wanted to execute the Future. If it happened to use that same thread to execute someClient, you might deadlock with yourself. So this thread can no longer be used freely to run other actors - it has to belong to MySafeActor.
And no other threads can be allowed to freely run MySafeActor - if they were, two different threads might try to update successCount at the same time and you'd lose data (e.g. if the value is 0 and two threads both try to do successCount +=1, the value can end up as 1 rather that 2). So to do this safely, MySafeActor has to have a single Thread that's used for itself and its Future. So you end up with MySafeActor and that Future being tightly, but invisibly, coupled. The two can't run at the same time and could deadlock against each other. (It's still possible for a badly-written actor to deadlock against itself, but the fact that all the code using that actor's "imaginary mutex" is in a single place makes it easier to see potential problems).
You could use traditional multithreading techniques - mutexes and the like - to allow the Future and MySafeActor to run concurrently. But what you really want is to encapsulate successCount in something that can be used concurrently but safely - some kind of... Actor?
TL;DR: either the Future and the Actor: 1) may not run concurrently, in which case you may deadlock 2) may run concurrently, in which case you will corrupt data 3) access state in a concurrency-safe way, in which case you're reimplementing Actors.
You could use a PinnedDispatcher for your MyActorWithSafeFutures actor class which would create a thread pool with exactly one thread for each instance of the given class, and use context.dispatcher as execution context for your Future.
To do this you have to put something like this in your application.conf:
akka {
...
}
my-pinned-dispatcher {
executor = "thread-pool-executor"
type = PinnedDispatcher
}
and to create your actor:
actorSystem.actorOf(
Props(
classOf[MyActorWithSafeFutures]
).withDispatcher("my-pinned-dispatcher"),
"myActorWithSafeFutures"
)
Although what you are trying to achieve breaks completely the purpose of the actor model. The actor state should be encapsulated, and actor state changes should be driven by incoming messages.
This does not answer your question directly, but rather offers an alternative solution using Akka Agents:
class MyActorWithSafeFutures extends Actor {
var successCount = Agent(0)
var failureCount = Agent(0)
def doSomethingWithPossiblyStaleCounts() = {
val (s, f) = (successCount.get(), failureCount.get())
statisticsCollector ! Ratio(f/s+f)
}
def doSomethingWithCurrentCounts() = {
val (successF, failureF) = (successCount.future(), failureCount.future())
val ratio : Future[Ratio] = for {
s <- successF
f <- failureF
} yield Ratio(f/s+f)
ratio pipeTo statisticsCollector
}
override def receive: Receive = {
case MakeExternalRequest(req) =>
val response: Future[Response] = someClient.makeRequest(req)
response.onComplete {
case Success(_) => successCount.send(_ + 1)
case Failure(_) => failureCount.send(_ + 1)
}
response pipeTo sender()
}
}
The catch is that if you want to operate on the counts that would result if you were using #volatile, then you need to operate inside a Future, see doSomethingWithCurrentCounts().
If you are fine with having values which are eventually consistent (there might be pending updates scheduled for the Agents), then something like doSometinghWithPossiblyStaleCounts() is fine.
#rkuhn explains why this would be a bad idea on the akka-user list:
My main consideration here is that such a dispatcher would make it very convenient to have multiple concurrent entry points into the Actor’s behavior, where with the current recommendation there is only one—the active behavior. While classical data races are excluded by the synchronization afforded by the proposed ExecutionContext, it would still allow higher-level races by suspending a logical thread and not controlling the intermediate execution of other messages. In a nutshell, I don’t think this would make the Actor easier to reason about, quite the opposite.

akka how to to launch a master task and block on it finishing?

I'm trying to get started with akka in scala. In the main scala thread I'd like to start an akka actor, send one message to it, and block until that actor terminates. What is the best way to do this?
For example I have a test actor that just repeatedly sends messages to itself:
class Incrementer() extends Actor {
val maxMessages = 5
var counter = 0
def receive() = {
case DoIncr() => {
if (counter < maxMessages) {
counter += 1
self ! DoIncr()
} else {
self.stop()
}
}
}
}
and it is invoked via:
val inc = actorOf(new Incrementer()).start()
val result = inc !! DoIncr()
println(result) // this should block this thread, but it doesn't seem to.
// do other stuff
That block takes just over 5,000 ms to execute instead of what I expect to be a few ms, so it seems to have to do with a default future timeout - and the program does not actually terminate. All I'm really trying to do is time the performance of sending x number of messages. What's going on here?
As Viktor mentioned, in order for !! to terminate successfully, you have to reply to the message. The 5 second delay you are seeing the actor's default timeout, which is configurable. More info can be found on the Akka site.
If you use forward to send the message instead of !, then self.reply will respond to the original sender.
The first message you send to an Akka actor performs some setup that doesn't happen when processing other messages. Be sure to take that into account for your timings.
Corrected code would be:
import akka.actor._
object DoIncr
class Incrementer extends Actor {
val maxMessages = 5
var counter = 0
def receive = {
case DoIncr =>
if (counter < maxMessages) {
counter += 1
self forward DoIncr
} else {
self.reply(()) // replying with () since we have nothing better to say
self.stop()
}
}
}
Aside: I made a few other changes to get your code in line with idiomatic Scala. Your code works without these changes, but it now looks like more typical Scala code.
Case classes without parameter lists have been deprecated. Use objects instead.
If you have a class without a parameter list, you can omit the parenthesis
Actor's receive method does not have parens; your implementing class shouldn't have them either.
It's purely a matter of style, but the body of a case statement does not require braces.

Should my Scala actors' properties be marked #volatile?

In Scala, if I have a simple class as follows:
val calc = actor {
var sum = 0
loop {
react {
case Add(n) =>
sum += n
case RequestSum =>
sender ! sum
}
}
}
Should my field sum be marked #volatile? Whilst the actor is logically single-threaded (i.e. the messages are processed sequentially), the individual reactions may be happening on separate threads and hence the state variable may be being altered on one thread and then read from another.
You don't need to mark them as volatile. The execution of your code isn't inside a synchronized block, but the actor will always pass through one before your code is invoked, thus forcing memory into a consistent state across threads.

Sleeping actors?

What's the best way to have an actor sleep? I have actors set up as agents which want to maintain different parts of a database (including getting data from external sources). For a number of reasons (including not overloading the database or communications and general load issues), I want the actors to sleep between each operation. I'm looking at something like 10 actor objects.
The actors will run pretty much infinitely, as there will always be new data coming in, or sitting in a table waiting to be propagated to other parts of the database etc. The idea is for the database to be as complete as possible at any point in time.
I could do this with an infinite loop, and a sleep at the end of each loop, but according to http://www.scala-lang.org/node/242 actors use a thread pool which is expanded whenever all threads are blocked. So I imagine a Thread.sleep in each actor would be a bad idea as would waste threads unnecessarily.
I could perhaps have a central actor with its own loop that sends messages to subscribers on a clock (like async event clock observers)?
Has anyone done anything similar or have any suggestions? Sorry for extra (perhaps superfluous) information.
Cheers
Joe
There was a good point to Erlang in the first answer, but it seems disappeared. You can do the same Erlang-like trick with Scala actors easily. E.g. let's create a scheduler that does not use threads:
import actors.{Actor,TIMEOUT}
def scheduler(time: Long)(f: => Unit) = {
def fixedRateLoop {
Actor.reactWithin(time) {
case TIMEOUT => f; fixedRateLoop
case 'stop =>
}
}
Actor.actor(fixedRateLoop)
}
And let's test it (I did it right in Scala REPL) using a test client actor:
case class Ping(t: Long)
import Actor._
val test = actor { loop {
receiveWithin(3000) {
case Ping(t) => println(t/1000)
case TIMEOUT => println("TIMEOUT")
case 'stop => exit
}
} }
Run the scheduler:
import compat.Platform.currentTime
val sched = scheduler(2000) { test ! Ping(currentTime) }
and you will see something like this
scala> 1249383399
1249383401
1249383403
1249383405
1249383407
which means our scheduler sends a message every 2 seconds as expected. Let's stop the scheduler:
sched ! 'stop
the test client will begin to report timeouts:
scala> TIMEOUT
TIMEOUT
TIMEOUT
stop it as well:
test ! 'stop
There's no need to explicitly cause an actor to sleep: using loop and react for each actor means that the underlying thread pool will have waiting threads whilst there are no messages for the actors to process.
In the case that you want to schedule events for your actors to process, this is pretty easy using a single-threaded scheduler from the java.util.concurrent utilities:
object Scheduler {
import java.util.concurrent.Executors
import scala.compat.Platform
import java.util.concurrent.TimeUnit
private lazy val sched = Executors.newSingleThreadScheduledExecutor();
def schedule(f: => Unit, time: Long) {
sched.schedule(new Runnable {
def run = f
}, time , TimeUnit.MILLISECONDS);
}
}
You could extend this to take periodic tasks and it might be used thus:
val execTime = //...
Scheduler.schedule( { Actor.actor { target ! message }; () }, execTime)
Your target actor will then simply need to implement an appropriate react loop to process the given message. There is no need for you to have any actor sleep.
ActorPing (Apache License) from lift-util has schedule and scheduleAtFixedRate Source: ActorPing.scala
From scaladoc:
The ActorPing object schedules an actor to be ping-ed with a given message at specific intervals. The schedule methods return a ScheduledFuture object which can be cancelled if necessary
There unfortunately are two errors in the answer of oxbow_lakes.
One is a simple declaration mistake (long time vs time: Long), but the second is some more subtle.
oxbow_lakes declares run as
def run = actors.Scheduler.execute(f)
This however leads to messages disappearing from time to time. That is: they are scheduled but get never send. Declaring run as
def run = f
fixed it for me. It's done the exact way in the ActorPing of lift-util.
The whole scheduler code becomes:
object Scheduler {
private lazy val sched = Executors.newSingleThreadedScheduledExecutor();
def schedule(f: => Unit, time: Long) {
sched.schedule(new Runnable {
def run = f
}, time - Platform.currentTime, TimeUnit.MILLISECONDS);
}
}
I tried to edit oxbow_lakes post, but could not save it (broken?), not do I have rights to comment, yet. Therefore a new post.