Incremental processing in an akka actor - scala

I have actors that need to do very long-running and computationally expensive work, but the computation itself can be done incrementally. So while the complete computation itself takes hours to complete, the intermediate results are actually extremely useful, and I'd like to be able to respond to any requests of them. This is the pseudo code of what I want to do:
var intermediateResult = ...
loop {
while (mailbox.isEmpty && computationNotFinished)
intermediateResult = computationStep(intermediateResult)
receive {
case GetCurrentResult => sender ! intermediateResult
...other messages...
}
}

The best way to do this is very close to what you are doing already:
case class Continue(todo: ToDo)
class Worker extends Actor {
var state: IntermediateState = _
def receive = {
case Work(x) =>
val (next, todo) = calc(state, x)
state = next
self ! Continue(todo)
case Continue(todo) if todo.isEmpty => // done
case Continue(todo) =>
val (next, rest) = calc(state, todo)
state = next
self ! Continue(rest)
}
def calc(state: IntermediateState, todo: ToDo): (IntermediateState, ToDo)
}
EDIT: more background
When an actor sends messages to itself, Akka’s internal processing will basically run those within a while loop; the number of messages processed in one go is determined by the actor’s dispatcher’s throughput setting (defaults to 5), after this amount of processing the thread will be returned to the pool and the continuation be enqueued to the dispatcher as a new task. Hence there are two tunables in the above solution:
process multiple steps for a single message (if processing steps are REALLY small)
increase throughput setting for increased throughput and decreased fairness
The original problem seems to have hundreds of such actors running, presumably on common hardware which does not have hundreds of CPUs, so the throughput setting should probably be set such that each batch takes no longer than ca. 10ms.
Performance Assessment
Let’s play a bit with Fibonacci:
Welcome to Scala version 2.10.0-RC1 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_07).
Type in expressions to have them evaluated.
Type :help for more information.
scala> def fib(x1: BigInt, x2: BigInt, steps: Int): (BigInt, BigInt) = if(steps>0) fib(x2, x1+x2, steps-1) else (x1, x2)
fib: (x1: BigInt, x2: BigInt, steps: Int)(BigInt, BigInt)
scala> def time(code: =>Unit) { val start = System.currentTimeMillis; code; println("code took " + (System.currentTimeMillis - start) + "ms") }
time: (code: => Unit)Unit
scala> time(fib(1, 1, 1000))
code took 1ms
scala> time(fib(1, 1, 1000))
code took 1ms
scala> time(fib(1, 1, 10000))
code took 5ms
scala> time(fib(1, 1, 100000))
code took 455ms
scala> time(fib(1, 1, 1000000))
code took 17172ms
Which means that in a presumably quite optimized loop, fib_100000 takes half a second. Now let’s play a bit with actors:
scala> case class Cont(steps: Int, batch: Int)
defined class Cont
scala> val me = inbox()
me: akka.actor.ActorDSL.Inbox = akka.actor.dsl.Inbox$Inbox#32c0fe13
scala> val a = actor(new Act {
var s: (BigInt, BigInt) = _
become {
case Cont(x, y) if y < 0 => s = (1, 1); self ! Cont(x, -y)
case Cont(x, y) if x > 0 => s = fib(s._1, s._2, y); self ! Cont(x - 1, y)
case _: Cont => me.receiver ! s
}
})
a: akka.actor.ActorRef = Actor[akka://repl/user/$c]
scala> time{a ! Cont(1000, -1); me.receive(10 seconds)}
code took 4ms
scala> time{a ! Cont(10000, -1); me.receive(10 seconds)}
code took 27ms
scala> time{a ! Cont(100000, -1); me.receive(10 seconds)}
code took 632ms
scala> time{a ! Cont(1000000, -1); me.receive(30 seconds)}
code took 17936ms
This is already interesting: given long enough time per step (with the huge BigInts behind the scenes in the last line), actors don’t much extra. Now let’s see what happens when doing smaller calculations in a more batched way:
scala> time{a ! Cont(10000, -10); me.receive(30 seconds)}
code took 462ms
This is pretty close to the result for the direct variant above.
Conclusion
Sending messages to self is NOT expensive for almost all applications, just keep the actual processing step slightly larger than a few hundred nanoseconds.

I assume from your comment to Roland Kuhn answer that you have some work which can be considered as recursive, at least in blocks. If this is not the case, I don't think there could be any clean solution to handle your problem and you will have to deal with complicated pattern matching blocks.
If my assumptions are correct, I would schedule the computation asynchronously and let the actor be free to answer other messages. The key point is to use Future monadic capabilities and having a simple receive block. You would have to handle three messages (startComputation, changeState, getState)
You will end up having the following receive:
def receive {
case StartComputation(myData) =>expensiveStuff(myData)
case ChangeState(newstate) = this.state = newstate
case GetState => sender ! this.state
}
And then you can leverage the map method on Future, by defining your own recursive map:
def mapRecursive[A](f:Future[A], handler: A => A, exitConditions: A => Boolean):Future[A] = {
f.flatMap { a=>
if (exitConditions(a))
f
else {
val newFuture = f.flatMap{ a=> Future(handler(a))}
mapRecursive(newFuture,handler,exitConditions)
}
}
}
Once you have this tool, everything is easier. If you look to the following example :
def main(args:Array[String]){
val baseFuture:Future[Int] = Promise.successful(64)
val newFuture:Future[Int] = mapRecursive(baseFuture,
(a:Int) => {
val result = a/2
println("Additional step done: the current a is " + result)
result
}, (a:Int) => (a<=1))
val one = Await.result(newFuture,Duration.Inf)
println("Computation finished, result = " + one)
}
Its output is:
Additional step done: the current a is 32
Additional step done: the current a is 16
Additional step done: the current a is 8
Additional step done: the current a is 4
Additional step done: the current a is 2
Additional step done: the current a is 1
Computation finished, result = 1
You understand you can do the same, inside your expensiveStuffmethod
def expensiveStuff(myData:MyData):Future[MyData]= {
val firstResult = Promise.successful(myData)
val handler : MyData => MyData = (myData) => {
val result = myData.copy(myData.value/2)
self ! ChangeState(result)
result
}
val exitCondition : MyData => Boolean = (myData:MyData) => myData.value==1
mapRecursive(firstResult,handler,exitCondition)
}
EDIT - MORE DETAILED
If you don't want to block the Actor, which processes messages from its mailbox in a thread-safe and synchronous manner, the only thing you can do is to get the computation executed on a different thread. This is exactly an high performance non blocking receive.
However, you were right in saying that the approach I propose pays a high performance penalty. Every step is done on a different future, which might be not necessary at all. You can therefore recurse the handler to obtain a single-threaded or multiple-threaded execution. There is no magic formula after all:
If you want to schedule asynchronously and minimize the cost, all the work should be done by a single thread
This however could prevent other work to start, because if all the threads on a thread pool are taken, the futures will queue. You might therefore want to break the operation into multiple futures so that even at full usage some new work can be scheduled before old work has been completed.
def recurseFuture[A](entryFuture: Future[A], handler: A => A, exitCondition: A => Boolean, maxNestedRecursion: Long = Long.MaxValue): Future[A] = {
def recurse(a:A, handler: A => A, exitCondition: A => Boolean, maxNestedRecursion: Long, currentStep: Long): Future[A] = {
if (exitCondition(a))
Promise.successful(a)
else
if (currentStep==maxNestedRecursion)
Promise.successful(handler(a)).flatMap(a => recurse(a,handler,exitCondition,maxNestedRecursion,0))
else{
recurse(handler(a),handler,exitCondition,maxNestedRecursion,currentStep+1)
}
}
entryFuture.flatMap { a => recurse(a,handler,exitCondition,maxNestedRecursion,0) }
}
I have enhanced for testing purposes my handler method:
val handler: Int => Int = (a: Int) => {
val result = a / 2
println("Additional step done: the current a is " + result + " on thread " + Thread.currentThread().getName)
result
}
Approach 1: Recurse the handler on itself so to get all execute on a single thread.
println("Starting strategy with all the steps on the same thread")
val deepestRecursion: Future[Int] = recurseFuture(baseFuture,handler, exitCondition)
Await.result(deepestRecursion, Duration.Inf)
println("Completed strategy with all the steps on the same thread")
println("")
Approach 2: Recurse for a limited depth the handler on itself
println("Starting strategy with the steps grouped by three")
val threeStepsInSameFuture: Future[Int] = recurseFuture(baseFuture,handler, exitCondition,3)
val threeStepsInSameFuture2: Future[Int] = recurseFuture(baseFuture,handler, exitCondition,4)
Await.result(threeStepsInSameFuture, Duration.Inf)
Await.result(threeStepsInSameFuture2, Duration.Inf)
println("Completed strategy with all the steps grouped by three")
executorService.shutdown()

You should not use Actors to make long running computations as these will block the threads that are supposed to run the Actors code.
I would try to go with a design that uses a separate Thread/ThreadPool for the computations and use AtomicReferences to store/query the intermediate results in the lines of the following pseudo code:
val cancelled = new AtomicBoolean(false)
val intermediateResult = new AtomicReference[IntermediateResult]()
object WorkerThread extends Thread {
override def run {
while(!cancelled.get) {
intermediateResult.set(computationStep(intermediateResult.get))
}
}
}
loop {
react {
case StartComputation => WorkerThread.start()
case CancelComputation => cancelled.set(true)
case GetCurrentResult => sender ! intermediateResult.get
}
}

This is a classic concurrency problem. You want want several routines/actors (or whatever you want to call them). Code is mostly correct Go, with obscenely long variable names for context. The first routine handles queries and intermediate results:
func serveIntermediateResults(
computationChannel chan *IntermediateResult,
queryChannel chan chan<-*IntermediateResult) {
var latestIntermediateResult *IntermediateResult // initial result
for {
select {
// an update arrives
case latestIntermediateResult, notClosed := <-computationChannel:
if !notClosed {
// the computation has finished, stop checking
computationChannel = nil
}
// a query arrived
case queryResponseChannel, notClosed := <-queryChannel:
if !notClosed {
// no more queries, so we're done
return
}
// respond with the latest result
queryResponseChannel<-latestIntermediateResult
}
}
}
In your long computation, you update your intermediate result wherever appropriate:
func longComputation(intermediateResultChannel chan *IntermediateResult) {
for notFinished {
// lots of stuff
intermediateResultChannel<-currentResult
}
close(intermediateResultChannel)
}
Finally to ask for the current result, you have a wrapper to make this nice:
func getCurrentResult() *IntermediateResult {
responseChannel := make(chan *IntermediateResult)
// queryChannel was given to the intermediate result server routine earlier
queryChannel<-responseChannel
return <-responseChannel
}

Related

Why do async calculations slow down the program?

I'm reading Akka cookbook and found it interesting to improve the function performance in one sample.
I have the next client object:
object HelloAkkaActorSystem extends App {
implicit val timeout = Timeout(50 seconds)
val actorSystem = ActorSystem("HelloAkka")
val actor = actorSystem.actorOf(Props[FibonacciActor])
// asking for result from actor
val future = (actor ? 6).mapTo[Int]
val st = System.nanoTime()
val fiboacciNumber = Await.result(future, 60 seconds)
println("Elapsed time: " + (System.nanoTime() - st) / math.pow(10, 6))
println(fiboacciNumber)
}
And two implementations of actor class.
First:
class FibonacciActor extends Actor {
override def receive: Receive = {
case num : Int =>
val fibonacciNumber = fib(num)
sender ! fibonacciNumber
}
def fib( n : Int) : Int = n match {
case 0 | 1 => n
case _ => fib( n-1 ) + fib( n-2 )
}
}
Second:
class FibonacciActor extends Actor {
override def receive: PartialFunction[Any, Unit] = {
case num : Int =>
val fibonacciNumber = fib(num)
val s = sender()
fibonacciNumber.onComplete {
case Success(x) => s ! x
case Failure(e) => s ! -1
}
}
def fib( n : Int) : Future[Int] = n match {
case 0 | 1 => Future{ n }
case _ =>
fib( n-1 ).flatMap(n_1 =>
fib( n-2 ).map(n_2 =>
n_1 + n_2))
}
}
At my machine First variant executed in 0.12 ms, Second in 360 ms. So the Second is 300 times slower. Using htop I found that First variant use 1 core against all 4 in second case.
Is it so because too many async tasks generated? And how to speed up fib(n: Int) method?
Firstly, your benchmark results there are not likely to have any reflection on reality. The JVM is probably spending more time JITing your code that actually running it in that example. Here's a good SO post on creating a microbenchmark:
How do I write a correct micro-benchmark in Java?
Generally, in Java, you need to do things about 10000 times over as a warmup just to ensure all the JIT compiling has happened (because the JVM will analyse your code as you run it, and when it works out that a method is called a lot, it stops the world, and compiles it to machine code instead of executing it. In the above benchmark, very little of the code will have been compiled to machine code, it will be mostly interpretted, which means it will run really slow, plus some of it may be detected as hotspots and so you get this stop the world, compile, start again, which makes it run even slower. That's why you should be running it in a loop thousands of times over to ensure all that is done, before you actually start timing anything.
Secondly, to answer your question, in your example there, you are dispatching each execution of fib (not the whole operation, but each iteration of executing fib), a method that is only going to take a few nanoseconds to run, to a thread pool. The overhead of dispatching something to a thread pool is several microseconds, and the thing you're sending the thread pool to do is only taking a few nanoseconds to execute, so you're paying 1000x the cost of your operation to do it "asynchronously". You should only dispatch computationally expensive operations to a thread pool, for example operations that will take seconds to run, it makes no sense to dispatch very small operations to a thread pool.

How to page REST calls in a future with Dispatch and Scala

I use Scala and Dispatch to get JSON from a paged REST API. The reason I use futures with Dispatch here is because I want to execute the calls to fetchIssuesByFile in parallel, because that function could result in many REST calls for every lookupId (1x findComponentKeyByRest, n x fetchIssuePage, where n is the number of pages yielded by the REST API).
Here's my code so far:
def fetchIssuePage(componentKey: String, pageIndex: Int): Future[json4s.JValue]
def extractPagingInfo(json: org.json4s.JValue): PagingInfo
def extractIssues(json: org.json4s.JValue): Seq[Issue]
def findAllIssuesByRest(componentKey: String): Future[Seq[Issue]] = {
Future {
var paging = PagingInfo(pageIndex = 0, pages = 0)
val allIssues = mutable.ListBuffer[Issue]()
do {
fetchIssuePage(componentKey, paging.pageIndex) onComplete {
case Success(json) =>
allIssues ++= extractIssues(json)
paging = extractPagingInfo(json)
case _ => //TODO error handling
}
} while (paging.pageIndex < paging.pages)
allIssues // (1)
}
}
def findComponentKeyByRest(lookupId: String): Future[Option[String]]
def fetchIssuesByFile(lookupId: String): Future[(String, Option[Seq[Issue]])] =
findComponentKeyByRest(lookupId).flatMap {
case Some(key) => findAllIssuesByRest(key).map(
issues => // (2) process issues
)
case None => //TODO error handling
}
The actual problem is that I never get the collected issues from findAllIssuesByRest (1) (i.e., the sequence of issues is always empty) when I try to process them at (2). Any ideas? Also, the pagination code isn't very functional, so I'm also open to ideas on how to improve this with Scala.
Any help is much appreciated.
Thanks,
Michael
I think you could do something like:
def findAllIssuesByRest(componentKey: String): Future[Seq[Issue]] =
// fetch first page
fetchIssuePage(componentKey, 0).flatMap { json =>
val nbPages = extractPagingInfo(json).pages // get the total number of pages
val firstIssues = extractIssues(json) // the first issues
// get the other pages
Future.sequence((1 to nbPages).map(page => fetchIssuePage(componentKey, page)))
// get the issues from the other pages
.map(pages => pages.map(extractIssues))
// combine first issues with other issues
.flatMap(issues => (firstIssues +: issues).flatten)
}
That's because fetchIssuePage returns a Future that the code doesn't await the result for.
Solution would be to build up a Seq of Futures from the fetchIssuePage calls. Then Future.sequence the Seq to produce a single future. Return this instead. This future will fire when they're all complete, ready for your flatMap code.
Update: Although Michael understood the above well (see comments), I thought I'd put in a much simplified code for the benefit of other readers, just to illustrate the point:
def fetch(n: Int): Future[Int] = Future { n+1 }
val fetches = Seq(1, 2, 3).map(fetch)
// a simplified version of your while loop, just to illustrate the point
Future.sequence(fetches).flatMap(results => ...)
// where results will be Seq[Int] - i.e., 2, 3, 4

What are some opportunities improve performance / concurrency in the following Scala + Akka code?

I am looking for opportunities to increase concurrency and performance in my Scala 2.9 / Akka 2.0 RC2 code. Given the following code:
import akka.actor._
case class DataDelivery(data:Double)
class ComputeActor extends Actor {
var buffer = scala.collection.mutable.ArrayBuffer[Double]()
val functionsToCompute = List("f1","f2","f3","f4","f5")
var functionMap = scala.collection.mutable.LinkedHashMap[String,(Map[String,Any]) => Double]()
functionMap += {"f1" -> f1}
functionMap += {"f2" -> f2}
functionMap += {"f3" -> f3}
functionMap += {"f4" -> f4}
functionMap += {"f5" -> f5}
def updateData(data:Double):scala.collection.mutable.ArrayBuffer[Double] = {
buffer += data
buffer
}
def f1(map:Map[String,Any]):Double = {
// println("hello from f1")
0.0
}
def f2(map:Map[String,Any]):Double = {
// println("hello from f2")
0.0
}
def f3(map:Map[String,Any]):Double = {
// println("hello from f3")
0.0
}
def f4(map:Map[String,Any]):Double = {
// println("hello from f4")
0.0
}
def f5(map:Map[String,Any]):Double = {
// println("hello from f5")
0.0
}
def computeValues(immutableBuffer:IndexedSeq[Double]):Map[String,Double] = {
var map = Map[String,Double]()
try {
functionsToCompute.foreach(function => {
val value = functionMap(function)
function match {
case "f1" =>
var v = value(Map("lookback"->10,"buffer"->immutableBuffer,"parm1"->0.0))
map += {function -> v}
case "f2" =>
var v = value(Map("lookback"->20,"buffer"->immutableBuffer))
map += {function -> v}
case "f3" =>
var v = value(Map("lookback"->30,"buffer"->immutableBuffer,"parm1"->1.0,"parm2"->false))
map += {function -> v}
case "f4" =>
var v = value(Map("lookback"->40,"buffer"->immutableBuffer))
map += {function -> v}
case "f5" =>
var v = value(Map("buffer"->immutableBuffer))
map += {function -> v}
case _ =>
println(this.unhandled())
}
})
} catch {
case ex: Exception =>
ex.printStackTrace()
}
map
}
def receive = {
case DataDelivery(data) =>
val startTime = System.nanoTime()/1000
val answers = computeValues(updateData(data))
val endTime = System.nanoTime()/1000
val elapsedTime = endTime - startTime
println("elapsed time is " + elapsedTime)
// reply or forward
case msg =>
println("msg is " + msg)
}
}
object Test {
def main(args:Array[String]) {
val system = ActorSystem("actorSystem")
val computeActor = system.actorOf(Props(new ComputeActor),"computeActor")
var i = 0
while (i < 1000) {
computeActor ! DataDelivery(i.toDouble)
i += 1
}
}
}
When I run this the output (converted to microseconds) is
elapsed time is 4898
elapsed time is 184
elapsed time is 144
.
.
.
elapsed time is 109
elapsed time is 103
You can see the JVM's incremental compiler kicking in.
I thought that one quick win might be to change
functionsToCompute.foreach(function => {
to
functionsToCompute.par.foreach(function => {
but this results in the following elapsed times
elapsed time is 31689
elapsed time is 4874
elapsed time is 622
.
.
.
elapsed time is 698
elapsed time is 2171
Some info:
1) I'm running this on a Macbook Pro with 2 cores.
2) In the full version, the functions are long running operations that loop over portions of the mutable shared buffer. This doesn't appear to be a problem since retrieving messages from the actor's mailbox is controlling the flow, but I suspect it could be an issue with increased concurrency. This is why I've converted to an IndexedSeq.
3) In the full version, the functionsToCompute list may vary, so that not all items in the functionMap are necessarily called (i.e.) functionMap.size may be much larger than functionsToCompute.size
4) The functions can be computed in parallel, but the resultant map must be complete before returning
Some questions:
1) What can I do to make the parallel version run faster?
2) Where would it make sense to add non-blocking and blocking futures?
3) Where would it make sense to forward computation to another actor?
4) What are some opportunities for increasing immutability/safety?
Thanks,
Bruce
Providing an example, as requested (sorry about the delay... I don't have notifications on for SO).
There's a great example in the Akka documentation Section on 'Composing Futures' but I'll give you something a little more tailored to your situation.
Now, after reading this, please take some time to read through the tutorials and docs on Akka's website. You're missing a lot of key information that those docs will provide for you.
import akka.dispatch.{Await, Future, ExecutionContext}
import akka.util.duration._
import java.util.concurrent.Executors
object Main {
// This just makes the example work. You probably have enough context
// set up already to not need these next two lines
val pool = Executors.newCachedThreadPool()
implicit val ec = ExecutionContext.fromExecutorService(pool)
// I'm simulating your function. It just has to return a tuple, I believe
// with a String and a Double
def theFunction(s: String, d: Double) = (s, d)
def main(args: Array[String]) {
// Here we run your functions - I'm just doing a thousand of them
// for fun. You do what yo need to do
val listOfFutures = (1 to 1000) map { i =>
// Run them in parallel in the future
Future {
theFunction(i.toString, i.toDouble)
}
}
// These lines can be composed better, but breaking them up should
// be more illustrative.
//
// Turn the list of Futures (i.e. Seq[Future[(String, Double)]]) into a
// Future with a sequence of results (i.e. Future[Seq[(String, Double)]])
val futureOfResults = Future.sequence(listOfFutures)
// Convert that future into another future that contains a map instead
// instead of a sequence
val intermediate = futureOfResults map { _.toList.toMap }
// Wait for it complete. Ideally you don't do this. Continue to
// transform the future into other forms or use pipeTo() to get it to go
// as a result to some other Actor. "Await" is really just evil... the
// only place you should really use it is in silly programs like this or
// some other special purpose app.
val resultingMap = Await.result(intermediate, 1 second)
println(resultingMap)
// Again, just to make the example work
pool.shutdown()
}
}
All you need in your classpath to get this running is the akka-actor jar. The Akka website will tell you how to set up what you need, but it's really dead simple.

Simple Scala actor question

I'm sure this is a very simple question, but embarrassed to say I can't get my head around it:
I have a list of values in Scala.
I would like to use use actors to make some (external) calls with each value, in parallel.
I would like to wait until all values have been processed, and then proceed.
There's no shared values being modified.
Could anyone advise?
Thanks
There's an actor-using class in Scala that's made precisely for this kind of problem: Futures. This problem would be solved like this:
// This assigns futures that will execute in parallel
// In the example, the computation is performed by the "process" function
val tasks = list map (value => scala.actors.Futures.future { process(value) })
// The result of a future may be extracted with the apply() method, which
// will block if the result is not ready.
// Since we do want to block until all results are ready, we can call apply()
// directly instead of using a method such as Futures.awaitAll()
val results = tasks map (future => future.apply())
There you go. Just that.
Create workers and ask them for futures using !!; then read off the results (which will be calculated and come in in parallel as they're ready; you can then use them). For example:
object Example {
import scala.actors._
class Worker extends Actor {
def act() { Actor.loop { react {
case s: String => reply(s.length)
case _ => exit()
}}}
}
def main(args: Array[String]) {
val arguments = args.toList
val workers = arguments.map(_ => (new Worker).start)
val futures = for ((w,a) <- workers zip arguments) yield w !! a
val results = futures.map(f => f() match {
case i: Int => i
case _ => throw new Exception("Whoops--didn't expect to get that!")
})
println(results)
workers.foreach(_ ! None)
}
}
This does a very inexpensive computation--calculating the length of a string--but you can put something expensive there to make sure it really does happen in parallel (the last thing that case of the act block should be to reply with the answer). Note that we also include a case for the worker to shut itself down, and when we're all done, we tell the workers to shut down. (In this case, any non-string shuts down the worker.)
And we can try this out to make sure it works:
scala> Example.main(Array("This","is","a","test"))
List(4, 2, 1, 4)

How do I rewrite a for loop with a shared dependency using actors

We have some code which needs to run faster. Its already profiled so we would like to make use of multiple threads. Usually I would setup an in memory queue, and have a number of threads taking jobs of the queue and calculating the results. For the shared data I would use a ConcurrentHashMap or similar.
I don't really want to go down that route again. From what I have read using actors will result in cleaner code and if I use akka migrating to more than 1 jvm should be easier. Is that true?
However, I don't know how to think in actors so I am not sure where to start.
To give a better idea of the problem here is some sample code:
case class Trade(price:Double, volume:Int, stock:String) {
def value(priceCalculator:PriceCalculator) =
(priceCalculator.priceFor(stock)-> price)*volume
}
class PriceCalculator {
def priceFor(stock:String) = {
Thread.sleep(20)//a slow operation which can be cached
50.0
}
}
object ValueTrades {
def valueAll(trades:List[Trade],
priceCalculator:PriceCalculator):List[(Trade,Double)] = {
trades.map { trade => (trade,trade.value(priceCalculator)) }
}
def main(args:Array[String]) {
val trades = List(
Trade(30.5, 10, "Foo"),
Trade(30.5, 20, "Foo")
//usually much longer
)
val priceCalculator = new PriceCalculator
val values = valueAll(trades, priceCalculator)
}
}
I'd appreciate it if someone with experience using actors could suggest how this would map on to actors.
This is a complement to my comment on shared results for expensive calculations. Here it is:
import scala.actors._
import Actor._
import Futures._
case class PriceFor(stock: String) // Ask for result
// The following could be an "object" as well, if it's supposed to be singleton
class PriceCalculator extends Actor {
val map = new scala.collection.mutable.HashMap[String, Future[Double]]()
def act = loop {
react {
case PriceFor(stock) => reply(map getOrElseUpdate (stock, future {
Thread.sleep(2000) // a slow operation
50.0
}))
}
}
}
Here's an usage example:
scala> val pc = new PriceCalculator; pc.start
pc: PriceCalculator = PriceCalculator#141fe06
scala> class Test(stock: String) extends Actor {
| def act = {
| println(System.currentTimeMillis().toString+": Asking for stock "+stock)
| val f = (pc !? PriceFor(stock)).asInstanceOf[Future[Double]]
| println(System.currentTimeMillis().toString+": Got the future back")
| val res = f.apply() // this blocks until the result is ready
| println(System.currentTimeMillis().toString+": Value: "+res)
| }
| }
defined class Test
scala> List("abc", "def", "abc").map(new Test(_)).map(_.start)
1269310737461: Asking for stock abc
res37: List[scala.actors.Actor] = List(Test#6d888e, Test#1203c7f, Test#163d118)
1269310737461: Asking for stock abc
1269310737461: Asking for stock def
1269310737464: Got the future back
scala> 1269310737462: Got the future back
1269310737465: Got the future back
1269310739462: Value: 50.0
1269310739462: Value: 50.0
1269310739465: Value: 50.0
scala> new Test("abc").start // Should return instantly
1269310755364: Asking for stock abc
res38: scala.actors.Actor = Test#15b5b68
1269310755365: Got the future back
scala> 1269310755367: Value: 50.0
For simple parallelization, where I throw a bunch of work out to process and then wait for it all to come back, I tend to like to use a Futures pattern.
class ActorExample {
import actors._
import Actor._
class Worker(val id: Int) extends Actor {
def busywork(i0: Int, i1: Int) = {
var sum,i = i0
while (i < i1) {
i += 1
sum += 42*i
}
sum
}
def act() { loop { react {
case (i0:Int,i1:Int) => sender ! busywork(i0,i1)
case None => exit()
}}}
}
val workforce = (1 to 4).map(i => new Worker(i)).toList
def parallelFourSums = {
workforce.foreach(_.start())
val futures = workforce.map(w => w !! ((w.id,1000000000)) );
val computed = futures.map(f => f() match {
case i:Int => i
case _ => throw new IllegalArgumentException("I wanted an int!")
})
workforce.foreach(_ ! None)
computed
}
def serialFourSums = {
val solo = workforce.head
workforce.map(w => solo.busywork(w.id,1000000000))
}
def timed(f: => List[Int]) = {
val t0 = System.nanoTime
val result = f
val t1 = System.nanoTime
(result, t1-t0)
}
def go {
val serial = timed( serialFourSums )
val parallel = timed( parallelFourSums )
println("Serial result: " + serial._1)
println("Parallel result:" + parallel._1)
printf("Serial took %.3f seconds\n",serial._2*1e-9)
printf("Parallel took %.3f seconds\n",parallel._2*1e-9)
}
}
Basically, the idea is to create a collection of workers--one per workload--and then throw all the data at them with !! which immediately gives back a future. When you try to read the future, the sender blocks until the worker's actually done with the data.
You could rewrite the above so that PriceCalculator extended Actor instead, and valueAll coordinated the return of the data.
Note that you have to be careful passing non-immutable data around.
Anyway, on the machine I'm typing this from, if you run the above you get:
scala> (new ActorExample).go
Serial result: List(-1629056553, -1629056636, -1629056761, -1629056928)
Parallel result:List(-1629056553, -1629056636, -1629056761, -1629056928)
Serial took 1.532 seconds
Parallel took 0.443 seconds
(Obviously I have at least four cores; the parallel timing varies rather a bit depending on which worker gets what processor and what else is going on on the machine.)