Given a function A => IO[B] (aka Kleisli[IO, A, B]) that is meant to be called multiple times and has side effects, like updating a DB, how can I delegate such calls to a stream (I guess a Pipe[IO, A, B] in fs2, or a Monix Observable/Iterant)? The reason is to be able to accumulate state, batch calls together over a time window, etc.
More concretely, http4s server requires a Request => IO[Response], so I am looking how to operate on streams (for the above benefits), but ultimately provide such a function to http4s.
I suspect it will need some correlation ID behind the scenes and I am fine with that, I am more interested in how to do it safely and properly from an FP perspective.
Ultimately, the signature I expect is probably something like:
Pipe[IO, A, B] => (A => IO[B]), such that calls to the resulting Kleisli are piped through the pipe.
As an afterthought, would it be at all possible to apply backpressure?
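To make the shape concrete, here is a minimal sketch of what I am imagining, assuming fs2 3 and cats-effect 3 (StreamedKleisli is a made-up name; the Deferred in the pipe's element type plays the role of the correlation ID, and swapping the unbounded queue for a bounded one is what would give backpressure):
import cats.effect.{Deferred, IO, Resource}
import cats.effect.std.Queue
import fs2.{Pipe, Stream}

object StreamedKleisli {
  // The pipe sees (request, reply) pairs and is responsible for
  // completing each Deferred; the returned function is the Kleisli.
  def apply[A, B](pipe: Pipe[IO, (A, Deferred[IO, B]), Unit]): Resource[IO, A => IO[B]] =
    for {
      q <- Resource.eval(Queue.unbounded[IO, (A, Deferred[IO, B])])
      _ <- Stream.fromQueueUnterminated(q).through(pipe).compile.drain.background
    } yield { (a: A) =>
      for {
        d <- Deferred[IO, B]
        _ <- q.offer((a, d))
        b <- d.get // semantically blocks until the pipe completes the reply
      } yield b
    }
}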
One idea is to model it with MPSC (Multi-Producer Single-Consumer). I'll give an example with Monix since I'm more familiar with it, but the idea stays the same even if you use FS2.
import cats.effect.concurrent.Deferred
import monix.eval.Task
import monix.execution.Scheduler.Implicits.global
import monix.reactive.{ConcurrentSubject, MulticastStrategy}

import scala.concurrent.duration._
import scala.util.Random

object MPSC extends App {
  sealed trait Event
  object Event {
    // You'll need a promise in order to send the response back to the user
    case class SaveItem(num: Int, promise: Deferred[Task, Int]) extends Event
  }
  import Event._

  // For backpressure, take a look at `PublishSubject`.
  val cs = ConcurrentSubject[Event](MulticastStrategy.Publish)

  def pushEvent(num: Int) =
    for {
      promise <- Deferred[Task, Int]
      _ <- Task.delay(cs.onNext(SaveItem(num, promise)))
    } yield promise

  // You get a list of events now since it is buffered.
  // Monix has a lot of buffer strategies; check the docs for more details.
  def processEvents(items: Seq[Event]): Task[Unit] =
    Task.delay(println(s"Items: $items")) >>
      Task.traverse(items) {
        case SaveItem(_, promise) => promise.complete(Random.nextInt(100))
      }.void

  val app = for {
    // Start the stream in the background
    _ <- cs
      .bufferTimed(3.seconds) // Buffer all events within 3 seconds
      .filter(_.nonEmpty)
      .mapEval(processEvents)
      .completedL
      .startAndForget
    _ <- Task.sleep(1.second)
    p1 <- pushEvent(10)
    p2 <- pushEvent(20)
    p3 <- pushEvent(30)
    // Wait for the promise to complete; you'll do this for each request
    x <- p1.get
    y <- p2.get
    z <- p3.get
    _ <- Task.delay(println(s"Completed promises: [$x, $y, $z]"))
  } yield ()

  app.runSyncUnsafe()
}
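To recover exactly the A => IO[B] shape from the question (here with Monix's Task), push the event and then await its promise; a one-line sketch on top of pushEvent above:
// Each call enqueues the request and semantically blocks on its promise.
val save: Int => Task[Int] = num => pushEvent(num).flatMap(_.get)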
Related
I have an external (that is, I cannot change it) Java API which looks like this:
public interface Sender {
void send(Event e);
}
I need to implement a Sender which accepts each event, transforms it to a JSON object, collects some number of them into a single bundle and sends over HTTP to some endpoint. This all should be done asynchronously, without send() blocking the calling thread, with some fixed-size buffer and dropping new events if the buffer is full.
With akka-streams this is quite simple: I create a graph of stages (which uses akka-http to send HTTP requests), materialize it and use the materialized ActorRef to push new events to the stream:
lazy val eventPipeline = Source.actorRef[Event](Int.MaxValue, OverflowStrategy.fail)
  .via(CustomBuffer(bufferSize))           // buffer all events
  .groupedWithin(batchSize, flushDuration) // group events into chunks
  .map(toBundle)                           // convert each chunk into a JSON message
  .mapAsyncUnordered(1)(sendHttpRequest)   // send an HTTP request
  .toMat(Sink.foreach { response =>
    // print HTTP response for debugging
  })(Keep.both)

lazy val (eventsActor, completeFuture) = eventPipeline.run()

override def send(e: Event): Unit = {
  eventsActor ! e
}
Here CustomBuffer is a custom GraphStage which is very similar to the library-provided Buffer but tailored to our specific needs; it probably does not matter for this particular question.
As you can see, interacting with the stream from non-stream code is very simple - the ! method on the ActorRef trait is asynchronous and does not need any additional machinery to be called. Each event which is sent to the actor is then processed through the entire reactive pipeline. Moreover, because of how akka-http is implemented, I even get connection pooling for free, so no more than one connection is opened to the server.
However, I cannot find a way to do the same thing with FS2 properly. Even discarding the question of buffering (I will probably need to write a custom Pipe implementation which does additional things that we need) and HTTP connection pooling, I'm still stuck with a more basic thing - that is, how to push the data to the reactive stream "from outside".
All tutorials and documentation that I can find assume that the entire program happens inside some effect context, usually IO. This is not my case - the send() method is invoked by the Java library at unspecified times. Therefore, I just cannot keep everything inside one IO action, I necessarily have to finalize the "push" action inside the send() method, and have the reactive stream as a separate entity, because I want to aggregate events and hopefully pool HTTP connections (which I believe is naturally tied to the reactive stream).
I assume that I need some additional data structure, like Queue. fs2 does indeed have some kind of fs2.concurrent.Queue, but again, all documentation shows how to use it inside a single IO context, so I assume that doing something like
val queue: Queue[IO, Event] = Queue.unbounded[IO, Event].unsafeRunSync()
and then using queue inside the stream definition and then separately inside the send() method with further unsafeRun calls:
val eventPipeline = queue.dequeue
  .through(customBuffer(bufferSize))
  .groupWithin(batchSize, flushDuration)
  .map(toBundle)
  .mapAsyncUnordered(1)(sendRequest)
  .evalTap(response => ...)
  .compile
  .drain

eventPipeline.unsafeRunAsync(...) // or something

override def send(e: Event): Unit = {
  queue.enqueue(e).unsafeRunSync()
}
is not the correct way and most likely would not even work.
So, my question is, how do I properly use fs2 to solve my problem?
Consider the following example:
import cats.implicits._
import cats.effect._
import cats.effect.implicits._
import fs2._
import fs2.concurrent.Queue
import scala.concurrent.ExecutionContext
import scala.concurrent.duration._
object Answer {
  type Event = String

  trait Sender {
    def send(event: Event): Unit
  }

  def main(args: Array[String]): Unit = {
    val sender: Sender = {
      val ec = ExecutionContext.global
      implicit val cs: ContextShift[IO] = IO.contextShift(ec)
      implicit val timer: Timer[IO] = IO.timer(ec)
      fs2Sender[IO](2)
    }

    val events = List("a", "b", "c", "d")
    events.foreach { evt => new Thread(() => sender.send(evt)).start() }
    Thread sleep 3000
  }

  def fs2Sender[F[_]: Timer : ContextShift](maxBufferedSize: Int)(implicit F: ConcurrentEffect[F]): Sender = {
    // dummy impl
    // this is where the actual logic for batching
    // and shipping over the network would live
    val consume: Pipe[F, Event, Unit] = _.evalMap { event =>
      for {
        _ <- F.delay { println(s"consuming [$event]...") }
        _ <- Timer[F].sleep(1.seconds)
        _ <- F.delay { println(s"...[$event] consumed") }
      } yield ()
    }

    val suspended = for {
      q <- Queue.bounded[F, Event](maxBufferedSize)
      _ <- q.dequeue.through(consume).compile.drain.start
      sender <- F.delay[Sender] { evt =>
        val enqueue = for {
          wasEnqueued <- q.offer1(evt)
          _ <- F.delay { println(s"[$evt] enqueued? $wasEnqueued") }
        } yield ()
        enqueue.toIO.unsafeRunAsyncAndForget()
      }
    } yield sender

    suspended.toIO.unsafeRunSync()
  }
}
The main idea is to use a concurrent Queue from fs2. Note that the above code demonstrates that neither the Sender interface nor the logic in main needs to be changed; only an implementation of the Sender interface is swapped out.
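The dummy consume above handles events one at a time; for the batching the question is after, fs2 2.x has groupWithin, which plays the same role as akka-streams' groupedWithin. A hedged sketch (the batch size, flush interval, and the ship helper are my own assumptions; Event is the alias from above):
import scala.concurrent.duration._
import cats.effect.{Concurrent, Timer}
import fs2.Pipe

// Collect up to 64 events, or whatever arrived within 500 ms,
// then hand each batch to a user-supplied shipping function.
def batchingConsume[F[_]: Concurrent: Timer](ship: List[Event] => F[Unit]): Pipe[F, Event, Unit] =
  _.groupWithin(64, 500.millis)
    .evalMap(chunk => ship(chunk.toList))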
I don't have much experience with that library, but it should look something like this:
import cats.effect.{ExitCode, IO, IOApp}
import fs2.concurrent.Queue
case class Event(id: Int)
class JavaProducer {
  new Thread(new Runnable {
    override def run(): Unit = {
      var id = 0
      while (true) {
        Thread.sleep(1000)
        id += 1
        send(Event(id))
      }
    }
  }).start()

  def send(event: Event): Unit = {
    println(s"Original producer prints $event")
  }
}

class HackedProducer(queue: Queue[IO, Event]) extends JavaProducer {
  override def send(event: Event): Unit = {
    println(s"Hacked producer pushes $event")
    queue.enqueue1(event).unsafeRunSync()
    println(s"Hacked producer pushes $event - Pushed")
  }
}
object Test extends IOApp {
  override def run(args: List[String]): IO[ExitCode] = {
    val x: IO[Unit] = for {
      queue <- Queue.unbounded[IO, Event]
      _ = new HackedProducer(queue)
      done <- queue.dequeue.map { ev =>
        println(s"Got $ev")
      }.compile.drain
    } yield done
    x.map(_ => ExitCode.Success)
  }
}
We can create a bounded queue that will consume elements from the sender and make them available to fs2 stream processing.
import cats.effect.IO
import cats.effect.std.Queue
import cats.effect.unsafe.implicits.global
import fs2.Stream

trait Sender[T]:
  def send(e: T): Unit

object Sender:
  def apply[T](bufferSize: Int): IO[(Sender[T], Stream[IO, T])] =
    for
      q <- Queue.bounded[IO, T](bufferSize)
    yield
      val sender: Sender[T] = (e: T) => q.offer(e).unsafeRunSync()
      def stm: Stream[IO, T] = Stream.eval(q.take) ++ stm
      (sender, stm)
Then we'll have two ends: one for the Java world, to send new elements to the Sender; another for stream processing in fs2.
class TestSenderQueue:
  @Test def testSenderQueue: Unit =
    val (sender, stream) = Sender[Int](1)
      .unsafeRunSync() // we have to run this up front to make `sender` available to the external system
    val processing =
      stream
        .map(i => i * i)
        .evalMap { ii => IO { println(ii) } }
    sender.send(1)
    processing.compile.toList.start // NB! we start processing in a separate fiber
      .unsafeRunSync()              // immediately, right now
    sender.send(2)
    Thread.sleep(100)
    (0 until 100).foreach(sender.send)
    println("finished")
Note that we push data in the current thread and have to run fs2 in a separate thread (.start).
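As a possible refinement (a sketch, not the only way): cats-effect 3 provides Dispatcher precisely for running effects from non-effectful code, which avoids importing the global unsafe runtime, and fs2 3 has Stream.fromQueueUnterminated, which replaces the hand-written recursive stm. SenderResource and make are made-up names, and Dispatcher.parallel assumes cats-effect 3.4+:
import cats.effect.{IO, Resource}
import cats.effect.std.{Dispatcher, Queue}
import fs2.Stream

object SenderResource:
  def make[T](bufferSize: Int): Resource[IO, (Sender[T], Stream[IO, T])] =
    for
      dispatcher <- Dispatcher.parallel[IO]
      q <- Resource.eval(Queue.bounded[IO, T](bufferSize))
    yield
      // the dispatcher runs the enqueue effect on behalf of the Java caller
      val sender: Sender[T] = (e: T) => dispatcher.unsafeRunSync(q.offer(e))
      (sender, Stream.fromQueueUnterminated(q))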
My demo app is simple. Here is an actor:
class CounterActor extends Actor {
  @volatile private[this] var counter = 0

  def receive: PartialFunction[Any, Unit] = {
    case Count(id) ⇒ sender ! self ? Increment(id)
    case Increment(id) ⇒ sender ! {
      counter += 1
      println(s"req_id=$id, counter=$counter")
      counter
    }
  }
}
The main app:
sealed trait ActorMessage
case class Count(id: Int = 0) extends ActorMessage
case class Increment(id: Int) extends ActorMessage
import akka.actor.{ActorSystem, Props}
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.duration._

object CountingApp extends App {
  // setup elided in the original: an actor system, an ask timeout, and the counter actor
  val system = ActorSystem("counting")
  implicit val timeout: Timeout = Timeout(5.seconds)
  val counter = system.actorOf(Props[CounterActor])

  // Get incremented counter
  val future0 = counter ? Count(1)
  val future1 = counter ? Count(2)
  val future2 = counter ? Count(3)
  val future3 = counter ? Count(4)
  val future4 = counter ? Count(5)

  // Handle response
  handleResponse(future0)
  handleResponse(future1)
  handleResponse(future2)
  handleResponse(future3)
  handleResponse(future4)

  // Bye!
  sys.exit()
}
My handler:
def handleResponse(future: Future[Any]): Unit = {
  future.onComplete {
    case Success(f) => f.asInstanceOf[Future[Any]].onComplete {
      case x => x match {
        case Success(n) => println(s" -> $n")
        case Failure(t) => println(s" -> ${t.getMessage}")
      }
    }
    case Failure(t) => println(t.getMessage)
  }
}
If I run the app, I'll see the following output:
req_id=1, counter=1
req_id=2, counter=2
req_id=3, counter=3
req_id=4, counter=4
req_id=5, counter=5
-> 4
-> 1
-> 5
-> 3
-> 2
The order of handled responses is random. Is this normal behaviour? If not, how can I make it ordered?
PS
Do I need volatile var in the actor?
PS2
Also, I'm looking for some more convenient logic for handleResponse, because matching here is very ambiguous...
normal behavior?
Yes, this is absolutely normal behavior.
Your Actor is receiving the Count increments in the order you sent them but the Futures are being completed via submission to an underlying thread pool. It is that indeterminate ordering of Future-thread binding that is resulting in the out of order println executions.
how can I make it ordered?
If you want ordered execution of Futures then that is synonymous with synchronous programming, i.e. no concurrency at all.
do I need volatile?
The state of an Actor is only accessible within the Actor itself. That is why users of an Actor never get an actual Actor object; they only get an ActorRef, e.g. val actorRef = actorSystem actorOf Props[Actor]. This is partially to ensure that users of Actors never have the ability to change an Actor's state except through messaging. From the docs:
The good news is that Akka actors conceptually each have their own
light-weight thread, which is completely shielded from the rest of the
system. This means that instead of having to synchronize access using
locks you can just write your actor code without worrying about
concurrency at all.
Therefore, you don't need volatile.
more convenient logic
For more convenient logic I would recommend Agents, which are a kind of typed Actor with a simpler message framework. From the docs:
import scala.concurrent.ExecutionContext.Implicits.global
import akka.agent.Agent

val agent = Agent(5)     // create an Agent holding an Int

val result = agent()     // synchronous read of the current value
val result = agent.get   // same read, alternative syntax

agent send 7             // asynchronously overwrite the value
agent send (_ + 1)       // asynchronously apply a function to the value
Reads are synchronous but effectively instantaneous; writes are asynchronous. This means that any time you do a read you don't have to worry about Futures, because the internal value returns immediately. But definitely read the docs, because there are more complicated tricks you can play with the queueing logic.
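If you also need to observe the value right after a write, the same akka.agent API has alter, which is like send but returns a Future of the updated value (shown here as a small addition to the example above):
import scala.concurrent.Future

// completes once the write has actually been applied to the Agent
val updated: Future[Int] = agent alter (_ + 1)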
The real trouble in your approach is not its asynchronous nature but the overcomplicated logic.
And despite the pretty answer from Ramon, which I +1'd, yes, there is a way to ensure order in some parts of akka. As we can read in the docs, there is a message-ordering guarantee per sender–receiver pair.
It means that for each one-way channel between two actors there is a guarantee that messages will be delivered in the order they were sent.
But there is no such guarantee for the completion order of the Future tasks which you are using to handle the answers. And sending the Future from ask as a message to the original sender is quite strange.
Things you can do:
Redefine your Increment as
case class Increment(id: Int, requester: ActorRef) extends ActorMessage
so the handler knows the original requester.
Modify CounterActor's receive as
def receive: Receive = {
  case Count(id) ⇒ self ! Increment(id, sender)
  case Increment(id, snd) ⇒ snd ! {
    counter += 1
    println(s"req_id=$id, counter=$counter")
    counter
  }
}
Simplify your handleResponse to
def handleResponse(future: Future[Any]): Unit = {
  future.onComplete {
    case Success(n: Int) => println(s" -> $n")
    case Failure(t) => println(t.getMessage)
  }
}
Now you can probably see that messages are received back in the same order.
I said probably because handling still occurs in Future.onComplete, so we need another actor to ensure the order.
Let's define an additional message:
case object StartCounting
And actor itself:
class SenderActor extends Actor {
  // create the counter as a child; inside an actor, use context.actorOf
  val counter = context.actorOf(Props[CounterActor])

  def receive: Actor.Receive = {
    case n: Int => println(s" -> $n")
    case StartCounting =>
      counter ! Count(1)
      counter ! Count(2)
      counter ! Count(3)
      counter ! Count(4)
      counter ! Count(5)
  }
}
In your main you can now just write
val sender = system.actorOf(Props[SenderActor])
sender ! StartCounting
And throw away that handleResponse method.
Now you definitely should see your message handling in the right order.
We've implemented the whole logic without a single ask, and that's good.
So the magic rule is: leave handling responses to actors, and get only final results from them via ask.
Note that there is also the forward method, but this creates a proxy actor, so message ordering will be broken again.
I have a chain of akka actors like
A --> B --> C
Actor A 'asks' actor B which in turn 'asks' actor C.
Actor A needs to wait till actor C has finished processing.
B is a thin layer and does nothing more than passing (asking) the message to C and returning a Future back to A. Basically, B just does
{ case msgFromA => sender ! C ? msgFromA }
Hence what A is getting is a Future[Any].
The way A handles the request is by using nested maps:
actorRefFactory.actorOf(Props[A]) ? msgA map { resp =>
  // Type cast Any to Future and use another map to complete processing.
  resp.asInstanceOf[Future[_]] map {
    case Success =>
      // Complete processing
    case Failure(exc) =>
      // Log error
  }
}
This works (i.e. the final processing happens only when actor C has finished processing), but needless to say it looks horrible. I tried using flatMap but could not make it work. Any ideas to make it look good? :)
Thanks
A more proper way:
In A:
val f: Future[MyType] = (B ? msg).mapTo[MyType]
f onComplete {
  case Success(res) => // do something
  case Failure(t) => // do something
}
In B, use forward:
{ case msgFromA => C forward msgFromA }
In C:
// call database
// update cache
sender() ! res // actually sends to A
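Putting the pieces together, a minimal end-to-end sketch of the A -> B -> C chain; the Msg type, actor names, and wiring are illustrative assumptions, with A's role played by the asking code at the bottom:
import akka.actor.{Actor, ActorRef, ActorSystem, Props}
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.duration._
import scala.util.{Failure, Success}

case class Msg(payload: String)

class C extends Actor {
  def receive = {
    // call database, update cache, then reply;
    // thanks to forward, sender() is the original asker
    case Msg(p) => sender() ! s"processed $p"
  }
}

class B(c: ActorRef) extends Actor {
  def receive = {
    case m: Msg => c forward m // thin pass-through layer
  }
}

object Chain extends App {
  val system = ActorSystem("chain")
  implicit val timeout: Timeout = Timeout(5.seconds)
  import system.dispatcher

  val c = system.actorOf(Props[C])
  val b = system.actorOf(Props(new B(c)))

  (b ? Msg("hello")).mapTo[String].onComplete {
    case Success(res) => println(res) // prints "processed hello"
    case Failure(t) => println(t.getMessage)
  }
}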
I have actors that need to do very long-running and computationally expensive work, but the computation itself can be done incrementally. So while the complete computation itself takes hours to complete, the intermediate results are actually extremely useful, and I'd like to be able to respond to any requests of them. This is the pseudo code of what I want to do:
var intermediateResult = ...

loop {
  while (mailbox.isEmpty && computationNotFinished)
    intermediateResult = computationStep(intermediateResult)
  receive {
    case GetCurrentResult => sender ! intermediateResult
    ...other messages...
  }
}
The best way to do this is very close to what you are doing already:
case class Continue(todo: ToDo)

class Worker extends Actor {
  var state: IntermediateState = _

  def receive = {
    case Work(x) =>
      val (next, todo) = calc(state, x)
      state = next
      self ! Continue(todo)
    case Continue(todo) if todo.isEmpty => // done
    case Continue(todo) =>
      val (next, rest) = calc(state, todo)
      state = next
      self ! Continue(rest)
  }

  def calc(state: IntermediateState, todo: ToDo): (IntermediateState, ToDo) = ??? // the actual incremental step
}
EDIT: more background
When an actor sends messages to itself, Akka’s internal processing will basically run those within a while loop; the number of messages processed in one go is determined by the actor’s dispatcher’s throughput setting (defaults to 5), after this amount of processing the thread will be returned to the pool and the continuation be enqueued to the dispatcher as a new task. Hence there are two tunables in the above solution:
process multiple steps for a single message (if processing steps are REALLY small)
increase throughput setting for increased throughput and decreased fairness
The original problem seems to have hundreds of such actors running, presumably on common hardware which does not have hundreds of CPUs, so the throughput setting should probably be set such that each batch takes no longer than ca. 10ms.
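For illustration, a hedged sketch of that second tunable; the dispatcher name batch-dispatcher and the value 20 are my own choices, and Worker is the actor from the sketch above:
import akka.actor.{ActorSystem, Props}
import com.typesafe.config.ConfigFactory

// Let the actor process up to 20 self-sent Continue messages per
// thread slice before yielding (the default throughput is 5).
val config = ConfigFactory.parseString(
  """batch-dispatcher {
    |  type = Dispatcher
    |  throughput = 20
    |}""".stripMargin).withFallback(ConfigFactory.load())

val system = ActorSystem("batching", config)
val worker = system.actorOf(Props[Worker].withDispatcher("batch-dispatcher"))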
Performance Assessment
Let’s play a bit with Fibonacci:
Welcome to Scala version 2.10.0-RC1 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_07).
Type in expressions to have them evaluated.
Type :help for more information.
scala> def fib(x1: BigInt, x2: BigInt, steps: Int): (BigInt, BigInt) = if(steps>0) fib(x2, x1+x2, steps-1) else (x1, x2)
fib: (x1: BigInt, x2: BigInt, steps: Int)(BigInt, BigInt)
scala> def time(code: =>Unit) { val start = System.currentTimeMillis; code; println("code took " + (System.currentTimeMillis - start) + "ms") }
time: (code: => Unit)Unit
scala> time(fib(1, 1, 1000))
code took 1ms
scala> time(fib(1, 1, 1000))
code took 1ms
scala> time(fib(1, 1, 10000))
code took 5ms
scala> time(fib(1, 1, 100000))
code took 455ms
scala> time(fib(1, 1, 1000000))
code took 17172ms
Which means that in a presumably quite optimized loop, fib_100000 takes half a second. Now let’s play a bit with actors:
scala> case class Cont(steps: Int, batch: Int)
defined class Cont
scala> val me = inbox()
me: akka.actor.ActorDSL.Inbox = akka.actor.dsl.Inbox$Inbox@32c0fe13
scala> val a = actor(new Act {
var s: (BigInt, BigInt) = _
become {
case Cont(x, y) if y < 0 => s = (1, 1); self ! Cont(x, -y)
case Cont(x, y) if x > 0 => s = fib(s._1, s._2, y); self ! Cont(x - 1, y)
case _: Cont => me.receiver ! s
}
})
a: akka.actor.ActorRef = Actor[akka://repl/user/$c]
scala> time{a ! Cont(1000, -1); me.receive(10 seconds)}
code took 4ms
scala> time{a ! Cont(10000, -1); me.receive(10 seconds)}
code took 27ms
scala> time{a ! Cont(100000, -1); me.receive(10 seconds)}
code took 632ms
scala> time{a ! Cont(1000000, -1); me.receive(30 seconds)}
code took 17936ms
This is already interesting: given long enough time per step (with the huge BigInts behind the scenes in the last line), actors don't cost much extra. Now let's see what happens when doing smaller calculations in a more batched way:
scala> time{a ! Cont(10000, -10); me.receive(30 seconds)}
code took 462ms
This is pretty close to the result for the direct variant above.
Conclusion
Sending messages to self is NOT expensive for almost all applications, just keep the actual processing step slightly larger than a few hundred nanoseconds.
I assume from your comment on Roland Kuhn's answer that you have some work which can be considered recursive, at least in blocks. If this is not the case, I don't think there can be any clean solution to handle your problem, and you will have to deal with complicated pattern-matching blocks.
If my assumptions are correct, I would schedule the computation asynchronously and let the actor be free to answer other messages. The key point is to use the Future's monadic capabilities and have a simple receive block. You would have to handle three messages (StartComputation, ChangeState, GetState).
You will end up having the following receive:
def receive = {
  case StartComputation(myData) => expensiveStuff(myData)
  case ChangeState(newState) => this.state = newState
  case GetState => sender ! this.state
}
And then you can leverage the map method on Future, by defining your own recursive map:
def mapRecursive[A](f: Future[A], handler: A => A, exitConditions: A => Boolean): Future[A] = {
  f.flatMap { a =>
    if (exitConditions(a))
      f
    else {
      val newFuture = f.flatMap { a => Future(handler(a)) }
      mapRecursive(newFuture, handler, exitConditions)
    }
  }
}
Once you have this tool, everything is easier. If you look at the following example:
def main(args: Array[String]) {
  val baseFuture: Future[Int] = Future.successful(64)
  val newFuture: Future[Int] = mapRecursive(baseFuture,
    (a: Int) => {
      val result = a / 2
      println("Additional step done: the current a is " + result)
      result
    },
    (a: Int) => a <= 1)
  val one = Await.result(newFuture, Duration.Inf)
  println("Computation finished, result = " + one)
}
Its output is:
Additional step done: the current a is 32
Additional step done: the current a is 16
Additional step done: the current a is 8
Additional step done: the current a is 4
Additional step done: the current a is 2
Additional step done: the current a is 1
Computation finished, result = 1
You understand you can do the same inside your expensiveStuff method:
def expensiveStuff(myData: MyData): Future[MyData] = {
  val firstResult = Future.successful(myData)
  val handler: MyData => MyData = (myData) => {
    val result = myData.copy(myData.value / 2)
    self ! ChangeState(result)
    result
  }
  val exitCondition: MyData => Boolean = (myData: MyData) => myData.value == 1
  mapRecursive(firstResult, handler, exitCondition)
}
EDIT - MORE DETAILED
If you don't want to block the actor, which processes messages from its mailbox in a thread-safe and synchronous manner, the only thing you can do is get the computation executed on a different thread. This is exactly a high-performance, non-blocking receive.
However, you were right in saying that the approach I proposed pays a high performance penalty: every step is done in a different Future, which might not be necessary at all. You can therefore recurse the handler on itself to obtain single-threaded or multi-threaded execution. There is no magic formula after all:
If you want to schedule asynchronously and minimize the cost, all the work should be done by a single thread.
This, however, could prevent other work from starting, because if all the threads in a thread pool are taken, the futures will queue up. You might therefore want to break the operation into multiple futures so that, even at full usage, some new work can be scheduled before old work has completed.
def recurseFuture[A](entryFuture: Future[A], handler: A => A, exitCondition: A => Boolean, maxNestedRecursion: Long = Long.MaxValue): Future[A] = {
  def recurse(a: A, handler: A => A, exitCondition: A => Boolean, maxNestedRecursion: Long, currentStep: Long): Future[A] = {
    if (exitCondition(a))
      Future.successful(a)
    else if (currentStep == maxNestedRecursion)
      Future.successful(handler(a)).flatMap(a => recurse(a, handler, exitCondition, maxNestedRecursion, 0))
    else
      recurse(handler(a), handler, exitCondition, maxNestedRecursion, currentStep + 1)
  }
  entryFuture.flatMap { a => recurse(a, handler, exitCondition, maxNestedRecursion, 0) }
}
For testing purposes, I have enhanced my handler method:
val handler: Int => Int = (a: Int) => {
  val result = a / 2
  println("Additional step done: the current a is " + result + " on thread " + Thread.currentThread().getName)
  result
}
Approach 1: Recurse the handler on itself so that everything executes on a single thread.
println("Starting strategy with all the steps on the same thread")
val deepestRecursion: Future[Int] = recurseFuture(baseFuture,handler, exitCondition)
Await.result(deepestRecursion, Duration.Inf)
println("Completed strategy with all the steps on the same thread")
println("")
Approach 2: Recurse the handler on itself to a limited depth.
println("Starting strategy with the steps grouped by three")
val threeStepsInSameFuture: Future[Int] = recurseFuture(baseFuture,handler, exitCondition,3)
val threeStepsInSameFuture2: Future[Int] = recurseFuture(baseFuture,handler, exitCondition,4)
Await.result(threeStepsInSameFuture, Duration.Inf)
Await.result(threeStepsInSameFuture2, Duration.Inf)
println("Completed strategy with all the steps grouped by three")
executorService.shutdown()
You should not use actors for long-running computations, as these will block the threads that are supposed to run the actors' code.
I would try to go with a design that uses a separate Thread/ThreadPool for the computations and uses AtomicReferences to store/query the intermediate results, along the lines of the following pseudo code:
val cancelled = new AtomicBoolean(false)
val intermediateResult = new AtomicReference[IntermediateResult]()

object WorkerThread extends Thread {
  override def run(): Unit = {
    while (!cancelled.get) {
      intermediateResult.set(computationStep(intermediateResult.get))
    }
  }
}

loop {
  react {
    case StartComputation => WorkerThread.start()
    case CancelComputation => cancelled.set(true)
    case GetCurrentResult => sender ! intermediateResult.get
  }
}
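For reference, here is the same idea against Akka's Actor API instead of the old scala.actors loop/react; the message names and the IntermediateResult/computationStep/initialResult pieces are assumed from the pseudo code above:
import java.util.concurrent.atomic.{AtomicBoolean, AtomicReference}
import akka.actor.Actor

class QueryableWorker extends Actor {
  private val cancelled = new AtomicBoolean(false)
  private val intermediateResult = new AtomicReference[IntermediateResult](initialResult)

  def receive = {
    case StartComputation =>
      // run the computation on its own thread, outside the actor,
      // so the mailbox stays responsive
      new Thread(() => {
        while (!cancelled.get)
          intermediateResult.set(computationStep(intermediateResult.get))
      }).start()
    case CancelComputation => cancelled.set(true)
    case GetCurrentResult => sender() ! intermediateResult.get
  }
}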
This is a classic concurrency problem. You want several routines/actors (or whatever you want to call them). The code below is mostly correct Go, with obscenely long variable names for context. The first routine handles queries and intermediate results:
func serveIntermediateResults(
	computationChannel chan *IntermediateResult,
	queryChannel chan chan<- *IntermediateResult) {
	var latestIntermediateResult *IntermediateResult // initial result
	for {
		select {
		// an update arrives; receive into a local so the outer variable
		// keeps the last value once the channel is closed
		case result, notClosed := <-computationChannel:
			if !notClosed {
				// the computation has finished, stop checking
				computationChannel = nil
			} else {
				latestIntermediateResult = result
			}
		// a query arrived
		case queryResponseChannel, notClosed := <-queryChannel:
			if !notClosed {
				// no more queries, so we're done
				return
			}
			// respond with the latest result
			queryResponseChannel <- latestIntermediateResult
		}
	}
}
In your long computation, you update your intermediate result wherever appropriate:
func longComputation(intermediateResultChannel chan *IntermediateResult) {
	for notFinished {
		// lots of stuff
		intermediateResultChannel <- currentResult
	}
	close(intermediateResultChannel)
}
Finally, to ask for the current result, you have a wrapper to make this nice:
func getCurrentResult() *IntermediateResult {
	responseChannel := make(chan *IntermediateResult)
	// queryChannel was given to the intermediate result server routine earlier
	queryChannel <- responseChannel
	return <-responseChannel
}
I'm sure this is a very simple question, but I'm embarrassed to say I can't get my head around it:
I have a list of values in Scala.
I would like to use actors to make some (external) calls with each value, in parallel.
I would like to wait until all values have been processed, and then proceed.
There are no shared values being modified.
Could anyone advise?
Thanks
There's an actor-using class in Scala that's made precisely for this kind of problem: Futures. This problem would be solved like this:
// This assigns futures that will execute in parallel
// In the example, the computation is performed by the "process" function
val tasks = list map (value => scala.actors.Futures.future { process(value) })
// The result of a future may be extracted with the apply() method, which
// will block if the result is not ready.
// Since we do want to block until all results are ready, we can call apply()
// directly instead of using a method such as Futures.awaitAll()
val results = tasks map (future => future.apply())
There you go. Just that.
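One caveat: scala.actors (and its Futures) was removed from the standard distribution long ago, so on modern Scala the same pattern would use scala.concurrent instead; a sketch assuming the list and process from above:
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// One Future per value; they run in parallel on the global pool.
val tasks = list.map(value => Future(process(value)))
// Block until every result is ready.
val results = Await.result(Future.sequence(tasks), 1.minute)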
Create workers and ask them for futures using !!; then read off the results (which will be calculated and come in in parallel as they're ready; you can then use them). For example:
object Example {
  import scala.actors._

  class Worker extends Actor {
    def act() { Actor.loop { react {
      case s: String => reply(s.length)
      case _ => exit()
    }}}
  }

  def main(args: Array[String]) {
    val arguments = args.toList
    val workers = arguments.map(_ => (new Worker).start)
    val futures = for ((w, a) <- workers zip arguments) yield w !! a
    val results = futures.map(f => f() match {
      case i: Int => i
      case _ => throw new Exception("Whoops--didn't expect to get that!")
    })
    println(results)
    workers.foreach(_ ! None)
  }
}
This does a very inexpensive computation (calculating the length of a string), but you can put something expensive there to make sure it really does happen in parallel (the last thing that case of the act block does should be to reply with the answer). Note that we also include a case for the worker to shut itself down, and when we're all done, we tell the workers to shut down. (In this case, any non-string shuts down the worker.)
And we can try this out to make sure it works:
scala> Example.main(Array("This","is","a","test"))
List(4, 2, 1, 4)