Multiple Scala actors servicing one task - scala

I need to process multiple data values in parallel ("SIMD"). I can use the java.util.concurrent APIs (Executors.newFixedThreadPool()) to process several values in parallels using Future instances:
import java.util.concurrent.{Executors, Callable}
class ExecutorsTest {
private class Process(value: Int)
extends Callable[Int] {
def call(): Int = {
// Do some time-consuming task
value
}
}
val executorService = {
val threads = Runtime.getRuntime.availableProcessors
Executors.newFixedThreadPool(threads)
}
val processes = for (process <- 1 to 1000) yield new Process(process)
val futures = executorService.invokeAll(processes)
// Wait for futures
}
How do I do the same thing using Actors? I do not believe that I want to "feed" all of the processes to a single actor because the actor will then execute them sequentially.
Do I need to create multiple "processor" actors with a "dispatcher" actor that sends an equal number of processes to each "processor" actor?

If you just want fire-and-forget processing, why not use Scala futures?
import scala.actors.Futures._
def example = {
val answers = (1 to 4).map(x => future {
Thread.sleep(x*1000)
println("Slept for "+x)
x
})
val t0 = System.nanoTime
awaitAll(1000000,answers: _*) // Number is timeout in ms
val t1 = System.nanoTime
printf("%.3f seconds elapsed\n",(t1-t0)*1e-9)
answers.map(_()).sum
}
scala> example
Slept for 1
Slept for 2
Slept for 3
Slept for 4
4.000 seconds elapsed
res1: Int = 10
Basically, all you do is put the code you want inside a future { } block, and it will immediately return a future; apply it to get the answer (it will block until done), or use awaitAll with a timeout to wait until everyone is done.
Update: As of 2.11, the way to do this is with scala.concurrent.Future. A translation of the above code is:
import scala.concurrent._
import duration._
import ExecutionContext.Implicits.global
def example = {
val answers = Future.sequence(
(1 to 4).map(x => Future {
Thread.sleep(x*1000)
println("Slept for "+x)
x
})
)
val t0 = System.nanoTime
val completed = Await.result(answers, Duration(1000, SECONDS))
val t1 = System.nanoTime
printf("%.3f seconds elapsed\n",(t1-t0)*1e-9)
completed.sum
}

If you can use Akka, take a look at the ActorPool support: http://doc.akka.io/routing-scala
It lets you specify parameters about how many actors you want running in parallel and then dispatches work to those actors.

Related

In akka streaming program w/ Source.queue & Sink.queue I offer 1000 items, but it just hangs when I try to get 'em out

I am trying to understand how i should be working with Source.queue & Sink.queue in Akka streaming.
In the little test program that I wrote below I find that I am able to successfully offer 1000 items to the Source.queue.
However, when i wait on the future that should give me the results of pulling all those items off the queue, my
future never completes. Specifically, the message 'print what we pulled off the queue' that we should see at the end
never prints out -- instead we see the error "TimeoutException: Futures timed out after [10 seconds]"
any guidance greatly appreciated !
import akka.actor.ActorSystem
import akka.event.{Logging, LoggingAdapter}
import akka.stream.scaladsl.{Flow, Keep, Sink, Source}
import akka.stream.{ActorMaterializer, Attributes}
import org.scalatest.FunSuite
import scala.collection.immutable
import scala.concurrent.duration._
import scala.concurrent.{Await, ExecutionContext, Future}
class StreamSpec extends FunSuite {
implicit val actorSystem: ActorSystem = ActorSystem()
implicit val materializer: ActorMaterializer = ActorMaterializer()
implicit val log: LoggingAdapter = Logging(actorSystem.eventStream, "basis-test")
implicit val ec: ExecutionContext = actorSystem.dispatcher
case class Req(name: String)
case class Response(
httpVersion: String = "",
method: String = "",
url: String = "",
headers: Map[String, String] = Map())
test("put items on queue then take them off") {
val source = Source.queue[String](128, akka.stream.OverflowStrategy.backpressure)
val flow = Flow[String].map(element => s"Modified $element")
val sink = Sink.queue[String]().withAttributes( Attributes.inputBuffer(128, 128))
val (sourceQueue, sinkQueue) = source.via(flow).toMat(sink)(Keep.both).run()
(1 to 1000).map( i =>
Future {
println("offerd" + i) // I see this print 1000 times as expected
sourceQueue.offer(s"batch-$i")
}
)
println("DONE OFFER FUTURE FIRING")
// Now use the Sink.queue to pull the items we added onto the Source.queue
val seqOfFutures: immutable.Seq[Future[Option[String]]] =
(1 to 1000).map{ i => sinkQueue.pull() }
val futureOfSeq: Future[immutable.Seq[Option[String]]] =
Future.sequence(seqOfFutures)
val seq: immutable.Seq[Option[String]] =
Await.result(futureOfSeq, 10.second)
// unfortunately our future times out here
println("print what we pulled off the queue:" + seq);
}
}
Looking at this again, I realize that I originally set up and posed my question incorrectly.
The test that accompanies my original question launches a wave
of 1000 futures, each of which tries to offer 1 item to the queue.
Then the second step in that test attempts create a 1000-element sequence (seqOfFutures)
where each future is trying to pull a value from the queue.
My theory as to why I was getting time-out errors is that there was some kind of deadlock due to running
out of threads or due to one thread waiting on another but where the waited-on-thread was blocked,
or something like that.
I'm not interested in hunting down the exact cause at this point because I have corrected
things in the code below (see CORRECTED CODE).
In the new code the test that uses the queue is called:
"put items on queue then take them off (with async parallelism) - (3)".
In this test I have a set of 10 tasks which run in parallel to do the 'enequeue' operation.
Then I have another 10 tasks which do the dequeue operation, which involves not only taking
the item off the list, but also calling stringModifyFunc which introduces a 1 ms processing delay.
I also wanted to prove that I got some performance benefit from
launching tasks in parallel and having the task steps communicate by passing their results through a
queue, so test 3 runs as a timed operation, and I found that it takes 1.9 seconds.
Tests (1) and (2) do the same amount of work, but serially -- The first with no intervening queue, and the second
using the queue to pass results between steps. These tests run in 13.6 and 15.6 seconds respectively
(which shows that the queue adds a bit of overhead, but that this is overshadowed by the efficiencies of running tasks in parallel.)
CORRECTED CODE
import akka.{Done, NotUsed}
import akka.actor.ActorSystem
import akka.event.{Logging, LoggingAdapter}
import akka.stream.scaladsl.{Flow, Keep, Sink, Source}
import akka.stream.{ActorMaterializer, Attributes, QueueOfferResult}
import org.scalatest.FunSuite
import scala.concurrent.duration._
import scala.concurrent.{Await, ExecutionContext, Future}
class Speco extends FunSuite {
implicit val actorSystem: ActorSystem = ActorSystem()
implicit val materializer: ActorMaterializer = ActorMaterializer()
implicit val log: LoggingAdapter = Logging(actorSystem.eventStream, "basis-test")
implicit val ec: ExecutionContext = actorSystem.dispatcher
val stringModifyFunc: String => String = element => {
Thread.sleep(1)
s"Modified $element"
}
def setup = {
val source = Source.queue[String](128, akka.stream.OverflowStrategy.backpressure)
val sink = Sink.queue[String]().withAttributes(Attributes.inputBuffer(128, 128))
val (sourceQueue, sinkQueue) = source.toMat(sink)(Keep.both).run()
val offers: Source[String, NotUsed] = Source(
(1 to iterations).map { i =>
s"item-$i"
}
)
(sourceQueue,sinkQueue,offers)
}
val outer = 10
val inner = 1000
val iterations = outer * inner
def timedOperation[T](block : => T) = {
val t0 = System.nanoTime()
val result: T = block // call-by-name
val t1 = System.nanoTime()
println("Elapsed time: " + (t1 - t0) / (1000 * 1000) + " milliseconds")
result
}
test("20k iterations in single threaded loop no queue (1)") {
timedOperation{
(1 to iterations).foreach { i =>
val str = stringModifyFunc(s"tag-${i.toString}")
System.out.println("str:" + str);
}
}
}
test("20k iterations in single threaded loop with queue (2)") {
timedOperation{
val (sourceQueue, sinkQueue, offers) = setup
val resultFuture: Future[Done] = offers.runForeach{ str =>
val itemFuture = for {
_ <- sourceQueue.offer(str)
item <- sinkQueue.pull()
} yield (stringModifyFunc(item.getOrElse("failed")) )
val item = Await.result(itemFuture, 10.second)
System.out.println("item:" + item);
}
val result = Await.result(resultFuture, 20.second)
System.out.println("result:" + result);
}
}
test("put items on queue then take them off (with async parallelism) - (3)") {
timedOperation{
val (sourceQueue, sinkQueue, offers) = setup
def enqueue(str: String) = sourceQueue.offer(str)
def dequeue = {
sinkQueue.pull().map{
maybeStr =>
val str = stringModifyFunc( maybeStr.getOrElse("failed2"))
println(s"dequeud value is $str")
}
}
val offerResults: Source[QueueOfferResult, NotUsed] =
offers.mapAsyncUnordered(10){ string => enqueue(string)}
val dequeueResults: Source[Unit, NotUsed] = offerResults.mapAsyncUnordered(10){ _ => dequeue }
val runAll: Future[Done] = dequeueResults.runForeach(u => u)
Await.result(runAll, 20.second)
}
}
}

Control main thread with multiple Future jobs in scala

def fixture =
new {
val xyz = new XYZ(spark)
}
val fList: scala.collection.mutable.MutableList[scala.concurrent.Future[Dataset[Row]]] = scala.collection.mutable.MutableList[scala.concurrent.Future[Dataset[Row]]]() //mutable List of future means List[Future]
test("test case") {
val tasks = for (i <- 1 to 10) {
fList ++ scala.collection.mutable.MutableList[scala.concurrent.Future[Dataset[Row]]](Future {
println("Executing task " + i )
val ds = read(fixture.etlSparkLayer,i)
ds
})
}
Thread.sleep(1000*4200)
val futureOfList = Future.sequence(fList)//list of Future job in Future sequence
println(Await.ready(futureOfList, Duration.Inf))
val await_result: Seq[Dataset[Row]] = Await.result(futureOfList, Duration.Inf)
println("Squares: " + await_result)
futureOfList.onComplete {
case Success(x) => println("Success!!! " + x)
case Failure(ex) => println("Failed !!! " + ex)
}
}
I am executing one test case with sequence of Future List and List have collection of Future.I trying to execute same fuction multiple time parallely by help of using Future in scala.In my system only 4 job start in one time after completion of 4 jobs next 4 job will starting like that complete all the jobs. So how to start more than 4 job at a time and how main Thread will wait to complete all the Future thread ? I tried Await.result and Await.ready but not able to control main thread , for main thread control i m use Thread.sleep concept.this program is for read from RDBMS table and write in Elasticsearch. So how to control main thread main issue?
Assuming that you use the scala.concurrent.ExecutionContext.Implicits.global ExecutionContext you can tune the number of threads as described here:
https://github.com/scala/scala/blob/2.12.x/src/library/scala/concurrent/impl/ExecutionContextImpl.scala#L100
Specifically the following System Properties: scala.concurrent.context.minThreads, scala.concurrent.context.numThreads. scala.concurrent.context.maxThreads, and scala.concurrent.context.maxExtraThreads
Otherwise, you can rewrite your code to something like this:
import scala.collection.immutable
import scala.concurrent.duration._
import scala.concurrent._
import java.util.concurrent.Executors
test("test case") {
implicit val ec = ExecutionContext.fromExecutorService(ExecutorService.newFixedThreadPool(NUMBEROFTHREADSYOUWANT))
val aFuture = Future.traverse(1 to 10) {
i => Future {
println("Executing task " + i)
read(fixture.etlSparkLayer,i) // If this is a blocking operation you may want to consider wrapping it in a `blocking {}`-block.
}
}
aFuture.onComplete(_ => ec.shutdownNow()) // Only for this test, and to make sure the pool gets cleaned up
val await_result: immutable.Seq[Dataset[Row]] = Await.result(aFuture, 60.minutes) // Or other timeout
println("Squares: " + await_result)
}

akka-http queries do not run in parallel

I am very new to akka-http and have troubles to run queries on the same route in parallel.
I have a route that may return the result very quickly (if cached) or not (heavy CPU multithreaded computations). I would like to run these queries in parallel, in case a short one arrives after a long one with heavy computation, I do not want the second call to wait for the first to finish.
However it seems that these queries do not run in parallel if they are on the same route (run in parallel if on different routes)
I can reproduice it in a basic project:
Calling the server 3 time in parallel (with 3 Chrome's tab on http://localhost:8080/test) causes the responses to arrive respectively at 3.0s, 6.0-s and 9.0-s. I suppose queries do not run in parallel.
Running on a 6 cores (with HT) machine on Windows 10 with jdk 8.
build.sbt
name := "akka-http-test"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "com.typesafe.akka" %% "akka-http-experimental" % "2.4.11"
*AkkaHttpTest.scala**
import java.util.concurrent.Executors
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.server.Directives._
import akka.stream.ActorMaterializer
import scala.concurrent.{ExecutionContext, Future}
object AkkaHttpTest extends App {
implicit val actorSystem = ActorSystem("system") // no application.conf here
implicit val executionContext =
ExecutionContext.fromExecutor(Executors.newFixedThreadPool(6))
implicit val actorMaterializer = ActorMaterializer()
val route = path("test") {
onComplete(slowFunc()) { slowFuncResult =>
complete(slowFuncResult)
}
}
def slowFunc()(implicit ec: ExecutionContext): Future[String] = Future {
Thread.sleep(3000)
"Waited 3s"
}
Http().bindAndHandle(route, "localhost", 8080)
println("server started")
}
What am I doing wrong here ?
Thanks for your help
EDIT: Thanks to #Ramon J Romero y Vigil, I added Future Wrapping, but the problem still persists
def slowFunc()(implicit ec : ExecutionContext) : Future[String] = Future {
Thread.sleep(3000)
"Waited 3.0s"
}
val route = path("test") {
onComplete(slowFunc()) { slowFuncResult =>
complete(slowFuncResult)
}
}
Tries with a the default Thread pool, the one defined above in the config file, and a Fixed Thread Pool (6 Threads).
It seems that the onComplete directive still waits for the future to complete and then block the Route (with same connection).
Same problem with the Flow trick
import akka.stream.scaladsl.Flow
val parallelism = 10
val reqFlow =
Flow[HttpRequest].filter(_.getUri().path().equalsIgnoreCase("/test"))
.mapAsync(parallelism)(_ => slowFunc())
.map(str => HttpResponse(status=StatusCodes.Ok, entity=str))
Http().bindAndHandle(reqFlow, ...)
Thanks for your help
Each IncomingConnection is handled by the same Route, therefore when you "call the server 3 times in parallel" you are likely using the same Connection and therefore the same Route.
The Route is handling all 3 incoming HttpRequest values in an akka-stream fashion, i.e. the Route is composed of multiple stages but each stage can only processes 1 element at any given time. In your example the "complete" stage of the stream will call Thread.sleep for each incoming Request and process each Request one-at-a-time.
To get multiple concurrent requests handled at the same time you should establish a unique connection for each request.
An example of the client side connection pool can be created similar to the documentation examples:
import akka.http.scaladsl.Http
val connPoolFlow = Http().newHostConnectionPool("localhost", 8080)
This can then be integrated into a stream that makes the requests:
import akka.http.scaladsl.model.Uri._
import akka.http.scaladsl.model.HttpRequest
val request = HttpRequest(uri="/test")
import akka.stream.scaladsl.Source
val reqStream =
Source.fromIterator(() => Iterator.continually(request).take(3))
.via(connPoolFlow)
.via(Flow.mapAsync(3)(identity))
.to(Sink foreach { resp => println(resp)})
.run()
Route Modification
If you want each HttpRequest to be processed in parallel then you can use the same Route to do so but you must spawn off Futures inside of the Route and use the onComplete directive:
def slowFunc()(implicit ec : ExecutionContext) : Future[String] = Future {
Thread.sleep(1500)
"Waited 1.5s"
}
val route = path("test") {
onComplete(slowFunc()) { slowFuncResult =>
complete(slowFuncResult)
}
}
One thing to be aware of: if you don't specify a different ExecutionContext for your sleeping function then the same thread pool for routes will be used for your sleeping. You may exhaust the available threads this way. You should probably use a seperate ec for your sleeping...
Flow Based
One other way to handle the HttpRequests is with a stream Flow:
import akka.stream.scaladsl.Flow
val parallelism = 10
val reqFlow =
Flow[HttpRequest].filter(_.getUri().path().equalsIgnoreCase("/test"))
.mapAsync(parallelism)(_ => slowFunc())
.map(str => HttpResponse(status=StatusCodes.Ok, entity=str))
Http().bindAndHandle(reqFlow, ...)
In case this is still relevant, or for future readers, the answer is inside Http().bindAndHandle documentation:
/**
* Convenience method which starts a new HTTP server...
* ...
* The number of concurrently accepted connections can be configured by overriding
* the `akka.http.server.max-connections` setting....
* ...
*/
def bindAndHandle(...
use akka.http.server.max-connections setting for number of concurrent connections.

Flaky onSuccess of Future.sequence

I wrote this method:
import scala.concurrent._
import ExecutionContext.Implicits.global
import scala.util.{ Success, Failure }
object FuturesSequence extends App {
val f1 = future {
1
}
val f2 = future {
2
}
val lf = List(f1, f2)
val seq = Future.sequence(lf)
seq.onSuccess {
case l => println(l)
}
}
I was expecting Future.sequence to gather a List[Future] into a Future[List] and then wait for every futures (f1 and f2 in my case) to complete before calling onSuccess on the Future[List] seq in my case.
But after many runs of this code, it prints "List(1, 2)" only once in a while and I can't figure out why it does not work as expected.
Try this for once,
import scala.concurrent._
import java.util.concurrent.Executors
import scala.util.{ Success, Failure }
object FuturesSequence extends App {
implicit val exec = ExecutionContext.fromExecutor(Executors.newCachedThreadPool)
val f1 = future {
1
}
val f2 = future {
2
}
val lf = List(f1, f2)
val seq = Future.sequence(lf)
seq.onSuccess {
case l => println(l)
}
}
This will always print List(1,2). The reason is simple, the exec above is an ExecutionContext of threads (not daemon threads) where as in your example the ExecutionContext was the default one implicitly taken from ExecutionContext.Implicits.global which contains daemon threads.
Hence being daemon, the process doesn't wait for seq future to be completed and terminates. if at all seq does get completed then it prints. But that doesn't happen always
The application is exiting before the future is completes.
You need to block until the future has completed. This can be achieved in a variety of ways, including changing the ExecutionContext, instantiating a new ThreadPool, Thread.sleep etc, or by using methods on scala.concurrent.Await
The simplest way for your code is by using Await.ready. This blocks on a future for a specified amount of time. In the modified code below, the application waits for 5 seconds before exiting.
Note also, the extra import scala.concurrent.duration so we can specify the time to wait.
import scala.concurrent._
import scala.concurrent.duration._
import java.util.concurrent.Executors
import scala.util.{ Success, Failure }
object FuturesSequence extends App {
val f1 = future {
1
}
val f2 = future {
2
}
val lf = List(f1, f2)
val seq = Future.sequence(lf)
seq.onSuccess {
case l => println(l)
}
Await.ready(seq, 5 seconds)
}
By using Await.result instead, you can skip the onSuccess method too, as it will return the resulting list to you.
Example:
val seq: List[Int] = Await.result(Future.sequence(lf), 5 seconds)
println(seq)

Why future example do not work?

I am reading akkaScala documentation, there is an example (p. 171 bottom)
// imports added for compilation
import scala.concurrent.{ExecutionContext, Future}
import ExecutionContext.Implicits.global
class Some {
}
object Some {
def main(args: Array[String]) {
// Create a sequence of Futures
val futures = for (i <- 1 to 1000) yield Future(i * 2)
val futureSum = Future.fold(futures)(0)(_ + _)
futureSum foreach println
}
}
I run it, but nothing happened. I mean that nothing was in console output. What is wrong?
You don't wait for the future to complete, so you create a race between the program exiting and the futures completing and the side-effect running. On your machine, the future seems to lose the race, on the commenters' who say "it works", the future is winning the race.
You can use Await to block on a future and wait for it to complete. This is something you should only be doing "at the ends of the world", you should very rarely actually be using Await...
// imports added for compilation
import scala.concurrent.{ExecutionContext, Future}
import ExecutionContext.Implicits.global
import scala.concurrent.duration._ // for the "1 second" syntax
import scala.concurrent.Await
class Some {
}
object Some {
def main(args: Array[String]) {
// Create a sequence of Futures
val futures = for (i <- 1 to 1000) yield Future(i * 2)
val futureSum = Future.fold(futures)(0)(_ + _)
// we map instead of foreach, to make sure that the side-effect is part of the future
// and we "await" for the future to complete (for 1 second)
Await.result(futureSum map println, 1 second)
}
}
As others have stated, the issue is the race condition where the futures are competing with the program terminating. The JVM has a concept of daemon threads. It waits for non-daemon threads to terminate but not daemon threads. So if you want to wait for threads to complete, use non-daemon threads.
The way threads are created for scala futures is using an implicit scala.concurrent.ExecutionContext. The one you use (import ExecutionContext.Implicits.global) starts daemon threads. However, it is possible to use non-daemon threads. So if you use an ExecutionContext with non-daemon threads, it will wait, which in your case is reasonable behaviour. Naively:
import scala.concurrent.Future
import scala.concurrent.ExecutionContextExecutor
import scala.concurrent.ExecutionContext
class MyExecutionContext extends ExecutionContext {
override def execute(runnable:Runnable) = {
val t = new Thread(runnable)
t.setDaemon(false)
t.start()
}
override def reportFailure(t:Throwable) = t.printStackTrace
}
object Some {
implicit lazy val context: ExecutionContext = new MyExecutionContext
def main(args: Array[String]) {
// Create a sequence of Futures
val futures = for (i <- 1 to 1000) yield Future(i * 2)
val futureSum = Future.fold(futures)(0)(_ + _)
futureSum foreach println
}
}
Careful with using the above ExecutionContext in production because it doesn't use a thread pool and can create unbounded threads, but the message is: you can control everything about the threads behind Futures through an ExecutionContext. Explore the various scala and akka contexts to find what you need, or if nothing suits, write your own.
Both of the following statement at the end of main function would help your need. As the above answers said, allow the future to complete. Main thread is different from the Future thread, as main completes, it terminates before Future thread.
Thread.sleep(500) //... Simple solution
Await.result(futureSum, Duration(500, MILLISECONDS)) //...have to import scala.concurrent.duration._ to use Duration object.