akka-http queries do not run in parallel


I am very new to akka-http and am having trouble running queries on the same route in parallel.
I have a route that may return its result very quickly (if cached) or slowly (heavy, multithreaded CPU computations). I would like these queries to run in parallel: if a short one arrives after a long one with heavy computation, I do not want the second call to wait for the first to finish.
However, it seems that these queries do not run in parallel if they are on the same route (they do run in parallel on different routes).
I can reproduce it in a basic project:
Calling the server 3 times in parallel (with 3 Chrome tabs on http://localhost:8080/test) causes the responses to arrive at 3.0 s, 6.0 s and 9.0 s respectively. I suppose the queries do not run in parallel.
Running on a 6-core (with HT) machine on Windows 10 with JDK 8.
build.sbt
name := "akka-http-test"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "com.typesafe.akka" %% "akka-http-experimental" % "2.4.11"
AkkaHttpTest.scala
import java.util.concurrent.Executors
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.server.Directives._
import akka.stream.ActorMaterializer
import scala.concurrent.{ExecutionContext, Future}
object AkkaHttpTest extends App {
implicit val actorSystem = ActorSystem("system") // no application.conf here
implicit val executionContext =
ExecutionContext.fromExecutor(Executors.newFixedThreadPool(6))
implicit val actorMaterializer = ActorMaterializer()
val route = path("test") {
onComplete(slowFunc()) { slowFuncResult =>
complete(slowFuncResult)
}
}
def slowFunc()(implicit ec: ExecutionContext): Future[String] = Future {
Thread.sleep(3000)
"Waited 3s"
}
Http().bindAndHandle(route, "localhost", 8080)
println("server started")
}
What am I doing wrong here ?
Thanks for your help
EDIT: Thanks to @Ramon J Romero y Vigil, I added Future wrapping, but the problem still persists:
def slowFunc()(implicit ec : ExecutionContext) : Future[String] = Future {
Thread.sleep(3000)
"Waited 3.0s"
}
val route = path("test") {
onComplete(slowFunc()) { slowFuncResult =>
complete(slowFuncResult)
}
}
Tried with the default thread pool, the one defined above in the config file, and a fixed thread pool (6 threads).
It seems that the onComplete directive still waits for the future to complete and then blocks the Route (with the same connection).
Same problem with the Flow trick:
import akka.http.scaladsl.model.{HttpRequest, HttpResponse, StatusCodes}
import akka.stream.scaladsl.Flow
val parallelism = 10
val reqFlow =
  Flow[HttpRequest].filter(_.getUri().path().equalsIgnoreCase("/test"))
    .mapAsync(parallelism)(_ => slowFunc())
    .map(str => HttpResponse(status = StatusCodes.OK, entity = str)) // the constant is OK, not Ok
Http().bindAndHandle(reqFlow, ...)
Thanks for your help

Each IncomingConnection is handled by the same Route, so when you "call the server 3 times in parallel" you are likely using the same Connection and therefore the same Route.
The Route handles all 3 incoming HttpRequest values in an akka-stream fashion, i.e. the Route is composed of multiple stages, but each stage can only process 1 element at any given time. In your example the "complete" stage of the stream will call Thread.sleep for each incoming request and process the requests one at a time.
To get multiple concurrent requests handled at the same time you should establish a unique connection for each request.
A client-side connection pool can be created along the lines of the documentation examples:
import akka.http.scaladsl.Http
// The type parameter is the type of the value that travels with each request through the pool.
val connPoolFlow = Http().newHostConnectionPool[Int]("localhost", 8080)
This can then be integrated into a stream that makes the requests:
import akka.http.scaladsl.model.HttpRequest
import akka.stream.scaladsl.Source
val request = HttpRequest(uri = "/test")
val reqStream =
  Source(1 to 3)
    .map(i => request -> i) // the pool flow expects (HttpRequest, T) pairs
    .via(connPoolFlow)      // and emits (Try[HttpResponse], T) pairs
    .runForeach { case (resp, i) => println(s"$i: $resp") }
Route Modification
If you want each HttpRequest to be processed in parallel then you can use the same Route to do so but you must spawn off Futures inside of the Route and use the onComplete directive:
def slowFunc()(implicit ec : ExecutionContext) : Future[String] = Future {
Thread.sleep(1500)
"Waited 1.5s"
}
val route = path("test") {
onComplete(slowFunc()) { slowFuncResult =>
complete(slowFuncResult)
}
}
One thing to be aware of: if you don't specify a different ExecutionContext for your sleeping function, then the same thread pool that serves the routes will be used for your sleeping. You may exhaust the available threads this way. You should probably use a separate ec for your sleeping...
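To illustrate the point about a separate ExecutionContext, here is a minimal sketch that uses only the standard library. The names BlockingPoolSketch, blockingEc and runConcurrently, the pool size and the timings are assumptions chosen for illustration; they do not come from the question.

```scala
import java.util.concurrent.Executors
import scala.concurrent.duration._
import scala.concurrent.{Await, ExecutionContext, Future}

object BlockingPoolSketch {
  // Hypothetical dedicated pool for blocking work; the size 6 is an arbitrary choice.
  private val pool = Executors.newFixedThreadPool(6)
  val blockingEc: ExecutionContext = ExecutionContext.fromExecutorService(pool)

  // The implicit parameter lets the caller decide where the sleep runs.
  def slowFunc(millis: Long)(implicit ec: ExecutionContext): Future[String] = Future {
    Thread.sleep(millis)
    s"Waited ${millis}ms"
  }

  // Starts n calls at once; returns the results and the elapsed wall-clock time in ms.
  def runConcurrently(n: Int, millis: Long): (Seq[String], Long) = {
    implicit val ec: ExecutionContext = blockingEc
    val start = System.nanoTime
    val all = Future.sequence((1 to n).map(_ => slowFunc(millis)))
    val results = Await.result(all, 10.seconds)
    (results, (System.nanoTime - start) / 1000000)
  }

  def shutdown(): Unit = pool.shutdown()

  def main(args: Array[String]): Unit = {
    val (results, elapsedMs) = runConcurrently(3, 500)
    println(s"$results took about ${elapsedMs}ms")
    shutdown()
  }
}
```

With at least as many pool threads as concurrent calls, the elapsed time should be close to a single sleep rather than the sum of all of them. In a route, blockingEc would be the context passed to slowFunc.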
Flow Based
One other way to handle the HttpRequests is with a stream Flow:
import akka.http.scaladsl.model.{HttpRequest, HttpResponse, StatusCodes}
import akka.stream.scaladsl.Flow
val parallelism = 10
val reqFlow =
  Flow[HttpRequest].filter(_.getUri().path().equalsIgnoreCase("/test"))
    .mapAsync(parallelism)(_ => slowFunc())
    .map(str => HttpResponse(status = StatusCodes.OK, entity = str)) // the constant is OK, not Ok
Http().bindAndHandle(reqFlow, ...)

In case this is still relevant, or for future readers, the answer is inside Http().bindAndHandle documentation:
/**
* Convenience method which starts a new HTTP server...
* ...
* The number of concurrently accepted connections can be configured by overriding
* the `akka.http.server.max-connections` setting....
* ...
*/
def bindAndHandle(...
Use the akka.http.server.max-connections setting to configure the number of concurrent connections.
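As a rough sketch of how that setting could be overridden in code rather than in application.conf: the object name MaxConnectionsSketch and the value 2048 are illustrative assumptions, not recommendations.

```scala
import akka.actor.ActorSystem
import com.typesafe.config.{Config, ConfigFactory}

object MaxConnectionsSketch {
  // 2048 is an arbitrary illustrative value.
  val config: Config = ConfigFactory
    .parseString("akka.http.server.max-connections = 2048")
    .withFallback(ConfigFactory.load())

  def main(args: Array[String]): Unit = {
    // An ActorSystem created with this config exposes the override to akka-http.
    val system = ActorSystem("system", config)
    println(system.settings.config.getInt("akka.http.server.max-connections"))
    system.terminate()
  }
}
```

The same key can also be placed in application.conf, in which case no code change is needed.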

Related

Timeout Akka Streams Flow

I'm trying to use completionTimeout in an akka streams flow. I've provided a contrived example where the flow takes 10 seconds but I've added a completionTimeout with a timeout of 1 second. I would expect this flow to timeout after 1 second. However, in the example the flow completes in 10 seconds without any errors.
Why doesn't the flow timeout? Is there a better way to timeout a flow?
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Flow, Sink, Source}
import org.scalatest.{FlatSpec, Matchers}
import scala.concurrent.duration._
import scala.concurrent.{Await, Future}
class Test extends FlatSpec with Matchers {
implicit val system = ActorSystem("test")
"This Test" should "fail but passes and I don't know why" in {
//This takes 10 seconds to complete
val flow: Flow[String, String, NotUsed] = Flow[String]
.map(str => {
println(s"Processing ${str}")
Thread.sleep(10000)
})
.map(_ => {"Done!"})
val future: Future[String] =
Source.single("Input")
.via(flow)
.completionTimeout(1 second) // Set a timeout of 1 second
.runWith(Sink.last)
val result = Await.result(future, 15 seconds)
result should be("Done!")
}
}

When executing a stream, Akka Streams uses operator fusion to run the fused stream operators on a single underlying actor for optimal performance. While Thread.sleep blocks that actor, the timeout stage running on the same actor cannot fire. For the timeout to be raised on time, you could introduce asynchrony by means of .async:
val future: Future[String] =
Source.single("Input")
.via(flow)
.async // <--- asynchronous boundary
.completionTimeout(1 second)
.runWith(Sink.last)
future.onComplete(println)
// Processing Input
// Failure(java.util.concurrent.TimeoutException: The stream has not been completed in 1 second.)
An alternative to introduce asynchrony is to use the mapAsync flow stage:
val flow: Flow[String, String, NotUsed] = Flow[String]
.map(str => {
println(s"Processing ${str}")
Thread.sleep(10000)
})
.mapAsync(1)(_ => Future("Done!")) // <--- asynchronous flow stage
Despite getting the same timeout error, you may notice that it takes ~10s to see the result when using mapAsync, whereas it takes only ~1s using async. That's because while mapAsync introduces an asynchronous flow stage, it is not an asynchronous boundary (unlike async) and is still subject to operator fusion.
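A variation worth trying, sketched under assumptions: if the blocking call itself is moved into the Future given to mapAsync, the stream's actor is no longer blocked, so the timeout should fire on time even without .async. The sketch assumes Akka 2.6 or later (where an implicit ActorSystem is enough to run a stream), uses the global ExecutionContext as a stand-in for a dedicated blocking pool, and shortens the delays to 2 s and 200 ms. TimeoutSketch and run are hypothetical names.

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.duration._
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.util.Try

object TimeoutSketch {
  // Runs a 2 s blocking task behind a 200 ms completionTimeout and returns the outcome.
  def run()(implicit system: ActorSystem): Try[String] = {
    // Assumption: the global pool stands in for a dedicated blocking dispatcher.
    val blockingEc: ExecutionContext = ExecutionContext.global
    val future: Future[String] =
      Source.single("Input")
        // The sleep runs on blockingEc, not on the actor that executes the stream.
        .mapAsync(1)(_ => Future { Thread.sleep(2000); "Done!" }(blockingEc))
        .completionTimeout(200.millis)
        .runWith(Sink.last)
    Try(Await.result(future, 5.seconds))
  }

  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("timeout-sketch")
    println(run())
    system.terminate()
  }
}
```

Note that the timeout only fails the stream; the sleeping thread itself is not interrupted and still runs to the end.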

How to represent multiple incoming TCP connections as a stream of Akka streams?

I'm prototyping a network server using Akka Streams that will listen on a port, accept incoming connections, and continuously read data off each connection. Each connected client will only send data, and will not expect to receive anything useful from the server.
Conceptually, I figured it would be fitting to model the incoming events as one single stream that only incidentally happens to be delivered via multiple TCP connections. Thus, assuming that I have a case class Msg(msg: String) that represents each data message, what I want is to represent the entirety of incoming data as a Source[Msg, _]. This makes a lot of sense for my use case, because I can very simply connect flows & sinks to this source.
Here's the code I wrote to implement my idea:
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.SourceShape
import akka.stream.scaladsl._
import akka.util.ByteString
import akka.NotUsed
import scala.concurrent.{ Await, Future }
import scala.concurrent.duration._
case class Msg(msg: String)
object tcp {
val N = 2
def main(argv: Array[String]) {
implicit val system = ActorSystem()
implicit val materializer = ActorMaterializer()
val connections = Tcp().bind("0.0.0.0", 65432)
val delim = Framing.delimiter(
ByteString("\n"),
maximumFrameLength = 256, allowTruncation = true
)
val parser = Flow[ByteString].via(delim).map(_.utf8String).map(Msg(_))
val messages: Source[Msg, Future[Tcp.ServerBinding]] =
connections.flatMapMerge(N, {
connection =>
println(s"client connected: ${connection.remoteAddress}")
Source.fromGraph(GraphDSL.create() { implicit builder =>
import GraphDSL.Implicits._
val F = builder.add(connection.flow.via(parser))
val nothing = builder.add(Source.tick(
initialDelay = 1.second,
interval = 1.second,
tick = ByteString.empty
))
F.in <~ nothing.out
SourceShape(F.out)
})
})
import scala.concurrent.ExecutionContext.Implicits.global
Await.ready(for {
_ <- messages.runWith(Sink.foreach {
msg => println(s"${System.currentTimeMillis} $msg")
})
_ <- system.terminate()
} yield (), Duration.Inf)
}
}
This code works as expected, however, note the val N = 2, which is passed into the flatMapMerge call that ultimately combines the incoming data streams into one. In practice this means that I can only read from that many streams at a time.
I don't know how many connections will be made to this server at any given time. Ideally I would want to support as many as possible, but hardcoding an upper bound doesn't seem like the right thing to do.
My question, at long last, is: How can I obtain or create a flatMapMerge stage that can read from more than a fixed number of connections at one time?

As indicated by Viktor Klang's comments, I don't think this is possible in a single stream. However, I think it would be possible to create a stream that can receive messages after materialization and use that as a "sink" for messages coming from the TCP connections.
First create the "sink" stream:
import akka.stream.OverflowStrategy
val sinkRef =
  Source
    .actorRef[Msg](Int.MaxValue, OverflowStrategy.fail) // fail the stream if the buffer overflows
    .to(Sink foreach {m => println(s"${System.currentTimeMillis} $m")})
    .run()
This sinkRef can be used by each Connection to receive the messages:
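// connections is a Source, so it has to be run; runForeach materializes it.
connections.runForeach { conn =>
  Source
    .maybe[ByteString] // never completes; Source.empty would complete the outbound side at once
    .via(conn.flow)
    .via(parser)
    .runForeach(msg => sinkRef ! msg)
}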
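The same "materialize a sink first, attach producers later" idea can also be sketched with MergeHub, which is a different technique from the actorRef source used above and assumes a reasonably recent Akka Streams version (the sketch further assumes Akka 2.6 or later for the implicit ActorSystem as materializer). Plain in-memory sources stand in for the per-connection streams, and MergeHubSketch and collect are hypothetical names.

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Keep, MergeHub, Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

final case class Msg(msg: String)

object MergeHubSketch {
  // Feeds messages from several independent producers into one consumer
  // and returns everything that arrived.
  def collect(producers: Seq[List[Msg]])(implicit system: ActorSystem): Set[Msg] = {
    val expected = producers.map(_.size).sum
    // Materializing the hub yields a Sink that any number of producers can attach to later.
    val (hubSink, collected) =
      MergeHub.source[Msg](perProducerBufferSize = 16)
        .take(expected.toLong) // only here so that this demo terminates
        .toMat(Sink.seq[Msg])(Keep.both)
        .run()
    // Stand-ins for the per-connection streams (conn.flow.via(parser) above).
    producers.foreach(msgs => Source(msgs).runWith(hubSink))
    Await.result(collected, 5.seconds).toSet
  }

  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("hub-sketch")
    println(collect(Seq(List(Msg("a1"), Msg("a2")), List(Msg("b1")))))
    system.terminate()
  }
}
```

In a server, each accepted connection would run its parsed stream into hubSink, and the consumer would not use take. Unlike the actorRef approach, the hub applies backpressure to its producers instead of buffering without bound.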

Akka Flow hangs when making http requests via connection pool

I'm using Akka 2.4.4 and trying to move from Apache HttpAsyncClient (unsuccessfully).
Below is simplified version of code that I use in my project.
The problem is that it hangs if I send more than 1-3 requests to the flow. So far after 6 hours of debugging I couldn't even locate the problem. I don't see exceptions, error logs, events in Decider. NOTHING :)
I tried reducing connection-timeout setting to 1s thinking that maybe it's waiting for response from the server but it didn't help.
What am I doing wrong ?
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.headers.Referer
import akka.http.scaladsl.model.{HttpRequest, HttpResponse}
import akka.http.scaladsl.settings.ConnectionPoolSettings
import akka.stream.Supervision.Decider
import akka.stream.scaladsl.{Sink, Source}
import akka.stream.{ActorAttributes, Supervision}
import com.typesafe.config.ConfigFactory
import scala.collection.immutable.{Seq => imSeq}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import scala.util.Try
object Main {
implicit val system = ActorSystem("root")
implicit val executor = system.dispatcher
val config = ConfigFactory.load()
private val baseDomain = "www.google.com"
private val poolClientFlow = Http()(system).cachedHostConnectionPool[Any](baseDomain, 80, ConnectionPoolSettings(config))
private val decider: Decider = {
case ex =>
ex.printStackTrace()
Supervision.Stop
}
private def sendMultipleRequests[T](items: Seq[(HttpRequest, T)]): Future[Seq[(Try[HttpResponse], T)]] =
Source.fromIterator(() => items.toIterator)
.via(poolClientFlow)
.log("Logger")(log = myAdapter)
.recoverWith {
case ex =>
println(ex)
null
}
.withAttributes(ActorAttributes.supervisionStrategy(decider))
.runWith(Sink.seq)
.map { v =>
println(s"Got ${v.length} responses in Flow")
v.asInstanceOf[Seq[(Try[HttpResponse], T)]]
}
def main(args: Array[String]) {
val headers = imSeq(Referer("https://www.google.com/"))
val reqPair = HttpRequest(uri = "/intl/en/policies/privacy").withHeaders(headers) -> "some req ID"
val requests = List.fill(10)(reqPair)
val qwe = sendMultipleRequests(requests).map { case responses =>
println(s"Got ${responses.length} responses")
system.terminate()
}
Await.ready(system.whenTerminated, Duration.Inf)
}
}
Also what's up with proxy support ? Doesn't seem to work for me either.

You need to consume the body of the response fully so that the connection is made available for subsequent requests. If you don't care about the response entity at all, then you can just drain it to a Sink.ignore, something like this:
resp.entity.dataBytes.runWith(Sink.ignore)
With the default config, the max connections for a host connection pool is set to 4. Each pool has its own queue where requests wait until one of the open connections becomes available. If that queue ever goes over 32 (default config, can be changed, must be a power of 2) then you will start seeing failures. In your case, you only make 10 requests, so you don't hit that limit. But by not consuming the response entity you don't free up the connection, and everything else just queues up behind it, waiting for a connection to become free.
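A minimal sketch of draining the entities of the (Try[HttpResponse], T) pairs that a pool flow emits. DrainSketch, drain and drainResult are hypothetical helper names, a locally built response stands in for one received from the pool, and Akka 2.6 or later is assumed so that an implicit ActorSystem can supply the Materializer.

```scala
import akka.Done
import akka.actor.ActorSystem
import akka.http.scaladsl.model.{HttpEntity, HttpResponse}
import akka.stream.Materializer
import akka.stream.scaladsl.Sink
import scala.concurrent.duration._
import scala.concurrent.{Await, Future}
import scala.util.{Failure, Success, Try}

object DrainSketch {
  // Consumes and discards the body so that the pooled connection can be reused.
  def drain(resp: HttpResponse)(implicit mat: Materializer): Future[Done] =
    resp.entity.dataBytes.runWith(Sink.ignore)

  // Drains a successful response and passes the request's identifier on.
  def drainResult[T](result: (Try[HttpResponse], T))(implicit mat: Materializer): Future[T] =
    result match {
      case (Success(resp), id) => drain(resp).map(_ => id)(mat.executionContext)
      case (Failure(ex), _)    => Future.failed(ex)
    }

  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("drain-sketch")
    // A locally built response stands in for one received from the pool.
    val resp = HttpResponse(entity = HttpEntity("hello"))
    println(Await.result(drainResult((Try(resp), "some req ID")), 5.seconds))
    system.terminate()
  }
}
```

In the question's stream, a stage such as .mapAsync(4)(DrainSketch.drainResult(_)) placed after .via(poolClientFlow) would be one possible way to apply it; the parallelism of 4 simply mirrors the default pool size mentioned above.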

Why does the Future example not work?

I am reading the Akka Scala documentation, and there is an example (p. 171, bottom):
// imports added for compilation
import scala.concurrent.{ExecutionContext, Future}
import ExecutionContext.Implicits.global
class Some {
}
object Some {
def main(args: Array[String]) {
// Create a sequence of Futures
val futures = for (i <- 1 to 1000) yield Future(i * 2)
val futureSum = Future.fold(futures)(0)(_ + _)
futureSum foreach println
}
}
I ran it, but nothing happened: there was no output in the console. What is wrong?

You don't wait for the future to complete, so you create a race between the program exiting and the futures completing and the side effect running. On your machine the future seems to lose the race; on the machines of the commenters who say "it works", the future wins it.
You can use Await to block on a future and wait for it to complete. This is something you should only be doing "at the ends of the world", you should very rarely actually be using Await...
// imports added for compilation
import scala.concurrent.{ExecutionContext, Future}
import ExecutionContext.Implicits.global
import scala.concurrent.duration._ // for the "1 second" syntax
import scala.concurrent.Await
class Some {
}
object Some {
def main(args: Array[String]) {
// Create a sequence of Futures
val futures = for (i <- 1 to 1000) yield Future(i * 2)
val futureSum = Future.fold(futures)(0)(_ + _)
// we map instead of foreach, to make sure that the side-effect is part of the future
// and we "await" for the future to complete (for 1 second)
Await.result(futureSum map println, 1 second)
}
}

As others have stated, the issue is the race condition where the futures are competing with the program terminating. The JVM has a concept of daemon threads. It waits for non-daemon threads to terminate but not daemon threads. So if you want to wait for threads to complete, use non-daemon threads.
The way threads are created for scala futures is using an implicit scala.concurrent.ExecutionContext. The one you use (import ExecutionContext.Implicits.global) starts daemon threads. However, it is possible to use non-daemon threads. So if you use an ExecutionContext with non-daemon threads, it will wait, which in your case is reasonable behaviour. Naively:
import scala.concurrent.Future
import scala.concurrent.ExecutionContextExecutor
import scala.concurrent.ExecutionContext
class MyExecutionContext extends ExecutionContext {
override def execute(runnable:Runnable) = {
val t = new Thread(runnable)
t.setDaemon(false)
t.start()
}
override def reportFailure(t:Throwable) = t.printStackTrace
}
object Some {
implicit lazy val context: ExecutionContext = new MyExecutionContext
def main(args: Array[String]) {
// Create a sequence of Futures
val futures = for (i <- 1 to 1000) yield Future(i * 2)
val futureSum = Future.fold(futures)(0)(_ + _)
futureSum foreach println
}
}
Careful with using the above ExecutionContext in production because it doesn't use a thread pool and can create unbounded threads, but the message is: you can control everything about the threads behind Futures through an ExecutionContext. Explore the various scala and akka contexts to find what you need, or if nothing suits, write your own.
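As a bounded alternative to the thread-per-task context above, here is a sketch that backs the ExecutionContext with a fixed pool of non-daemon threads. BoundedPoolSketch, futureSum and the pool size 4 are illustrative assumptions; Future.sequence is used instead of Future.fold only to keep the sketch short.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, ExecutionContextExecutorService, Future}

object BoundedPoolSketch {
  // Starts the futures on the given pool and returns their sum.
  // The pool is passed in so that the caller decides when to shut it down.
  def futureSum(pool: ExecutionContextExecutorService): Future[Int] = {
    implicit val ec: ExecutionContext = pool
    val futures = for (i <- 1 to 1000) yield Future(i * 2)
    Future.sequence(futures).map(_.sum)
  }

  def main(args: Array[String]): Unit = {
    // The default thread factory of Executors creates non-daemon threads.
    val pool = ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(4))
    futureSum(pool).onComplete { result =>
      println(result)
      pool.shutdown() // without this, the idle non-daemon threads keep the JVM alive
    }(pool)
  }
}
```

The trade-off is that the pool must be shut down explicitly once the work is done, otherwise the program never exits.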

Either of the following statements at the end of the main function would do what you need. As the answers above said, you have to allow the future to complete. The main thread is different from the Future's thread, and when main completes, the program terminates before the Future's thread has finished.
Thread.sleep(500) //... Simple solution
Await.result(futureSum, Duration(500, MILLISECONDS)) //...have to import scala.concurrent.duration._ to use Duration object.

Multiple Scala actors servicing one task

I need to process multiple data values in parallel ("SIMD"). I can use the java.util.concurrent APIs (Executors.newFixedThreadPool()) to process several values in parallel using Future instances:
import java.util.concurrent.{Executors, Callable}
class ExecutorsTest {
private class Process(value: Int)
extends Callable[Int] {
def call(): Int = {
// Do some time-consuming task
value
}
}
val executorService = {
val threads = Runtime.getRuntime.availableProcessors
Executors.newFixedThreadPool(threads)
}
val processes = for (process <- 1 to 1000) yield new Process(process)
val futures = executorService.invokeAll(processes)
// Wait for futures
}
How do I do the same thing using Actors? I do not believe that I want to "feed" all of the processes to a single actor because the actor will then execute them sequentially.
Do I need to create multiple "processor" actors with a "dispatcher" actor that sends an equal number of processes to each "processor" actor?

If you just want fire-and-forget processing, why not use Scala futures?
import scala.actors.Futures._
def example = {
val answers = (1 to 4).map(x => future {
Thread.sleep(x*1000)
println("Slept for "+x)
x
})
val t0 = System.nanoTime
awaitAll(1000000,answers: _*) // Number is timeout in ms
val t1 = System.nanoTime
printf("%.3f seconds elapsed\n",(t1-t0)*1e-9)
answers.map(_()).sum
}
scala> example
Slept for 1
Slept for 2
Slept for 3
Slept for 4
4.000 seconds elapsed
res1: Int = 10
Basically, all you do is put the code you want inside a future { } block, and it will immediately return a future; apply it to get the answer (it will block until done), or use awaitAll with a timeout to wait until everyone is done.
Update: As of 2.11, the way to do this is with scala.concurrent.Future. A translation of the above code is:
import scala.concurrent._
import duration._
import ExecutionContext.Implicits.global
def example = {
val answers = Future.sequence(
(1 to 4).map(x => Future {
Thread.sleep(x*1000)
println("Slept for "+x)
x
})
)
val t0 = System.nanoTime
val completed = Await.result(answers, Duration(1000, SECONDS))
val t1 = System.nanoTime
printf("%.3f seconds elapsed\n",(t1-t0)*1e-9)
completed.sum
}

If you can use Akka, take a look at the ActorPool support: http://doc.akka.io/routing-scala
It lets you specify parameters about how many actors you want running in parallel and then dispatches work to those actors.
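A rough sketch of that idea with classic Akka actors and a RoundRobinPool router, which I am assuming is the current counterpart of the ActorPool support linked above. Work, Worker, RouterSketch and process are hypothetical names standing in for the question's Process class, and Akka 2.6 or later is assumed.

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.pattern.ask
import akka.routing.RoundRobinPool
import akka.util.Timeout
import scala.concurrent.duration._
import scala.concurrent.{Await, Future}

// Hypothetical message and worker, standing in for the question's Process class.
final case class Work(value: Int)

class Worker extends Actor {
  def receive: Receive = {
    case Work(value) => sender() ! value // a time-consuming task would go here
  }
}

object RouterSketch {
  // Sends each value to a pool of `workers` routees and gathers the replies in order.
  def process(values: Seq[Int], workers: Int)(implicit system: ActorSystem): Seq[Int] = {
    import system.dispatcher
    implicit val timeout: Timeout = Timeout(5.seconds)
    val router = system.actorOf(RoundRobinPool(workers).props(Props(new Worker)))
    val replies = Future.sequence(values.map(v => (router ? Work(v)).mapTo[Int]))
    try Await.result(replies, 10.seconds)
    finally system.stop(router)
  }

  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("router-sketch")
    println(process(1 to 10, workers = 4).sum)
    system.terminate()
  }
}
```

The router plays the role of the "dispatcher" actor from the question: it hands each message to one of the worker actors in turn, so no manual partitioning of the work is needed.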