How to represent multiple incoming TCP connections as a stream of Akka streams? - scala

I'm prototyping a network server using Akka Streams that will listen on a port, accept incoming connections, and continuously read data off each connection. Each connected client will only send data, and will not expect to receive anything useful from the server.
Conceptually, I figured it would be fitting to model the incoming events as one single stream that only incidentally happens to be delivered via multiple TCP connections. Thus, assuming that I have a case class Msg(msg: String) that represents each data message, what I want is to represent the entirety of incoming data as a Source[Msg, _]. This makes a lot of sense for my use case, because I can very simply connect flows & sinks to this source.
Here's the code I wrote to implement my idea:
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.SourceShape
import akka.stream.scaladsl._
import akka.util.ByteString
import akka.NotUsed
import scala.concurrent.{ Await, Future }
import scala.concurrent.duration._
case class Msg(msg: String)
object tcp {
val N = 2
def main(argv: Array[String]) {
implicit val system = ActorSystem()
implicit val materializer = ActorMaterializer()
val connections = Tcp().bind("0.0.0.0", 65432)
val delim = Framing.delimiter(
ByteString("\n"),
maximumFrameLength = 256, allowTruncation = true
)
val parser = Flow[ByteString].via(delim).map(_.utf8String).map(Msg(_))
val messages: Source[Msg, Future[Tcp.ServerBinding]] =
connections.flatMapMerge(N, {
connection =>
println(s"client connected: ${connection.remoteAddress}")
Source.fromGraph(GraphDSL.create() { implicit builder =>
import GraphDSL.Implicits._
val F = builder.add(connection.flow.via(parser))
val nothing = builder.add(Source.tick(
initialDelay = 1.second,
interval = 1.second,
tick = ByteString.empty
))
F.in <~ nothing.out
SourceShape(F.out)
})
})
import scala.concurrent.ExecutionContext.Implicits.global
Await.ready(for {
_ <- messages.runWith(Sink.foreach {
msg => println(s"${System.currentTimeMillis} $msg")
})
_ <- system.terminate()
} yield (), Duration.Inf)
}
}
This code works as expected, however, note the val N = 2, which is passed into the flatMapMerge call that ultimately combines the incoming data streams into one. In practice this means that I can only read from that many streams at a time.
I don't know how many connections will be made to this server at any given time. Ideally I would want to support as many as possible, but hardcoding an upper bound doesn't seem like the right thing to do.
My question, at long last, is: How can I obtain or create a flatMapMerge stage that can read from more than a fixed number of connections at one time?

As indicated by Viktor Klang's comments I don't think this is possible in 1 stream. However, I think it would be possible to create a stream that can receive messages after materialization and use that as a "sink" for messages coming from the TCP connections.
First create the "sink" stream:
val sinkRef =
Source
.actorRef[Msg](Int.MaxValue, fail)
.to(Sink foreach {m => println(s"${System.currentTimeMillis} $m")})
.run()
This sinkRef can be used by each Connection to receive the messages:
connections foreach { conn =>
Source
.empty[ByteString]
.via(conn.flow)
.via(parser)
.runForeach(msg => sinkRef ! msg)
}

Related

Fetch size in PGConnection.getNotifications

A function in my postgresql database sends a notification when a table is updated.
I'm polling that postgresql database by scalikejdbc, to get all the notifications, and then, do something with them.
The process is explained here . A typical reactive system to sql tables updates.
I get the PGConnection from the java.sql.Connection. And, after that, I get the notifications in this way:
val notifications = Option(pgConnection.getNotifications).getOrElse(Array[PGNotification]())
I'm trying to get the notifications in chunks of 1000 by setting the fetch size to 1000, and disabling the auto commit. But fetch size property is ignored.
Any ideas how I could do that?
I wouldn't want to handle hundreds of thousands of notifications in a single map over my notifications dataset.
pgConnection.getNotifications.size could be huge, and therefore, this code wouldn't scale well.
Thanks!!!
To better scale, consider using postgresql-async and Akka Streams: the former is a library that can obtain PostgreSQL notifications asynchronously, and the former is a Reactive Streams implementation that provides backpressure (which would obviate the need for paging). For example:
import akka.actor._
import akka.stream._
import akka.stream.scaladsl._
import com.github.mauricio.async.db.postgresql.PostgreSQLConnection
import com.github.mauricio.async.db.postgresql.util.URLParser
import scala.concurrent.duration._
import scala.concurrent.Await
class DbActor(implicit materializer: ActorMaterializer) extends Actor with ActorLogging {
private implicit val ec = context.system.dispatcher
val queue =
Source.queue[String](Int.MaxValue, OverflowStrategy.backpressure)
.to(Sink.foreach(println))
.run()
val configuration = URLParser.parse("jdbc:postgresql://localhost:5233/my_db?user=dbuser&password=pwd")
val connection = new PostgreSQLConnection(configuration)
Await.result(connection.connect, 5 seconds)
connection.sendQuery("LISTEN my_channel")
connection.registerNotifyListener { message =>
val msg = message.payload
log.debug("Sending the payload: {}", msg)
self ! msg
}
def receive = {
case payload: String =>
queue.offer(payload).pipeTo(self)
case QueueOfferResult.Dropped =>
log.warning("Dropped a message.")
case QueueOfferResult.Enqueued =>
log.debug("Enqueued a message.")
case QueueOfferResult.Failure(t) =>
log.error("Stream failed: {}", t.getMessage)
case QueueOfferResult.QueueClosed =>
log.debug("Stream closed.")
}
}
The code above simply prints notifications from PostgreSQL as they occur; you can replace the Sink.foreach(println) with another Sink. To run it:
import akka.actor._
import akka.stream.ActorMaterializer
object Example extends App {
implicit val system = ActorSystem()
implicit val materializer = ActorMaterializer()
system.actorOf(Props(classOf[DbActor], materializer))
}

akka-http queries do not run in parallel

I am very new to akka-http and have troubles to run queries on the same route in parallel.
I have a route that may return the result very quickly (if cached) or not (heavy CPU multithreaded computations). I would like to run these queries in parallel, in case a short one arrives after a long one with heavy computation, I do not want the second call to wait for the first to finish.
However it seems that these queries do not run in parallel if they are on the same route (run in parallel if on different routes)
I can reproduice it in a basic project:
Calling the server 3 time in parallel (with 3 Chrome's tab on http://localhost:8080/test) causes the responses to arrive respectively at 3.0s, 6.0-s and 9.0-s. I suppose queries do not run in parallel.
Running on a 6 cores (with HT) machine on Windows 10 with jdk 8.
build.sbt
name := "akka-http-test"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "com.typesafe.akka" %% "akka-http-experimental" % "2.4.11"
*AkkaHttpTest.scala**
import java.util.concurrent.Executors
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.server.Directives._
import akka.stream.ActorMaterializer
import scala.concurrent.{ExecutionContext, Future}
object AkkaHttpTest extends App {
implicit val actorSystem = ActorSystem("system") // no application.conf here
implicit val executionContext =
ExecutionContext.fromExecutor(Executors.newFixedThreadPool(6))
implicit val actorMaterializer = ActorMaterializer()
val route = path("test") {
onComplete(slowFunc()) { slowFuncResult =>
complete(slowFuncResult)
}
}
def slowFunc()(implicit ec: ExecutionContext): Future[String] = Future {
Thread.sleep(3000)
"Waited 3s"
}
Http().bindAndHandle(route, "localhost", 8080)
println("server started")
}
What am I doing wrong here ?
Thanks for your help
EDIT: Thanks to #Ramon J Romero y Vigil, I added Future Wrapping, but the problem still persists
def slowFunc()(implicit ec : ExecutionContext) : Future[String] = Future {
Thread.sleep(3000)
"Waited 3.0s"
}
val route = path("test") {
onComplete(slowFunc()) { slowFuncResult =>
complete(slowFuncResult)
}
}
Tries with a the default Thread pool, the one defined above in the config file, and a Fixed Thread Pool (6 Threads).
It seems that the onComplete directive still waits for the future to complete and then block the Route (with same connection).
Same problem with the Flow trick
import akka.stream.scaladsl.Flow
val parallelism = 10
val reqFlow =
Flow[HttpRequest].filter(_.getUri().path().equalsIgnoreCase("/test"))
.mapAsync(parallelism)(_ => slowFunc())
.map(str => HttpResponse(status=StatusCodes.Ok, entity=str))
Http().bindAndHandle(reqFlow, ...)
Thanks for your help
Each IncomingConnection is handled by the same Route, therefore when you "call the server 3 times in parallel" you are likely using the same Connection and therefore the same Route.
The Route is handling all 3 incoming HttpRequest values in an akka-stream fashion, i.e. the Route is composed of multiple stages but each stage can only processes 1 element at any given time. In your example the "complete" stage of the stream will call Thread.sleep for each incoming Request and process each Request one-at-a-time.
To get multiple concurrent requests handled at the same time you should establish a unique connection for each request.
An example of the client side connection pool can be created similar to the documentation examples:
import akka.http.scaladsl.Http
val connPoolFlow = Http().newHostConnectionPool("localhost", 8080)
This can then be integrated into a stream that makes the requests:
import akka.http.scaladsl.model.Uri._
import akka.http.scaladsl.model.HttpRequest
val request = HttpRequest(uri="/test")
import akka.stream.scaladsl.Source
val reqStream =
Source.fromIterator(() => Iterator.continually(request).take(3))
.via(connPoolFlow)
.via(Flow.mapAsync(3)(identity))
.to(Sink foreach { resp => println(resp)})
.run()
Route Modification
If you want each HttpRequest to be processed in parallel then you can use the same Route to do so but you must spawn off Futures inside of the Route and use the onComplete directive:
def slowFunc()(implicit ec : ExecutionContext) : Future[String] = Future {
Thread.sleep(1500)
"Waited 1.5s"
}
val route = path("test") {
onComplete(slowFunc()) { slowFuncResult =>
complete(slowFuncResult)
}
}
One thing to be aware of: if you don't specify a different ExecutionContext for your sleeping function then the same thread pool for routes will be used for your sleeping. You may exhaust the available threads this way. You should probably use a seperate ec for your sleeping...
Flow Based
One other way to handle the HttpRequests is with a stream Flow:
import akka.stream.scaladsl.Flow
val parallelism = 10
val reqFlow =
Flow[HttpRequest].filter(_.getUri().path().equalsIgnoreCase("/test"))
.mapAsync(parallelism)(_ => slowFunc())
.map(str => HttpResponse(status=StatusCodes.Ok, entity=str))
Http().bindAndHandle(reqFlow, ...)
In case this is still relevant, or for future readers, the answer is inside Http().bindAndHandle documentation:
/**
* Convenience method which starts a new HTTP server...
* ...
* The number of concurrently accepted connections can be configured by overriding
* the `akka.http.server.max-connections` setting....
* ...
*/
def bindAndHandle(...
use akka.http.server.max-connections setting for number of concurrent connections.

Consume TCP stream and redirect it to another Sink (with Akka Streams)

I try to redirect/forward a TCP stream to another Sink with Akka 2.4.3.
The program should open a server socket, listen for incoming connections and then consume the tcp stream. Our sender does not expect/accept replies from us so we never send back anything - we just consume the stream. After framing the tcp stream we need to transform the bytes into something more useful and send it to the Sink.
I tried the following so far but i struggle especially with the part how to not sending tcp packets back to the sender and to properly connect the Sink.
import scala.util.Failure
import scala.util.Success
import akka.actor.ActorSystem
import akka.event.Logging
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.Sink
import akka.stream.scaladsl.Tcp
import akka.stream.scaladsl.Framing
import akka.util.ByteString
import java.nio.ByteOrder
import akka.stream.scaladsl.Flow
object TcpConsumeOnlyStreamToSink {
implicit val system = ActorSystem("stream-system")
private val log = Logging(system, getClass.getName)
//The Sink
//In reality this is of course a real Sink doing some useful things :-)
//The Sink accept types of "SomethingMySinkUnderstand"
val mySink = Sink.ignore;
def main(args: Array[String]): Unit = {
//our sender is not interested in getting replies from us
//so we just want to consume the tcp stream and never send back anything to the sender
val (address, port) = ("127.0.0.1", 6000)
server(system, address, port)
}
def server(system: ActorSystem, address: String, port: Int): Unit = {
implicit val sys = system
import system.dispatcher
implicit val materializer = ActorMaterializer()
val handler = Sink.foreach[Tcp.IncomingConnection] { conn =>
println("Client connected from: " + conn.remoteAddress)
conn handleWith Flow[ByteString]
//this is neccessary since we use a self developed tcp wire protocol
.via(Framing.lengthField(4, 0, 65532, ByteOrder.BIG_ENDIAN))
//here we want to map the raw bytes into something our Sink understands
.map(msg => new SomethingMySinkUnderstand(msg.utf8String))
//here we like to connect our Sink to the Tcp Source
.to(mySink) //<------ NOT COMPILING
}
val tcpSource = Tcp().bind(address, port)
val binding = tcpSource.to(handler).run()
binding.onComplete {
case Success(b) =>
println("Server started, listening on: " + b.localAddress)
case Failure(e) =>
println(s"Server could not bind to $address:$port: ${e.getMessage}")
system.terminate()
}
}
class SomethingMySinkUnderstand(x:String) {
}
}
Update: Add this to your build.sbt file to get necessary deps
libraryDependencies += "com.typesafe.akka" % "akka-stream_2.11" % "2.4.3"
handleWith expects a Flow, i.e. a box with an unconnected inlet and an unconnected outlet. You effectively provide a Source, because you connected the Flow with a Sink by using the to operation.
I think you could do the following:
conn.handleWith(
Flow[ByteString]
.via(Framing.lengthField(4, 0, 65532, ByteOrder.BIG_ENDIAN))
.map(msg => new SomethingMySinkUnderstand(msg.utf8String))
.alsoTo(mySink)
.map(_ => ByteString.empty)
.filter(_ => false) // Prevents sending anything back
)
Alternative (and in my view cleaner) way to code it (AKKA 2.6.x), that will also emphasize of the fact, that you do not do any outbound flow, would be:
val receivingPipeline = Flow
.via(framing)
.via(decoder)
.to(mySink)
val sendingNothing = Source.never[ByteString]()
conn.handleWith(Flow.fromSinkAndSourceCoupled(receivingPiline, sendingNothing))

Akka Flow hangs when making http requests via connection pool

I'm using Akka 2.4.4 and trying to move from Apache HttpAsyncClient (unsuccessfully).
Below is simplified version of code that I use in my project.
The problem is that it hangs if I send more than 1-3 requests to the flow. So far after 6 hours of debugging I couldn't even locate the problem. I don't see exceptions, error logs, events in Decider. NOTHING :)
I tried reducing connection-timeout setting to 1s thinking that maybe it's waiting for response from the server but it didn't help.
What am I doing wrong ?
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.headers.Referer
import akka.http.scaladsl.model.{HttpRequest, HttpResponse}
import akka.http.scaladsl.settings.ConnectionPoolSettings
import akka.stream.Supervision.Decider
import akka.stream.scaladsl.{Sink, Source}
import akka.stream.{ActorAttributes, Supervision}
import com.typesafe.config.ConfigFactory
import scala.collection.immutable.{Seq => imSeq}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import scala.util.Try
object Main {
implicit val system = ActorSystem("root")
implicit val executor = system.dispatcher
val config = ConfigFactory.load()
private val baseDomain = "www.google.com"
private val poolClientFlow = Http()(system).cachedHostConnectionPool[Any](baseDomain, 80, ConnectionPoolSettings(config))
private val decider: Decider = {
case ex =>
ex.printStackTrace()
Supervision.Stop
}
private def sendMultipleRequests[T](items: Seq[(HttpRequest, T)]): Future[Seq[(Try[HttpResponse], T)]] =
Source.fromIterator(() => items.toIterator)
.via(poolClientFlow)
.log("Logger")(log = myAdapter)
.recoverWith {
case ex =>
println(ex)
null
}
.withAttributes(ActorAttributes.supervisionStrategy(decider))
.runWith(Sink.seq)
.map { v =>
println(s"Got ${v.length} responses in Flow")
v.asInstanceOf[Seq[(Try[HttpResponse], T)]]
}
def main(args: Array[String]) {
val headers = imSeq(Referer("https://www.google.com/"))
val reqPair = HttpRequest(uri = "/intl/en/policies/privacy").withHeaders(headers) -> "some req ID"
val requests = List.fill(10)(reqPair)
val qwe = sendMultipleRequests(requests).map { case responses =>
println(s"Got ${responses.length} responses")
system.terminate()
}
Await.ready(system.whenTerminated, Duration.Inf)
}
}
Also what's up with proxy support ? Doesn't seem to work for me either.
You need to consume the body of the response fully so that the connection is made available for subsequent requests. If you don't care about the response entity at all, then you can just drain it to a Sink.ignore, something like this:
resp.entity.dataBytes.runWith(Sink.ignore)
By the default config, when using a host connection pool, the max connections is set to 4. Each pool has it's own queue where requests wait until one of the open connections becomes available. If that queue ever goes over 32 (default config, can be changed, must be a power of 2) then yo will start seeing failures. In your case, you only do 10 requests, so you don't hit that limit. But by not consuming the response entity you don't free up the connection and everything else just queues in behind, waiting for the connections to free up.

akka streams over tcp

Here is the setup: I want to be able to stream messages (jsons converted to bytestrings) from a publisher to a remote server subscriber over a tcp connection.
Ideally, the publisher would be an actor that would receive internal messages, queue them and then stream them to the subscriber server if there is outstanding demand of course. I understood that what is necessary for this is to extend ActorPublisher class in order to onNext() the messages when needed.
My problem is that so far I am able just to send (receive and decode properly) one shot messages to the server opening a new connection each time. I did not manage to get my head around the akka doc and be able to set the proper tcp Flow with the ActorPublisher.
Here is the code from the publisher:
def send(message: Message): Unit = {
val system = Akka.system()
implicit val sys = system
import system.dispatcher
implicit val materializer = ActorMaterializer()
val address = Play.current.configuration.getString("eventservice.location").getOrElse("localhost")
val port = Play.current.configuration.getInt("eventservice.port").getOrElse(9000)
/*** Try with actorPublisher ***/
//val result = Source.actorPublisher[Message] (Props[EventActor]).via(Flow[Message].map(Json.toJson(_).toString.map(ByteString(_))))
/*** Try with actorRef ***/
/*val source = Source.actorRef[Message](0, OverflowStrategy.fail).map(
m => {
Logger.info(s"Sending message: ${m.toString}")
ByteString(Json.toJson(m).toString)
}
)
val ref = Flow[ByteString].via(Tcp().outgoingConnection(address, port)).to(Sink.ignore).runWith(source)*/
val result = Source(Json.toJson(message).toString.map(ByteString(_))).
via(Tcp().outgoingConnection(address, port)).
runFold(ByteString.empty) { (acc, in) ⇒ acc ++ in }//Handle the future
}
and the code from the actor which is quite standard in the end:
import akka.actor.Actor
import akka.stream.actor.ActorSubscriberMessage.{OnComplete, OnError}
import akka.stream.actor.{ActorPublisherMessage, ActorPublisher}
import models.events.Message
import play.api.Logger
import scala.collection.mutable
class EventActor extends Actor with ActorPublisher[Message] {
import ActorPublisherMessage._
var queue: mutable.Queue[Message] = mutable.Queue.empty
def receive = {
case m: Message =>
Logger.info(s"EventActor - message received and queued: ${m.toString}")
queue.enqueue(m)
publish()
case Request => publish()
case Cancel =>
Logger.info("EventActor - cancel message received")
context.stop(self)
case OnError(err: Exception) =>
Logger.info("EventActor - error message received")
onError(err)
context.stop(self)
case OnComplete =>
Logger.info("EventActor - onComplete message received")
onComplete()
context.stop(self)
}
def publish() = {
while (queue.nonEmpty && isActive && totalDemand > 0) {
Logger.info("EventActor - message published")
onNext(queue.dequeue())
}
}
I can provide the code from the subscriber if necessary:
def connect(system: ActorSystem, address: String, port: Int): Unit = {
implicit val sys = system
import system.dispatcher
implicit val materializer = ActorMaterializer()
val handler = Sink.foreach[Tcp.IncomingConnection] { conn =>
Logger.info("Event server connected to: " + conn.remoteAddress)
// Get the ByteString flow and reconstruct the msg for handling and then output it back
// that is how handleWith work apparently
conn.handleWith(
Flow[ByteString].fold(ByteString.empty)((acc, b) => acc ++ b).
map(b => handleIncomingMessages(system, b.utf8String)).
map(ByteString(_))
)
}
val connections = Tcp().bind(address, port)
val binding = connections.to(handler).run()
binding.onComplete {
case Success(b) =>
Logger.info("Event server started, listening on: " + b.localAddress)
case Failure(e) =>
Logger.info(s"Event server could not bind to $address:$port: ${e.getMessage}")
system.terminate()
}
}
thanks in advance for the hints.
My first recommendation is to not write your own queue logic. Akka provides this out-of-the-box. You also don't need to write your own Actor, Akka Streams can provide it as well.
First we can create the Flow that will connect your publisher to your subscriber via Tcp. In your publisher code you only need to create the ActorSystem once and connect to the outside server once:
//this code is at top level of your application
implicit val actorSystem = ActorSystem()
implicit val actorMaterializer = ActorMaterializer()
import actorSystem.dispatcher
val host = Play.current.configuration.getString("eventservice.location").getOrElse("localhost")
val port = Play.current.configuration.getInt("eventservice.port").getOrElse(9000)
val publishFlow = Tcp().outgoingConnection(host, port)
publishFlow is a Flow that will input ByteString data that you want to send to the external subscriber and outputs ByteString data that comes from subscriber:
// data to subscriber ----> publishFlow ----> data returned from subscriber
The next step is the publisher Source. Instead of writing your own Actor you can use Source.actorRef to "materialize" the Stream into an ActorRef. Essentially the Stream will become an ActorRef for us to use later:
//these values control the buffer
val bufferSize = 1024
val overflowStrategy = akka.stream.OverflowStrategy.dropHead
val messageSource = Source.actorRef[Message](bufferSize, overflowStrategy)
We also need a Flow to convert Messages into ByteString
val marshalFlow =
Flow[Message].map(message => ByteString(Json.toJson(message).toString))
Finally we can connect all of the pieces. Since you aren't receiving any data back from the external subscriber we'll ignore any data coming in from the connection:
val subscriberRef : ActorRef = messageSource.via(marshalFlow)
.via(publishFlow)
.runWith(Sink.ignore)
We can now treat this stream as if it were an Actor:
val message1 : Message = ???
subscriberRef ! message1
val message2 : Message = ???
subscriberRef ! message2