Consume TCP stream and redirect it to another Sink (with Akka Streams) - scala

I try to redirect/forward a TCP stream to another Sink with Akka 2.4.3.
The program should open a server socket, listen for incoming connections and then consume the tcp stream. Our sender does not expect/accept replies from us so we never send back anything - we just consume the stream. After framing the tcp stream we need to transform the bytes into something more useful and send it to the Sink.
I tried the following so far but i struggle especially with the part how to not sending tcp packets back to the sender and to properly connect the Sink.
import scala.util.Failure
import scala.util.Success
import akka.actor.ActorSystem
import akka.event.Logging
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.Sink
import akka.stream.scaladsl.Tcp
import akka.stream.scaladsl.Framing
import akka.util.ByteString
import java.nio.ByteOrder
import akka.stream.scaladsl.Flow
object TcpConsumeOnlyStreamToSink {
implicit val system = ActorSystem("stream-system")
private val log = Logging(system, getClass.getName)
//The Sink
//In reality this is of course a real Sink doing some useful things :-)
//The Sink accept types of "SomethingMySinkUnderstand"
val mySink = Sink.ignore;
def main(args: Array[String]): Unit = {
//our sender is not interested in getting replies from us
//so we just want to consume the tcp stream and never send back anything to the sender
val (address, port) = ("127.0.0.1", 6000)
server(system, address, port)
}
def server(system: ActorSystem, address: String, port: Int): Unit = {
implicit val sys = system
import system.dispatcher
implicit val materializer = ActorMaterializer()
val handler = Sink.foreach[Tcp.IncomingConnection] { conn =>
println("Client connected from: " + conn.remoteAddress)
conn handleWith Flow[ByteString]
//this is neccessary since we use a self developed tcp wire protocol
.via(Framing.lengthField(4, 0, 65532, ByteOrder.BIG_ENDIAN))
//here we want to map the raw bytes into something our Sink understands
.map(msg => new SomethingMySinkUnderstand(msg.utf8String))
//here we like to connect our Sink to the Tcp Source
.to(mySink) //<------ NOT COMPILING
}
val tcpSource = Tcp().bind(address, port)
val binding = tcpSource.to(handler).run()
binding.onComplete {
case Success(b) =>
println("Server started, listening on: " + b.localAddress)
case Failure(e) =>
println(s"Server could not bind to $address:$port: ${e.getMessage}")
system.terminate()
}
}
class SomethingMySinkUnderstand(x:String) {
}
}
Update: Add this to your build.sbt file to get necessary deps
libraryDependencies += "com.typesafe.akka" % "akka-stream_2.11" % "2.4.3"

handleWith expects a Flow, i.e. a box with an unconnected inlet and an unconnected outlet. You effectively provide a Source, because you connected the Flow with a Sink by using the to operation.
I think you could do the following:
conn.handleWith(
Flow[ByteString]
.via(Framing.lengthField(4, 0, 65532, ByteOrder.BIG_ENDIAN))
.map(msg => new SomethingMySinkUnderstand(msg.utf8String))
.alsoTo(mySink)
.map(_ => ByteString.empty)
.filter(_ => false) // Prevents sending anything back
)

Alternative (and in my view cleaner) way to code it (AKKA 2.6.x), that will also emphasize of the fact, that you do not do any outbound flow, would be:
val receivingPipeline = Flow
.via(framing)
.via(decoder)
.to(mySink)
val sendingNothing = Source.never[ByteString]()
conn.handleWith(Flow.fromSinkAndSourceCoupled(receivingPiline, sendingNothing))

Related

how to read tcp stream using scala

I have a java jar that generates the tcp stream on particular port.
I cam run the using java command like java -jar runner.jar and this start producing the stream of message on port 8888.
When I do nc -l 8888 I can see the messages.
I want to read this stream using scala and another framework or tool like akka, akka-stream.
Can anyone help me understand the best tool, framework or any other technique to read this tcp stream.
I tried using akka stream with following code :-
implicit val system = ActorSystem()
implicit val mater = ActorMaterializer() val ss = Tcp().outgoingConnection("127.0.0.1", 8888)
.to(Sink.foreach(println(_)))
Source.empty.to(ss).run()
I also tried
Tcp().outgoingConnection(new InetSocketAddress("127.0.0.1", 8888))
.runWith(Source.maybe[ByteString], Sink.foreach(bs => println(bs.utf8String)))
This doesn't work.
I only need to read the messages and process on my own.
Thanks
As I understand you want to setup TCP server, here is example of TCP Echo using akka streams
def server(system: ActorSystem, address: String, port: Int): Unit = {
implicit val sys = system
import system.dispatcher
implicit val materializer = ActorMaterializer()
val handler = Sink.foreach[Tcp.IncomingConnection] { conn =>
println("Client connected from: " + conn.remoteAddress)
conn handleWith Flow[ByteString]
}
val connections = Tcp().bind(address, port)
val binding = connections.to(handler).run()
binding.onComplete {
case Success(b) =>
println("Server started, listening on: " + b.localAddress)
case Failure(e) =>
println(s"Server could not bind to $address:$port: ${e.getMessage}")
system.terminate()
}
}

Does composite flow make loop?

I am trying to understand, how the following code snippet works:
val flow: Flow[Message, Message, Future[Done]] =
Flow.fromSinkAndSourceMat(printSink, helloSource)(Keep.left)
Two guys gave a very wonderful explanation on this thread. I understand the concept of the Composite flow, but how does it work on the websocket client.
Consider the following code:
import akka.actor.ActorSystem
import akka.{ Done, NotUsed }
import akka.http.scaladsl.Http
import akka.stream.ActorMaterializer
import akka.stream.scaladsl._
import akka.http.scaladsl.model._
import akka.http.scaladsl.model.ws._
import scala.concurrent.Future
object SingleWebSocketRequest {
def main(args: Array[String]) = {
implicit val system = ActorSystem()
implicit val materializer = ActorMaterializer()
import system.dispatcher
// print each incoming strict text message
val printSink: Sink[Message, Future[Done]] =
Sink.foreach {
case message: TextMessage.Strict =>
println(message.text)
}
val helloSource: Source[Message, NotUsed] =
Source.single(TextMessage("hello world!"))
// the Future[Done] is the materialized value of Sink.foreach
// and it is completed when the stream completes
val flow: Flow[Message, Message, Future[Done]] =
Flow.fromSinkAndSourceMat(printSink, helloSource)(Keep.left)
// upgradeResponse is a Future[WebSocketUpgradeResponse] that
// completes or fails when the connection succeeds or fails
// and closed is a Future[Done] representing the stream completion from above
val (upgradeResponse, closed) =
Http().singleWebSocketRequest(WebSocketRequest("ws://echo.websocket.org"), flow)
val connected = upgradeResponse.map { upgrade =>
// just like a regular http request we can access response status which is available via upgrade.response.status
// status code 101 (Switching Protocols) indicates that server support WebSockets
if (upgrade.response.status == StatusCodes.SwitchingProtocols) {
Done
} else {
throw new RuntimeException(s"Connection failed: ${upgrade.response.status}")
}
}
// in a real application you would not side effect here
// and handle errors more carefully
connected.onComplete(println)
closed.foreach(_ => println("closed"))
}
}
It is a websocket client, that send a message to the websocket server and the printSink receives it and print it out.
How can it be, that printSink receives messages, there is no a connection between the Sink and Source.
Is it like a loop?
Stream flow is from left to right, how it comes that the Sink can consume messages from websocket server?
Flow.fromSinkAndSourceMat puts an independent Sink and a Source to a shape of the Flow. Elements going into that Sink do not end up at the Source.
From the Websocket client API perspective, it needs a Source from which requests will be sent to the server and a Sink that it will send the responses to. The singleWebSocketRequest could take a Source and a Sink separately, but that would be a bit more verbose API.
Here is a shorter example that demonstrates the same as in your code snippet but is runnable, so you can play around with it:
import akka._
import akka.actor._
import akka.stream._
import akka.stream.scaladsl._
implicit val sys = ActorSystem()
implicit val mat = ActorMaterializer()
def openConnection(userFlow: Flow[String, String, NotUsed])(implicit mat: Materializer) = {
val processor = Flow[String].map(_.toUpperCase)
processor.join(userFlow).run()
}
val requests = Source(List("one", "two", "three"))
val responses = Sink.foreach(println)
val userFlow = Flow.fromSinkAndSource(responses, requests)
openConnection(userFlow)

Fetch size in PGConnection.getNotifications

A function in my postgresql database sends a notification when a table is updated.
I'm polling that postgresql database by scalikejdbc, to get all the notifications, and then, do something with them.
The process is explained here . A typical reactive system to sql tables updates.
I get the PGConnection from the java.sql.Connection. And, after that, I get the notifications in this way:
val notifications = Option(pgConnection.getNotifications).getOrElse(Array[PGNotification]())
I'm trying to get the notifications in chunks of 1000 by setting the fetch size to 1000, and disabling the auto commit. But fetch size property is ignored.
Any ideas how I could do that?
I wouldn't want to handle hundreds of thousands of notifications in a single map over my notifications dataset.
pgConnection.getNotifications.size could be huge, and therefore, this code wouldn't scale well.
Thanks!!!
To better scale, consider using postgresql-async and Akka Streams: the former is a library that can obtain PostgreSQL notifications asynchronously, and the former is a Reactive Streams implementation that provides backpressure (which would obviate the need for paging). For example:
import akka.actor._
import akka.stream._
import akka.stream.scaladsl._
import com.github.mauricio.async.db.postgresql.PostgreSQLConnection
import com.github.mauricio.async.db.postgresql.util.URLParser
import scala.concurrent.duration._
import scala.concurrent.Await
class DbActor(implicit materializer: ActorMaterializer) extends Actor with ActorLogging {
private implicit val ec = context.system.dispatcher
val queue =
Source.queue[String](Int.MaxValue, OverflowStrategy.backpressure)
.to(Sink.foreach(println))
.run()
val configuration = URLParser.parse("jdbc:postgresql://localhost:5233/my_db?user=dbuser&password=pwd")
val connection = new PostgreSQLConnection(configuration)
Await.result(connection.connect, 5 seconds)
connection.sendQuery("LISTEN my_channel")
connection.registerNotifyListener { message =>
val msg = message.payload
log.debug("Sending the payload: {}", msg)
self ! msg
}
def receive = {
case payload: String =>
queue.offer(payload).pipeTo(self)
case QueueOfferResult.Dropped =>
log.warning("Dropped a message.")
case QueueOfferResult.Enqueued =>
log.debug("Enqueued a message.")
case QueueOfferResult.Failure(t) =>
log.error("Stream failed: {}", t.getMessage)
case QueueOfferResult.QueueClosed =>
log.debug("Stream closed.")
}
}
The code above simply prints notifications from PostgreSQL as they occur; you can replace the Sink.foreach(println) with another Sink. To run it:
import akka.actor._
import akka.stream.ActorMaterializer
object Example extends App {
implicit val system = ActorSystem()
implicit val materializer = ActorMaterializer()
system.actorOf(Props(classOf[DbActor], materializer))
}

How to represent multiple incoming TCP connections as a stream of Akka streams?

I'm prototyping a network server using Akka Streams that will listen on a port, accept incoming connections, and continuously read data off each connection. Each connected client will only send data, and will not expect to receive anything useful from the server.
Conceptually, I figured it would be fitting to model the incoming events as one single stream that only incidentally happens to be delivered via multiple TCP connections. Thus, assuming that I have a case class Msg(msg: String) that represents each data message, what I want is to represent the entirety of incoming data as a Source[Msg, _]. This makes a lot of sense for my use case, because I can very simply connect flows & sinks to this source.
Here's the code I wrote to implement my idea:
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.SourceShape
import akka.stream.scaladsl._
import akka.util.ByteString
import akka.NotUsed
import scala.concurrent.{ Await, Future }
import scala.concurrent.duration._
case class Msg(msg: String)
object tcp {
val N = 2
def main(argv: Array[String]) {
implicit val system = ActorSystem()
implicit val materializer = ActorMaterializer()
val connections = Tcp().bind("0.0.0.0", 65432)
val delim = Framing.delimiter(
ByteString("\n"),
maximumFrameLength = 256, allowTruncation = true
)
val parser = Flow[ByteString].via(delim).map(_.utf8String).map(Msg(_))
val messages: Source[Msg, Future[Tcp.ServerBinding]] =
connections.flatMapMerge(N, {
connection =>
println(s"client connected: ${connection.remoteAddress}")
Source.fromGraph(GraphDSL.create() { implicit builder =>
import GraphDSL.Implicits._
val F = builder.add(connection.flow.via(parser))
val nothing = builder.add(Source.tick(
initialDelay = 1.second,
interval = 1.second,
tick = ByteString.empty
))
F.in <~ nothing.out
SourceShape(F.out)
})
})
import scala.concurrent.ExecutionContext.Implicits.global
Await.ready(for {
_ <- messages.runWith(Sink.foreach {
msg => println(s"${System.currentTimeMillis} $msg")
})
_ <- system.terminate()
} yield (), Duration.Inf)
}
}
This code works as expected, however, note the val N = 2, which is passed into the flatMapMerge call that ultimately combines the incoming data streams into one. In practice this means that I can only read from that many streams at a time.
I don't know how many connections will be made to this server at any given time. Ideally I would want to support as many as possible, but hardcoding an upper bound doesn't seem like the right thing to do.
My question, at long last, is: How can I obtain or create a flatMapMerge stage that can read from more than a fixed number of connections at one time?
As indicated by Viktor Klang's comments I don't think this is possible in 1 stream. However, I think it would be possible to create a stream that can receive messages after materialization and use that as a "sink" for messages coming from the TCP connections.
First create the "sink" stream:
val sinkRef =
Source
.actorRef[Msg](Int.MaxValue, fail)
.to(Sink foreach {m => println(s"${System.currentTimeMillis} $m")})
.run()
This sinkRef can be used by each Connection to receive the messages:
connections foreach { conn =>
Source
.empty[ByteString]
.via(conn.flow)
.via(parser)
.runForeach(msg => sinkRef ! msg)
}

akka streams over tcp

Here is the setup: I want to be able to stream messages (jsons converted to bytestrings) from a publisher to a remote server subscriber over a tcp connection.
Ideally, the publisher would be an actor that would receive internal messages, queue them and then stream them to the subscriber server if there is outstanding demand of course. I understood that what is necessary for this is to extend ActorPublisher class in order to onNext() the messages when needed.
My problem is that so far I am able just to send (receive and decode properly) one shot messages to the server opening a new connection each time. I did not manage to get my head around the akka doc and be able to set the proper tcp Flow with the ActorPublisher.
Here is the code from the publisher:
def send(message: Message): Unit = {
val system = Akka.system()
implicit val sys = system
import system.dispatcher
implicit val materializer = ActorMaterializer()
val address = Play.current.configuration.getString("eventservice.location").getOrElse("localhost")
val port = Play.current.configuration.getInt("eventservice.port").getOrElse(9000)
/*** Try with actorPublisher ***/
//val result = Source.actorPublisher[Message] (Props[EventActor]).via(Flow[Message].map(Json.toJson(_).toString.map(ByteString(_))))
/*** Try with actorRef ***/
/*val source = Source.actorRef[Message](0, OverflowStrategy.fail).map(
m => {
Logger.info(s"Sending message: ${m.toString}")
ByteString(Json.toJson(m).toString)
}
)
val ref = Flow[ByteString].via(Tcp().outgoingConnection(address, port)).to(Sink.ignore).runWith(source)*/
val result = Source(Json.toJson(message).toString.map(ByteString(_))).
via(Tcp().outgoingConnection(address, port)).
runFold(ByteString.empty) { (acc, in) ⇒ acc ++ in }//Handle the future
}
and the code from the actor which is quite standard in the end:
import akka.actor.Actor
import akka.stream.actor.ActorSubscriberMessage.{OnComplete, OnError}
import akka.stream.actor.{ActorPublisherMessage, ActorPublisher}
import models.events.Message
import play.api.Logger
import scala.collection.mutable
class EventActor extends Actor with ActorPublisher[Message] {
import ActorPublisherMessage._
var queue: mutable.Queue[Message] = mutable.Queue.empty
def receive = {
case m: Message =>
Logger.info(s"EventActor - message received and queued: ${m.toString}")
queue.enqueue(m)
publish()
case Request => publish()
case Cancel =>
Logger.info("EventActor - cancel message received")
context.stop(self)
case OnError(err: Exception) =>
Logger.info("EventActor - error message received")
onError(err)
context.stop(self)
case OnComplete =>
Logger.info("EventActor - onComplete message received")
onComplete()
context.stop(self)
}
def publish() = {
while (queue.nonEmpty && isActive && totalDemand > 0) {
Logger.info("EventActor - message published")
onNext(queue.dequeue())
}
}
I can provide the code from the subscriber if necessary:
def connect(system: ActorSystem, address: String, port: Int): Unit = {
implicit val sys = system
import system.dispatcher
implicit val materializer = ActorMaterializer()
val handler = Sink.foreach[Tcp.IncomingConnection] { conn =>
Logger.info("Event server connected to: " + conn.remoteAddress)
// Get the ByteString flow and reconstruct the msg for handling and then output it back
// that is how handleWith work apparently
conn.handleWith(
Flow[ByteString].fold(ByteString.empty)((acc, b) => acc ++ b).
map(b => handleIncomingMessages(system, b.utf8String)).
map(ByteString(_))
)
}
val connections = Tcp().bind(address, port)
val binding = connections.to(handler).run()
binding.onComplete {
case Success(b) =>
Logger.info("Event server started, listening on: " + b.localAddress)
case Failure(e) =>
Logger.info(s"Event server could not bind to $address:$port: ${e.getMessage}")
system.terminate()
}
}
thanks in advance for the hints.
My first recommendation is to not write your own queue logic. Akka provides this out-of-the-box. You also don't need to write your own Actor, Akka Streams can provide it as well.
First we can create the Flow that will connect your publisher to your subscriber via Tcp. In your publisher code you only need to create the ActorSystem once and connect to the outside server once:
//this code is at top level of your application
implicit val actorSystem = ActorSystem()
implicit val actorMaterializer = ActorMaterializer()
import actorSystem.dispatcher
val host = Play.current.configuration.getString("eventservice.location").getOrElse("localhost")
val port = Play.current.configuration.getInt("eventservice.port").getOrElse(9000)
val publishFlow = Tcp().outgoingConnection(host, port)
publishFlow is a Flow that will input ByteString data that you want to send to the external subscriber and outputs ByteString data that comes from subscriber:
// data to subscriber ----> publishFlow ----> data returned from subscriber
The next step is the publisher Source. Instead of writing your own Actor you can use Source.actorRef to "materialize" the Stream into an ActorRef. Essentially the Stream will become an ActorRef for us to use later:
//these values control the buffer
val bufferSize = 1024
val overflowStrategy = akka.stream.OverflowStrategy.dropHead
val messageSource = Source.actorRef[Message](bufferSize, overflowStrategy)
We also need a Flow to convert Messages into ByteString
val marshalFlow =
Flow[Message].map(message => ByteString(Json.toJson(message).toString))
Finally we can connect all of the pieces. Since you aren't receiving any data back from the external subscriber we'll ignore any data coming in from the connection:
val subscriberRef : ActorRef = messageSource.via(marshalFlow)
.via(publishFlow)
.runWith(Sink.ignore)
We can now treat this stream as if it were an Actor:
val message1 : Message = ???
subscriberRef ! message1
val message2 : Message = ???
subscriberRef ! message2