How to emit messages from a Sink and pass them to another function?

I am currently building a client-side WebSockets consumer using Akka-HTTP. Instead of trying to do the parsing in the Sink, I wanted to wrap the code in a function which emits the outcome from the Sink, and then use this output (from the function) later for further processing (more parsing...etc.).
I am currently able to print every message from the Sink; however, the return type of the function remains Unit. My objective is to emit a String from the function for each item that lands in the sink, and then use the returned string for further parsing. Below is the code I have so far (note: it's mostly boilerplate).
import java.util.concurrent.atomic.AtomicInteger
import akka.Done
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.StatusCodes
import akka.http.scaladsl.model.ws.{Message, TextMessage, WebSocketRequest, WebSocketUpgradeResponse}
import akka.http.scaladsl.settings.ClientConnectionSettings
import akka.stream.Materializer
import akka.stream.scaladsl.{Flow, Keep, Sink, Source}
import akka.util.ByteString
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future
import scala.util.{Failure, Success, Try}
object client extends App {
def parseData(uri: String)(implicit system: ActorSystem, materializer: Materializer): Unit = {
val defaultSettings = ClientConnectionSettings(system)
val pingCounter = new AtomicInteger()
val customWebsocketSettings = defaultSettings.websocketSettings.withPeriodicKeepAliveData(
() => ByteString(s"debug-${pingCounter.incrementAndGet()}")
)
val customSettings = defaultSettings.withWebsocketSettings(customWebsocketSettings)
val outgoing = Source.maybe[Message]
val sink: Sink[Message, Future[Done]] = Sink.foreach[Message] {
case message: TextMessage.Strict => message.text // I want to emit/stream this message as a String from the function (or get a handle on it from the outside)
case _ => println("Other")
}
val webSocketFlow: Flow[Message, Message, Future[WebSocketUpgradeResponse]] =
Http().webSocketClientFlow(WebSocketRequest(uri), settings = customSettings)
val (upgradeResponse, closed) =
outgoing
.viaMat(webSocketFlow)(Keep.right)
.toMat(sink)(Keep.both)
.run()
val connected = upgradeResponse.flatMap { upgrade =>
if (upgrade.response.status == StatusCodes.SwitchingProtocols) {
Future.successful(Done)
} else {
throw new RuntimeException(
s"Connection failed: ${upgrade.response.status}"
)
}
}
connected.onComplete {
case Success(value) => value
case Failure(exception) => throw exception
}
closed.onComplete { _ =>
println("Retrying...")
parseData(uri)
}
upgradeResponse.onComplete {
case Success(value) => println(value)
case Failure(exception) => throw exception
}
}
}
And in a separate object, I would like to do the parsing, so something like:
import akka.actor.ActorSystem
import akka.stream.Materializer
import api.client.parseData
object Application extends App {
implicit val system: ActorSystem = ActorSystem()
implicit val materializer: Materializer = Materializer(system)
val uri = "ws://localhost:8080/foobar"
val res = parseData(uri) // I want to handle the function output here
// parse(res)
println(res)
Is there a way I can get a handle on the Sink from outside the function, or do I need to do the parsing in the Sink? I am mainly trying not to overcomplicate the Sink.
Update: I am also considering whether adding another Flow element to the stream (which handles the parsing) would be better practice than getting values outside of the stream.

Adding a flow element seems to solve your problem while being totally idiomatic.
What you have to keep in mind is that a sink's semantics describe how to "terminate" the stream: while it can perform very complex computations, it produces a single materialized value, and that value is only completed once the stream ends.
Said differently, a sink does not return a value per stream element; it returns one value for the whole stream.
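For illustration, here is a minimal sketch of such a flow element; the parse function is a hypothetical placeholder for your own processing:
import akka.{Done, NotUsed}
import akka.http.scaladsl.model.ws.{Message, TextMessage}
import akka.stream.scaladsl.{Flow, Sink}
import scala.concurrent.Future
// A flow stage that turns each incoming strict text message into a String,
// dropping any other kind of frame.
val parseFlow: Flow[Message, String, NotUsed] =
  Flow[Message].collect { case TextMessage.Strict(text) => text }
// Placeholder for whatever further parsing you need.
def parse(s: String): String = s.trim
// The sink now operates on already-parsed values.
val parsingSink: Sink[String, Future[Done]] =
  Sink.foreach[String](s => println(parse(s)))
Wired into the code from the question, the stage sits between the WebSocket flow and the sink: outgoing.viaMat(webSocketFlow)(Keep.right).via(parseFlow).toMat(parsingSink)(Keep.both).run().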

Related

items fail to be processed in Akka streams app that uses Source.queues and Sink.queues in a flow

I am trying to create an (Akka HTTP) stream processing flow using the akka.stream.scaladsl Source and Sink queue classes. I am using a queue because I have a processing step in my flow that issues HTTP requests, and I want this step to take as many items off the queue as there are max-open-requests, and to stop taking items off once max-open-requests are in flight. The result is that backpressure is applied when my connection pool is overloaded.
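For reference, the cap I am relying on is the pool's max-open-requests setting; a minimal sketch of such a configuration, assuming an ActorSystem named system is in scope (the values are illustrative):
import akka.http.scaladsl.settings.ConnectionPoolSettings
// Illustrative values: at most 8 requests may be in flight at once;
// demand beyond that is backpressured rather than buffered without bound.
val poolSettings = ConnectionPoolSettings(system)
  .withMaxConnections(8)
  .withMaxOpenRequests(8)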
Below, I have a very simplified test that reflects the main logic of my app. In the test StressSpec (below) I am simulating a number of simultaneous connections, via which I will send a Source of Requesto objects to the getResponses method of the class ServiceImpl.
In the processing step pullOffSinkQueue you will note that I am incrementing a counter to see how many items I have pulled off the queue.
The test will send ServiceImpl a set of requests whose cardinality is set to equal streamedRequestsPerConnection * numSimultaneousConnections.
When I send 20 requests my test passes fine. In particular, the count of requests pulled off the Sink.queue will be equal to the number of requests I send out. However, if I increase the number of requests I send to above 50 or so, I see consistent failures in the test. I get a message such as the one below:
180 was not equal to 200
ScalaTestFailureLocation: com.foo.StressSpec at (StressSpec.scala:116)
Expected :200
Actual :180
This indicates that the number of items pulled off the queue does not equal the number of items put on the queue. I have a feeling this might be due to the fact that my test is not properly waiting for all items put into the stream to be processed. If anyone has any suggestions, I'd be all ears! Code is below.
package com.foo
import java.util.concurrent.atomic.AtomicInteger
import akka.NotUsed
import akka.actor.ActorSystem
import akka.event.{Logging, LoggingAdapter}
import akka.stream.ActorAttributes.supervisionStrategy
import akka.stream.Supervision.resumingDecider
import akka.stream.{ActorMaterializer, Attributes, Materializer, QueueOfferResult}
import akka.stream.scaladsl.{Flow, Keep, Sink, Source}
import org.scalatest.mockito.MockitoSugar
import org.scalatest.{FunSuite, Matchers}
import scala.collection.immutable
import scala.concurrent.duration._
import scala.concurrent.{Await, ExecutionContext, Future}
final case class Responso()
final case class Requesto()
object Handler {
val dbRequestCounter = new AtomicInteger(0)
}
class Handler(implicit ec: ExecutionContext, mat: Materializer) {
import Handler._
private val source =
Source.queue[(Requesto, String)](8, akka.stream.OverflowStrategy.backpressure)
private val sink =
Sink.queue[(Requesto, String)]().withAttributes(Attributes.inputBuffer(8, 8))
private val (sourceQueue, sinkQueue) = source.toMat(sink)(Keep.both).run()
def placeOnSourceQueue(ar: Requesto): Future[QueueOfferResult] = {
sourceQueue.offer((ar, "foo"))
}
def pullOffSinkQueue(qofr: QueueOfferResult): Future[Responso] = {
dbRequestCounter.incrementAndGet()
qofr match {
case QueueOfferResult.Enqueued =>
sinkQueue.pull().flatMap { maybeRequestPair: Option[(Requesto, String)] =>
Future.successful(Responso())
}
case error =>
println("enqueuing error: " + error)
Future.failed(new RuntimeException("enqueuing error: " + error))
}
}
}
class ServiceImpl(readHandler: Handler, writeHandler: Handler)
(implicit log: LoggingAdapter, mat: Materializer) {
private val readAttributeFlow: Flow[Requesto, Responso, NotUsed] = {
Flow[Requesto]
.mapAsyncUnordered(1)(readHandler.placeOnSourceQueue)
.mapAsyncUnordered(1)(readHandler.pullOffSinkQueue)
}
def getResponses(request: Source[Requesto, NotUsed]): Source[Responso, NotUsed] =
request
.via(readAttributeFlow)
.withAttributes(supervisionStrategy(resumingDecider))
}
class StressSpec
extends FunSuite
with MockitoSugar
with Matchers {
val streamedRequestsPerConnection = 10
val numSimultaneousConnections = 20
implicit val actorSystem: ActorSystem = ActorSystem()
implicit val materializer: ActorMaterializer = ActorMaterializer()
implicit val log: LoggingAdapter = Logging(actorSystem.eventStream, "test")
implicit val ec: ExecutionContext = actorSystem.dispatcher
import Handler._
lazy val requestHandler = new Handler()
lazy val svc: ServiceImpl =
new ServiceImpl(requestHandler, requestHandler)
test("can handle lots of simultaneous read requests") {
val totalExpected = streamedRequestsPerConnection * numSimultaneousConnections
def sendRequestAndAwaitResponse(): Unit = {
def getResponses(i: Integer) = {
val requestStream: Source[Requesto, NotUsed] =
Source(1 to streamedRequestsPerConnection)
.map { i =>
Requesto()
}
svc.getResponses(requestStream).runWith(Sink.seq)
}
val responses: immutable.Seq[Future[immutable.Seq[Responso]]] =
(1 to numSimultaneousConnections).map { getResponses(_) }
val flattenedResponses: Future[immutable.Seq[Responso]] =
Future.sequence(responses).map(_.flatten)
Await.ready(flattenedResponses, 1000.seconds).value.get
}
sendRequestAndAwaitResponse()
dbRequestCounter.get shouldBe(totalExpected)
}
}

Does a composite flow make a loop?

I am trying to understand, how the following code snippet works:
val flow: Flow[Message, Message, Future[Done]] =
Flow.fromSinkAndSourceMat(printSink, helloSource)(Keep.left)
Two people gave a wonderful explanation in this thread. I understand the concept of a composite flow, but how does it work for the WebSocket client?
Consider the following code:
import akka.actor.ActorSystem
import akka.{ Done, NotUsed }
import akka.http.scaladsl.Http
import akka.stream.ActorMaterializer
import akka.stream.scaladsl._
import akka.http.scaladsl.model._
import akka.http.scaladsl.model.ws._
import scala.concurrent.Future
object SingleWebSocketRequest {
def main(args: Array[String]) = {
implicit val system = ActorSystem()
implicit val materializer = ActorMaterializer()
import system.dispatcher
// print each incoming strict text message
val printSink: Sink[Message, Future[Done]] =
Sink.foreach {
case message: TextMessage.Strict =>
println(message.text)
}
val helloSource: Source[Message, NotUsed] =
Source.single(TextMessage("hello world!"))
// the Future[Done] is the materialized value of Sink.foreach
// and it is completed when the stream completes
val flow: Flow[Message, Message, Future[Done]] =
Flow.fromSinkAndSourceMat(printSink, helloSource)(Keep.left)
// upgradeResponse is a Future[WebSocketUpgradeResponse] that
// completes or fails when the connection succeeds or fails
// and closed is a Future[Done] representing the stream completion from above
val (upgradeResponse, closed) =
Http().singleWebSocketRequest(WebSocketRequest("ws://echo.websocket.org"), flow)
val connected = upgradeResponse.map { upgrade =>
// just like a regular http request we can access response status which is available via upgrade.response.status
// status code 101 (Switching Protocols) indicates that the server supports WebSockets
if (upgrade.response.status == StatusCodes.SwitchingProtocols) {
Done
} else {
throw new RuntimeException(s"Connection failed: ${upgrade.response.status}")
}
}
// in a real application you would not side effect here
// and handle errors more carefully
connected.onComplete(println)
closed.foreach(_ => println("closed"))
}
}
It is a WebSocket client that sends a message to the WebSocket server; the printSink receives it and prints it out.
How can it be that printSink receives messages? There is no connection between the Sink and the Source.
Is it like a loop?
Stream flow is from left to right, so how can the Sink consume messages from the WebSocket server?
Flow.fromSinkAndSourceMat puts an independent Sink and Source into the shape of a Flow. Elements going into that Sink do not end up at the Source.
From the WebSocket client API's perspective, it needs a Source from which requests will be sent to the server and a Sink to which it will send the responses. singleWebSocketRequest could take a Source and a Sink separately, but that would make for a more verbose API.
Here is a shorter example that demonstrates the same thing as your code snippet but is runnable, so you can play around with it:
import akka._
import akka.actor._
import akka.stream._
import akka.stream.scaladsl._
implicit val sys = ActorSystem()
implicit val mat = ActorMaterializer()
def openConnection(userFlow: Flow[String, String, NotUsed])(implicit mat: Materializer) = {
val processor = Flow[String].map(_.toUpperCase)
processor.join(userFlow).run()
}
val requests = Source(List("one", "two", "three"))
val responses = Sink.foreach[String](println) // explicit type parameter, so userFlow below is a Flow[String, String]
val userFlow = Flow.fromSinkAndSource(responses, requests)
openConnection(userFlow)
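Note that processor.join(userFlow) is what closes the loop: join connects the output of processor to the input of userFlow (its Sink side), and the output of userFlow (its Source side) back into the input of processor. The WebSocket client wires your flow to the server-side connection in the same fashion, which is why printSink sees the server's responses.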

akka stream integrating akka-http web request call into stream

Getting started with Akka Streams, I want to perform a simple computation, extending the basic QuickStart https://doc.akka.io/docs/akka/2.5/stream/stream-quickstart.html with a call to a RESTful web API:
val source: Source[Int, NotUsed] = Source(1 to 100)
source.runForeach(println)
already works nicely to print the numbers. But when trying to create an Actor to perform the HTTP request (is this actually necessary?) according to https://doc.akka.io/docs/akka/2.5.5/scala/stream/stream-integrations.html
import akka.pattern.ask
implicit val askTimeout = Timeout(5.seconds)
val words: Source[String, NotUsed] =
Source(List("hello", "hi"))
words
.mapAsync(parallelism = 5)(elem => (ref ? elem).mapTo[String])
// continue processing of the replies from the actor
.map(_.toLowerCase)
.runWith(Sink.ignore)
I cannot get it to compile, as the ? operator is not defined. As far as I know, this one would only be defined inside an actor.
I also do not understand yet where exactly inside mapAsync my custom actor needs to be called.
Edit:
https://blog.colinbreck.com/backoff-and-retry-error-handling-for-akka-streams/ contains at least parts of an example.
It looks like it is not mandatory to create an actor, i.e.:
implicit val system = ActorSystem()
implicit val ec = system.dispatcher
implicit val materializer = ActorMaterializer()
val source = Source(List("232::03::14062::19965186", "232::03::14062::19965189"))
.map(cellKey => {
val splits = cellKey.split("::")
val mcc = splits(0)
val mnc = splits(1)
val lac = splits(2)
val ci = splits(3)
CellKeySource(cellKey, mcc, mnc, lac, ci)
})
.limit(2)
.mapAsyncUnordered(2)(ck => getResponse(ck.cellKey, ck.mobileCountryCode, ck.mobileNetworkCode, ck.locationArea, ck.cellKey)("<<myToken>>"))
def getResponse(cellKey: String, mobileCountryCode:String, mobileNetworkCode:String, locationArea:String, cellId:String)(token:String): Future[String] = {
RestartSource.withBackoff(
minBackoff = 10.milliseconds,
maxBackoff = 30.seconds,
randomFactor = 0.2,
maxRestarts = 2
) { () =>
val responseFuture: Future[HttpResponse] =
Http().singleRequest(HttpRequest(uri = s"https://www.googleapis.com/geolocation/v1/geolocate?key=${token}", entity = ByteString(
// TODO use proper JSON objects
s"""
|{
| "cellTowers": [
| "mobileCountryCode": $mobileCountryCode,
| "mobileNetworkCode": $mobileNetworkCode,
| "locationAreaCode": $locationArea,
| "cellId": $cellId,
| ]
|}
""".stripMargin)))
Source.fromFuture(responseFuture)
.mapAsync(parallelism = 1) {
case HttpResponse(StatusCodes.OK, _, entity, _) =>
Unmarshal(entity).to[String]
case HttpResponse(statusCode, _, _, _) =>
throw WebRequestException(statusCode.toString() )
}
}
.runWith(Sink.head)
.recover {
case _ => throw StreamFailedAfterMaxRetriesException()
}
}
val done: Future[Done] = source.runForeach(println)
done.onComplete(_ ⇒ system.terminate())
is already the (partial) answer to the question, i.e. how to integrate akka-streams and akka-http. However, it does not work: it only throws 400 errors and never terminates.
I think you already found an API for calling the akka-http client.
Regarding your first code snippet, which doesn't work: I think there was some misunderstanding of the example itself. You expected the code in the example to work as copied, but the intention of the doc was to demonstrate a concept: how you can delegate some long-running task out of the stream flow and then consume the result when it's ready. An ask call to an Akka actor was used for this because ask returns a Future. Probably the authors of the doc just omitted the definition of the actor. You can try this example:
import java.lang.System.exit
import akka.NotUsed
import akka.actor.{Actor, ActorRef, ActorSystem, Props}
import akka.pattern.ask
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}
import akka.util.Timeout
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.language.higherKinds
object App extends scala.App {
implicit val sys: ActorSystem = ActorSystem()
implicit val mat: ActorMaterializer = ActorMaterializer()
val ref: ActorRef = sys.actorOf(Props[Translator])
implicit val askTimeout: Timeout = Timeout(5.seconds)
val words: Source[String, NotUsed] = Source(List("hello", "hi"))
words
.mapAsync(parallelism = 5)(elem => (ref ? elem).mapTo[String])
.map(_.toLowerCase)
.runWith(Sink.foreach(println))
.onComplete(t => {
println(s"finished: $t")
exit(1)
})
}
class Translator extends Actor {
override def receive: Receive = {
case msg => sender() ! s"$msg!"
}
}
You must import ask pattern from akka.
import akka.pattern.ask
Edit: OK, sorry, I can see that you have already imported it. What is ref in your code? An ActorRef?

Fetch size in PGConnection.getNotifications

A function in my postgresql database sends a notification when a table is updated.
I'm polling that postgresql database by scalikejdbc, to get all the notifications, and then, do something with them.
The process is explained here. It is a typical reactive system for reacting to SQL table updates.
I get the PGConnection from the java.sql.Connection. And, after that, I get the notifications in this way:
val notifications = Option(pgConnection.getNotifications).getOrElse(Array[PGNotification]())
I'm trying to get the notifications in chunks of 1000 by setting the fetch size to 1000 and disabling auto-commit, but the fetch size property is ignored.
Any ideas how I could do that?
I wouldn't want to handle hundreds of thousands of notifications in a single map over my notifications dataset.
pgConnection.getNotifications.size could be huge, and therefore, this code wouldn't scale well.
Thanks!!!
To better scale, consider using postgresql-async and Akka Streams: the former is a library that can obtain PostgreSQL notifications asynchronously, and the latter is a Reactive Streams implementation that provides backpressure (which would obviate the need for paging). For example:
import akka.actor._
import akka.pattern.pipe
import akka.stream._
import akka.stream.scaladsl._
import com.github.mauricio.async.db.postgresql.PostgreSQLConnection
import com.github.mauricio.async.db.postgresql.util.URLParser
import scala.concurrent.duration._
import scala.concurrent.Await
class DbActor(implicit materializer: ActorMaterializer) extends Actor with ActorLogging {
private implicit val ec = context.system.dispatcher
val queue =
Source.queue[String](Int.MaxValue, OverflowStrategy.backpressure)
.to(Sink.foreach(println))
.run()
val configuration = URLParser.parse("jdbc:postgresql://localhost:5233/my_db?user=dbuser&password=pwd")
val connection = new PostgreSQLConnection(configuration)
Await.result(connection.connect, 5.seconds)
connection.sendQuery("LISTEN my_channel")
connection.registerNotifyListener { message =>
val msg = message.payload
log.debug("Sending the payload: {}", msg)
self ! msg
}
def receive = {
case payload: String =>
queue.offer(payload).pipeTo(self)
case QueueOfferResult.Dropped =>
log.warning("Dropped a message.")
case QueueOfferResult.Enqueued =>
log.debug("Enqueued a message.")
case QueueOfferResult.Failure(t) =>
log.error("Stream failed: {}", t.getMessage)
case QueueOfferResult.QueueClosed =>
log.debug("Stream closed.")
}
}
The code above simply prints notifications from PostgreSQL as they occur; you can replace the Sink.foreach(println) with another Sink. To run it:
import akka.actor._
import akka.stream.ActorMaterializer
object Example extends App {
implicit val system = ActorSystem()
implicit val materializer = ActorMaterializer()
system.actorOf(Props(classOf[DbActor], materializer))
}
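For instance, since the question asks about handling notifications in chunks of 1000, a groupedWithin stage can batch the payloads before they reach a sink; this is a sketch, and handleBatch is a hypothetical placeholder for your own processing:
import akka.NotUsed
import akka.stream.scaladsl.{Flow, Sink}
import scala.concurrent.duration._
// Hypothetical batch handler; replace with real processing.
def handleBatch(batch: Seq[String]): Unit =
  println(s"handling ${batch.size} notifications")
// Emit batches of up to 1000 payloads, or whatever has arrived after one second.
val batchingSink: Sink[String, NotUsed] =
  Flow[String]
    .groupedWithin(1000, 1.second)
    .to(Sink.foreach(handleBatch))
In DbActor this would take the place of Sink.foreach(println), i.e. Source.queue[String](Int.MaxValue, OverflowStrategy.backpressure).to(batchingSink).run().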

Why is akka-http not emitting responses for the first N requests?

I'm trying to use akka-http in order to make http requests to a single host (e.g. "akka.io"). The problem is that the created flow (Http().cachedHostConnectionPool) starts emitting responses only after N http requests are made, where N is equal to max-connections.
import scala.util.Failure
import scala.util.Success
import com.typesafe.config.ConfigFactory
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.HttpRequest
import akka.http.scaladsl.model.Uri.apply
import akka.http.scaladsl.settings.ConnectionPoolSettings
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.Sink
import akka.stream.scaladsl.Source
object ConnectionPoolExample extends App {
implicit val system = ActorSystem()
implicit val executor = system.dispatcher
implicit val materializer = ActorMaterializer()
val config = ConfigFactory.load()
val connectionPoolSettings = ConnectionPoolSettings(config).withMaxConnections(10)
lazy val poolClientFlow = Http().cachedHostConnectionPool[Unit]("akka.io", 80, connectionPoolSettings)
val fakeSource = Source.fromIterator[Unit] { () => Iterator.continually { Thread.sleep(1000); () } }
val requests = fakeSource.map { _ => println("Creating request"); HttpRequest(uri = "/") -> (()) }
val responses = requests.via(poolClientFlow)
responses.runForeach {
case (tryResponse, jsonData) =>
tryResponse match {
case Success(httpResponse) =>
httpResponse.entity.dataBytes.runWith(Sink.ignore)
println(s"status: ${httpResponse.status}")
case Failure(e) => {
println(e)
}
}
}
}
The output looks like this:
Creating request
Creating request
Creating request
Creating request
Creating request
Creating request
Creating request
Creating request
Creating request
Creating request
status: 200 OK
Creating request
status: 200 OK
Creating request
status: 200 OK
...
I am failing to find any configuration parameters which would allow emitting responses as soon as they are ready and not when the pool is out of free connections.
Thanks!
The reason is that you block the client from doing other work by calling Thread.sleep; that method is simply forbidden inside reactive programs. The proper and simpler approach is to use Source.tick, as sketched below.
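A minimal sketch of that approach, reusing poolClientFlow and the implicits from the question (the one-second interval mirrors the original sleep):
import scala.concurrent.duration._
import scala.util.{Failure, Success}
import akka.http.scaladsl.model.HttpRequest
import akka.stream.scaladsl.{Sink, Source}
// Source.tick emits an element per interval without blocking a thread,
// so each response is printed as soon as it is ready.
val tickingRequests = Source
  .tick(initialDelay = 1.second, interval = 1.second, tick = ())
  .map { _ => println("Creating request"); HttpRequest(uri = "/") -> (()) }
tickingRequests.via(poolClientFlow).runForeach {
  case (Success(httpResponse), _) =>
    httpResponse.entity.dataBytes.runWith(Sink.ignore)
    println(s"status: ${httpResponse.status}")
  case (Failure(e), _) =>
    println(e)
}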