Detecting that back pressure is happening - scala

My Akka HTTP application streams some data via server-sent events, and clients can request way more events than they can handle. The code looks something like this
complete {
source.filter(predicate.isMatch)
.buffer(1000, OverflowStrategy.dropTail)
.throttle(20, 1 second)
.map { evt => ServerSentEvent(evt) }
}
Is there a way to detect the fact that a stage backpressures and somehow notify the client preferably using the same sink (by emitting a different kind of output) or if not possible just make Akka framework call some sort of callback that will deal with the fact through a control side-channel?

So, I'm not sure I understand your use case. Are you asking about back pressure at .buffer or at .throttle? Another part of my confusion is that you are suggesting emitting a new "control" element in a situation where the stream is already back pressured. So your control element might not be received for some time. Also, if you emit a control element every single time you receive back pressure you will likely create a flood of control elements.
One way to build this (overly naive) solution would be to use conflate.
val simpleSink: Sink[String, Future[Done]] =
Sink.foreach(e => println(s"simple: $e"))
val cycleSource: Source[String, NotUsed] =
Source.cycle(() => List("1", "2", "3", "4").iterator).throttle(5, 1.second)
val conflateFlow: Flow[String, String, NotUsed] =
Flow[String].conflate((a, b) => {
"BACKPRESSURE CONTROL ELEMENT"
})
val backpressureFlow: Flow[String, String, NotUsed] =
Flow[String]
.buffer(10, OverflowStrategy.backpressure) throttle (2, 1.second)
val backpressureTest =
cycleSource.via(conflateFlow).via(backpressureFlow).to(simpleSink).run()
To turn this into a more usable example you could either:
Make some sort of call inside of .conflate (and then just drop one of the elements). Be careful not to do anything blocking though. Perhaps just send a message that could be de-duplicated elsewhere.
Write a custom graph stage. Doing something simple like this wouldn't be too difficult.
I think I'd have to understand more about the use case though. Take a look at all of the off the shelf backpressure aware operators and see if one of them helps.

Related

When materialiser is actually used in Akka Streams Flows and when do we need to Keep values

I'm trying to learn Akka Streams and I'm stuck with this materialization here.
Every tutorial shows some basics source via to run examples where no real between Keep.left and Keep.right is explained. So I wrote this little piece of code, asked IntelliJ to add a type annotation to the values and started to dig the sources.
val single: Source[Int, NotUsed] = Source(Seq(1, 2, 3, 4, 5))
val flow: Flow[Int, Int, NotUsed] = Flow[Int].map(_ * 2)
val sink: Sink[Int, Future[Int]] = Sink.fold[Int, Int](0)(_ + _)
val run1: RunnableGraph[Future[Int]] =
single.viaMat(flow)(Keep.right).toMat(sink)(Keep.right)
val run2: RunnableGraph[NotUsed] =
single.viaMat(flow)(Keep.right).toMat(sink)(Keep.left)
val run3: RunnableGraph[(NotUsed, Future[Int])] =
single.viaMat(flow)(Keep.right).toMat(sink)(Keep.both)
val run4: RunnableGraph[NotUsed] =
single.viaMat(flow)(Keep.right).toMat(sink)(Keep.none)
So far I can understand that at the end of the execution we can need the value of the Sink that is of type Future[Int]. But I cannot think of any case when I gonna need to keep some of the values.
In the third example it is possible to acces both left and right values of the materialized output.
run3.run()._2 onComplete {
case Success(value) ⇒ println(value)
case Failure(exception) ⇒ println(exception.getMessage)
}
It actually works absolutely the same way if I change it to viaMat(flowMultiply)(Keep.left) or none or both.
But in what scenarios the materialized value could be used within the graph? Why would we need it if the value is flowing within anyway? Why do we need one of the values if we aren't gonna keep it?
Could you pelase provide an example where changing from left to right will not just break the compiler, but will actually bring a difference to the program logic?
For most streams, you only care about the value at the end of the stream. Accordingly, most of the Source and nearly all of the standard Flow operators have a materialized value of NotUsed, and the syntactic sugar .runWith boils down to .toMat(sink)(Keep.right).run.
Where one might care about the materialized value of a Source or Flow stage is when you want to be able to control a stage outside of the stream. An example of this is Source.actorRef, which allows you to send messages to an actor which get forwarded to the stream: you need the Source's materialized ActorRef in order to actually send a message to it. Likewise, you probably still want the materialized value of the Sink (whether to know that the stream processing happened (Future[Done]) or for an actual value at the end of the stream). In such a stream you'd probably have something like:
val stream: RunnableGraph[(ActorRef, Future[Done])] =
Source.actorRef(...)
.viaMat(calculateStuffFlow)(Keep.left) // propagates the ActorRef
.toMat(Sink.foreach { ... })(Keep.both)
val (sendToStream, done) = stream.run()
Another reasonably common use-case for this is in the Alpakka Kafka integration, where it's possible for the consumer to have a controller as a materialized value: this controller allows you to stop consuming from a topic and not unsubscribe until any pending offset commits have happened.

Update concurrent map inside a stream map on flink

I have one stream that constantly streaming the latest values of some keys.
Stream A:DataStream[(String,Double)]
I have another stream that wants to get the latest value on each process call.
My approach was to introduce a concurrentHashMap which will be updated by stream A and read by the second stream.
val rates = new concurrentHasMap[String,Double].asScala
val streamA : DataStream[(String,Double)]= ???
streamA.map(keyWithValue => rates(keyWithValue._1)= keyWithValue._2) //rates never gets updated
rates("testKey")=2 //this works
val streamB: DataStream[String] = ???
streamB.map(str=> rates(str) // rates does not contain the values of the streamA at this point
//some other functionality
)
Is it possible to update a concurrent map from a stream? Any other solution on sharing data from a stream with another is also acceptable
The behaviour You are trying to use will not work in a distributed manner, basically if You will have parellelism > 1 it will not work. In Your code rates are actually updated, but in different instance of parallel operator.
Actually, what You would like to do in this case is use a BroadcastState which was designed to solve exactly the issue You are facing.
In Your specific usecase it would look like something like this:
val streamA : DataStream[(String,Double)]= ???
val streamABroadcasted = streamA.broadcast(<Your Map State Definition>)
val streamB: DataStream[String] = ???
streamB.connect(streamABroadcasted)
Then You could easily use BroadcastProcessFunction to implement Your logic. More on the Broadcast state pattern can be found here

Looking for something like a TestFlow analogous to TestSink and TestSource

I am writing a class that takes a Flow (representing a kind of socket) as a constructor argument and that allows to send messages and wait for the respective answers asynchronously by returning a Future. Example:
class SocketAdapter(underlyingSocket: Flow[String, String, _]) {
def sendMessage(msg: MessageType): Future[ResponseType]
}
This is not necessarily trivial because there may be other messages in the socket stream that are irrelevant, so some filtering is required.
In order to test the class I need to provide something like a "TestFlow" analogous to TestSink and TestSource. In fact I can create a flow by combining both. However, the problem is that I only obtain the actual probes upon materialization and materialization happens inside the class under test.
The problem is similar to the one I described in this question. My problem would be solved if I could materialize the flow first and then pass it to a client to connect to it. Again, I'm thinking about using MergeHub and BroadcastHub and again I see the problem that the resulting stream would behave differently because it is not linear anymore.
Maybe I misunderstood how a Flow is supposed to be used. In order to feed messages into the flow when sendMessage() is called, I need a certain kind of Source anyway. Maybe a Source.actorRef(...) or Source.queue(...), so I could pass in the ActorRef or SourceQueue directly. However, I'd prefer if this choice was up to the SocketAdapter class. Of course, this applies to the Sink as well.
It feels like this is a rather common case when working with streams and sockets. If it is not possible to create a "TestFlow" like I need it, I'm also happy with some advice on how to improve my design and make it better testable.
Update: I browsed through the documentation and found SourceRef and SinkRef. It looks like these could solve my problem but I'm not sure yet. Is it reasonable to use them in my case or are there any drawbacks, e.g. different behaviour in the test compared to production where there are no such refs?
Indirect Answer
The nature of your question suggests a design flaw which you are bumping into at testing time. The answer below does not address the issue in your question, but it demonstrates how to avoid the situation altogether.
Don't Mix Business Logic with Akka Code
Presumably you need to test your Flow because you have mixed a substantial amount of logic into the materialization. Lets assume you are using raw sockets for your IO. Your question suggests that your flow looks like:
val socketFlow : Flow[String, String, _] = {
val socket = new Socket(...)
//business logic for IO
}
You need a complicated test framework for your Flow because your Flow itself is also complicated.
Instead, you should separate out the logic into an independent function that has no akka dependencies:
type MessageProcessor = MessageType => ResponseType
object BusinessLogic {
val createMessageProcessor : (Socket) => MessageProcessor = {
//business logic for IO
}
}
Now your flow can be very simple:
val socket : Socket = new Socket(...)
val socketFlow = Flow.map(BusinessLogic.createMessageProcessor(socket))
As a result: your unit testing can exclusively work with createMessageProcessor, there's no need to test akka Flow because it is a simple veneer around the complicated logic that is tested independently.
Don't Use Streams For Concurrency Around 1 Element
The other big problem with your design is that SocketAdapter is using a stream to process just 1 message at a time. This is incredibly wasteful and unnecessary (you're trying to kill a mosquito with a tank).
Given the separated business logic your adapter becomes much simpler and independent of akka:
class SocketAdapter(messageProcessor : MessageProcessor) {
def sendMessage(msg: MessageType): Future[ResponseType] = Future {
messageProcessor(msg)
}
}
Note how easy it is to use Future in some instances and Flow in other scenarios depending on the need. This comes from the fact that the business logic is independent of any concurrency framework.
This is what I came up with using SinkRef and SourceRef:
object TestFlow {
def withProbes[In, Out](implicit actorSystem: ActorSystem,
actorMaterializer: ActorMaterializer)
:(Flow[In, Out, _], TestSubscriber.Probe[In], TestPublisher.Probe[Out]) = {
val f = Flow.fromSinkAndSourceMat(TestSink.probe[In], TestSource.probe[Out])
(Keep.both)
val ((sinkRefFuture, (inProbe, outProbe)), sourceRefFuture) =
StreamRefs.sinkRef[In]()
.viaMat(f)(Keep.both)
.toMat(StreamRefs.sourceRef[Out]())(Keep.both)
.run()
val sinkRef = Await.result(sinkRefFuture, 3.seconds)
val sourceRef = Await.result(sourceRefFuture, 3.seconds)
(Flow.fromSinkAndSource(sinkRef, sourceRef), inProbe, outProbe)
}
}
This gives me a flow I can completely control with the two probes but I can pass it to a client that connects source and sink later, so it seems to solve my problem.
The resulting Flow should only be used once, so it differs from a regular Flow that is rather a flow blueprint and can be materialized several times. However, this restriction applies to the web socket flow I am mocking anyway, as described here.
The only issue I still have is that some warnings are logged when the ActorSystem terminates after the test. This seems to be due to the indirection introduced by the SinkRef and SourceRef.
Update: I found a better solution without SinkRef and SourceRef by using mapMaterializedValue():
def withProbesFuture[In, Out](implicit actorSystem: ActorSystem,
ec: ExecutionContext)
: (Flow[In, Out, _],
Future[(TestSubscriber.Probe[In], TestPublisher.Probe[Out])]) = {
val (sinkPromise, sourcePromise) =
(Promise[TestSubscriber.Probe[In]], Promise[TestPublisher.Probe[Out]])
val flow =
Flow
.fromSinkAndSourceMat(TestSink.probe[In], TestSource.probe[Out])(Keep.both)
.mapMaterializedValue { case (inProbe, outProbe) =>
sinkPromise.success(inProbe)
sourcePromise.success(outProbe)
()
}
val probeTupleFuture = sinkPromise.future
.flatMap(sink => sourcePromise.future.map(source => (sink, source)))
(flow, probeTupleFuture)
}
When the class under test materializes the flow, the Future is completed and I receive the test probes.

Akka Http - Host Level Client Side API Source.queue pattern

We started to implement the Source.queue[HttpRequest] pattern mentioned in the docs: http://doc.akka.io/docs/akka-http/current/scala/http/client-side/host-level.html#examples
This is the (reduced) example from the documentation
val poolClientFlow = Http()
.cachedHostConnectionPool[Promise[HttpResponse]]("akka.io")
val queue =
Source.queue[(HttpRequest, Promise[HttpResponse])](
QueueSize, OverflowStrategy.dropNew
)
.via(poolClientFlow)
.toMat(Sink.foreach({
case ((Success(resp), p)) => p.success(resp)
case ((Failure(e), p)) => p.failure(e)
}))(Keep.left)
.run()
def queueRequest(request: HttpRequest): Future[HttpResponse] = {
val responsePromise = Promise[HttpResponse]()
queue.offer(request -> responsePromise).flatMap {
case QueueOfferResult.Enqueued => responsePromise.future
case QueueOfferResult.Dropped => Future.failed(new RuntimeException("Queue overflowed. Try again later."))
case QueueOfferResult.Failure(ex) => Future.failed(ex)
case QueueOfferResult.QueueClosed => Future.failed(new RuntimeException("Queue was closed (pool shut down) while running the request. Try again later."))
}
}
val responseFuture: Future[HttpResponse] = queueRequest(HttpRequest(uri = "/"))
The docs state that using Source.single(request) is an anti-pattern and should be avoid. However it doesn't clarify why and what implications come by using Source.queue.
At this place we previously showed an example that used the Source.single(request).via(pool).runWith(Sink.head).
In fact, this is an anti-pattern that doesn’t perform well. Please either supply requests using a queue or in a streamed fashion as shown below.
Advantages of Source.queue
The flow is only materialized once ( probably a performance gain? ). However if I understood the akka-http implementation correctly, a new flow is materialized for each connection, so this doesn't seem to be that much of a problem
Explicit backpressure handling with OverflowStrategy and matching over the QueueOfferResult
Issues with Source.queue
These are the questions that came up, when we started implementing this pattern in our application.
Source.queue is not thread-safe
The queue implementation is not thread safe. When we use the queue in different routes / actors we have this scenario that:
A enqueued request can override the latest enqueued request, thus leading to an unresolved Future.
UPDATE
This issue as been addressed in akka/akka/issues/23081. The queue is in fact thread safe.
Filtering?
What happens when request are being filtered? E.g. when someone changes the implementation
Source.queue[(HttpRequest, Promise[HttpResponse])](
QueueSize, OverflowStrategy.dropNew)
.via(poolClientFlow)
// only successful responses
.filter(_._1.isSuccess)
// failed won't arrive here
.to(Sink.foreach({
case ((Success(resp), p)) => p.success(resp)
case ((Failure(e), p)) => p.failure(e)
}))
Will the Future not resolve? With a single request flow this is straightforward:
Source.single(request).via(poolClientFlow).runWith(Sink.headOption)
QueueSize vs max-open-request?
The difference between the QueueSize and max-open-requests is not clear. In the end, both are buffers. Our implementation ended up using QueueSize == max-open-requests
What's the downside for Source.single()?
Until now I have found two reasons for using Source.queue over Source.single
Performance - materializing the flow only once. However according to this answer it shouldn't be an issue
Explicitly configuring backpressure and handle failure cases. In my opinion the ConnectionPool has a sufficient handling for too much load. One can map over the resulting future and handle the exceptions.
thanks in advance,
Muki
I'll answer each of your questions directly and then give a general indirect answer to the overall problem.
probably a performance gain?
You are correct that there is a Flow materialized for each IncomingConnection but there is still a performance gain to be had if a Connection has multiple requests coming from it.
What happens when request are being filtered?
In general streams do not have a 1:1 mapping between Source elements and Sink Elements. There can be 1:0, as in your example, or there can be 1:many if a single request somehow spawned multiple responses.
QueueSize vs max-open-request?
This ratio would depend on the speed with which elements are being offered to the queue and the speed with which http requests are being processed into responses. There is no pre-defined ideal solution.
GENERAL REDESIGN
In most cases a Source.queue is used because some upstream function is creating input elements dynamically and then offering them to the queue, e.g.
val queue = ??? //as in the example in your question
queue.offer(httpRequest1)
queue.offer(httpRequest2)
queue.offer(httpRequest3)
This is poor design because whatever entity or function that is being used to create each input element could itself be part of the stream Source, e.g.
val allRequests = Iterable(httpRequest1, httpRequest2, httpRequest3)
//no queue necessary
val allResponses : Future[Seq[HttpResponse]] =
Source(allRequests)
.via(poolClientFlow)
.to(Sink.seq[HttpResponse])
.run()
Now there is no need to worry about the queue, max queue size, etc. Everything is bundled into a nice compact stream.
Even if the source of requests is dynamic, you can still usually use a Source. Say we are getting the request paths from the console stdin, this can still be a complete stream:
import scala.io.{Source => ioSource}
val consoleLines : () => Iterator[String] =
() => ioSource.stdin.getLines()
Source
.fromIterator(consoleLines)
.map(consoleLine => HttpRequest(GET, uri = Uri(consoleLine)))
.via(poolClientFlow)
.to(Sink.foreach[HttpResponse](println))
.run()
Now, even if each line is typed into the console at random intervals the stream can still behave reactively without a Queue.
The only instance I've every seen a queue, or Source.ActorRef, as being absolutely necessary is when you have to create a callback function that gets passed into a third party API. This callback function will have to offer the incoming elements to the queue.

Akka-streams: execute action on flow start

Having a flow description in akka-streams
val flow: Flow[Input, Output, Unit] = ???
, how do I modify it to get a new flow description that performs a specified side-affect on start, i.e. when flow is materialized?
Starting materialization of a stream processing graph will set it in motion piece by piece, concurrently. The only way to perform an action that is guaranteed to happen before the first element is passed somewhere within that graph is to perform that action before materializing the graph. In this sense the answer by sschaef is slightly incorrect: using mapMaterializedValue runs the action pretty early, but not such that it is guaranteed to happend before the first element is processed.
If we are talking about a Flow here which only takes in inputs on one side and produces outputs on the other—i.e. it does not contain internal cycles or data sources—then one thing you can do to perform an action before the first element arrives is to attach a processing step to its input that does that:
def effectSource[T](block: => Unit) = Source.fromIterator(() => {block; Iterator.empty})
val newFlow = Flow[Input].prepend(effectSource(/* do stuff */)).via(flow)
Note
The above is using upcoming 2.0 syntax, in Akka Streams 1.0 it would be Source(() => { block; Iterator.empty }) and the prepend operation would need to be done using the FlowGraph DSL (the graph can be found here).
You said it by yourself, use the force of the materialization:
val newFlow = flow.mapMaterializedValue(_ ⇒ println("materialized"))