When not to use NotUsed in Akka Stream - scala

I have seen a lot of Akka Streams examples that use NotUsed, but I have not been able to find one where something other than NotUsed is used. I tried creating a source whose materialized value is not NotUsed, but it does not seem to be passed to the next flow. I would appreciate it if someone could explain when and how we would want something other than NotUsed.

This part of the Akka Streams documentation details "materialized values" and how they compose (so that you can pass what you need on to the next flow): https://doc.akka.io/docs/akka/current/stream/stream-composition.html#materialized-values
Other parts of the documentation also use materialized values other than NotUsed, as in the first snippet here:
val source = Source(1 to 10)
val sink = Sink.fold[Int, Int](0)(_ + _)
// connect the Source to the Sink, obtaining a RunnableGraph
val runnable: RunnableGraph[Future[Int]] = source.toMat(sink)(Keep.right)
// materialize the flow and get the value of the sink
val sum: Future[Int] = runnable.run()

Related

When materialiser is actually used in Akka Streams Flows and when do we need to Keep values

I'm trying to learn Akka Streams and I'm stuck with this materialization here.
Every tutorial shows basic source-via-to-run examples in which no real difference between Keep.left and Keep.right is explained. So I wrote this little piece of code, had IntelliJ add type annotations to the values, and started digging into the sources.
val single: Source[Int, NotUsed] = Source(Seq(1, 2, 3, 4, 5))
val flow: Flow[Int, Int, NotUsed] = Flow[Int].map(_ * 2)
val sink: Sink[Int, Future[Int]] = Sink.fold[Int, Int](0)(_ + _)
val run1: RunnableGraph[Future[Int]] =
single.viaMat(flow)(Keep.right).toMat(sink)(Keep.right)
val run2: RunnableGraph[NotUsed] =
single.viaMat(flow)(Keep.right).toMat(sink)(Keep.left)
val run3: RunnableGraph[(NotUsed, Future[Int])] =
single.viaMat(flow)(Keep.right).toMat(sink)(Keep.both)
val run4: RunnableGraph[NotUsed] =
single.viaMat(flow)(Keep.right).toMat(sink)(Keep.none)
So far I understand that at the end of the execution we may need the value of the Sink, which is of type Future[Int]. But I cannot think of any case where I would need to keep any of the other values.
In the third example it is possible to access both the left and the right materialized values.
run3.run()._2 onComplete {
case Success(value) ⇒ println(value)
case Failure(exception) ⇒ println(exception.getMessage)
}
It works exactly the same way if I change it to viaMat(flow)(Keep.left), Keep.none or Keep.both.
But in what scenarios could the materialized value be used within the graph? Why would we need it if the value is flowing through the stream anyway? Why do we need one of the values if we aren't going to keep it?
Could you please provide an example where changing from left to right will not just break the compiler, but will actually make a difference to the program logic?
For most streams, you only care about the value at the end of the stream. Accordingly, most Source operators and nearly all of the standard Flow operators have a materialized value of NotUsed, and the syntactic sugar .runWith boils down to .toMat(sink)(Keep.right).run().
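To make that equivalence concrete, here is a minimal self-contained sketch. It assumes Akka 2.6+ on the classpath (where an implicit ActorSystem is enough to materialize a stream) and a script-style file such as a Scala CLI .sc script. It runs the fold from the documentation snippet once with an explicit toMat(sink)(Keep.right) and once with runWith; the value names are illustrative only.

```scala
// Sketch assuming Akka 2.6+ (akka-stream on the classpath), written as a script.
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Keep, RunnableGraph, Sink, Source}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

// In Akka 2.6+ an implicit ActorSystem provides the Materializer
implicit val system: ActorSystem = ActorSystem("runwith-example")

val source: Source[Int, NotUsed] = Source(1 to 10)
val sink: Sink[Int, Future[Int]] = Sink.fold[Int, Int](0)(_ + _)

// Explicit form: keep the Sink's materialized value
val explicitGraph: RunnableGraph[Future[Int]] = source.toMat(sink)(Keep.right)
val explicitSum: Int = Await.result(explicitGraph.run(), 3.seconds)

// Shorthand: runWith also keeps the Sink's materialized value
val shorthandSum: Int = Await.result(source.runWith(sink), 3.seconds)

// Keep.left only hands back the Source's NotUsed; the sum is not reachable from it
val leftGraph: RunnableGraph[NotUsed] = source.toMat(sink)(Keep.left)

println(s"explicit = $explicitSum, shorthand = $shorthandSum")
system.terminate()
```

Both variants materialize the same Future[Int]; only the Keep.left graph loses access to it.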
Where one might care about the materialized value of a Source or Flow stage is when you want to be able to control a stage from outside the stream. An example of this is Source.actorRef, which allows you to send messages to an actor that get forwarded into the stream: you need the Source's materialized ActorRef in order to actually send a message to it. Likewise, you probably still want the Sink's materialized value, either to know when the stream processing has finished (Future[Done]) or to get an actual value at the end of the stream. In such a stream you'd probably have something like:
val stream: RunnableGraph[(ActorRef, Future[Done])] =
Source.actorRef(...)
.viaMat(calculateStuffFlow)(Keep.left) // propagates the ActorRef
.toMat(Sink.foreach { ... })(Keep.both)
val (sendToStream, done) = stream.run()
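A runnable version of that shape might look like the sketch below. It assumes Akka 2.6+, where Source.actorRef takes a completion matcher, a failure matcher, a buffer size and an overflow strategy. Here calculateStuffFlow is a hypothetical stand-in that multiplies by 10, and Sink.seq replaces Sink.foreach so the collected elements can be inspected (its materialized value is therefore a Future of the elements rather than Future[Done]).

```scala
// Sketch assuming Akka 2.6+; calculateStuffFlow is a hypothetical placeholder.
import akka.NotUsed
import akka.actor.{ActorRef, ActorSystem}
import akka.stream.{CompletionStrategy, OverflowStrategy}
import akka.stream.scaladsl.{Flow, Keep, RunnableGraph, Sink, Source}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

implicit val system: ActorSystem = ActorSystem("actorref-example")

// Sending the String "done" completes the stream once the buffer is drained
val completeOn: PartialFunction[Any, CompletionStrategy] = {
  case "done" => CompletionStrategy.draining
}
// This sketch never fails the stream via a message
val neverFail: PartialFunction[Any, Throwable] = PartialFunction.empty

val calculateStuffFlow: Flow[Int, Int, NotUsed] = Flow[Int].map(_ * 10)

val stream: RunnableGraph[(ActorRef, Future[Seq[Int]])] =
  Source.actorRef[Int](completeOn, neverFail, 16, OverflowStrategy.dropHead)
    .viaMat(calculateStuffFlow)(Keep.left) // propagates the ActorRef
    .toMat(Sink.seq[Int])(Keep.both)       // adds the Sink's Future alongside it

val (sendToStream, done) = stream.run()
sendToStream ! 1
sendToStream ! 2
sendToStream ! 3
sendToStream ! "done"

val results = Await.result(done, 3.seconds)
println(results)
system.terminate()
```

Without the ActorRef kept by Keep.left and Keep.both, there would be no way to feed elements into this stream at all.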
Another reasonably common use case for this is the Alpakka Kafka integration, where the consumer can have a controller as its materialized value: this controller allows you to stop consuming from a topic without unsubscribing until any pending offset commits have completed.
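As a concrete case where choosing Keep.left or Keep.right changes program logic, here is a sketch using KillSwitches.single, which ships with Akka Streams itself (assuming Akka 2.6+). The tick source below never completes on its own; keeping the switch with Keep.right is what lets code outside the stream stop it. With Keep.left you would instead get the Cancellable materialized by Source.tick, which is a different handle. The timings are arbitrary.

```scala
// Sketch assuming Akka 2.6+; interval and sleep values are arbitrary.
import akka.actor.ActorSystem
import akka.stream.{KillSwitches, UniqueKillSwitch}
import akka.stream.scaladsl.{Keep, RunnableGraph, Sink, Source}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

implicit val system: ActorSystem = ActorSystem("killswitch-example")

// An endless source: emits 1 every 10 ms and never completes by itself
val ticks = Source.tick(0.millis, 10.millis, 1)

val graph: RunnableGraph[(UniqueKillSwitch, Future[Int])] =
  ticks
    .viaMat(KillSwitches.single[Int])(Keep.right) // keep the switch, drop the tick Cancellable
    .toMat(Sink.fold[Int, Int](0)(_ + _))(Keep.both)

val (switch, total) = graph.run()
Thread.sleep(200)
switch.shutdown() // completes the stream from outside
val counted = Await.result(total, 3.seconds)

println(s"ticks counted before shutdown: $counted")
system.terminate()
```

If the switch were not kept, the Future[Int] from the fold would never complete, because nothing could end the stream.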

Detecting that back pressure is happening

My Akka HTTP application streams some data via server-sent events, and clients can request way more events than they can handle. The code looks something like this
complete {
source.filter(predicate.isMatch)
.buffer(1000, OverflowStrategy.dropTail)
.throttle(20, 1 second)
.map { evt => ServerSentEvent(evt) }
}
Is there a way to detect that a stage is backpressuring and somehow notify the client, preferably through the same sink (by emitting a different kind of output)? If that is not possible, could the Akka framework at least call some sort of callback that deals with it through a control side channel?
So, I'm not sure I understand your use case. Are you asking about back pressure at .buffer or at .throttle? Another part of my confusion is that you are suggesting emitting a new "control" element in a situation where the stream is already back pressured. So your control element might not be received for some time. Also, if you emit a control element every single time you receive back pressure you will likely create a flood of control elements.
One way to build this (overly naive) solution would be to use conflate.
val simpleSink: Sink[String, Future[Done]] =
Sink.foreach(e => println(s"simple: $e"))
val cycleSource: Source[String, NotUsed] =
Source.cycle(() => List("1", "2", "3", "4").iterator).throttle(5, 1.second)
val conflateFlow: Flow[String, String, NotUsed] =
Flow[String].conflate((a, b) => {
"BACKPRESSURE CONTROL ELEMENT"
})
val backpressureFlow: Flow[String, String, NotUsed] =
Flow[String]
.buffer(10, OverflowStrategy.backpressure).throttle(2, 1.second)
val backpressureTest =
cycleSource.via(conflateFlow).via(backpressureFlow).to(simpleSink).run()
To turn this into a more usable example you could either:
Make some sort of call inside of .conflate (and then just drop one of the elements). Be careful not to do anything blocking though. Perhaps just send a message that could be de-duplicated elsewhere.
Write a custom graph stage. Doing something simple like this wouldn't be too difficult.
I think I'd have to understand more about the use case, though. Take a look at all of the off-the-shelf backpressure-aware operators and see if one of them helps.
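A minimal sketch of the first option, assuming Akka 2.6+: the aggregate function given to conflate only runs when an element arrives while downstream is not ready for it, so it can serve as a hook for a non-blocking side channel. The onBackpressure function and its AtomicInteger counter are hypothetical placeholders for whatever notification you actually need (for example a message to an actor that de-duplicates events), and throttle stands in for a slow client.

```scala
// Sketch assuming Akka 2.6+; onBackpressure is a hypothetical side channel.
import java.util.concurrent.atomic.AtomicInteger
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Flow, Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

implicit val system: ActorSystem = ActorSystem("conflate-callback")

// Placeholder side channel: just counts events. It must never block.
val backpressureEvents = new AtomicInteger(0)
def onBackpressure(): Unit = backpressureEvents.incrementAndGet()

// conflate aggregates only while downstream is not pulling, so each call
// means "an element arrived while the consumer was backpressuring"
def backpressureDetector[T]: Flow[T, T, NotUsed] =
  Flow[T].conflate[T] { (_, newer) =>
    onBackpressure()
    newer // keep the newest element, drop the older one
  }

val received = Await.result(
  Source(1 to 100)
    .via(backpressureDetector[Int])
    .throttle(1, 50.millis) // slow consumer
    .runWith(Sink.seq),
  10.seconds)

println(s"received = $received, backpressure events = ${backpressureEvents.get}")
system.terminate()
```

Because conflate keeps the newest element here, the client still sees the latest value (100) when the source completes, while older elements are dropped and counted.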

Update concurrent map inside a stream map on flink

I have one stream that constantly streams the latest values of some keys.
Stream A:DataStream[(String,Double)]
I have another stream that wants to get the latest value on each process call.
My approach was to introduce a ConcurrentHashMap that is updated by stream A and read by the second stream.
val rates = new ConcurrentHashMap[String, Double]().asScala
val streamA : DataStream[(String,Double)]= ???
streamA.map(keyWithValue => rates(keyWithValue._1)= keyWithValue._2) //rates never gets updated
rates("testKey")=2 //this works
val streamB: DataStream[String] = ???
streamB.map(str=> rates(str) // rates does not contain the values of the streamA at this point
//some other functionality
)
Is it possible to update a concurrent map from a stream? Any other solution for sharing data from one stream with another is also acceptable.
The behaviour you are trying to use will not work in a distributed manner; in particular, with parallelism > 1 it will not work. In your code rates is actually updated, but only inside a different instance of the parallel operator, which works on its own copy of the map.
What you would actually like to do in this case is use broadcast state, which was designed to solve exactly the issue you are facing.
In your specific use case it would look something like this:
val streamA : DataStream[(String,Double)]= ???
val streamABroadcasted = streamA.broadcast(<Your Map State Definition>)
val streamB: DataStream[String] = ???
streamB.connect(streamABroadcasted)
Then you could easily use a BroadcastProcessFunction to implement your logic. More on this pattern can be found in the Flink documentation on the Broadcast State Pattern.

Keep.both example in Akka Stream

I would like to understand Keep.both in Akka Streams, but I could not find a simple example online.
Could someone please provide a very simple example about Keep.right and Keep.both.
I tried:
implicit val system = ActorSystem("KafkaProducer")
implicit val materializer = ActorMaterializer()
val source = Source.single("Hello")
val sink = Sink.fold[String, String]("")(_ + _)
val runnable: RunnableGraph[Future[String]] = source.toMat(sink)(Keep.right)
runnable.run()
I know this is probably not a good example; hopefully someone can provide a better one.
The simplest scenario: you need a stream to process a bunch of elements that you (1) provide from outside the stream, and you need to know when (2) the stream has finished processing all of them.
For (1) you could use a Source.queue, which is materialized into a queue that you can push elements to via offer.
val source = Source.queue[String](100,OverflowStrategy.backpressure)
For (2) you could use a Sink.foreach that is materialized into a Future[Done] which will be completed with Success when reaching the normal end of the stream, or completed with Failure if there is a failure signaled in the stream.
val sink = Sink.foreach[String](println)
Then you connect the source and the sink and keep both materialized values with Keep.both:
val materializedValues: (SourceQueueWithComplete[String], Future[Done]) = source.toMat(sink)(Keep.both).run()
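Putting the pieces together, a runnable sketch might look like this. It assumes Akka 2.6+ (so an implicit ActorSystem is enough to run the graph) and swaps Sink.foreach for the question's Sink.fold, so the second materialized value is a Future[String] that can be checked rather than a Future[Done]; the offered strings are illustrative.

```scala
// Sketch assuming Akka 2.6+; the offered strings are arbitrary examples.
import akka.actor.ActorSystem
import akka.stream.OverflowStrategy
import akka.stream.scaladsl.{Keep, Sink, Source, SourceQueueWithComplete}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

implicit val system: ActorSystem = ActorSystem("keep-both")

val source = Source.queue[String](100, OverflowStrategy.backpressure)
val sink = Sink.fold[String, String]("")(_ + _)

// Keep.both: left = the queue to push into, right = the Sink's result
val (queue, result): (SourceQueueWithComplete[String], Future[String]) =
  source.toMat(sink)(Keep.both).run()

// Push elements from outside the stream, waiting for each offer to resolve
Seq("Hello", ", ", "world").foreach { s =>
  Await.result(queue.offer(s), 3.seconds)
}
queue.complete() // no more elements: lets the stream finish

val greeting = Await.result(result, 3.seconds)
println(greeting)
system.terminate()
```

With Keep.left you would only get the queue (and never learn the result), with Keep.right only the result (and no way to feed the stream).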

What is the best way to combine akka-http flow in a scala-stream flow

I have a use case where, after n Akka Streams flows, I have to take the result of one of them and make a request to an HTTP REST API.
The output type of the last flow before the HTTP request is String:
val stream1:Flow[T,String,NotUsed] = Flow[T].map(_.toString)
Now the HTTP request has to be specified; I thought about something like:
val stream2: Flow[String, Future[HttpResponse], NotUsed] = Flow[String].map(param => Http().singleRequest(HttpRequest(uri = s"host.com/$param")))
and then combine it:
val stream3 = stream1 via stream2
Is this the best way to do it? Which approaches would you recommend, and why? A couple of best-practice examples for this use case would be great!
Thanks in advance :)
Your implementation would create a new connection to "host.com" for each new param. This is unnecessary and prevents Akka from making certain optimizations. Under the hood, Akka HTTP actually keeps a connection pool around to reuse open connections, but I think it is better to state your intentions in the code than to rely on the underlying implementation.
You can make a single connection as described in the documentation:
val connectionFlow: Flow[HttpRequest, HttpResponse, _] =
Http().outgoingConnection("host.com")
To utilize this connection Flow you'll need to convert your String paths to HttpRequest objects:
import akka.http.scaladsl.model.Uri
import akka.http.scaladsl.model.Uri.Path
def pathToRequest(path : String) = HttpRequest(uri=Uri.Empty.withPath(Path(path)))
val reqFlow = Flow[String] map pathToRequest
And, finally, glue all the flows together:
val stream3 = stream1 via reqFlow via connectionFlow
This is the most common pattern for continuously querying the same server with different request objects.