When the materializer is actually used in Akka Streams Flows, and when do we need to Keep values - scala

I'm trying to learn Akka Streams and I'm stuck on materialization.
Every tutorial shows some basic source-via-to-run example where the real difference between Keep.left and Keep.right is never explained. So I wrote this little piece of code, asked IntelliJ to add type annotations to the values, and started to dig into the sources.
val single: Source[Int, NotUsed] = Source(Seq(1, 2, 3, 4, 5))
val flow: Flow[Int, Int, NotUsed] = Flow[Int].map(_ * 2)
val sink: Sink[Int, Future[Int]] = Sink.fold[Int, Int](0)(_ + _)
val run1: RunnableGraph[Future[Int]] =
  single.viaMat(flow)(Keep.right).toMat(sink)(Keep.right)
val run2: RunnableGraph[NotUsed] =
  single.viaMat(flow)(Keep.right).toMat(sink)(Keep.left)
val run3: RunnableGraph[(NotUsed, Future[Int])] =
  single.viaMat(flow)(Keep.right).toMat(sink)(Keep.both)
val run4: RunnableGraph[NotUsed] =
  single.viaMat(flow)(Keep.right).toMat(sink)(Keep.none)
So far I understand that at the end of the execution we may need the value of the Sink, which is of type Future[Int]. But I cannot think of any case when I would need to keep some of the other values.
In the third example it is possible to access both left and right values of the materialized output.
run3.run()._2 onComplete {
  case Success(value)     ⇒ println(value)
  case Failure(exception) ⇒ println(exception.getMessage)
}
It actually works exactly the same way if I change it to viaMat(flow)(Keep.left), or to Keep.none or Keep.both.
But in what scenarios could the materialized value be used within the graph? Why would we need it at all if the values are flowing through the stream anyway? And why do we need to pick one of the values if we aren't going to keep it?
Could you please provide an example where changing from left to right does not just break compilation, but actually makes a difference to the program logic?

For most streams, you only care about the value at the end of the stream. Accordingly, most Source operators and nearly all of the standard Flow operators have a materialized value of NotUsed, and the syntactic sugar .runWith boils down to .toMat(sink)(Keep.right).run().
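For example (a minimal sketch; on Akka 2.6+ an implicit ActorSystem is enough to materialize, on 2.5 you would add an ActorMaterializer as in the snippets further down):
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Keep, Sink, Source}
import scala.concurrent.Future

implicit val system: ActorSystem = ActorSystem("matValues")

// .runWith(sink) is shorthand for .toMat(sink)(Keep.right).run()
val viaRunWith: Future[Int] = Source(1 to 3).runWith(Sink.fold[Int, Int](0)(_ + _))
val viaToMat: Future[Int] = Source(1 to 3).toMat(Sink.fold[Int, Int](0)(_ + _))(Keep.right).run()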
Where one might care about the materialized value of a Source or Flow stage is when you want to be able to control that stage from outside the stream. An example of this is Source.actorRef, which allows you to send messages to an actor that get forwarded into the stream: you need the Source's materialized ActorRef in order to actually send it a message. Likewise, you probably still want the materialized value of the Sink, whether to know that the stream processing completed (Future[Done]) or to get an actual value out at the end of the stream. In such a stream you'd probably have something like:
val stream: RunnableGraph[(ActorRef, Future[Done])] =
  Source.actorRef(...)
    .viaMat(calculateStuffFlow)(Keep.left) // propagates the ActorRef
    .toMat(Sink.foreach { ... })(Keep.both)

val (sendToStream, done) = stream.run()
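Once materialized, you drive the stream from the outside by sending to that ActorRef (a usage sketch with the names above; someElement is a placeholder, and the completion message follows the classic pre-2.6 Source.actorRef protocol):
sendToStream ! someElement                    // enters the stream and goes through calculateStuffFlow
sendToStream ! akka.actor.Status.Success(())  // completes the source
// `done` (the Future[Done]) completes once the sink has processed everything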
Another reasonably common use-case for this is in the Alpakka Kafka integration, where it's possible for the consumer to have a controller as a materialized value: this controller allows you to stop consuming from a topic and not unsubscribe until any pending offset commits have happened.
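A rough sketch of that pattern with Alpakka Kafka might look like the following (the bootstrap servers, group id and topic are placeholders, and an ActorSystem is assumed for materialization; treat it as an outline rather than exact integration code):
import akka.Done
import akka.actor.ActorSystem
import akka.kafka.scaladsl.Consumer
import akka.kafka.{ConsumerSettings, Subscriptions}
import akka.stream.scaladsl.{Keep, Sink}
import org.apache.kafka.common.serialization.StringDeserializer
import scala.concurrent.Future

implicit val system: ActorSystem = ActorSystem("kafkaExample")

val consumerSettings =
  ConsumerSettings(system, new StringDeserializer, new StringDeserializer)
    .withBootstrapServers("localhost:9092")
    .withGroupId("example-group")

// Consumer.plainSource materializes a Consumer.Control alongside the sink's Future[Done]
val (control, streamDone): (Consumer.Control, Future[Done]) =
  Consumer
    .plainSource(consumerSettings, Subscriptions.topics("example-topic"))
    .map(record => record.value())
    .toMat(Sink.foreach(println))(Keep.both)
    .run()

// Later: stop consuming; drainAndShutdown waits for in-flight stream processing to finish first
control.drainAndShutdown(streamDone)(system.dispatcher)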

Related

When not to use NotUsed in Akka Stream

I have seen a lot of Akka Streams examples that use NotUsed, but I have not been able to find one where something other than NotUsed is used. I tried creating a source that does not materialize NotUsed, but the value did not seem to be passed on to the next flow. I would appreciate it if someone could explain when and how we would want something other than NotUsed.
This part of the Akka Streams documentation details the "materialized value" and how materialized values compose (so that you can pass the one you need along to the next stage): https://doc.akka.io/docs/akka/current/stream/stream-composition.html#materialized-values
Other parts of the documentation also use non-NotUsed materialized values.
Take, for example, this snippet from the documentation:
val source = Source(1 to 10)
val sink = Sink.fold[Int, Int](0)(_ + _)
// connect the Source to the Sink, obtaining a RunnableGraph
val runnable: RunnableGraph[Future[Int]] = source.toMat(sink)(Keep.right)
// materialize the flow and get the value of the sink
val sum: Future[Int] = runnable.run()
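A Source whose materialized value is not NotUsed makes the pattern even clearer. A minimal sketch with Source.queue (the buffer size and element type are arbitrary, and an implicit ActorSystem is assumed to be in scope), which materializes a handle you can push elements into from outside the stream:
import akka.Done
import akka.stream.OverflowStrategy
import akka.stream.scaladsl.{Keep, Sink, Source, SourceQueueWithComplete}
import scala.concurrent.Future

val (queue, finished): (SourceQueueWithComplete[Int], Future[Done]) =
  Source.queue[Int](16, OverflowStrategy.backpressure)
    .toMat(Sink.foreach(n => println(s"got $n")))(Keep.both)
    .run()

queue.offer(42)  // push an element into the running stream
queue.complete() // complete the source; `finished` completes once the sink is done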

Detecting that back pressure is happening

My Akka HTTP application streams some data via server-sent events, and clients can request way more events than they can handle. The code looks something like this
complete {
  source.filter(predicate.isMatch)
    .buffer(1000, OverflowStrategy.dropTail)
    .throttle(20, 1.second)
    .map { evt => ServerSentEvent(evt) }
}
Is there a way to detect that a stage is backpressuring and somehow notify the client, preferably via the same sink (by emitting a different kind of output)? Or, if that is not possible, can I make Akka call some sort of callback so I can deal with it through a control side-channel?
So, I'm not sure I understand your use case. Are you asking about back pressure at .buffer or at .throttle? Another part of my confusion is that you are suggesting emitting a new "control" element in a situation where the stream is already back pressured. So your control element might not be received for some time. Also, if you emit a control element every single time you receive back pressure you will likely create a flood of control elements.
One way to build this (overly naive) solution would be to use conflate.
import akka.stream.OverflowStrategy
import akka.stream.scaladsl.{Flow, Sink, Source}
import akka.{Done, NotUsed}
import scala.concurrent.Future
import scala.concurrent.duration._

val simpleSink: Sink[String, Future[Done]] =
  Sink.foreach(e => println(s"simple: $e"))
// Produces 5 elements per second.
val cycleSource: Source[String, NotUsed] =
  Source.cycle(() => List("1", "2", "3", "4").iterator).throttle(5, 1.second)
// While downstream backpressures, collapse whatever piles up into a single control element.
val conflateFlow: Flow[String, String, NotUsed] =
  Flow[String].conflate((a, b) => "BACKPRESSURE CONTROL ELEMENT")
// Lets only 2 elements per second through, so the stages above will see backpressure.
val backpressureFlow: Flow[String, String, NotUsed] =
  Flow[String].buffer(10, OverflowStrategy.backpressure).throttle(2, 1.second)
val backpressureTest =
  cycleSource.via(conflateFlow).via(backpressureFlow).to(simpleSink).run()
To turn this into a more usable example you could either:
Make some sort of call inside of .conflate (and then just drop one of the elements). Be careful not to do anything blocking though. Perhaps just send a message that could be de-duplicated elsewhere.
Write a custom graph stage. Doing something simple like this wouldn't be too difficult.
I think I'd have to understand more about the use case, though. Take a look at the off-the-shelf backpressure-aware operators and see if one of them helps.
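As one possible shape for the "message that could be de-duplicated elsewhere" idea, here is a sketch (the operator choice and tuple shape are just one option): keep the latest element and count how many were collapsed while downstream was slow, so a later stage can decide whether to emit a control event.
import akka.NotUsed
import akka.stream.scaladsl.Flow

// Emits (latestElement, numberCollapsedWhileDownstreamWasNotReady).
val countingConflate: Flow[String, (String, Int), NotUsed] =
  Flow[String].conflateWithSeed(elem => (elem, 0)) {
    case ((_, collapsed), newer) => (newer, collapsed + 1)
  }
// Downstream could map a non-zero count to a different kind of ServerSentEvent.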

Update concurrent map inside a stream map on flink

I have one stream that is constantly streaming the latest values for some keys.
Stream A:DataStream[(String,Double)]
I have another stream that wants to get the latest value on each process call.
My approach was to introduce a ConcurrentHashMap which is updated by stream A and read by the second stream.
import java.util.concurrent.ConcurrentHashMap
import scala.collection.JavaConverters._

val rates = new ConcurrentHashMap[String, Double]().asScala
val streamA: DataStream[(String, Double)] = ???
streamA.map(keyWithValue => rates(keyWithValue._1) = keyWithValue._2) // rates never gets updated
rates("testKey") = 2 // this works

val streamB: DataStream[String] = ???
streamB.map { str =>
  rates(str) // rates does not contain the values from streamA at this point
  // some other functionality
}
Is it possible to update a concurrent map from a stream? Any other solution for sharing data from one stream with another would also be acceptable.
The behaviour you are trying to use will not work in a distributed setting: basically, as soon as you have parallelism > 1 it breaks down. In your code rates actually is updated, but in a different instance of the parallel operator.
What you actually want to do in this case is use BroadcastState, which was designed to solve exactly the issue you are facing.
In your specific use case it would look something like this:
val streamA: DataStream[(String, Double)] = ???
val streamABroadcasted = streamA.broadcast(<Your Map State Definition>)
val streamB: DataStream[String] = ???
streamB.connect(streamABroadcasted)
Then you can use a BroadcastProcessFunction to implement your logic. More on the broadcast state pattern can be found in the Flink documentation.
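A possible shape for that BroadcastProcessFunction (a sketch assuming streamB is not keyed and that the desired output is each key paired with its latest rate; the state descriptor name and output type are illustrative):
import org.apache.flink.api.common.state.MapStateDescriptor
import org.apache.flink.api.common.typeinfo.Types
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector

// Broadcast map state holding the latest rate per key.
val ratesDescriptor = new MapStateDescriptor[String, java.lang.Double](
  "rates", Types.STRING, Types.DOUBLE)

val streamA: DataStream[(String, Double)] = ???
val streamB: DataStream[String] = ???

val enriched: DataStream[(String, Double)] = streamB
  .connect(streamA.broadcast(ratesDescriptor))
  .process(new BroadcastProcessFunction[String, (String, Double), (String, Double)] {

    // Called for elements of streamB: read-only access to the broadcast state.
    override def processElement(
        key: String,
        ctx: ReadOnlyContext,
        out: Collector[(String, Double)]): Unit = {
      val rate = ctx.getBroadcastState(ratesDescriptor).get(key)
      if (rate != null) out.collect((key, rate.doubleValue()))
    }

    // Called for elements of streamA: every parallel instance updates its broadcast state.
    override def processBroadcastElement(
        value: (String, Double),
        ctx: Context,
        out: Collector[(String, Double)]): Unit =
      ctx.getBroadcastState(ratesDescriptor).put(value._1, value._2)
  })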

What does Keep in akka stream mean?

I am learning Akka Streams and encountered Keep.left and Keep.right in this code:
implicit val system = ActorSystem("KafkaProducer")
implicit val materializer = ActorMaterializer()
val source = Source(List("a", "b", "c"))
val sink = Sink.fold[String, String]("")(_ + _)
val runnable: RunnableGraph[Future[String]] = source.toMat(sink)(Keep.right)
val result: Future[String] = runnable.run()
What does Keep.right mean here?
Every stream processing stage can produce a materialized value, which can be captured using viaMat or toMat (as opposed to via or to, respectively). In your code snippet, using source.toMat(sink) indicates that you're interested in capturing the materialized values of the source and the sink, and Keep.right keeps the right-hand side's materialized value (i.e. the sink's). Keep.left would keep the materialized value of the left side (i.e. the source's), and Keep.both would let you keep both.
More details are available in the relevant sections of the Akka Streams documentation.
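For completeness, Keep.both materializes both values as a tuple (a small sketch reusing the source and sink from the question; NotUsed comes from akka.NotUsed):
val both: RunnableGraph[(NotUsed, Future[String])] = source.toMat(sink)(Keep.both)
val (notUsed, result) = both.run() // result is the sink's Future[String]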
Keep.left keeps only the left (first) of the two input values; Keep.right keeps only the right (second).
Source, Flow and Sink are all typed stages, and each can carry its own materialized value. When you combine them, the resulting stage can only expose one materialized value (or a tuple of both), so you need to tell Akka which one to keep: that is the meaning of Keep.
By default (plain via and to) the leftmost value is kept, i.e. Keep.left; you can also choose something else, such as Keep.right.
Strictly speaking, Keep selects which materialized value, and therefore which type, the combined stage exposes.
val r3: RunnableGraph[Future[Int]] = source.via(flow).toMat(sink)(Keep.right)
Change "Keep.right" to "Keep left":
val r3: RunnableGraph[Future[Int]] = source.via(flow).toMat(sink)(Keep.left)
Will get "type mismatch " error.

Akka-streams: execute action on flow start

Having a flow description in akka-streams
val flow: Flow[Input, Output, Unit] = ???
how do I modify it to get a new flow description that performs a specified side effect on start, i.e. when the flow is materialized?
Starting materialization of a stream processing graph will set it in motion piece by piece, concurrently. The only way to perform an action that is guaranteed to happen before the first element is passed anywhere within that graph is to perform that action before materializing the graph. In this sense the answer by sschaef is slightly incorrect: using mapMaterializedValue runs the action pretty early, but not in a way that guarantees it happens before the first element is processed.
If we are talking about a Flow here which only takes in inputs on one side and produces outputs on the other—i.e. it does not contain internal cycles or data sources—then one thing you can do to perform an action before the first element arrives is to attach a processing step to its input that does that:
// Runs `block` when this (empty) source is first pulled, i.e. before any real element is demanded.
def effectSource[T](block: => Unit) = Source.fromIterator(() => { block; Iterator.empty })
val newFlow = Flow[Input].prepend(effectSource(/* do stuff */)).via(flow)
Note: the above uses the upcoming 2.0 syntax; in Akka Streams 1.0 it would be Source(() => { block; Iterator.empty }), and the prepend operation would need to be done using the FlowGraph DSL.
You said it yourself: use the force of the materialization:
val newFlow = flow.mapMaterializedValue(_ ⇒ println("materialized"))
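Note that in this particular case the materialized value is already Unit, so the type does not change; in general, though, mapMaterializedValue replaces the materialized value with whatever the function returns, so if you need to keep the original value while still running the side effect, return it (a small sketch with the same flow as above):
val newFlowKeepingMat: Flow[Input, Output, Unit] =
  flow.mapMaterializedValue { mat =>
    println("materialized") // side effect at materialization time
    mat                     // keep the original materialized value
  }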