What does Keep in akka stream mean? - scala

I am learning Akka Streams and encountered Keep.left and Keep.right in this code:
implicit val system = ActorSystem("KafkaProducer")
implicit val materializer = ActorMaterializer()
val source = Source(List("a", "b", "c"))
val sink = Sink.fold[String, String]("")(_ + _)
val runnable: RunnableGraph[Future[String]] = source.toMat(sink)(Keep.right)
val result: Future[String] = runnable.run()
What does Keep.right mean here?

Every stream processing stage can produce a materialized value, which can be captured using viaMat or toMat (as opposed to via() or to(), respectively). In your code snippet, source.toMat(sink) indicates that you're interested in capturing the materialized values of the source and the sink, and Keep.right keeps the right-hand one (i.e. the sink's). Keep.left would keep the materialized value of the left side (i.e. the source's), and Keep.both would keep both as a tuple.
More details are available in the relevant sections of the Akka Streams documentation.

Keep.left keeps only the left (first) of the input values. Keep.right only keeps the right (second) of two input values.
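To make the "left of two input values" idea concrete, here is a minimal plain-Scala sketch of what the Keep combiners amount to. KeepSketch and its combine helper are hypothetical names invented for illustration (combine only plays the role that toMat plays in Akka); this is not Akka's actual source code.

```scala
// Simplified stand-in for akka.stream.scaladsl.Keep (plain Scala, no Akka needed).
object KeepSketch {
  def left[L, R]: (L, R) => L = (l, _) => l
  def right[L, R]: (L, R) => R = (_, r) => r
  def both[L, R]: (L, R) => (L, R) = (l, r) => (l, r)

  // Hypothetical helper that plays the role of toMat: it combines the
  // materialized values of two stages with the given function.
  def combine[L, R, M](leftMat: L, rightMat: R)(f: (L, R) => M): M =
    f(leftMat, rightMat)
}

object KeepSketchDemo {
  def main(args: Array[String]): Unit = {
    val sourceMat = "from the source" // pretend materialized value of a source
    val sinkMat = 42                  // pretend materialized value of a sink

    println(KeepSketch.combine(sourceMat, sinkMat)(KeepSketch.left))  // from the source
    println(KeepSketch.combine(sourceMat, sinkMat)(KeepSketch.right)) // 42
    println(KeepSketch.combine(sourceMat, sinkMat)(KeepSketch.both))  // (from the source,42)
  }
}
```

The result type of combine follows from the function passed in, which is why swapping Keep.right for Keep.left changes the type of the resulting graph.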

Source, Flow and Sink are typed classes, and each one carries the type of its materialized value in addition to its element type.
When two of these components are combined, Akka has to be told which materialized value the combined graph should expose, and that is what Keep does.
By default (with via and to), the materialized value of the left side is kept, which corresponds to Keep.left. You can choose something else, such as Keep.right.
Keep selects the value, and that selection also fixes the materialized value type of the resulting graph. For example:
val r3: RunnableGraph[Future[Int]] = source.via(flow).toMat(sink)(Keep.right)
Change Keep.right to Keep.left:
val r3: RunnableGraph[Future[Int]] = source.via(flow).toMat(sink)(Keep.left)
This gives a "type mismatch" error, because the graph now materializes the left-hand value rather than the sink's Future[Int].

Related

When not to use NotUsed in Akka Stream

I have seen a lot of Akka stream examples that use NotUsed, but I have not been able to find one where something other than NotUsed was used. I tried creating a source that does not have NotUsed, but it does not seem to be passed to the next flow. I would appreciate if someone can explain when and how we want to have something other than NotUsed.
This part of Akka Streams documentation details the "materialized value" and how they compose (so that you can pass what you need to the next flow): https://doc.akka.io/docs/akka/current/stream/stream-composition.html#materialized-values
Other parts of the documentation also use materialized values other than NotUsed, for example this first snippet:
val source = Source(1 to 10)
val sink = Sink.fold[Int, Int](0)(_ + _)
// connect the Source to the Sink, obtaining a RunnableGraph
val runnable: RunnableGraph[Future[Int]] = source.toMat(sink)(Keep.right)
// materialize the flow and get the value of the sink
val sum: Future[Int] = runnable.run()
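As a further illustration of a source whose materialized value is not NotUsed, here is a small sketch using Source.tick, which materializes a Cancellable. It assumes Akka 2.6 or later, where an implicit ActorSystem is enough to run a stream; the object name, the tick interval and the sleep are arbitrary choices for the example.

```scala
import akka.actor.{ActorSystem, Cancellable}
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.duration._

object TickExample extends App {
  // Assumption: Akka 2.6+, where the implicit ActorSystem provides the materializer.
  implicit val system: ActorSystem = ActorSystem("TickExample")

  // The second type parameter is the materialized value: a Cancellable, not NotUsed.
  val ticks: Source[String, Cancellable] =
    Source.tick(0.seconds, 200.millis, "tick")

  // `to` keeps the left (source) materialized value by default.
  val cancellable: Cancellable = ticks.to(Sink.foreach[String](println)).run()

  Thread.sleep(1000)   // let a few ticks through
  cancellable.cancel() // stop the source from outside the stream
  system.terminate()
}
```

The Cancellable is only useful outside the stream; it is not passed to downstream stages, which only ever see the "tick" elements.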

When materialiser is actually used in Akka Streams Flows and when do we need to Keep values

I'm trying to learn Akka Streams and I'm stuck with this materialization here.
Every tutorial shows some basic source-via-to-run examples where no real difference between Keep.left and Keep.right is explained. So I wrote this little piece of code, asked IntelliJ to add type annotations to the values, and started to dig into the sources.
val single: Source[Int, NotUsed] = Source(Seq(1, 2, 3, 4, 5))
val flow: Flow[Int, Int, NotUsed] = Flow[Int].map(_ * 2)
val sink: Sink[Int, Future[Int]] = Sink.fold[Int, Int](0)(_ + _)
val run1: RunnableGraph[Future[Int]] =
  single.viaMat(flow)(Keep.right).toMat(sink)(Keep.right)
val run2: RunnableGraph[NotUsed] =
  single.viaMat(flow)(Keep.right).toMat(sink)(Keep.left)
val run3: RunnableGraph[(NotUsed, Future[Int])] =
  single.viaMat(flow)(Keep.right).toMat(sink)(Keep.both)
val run4: RunnableGraph[NotUsed] =
  single.viaMat(flow)(Keep.right).toMat(sink)(Keep.none)
So far I can understand that at the end of the execution we may need the value of the Sink, which is of type Future[Int]. But I cannot think of any case where I would need to keep some of the other values.
In the third example it is possible to access both the left and right values of the materialized output.
run3.run()._2 onComplete {
  case Success(value) ⇒ println(value)
  case Failure(exception) ⇒ println(exception.getMessage)
}
It actually works absolutely the same way if I change it to viaMat(flowMultiply)(Keep.left) or none or both.
But in what scenarios could the materialized value be used within the graph? Why would we need it if the value is flowing within anyway? Why do we need one of the values if we aren't gonna keep it?
Could you please provide an example where changing from left to right will not just break the compiler, but will actually make a difference to the program logic?
For most streams, you only care about the value at the end of the stream. Accordingly, most Source operators and nearly all of the standard Flow operators have a materialized value of NotUsed, and the syntactic sugar .runWith boils down to .toMat(sink)(Keep.right).run.
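The equivalence between runWith and toMat with Keep.right can be checked with a short sketch. It assumes Akka 2.6 or later (implicit ActorSystem as materializer); the object name and the 3-second timeout are arbitrary, and blocking with Await is only for demonstration.

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Keep, Sink, Source}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

object RunWithExample extends App {
  // Assumption: Akka 2.6+, where the implicit ActorSystem provides the materializer.
  implicit val system: ActorSystem = ActorSystem("RunWithExample")

  val source = Source(1 to 5)
  val sink = Sink.fold[Int, Int](0)(_ + _)

  // Shorthand: runs the stream and returns the sink's materialized value.
  val viaRunWith: Future[Int] = source.runWith(sink)

  // Long form: explicitly keep the right (sink) value, then run.
  val viaToMat: Future[Int] = source.toMat(sink)(Keep.right).run()

  println(Await.result(viaRunWith, 3.seconds)) // 15
  println(Await.result(viaToMat, 3.seconds))   // 15

  system.terminate()
}
```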
Where one might care about the materialized value of a Source or Flow stage is when you want to be able to control a stage outside of the stream. An example of this is Source.actorRef, which allows you to send messages to an actor which get forwarded to the stream: you need the Source's materialized ActorRef in order to actually send a message to it. Likewise, you probably still want the materialized value of the Sink (whether to know that the stream processing happened (Future[Done]) or for an actual value at the end of the stream). In such a stream you'd probably have something like:
val stream: RunnableGraph[(ActorRef, Future[Done])] =
  Source.actorRef(...)
    .viaMat(calculateStuffFlow)(Keep.left) // propagates the ActorRef
    .toMat(Sink.foreach { ... })(Keep.both)
val (sendToStream, done) = stream.run()
Another reasonably common use-case for this is in the Alpakka Kafka integration, where it's possible for the consumer to have a controller as a materialized value: this controller allows you to stop consuming from a topic and not unsubscribe until any pending offset commits have happened.

Update concurrent map inside a stream map on flink

I have one stream that is constantly streaming the latest values of some keys.
Stream A:DataStream[(String,Double)]
I have another stream that wants to get the latest value on each process call.
My approach was to introduce a ConcurrentHashMap which will be updated by stream A and read by the second stream.
val rates = new ConcurrentHashMap[String, Double]().asScala
val streamA : DataStream[(String,Double)]= ???
streamA.map(keyWithValue => rates(keyWithValue._1)= keyWithValue._2) //rates never gets updated
rates("testKey")=2 //this works
val streamB: DataStream[String] = ???
streamB.map(str=> rates(str) // rates does not contain the values of the streamA at this point
//some other functionality
)
Is it possible to update a concurrent map from a stream? Any other solution for sharing data from one stream with another is also acceptable.
The behaviour you are trying to use will not work in a distributed manner; basically, if you have parallelism > 1 it will not work. In your code rates is actually updated, but in a different instance of the parallel operator.
What you want in this case is BroadcastState, which was designed to solve exactly the issue you are facing.
In your specific use case it would look something like this:
val streamA : DataStream[(String,Double)]= ???
val streamABroadcasted = streamA.broadcast(<Your Map State Definition>)
val streamB: DataStream[String] = ???
streamB.connect(streamABroadcasted)
Then you can use a BroadcastProcessFunction to implement your logic. More on the broadcast state pattern can be found in the Flink documentation.
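A rough sketch of such a BroadcastProcessFunction is shown below. The names RatesState and LatestRateFunction, the state name "rates" and the output format are assumptions made for illustration; boxing the values as java.lang.Double is also just one possible choice for the state's value type.

```scala
import org.apache.flink.api.common.state.MapStateDescriptor
import org.apache.flink.api.common.typeinfo.BasicTypeInfo
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction
import org.apache.flink.util.Collector

// Hypothetical holder for the descriptor that would also be passed to streamA.broadcast(...).
object RatesState {
  val descriptor = new MapStateDescriptor[String, java.lang.Double](
    "rates", BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.DOUBLE_TYPE_INFO)
}

// Hypothetical function: streamB elements are keys, streamA elements are (key, rate) updates.
class LatestRateFunction
    extends BroadcastProcessFunction[String, (String, Double), String] {

  // Called for each element of the non-broadcast stream (streamB): read-only access.
  override def processElement(
      key: String,
      ctx: BroadcastProcessFunction[String, (String, Double), String]#ReadOnlyContext,
      out: Collector[String]): Unit = {
    val rate = ctx.getBroadcastState(RatesState.descriptor).get(key)
    if (rate != null) out.collect(s"$key -> $rate") // skip keys with no rate yet
  }

  // Called for each element of the broadcast stream (streamA): update the shared state.
  override def processBroadcastElement(
      update: (String, Double),
      ctx: BroadcastProcessFunction[String, (String, Double), String]#Context,
      out: Collector[String]): Unit =
    ctx.getBroadcastState(RatesState.descriptor).put(update._1, update._2)
}
```

Under these assumptions the wiring would be streamB.connect(streamA.broadcast(RatesState.descriptor)).process(new LatestRateFunction), with every parallel instance receiving every rate update.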

Keep.both example in Akka Stream

I would like to understand Keep.both in Akka Streams, but I could not find an easy example on the internet.
Could someone please provide a very simple example about Keep.right and Keep.both.
I tried:
implicit val system = ActorSystem("KafkaProducer")
implicit val materializer = ActorMaterializer()
val source = Source.single("Hello")
val sink = Sink.fold[String, String]("")(_ + _)
val runnable: RunnableGraph[Future[String]] = source.toMat(sink)(Keep.left)
runnable.run()
I know it is maybe not a good example, and hopefully someone can provide a better one.
In the simplest scenario, you need a stream to process a bunch of elements that (1) you are going to provide from outside the stream, and (2) you need to know when the stream has finished processing all elements.
For (1) you could use a Source.queue that is materialized into a queue and you can push elements to it via offer.
val source = Source.queue[String](100,OverflowStrategy.backpressure)
For (2) you could use a Sink.foreach that is materialized into a Future[Done] which will be completed with Success when reaching the normal end of the stream, or completed with Failure if there is a failure signaled in the stream.
val sink = Sink.foreach[String](println)
Then you need to connect source and sink and keep both materialized values with Keep.both.
val materializedValues: (SourceQueueWithComplete[String], Future[Done]) = source.toMat(sink)(Keep.both).run()
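To show why both values are worth keeping, here is a sketch of how the two halves of that tuple might be used together: the queue to push elements in, and the Future[Done] to react once the stream has finished. It assumes Akka 2.6 or later; the object name and the elements offered are arbitrary.

```scala
import akka.Done
import akka.actor.ActorSystem
import akka.stream.OverflowStrategy
import akka.stream.scaladsl.{Keep, Sink, Source, SourceQueueWithComplete}
import scala.concurrent.Future

object KeepBothExample extends App {
  // Assumption: Akka 2.6+, where the implicit ActorSystem provides the materializer.
  implicit val system: ActorSystem = ActorSystem("KeepBothExample")
  import system.dispatcher // ExecutionContext for the Future callbacks below

  val (queue, done): (SourceQueueWithComplete[String], Future[Done]) =
    Source.queue[String](100, OverflowStrategy.backpressure)
      .toMat(Sink.foreach[String](println))(Keep.both)
      .run()

  // Left value: push elements from outside, waiting for each offer to be accepted.
  for {
    _ <- queue.offer("Hello")
    _ <- queue.offer("World")
  } queue.complete() // no more elements; lets the stream finish

  // Right value: completes when the sink has processed everything.
  done.onComplete { result =>
    println(s"Stream finished: $result")
    system.terminate()
  }
}
```

With Keep.left only the queue would be available, and with Keep.right only the completion future, so neither alone covers both requirements.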

Akka-streams - how to access the materialized value of the stream

I am learning to work with Akka streams, and really loving it, but the materialization part is still somewhat a mystery to me.
Quoting from http://doc.akka.io/docs/akka-stream-and-http-experimental/snapshot/scala/http/client-side/host-level.html#host-level-api
... trigger the immediate shutdown of a specific pool by calling
shutdown() on the HostConnectionPool instance that the pool client
flow materializes into
How do I get hold of the HostConnectionPool instance?
Even more importantly, I'd like to understand in general how to access the materialized value and perform some operation or retrieve information from it.
Thanks in advance for any documentation pointers or explanation.
This is accomplished with the Source.viaMat function. Extending the code example from the link provided in your question:
import akka.http.scaladsl.Http.HostConnectionPool
import akka.stream.scaladsl.Keep
import akka.stream.scaladsl.RunnableGraph
val poolClientFlow = Http().cachedHostConnectionPool[Int]("akka.io")
val graph: RunnableGraph[(HostConnectionPool, Future[(Try[HttpResponse], Int)])] =
  Source.single(HttpRequest(uri = "/") -> 42)
    .viaMat(poolClientFlow)(Keep.right)
    .toMat(Sink.head)(Keep.both)
val (pool, fut) = graph.run()
pool.shutdown()
Since Source.single does not materialize into anything useful (NotUsed in current Akka versions, Unit in early experimental releases), Keep.right says to keep the HostConnectionPool that poolClientFlow materializes into. In .toMat, Keep.both says to keep both the left value (the pool from the flow) and the right value (the Future from the Sink).