Keep.both example in Akka Streams - Scala

I would like to understand Keep.both in Akka Streams, but I could not find an easy example on the internet.
Could someone please provide a very simple example of Keep.right and Keep.both?
I tried:
implicit val system = ActorSystem("KafkaProducer")
implicit val materializer = ActorMaterializer()

val source = Source.single("Hello")
val sink = Sink.fold[String, String]("")(_ + _)
// Keep.left keeps the source's materialized value (NotUsed here),
// so the graph materializes to NotUsed rather than the sink's Future[String]
val runnable: RunnableGraph[NotUsed] = source.toMat(sink)(Keep.left)
runnable.run()
I know it is maybe not a good example; hopefully, someone can provide a better one.

Simplest scenario: you need a stream to process a bunch of elements that (1) you are going to provide from outside the stream, and (2) you need to know when the stream finishes processing all the elements.
For (1) you could use a Source.queue that is materialized into a queue, to which you can push elements via offer.
val source = Source.queue[String](100, OverflowStrategy.backpressure)
For (2) you could use a Sink.foreach that is materialized into a Future[Done] which will be completed with Success when reaching the normal end of the stream, or completed with Failure if there is a failure signaled in the stream.
val sink = Sink.foreach[String](println)
Then you need to connect the source and the sink, and Keep.both materialized values.
val materializedValues: (SourceQueueWithComplete[String], Future[Done]) =
  source.toMat(sink)(Keep.both).run()
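Putting the two materialized values to work might then look like this (a minimal sketch; the element and the callback bodies are illustrative):

import akka.Done
import akka.stream.QueueOfferResult
import scala.concurrent.Future
import scala.util.{Failure, Success}
import system.dispatcher // ExecutionContext for the Future callbacks

val (queue, done) = materializedValues

// (1) push an element into the running stream from outside;
// offer returns a Future[QueueOfferResult] that completes once
// the element has been accepted, so it is backpressure-aware
val accepted: Future[QueueOfferResult] = queue.offer("hello")

// signal that no more elements will be offered
queue.complete()

// (2) the Future[Done] tells us when the stream has finished
done.onComplete {
  case Success(Done) => println("stream finished")
  case Failure(ex)   => println(s"stream failed: ${ex.getMessage}")
}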

Related

When not to use NotUsed in Akka Stream

I have seen a lot of Akka Streams examples that use NotUsed, but I have not been able to find one where something other than NotUsed is used. I tried creating a source that does not have NotUsed, but it does not seem to be passed to the next flow. I would appreciate it if someone could explain when and how we want to have something other than NotUsed.
This part of the Akka Streams documentation details the "materialized value" and how materialized values compose (so that you can pass what you need to the next flow): https://doc.akka.io/docs/akka/current/stream/stream-composition.html#materialized-values
Other parts of the documentation also use non-NotUsed values, like in the first snippet here:
val source = Source(1 to 10)
val sink = Sink.fold[Int, Int](0)(_ + _)
// connect the Source to the Sink, obtaining a RunnableGraph
val runnable: RunnableGraph[Future[Int]] = source.toMat(sink)(Keep.right)
// materialize the flow and get the value of the sink
val sum: Future[Int] = runnable.run()
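To actually observe the materialized value, you can then hook onto the Future (a minimal sketch):

import scala.concurrent.ExecutionContext.Implicits.global
sum.foreach(total => println(s"sum: $total")) // prints 55 for 1 to 10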

Akka: streaming file lines to an actor router and writing with a single actor. How to handle the backpressure?

I want to stream a file from S3 to an actor to be parsed and enriched, and to write the output to another file.
The number of parser actors should be limited, e.g.:
application.conf
akka {
  actor {
    deployment {
      HereClient/router1 {
        router = round-robin-pool
        nr-of-instances = 28
      }
    }
  }
}
code
val writerActor = actorSystem.actorOf(WriterActor.props())
val parser = actorSystem.actorOf(FromConfig.props(ParsingActor.props(writerActor)), "router1")
However, the actor that is writing to a file should be limited to 1 (a singleton).
I tried doing something like:
val reader: ParquetReader[GenericRecord] = AvroParquetReader.builder[GenericRecord](file).withConf(conf).build()
val source: Source[GenericRecord, NotUsed] = AvroParquetSource(reader)
source.map(record => parser ! record)
but I am not sure that the backpressure is handled correctly. Any advice?
Indeed your solution is disregarding backpressure.
The correct way to have a stream interact with an actor while maintaining backpressure is to use the ask pattern support of akka-stream (reference).
From my understanding of your example you have 2 separate actor interaction points:
send records to the parsing actors (via a router)
send parsed records to the singleton write actor
What I would do is something similar to the following:
val writerActor = actorSystem.actorOf(WriterActor.props())
val parserActor = actorSystem.actorOf(FromConfig.props(ParsingActor.props(writerActor)), "router1")
val reader: ParquetReader[GenericRecord] = AvroParquetReader.builder[GenericRecord](file).withConf(conf).build()
val source: Source[GenericRecord, NotUsed] = AvroParquetSource(reader)
source.ask[ParsedRecord](28)(parserActor)
  .ask[WriteAck](writerActor)
  .runWith(Sink.ignore)
The idea is that you send all the GenericRecord elements to the parserActor, which will reply with a ParsedRecord. Here, as an example, we specify a parallelism of 28 since that's the number of instances you have configured; as long as you use a value at least as high as the actual number of actor instances, no actor should suffer from work starvation.
Once the parserActor replies with the parsing result (here represented by ParsedRecord), we apply the same pattern to interact with the singleton writer actor. Note that here we don't specify the parallelism, as we have a single instance, so it doesn't make sense to send more than 1 message at a time (in reality this happens anyway due to buffering at async boundaries, but this is just a built-in optimization). In this case we expect the writer actor to reply with a WriteAck to inform us that the write was successful and we can send the next element.
Using this method you are maintaining backpressure throughout your whole stream.
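For completeness, the ask pattern requires the actors to reply to sender(). A hypothetical sketch of the two actors (ParsedRecord, WriteAck, and the parsing/writing bodies are made up for illustration):

import akka.actor.{Actor, ActorRef}
import org.apache.avro.generic.GenericRecord

// hypothetical message types used by the ask stages
case class ParsedRecord(data: String)
case object WriteAck

class ParsingActor(writer: ActorRef) extends Actor {
  // the writer reference is unused with the ask pattern;
  // kept only to match the props construction above
  def receive: Receive = {
    case record: GenericRecord =>
      // parse/enrich the record, then reply so the ask stage can emit downstream
      sender() ! ParsedRecord(record.toString)
  }
}

class WriterActor extends Actor {
  def receive: Receive = {
    case ParsedRecord(data) =>
      // write the record to the output file here, then acknowledge
      sender() ! WriteAck
  }
}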
I think you should be using one of the "async" operations
Perhaps this other Q&A gives you some inspiration: Processing an akka stream asynchronously and writing to a file sink

When the materializer is actually used in Akka Streams Flows, and when do we need to Keep values

I'm trying to learn Akka Streams and I'm stuck with this materialization here.
Every tutorial shows some basic source-via-to-run examples where no real difference between Keep.left and Keep.right is explained. So I wrote this little piece of code, asked IntelliJ to add type annotations to the values, and started to dig into the sources.
val single: Source[Int, NotUsed] = Source(Seq(1, 2, 3, 4, 5))
val flow: Flow[Int, Int, NotUsed] = Flow[Int].map(_ * 2)
val sink: Sink[Int, Future[Int]] = Sink.fold[Int, Int](0)(_ + _)

val run1: RunnableGraph[Future[Int]] =
  single.viaMat(flow)(Keep.right).toMat(sink)(Keep.right)
val run2: RunnableGraph[NotUsed] =
  single.viaMat(flow)(Keep.right).toMat(sink)(Keep.left)
val run3: RunnableGraph[(NotUsed, Future[Int])] =
  single.viaMat(flow)(Keep.right).toMat(sink)(Keep.both)
val run4: RunnableGraph[NotUsed] =
  single.viaMat(flow)(Keep.right).toMat(sink)(Keep.none)
So far I can understand that at the end of the execution we may need the value of the Sink, which is of type Future[Int]. But I cannot think of any case when I am going to need to keep some of the other values.
In the third example it is possible to access both the left and right values of the materialized output.
run3.run()._2 onComplete {
  case Success(value) ⇒ println(value)
  case Failure(exception) ⇒ println(exception.getMessage)
}
It actually works absolutely the same way if I change it to viaMat(flow)(Keep.left), or none, or both.
But in what scenarios could the materialized value be used within the graph? Why would we need it if the value is flowing within it anyway? Why do we need one of the values if we aren't going to keep it?
Could you please provide an example where changing from left to right will not just break the compiler, but will actually make a difference to the program logic?
For most streams, you only care about the value at the end of the stream. Accordingly, most of the Source and nearly all of the standard Flow operators have a materialized value of NotUsed, and the syntactic sugar .runWith boils down to .toMat(sink)(Keep.right).run.
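For instance, these two lines are equivalent (a small sketch to make the sugar concrete):

val sum1: Future[Int] = Source(1 to 10).runWith(Sink.fold[Int, Int](0)(_ + _))
val sum2: Future[Int] = Source(1 to 10).toMat(Sink.fold[Int, Int](0)(_ + _))(Keep.right).run()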
Where one might care about the materialized value of a Source or Flow stage is when you want to be able to control a stage outside of the stream. An example of this is Source.actorRef, which allows you to send messages to an actor which get forwarded to the stream: you need the Source's materialized ActorRef in order to actually send a message to it. Likewise, you probably still want the materialized value of the Sink (whether to know that the stream processing happened (Future[Done]) or for an actual value at the end of the stream). In such a stream you'd probably have something like:
val stream: RunnableGraph[(ActorRef, Future[Done])] =
  Source.actorRef(...)
    .viaMat(calculateStuffFlow)(Keep.left) // propagates the ActorRef
    .toMat(Sink.foreach { ... })(Keep.both)
val (sendToStream, done) = stream.run()
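With the materialized ActorRef in hand, feeding the stream is just a tell (a sketch; with the classic Source.actorRef, a Status.Success message completes the source):

sendToStream ! "an element"                  // enters the stream via the source
sendToStream ! akka.actor.Status.Success(()) // completes the source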
Another reasonably common use-case for this is in the Alpakka Kafka integration, where it's possible for the consumer to have a controller as a materialized value: this controller allows you to stop consuming from a topic and not unsubscribe until any pending offset commits have happened.
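As a rough sketch of that Alpakka Kafka pattern (consumerSettings and the topic name are assumed to be defined elsewhere):

import akka.kafka.Subscriptions
import akka.kafka.scaladsl.Consumer
import akka.stream.scaladsl.{Keep, Sink}

val (control, streamDone) =
  Consumer
    .plainSource(consumerSettings, Subscriptions.topics("my-topic"))
    .toMat(Sink.foreach(record => println(record.value)))(Keep.both)
    .run()

// the materialized Consumer.Control lets us stop consuming from outside the stream
control.shutdown() // returns a Future[Done]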

Update a concurrent map inside a stream map on Flink

I have one stream that constantly streams the latest values of some keys:
Stream A: DataStream[(String, Double)]
I have another stream that wants to get the latest value on each process call.
My approach was to introduce a ConcurrentHashMap, which will be updated by stream A and read by the second stream.
import java.util.concurrent.ConcurrentHashMap
import scala.collection.JavaConverters._

val rates = new ConcurrentHashMap[String, Double]().asScala
val streamA: DataStream[(String, Double)] = ???
streamA.map(keyWithValue => rates(keyWithValue._1) = keyWithValue._2) // rates never gets updated
rates("testKey") = 2 // this works

val streamB: DataStream[String] = ???
streamB.map { str =>
  rates(str) // rates does not contain the values from streamA at this point
  // some other functionality
}
Is it possible to update a concurrent map from a stream? Any other solution for sharing data from one stream with another is also acceptable.
The behaviour you are trying to use will not work in a distributed setting; basically, with parallelism > 1 it will not work. In your code, rates does actually get updated, but in a different instance of the parallel operator.
What you actually want in this case is BroadcastState, which was designed to solve exactly the issue you are facing.
In your specific use case it would look something like this:
val streamA: DataStream[(String, Double)] = ???
val streamABroadcasted = streamA.broadcast(<Your Map State Definition>)
val streamB: DataStream[String] = ???
streamB.connect(streamABroadcasted)
Then you can use a BroadcastProcessFunction to implement your logic. More on the broadcast state pattern can be found here.
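Filling in the map state definition, a minimal sketch might look like this (the output type and the enrichment logic are assumptions):

import org.apache.flink.api.common.state.MapStateDescriptor
import org.apache.flink.api.common.typeinfo.BasicTypeInfo
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction
import org.apache.flink.util.Collector

// descriptor for the broadcast state that holds the latest rate per key
val ratesDescriptor = new MapStateDescriptor[String, java.lang.Double](
  "rates", BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.DOUBLE_TYPE_INFO)

val streamABroadcasted = streamA.broadcast(ratesDescriptor)

streamB
  .connect(streamABroadcasted)
  .process(new BroadcastProcessFunction[String, (String, Double), (String, Double)] {
    override def processElement(
        key: String,
        ctx: BroadcastProcessFunction[String, (String, Double), (String, Double)]#ReadOnlyContext,
        out: Collector[(String, Double)]): Unit = {
      // read-only access to the broadcast state on the non-broadcast side
      val rate = ctx.getBroadcastState(ratesDescriptor).get(key)
      if (rate != null) out.collect((key, rate))
    }

    override def processBroadcastElement(
        value: (String, Double),
        ctx: BroadcastProcessFunction[String, (String, Double), (String, Double)]#Context,
        out: Collector[(String, Double)]): Unit = {
      // every parallel instance receives the broadcast element and
      // updates its own copy of the state
      ctx.getBroadcastState(ratesDescriptor).put(value._1, value._2)
    }
  })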

Akka Streams - how to access the materialized value of the stream

I am learning to work with Akka Streams, and really loving it, but the materialization part is still somewhat of a mystery to me.
Quoting from http://doc.akka.io/docs/akka-stream-and-http-experimental/snapshot/scala/http/client-side/host-level.html#host-level-api
... trigger the immediate shutdown of a specific pool by calling
shutdown() on the HostConnectionPool instance that the pool client
flow materializes into
How do I get hold of the HostConnectionPool instance?
Even more importantly, I'd like to understand in general how to access the materialized value and perform some operation or retrieve information from it.
Thanks in advance for any documentation pointers or explanation.
This is accomplished with the Source.viaMat function. Extending the code example from the link provided in your question:
import akka.http.scaladsl.Http
import akka.http.scaladsl.Http.HostConnectionPool
import akka.http.scaladsl.model.{HttpRequest, HttpResponse}
import akka.stream.scaladsl.{Keep, RunnableGraph, Sink, Source}
import scala.concurrent.Future
import scala.util.Try

val poolClientFlow = Http().cachedHostConnectionPool[Int]("akka.io")

val graph: RunnableGraph[(HostConnectionPool, Future[(Try[HttpResponse], Int)])] =
  Source.single(HttpRequest(uri = "/") -> 42)
    .viaMat(poolClientFlow)(Keep.right)
    .toMat(Sink.head)(Keep.both)

val (pool, fut) = graph.run()
pool.shutdown()
Since Source.single materializes into NotUsed, the Keep.right says to keep the HostConnectionPool that the poolClientFlow materializes into. In the .toMat call, the Keep.both says to keep both the left pool value from the flow and the right Future from the Sink.