Akka Stream return object from Sink - scala

I've got a SourceQueue. When I offer an element to it, I want it to pass through the stream, and when it reaches the Sink I want the output returned to the code that offered the element (similar to how Sink.head returns an element to the RunnableGraph.run() call).
How do I achieve this? A simple example of my problem would be:
val source = Source.queue[String](100, OverflowStrategy.fail)
val flow = Flow[String].map(element => s"Modified $element")
val sink = Sink.ReturnTheStringSomehow
val graph = source.via(flow).to(sink).run()
val x = graph.offer("foo")
println(x) // Output should be "Modified foo"
val y = graph.offer("bar")
println(y) // Output should be "Modified bar"
val z = graph.offer("baz")
println(z) // Output should be "Modified baz"
Edit: For the example I have given in this question, Vladimir Matveev provided the best answer. However, it should be noted that this solution only works if the elements enter the sink in the same order they were offered to the source. If this cannot be guaranteed, the order of the elements in the sink may differ and the outcome might not be what is expected.

I believe it is simpler to use the already existing primitive for pulling values from a stream, called Sink.queue. Here is an example:
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, Attributes, OverflowStrategy}
import akka.stream.scaladsl.{Flow, Keep, Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._
implicit val system = ActorSystem("Example")
implicit val materializer = ActorMaterializer()
val source = Source.queue[String](128, OverflowStrategy.fail)
val flow = Flow[String].map(element => s"Modified $element")
val sink = Sink.queue[String]().withAttributes(Attributes.inputBuffer(1, 1))
val (sourceQueue, sinkQueue) = source.via(flow).toMat(sink)(Keep.both).run()
def getNext: String = Await.result(sinkQueue.pull(), 1.second).get
sourceQueue.offer("foo")
println(getNext)
sourceQueue.offer("bar")
println(getNext)
sourceQueue.offer("baz")
println(getNext)
It does exactly what you want.
Note that setting the inputBuffer attribute for the queue sink may or may not be important for your use case: if you don't set it, the buffer will be zero-sized and the data won't flow through the stream until you invoke pull() on the sink.
sinkQueue.pull() yields a Future[Option[T]], which will be completed successfully with Some if the sink receives an element or with a failure if the stream fails. If the stream completes normally, it will be completed with None. In this particular example I'm ignoring this by using Option.get but you would probably want to add custom logic to handle this case.
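If you'd rather not call Option.get, here is a small sketch of handling the completion case explicitly (failing with NoSuchElementException is an assumption; substitute whatever logic fits your case, and note that pullNext is a hypothetical helper, not part of the Sink.queue API):
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
// Completes with the next element, or fails if the stream has completed normally.
def pullNext(): Future[String] =
  sinkQueue.pull().map {
    case Some(element) => element
    case None          => throw new NoSuchElementException("stream completed")
  }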

Well, you know what the offer() method returns if you take a look at its definition :) What you can do is create a Source.queue[(Promise[String], String)], write a helper function that pushes a pair to the stream via offer (making sure the offer doesn't fail because the queue might be full), complete the promise inside your stream, and use the promise's future to catch the completion event in external code.
I do that to throttle the rate to an external API used from multiple places in my project.
Here is how it looked in my project before Typesafe added Hub sources to Akka:
import scala.concurrent.{Future, Promise}
// execution context for the Future combinators below
import scala.concurrent.ExecutionContext.Implicits.global
import java.util.concurrent.ConcurrentLinkedDeque
import akka.stream.scaladsl.{Keep, Sink, Source}
import akka.stream.{OverflowStrategy, QueueOfferResult}
import scala.util.Success

private val queue = Source.queue[(Promise[String], String)](100, OverflowStrategy.backpressure)
  .toMat(Sink.foreach { case (p, param) =>
    p.complete(Success(param.reverse))
  })(Keep.left)
  .run()
private val futureDeque = new ConcurrentLinkedDeque[Future[String]]()

private def sendQueuedRequest(request: String): Future[String] = {
  val p = Promise[String]
  val offerFuture = queue.offer(p -> request)

  def addToQueue(future: Future[String]): Future[String] = {
    futureDeque.addLast(future)
    future.onComplete(_ => futureDeque.remove(future))
    future
  }

  offerFuture.flatMap {
    case QueueOfferResult.Enqueued =>
      addToQueue(p.future)
  }.recoverWith {
    case ex =>
      val first = futureDeque.pollFirst()
      if (first != null)
        addToQueue(first.flatMap(_ => sendQueuedRequest(request)))
      else
        sendQueuedRequest(request)
  }
}
I realize that this shared deque of futures may become a bottleneck and may grow indefinitely, but because the API calls in my project are made only from other Akka streams, which are backpressured, I never have more than a dozen items in futureDeque. Your situation may differ.
If you create MergeHub.source[(Promise[String], String)]() instead, you'll get a reusable sink. Then, every time you need to process an item, you create a complete graph and run it. In that case you won't need the hacky Java container to queue requests.
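A minimal sketch of that MergeHub variant, reusing the Promise-completing foreach sink from above (perProducerBufferSize is an arbitrary choice, and an implicit Materializer is assumed in scope):
import akka.NotUsed
import akka.stream.scaladsl.{MergeHub, Sink, Source}
import scala.concurrent.{Future, Promise}
import scala.util.Success

// Materializing once yields a reusable Sink; no shared deque of futures needed.
val requestSink: Sink[(Promise[String], String), NotUsed] =
  MergeHub.source[(Promise[String], String)](perProducerBufferSize = 16)
    .to(Sink.foreach { case (p, param) => p.complete(Success(param.reverse)) })
    .run()

// Each call runs a tiny graph that feeds one element into the hub.
def sendRequest(request: String): Future[String] = {
  val p = Promise[String]
  Source.single(p -> request).runWith(requestSink)
  p.future
}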

Related

How to read only Successful values from a Seq of Futures

I am learning akka/scala and am trying to read only those Futures that succeeded from a Seq[Future[Int]], but can't get anything to work.
I simulated an array of 10 Future[Int], some of which fail depending on the value FailThreshold takes (all fail for 10 and none fail for 0).
I then try to read them into an ArrayBuffer (I could not find a way to return an immutable structure with the values).
Also, there isn't a filter on Success/Failure, so I had to run an onComplete on each future and update the buffer as a side effect.
Even when FailThreshold=0 and the Seq has all Futures set to Success, the array buffer is sometimes empty, and different runs return arrays of different sizes.
I tried a few other suggestions from the web, like using Future.sequence on the list, but this throws an exception if any of the futures fail.
import akka.actor._
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._
import scala.util.{Failure, Success}
import concurrent.ExecutionContext.Implicits.global

case object AskNameMessage

implicit val timeout = Timeout(5, SECONDS)
val FailThreshold = 0

class HeyActor(num: Int) extends Actor {
  def receive = {
    case AskNameMessage => if (num < FailThreshold) { Thread.sleep(1000); sender ! num } else sender ! num
  }
}

class FLPActor extends Actor {
  def receive = {
    case t: IndexedSeq[Future[Int]] => {
      println(t)
      val b = scala.collection.mutable.ArrayBuffer.empty[Int]
      t.foldLeft(b) { case (bf, ft) =>
        // side-effecting onComplete: this is the problem discussed in the answer below
        ft.onComplete { case Success(v) => bf += ft.value.get.get }
        bf
      }
      println(b)
    }
  }
}

val system = ActorSystem("AskTest")
val flm = (0 to 10).map(n => system.actorOf(Props(new HeyActor(n)), name = "futureListMake" + n))
val flp = system.actorOf(Props(new FLPActor), name = "futureListProcessor")

// val delay = akka.pattern.after(500.millis, using = system.scheduler)(Future.failed(new IllegalArgumentException("DONE!")))
val delay = akka.pattern.after(500.millis, using = system.scheduler)(Future.successful(0))

val seqOfFtrs = (0 to 10).map(n => Future.firstCompletedOf(Seq(delay, flm(n) ? AskNameMessage)).mapTo[Int])

flp ! seqOfFtrs
The receive in FLPActor mostly gets
Vector(Future(Success(0)), Future(Success(1)), Future(Success(2)), Future(Success(3)), Future(Success(4)), Future(Success(5)), Future(Success(6)), Future(Success(7)), Future(Success(8)), Future(Success(9)), Future(Success(10)))
but the array buffer b has a varying number of values and is empty at times.
Can someone please point me to the gaps here: why would the array buffer have varying sizes even when all the Futures have resolved to Success, and what is the correct pattern to use when we want to ask different actors with a timeout and use only the asks that returned successfully for further processing?
Instead of directly sending the IndexedSeq[Future[Int]], you should transform it to a Future[IndexedSeq[Int]] and then pipe it to the next actor. You don't send Futures directly to an actor; you have to pipe them.
HeyActor can stay unchanged.
After
val seqOfFtrs = (0 to 10).map( (n) => Future.firstCompletedOf( Seq(delay, flm(n) ? AskNameMessage) ).mapTo[Int] )
do a recover, and use Future.sequence to turn it into one Future:
val oneFut = Future.sequence(
  seqOfFtrs.map(f => f.map(Some(_)).recover { case ex: Throwable => None })
).map(_.flatten)
If you don't understand the business with Some, None, and flatten, then make sure you understand the Option type. One way to remove values from a sequence is to map values in the sequence to Option (either Some or None) and then to flatten the sequence. The None values are removed and the Some values are unwrapped.
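A quick illustration of that map-to-Option-then-flatten trick with plain values:
val attempts: Seq[Option[Int]] = Seq(Some(1), None, Some(3))
attempts.flatten // Seq(1, 3): None entries removed, Some values unwrapped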
After you have transformed your data into a single Future, pipe it over to FLPActor:
oneFut pipeTo flp
FLPActor should be rewritten with the following receive function:
def receive = {
  case printme: IndexedSeq[Int] => println(printme)
}
In Akka, modifying some state in the main thread of your actor from a Future or the onComplete of a Future is a big no-no. In the worst case, it results in race conditions. Remember that each Future runs on its own thread, so running a Future inside an actor means you have concurrent work being done in different threads. Having the Future directly modify some state in your actor while the actor is also processing some state is a recipe for disaster. In Akka, you process all changes to state directly in the primary thread of execution of the main actor. If you have some work done in a Future and need to access that work from the main thread of an actor, you pipe it to that actor. The pipeTo pattern is functional, correct, and safe for accessing the finished computation of a Future.
To answer your question about why FLPActor is not printing out the IndexedSeq correctly: you are printing out the ArrayBuffer before your Futures have been completed. onComplete isn't the right idiom to use in this case, and you should avoid it in general as it isn't good functional style.
Don't forget to import akka.pattern.pipe for the pipeTo syntax.
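For completeness, here is a minimal self-contained sketch of the pipeTo pattern; the actor and values are illustrative, not from the question:
import akka.actor.{Actor, ActorSystem, Props}
import akka.pattern.pipe
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

class Printer extends Actor {
  // The completed value arrives as an ordinary message in the mailbox,
  // so it is processed on the actor's own thread: no shared-state races.
  def receive = {
    case xs: Vector[_] => println(xs)
  }
}

val system = ActorSystem("PipeDemo")
val printer = system.actorOf(Props(new Printer), "printer")

// The Future's result is sent to the actor once the Future completes.
Future.successful((1 to 5).toVector) pipeTo printer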

Facing Issue in using Scala + Slick + MySQL + Akka + Streams

Problem Statement: We are adding all incoming request parameters of a user for a particular module to a MySQL table as rows (this is a huge amount of data). Now we want to design a process that reads each record from this table, gets more information about the user's request by calling third-party APIs, and then puts the returned meta information into another table.
Current Attempts:
I am using Scala + Slick to do this. As the data to read is huge, I want to read this table one row at a time and process it. I tried using Slick + Akka Streams; however, I am getting a 'java.util.concurrent.RejectedExecutionException'.
Following is the rough logic that I have tried:
implicit val system = ActorSystem("Example")
import system.dispatcher
implicit val materializer = ActorMaterializer()
val future = db.stream(SomeQuery.result)
Source.fromPublisher(future).map(row => {
id = dataEnrichmentAPI.process(row)
}).runForeach(id => println("Processed row : "+ id))
dataEnrichmentAPI.process: This function makes a third-party REST call and also does some DB queries to get the required data. These DB queries are done using the 'db.run' method, and it waits until they finish (using Await),
e.g.,
def process(row: RequestRecord): Int = {
  // SomeQuery2 = check if data is already there in DB
  val retId: Seq[Int] = Await.result(db.run(SomeQuery2.result), Duration.Inf)
  if (retId.isEmpty) {
    val metaData = RestCall()
    // SomeQuery3 = store this metaData in DB
    Await.result(db.run(SomeQuery3.result), Duration.Inf)
    metaData.id
  } else {
    // SomeQuery4 = get the meta data id
    Await.result(db.run(SomeQuery4.result), Duration.Inf)
  }
}
I am getting this exception where I am using a blocking call to the DB. I don't think I can get rid of it, as the return value is required for the later flow to continue.
Is the blocking call the reason behind this exception?
What is the best practice to solve this kind of problem?
Thanks.
I don't know if this is your problem (too few details) but you should never block.
Speaking of best practices, use async stages instead.
This is more or less what your code would look like without using Await.result:
def process(row: RequestRecord): Future[Int] = {
  db.run(SomeQuery2.result) flatMap {
    case retId if retId.isEmpty =>
      // what is this? is it a sync call? if it's a REST call it should return a Future
      val metaData = RestCall()
      db.run(SomeQuery3.result).map(_ => metaData.id)
    case _ => db.run(SomeQuery4.result)
  }
}
Source.fromPublisher(db.stream(SomeQuery.result))
  .mapAsync(2)(dataEnrichmentAPI.process) // choose your own parallelism
  .runForeach(id => println("Processed row : " + id))
This way you will be handling backpressure and parallelism explicitly and idiomatically.
Try never to call Await.result in production code; compose futures using map, flatMap, and for comprehensions instead, as sketched below.
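The same logic from process, written as a for comprehension, as a sketch: it reuses the question's placeholder names (SomeQuery2/3/4, RestCall) and assumes an implicit ExecutionContext in scope.
def processFor(row: RequestRecord): Future[Int] =
  for {
    retId <- db.run(SomeQuery2.result) // check if data is already in the DB
    id    <- if (retId.isEmpty) {
               val metaData = RestCall()                       // fetch metadata
               db.run(SomeQuery3.result).map(_ => metaData.id) // store it, return its id
             } else {
               db.run(SomeQuery4.result)                       // fetch the existing metadata id
             }
  } yield id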

How to use an Akka Streams SourceQueue with PlayFramework

I would like to use a SourceQueue to push elements dynamically into an Akka Stream source.
A Play controller needs a Source to be able to stream a result using the chunked method.
As Play uses its own Akka Streams Sink under the hood, I can't materialize the source queue myself using a Sink, because the source would be consumed before it's used by the chunked method (except if I use the following hack).
I'm able to make it work if I pre-materialize the source queue using a reactive-streams publisher, but it's kind of a 'dirty hack':
def sourceQueueAction = Action {
  val (queue, pub) = Source.queue[String](10, OverflowStrategy.fail)
    .toMat(Sink.asPublisher(false))(Keep.both)
    .run()

  // stupid example to push elements dynamically
  val tick = Source.tick(0.second, 1.second, "tick")
  tick.runForeach(t => queue.offer(t))

  Ok.chunked(Source.fromPublisher(pub))
}
Is there a simpler way to use an Akka Streams SourceQueue with PlayFramework?
Thanks
The solution is to use mapMaterializedValue on the source to get a Future of its queue materialization:
def sourceQueueAction = Action {
  val (queueSource, futureQueue) = peekMatValue(Source.queue[String](10, OverflowStrategy.fail))

  futureQueue.map { queue =>
    Source.tick(0.second, 1.second, "tick")
      .runForeach(t => queue.offer(t))
  }

  Ok.chunked(queueSource)
}

// T is the source element type, here String
// M is the materialized value type, here a SourceQueue[String]
def peekMatValue[T, M](src: Source[T, M]): (Source[T, M], Future[M]) = {
  val p = Promise[M]
  val s = src.mapMaterializedValue { m =>
    p.trySuccess(m)
    m
  }
  (s, p.future)
}
Would like to share an insight I got today, though it may not be appropriate to your case with Play.
Instead of thinking of a Source to trigger, one can often turn the problem upside down and provide a Sink to the function that does the sourcing.
In such a case, the Sink is the non-materialized "recipe" stage, and we can use Source.queue and materialize it right away: we get the queue, and we get the flow that it runs.
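A minimal sketch of that inversion with made-up names (assumes an implicit Materializer in scope):
import akka.stream.OverflowStrategy
import akka.stream.scaladsl.{Sink, Source, SourceQueueWithComplete}

// The caller provides the Sink (the non-materialized "recipe");
// we attach Source.queue and materialize right away, getting the queue back.
def runIntoSink(sink: Sink[String, _]): SourceQueueWithComplete[String] =
  Source.queue[String](10, OverflowStrategy.fail).to(sink).run()

// Usage: push elements whenever they become available.
val queue = runIntoSink(Sink.foreach(println))
queue.offer("hello")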

How to create a Source that can receive elements later via a method call?

I would like to create a Source and later push elements on it, like in:
val src = ... // create the Source here
// and then, do something like this
pushElement(x1, src)
pushElement(x2, src)
What is the recommended way to do this?
Thanks!
There are three ways this can be achieved:
1. Post Materialization with SourceQueue
You can use Source.queue, which materializes the stream into a SourceQueue:
case class Weather(zipCode: String, temperature: Double, raining: Boolean)

val bufferSize = 100

// if the buffer fills up then this strategy drops the oldest elements
// upon the arrival of a new element.
val overflowStrategy = akka.stream.OverflowStrategy.dropHead

val queue = Source.queue[Weather](bufferSize, overflowStrategy)
  .filter(!_.raining)
  .to(Sink foreach println)
  .run() // in order to "keep" the queue Materialized value instead of the Sink's

queue offer Weather("02139", 32.0, true)
2. Post Materialization with Actor
There is a similar question and answer here, the gist being that you materialize the stream as an ActorRef and send messages to that ref:
val ref = Source.actorRef[Weather](Int.MaxValue, OverflowStrategy.fail)
  .filter(!_.raining)
  .to(Sink foreach println)
  .run() // in order to "keep" the ref Materialized value instead of the Sink's

ref ! Weather("02139", 32.0, true)
3. Pre Materialization with Actor
Similarly, you could explicitly create an Actor that contains a message buffer, use that Actor to create a Source, and then send that Actor messages as described in the answer here:
object WeatherForwarder {
  def props: Props = Props[WeatherForwarder]
}

// see the provided link for an example definition
class WeatherForwarder extends Actor {...}

val actorRef = actorSystem actorOf WeatherForwarder.props

// note the stream has not been instantiated yet
actorRef ! Weather("02139", 32.0, true)

// the stream already has 1 Weather value to process, which is sitting in the
// ActorRef's internal buffer
val stream = Source.fromPublisher(ActorPublisher[Weather](actorRef)).runWith{...}
Since Akka 2.5, Source has a preMaterialize method.
According to the documentation, this looks like the indicated way to do what you ask:
There are situations in which you require a Source materialized value before the Source gets hooked up to the rest of the graph. This is particularly useful in the case of “materialized value powered” Sources, like Source.queue, Source.actorRef or Source.maybe.
Below is an example of how this looks with a SourceQueue. Elements are pushed to the queue before and after materialization, as well as from within the Flow:
import akka.actor.ActorSystem
import akka.stream.scaladsl._
import akka.stream.{ActorMaterializer, OverflowStrategy}

implicit val system = ActorSystem("QuickStart")
implicit val materializer = ActorMaterializer()

val sourceDecl = Source.queue[String](bufferSize = 2, OverflowStrategy.backpressure)
val (sourceMat, source) = sourceDecl.preMaterialize()

// Adding an element before actual materialization
sourceMat.offer("pre materialization element")

val flow = Flow[String].map { e =>
  if (!e.contains("new")) {
    // Adding elements from within the flow
    sourceMat.offer("new element generated inside the flow")
  }
  s"Processing $e"
}

// Actually materializing with `run`
source.via(flow).to(Sink.foreach(println)).run()

// Adding an element after materialization
sourceMat.offer("post materialization element")
Output:
Processing pre materialization element
Processing post materialization element
Processing new element generated inside the flow
Processing new element generated inside the flow
After playing around and looking for a good solution to this, I came across this solution, which is clean, simple, and works both pre- and post-materialization.
https://stackoverflow.com/a/32553913/6791842
val (ref: ActorRef, publisher: Publisher[Int]) =
  Source.actorRef[Int](bufferSize = 1000, OverflowStrategy.fail)
    .toMat(Sink.asPublisher(true))(Keep.both)
    .run()

ref ! 1 // before
val source = Source.fromPublisher(publisher)
ref ! 2 // before
Thread.sleep(1000)
ref ! 3 // before

source.runForeach(println)

ref ! 4 // after
Thread.sleep(1000)
ref ! 5 // after
Output:
1
2
3
4
5

Simple Scala actor question

I'm sure this is a very simple question, but I'm embarrassed to say I can't get my head around it:
I have a list of values in Scala.
I would like to use actors to make some (external) calls with each value, in parallel.
I would like to wait until all values have been processed, and then proceed.
There are no shared values being modified.
Could anyone advise?
Thanks
There's an actor-based facility in Scala made precisely for this kind of problem: Futures. This problem would be solved like this:
// This assigns futures that will execute in parallel
// In the example, the computation is performed by the "process" function
val tasks = list map (value => scala.actors.Futures.future { process(value) })
// The result of a future may be extracted with the apply() method, which
// will block if the result is not ready.
// Since we do want to block until all results are ready, we can call apply()
// directly instead of using a method such as Futures.awaitAll()
val results = tasks map (future => future.apply())
There you go. Just that.
Create workers and ask them for futures using !!; then read off the results (which will be calculated in parallel and arrive as they're ready; you can then use them). For example:
object Example {
  import scala.actors._

  class Worker extends Actor {
    def act() { Actor.loop { react {
      case s: String => reply(s.length)
      case _ => exit()
    }}}
  }

  def main(args: Array[String]) {
    val arguments = args.toList
    val workers = arguments.map(_ => (new Worker).start)
    val futures = for ((w, a) <- workers zip arguments) yield w !! a
    val results = futures.map(f => f() match {
      case i: Int => i
      case _ => throw new Exception("Whoops--didn't expect to get that!")
    })
    println(results)
    workers.foreach(_ ! None)
  }
}
This does a very inexpensive computation--calculating the length of a string--but you can put something expensive there to make sure it really does happen in parallel (the last thing that case in the act block does should be to reply with the answer). Note that we also include a case for the worker to shut itself down, and when we're all done, we tell the workers to shut down. (In this case, any non-string shuts down a worker.)
And we can try this out to make sure it works:
scala> Example.main(Array("This","is","a","test"))
List(4, 2, 1, 4)