akka dataflow and side effects - scala

I'm working with akka dataflow and I'd like to know if there is a way to cause a particular block of code to wait for the completion of a future, without explicitly using the value of that future.
The actual use case is that I have a file and I want the file to be deleted when a particular future completes, but not before. Here is a rough example. First imagine I have this service:
trait ASync {
def pull: Future[File]
def process(input : File): Future[File]
def push(input : File): Future[URI]
}
And I have a workflow I want to run in a non-blocking way:
val uriFuture = flow {
val pulledFile = async.pull(uri)
val processedile = async.process(pulledFile())
val storedUri = async.push(processedFile())
// I'd like the following line executed only after storedUri is completed,
// not as soon as pulled file is ready.
pulledFile().delete()
storedUri()
}

You could try something like this:
val uriFuture = flow {
val pulledFile = async.pull(uri)
val processedile = async.process(pulledFile())
val storedUri = for(uri <- async.push(processedFile())) yield {
pulledFile().delete()
uri
}
storedUri()
}
In this example, pulledFile.delete will only be called if the Future from push succeeds. If it fails, delete will not be called. The result of the storedUri future will still be the result of the call to push.
Or another way would be:
val uriFuture = flow {
val pulledFile = async.pull(uri)
val processedile = async.process(pulledFile())
val storedUri = async.push(processedFile()) andThen{
case whatever => pulledFile().delete()
}
storedUri()
}
The difference here is that delete will be called regardless of if push succeeds or fails. The result of storedUri still will be the result of the call to push.

You can use callbacks for non-blocking workflow:
future onSuccess {
case _ => file.delete() //Deal with cases obviously...
}
Source: http://doc.akka.io/docs/akka/snapshot/scala/futures.html
Alternatively, you can block with Await.result:
val result = Await.result(future, timeout.duration).asInstanceOf[String]
The latter is generally used when you NEED to block - eg in test cases - while non blocking is more performant as you don't park a thread to spin up another thread only to resume the other thread again - that's slower than an asynchronous activity because of the resource management overhead.
The typesafe staff are calling it "Reactive". That's a little bit of a buzzword. I would laugh if you used it in the workplace.

Related

For Comprehension of Futures - Kicking off new thread inside comprehension and disregarding result

I'm trying to use a for comprehension to both run some futures in order and merged results, but also kick off a separate thread after one of those futures completes and not care about the result (basically used to fire some logging info)
I've played around a bit with this with some thread sleeps and it looks like whatever i'm throwing inside the for block will end up blocking the thread.
private def testFunction(): EitherT[Future, Error, Response] =
for {
firstRes <- EitherT(client.getFirst())
secondRes <- EitherT(client.getSecond())
// Future i want to run on a separate async thread outside the comprehension
_ = runSomeLogging(secondRes)
thirdRes <- EitherT(client.getThird(secondRes.value))
} yield thirdRes
def runSomeLogging(): Future[Either[Error, Response]] =
Thread.sleep(10000)
Future.successful(Right(Response("123")))
}
So this above code will wait the 10 seconds before returning the thirdRes result. My hope was to kick off the runSomeLogging function on a separate thread after the secondRes runs. I thought the usage of = rather than <- would cause that, however it doesn't.
The way I am able to get this to work is below. Basically I run my second future outside of the comprehension and use .onComplete on the previous future to only run my logging if certain conditions were meant from the above comprehension. I only want to run this logging function if the secondRes function is successful in my example here.
private def runSomeLogging(response: SecondRes) =
Thread.sleep(10000)
response.value.onComplete {
case Success(either) =>
either.fold(
_ => { },
response => { logThing() }
)
case _ =>
}
private def testFunction(): EitherT[Future, Error, Response] =
val res = for {
firstRes <- EitherT(client.getFirst())
secondRes <- EitherT(client.getSecond())
thirdRes <- EitherT(client.getThird(secondRes.value))
} yield thirdRes
runSomeLogging(res)
return res
This second example works fine and does what I want, it doesn't block the for comprehension for 10 seconds from returning. However, because there are dependencies of this running for certain pieces of the comprehension, but not all of them, I was hoping there was a way to kick off the job from within the for comprehension itself but let it run on its own thread and not block the comprehension from completing.
You need a function that starts the Future but doesn't return it, so the for-comprehension can move on (since Future's map/flatMap functions won't continue to the next step until the current Future resolves). To accomplish a "start and forget", you need to use a function that returns immediately, or a Future that resolves immediately.
// this function will return immediately
def runSomeLogging(res: SomeResult): Unit = {
// since startLoggingFuture uses Future.apply, calling it will start the Future,
// but we ignore the actual Future by returning Unit instead
startLoggingFuture(res)
}
// this function returns a future that takes 10 seconds to resolve
private def startLoggingFuture(res: SomeResult): Future[Unit] = Future {
// note: please don't actually do Thread.sleep in your Future's thread pool
Thread.sleep(10000)
logger.info(s"Got result $res")
}
Then you could put e.g.
_ = runSomeLogging(res)
or
_ <- Future { runSomeLogging(res) }
in your for-comprehension.
Note, Cats-Effect and Monix have a nice abstraction for "start but ignore result", with io.start.void and task.startAndForget respectively. If you were using IO or Task instead of Future, you could use .start.void or .startAndForget on the logging task.

Await for a Sequence of Futures with timeout without failing on TimeoutException

I have a sequence of scala Futures of same type.
I want, after some limited time, to get a result for the entire sequence while some futures may have succeeded, some may have failed and some haven't completed yet, the non completed futures should be considered failed.
I don't want to use Await each future sequentially.
I did look at this question: Scala waiting for sequence of futures
and try to use the solution from there, namely:
private def lift[T](futures: Seq[Future[T]])(implicit ex: ExecutionContext) =
futures.map(_.map { Success(_) }.recover { case t => Failure(t) })
def waitAll[T](futures: Seq[Future[T]])(implicit ex: ExecutionContext) =
Future.sequence(lift(futures))
futures: Seq[Future[MyObject]] = ...
val segments = Await.result(waitAll(futures), waitTimeoutMillis millis)
but I'm still getting a TimeoutException, I guess because some of the futures haven't completed yet.
and that answer also states,
Now Future.sequence(lifted) will be completed when every future is completed, and will represent successes and failures using Try.
But I want my Future to be completed after the timeout has passed, not when every future in the sequence has completed. What else can I do?
If I used raw Future (rather than some IO monad which has this functionality build-in, or without some Akka utils for exactly that) I would hack together utility like:
// make each separate future timeout
object FutureTimeout {
// separate EC for waiting
private val timeoutEC: ExecutorContext = ...
private def timeout[T](delay: Long): Future[T] = Future {
blocking {
Thread.sleep(delay)
}
throw new Exception("Timeout")
}(timeoutEC)
def apply[T](fut: Future[T], delat: Long)(
implicit ec: ExecutionContext
): Future[T] = Future.firstCompletedOf(Seq(
fut,
timeout(delay)
))
}
and then
Future.sequence(
futures
.map(FutureTimeout(_, delay))
.map(Success(_))
.recover { case e => Failure(e) }
)
Since each future would terminate at most after delay we would be able to collect them into one result right after that.
You have to remember though that no matter how would you trigger a timeout you would have no guarantee that the timeouted Future stops executing. It could run on and on on some thread somewhere, it's just that you wouldn't wait for the result. firstCompletedOf just makes this race more explicit.
Some other utilities (like e.g. Cats Effect IO) allow you to cancel computations (which is used in e.g. races like this one) but you still have to remember that JVM cannot arbitrarily "kill" a running thread, so that cancellation would happen after one stage of computation is completed and before the next one is started (so e.g. between .maps or .flatMaps).
If you aren't afraid of adding external deps there are other (and more reliable, as Thread.sleep is just a temporary ugly hack) ways of timing out a Future, like Akka utils. See also other questions like this.
Here is solution using monix
import monix.eval.Task
import monix.execution.Scheduler
val timeoutScheduler = Scheduler.singleThread("timeout") //it's safe to use single thread here because timeout tasks are very fast
def sequenceDiscardTimeouts[T](tasks: Task[T]*): Task[Seq[T]] = {
Task
.parSequence(
tasks
.map(t =>
t.map(Success.apply) // Map to success so we can collect the value
.timeout(500.millis)
.executeOn(timeoutScheduler) //This is needed to run timesouts in dedicated scheduler that won't be blocked by "blocking"/io work if you have any
.onErrorRecoverWith { ex =>
println("timed-out")
Task.pure(Failure(ex)) //It's assumed that any error is a timeout. It's possible to "catch" just timeout exception here
}
)
)
.map { res =>
res.collect { case Success(r) => r }
}
}
Testing code
implicit val mainScheduler = Scheduler.fixedPool(name = "main", poolSize = 10)
def slowTask(msg: String) = {
Task.sleep(Random.nextLong(1000).millis) //Sleep here to emulate a slow task
.map { _ =>
msg
}
}
val app = sequenceDiscardTimeouts(
slowTask("1"),
slowTask("2"),
slowTask("3"),
slowTask("4"),
slowTask("5"),
slowTask("6")
)
val started: Long = System.currentTimeMillis()
app.runSyncUnsafe().foreach(println)
println(s"Done in ${System.currentTimeMillis() - started} millis")
This will print an output different for each run but it should look like following
timed-out
timed-out
timed-out
3
4
5
Done in 564 millis
Please note the usage of two separate schedulers. This is to ensure that timeouts will fire even if the main scheduler is busy with business logic. You can test it by reducing poolSize for main scheduler.

Akka Stream return object from Sink

I've got a SourceQueue. When I offer an element to this I want it to pass through the Stream and when it reaches the Sink have the output returned to the code that offered this element (similar as Sink.head returns an element to the RunnableGraph.run() call).
How do I achieve this? A simple example of my problem would be:
val source = Source.queue[String](100, OverflowStrategy.fail)
val flow = Flow[String].map(element => s"Modified $element")
val sink = Sink.ReturnTheStringSomehow
val graph = source.via(flow).to(sink).run()
val x = graph.offer("foo")
println(x) // Output should be "Modified foo"
val y = graph.offer("bar")
println(y) // Output should be "Modified bar"
val z = graph.offer("baz")
println(z) // Output should be "Modified baz"
Edit: For the example I have given in this question Vladimir Matveev provided the best answer. However, it should be noted that this solution only works if the elements are going into the sink in the same order they were offered to the source. If this cannot be guaranteed the order of the elements in the sink may differ and the outcome might be different from what is expected.
I believe it is simpler to use the already existing primitive for pulling values from a stream, called Sink.queue. Here is an example:
val source = Source.queue[String](128, OverflowStrategy.fail)
val flow = Flow[String].map(element => s"Modified $element")
val sink = Sink.queue[String]().withAttributes(Attributes.inputBuffer(1, 1))
val (sourceQueue, sinkQueue) = source.via(flow).toMat(sink)(Keep.both).run()
def getNext: String = Await.result(sinkQueue.pull(), 1.second).get
sourceQueue.offer("foo")
println(getNext)
sourceQueue.offer("bar")
println(getNext)
sourceQueue.offer("baz")
println(getNext)
It does exactly what you want.
Note that setting the inputBuffer attribute for the queue sink may or may not be important for your use case - if you don't set it, the buffer will be zero-sized and the data won't flow through the stream until you invoke the pull() method on the sink.
sinkQueue.pull() yields a Future[Option[T]], which will be completed successfully with Some if the sink receives an element or with a failure if the stream fails. If the stream completes normally, it will be completed with None. In this particular example I'm ignoring this by using Option.get but you would probably want to add custom logic to handle this case.
Well, you know what offer() method returns if you take a look at its definition :) What you can do is to create Source.queue[(Promise[String], String)], create helper function that pushes pair to stream via offer, make sure offer doesn't fail because queue might be full, then complete promise inside your stream and use future of the promise to catch completion event in external code.
I do that to throttle rate to external API used from multiple places of my project.
Here is how it looked in my project before Typesafe added Hub sources to akka
import scala.concurrent.Promise
import scala.concurrent.Future
import java.util.concurrent.ConcurrentLinkedDeque
import akka.stream.scaladsl.{Keep, Sink, Source}
import akka.stream.{OverflowStrategy, QueueOfferResult}
import scala.util.Success
private val queue = Source.queue[(Promise[String], String)](100, OverflowStrategy.backpressure)
.toMat(Sink.foreach({ case (p, param) =>
p.complete(Success(param.reverse))
}))(Keep.left)
.run
private val futureDeque = new ConcurrentLinkedDeque[Future[String]]()
private def sendQueuedRequest(request: String): Future[String] = {
val p = Promise[String]
val offerFuture = queue.offer(p -> request)
def addToQueue(future: Future[String]): Future[String] = {
futureDeque.addLast(future)
future.onComplete(_ => futureDeque.remove(future))
future
}
offerFuture.flatMap {
case QueueOfferResult.Enqueued =>
addToQueue(p.future)
}.recoverWith {
case ex =>
val first = futureDeque.pollFirst()
if (first != null)
addToQueue(first.flatMap(_ => sendQueuedRequest(request)))
else
sendQueuedRequest(request)
}
}
I realize that blocking synchronized queue may be bottleneck and may grow indefinitely but because API calls in my project are made only from other akka streams which are backpressured I never have more than dozen items in futureDeque. Your situation may differ.
If you create MergeHub.source[(Promise[String], String)]() instead you'll get reusable sink. Thus every time you need to process item you'll create complete graph and run it. In that case you won't need hacky java container to queue requests.

How can I use and return Source queue to caller without materializing it?

I'm trying to use new Akka streams and wonder how I can use and return Source queue to caller without materializing it in my code ?
Imagine we have library that makes number of async calls and returns results via Source. Function looks like this
def findArticlesByTitle(text: String): Source[String, SourceQueue[String]] = {
val source = Source.queue[String](100, backpressure)
source.mapMaterializedValue { case queue =>
val url = s"http://.....&term=$text"
httpclient.get(url).map(httpResponseToSprayJson[SearchResponse]).map { v =>
v.idlist.foreach { id =>
queue.offer(id)
}
queue.complete()
}
}
source
}
and caller might use it like this
// There is implicit ActorMaterializer somewhere
val stream = plugin.findArticlesByTitle(title)
val results = stream.runFold(List[String]())((result, article) => article :: result)
When I run this code within mapMaterializedValue is never executed.
I can't understand why I don't have access to instance of SourceQueue if it should be up to caller to decide how to materialize the source.
How should I implement this ?
In your code example you're returning source instead of the return value of source.mapMaterializedValue (the method call doesn't mutate the Source object).

React for futures

I am trying to use a divide-and-conquer (aka fork/join) approach for a number crunching problem. Here is the code:
import scala.actors.Futures.future
private def compute( input: Input ):Result = {
if( pairs.size < SIZE_LIMIT ) {
computeSequential()
} else {
val (input1,input2) = input.split
val f1 = future( compute(input1) )
val f2 = future( compute(input2) )
val result1 = f1()
val result2 = f2()
merge(result1,result2)
}
}
It runs (with a nice speed-up) but the the future apply method seems to block a thread and the thread pool increases tremendously. And when too many threads are created, the computations is stucked.
Is there a kind of react method for futures which releases the thread ? Or any other way to achieve that behavior ?
EDIT: I am using scala 2.8.0.final
Don't claim (apply) your Futures, since this forces them to block and wait for an answer; as you've seen this can lead to deadlocks. Instead, use them monadically to tell them what to do when they complete. Instead of:
val result1 = f1()
val result2 = f2()
merge(result1,result2)
Try this:
for {
result1 <- f1
result2 <- f2
} yield merge(result1, result2)
The result of this will be a Responder[Result] (essentially a Future[Result]) containing the merged results; you can do something effectful with this final value using respond() or foreach(), or you can map() or flatMap() it to another Responder[T]. No blocking necessary, just keep scheduling computations for the future!
Edit 1:
Ok, the signature of the compute function is going to have to change to Responder[Result] now, so how does that affect the recursive calls? Let's try this:
private def compute( input: Input ):Responder[Result] = {
if( pairs.size < SIZE_LIMIT ) {
future(computeSequential())
} else {
val (input1,input2) = input.split
for {
result1 <- compute(input1)
result2 <- compute(input2)
} yield merge(result1, result2)
}
}
Now you no longer need to wrap the calls to compute with future(...) because they're already returning Responder (a superclass of Future).
Edit 2:
One upshot of using this continuation-passing style is that your top-level code--whatever calls compute originally--doesn't block at all any more. If it's being called from main(), and that's all the program does, this will be a problem, because now it will just spawn a bunch of futures and then immediately shut down, having finished everything it was told to do. What you need to do is block on all these futures, but only once, at the top level, and only on the results of all the computations, not any intermediate ones.
Unfortunately, this Responder thing that's being returned by compute() no longer has a blocking apply() method like the Future did. I'm not sure why flatMapping Futures produces a generic Responder instead of a Future; this seems like an API mistake. But in any case, you should be able to make your own:
def claim[A](r:Responder[A]):A = {
import java.util.concurrent.ArrayBlockingQueue
import scala.actors.Actor.actor
val q = new ArrayBlockingQueue[A](1)
// uses of 'respond' need to be wrapped in an actor or future block
actor { r.respond(a => q.put(a)) }
return q.take
}
So now you can create a blocking call to compute in your main method like so:
val finalResult = claim(compute(input))