Forcing an AsyncStream[A] into a Seq[A] in scala

Forcing an AsyncStream[A] into a Seq[A] in scala - scala

I am specifically using twitter's AsyncStream and I need to take the results of concurrent processing, and make it into a Seq, but the only way I've managed to get working is horrendous. This feels like it should be a one liner, but all my attempts with Await and force have either hung or not processed the work.
Here's what I have working - what's a more idiomatic way of doing this?
def processWork(work: AsyncStream[Work]): Seq[Result] = {
// TODO: make less stupid
val resultStream = work.flatMap { processWork }
var results : Seq[Result] = Nil
resultStream.foreach {
result => {
results = results :+ result
}
}
results
}

Like #MattFowler pointed out - you can force the stream and wait until it's complete using:
Await.result(resultStream.toSeq, 1.second)
toSeq will start realizing the stream and return a Future[Seq[A]] that completes once all the element are resolved, as documentated here.
You can then block until the future completes by using Await.result.
Make sure your stream is finite! Calling Await.result(resultStream.toSeq, 1.second) would hang forever.

Related

How to handle asynchronous API response in scala

I have an API that I need to query in scala. API returns a code that would be equal to 1 when results are ready.
I thought about an until loop to handle as the following:
var code= -1
while(code!=1){
var response = parse(Http(URL).asString.body)
code = response.get("code").get.asInstanceOf[BigInt].toInt
}
println(response)
But this code returns:
error: not found: value response
So I thought about doing the following:
var code = -1
var res = null.asInstanceOf[Map[String, Any]]
while(code!=1){
var response = parse(Http(URL).asString.body)
code = response.get("code").get.asInstanceOf[BigInt].toInt
res = response
}
println(res)
And it works. But I would like to know if this is really the best scala-friendly way to do so ?
How can I properly use a variable that outside of an until loop ?

When you say API, do you mean you use a http api and you're using a http library in scala, or do you mean there's some class/api written up in scala? If you have to keep checking then you have to keep checking I suppose.
If you're using a Scala framework like Akka or Play, they'd have solutions to asyncrhonously poll or schedule jobs in the background as a part of their solutions which you can read about.
If you're writing a Scala script, then from a design perspective I would either run the script every 1 minute and instead of having the while loop I'd just quit until code = 1. Otherwise I'd essentially do what you've done.
Another library that could help a scala script might be fs2 or ZIO which can allow you to setup tasks that periodically poll.
This appears to be a very open question about designing apps which do polling. A specific answer is hard to give.

You can just use simple recursion:
def runUntil[A](block: => A)(cond: A => Boolean): A = {
#annotation.tailrec
def loop(current: A): A =
if (cond(current)) current
else loop(current = block)
loop(current = block)
}
// Which can be used like:
val response = runUntil {
parse(Http(URL).asString.body)
} { res =>
res.getCode == 1
}
println(response)
An, if your real code uses some kind of effect type like IO or Future
// Actually cats already provides this, is called iterateUntil
def runUntil[A](program: IO[A])(cond: A => Boolean): IO[A] = {
def loop: IO[A] =
program.flatMap { a =>
if (cond(a)) IO.pure(a)
else loop
}
loop
}
// Used like:
val request = IO {
parse(Http(URL).asString.body)
}
val response = runUntil(request) { res =>
res.getCode == 1
}
response.flatMap(IO.println)
Note, for Future you would need to use () => Future instead to be able to re-execute the operation.
You can see the code running here.

Mapping Iterator into Iterator[Future] is not behaving as expecting

Using Scala 2.13.0:
implicit val ec = ExecutionContext.global
val arr = (0 until 20).toIterator
.map { x =>
Thread.sleep(500);
println(x);
x
}
val fss = arr.map { slowX =>
Future { blocking { slowX } }
}
Await.result(Future.sequence(fss), Inf)
problem
arr is an iterator where each item needs 500ms processing time. We map the iterator with Future { blocking { ... }} with the purpose of making the processing parallel (using the global execution context). Finally we run Future.sequence
to consume the iterator.
Given the definition of Future.apply[T](body: =>T) and blocking[T](body: =>T), body is passed lazily, which means that body will be processed in the Future. If we inject that in the definition of Iterator.map, we get def next() = Future{blocking(self.next())}, so each item of the iterator should be processed in the Future.
But when I try this example however, I can see that the iterator is consumed sequentially, which is not what is expected!
Is that a Scala bug?? Or am I missing something?

No it's not a bug, because:
val arr = (0 until 20).toIterator
// this map invokes first and executed sequentially, because it executes in same thread.
.map { x =>
Thread.sleep(500);
println(x);
x
}
// This go sequentially because upstream map executed sequentially in same thread.
// So, "Future { blocking { slowX } }" can be replaced with "Future.successfull(slowX)"
// because no computation executed
val fss = arr.map { slowX =>
Future { blocking { slowX } }
}
If you want perform completely asynchronously, you can do something like:
def heavyCalculation(x: Int) = {
Thread.sleep(500);
println(x);
x
}
val result = Future.traverse((0 until 20).toList) { x =>
Future(blocking(heavyCalculation(x)))
}
Await.result(result, 1 minute)
Working Scatie example: https://scastie.scala-lang.org/3v06NpypRHKYkqBgzaeVXg

First, this is not a proper benchmark, you actually haven't show formal proof that this is sequential and not parallel (although is "obvious" from the source code that it isn't).
Second, and Iterator of Futures is probably a bad idea; at this point, it may make sense to look into a streaming solution like Akka-Streams, fs2, Monix or ZIO.
Third, what is even the point of having a bunch of blocking futures? you aren't actually winning too much.
Fourth, the problem is that the second map is not passing the block of the first map, just the result. So, you actually did the sleep before creating the Future.
Fifth, you probably want to do this instead.
val result = Future.traverse(data) { elem =>
Future {
blocking {
// Process elem here.
}
}
}
Await.result(result, Inf)

The other answers were pointing in the right direction, but the formal answer is the following: the signature of Iterator.map(f: A => B) tells us that A that A is computed before f is applied to it (because it is not => A). Therefore, next() is computed in the main thread.

How to consume a list of futures as soon as one completes in Scala

I have a bunch of Futures that can complete at extremely different times, something like this:
val results = task.getAssignedFiles map {
file: File => Future[Result] {
<heavy computation>
}
}
Now I want to iterate over results as soon as one of the input futures completes, basically poll all the futures in results until one completes, do some processing, and continue until all of them are done. Something like:
while (!results.allCompleted) {
one = results.firstCompletedFuture
process(one)
}

You're looking for firstCompletedOf.

There's no need to "loop" over the list of Futures to take action on them when they complete. The approach is to chain
the needed sequence of computations using combinators and wait until they all terminate.
In terms of the code above that could be written as:
val results:List[Future[Result] = task.getAssignedFiles map {
file: File => Future[Result] {
<heavy computation>
}
}
val processedResults:List[Future[ProcessedResult]] = results.map(result => process(result))
val finalResults:Future[List[ProcessedResult]] = Future.sequence(processedResults)
Note that all futures are running from the moment they are defined. sequence will finalize at the moment the last of them is done.

scala's for yield comprehension used with Future. How to wait until future has returned?

I have a function which provides a Context:
def buildContext(s:String)(request:RequestHeader):Future[Granite.Context] = {
.... // returns a Future[Granite.Context]
}
I then have another function which uses a Context to return an Option[Library.Document]:
def getDocument(tag: String):Option[Library.Document] = {
val fakeRequest = play.api.test.FakeRequest().withHeaders(CONTENT_TYPE -> "application/json")
val context = buildContext(tag)(fakeRequest)
val maybeDoc = context.getDocument //getDocument is defined on Granite.Context to return an Option[Library.Document]
}
How would this code take into account if the Future has returned or not? I have seen for/yield used to wait for the return but I always assumed that a for/yield just flatmaps things together and has nothing really to do with waiting for Futures to return. I'm kinda stuck here and don't really no the correct question to ask!

The other two answers are misleading. A for yield in Scala is a compiler primitive that gets transformed into map or flatMap chains. Do not use Await if you can avoid it, it's not a simple issue.
You are introducing blocking behaviour and you have yet to realise the systemic damage you are doing when blocking.
When it comes to Future, map and flatMap do different things:
map
is executed when the future completes. It's an asynchronous way to do a type safe mapping.
val f: Future[A] = someFutureProducer
def convertAToB(a: A): B = {..}
f map { a => convertAToB(a) }
flatMap
is what you use to chain things:
someFuture flatMap {
_ => {
someOtherFuture
}
}
The equivalent of the above is:
for {
result1 <- someFuture
result2 <- someOtherFuture
} yield result2
In Play you would use Async to handle the above:
Async {
someFuture.map(i => Ok("Got result: " + i))
}
Update
I misunderstood your usage of Play. Still, it doesn't change anything. You can still make your logic asynchronous.
someFuture onComplete {
case Success(result) => // doSomething
case Failure(err) => // log the error etc
}
The main difference when thinking asynchronously is that you always have to map and flatMap and do everything else inside Futures to get things done. The performance gain is massive.
The bigger your app, the bigger the gain.

When using a for-comprehension on a Future, you're not waiting for it to finish, you're just saying: when it is finished, use it like this, and For-comprehension returns another Future in this case.
If you want to wait for a future to finish, you should use the Await as follows:
val resultContext = Await.result(context , timeout.duration)
Then run the getDocument method on it as such:
val maybeDoc = resultContext.getDocument
EDIT
The usual way to work with Futures however is to wait until the last moment before you Await. As pointed out by another answer here, Play Framework does the same thing by allowing you to return Future[Result]. So, a good way to do things would be to only use for-comprehensions and make your methods return Futures, etc, until the last moment when you want to finally return your result.

You can use scala.concurrent.Await for that:
import scala.concurrent.duration._
import scala.concurrent.Await
def getDocument(tag: String):Option[Library.Document] = {
val fakeRequest = play.api.test.FakeRequest().withHeaders(CONTENT_TYPE -> "application/json")
val context = Await.result(buildContext(tag)(fakeRequest), 42.seconds)
val maybeDoc = context.getDocument
}
But Await will block thread while future is not completed, so would be better either make buildContext a synchronous operation returning Granite.Context, or make getDocument async too, returning Future[Option[Library.Document]].

Once you are in a future you must stay in the future or you must wait until the future arrives.
Waiting is usually a bad idea because it blocks your execution, so you should work in the future.
Basically you should change your getDocument method to return a Future to something like getDocument(tag: String):Future[Option[Library.Document]]
Then using map ro flatMap, you chain your future calls:
return buildContext(tag)(fakeRequest).map(_.getDocument)
If buildContext fails, map will wrap the Failure
Then call
getDocument("blah").onComplete {
case Success(optionalDoc) => ...
case Failure(e) =>...
}

akka dataflow and side effects

I'm working with akka dataflow and I'd like to know if there is a way to cause a particular block of code to wait for the completion of a future, without explicitly using the value of that future.
The actual use case is that I have a file and I want the file to be deleted when a particular future completes, but not before. Here is a rough example. First imagine I have this service:
trait ASync {
def pull: Future[File]
def process(input : File): Future[File]
def push(input : File): Future[URI]
}
And I have a workflow I want to run in a non-blocking way:
val uriFuture = flow {
val pulledFile = async.pull(uri)
val processedile = async.process(pulledFile())
val storedUri = async.push(processedFile())
// I'd like the following line executed only after storedUri is completed,
// not as soon as pulled file is ready.
pulledFile().delete()
storedUri()
}

You could try something like this:
val uriFuture = flow {
val pulledFile = async.pull(uri)
val processedile = async.process(pulledFile())
val storedUri = for(uri <- async.push(processedFile())) yield {
pulledFile().delete()
uri
}
storedUri()
}
In this example, pulledFile.delete will only be called if the Future from push succeeds. If it fails, delete will not be called. The result of the storedUri future will still be the result of the call to push.
Or another way would be:
val uriFuture = flow {
val pulledFile = async.pull(uri)
val processedile = async.process(pulledFile())
val storedUri = async.push(processedFile()) andThen{
case whatever => pulledFile().delete()
}
storedUri()
}
The difference here is that delete will be called regardless of if push succeeds or fails. The result of storedUri still will be the result of the call to push.

You can use callbacks for non-blocking workflow:
future onSuccess {
case _ => file.delete() //Deal with cases obviously...
}
Source: http://doc.akka.io/docs/akka/snapshot/scala/futures.html
Alternatively, you can block with Await.result:
val result = Await.result(future, timeout.duration).asInstanceOf[String]
The latter is generally used when you NEED to block - eg in test cases - while non blocking is more performant as you don't park a thread to spin up another thread only to resume the other thread again - that's slower than an asynchronous activity because of the resource management overhead.
The typesafe staff are calling it "Reactive". That's a little bit of a buzzword. I would laugh if you used it in the workplace.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Forcing an AsyncStream[A] into a Seq[A] in scala - scala

Related

How to handle asynchronous API response in scala

Mapping Iterator into Iterator[Future] is not behaving as expecting

How to consume a list of futures as soon as one completes in Scala

scala's for yield comprehension used with Future. How to wait until future has returned?

akka dataflow and side effects

Categories

Resources