What are some best practices to mix async libraries with sync code in scala - scala

I'm working on a scala code where a 3rd party library returns a Future[Boolean] object while I need to consume this future object in my scala code which is fully written in a synchronous manner.
Currently, I'm doing Await.result on 3rd party lib operation to ensure it returns just boolean. Is there a better way to handle this, my scala code needs a boolean value for further operation?

As Luis noted in the comments, in general there's no alternative to Awaiting on the Future.
That said, you may have some choice about where to Await.
For instance, if you have code like
val result = Await.result(someFuture, Duration.Inf)
f(result)
It may be more useful to run f in Future land with
Await.result(someFuture.map(f), Duration.Inf)
If f happens to block, then it may be worth either wrapping f in blocking or explicitly using an ExecutionContext which will handle a lot of its threads being blocked (e.g. one that can have more threads than cores) for the map.
In general, you'll want to move Awaits to the outermost edge of your code as you can, even shifting edges if you can.

Related

How to convert `fs2.Stream[IO, T]` to `Iterator[T]` in Scala

Need to fill in the methods next and hasNext and preserve laziness
new Iterator[T] {
val stream: fs2.Stream[IO, T] = ...
def next(): T = ???
def hasNext(): Boolean = ???
}
But cannot figure out how an earth to do this from a fs2.Stream? All the methods on a Stream (or on the "compiled" thing) are fairly useless.
If this is simply impossible to do in a reasonable amount of code, then that itself is a satisfactory answer and we will just rip out fs2.Stream from the codebase - just want to check first!
fs2.Stream, while similar in concept to Iterator, cannot be converted to one while preserving laziness. I'll try to elaborate on why...
Both represent a pull-based series of items, but the way in which they represent that series and implement the laziness differs too much.
As you already know, Iterator represents its pull in terms of the next() and hasNext methods, both of which are synchronous and blocking. To consume the iterator and return a value, you can directly call those methods e.g. in a loop, or use one of its many convenience methods.
fs2.Stream supports two capabilities that make it incompatible with that interface:
cats.effect.Resource can be included in the construction of a Stream. For example, you could construct a fs2.Stream[IO, Byte] representing the contents of a file. When consuming that stream, even if you abort early or do some strange flatMap, the underlying Resource is honored and your file handle is guaranteed to be closed. If you were trying to do the same thing with iterator, the "abort early" case would pose problems, forcing you to do something like Iterator[Byte] with Closeable and the caller would have to make sure to .close() it, or some other pattern.
Evaluation of "effects". In this context, effects are types like IO or Future, where the process of obtaining the value may perform some possibly-asynchronous action, and may perform side-effects. Asynchrony poses a problem when trying to force the process into a synchronous interface, since it forces you to block your current thread to wait for the asynchronous answer, which can cause deadlocks if you aren't careful. Libraries like cats-effect strongly discourage you from calling methods like unsafeRunSync.
fs2.Stream does allow for some special cases that prevent the inclusion of Resource and Effects, via its Pure type alias which you can use in place of IO. That gets you access to Stream.PureOps, but that only gets you methods that consume the whole stream by building a collection; the laziness you want to preserve would be lost.
Side note: you can convert an Iterator to a Stream.
The only way to "convert" a Stream to an Iterator is to consume it to some collection type via e.g. .compile.toList, which would get you an IO[List[T]], then .map(_.iterator) that to get an IO[Iterator[T]]. But ultimately that doesn't fit what you're asking for since it forces you to consume the stream to a buffer, breaking laziness.
#Dima mentioned the "XY Problem", which was poorly-received since they didn't really elaborate (initially) on the incompatibility, but they're right. It would be helpful to know why you're trying to make a Stream-to-Iterator conversion, in case there's some other approach that would serve your overall goal instead.

What is Future and Promise in Scala? [duplicate]

I am trying to get my head around Scala's promise and future constructs.
I've been reading Futures and Promises in Scala Documentation and am a bit confused as I've got a feeling that the concepts of promises and futures are mixed up.
In my understanding a promise is a container that we could populate
value in a later point. And future is some sort of an asynchronous
operation that would complete in a different execution path.
In Scala we can obtain a result using the attached callbacks to future.
Where I'm lost is how promise has a future?
I have read about these concepts in Clojure too, assuming that promise and future have some generic common concept, but it seems like I was wrong.
A promise p completes the future returned by p.future. This future is
specific to the promise p. Depending on the implementation, it may be
the case that p.future eq p.
val p = promise[T]
val f = p.future
You can think of futures and promises as two different sides of a pipe.
On the promise side, data is pushed in, and on the future side, data can be pulled out.
And future is some sort of an asynchronous operation that would complete in a different execution path.
Actually, a future is a placeholder object for a value that may be become available at some point in time, asynchronously. It is not the asynchronous computation itself.
The fact that there is a future constructor called future that returns such a placeholder object and spawns an asynchronous computation that completes this placeholder object does not mean that the asynchronous computation is called a future. There are also other future constructors/factory methods.
But the point I do not get is how promise has a future?
To divide promises and futures into 2 separate interfaces was a design decision. You could have these two under the same interface Future, but that would then allow clients of futures to complete them instead of the intended completer of the future. This would cause unexpected errors, as there could be any number of contending completers.
E.g. for the asynchronous computation spawned by the future construct, it would no longer be clear whether it has to complete the promise, or if the client will do it.
Futures and promises are intended to constrain the flow of data in the program.
The idea is to have a future client that subscribes to the data to act on it once the data arrives.
The role of the promise client is to provide that data.
Mixing these two roles can lead to programs that are harder to understand or reason about.
You might also ask why the Promise trait does not extend Future. This is another design decision to discourage programmers from blindly passing Promises to clients where they should upcast the Promise to Future (this upcast is prone to be left out, whereas having to explicitly call future on the promise ensures you call it every time). In other words, by returning a promise you are giving the right to complete it to somebody else, and by returning the future you are giving the right to subscribe to it.
EDIT:
If you would like to learn more about futures, Chapter 4 in the Learning Concurrent Programming in Scala book describes them in detail. Disclaimer: I'm the author of the book.
The difference between the two is that futures are usually centered around the computation while promises are centered around data.
It seems your understanding matches this, but let me explain what I mean:
In both scala and clojure futures are (unless returned by some other function/method) created with some computation:
// scala
future { do_something() }
;; clojure
(future (do-something))
In both cases the "return-value" of the future can only be read (without blocking) only after the computation has terminated. When this is the case is typically outside the control of the programmer, as the computation gets executed in some thread (pool) in the background.
In contrast in both cases promises are an initially empty container, which can later be filled (exactly once):
// scala
val p = promise[Int]
...
p success 10 // or failure Exception()
;; clojure
(def p (promise))
(deliver p 10)
Once this is the case it can be read.
Reading the futures and promises is done through deref in clojure (and realized? can be used to check if deref will block). In scala reading is done through the methods provided by the Future trait. In order to read the result of a promise we thus have to obtain an object implementing Future, this is done by p.future. Now if the trait Future is implemented by a Promise, then p.future can return this and the two are equal. This is purely a implementation choice and does not change the concepts. So you were not wrong!
In any case Futures are mostly dealt with using callbacks.
At this point it might be worthwhile to reconsider the initial characterization of the two concepts:
Futures represent a computation that will produce a result at some point. Let's look at one possible implementation: We run the code in some thread(pool) and once its done, we arrange use the return value to fulfill a promise. So reading the result of the future is reading a promise; This is clojure's way of thinking (not necessarily of implementation).
On the other hand a promise represents a value that will be filled at some point. When it gets filled this means that some computation produced a result. So in a way this is like a future completing, so we should consume the value in the same way, using callbacks; This is scala's way of thinking.
Note that under the hood Future is implemented in terms of Promise and this Promise is completed with the body you passed to your Future:
def apply[T](body: =>T): Future[T] = impl.Future(body) //here I have omitted the implicit ExecutorContext
impl.Future is an implementation of Future trait:
def apply[T](body: =>T)(implicit executor: ExecutionContext): scala.concurrent.Future[T] =
{
val runnable = new PromiseCompletingRunnable(body)
executor.prepare.execute(runnable)
runnable.promise.future
}
Where PromiseCompletingRunnable looks like this:
class PromiseCompletingRunnable[T](body: => T) extends Runnable {
val promise = new Promise.DefaultPromise[T]()
override def run() = {
promise complete {
try Success(body) catch { case NonFatal(e) => Failure(e) }
}
} }
So you see even though they are seperate concepts that you can make use of independently in reality you can't get Future without using Promise.

A little help on understanding Scalaz Future and Task

I'm trying to understand the idea and purpose behind scalaz concurrent package, primarily Future and Task classes, but when using them in some application, it's now far from simple sequential analog, whereas scala.concurrent.Future, works more then better. Can any one share with his experience on writing concurrent/asynchronous application with scalaz, basically how to use it's async method correctly? As i understand from the sources async doesn't use a separate thread like the call to standard future, or fork/apply methods from scalaz works, so why it is called async then? Does it mean that in order to get real concurrency with scalaz i always have to call fork(now(...)) or apply?
I'm not a scalaz expert, but I'll try to help you a little bit. Let me try answer your questions one by one:
1) Can any one share with his experience on writing concurrent/asynchronous application with scalaz, basically how to use it's async method correctly?
Let's first take a look at async signature:
def async[A](listen: (A => Unit) => Unit): Future[A]
This could be a bit cryptic at first, so as always it's good idea to look at tests to understands possible use cases. In https://github.com/scalaz/scalaz/blob/scalaz-seven/tests/src/test/scala/scalaz/concurrent/FutureTest.scala
you can find the following code:
"when constructed from Future.async" ! prop{(n: Int) =>
def callback(call: Int => Unit): Unit = call(n)
Future.async(callback).run must_==
}
As we know from signature Future.async just construct new Future using function of signature (A => Unit) => Unit. What this really means is that Future.async takes as parameter function which for given callback makes all required computations and pass the result to that callback.
What is important to note it that Future.async does not run any computations on itself, it only prepare structure to run them later.
2) As i understand from the sources async doesn't use a separate thread like the call to standard future, or fork/apply methods from scalaz works, so why it is called async then?
You are correct. Only fork and apply seems to be running anything using threads, which is easy to notice looking at the signatures which contains implicit pool: ExecutorService. I cannot speak for the authors here, but I guess async is related to the callback. It means that rather than blocking on Future to get it result at the end you will use asynchronous callback.
3) Does it mean that in order to get real concurrency with scalaz i always have to call fork(now(...)) or apply?
From what I can say, yes. Just notice that when you are creating Future using syntax Future(x) you are using apply method here, so this is kind of default behavior (which is fine).
If you want to better understand design of Scalaz Futures I can recommend you reading "Functional Programming in Scala". I believe this book is written by main Scalaz contributors and chapter 7 discusses designing API for purely functional parallelism library. It's not exactly the same as Scalaz Future, but you can see many similarities.
You can also read wonderful Timothy Perrett blog post about Scalaz Task and Future which covers many not so obvious details.
async is used to adapt an async, callback-based API as a Future. It's called async because it's expected that it will be used with something that runs asynchronously, perhaps calling the callback from another thread somewhere further down the line. This is "real" concurrency, provided the API you're calling really uses it asynchronously (e.g. I use Future.async with the async parts of the AWS SDK like AmazonSimpleDBAsyncClient).
If you want "real" concurrency from the scalaz Task API directly you need to use things like fork or gatherUnordered, as many of the APIs default towards being safe/deterministic and restartable, with concurrency only when explicitly requested.
When composing Tasks with map and flatMap you can get a performance win by not using fork, see:
http://blog.higher-order.com/blog/2015/06/18/easy-performance-wins-with-scalaz/

Clarification needed about futures and promises in Scala

I am trying to get my head around Scala's promise and future constructs.
I've been reading Futures and Promises in Scala Documentation and am a bit confused as I've got a feeling that the concepts of promises and futures are mixed up.
In my understanding a promise is a container that we could populate
value in a later point. And future is some sort of an asynchronous
operation that would complete in a different execution path.
In Scala we can obtain a result using the attached callbacks to future.
Where I'm lost is how promise has a future?
I have read about these concepts in Clojure too, assuming that promise and future have some generic common concept, but it seems like I was wrong.
A promise p completes the future returned by p.future. This future is
specific to the promise p. Depending on the implementation, it may be
the case that p.future eq p.
val p = promise[T]
val f = p.future
You can think of futures and promises as two different sides of a pipe.
On the promise side, data is pushed in, and on the future side, data can be pulled out.
And future is some sort of an asynchronous operation that would complete in a different execution path.
Actually, a future is a placeholder object for a value that may be become available at some point in time, asynchronously. It is not the asynchronous computation itself.
The fact that there is a future constructor called future that returns such a placeholder object and spawns an asynchronous computation that completes this placeholder object does not mean that the asynchronous computation is called a future. There are also other future constructors/factory methods.
But the point I do not get is how promise has a future?
To divide promises and futures into 2 separate interfaces was a design decision. You could have these two under the same interface Future, but that would then allow clients of futures to complete them instead of the intended completer of the future. This would cause unexpected errors, as there could be any number of contending completers.
E.g. for the asynchronous computation spawned by the future construct, it would no longer be clear whether it has to complete the promise, or if the client will do it.
Futures and promises are intended to constrain the flow of data in the program.
The idea is to have a future client that subscribes to the data to act on it once the data arrives.
The role of the promise client is to provide that data.
Mixing these two roles can lead to programs that are harder to understand or reason about.
You might also ask why the Promise trait does not extend Future. This is another design decision to discourage programmers from blindly passing Promises to clients where they should upcast the Promise to Future (this upcast is prone to be left out, whereas having to explicitly call future on the promise ensures you call it every time). In other words, by returning a promise you are giving the right to complete it to somebody else, and by returning the future you are giving the right to subscribe to it.
EDIT:
If you would like to learn more about futures, Chapter 4 in the Learning Concurrent Programming in Scala book describes them in detail. Disclaimer: I'm the author of the book.
The difference between the two is that futures are usually centered around the computation while promises are centered around data.
It seems your understanding matches this, but let me explain what I mean:
In both scala and clojure futures are (unless returned by some other function/method) created with some computation:
// scala
future { do_something() }
;; clojure
(future (do-something))
In both cases the "return-value" of the future can only be read (without blocking) only after the computation has terminated. When this is the case is typically outside the control of the programmer, as the computation gets executed in some thread (pool) in the background.
In contrast in both cases promises are an initially empty container, which can later be filled (exactly once):
// scala
val p = promise[Int]
...
p success 10 // or failure Exception()
;; clojure
(def p (promise))
(deliver p 10)
Once this is the case it can be read.
Reading the futures and promises is done through deref in clojure (and realized? can be used to check if deref will block). In scala reading is done through the methods provided by the Future trait. In order to read the result of a promise we thus have to obtain an object implementing Future, this is done by p.future. Now if the trait Future is implemented by a Promise, then p.future can return this and the two are equal. This is purely a implementation choice and does not change the concepts. So you were not wrong!
In any case Futures are mostly dealt with using callbacks.
At this point it might be worthwhile to reconsider the initial characterization of the two concepts:
Futures represent a computation that will produce a result at some point. Let's look at one possible implementation: We run the code in some thread(pool) and once its done, we arrange use the return value to fulfill a promise. So reading the result of the future is reading a promise; This is clojure's way of thinking (not necessarily of implementation).
On the other hand a promise represents a value that will be filled at some point. When it gets filled this means that some computation produced a result. So in a way this is like a future completing, so we should consume the value in the same way, using callbacks; This is scala's way of thinking.
Note that under the hood Future is implemented in terms of Promise and this Promise is completed with the body you passed to your Future:
def apply[T](body: =>T): Future[T] = impl.Future(body) //here I have omitted the implicit ExecutorContext
impl.Future is an implementation of Future trait:
def apply[T](body: =>T)(implicit executor: ExecutionContext): scala.concurrent.Future[T] =
{
val runnable = new PromiseCompletingRunnable(body)
executor.prepare.execute(runnable)
runnable.promise.future
}
Where PromiseCompletingRunnable looks like this:
class PromiseCompletingRunnable[T](body: => T) extends Runnable {
val promise = new Promise.DefaultPromise[T]()
override def run() = {
promise complete {
try Success(body) catch { case NonFatal(e) => Failure(e) }
}
} }
So you see even though they are seperate concepts that you can make use of independently in reality you can't get Future without using Promise.

Traversing lists and streams with a function returning a future

Introduction
Scala's Future (new in 2.10 and now 2.9.3) is an applicative functor, which means that if we have a traversable type F, we can take an F[A] and a function A => Future[B] and turn them into a Future[F[B]].
This operation is available in the standard library as Future.traverse. Scalaz 7 also provides a more general traverse that we can use here if we import the applicative functor instance for Future from the scalaz-contrib library.
These two traverse methods behave differently in the case of streams. The standard library traversal consumes the stream before returning, while Scalaz's returns the future immediately:
import scala.concurrent._
import ExecutionContext.Implicits.global
// Hangs.
val standardRes = Future.traverse(Stream.from(1))(future(_))
// Returns immediately.
val scalazRes = Stream.from(1).traverse(future(_))
There's also another difference, as Leif Warner observes here. The standard library's traverse starts all of the asynchronous operations immediately, while Scalaz's starts the first, waits for it to complete, starts the second, waits for it, and so on.
Different behavior for streams
It's pretty easy to show this second difference by writing a function that will sleep for a few seconds for the first value in the stream:
def howLong(i: Int) = if (i == 1) 10000 else 0
import scalaz._, Scalaz._
import scalaz.contrib.std._
def toFuture(i: Int)(implicit ec: ExecutionContext) = future {
printf("Starting %d!\n", i)
Thread.sleep(howLong(i))
printf("Done %d!\n", i)
i
}
Now Future.traverse(Stream(1, 2))(toFuture) will print the following:
Starting 1!
Starting 2!
Done 2!
Done 1!
And the Scalaz version (Stream(1, 2).traverse(toFuture)):
Starting 1!
Done 1!
Starting 2!
Done 2!
Which probably isn't what we want here.
And for lists?
Strangely enough the two traversals behave the same in this respect on lists—Scalaz's doesn't wait for one future to complete before starting the next.
Another future
Scalaz also includes its own concurrent package with its own implementation of futures. We can use the same kind of setup as above:
import scalaz.concurrent.{ Future => FutureZ, _ }
def toFutureZ(i: Int) = FutureZ {
printf("Starting %d!\n", i)
Thread.sleep(howLong(i))
printf("Done %d!\n", i)
i
}
And then we get the behavior of Scalaz on streams for lists as well as streams:
Starting 1!
Done 1!
Starting 2!
Done 2!
Perhaps less surprisingly, traversing an infinite stream still returns immediately.
Question
At this point we really need a table to summarize, but a list will have to do:
Streams with standard library traversal: consume before returning; don't wait for each future.
Streams with Scalaz traversal: return immediately; do wait for each future to complete.
Scalaz futures with streams: return immediately; do wait for each future to complete.
And:
Lists with standard library traversal: don't wait.
Lists with Scalaz traversal: don't wait.
Scalaz futures with lists: do wait for each future to complete.
Does this make any sense? Is there a "correct" behavior for this operation on lists and streams? Is there some reason that the "most asynchronous" behavior—i.e., don't consume the collection before returning, and don't wait for each future to complete before moving on to the next—isn't represented here?
I cannot answer it all, but i try on some parts:
Is there some reason that the "most asynchronous" behavior—i.e., don't
consume the collection before returning, and don't wait for each
future to complete before moving on to the next—isn't represented
here?
If you have dependent calculations and a limited number of threads, you can experience deadlocks. For example you have two futures depending on a third one (all three in the list of futures) and only two threads, you can experience a situation where the first two futures block all two threads and the third one never gets executed. (Of course, if your pool size is one, i.e. zou execute one calculation after the other, you can get similar situations)
To solve this, you need one thread per future, without any limitation. This works for small lists of futures, but not for big one. So if you run all in parallel, you will get a situation where small examples will run in all cases and bigger one will deadlock. (Example: Developer tests run fine, production deadlocks).
Is there a "correct" behavior for this operation on lists and streams?
I think it is impossible with futures. If you know something more of the dependencies, or when you know for sure that the calculations will not block, a more concurrent solution might be possible. But executing lists of futures looks for me "broken by design". Best solution seems one, that will already fail for small examples for deadlocks (i.e. execute one Future after the other).
Scalaz futures with lists: do wait for each future to complete.
I think scalaz uses for comprehensions internally for traversal. With for comprehensions, it is not guaranteed that the calculations are independent. So I guess that Scalaz is doing the right thing here with for comprehensions: Doing one calculation after the other. In the case of futures, this will always work, given you have unlimited threads in you operating system.
So in other words: You see just an artifact of how for comprehensions (must) work.
I hope this makes some sense.
If I understand the question correctly, I think it really comes down to the semantics of streams vs lists.
Traversing a list does what we'd expect from the docs:
Transforms a TraversableOnce[A] into a Future[TraversableOnce[B]] using the provided function A => Future[B]. This is useful for performing a parallel map. For example, to apply a function to all items of a list in parallel:
With streams, it's up to the developer to decide how they want it to work because it depends on more knowledge of the stream than the compiler has (streams can be infinite, but the type system doesn't know about it). if my stream is reading lines from a file, I want to consume it first, since chaining futures line by line wouldn't actually parallelize things. in this case, I would want the parallel approach.
On the other hand, if my stream is an infinite list generating sequential integers and hunting for the first prime greater than some large number, it would be impossible to consume the stream first in one sweep (the chained Future approach would be required, and we'd probably want to run over batches from the stream).
Rather than trying to figure out a canonical way to handle this, I wonder if there are missing types that would help make the different cases more explicit.