Scala - differents between eventually timeout and Thread.sleep() - scala

I have some async (ZIO) code, which I need to test. If I create a testing part using Thread.sleep() it works fine and I always get response:
for {
saved <- database.save(smth)
result <- eventually {
Thread.sleep(20000)
database.search(...)
}
} yield result
But if I made same logic using timeout and interval from eventually then it never works correctly ( I got timeouts):
for {
saved <- database.save(smth)
result <- eventually(timeout(Span(20, Seconds)), interval(Span(20, Seconds))) {
database.search(...)
}
} yield result
I do not understand why timeout and interval works different then Thread.sleep. It should be doing exactly same thing. Can someone explain it to me and tell how I should change this code to do not need to use Thread.sleep()?

Assuming database.search(...) returns ZIO[] object.
eventually{database.search(...)} most probably succeeds immediately after the first try.
It successfully created a task to query the database.
Then database is queried without any retry logic.
Regarding how to make it work:
val search: ZIO[Any, Throwable, String] = ???
val retried: ZIO[Any with Clock, Throwable, Option[String]] = search.retry(Schedule.spaced(Duration.fromMillis(1000))).timeout(Duration.fromMillis(20000))
Something like that should work. But I believe that more elegant solutions exist.

The other answer from #simpadjo addresses the "what" quite succinctly. I'll add some additional context as to why you might see this behavior.
for {
saved <- database.save(smth)
result <- eventually {
Thread.sleep(20000)
database.search(...)
}
} yield result
There are three different technologies being mixed here which is causing some confusion.
First is ZIO which is an asynchronous programming library that uses it's own custom runtime and execution model to perform tasks. The second is eventually which comes from ScalaTest and is useful for checking asynchronous computations by effectively polling the state of a value. And thirdly, there is Thread.sleep which is a Java api that literally suspends the current thread and prevents task progression until the timer expires.
eventually uses a simple retry mechanism that differs based on whether you are using a normal value or a Future from the scala standard library. Basically it runs the code in the block and if it throws then it sleeps the current thread and then retries it based on some interval configuration, eventually timing out. Notably in this case the behavior is entirely synchronous, meaning that as long as the value in the {} doesn't throw an exception it won't keep retrying.
Thread.sleep is a heavy weight operation and in this case it is effectively blocking the function being passed to eventually from progressing for 20 seconds. Meaning that by the time the database.search is called the operation has likely completed.
The second variant is different, it executes the code in the eventually block immediately, if it throws an exception then it will attempt it again based on the interval/timeout logic that your provide. In this scenario the save may not have completed (or propagated if it is eventually consistent). Because you are returning a ZIO which is designed not to throw, and eventually doesn't understand ZIO it will simply return the search attempt with no retry logic.
The accepted answer:
val retried: ZIO[Any with Clock, Throwable, Option[String]] = search.retry(Schedule.spaced(Duration.fromMillis(1000))).timeout(Duration.fromMillis(20000))
works because the retry and timeout are using the built-in ZIO operators which do understand how to actually retry and timeout a ZIO. Meaning that if search fails the retry will handle it until it succeeds.

Related

Cats-effect and asynchronous IO specifics

For few days I have been wrapping my head around cats-effect and IO. And I feel I have some misconceptions about this effect or simply I missed its point.
First of all - if IO can replace Scala's Future, how can we create an async IO task? Using IO.shift? Using IO.async? Is IO.delay sync or async? Can we make a generic async task with code like this Async[F].delay(...)? Or async happens when we call IO with unsafeToAsync or unsafeToFuture?
What's the point of Async and Concurrent in cats-effect? Why they are separated?
Is IO a green thread? If yes, why is there a Fiber object in cats-effect? As I understand the Fiber is the green thread, but docs claim we can think of IOs as green threads.
I would appreciate some clarifing on any of this as I have failed comprehending cats-effect docs on those and internet was not that helpfull...
if IO can replace Scala's Future, how can we create an async IO task
First, we need to clarify what is meant as an async task. Usually async means "does not block the OS thread", but since you're mentioning Future, it's a bit blurry. Say, if I wrote:
Future { (1 to 1000000).foreach(println) }
it would not be async, as it's a blocking loop and blocking output, but it would potentially execute on a different OS thread, as managed by an implicit ExecutionContext. The equivalent cats-effect code would be:
for {
_ <- IO.shift
_ <- IO.delay { (1 to 1000000).foreach(println) }
} yield ()
(it's not the shorter version)
So,
IO.shift is used to maybe change thread / thread pool. Future does it on every operation, but it's not free performance-wise.
IO.delay { ... } (a.k.a. IO { ... }) does NOT make anything async and does NOT switch threads. It's used to create simple IO values from synchronous side-effecting APIs
Now, let's get back to true async. The thing to understand here is this:
Every async computation can be represented as a function taking callback.
Whether you're using API that returns Future or Java's CompletableFuture, or something like NIO CompletionHandler, it all can be converted to callbacks. This is what IO.async is for: you can convert any function taking callback to an IO. And in case like:
for {
_ <- IO.async { ... }
_ <- IO(println("Done"))
} yield ()
Done will be only printed when (and if) the computation in ... calls back. You can think of it as blocking the green thread, but not OS thread.
So,
IO.async is for converting any already asynchronous computation to IO.
IO.delay is for converting any completely synchronous computation to IO.
The code with truly asynchronous computations behaves like it's blocking a green thread.
The closest analogy when working with Futures is creating a scala.concurrent.Promise and returning p.future.
Or async happens when we call IO with unsafeToAsync or unsafeToFuture?
Sort of. With IO, nothing happens unless you call one of these (or use IOApp). But IO does not guarantee that you would execute on a different OS thread or even asynchronously unless you asked for this explicitly with IO.shift or IO.async.
You can guarantee thread switching any time with e.g. (IO.shift *> myIO).unsafeRunAsyncAndForget(). This is possible exactly because myIO would not be executed until asked for it, whether you have it as val myIO or def myIO.
You cannot magically transform blocking operations into non-blocking, however. That's not possible neither with Future nor with IO.
What's the point of Async and Concurrent in cats-effect? Why they are separated?
Async and Concurrent (and Sync) are type classes. They are designed so that programmers can avoid being locked to cats.effect.IO and can give you API that supports whatever you choose instead, such as monix Task or Scalaz 8 ZIO, or even monad transformer type such as OptionT[Task, *something*]. Libraries like fs2, monix and http4s make use of them to give you more choice of what to use them with.
Concurrent adds extra things on top of Async, most important of them being .cancelable and .start. These do not have a direct analogy with Future, since that does not support cancellation at all.
.cancelable is a version of .async that allows you to also specify some logic to cancel the operation you're wrapping. A common example is network requests - if you're not interested in results anymore, you can just abort them without waiting for server response and don't waste any sockets or processing time on reading the response. You might never use it directly, but it has it's place.
But what good are cancelable operations if you can't cancel them? Key observation here is that you cannot cancel an operation from within itself. Somebody else has to make that decision, and that would happen concurrently with the operation itself (which is where the type class gets its name). That's where .start comes in. In short,
.start is an explicit fork of a green thread.
Doing someIO.start is akin to doing val t = new Thread(someRunnable); t.start(), except it's green now. And Fiber is essentially a stripped down version of Thread API: you can do .join, which is like Thread#join(), but it does not block OS thread; and .cancel, which is safe version of .interrupt().
Note that there are other ways to fork green threads. For example, doing parallel operations:
val ids: List[Int] = List.range(1, 1000)
def processId(id: Int): IO[Unit] = ???
val processAll: IO[Unit] = ids.parTraverse_(processId)
will fork processing all IDs to green threads and then join them all. Or using .race:
val fetchFromS3: IO[String] = ???
val fetchFromOtherNode: IO[String] = ???
val fetchWhateverIsFaster = IO.race(fetchFromS3, fetchFromOtherNode).map(_.merge)
will execute fetches in parallel, give you first result completed and automatically cancel the fetch that is slower. So, doing .start and using Fiber is not the only way to fork more green threads, just the most explicit one. And that answers:
Is IO a green thread? If yes, why is there a Fiber object in cats-effect? As I understand the Fiber is the green thread, but docs claim we can think of IOs as green threads.
IO is like a green thread, meaning you can have lots of them running in parallel without overhead of OS threads, and the code in for-comprehension behaves as if it was blocking for the result to be computed.
Fiber is a tool for controlling green threads explicitly forked (waiting for completion or cancelling).

What can cause Akka's Scheduler to execute scheduled tasks before the scheduled time?

I'm experiencing a strange behaviour when using Akka's scheduler. My code looks roughly like this:
val s = ActorSystem("scheduler")
import scala.concurrent.ExecutionContext.Implicits.global
def doSomething(): Future[Unit] = {
val now = new GregorianCalendar(TimeZone.getTimeZone("UTC"))
println(s"${now.get(Calendar.MINUTE)}:${now.get(Calendar.SECOND)}:${now.get(Calendar.MILLISECOND)}" )
// Do many things that include an http request using "dispatch" and manipulation of the response and saving it in a file.
}
val futures: Seq[Future[Unit]] = for (i <- 1 to 500) yield {
println(s"$i : ${i*600}")
// AlphaVantage recommends 100 API calls per minute
akka.pattern.after(i * 600 milliseconds, s.scheduler) { doSomething() }
}
Future.sequence(futures).onComplete(_ => s.terminate())
When I execute my code, doSomething is initially called repeatedly with 600 milliseconds between successive calls, as expected. However, after a while, all remaining scheduled calls are suddenly executed simultaneously.
I suspect that something inside my doSomething might be interfering with the scheduling, but I don't know what. My doSomething just does an http request using dispatch and manipulates the result, and does not interact directly with akka or the scheduler in any way. So, my question is:
What can cause the Scheduler's schedule to fail and suddenly trigger the immediate execution of all remaining scheduled tasks?
(PS: I tried to simplify my doSomething to post a minimal non-working example here, but my simplifications resulted in working examples.)
Ok. I figured it out. As soon as one of the futures fail, the line
Future.sequence(futures).onComplete(_ => s.terminate())
will terminate the actor system, and all remaining scheduled tasks will be called.

Do all futures need to be waited on to guarantee their execution?

We have a Scala Play webapp which does a number of database operations as part of a HTTP request, each of which is a Future. Usually we bubble up the Futures to an async controller action and let Play handle waiting for them.
But I've also noticed in a number of places we don't bubble up the Future or even wait for it to complete. I think this is bad because it means the HTTP request wont fail if the future fails, but does it actually even guarantee the future will be executed at all, since nothing is going to wait on the result of it? Will Play drop un-awaited futures after the HTTP request has been served, or leave them running in the background?
TL;DR
Play will not kill your Futures after sending the HTTP response.
Errors will not be reported if any of your Futures fail.
Long version
Your futures will not be killed when the HTTP response has been sent. You can try that out for yourself like this:
def futuresTest = Action.async { request =>
println(s"Entered futuresTest at ${LocalDateTime.now}")
val ignoredFuture = Future{
var i = 0
while (i < 10) {
Thread.sleep(1000)
println(LocalDateTime.now)
i += 1
}
}
println(s"Leaving futuresTest at ${LocalDateTime.now}")
Future.successful(Ok)
}
However you are right that the request will not fail if any of the futures fail. If this is a problem then you can compose the futures using a for comprehension or flatMaps. Here's an example of what you can do (I'm assuming that your Futures only perform side efects (Future[Unit])
To let your futures execute in paralell
val dbFut1 = dbCall1(...)
val dbFut2 = dbCall2(...)
val wsFut1 = wsCall1(...)
val fut = for(
_ <- dbFut1;
_ <- dbFut2;
_ <- wsFut1
) yield ()
fut.map(_ => Ok)
To have them execute in sequence
val fut = for(
_ <- dbCall1(...);
_ <- dbCall2(...);
_ <- wsCall2(...)
) yield ()
fut.map(_ => Ok)
does it actually even guarantee the future will be executed at all,
since nothing is going to wait on the result of it? Will Play drop
un-awaited futures after the HTTP request has been served, or leave
them running in the background?
This question actually runs much deeper than Play. You're generally asking "If I don't synchronously wait on a future, how can I guarantee it will actually complete without being GCed?". To answer that, we need to understand how the GC actually views threads. From the GC point of view, a thread is what we call a "root". Such a root is the starting point for the heap to traverse it's objects and see which ones are eligible for collection. Among roots are also static fields, for example, which are known to live throughout the life time of the application.
So, when you view it like that, and think of what a Future actually does, which is queue a function that runs on a dedicated thread from the pool of threads available via the underlying ExecutorService (which we refer to as ExecutionContext in Scala), you see that even though you're not waiting on the completion, the JVM runtime does guarantee that your Future will run to completion. As for the Future object wrapping the function, it holds a reference to that unfinished function body so the Future itself isn't collected.
When you think about it from that point of view, it's totally logical, since execution of a Future happens asynchronously, and we usually continue processing it in an asynchronous manner using continuations such as map, flatMap, onComplete, etc.

How to compose two parallel Tasks to cancel one task if another one fails?

I would like to implement my asynchronous processing with
scalaz.concurrent.Task. I need a function (Task[A], Task[B]) => Task[(A, B)] to return a new task that works as follows:
run Task[A] and Task[B] in parallel and wait for the results;
if one of the tasks fails then cancel the second one and wait until it terminates;
return the results of both tasks.
How would you implement such a function ?
As I mention above, if you don't care about actually stopping the non-failed computation, you can use Nondeterminism. For example:
import scalaz._, scalaz.Scalaz._, scalaz.concurrent._
def pairFailSlow[A, B](a: Task[A], b: Task[B]): Task[(A, B)] = a.tuple(b)
def pairFailFast[A, B](a: Task[A], b: Task[B]): Task[(A, B)] =
Nondeterminism[Task].both(a, b)
val divByZero: Task[Int] = Task(1 / 0)
val waitALongTime: Task[String] = Task {
Thread.sleep(10000)
println("foo")
"foo"
}
And then:
pairFailSlow(divByZero, waitALongTime).run // fails immediately
pairFailSlow(waitALongTime, divByZero).run // hangs while sleeping
pairFailFast(divByZero, waitALongTime).run // fails immediately
pairFailFast(waitALongTime, divByZero).run // fails immediately
In every case except the first the side effect in waitALongTime will happen. If you wanted to attempt to stop that computation, you'd need to use something like Task's runAsyncInterruptibly.
There is a weird conception among java developers that you should not cancel parallel tasks. They comminate Thread.stop() and mark it deprecated. Without Thread.stop() you could not really cancel future. All you could do is to send some signal to future, or modify some shared variable and make code inside future to check it periodically. So, all libraries that provides futures could suggest the only way to cancel future: do it cooperatively.
I'm facing the same problem now and is in the middle of writing my own library for futures that could be cancelled. There are some difficulties but they may be solved. You just could not call Thread.stop() in any arbitrary position. The thread may perform updating shared variables. Lock would be recalled normally, but update may be stopped half-way, e.g. updating only half of double value and so on. So I'm introducing some lock. If the thread is in guarded state, then it should be now killed by Thread.stop() but with sending specific message. The guarded state is considered always very fast to be waited for. All other time, in the middle of computation, thread may be safely stopped and replaced with new one.
So, the answer is that: you should not desire to cancel futures, otherwise you are heretic and no one in java community would lend you a willing hand. You should define your own executional context that could kill threads and you should write your own futures library to run upon this context

Playframework non-blocking Action

Came across a problem I did not find an answer yet.
Running on playframework 2 with Scala.
Was required to write an Action method that performs multiple Future calls.
My question:
1) Is the attached code non-blocking and hence looking the way it should be ?
2) Is there a guarantee that both DAO results are caught at any given time ?
def index = Action.async {
val t2:Future[Tuple2[List[PlayerCol],List[CreatureCol]]] = for {
p <- PlayerDAO.findAll()
c <- CreatureDAO.findAlive()
}yield(p,c)
t2.map(t => Ok(views.html.index(t._1, t._2)))
}
Thanks for your feedback.
Is the attached code non-blocking and hence looking the way it should be ?
That depends on a few things. First, I'm going to assume that PlayerDAO.findAll() and CreatureDAO.findAlive() return Future[List[PlayerCol]] and Future[List[CreatureCol]] respectively. What matters most is what these functions are actually calling themselves. Are they making JDBC calls, or using an asynchronous DB driver?
If the answer is JDBC (or some other synchronous db driver), then you're still blocking, and there's no way to make it fully "non-blocking". Why? Because JDBC calls block their current thread, and wrapping them in a Future won't fix that. In this situation, the most you can do is have them block a different ExecutionContext than the one Play is using to handle requests. This is generally a good idea, because if you have several db requests running concurrently, they can block Play's internal thread pool used for handling HTTP requests, and suddenly your server will have to wait to handle other requests (even if they don't require database calls).
For more on different ExecutionContexts see the thread pools documentation and this answer.
If you're answer is an asynchronous database driver like reactive mongo (there's also scalike-jdbc, and maybe some others), then you're in good shape, and I probably made you read a little more than you had to. In that scenario your index controller function would be fully non-blocking.
Is there a guarantee that both DAO results are caught at any given time ?
I'm not quite sure what you mean by this. In your current code, you're actually making these calls in sequence. CreatureDAO.findAlive() isn't executed until PlayerDAO.findAll() has returned. Since they are not dependent on each other, it seems like this isn't intentional. To make them run in parallel, you should instantiate the Futures before mapping them in a for-comprehension:
def index = Action.async {
val players: Future[List[PlayerCol]] = PlayerDAO.findAll()
val creatures: Future[List[CreatureCol]] = CreatureDAO.findAlive()
val t2: Future[(List[PlayerCol], List[CreatureCol])] = for {
p <- players
c <- creatures
} yield (p, c)
t2.map(t => Ok(views.html.index(t._1, t._2)))
}
The only thing you can guarantee about having both results being completed is that yield isn't executed until the Futures have completed (or never, if they failed), and likewise the body of t2.map(...) isn't executed until t2 has been completed.
Further reading:
Are there any benefits in using non-async actions in Play Framework 2.2?
Understanding the Difference Between Non-Blocking Web Service Calls vs Non-Blocking JDBC