I am combining ZIO retry and repeat in a long polling procedure:
logic.repeat(repeatSchedule).retry(retrySchedule)
where logic is a ZIO that can fail.
Since retrySchedule can be an exponential backoff, its delay can grow indefinitely on errors. However, I would like to reset it to its initial value whenever logic succeeds (logic is repeated indefinitely).
I'm following the section about Schedule composition in the ZIO Scheduling docs, but I am missing a "recursive" combinator that would make something like the following possible:
Schedule.exponential(baseDelay)
.whileOutput(_ < UpdaterManagerSettings.maxDelay)
.andThen([SOMETHING TO POINT RECURSIVELY AT THE SAME])
I think something like this may work for you:
val logic = ZIO(???)
// First, create an effect that uses exponential backoff to retry logic.
// This effect will complete as soon as logic succeeds,
// or maximum number of retries is exceeded.
val retrySchedule = Schedule.exponential(baseDelay) && Schedule.recurs(maxRetries)
val retriedLogic = logic.retry(retrySchedule)
// Then, repeat the whole retriedLogic infinitely.
val repeatSchedule = Schedule.spaced(1.second)
val wholeProcedure = retriedLogic.repeat(repeatSchedule)
As long as logic fails, it's retried with exponential backoff. But as soon as it succeeds, it's repeated in fixed intervals. If it fails again, exponential backoff starts again from baseDelay.
See a running example here: https://scastie.scala-lang.org/yZpBO34NRgK6BgYYIzk5Iw
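To make the reset behaviour concrete without pulling in ZIO, here is a rough standard-library model of the delays the combined procedure would wait. It is only a sketch under assumptions: BackoffModel and delays are hypothetical names, the backoff is modelled as base * 2^n for the n-th consecutive failure, and the maxRetries cap is left out.

```scala
import scala.concurrent.duration._

object BackoffModel {
  // Hypothetical model, not ZIO code: for each attempt outcome (true = success),
  // compute the delay waited before the next attempt.
  def delays(outcomes: List[Boolean],
             base: FiniteDuration,
             spacing: FiniteDuration): List[FiniteDuration] =
    outcomes
      .foldLeft((0, List.empty[FiniteDuration])) {
        // A success resets the consecutive-failure counter and waits the fixed spacing.
        case ((_, acc), true) => (0, spacing :: acc)
        // A failure waits base * 2^failures (assumed backoff shape) and bumps the counter.
        case ((failures, acc), false) => (failures + 1, (base * (1L << failures)) :: acc)
      }
      ._2
      .reverse

  // Three failures, one success, then another failure.
  val example: List[Long] =
    delays(List(false, false, false, true, false), 100.millis, 1.second).map(_.toMillis)
}
```

In the example, the failure that follows the success waits the base delay again rather than continuing from where the earlier backoff left off.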
Let us use Scala.
I'm trying to find the best possible way to do an opportunistic, partial, and asynchronous pre-computation of some of the elements of an iterator that is otherwise processed synchronously.
The below image illustrates the problem.
There is a lead thread (blue) that takes an iterator and a state. The state contains mutable data that must be protected from concurrent access. Moreover, the state must be updated while the iterator is processed from the beginning, sequentially, and in order because the elements of the iterator depend on previous elements. Moreover, the nature of the dependency is not known in advance.
Processing some elements may be substantially more expensive (by 2 orders of magnitude) than others, meaning that some elements take 1ms to compute and some take 300ms. It would significantly improve the running time if I could pre-process the next k elements speculatively. Speculative pre-processing on asynchronous threads is possible (while the blue thread is processing synchronously), but the blue thread must validate the pre-processed data, i.e. check whether the result of the pre-computation is still valid at that time. Usually (90% of the time), it should be valid. Thus, launching separate asynchronous threads to pre-process the remaining portion of the iterator speculatively would save many hundreds of milliseconds of running time.
I have studied comparisons of the asynchronous and functional libraries for Scala to understand better which model of computation, or in other words, which description of computation (which library) would be a better fit for this processing problem. I was thinking about communication patterns and came up with the following ideas:
AKKA
Use an AKKA actor Blue for the blue thread that takes the iterator, and for each step, it sends a Step message to itself. On a Step message, before it starts processing the next (ith) element, it sends a PleasePreprocess(i+k) message with the (i+k)th element to one of the k pre-processor actors in place. Blue would Step to i+1 if and only if PreprocessingKindlyDone(i+1) is received.
AKKA Streams
AFAIK AKKA streams also support the previous two-way backpressure mechanism, therefore, it could be a good candidate to implement what actors do without actually using actors.
Scala Futures
While the blue thread processes elements with processElement(e) in iterator.map(processElement(_)), it would also spawn Futures for preprocessing. However, maintaining these pre-processing Futures and awaiting their states would require a semi-blocking implementation in pure Scala as I see it, so I would not go in this direction to the best of my current knowledge.
Use Monix
I have some knowledge of Monix but could not wrap my head around how this problem could be elegantly solved with Monix. I'm not seeing how the blue thread could wait for the result of i+1 and then continue. For this, I was thinking of using something like a sliding window with foldLeft(blueThreadAsZero){ (blue, preProc1, preProc2, notYetPreProc) => ... }, but could not find a similar construction.
Possibly, there could be libraries I did not mention that could better express computational patterns for this.
I hope I have described my problem adequately. Thank you for the hints/ideas or code snippets!
You need blocking anyhow, if your blue thread happens to go faster than the yellow ones. I don't think you need any fancy libraries for this, "vanilla scala" should do (like it actually does in most cases). Something like this, perhaps ...
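import scala.concurrent.{ExecutionContext, Future}

// yellow: speculative pre-processing, started for every element right away.
// blue: sequential validation, applied in iterator order.
def doit[T, R](it: Iterator[T], yellow: T => R, blue: R => R)(implicit ec: ExecutionContext): Future[Seq[R]] = it
  .map { elem => Future(yellow(elem)) }
  .foldLeft(Future.successful(List.empty[R])) { (last, next) =>
    last.flatMap { acc => next.map(blue).map(_ :: acc) }
  }.map(_.reverse)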
I didn't test or compile this, so it could need some tweaks, but conceptually, this should work: pass through the iterator and start preprocessing right away, then fold to tuck the "validation" on each completing preprocess sequentially.
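For what it's worth, here is a small usage sketch of that fold. It assumes an implicit ExecutionContext is added to the signature (the snippet needs one to create Futures), and SpeculativeDemo, run and the toy yellow/blue functions are made up for illustration.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object SpeculativeDemo {
  // Same shape as the fold above, with the ExecutionContext made explicit.
  def doit[T, R](it: Iterator[T], yellow: T => R, blue: R => R)(
      implicit ec: ExecutionContext): Future[Seq[R]] =
    it.map(elem => Future(yellow(elem))) // pre-processing starts as the fold pulls elements
      .foldLeft(Future.successful(List.empty[R])) { (last, next) =>
        // blue runs only after the previous element has been fully handled
        last.flatMap(acc => next.map(blue).map(_ :: acc))
      }
      .map(_.reverse)

  // Toy stand-ins: "pre-processing" multiplies by 10, "validation" adds 1.
  def run(): Seq[Int] = {
    implicit val ec: ExecutionContext = ExecutionContext.global
    Await.result(doit[Int, Int](Iterator(1, 2, 3), _ * 10, _ + 1), 5.seconds)
  }
}
```

Blocking with Await is only there to get a value out at the edge of the demo; the fold itself never blocks a thread.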
I would split the processing into two steps, the pre-processing that could be run in parallel and the dependent one which has to be serial.
Then, you can just create a stream of data from the iterator, do a parallel map that applies the preprocess step, and finish with a fold.
Personally I would use fs2, but the same approach can be expressed with any streaming solution like AkkaStreams, Monix Observables or ZIO ZStreams
import fs2.Stream
import cats.effect.IO
val finalState =
Stream
.fromIterator[IO](iterator = ???, chunkSize = ???)
.parEvalMap(maxConcurrent = ???)(elem => IO(preProcess(elem)))
.compile
.fold(initialState) {
case (acc, elem) =>
computeNewState(acc, elem)
}
PS: Remember to benchmark to make sure parallelism is actually speeding things up; it may not be worth the hassle.
I have some async (ZIO) code, which I need to test. If I create a testing part using Thread.sleep() it works fine and I always get response:
for {
saved <- database.save(smth)
result <- eventually {
Thread.sleep(20000)
database.search(...)
}
} yield result
But if I implement the same logic using timeout and interval from eventually, it never works correctly (I get timeouts):
for {
saved <- database.save(smth)
result <- eventually(timeout(Span(20, Seconds)), interval(Span(20, Seconds))) {
database.search(...)
}
} yield result
I do not understand why timeout and interval work differently than Thread.sleep. They should be doing exactly the same thing. Can someone explain this to me and tell me how I should change this code so that I do not need Thread.sleep()?
Assuming database.search(...) returns a ZIO value:
eventually { database.search(...) } most probably succeeds immediately on the first try.
It has successfully created a task that describes the database query.
The database is then queried without any retry logic.
Regarding how to make it work:
val search: ZIO[Any, Throwable, String] = ???
val retried: ZIO[Any with Clock, Throwable, Option[String]] = search.retry(Schedule.spaced(Duration.fromMillis(1000))).timeout(Duration.fromMillis(20000))
Something like that should work. But I believe that more elegant solutions exist.
The other answer from @simpadjo addresses the "what" quite succinctly. I'll add some additional context as to why you might see this behavior.
for {
saved <- database.save(smth)
result <- eventually {
Thread.sleep(20000)
database.search(...)
}
} yield result
There are three different technologies being mixed here which is causing some confusion.
First is ZIO, which is an asynchronous programming library that uses its own custom runtime and execution model to perform tasks. The second is eventually, which comes from ScalaTest and is useful for checking asynchronous computations by effectively polling the state of a value. And thirdly, there is Thread.sleep, which is a Java API that literally suspends the current thread and prevents task progression until the timer expires.
eventually uses a simple retry mechanism that differs based on whether you are using a normal value or a Future from the scala standard library. Basically it runs the code in the block and if it throws then it sleeps the current thread and then retries it based on some interval configuration, eventually timing out. Notably in this case the behavior is entirely synchronous, meaning that as long as the value in the {} doesn't throw an exception it won't keep retrying.
Thread.sleep is a heavyweight operation, and in this case it effectively blocks the function being passed to eventually from progressing for 20 seconds. This means that by the time database.search is called, the save operation has likely completed.
The second variant is different: it executes the code in the eventually block immediately, and if it throws an exception, it attempts it again based on the interval/timeout logic that you provide. In this scenario the save may not have completed (or propagated, if the store is eventually consistent). Because you are returning a ZIO, which is designed not to throw, and eventually doesn't understand ZIO, it will simply return the search attempt with no retry logic.
The accepted answer:
val retried: ZIO[Any with Clock, Throwable, Option[String]] = search.retry(Schedule.spaced(Duration.fromMillis(1000))).timeout(Duration.fromMillis(20000))
works because the retry and timeout are using the built-in ZIO operators which do understand how to actually retry and timeout a ZIO. Meaning that if search fails the retry will handle it until it succeeds.
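The difference can be reproduced without ZIO or ScalaTest. In the sketch below, retrySync is a hypothetical stand-in for a synchronous, exception-driven retry like eventually, and a plain () => String thunk stands in for a lazy effect description; none of these names come from either library.

```scala
import scala.util.{Failure, Success, Try}

object EventuallyModel {
  // Hypothetical stand-in for a synchronous retry: re-run the block while it throws.
  def retrySync[A](attemptsLeft: Int)(block: => A): A =
    Try(block) match {
      case Success(a)                     => a
      case Failure(_) if attemptsLeft > 1 => retrySync(attemptsLeft - 1)(block)
      case Failure(e)                     => throw e
    }

  // A search that fails a fixed number of times before it finds the row.
  final class FlakySearch(failuresBeforeSuccess: Int) {
    var calls = 0
    def run(): String = {
      calls += 1
      if (calls <= failuresBeforeSuccess) throw new RuntimeException("not found yet")
      else "row"
    }
  }

  def demo(): (Int, Int) = {
    val a = new FlakySearch(2)
    // The block only builds a description (a thunk). Building it never throws,
    // so the retry returns at once and the search has not run at all.
    val described: () => String = retrySync(5)(() => a.run())
    val callsAfterDescribing = a.calls

    val b = new FlakySearch(2)
    // Here the search really runs inside the block, so its failures are retried.
    retrySync(5)(b.run())
    (callsAfterDescribing, b.calls)
  }
}
```

The first count stays at zero because the retry only ever saw a successfully constructed description, which is the same situation as handing an unevaluated ZIO to eventually.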
Is there any way to interrupt a parallel collection computation in Scala?
Example:
val r = new Runnable {
override def run(): Unit = {
(1 to 3).par.foreach { _ => Thread.sleep(5000000) }
}
}
val t = new Thread(r)
t.start()
Thread.sleep(300) // let them spin up
t.interrupt()
I'd expect t.interrupt to interrupt all threads spawned by par, but this is not happening, it keeps spinning inside ForkJoinTask.externalAwaitDone. Looks like that method clears the interrupted status and keeps waiting for the spawned threads to finish.
This is Scala 2.12
The thread that you t.start() is only responsible for starting the parallel computation and then waiting for and gathering the result.
It is not connected to the threads that perform the computation. Usually, those run on the default ForkJoinPool, which is independent of the thread that submits the computation tasks.
If you want to interrupt the computation, you can use a custom execution back-end (like a manually created ForkJoinPool or a thread pool), and then shut it down. You can read about that here.
Or you can provide a callback from the computation.
But none of those approaches is a great fit for such a case.
If you are building a production solution, or your case is complex and critical for the app, you should probably use something that has cancellation by design, like Monix Task or CancellableFuture.
Or at least use Future and cancel it with workarounds.
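A minimal sketch of the "custom back-end, then shut it down" idea, using only java.util.concurrent rather than .par so that it stays self-contained. InterruptiblePool and demo are hypothetical names, and the tasks just sleep like the ones in the question.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

object InterruptiblePool {
  // Returns (number of tasks that observed an interrupt, whether the pool terminated).
  def demo(): (Int, Boolean) = {
    val pool = Executors.newFixedThreadPool(3) // the pool we own and can shut down
    val started = new CountDownLatch(3)
    val interrupted = new AtomicInteger(0)

    (1 to 3).foreach { _ =>
      pool.execute(new Runnable {
        def run(): Unit = {
          started.countDown()
          try Thread.sleep(5000000)
          catch { case _: InterruptedException => interrupted.incrementAndGet() }
        }
      })
    }

    started.await()    // let them spin up
    pool.shutdownNow() // interrupts the worker threads that are running tasks
    val terminated = pool.awaitTermination(5, TimeUnit.SECONDS)
    (interrupted.get, terminated)
  }
}
```

This only helps when the work responds to interruption, as Thread.sleep does; a CPU-bound loop would have to check Thread.interrupted() itself.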
I'm experiencing a strange behaviour when using Akka's scheduler. My code looks roughly like this:
val s = ActorSystem("scheduler")
import scala.concurrent.ExecutionContext.Implicits.global
def doSomething(): Future[Unit] = {
val now = new GregorianCalendar(TimeZone.getTimeZone("UTC"))
println(s"${now.get(Calendar.MINUTE)}:${now.get(Calendar.SECOND)}:${now.get(Calendar.MILLISECOND)}" )
// Do many things that include an http request using "dispatch" and manipulation of the response and saving it in a file.
}
val futures: Seq[Future[Unit]] = for (i <- 1 to 500) yield {
println(s"$i : ${i*600}")
// AlphaVantage recommends 100 API calls per minute
akka.pattern.after(i * 600 milliseconds, s.scheduler) { doSomething() }
}
Future.sequence(futures).onComplete(_ => s.terminate())
When I execute my code, doSomething is initially called repeatedly with 600 milliseconds between successive calls, as expected. However, after a while, all remaining scheduled calls are suddenly executed simultaneously.
I suspect that something inside my doSomething might be interfering with the scheduling, but I don't know what. My doSomething just does an http request using dispatch and manipulates the result, and does not interact directly with akka or the scheduler in any way. So, my question is:
What can cause the Scheduler's schedule to fail and suddenly trigger the immediate execution of all remaining scheduled tasks?
(PS: I tried to simplify my doSomething to post a minimal non-working example here, but my simplifications resulted in working examples.)
Ok. I figured it out. As soon as one of the futures fails, the line
Future.sequence(futures).onComplete(_ => s.terminate())
will terminate the actor system, and all remaining scheduled tasks will be called.
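One possible way around this, sketched with plain scala.concurrent and no Akka: wrap every result in a Try before sequencing, so that a single failure no longer completes the combined future early. SettleAll, settleAll and demo are hypothetical names, and this is just one option, not necessarily what was used in the end.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import scala.util.{Success, Try}

object SettleAll {
  // Turn each future into one that always succeeds with a Try, then sequence.
  // The combined future completes only once every input has completed.
  def settleAll[A](fs: Seq[Future[A]])(implicit ec: ExecutionContext): Future[Seq[Try[A]]] =
    Future.sequence(fs.map(f => f.transform[Try[A]](t => Success(t))))

  def demo(): Seq[Try[Int]] = {
    implicit val ec: ExecutionContext = ExecutionContext.global
    val failing: Future[Int] = Future.failed(new RuntimeException("boom"))
    val slow: Future[Int] = Future { Thread.sleep(100); 42 }
    // Despite the failure, the result still contains the slow future's value.
    Await.result(settleAll(Seq(failing, slow)), 5.seconds)
  }
}
```

With something like settleAll(futures).onComplete(_ => s.terminate()), termination would be deferred until every scheduled call has finished, whether it succeeded or failed.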
I would like to implement my asynchronous processing with scalaz.concurrent.Task. I need a function (Task[A], Task[B]) => Task[(A, B)] to return a new task that works as follows:
run Task[A] and Task[B] in parallel and wait for the results;
if one of the tasks fails then cancel the second one and wait until it terminates;
return the results of both tasks.
How would you implement such a function?
As I mention above, if you don't care about actually stopping the non-failed computation, you can use Nondeterminism. For example:
import scalaz._, scalaz.Scalaz._, scalaz.concurrent._
def pairFailSlow[A, B](a: Task[A], b: Task[B]): Task[(A, B)] = a.tuple(b)
def pairFailFast[A, B](a: Task[A], b: Task[B]): Task[(A, B)] =
Nondeterminism[Task].both(a, b)
val divByZero: Task[Int] = Task(1 / 0)
val waitALongTime: Task[String] = Task {
Thread.sleep(10000)
println("foo")
"foo"
}
And then:
pairFailSlow(divByZero, waitALongTime).run // fails immediately
pairFailSlow(waitALongTime, divByZero).run // hangs while sleeping
pairFailFast(divByZero, waitALongTime).run // fails immediately
pairFailFast(waitALongTime, divByZero).run // fails immediately
In every case except the first the side effect in waitALongTime will happen. If you wanted to attempt to stop that computation, you'd need to use something like Task's runAsyncInterruptibly.
There is a weird conception among Java developers that you should not cancel parallel tasks. They comminate Thread.stop() and have marked it deprecated. Without Thread.stop() you cannot really cancel a future. All you can do is send some signal to the future, or modify some shared variable and make the code inside the future check it periodically. So all libraries that provide futures can offer only one way to cancel a future: do it cooperatively.
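For reference, the cooperative style looks roughly like this with the standard library. It is only a sketch: CooperativeCancel, countUntilCancelled and demo are hypothetical names, and the shared variable is an AtomicBoolean that the running code polls before every step.

```scala
import java.util.concurrent.atomic.AtomicBoolean
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object CooperativeCancel {
  // Counts towards `limit`, checking the shared flag before every step.
  // Left(progress) means it stopped early because the flag was set.
  def countUntilCancelled(limit: Int, cancelled: AtomicBoolean)(
      implicit ec: ExecutionContext): Future[Either[Int, Int]] =
    Future {
      var i = 0
      while (i < limit && !cancelled.get) {
        Thread.sleep(1) // stand-in for one unit of real work
        i += 1
      }
      if (i < limit) Left(i) else Right(i)
    }

  def demo(): Either[Int, Int] = {
    implicit val ec: ExecutionContext = ExecutionContext.global
    val flag = new AtomicBoolean(false)
    val f = countUntilCancelled(1000000, flag)
    Thread.sleep(50)
    flag.set(true) // ask the computation to stop; it notices at its next check
    Await.result(f, 5.seconds)
  }
}
```

The future cannot be stopped between checks, which is exactly the limitation described above.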
I'm facing the same problem now and am in the middle of writing my own library for futures that can be cancelled. There are some difficulties, but they can be solved. You just cannot call Thread.stop() at an arbitrary position. The thread may be updating shared variables. Locks would be released normally, but an update may be stopped half-way, e.g. updating only half of a double value and so on. So I'm introducing a guard. If the thread is in the guarded state, it should not be killed by Thread.stop() but should instead be sent a specific message. The guarded state is assumed to always be short enough to wait for. At all other times, in the middle of a computation, the thread may be safely stopped and replaced with a new one.
So, the answer is this: you should not desire to cancel futures, otherwise you are a heretic and no one in the Java community will lend you a willing hand. You should define your own execution context that can kill threads, and you should write your own futures library to run on that context.