I read the tokio documentation and I wonder what the best approach is for encapsulating costly synchronous I/O in a future.
With the reactor framework, we get the advantage of a green threading model: a few OS threads handle a lot of concurrent tasks through an executor.
The future model of tokio is demand-driven, which means the future itself will poll its internal state to provide information about its completion, allowing backpressure and cancellation capabilities. As far as I understand, the polling phase of the future must be non-blocking to work well.
The I/O I want to encapsulate can be seen as a long atomic and costly operation. Ideally, an independent task would perform the I/O and the associated future would poll the I/O thread for the completion status.
The only two options I see are:
Include the blocking I/O in the poll function of the future.
Spawn an OS thread to perform the I/O and use the future mechanism to poll its state, as shown in the documentation.
As I understand it, neither solution is optimal: neither gets the full advantage of the green-threading model (the first is advised against in the documentation, and the second doesn't pass through the executor provided by the reactor framework). Is there another solution?
Ideally, an independent task would perform the I/O and the associated future would poll the I/O thread for the completion status.
Yes, this is the recommended approach for asynchronous execution. Note that this is not restricted to I/O, but is valid for any long-running synchronous task!
Futures crate
The ThreadPool type was created for this¹.
In this case, you spawn work to run in the pool. The pool itself checks whether the work has completed yet and returns a type that implements the Future trait.
use futures::{
    executor::{self, ThreadPool},
    future,
    task::{SpawnError, SpawnExt},
}; // 0.3.1, features = ["thread-pool"]
use std::{thread, time::Duration};

async fn delay_for(pool: &ThreadPool, seconds: u64) -> Result<u64, SpawnError> {
    pool.spawn_with_handle(async move {
        thread::sleep(Duration::from_secs(seconds));
        seconds
    })?
    .await;
    Ok(seconds)
}

fn main() -> Result<(), SpawnError> {
    let pool = ThreadPool::new().expect("Unable to create threadpool");

    let a = delay_for(&pool, 3);
    let b = delay_for(&pool, 1);

    let c = executor::block_on(async {
        let (a, b) = future::join(a, b).await;
        Ok(a? + b?)
    });

    println!("{}", c?);
    Ok(())
}
You can see that the total time is only 3 seconds:
% time ./target/debug/example
4
real 3.010
user 0.002
sys 0.003
¹ — There's some discussion that the current implementation may not be the best for blocking operations, but it suffices for now.
Tokio
Here, we use task::spawn_blocking:
use futures::future; // 0.3.15
use std::{thread, time::Duration};
use tokio::task; // 1.7.1, features = ["full"]
async fn delay_for(seconds: u64) -> Result<u64, task::JoinError> {
    task::spawn_blocking(move || {
        thread::sleep(Duration::from_secs(seconds));
        seconds
    })
    .await?;
    Ok(seconds)
}

#[tokio::main]
async fn main() -> Result<(), task::JoinError> {
    let a = delay_for(3);
    let b = delay_for(1);

    let (a, b) = future::join(a, b).await;
    let c = a? + b?;

    println!("{}", c);
    Ok(())
}
See also CPU-bound tasks and blocking code in the Tokio documentation.
Additional points
Note that this is not an efficient way of sleeping; it's just a placeholder for some blocking operation. If you actually need to sleep, use something like futures-timer or tokio::time::sleep. See Why does Future::select choose the future with a longer sleep period first? for more details.
neither solution is optimal: neither gets the full advantage of the green-threading model
That's correct - because you don't have something that is asynchronous! You are trying to combine two different methodologies and there has to be an ugly bit somewhere to translate between them.
the second doesn't pass through the executor provided by the reactor framework
I'm not sure what you mean here. There's an executor implicitly created by block_on or tokio::main. The thread pool has some internal logic that checks to see if a thread is done, but that should only be triggered when the user's executor polls it.
Related
Is there any way to interrupt a parallel collection computation in Scala?
Example:
val r = new Runnable {
  override def run(): Unit = {
    (1 to 3).par.foreach { _ => Thread.sleep(5000000) }
  }
}

val t = new Thread(r)
t.start()
Thread.sleep(300) // let them spin up
t.interrupt()
I'd expect t.interrupt to interrupt all threads spawned by par, but this is not happening; it keeps spinning inside ForkJoinTask.externalAwaitDone. It looks like that method clears the interrupted status and keeps waiting for the spawned threads to finish.
This is Scala 2.12.
The thread that you t.start() is responsible only for starting the parallel computation and for waiting on and gathering the result.
It is not connected to the threads that perform the computation. Usually, those run on the default ForkJoinPool, which is independent from the thread that submits the computation tasks.
If you want to interrupt the computation, you can use a custom execution back-end (like a manually created ForkJoinPool or a thread pool) and then shut it down, as sketched below. You can read about that here.
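A minimal sketch of that approach, assuming Scala 2.12 and its built-in parallel collections (the pool size here is arbitrary):
import java.util.concurrent.ForkJoinPool
import scala.collection.parallel.ForkJoinTaskSupport

// Give the parallel collection its own pool, so the whole
// computation can be aborted by shutting the pool down.
val pool = new ForkJoinPool(4)
val xs = (1 to 3).par
xs.tasksupport = new ForkJoinTaskSupport(pool)

val t = new Thread(() => xs.foreach { _ => Thread.sleep(5000000) })
t.start()
Thread.sleep(300)   // let them spin up
pool.shutdownNow()  // interrupts the pool's workers, aborting the computation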
Or you can provide a callback from the computation.
But none of those approaches is a great fit for such a case.
If you are producing a production solution, or your case is complex and critical for the app, you should probably use something that has cancellation by design, like Monix Task or CancellableFuture.
Or at least use Future and cancel it with workarounds.
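One such workaround, sketched below: run the body on a dedicated thread, expose the result as a Future, and interrupt the thread to cancel. cancellableFuture is a hypothetical helper name, and this only works if the body is interruptible (e.g. it sleeps or waits):
import scala.concurrent.{Future, Promise}
import scala.util.Try

// Hypothetical helper: the Future completes with the body's result,
// and the returned function interrupts the worker thread to cancel.
def cancellableFuture[A](body: => A): (Future[A], () => Unit) = {
  val p = Promise[A]()
  val t = new Thread(() => { p.tryComplete(Try(body)); () })
  t.start()
  (p.future, () => t.interrupt())
}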
For a few days I have been wrapping my head around cats-effect and IO, and I feel I have some misconceptions about this effect, or have simply missed its point.
First of all: if IO can replace Scala's Future, how can we create an async IO task? Using IO.shift? Using IO.async? Is IO.delay sync or async? Can we make a generic async task with code like Async[F].delay(...)? Or does async happen when we run IO with unsafeToAsync or unsafeToFuture?
What's the point of Async and Concurrent in cats-effect? Why are they separated?
Is IO a green thread? If yes, why is there a Fiber object in cats-effect? As I understand it, the Fiber is the green thread, but the docs claim we can think of IOs as green threads.
I would appreciate some clarification on any of this, as I have failed to comprehend the cats-effect docs on these points and the internet was not that helpful...
if IO can replace Scala's Future, how can we create an async IO task
First, we need to clarify what is meant by an async task. Usually async means "does not block the OS thread", but since you're mentioning Future, it's a bit blurry. Say, if I wrote:
Future { (1 to 1000000).foreach(println) }
it would not be async, as it's a blocking loop and blocking output, but it would potentially execute on a different OS thread, as managed by an implicit ExecutionContext. The equivalent cats-effect code would be:
for {
  _ <- IO.shift
  _ <- IO.delay { (1 to 1000000).foreach(println) }
} yield ()

(this is not the shortest possible version)
So,
IO.shift is used to maybe change thread / thread pool. Future does it on every operation, but it's not free performance-wise.
IO.delay { ... } (a.k.a. IO { ... }) does NOT make anything async and does NOT switch threads. It's used to create simple IO values from synchronous side-effecting APIs (see the sketch just below).
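To make the laziness of the second point concrete, a minimal sketch (assuming cats-effect 2.x):
import cats.effect.IO

val io = IO.delay(println("side effect")) // nothing is printed yet; just a description
io.unsafeRunSync()                        // prints "side effect", on the calling thread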
Now, let's get back to true async. The thing to understand here is this:
Every async computation can be represented as a function taking a callback.
Whether you're using an API that returns Future, or Java's CompletableFuture, or something like NIO's CompletionHandler, it can all be converted to callbacks. This is what IO.async is for: you can convert any function taking a callback into an IO. And in a case like:
for {
  _ <- IO.async { ... }
  _ <- IO(println("Done"))
} yield ()
Done will only be printed when (and if) the computation in ... calls back. You can think of it as blocking the green thread, but not the OS thread.
So,
IO.async is for converting any already asynchronous computation to IO.
IO.delay is for converting any completely synchronous computation to IO.
The code with truly asynchronous computations behaves like it's blocking a green thread.
The closest analogy when working with Futures is creating a scala.concurrent.Promise and returning p.future.
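For example, here's a minimal sketch of converting a Future-based API into IO with IO.async, assuming cats-effect 2.x. fromFuture is a hypothetical helper name here, though cats-effect ships a similar IO.fromFuture:
import cats.effect.IO
import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success}

def fromFuture[A](f: => Future[A])(implicit ec: ExecutionContext): IO[A] =
  IO.async { cb =>
    // Register a callback on the Future; the IO's "green thread" is
    // suspended until cb is invoked with either a result or an error.
    f.onComplete {
      case Success(a) => cb(Right(a))
      case Failure(e) => cb(Left(e))
    }
  }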
Or does async happen when we run IO with unsafeToAsync or unsafeToFuture?
Sort of. With IO, nothing happens unless you call one of these (or use IOApp). But IO does not guarantee that you will execute on a different OS thread, or even asynchronously, unless you ask for it explicitly with IO.shift or IO.async.
You can guarantee thread switching any time with e.g. (IO.shift *> myIO).unsafeRunAsyncAndForget(). This is possible exactly because myIO would not be executed until asked for it, whether you have it as val myIO or def myIO.
You cannot magically transform blocking operations into non-blocking ones, however. That's not possible with either Future or IO.
What's the point of Async and Concurrent in cats-effect? Why are they separated?
Async and Concurrent (and Sync) are type classes. They are designed so that programmers can avoid being locked into cats.effect.IO, and can give you an API that supports whatever you choose instead, such as monix Task or Scalaz 8 ZIO, or even a monad transformer type such as OptionT[Task, *something*]. Libraries like fs2, monix and http4s make use of them to give you more choice of what to use them with.
Concurrent adds extra things on top of Async, the most important of them being .cancelable and .start. These do not have a direct analogy with Future, since Future does not support cancellation at all.
.cancelable is a version of .async that allows you to also specify some logic to cancel the operation you're wrapping. A common example is network requests: if you're not interested in the results anymore, you can just abort the request without waiting for the server response, and not waste any sockets or processing time on reading the response. You might never use it directly, but it has its place. A sketch follows below.
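As a sketch, close to the example in the cats-effect docs (assuming cats-effect 2.x, with a ScheduledExecutorService playing the role of the abortable operation):
import java.util.concurrent.ScheduledExecutorService
import cats.effect.IO
import scala.concurrent.duration.FiniteDuration

def delayedTick(d: FiniteDuration)(implicit sc: ScheduledExecutorService): IO[Unit] =
  IO.cancelable { cb =>
    val r = new Runnable { def run(): Unit = cb(Right(())) }
    val f = sc.schedule(r, d.length, d.unit)
    IO(f.cancel(false)) // the cancellation token: aborts the scheduled task
  }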
But what good are cancelable operations if you can't cancel them? The key observation here is that you cannot cancel an operation from within itself. Somebody else has to make that decision, and that will happen concurrently with the operation itself (which is where the type class gets its name). That's where .start comes in. In short,
.start is an explicit fork of a green thread.
Doing someIO.start is akin to doing val t = new Thread(someRunnable); t.start(), except it's green now. And Fiber is essentially a stripped-down version of the Thread API: you can do .join, which is like Thread#join() but does not block an OS thread, and .cancel, which is a safe version of .interrupt().
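A small sketch of that Thread analogy (assuming cats-effect 2.x; outside of IOApp you need the usual implicit ContextShift and Timer boilerplate):
import cats.effect.{ContextShift, IO, Timer}
import cats.syntax.apply._
import scala.concurrent.ExecutionContext
import scala.concurrent.duration._

implicit val cs: ContextShift[IO] = IO.contextShift(ExecutionContext.global)
implicit val timer: Timer[IO] = IO.timer(ExecutionContext.global)

val program: IO[Unit] = for {
  fiber <- (IO.sleep(10.seconds) *> IO(println("done"))).start // fork a green thread
  _     <- IO.sleep(1.second)
  _     <- fiber.cancel // like Thread#interrupt(), but safe: "done" never prints
} yield ()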
Note that there are other ways to fork green threads. For example, doing parallel operations:
val ids: List[Int] = List.range(1, 1000)
def processId(id: Int): IO[Unit] = ???
val processAll: IO[Unit] = ids.parTraverse_(processId)
will fork processing all IDs to green threads and then join them all. Or using .race:
val fetchFromS3: IO[String] = ???
val fetchFromOtherNode: IO[String] = ???
val fetchWhateverIsFaster = IO.race(fetchFromS3, fetchFromOtherNode).map(_.merge)
will execute the fetches in parallel, give you the first result completed, and automatically cancel the slower fetch. So doing .start and using Fiber is not the only way to fork more green threads, just the most explicit one. And that answers:
Is IO a green thread? If yes, why is there a Fiber object in cats-effect? As I understand it, the Fiber is the green thread, but the docs claim we can think of IOs as green threads.
IO is like a green thread, meaning you can have lots of them running in parallel without the overhead of OS threads, and code in a for-comprehension behaves as if it were blocking until the result is computed.
Fiber is a tool for controlling explicitly forked green threads (waiting for completion or cancelling them).
I'm trying to understand Future::select: in this example, the future with a longer time delay is returned first.
When I read this article with its example, I get cognitive dissonance. The author writes:
The select function runs two (or more in case of select_all) futures and returns the first one coming to completion. This is useful for implementing timeouts.
It seems I don't understand the point of select.
extern crate futures; // v0.1 (old)
extern crate tokio_core;

use std::thread;
use std::time::Duration;

use futures::{Async, Future};
use tokio_core::reactor::Core;

struct Timeout {
    time: u32,
}

impl Timeout {
    fn new(period: u32) -> Timeout {
        Timeout { time: period }
    }
}

impl Future for Timeout {
    type Item = u32;
    type Error = String;

    fn poll(&mut self) -> Result<Async<u32>, Self::Error> {
        thread::sleep(Duration::from_secs(self.time as u64));
        println!("Timeout is done with time {}.", self.time);
        Ok(Async::Ready(self.time))
    }
}

fn main() {
    let time_out1 = Timeout::new(5);
    let time_out2 = Timeout::new(1);
    let task = time_out1.select(time_out2);

    let mut reactor = Core::new().unwrap();
    reactor.run(task).ok();
}
I need to process the early future with the smaller time delay, and then work with the future with a longer delay. How can I do it?
TL;DR: use tokio::time
If there's one thing to take away from this: never perform blocking or long-running operations inside of asynchronous operations.
If you want a timeout, use something from tokio::time, such as delay_for or timeout:
use futures::future::{self, Either}; // 0.3.1
use std::time::Duration;
use tokio::time; // 0.2.9

#[tokio::main]
async fn main() {
    let time_out1 = time::delay_for(Duration::from_secs(5));
    let time_out2 = time::delay_for(Duration::from_secs(1));

    match future::select(time_out1, time_out2).await {
        Either::Left(_) => println!("Timer 1 finished"),
        Either::Right(_) => println!("Timer 2 finished"),
    }
}
What's the problem?
To understand why you get the behavior you do, you have to understand the implementation of futures at a high level.
When you call run, there's a loop that calls poll on the passed-in future. The loop continues until the future reports that it has completed with success or failure; any other result means the future isn't done yet.
Your implementation of poll "locks up" this loop for 5 seconds because nothing can break the call to sleep. By the time the sleep is done, the future is ready, thus that future is selected.
The implementation of an async timeout conceptually works by checking the clock every time it's polled and reporting whether enough time has passed.
The big difference is that when a future returns that it's not ready, another future can be checked. This is what select does!
A dramatic re-enactment:
sleep-based timer
core: Hey select, are you ready to go?
select: Hey future1, are you ready to go?
future1: Hold on a seconnnnnnnn [... 5 seconds pass ...] nnnnd. Yes!
simplistic async-based timer
core: Hey select, are you ready to go?
select: Hey future1, are you ready to go?
future1: Checks watch No.
select: Hey future2, are you ready to go?
future2: Checks watch No.
core: Hey select, are you ready to go?
[... polling continues ...]
[... 1 second passes ...]
core: Hey select, are you ready to go?
select: Hey future1, are you ready to go?
future1: Checks watch No.
select: Hey future2, are you ready to go?
future2: Checks watch Yes!
This simple implementation polls the futures over and over until they are all complete. This is not the most efficient, and not what most executors do.
See How do I execute an async/await function without using any external dependencies? for an implementation of this kind of executor.
smart async-based timer
core: Hey select, are you ready to go?
select: Hey future1, are you ready to go?
future1: Checks watch No, but I'll call you when something changes.
select: Hey future2, are you ready to go?
future2: Checks watch No, but I'll call you when something changes.
[... core stops polling ...]
[... 1 second passes ...]
future2: Hey core, something changed.
core: Hey select, are you ready to go?
select: Hey future1, are you ready to go?
future1: Checks watch No.
select: Hey future2, are you ready to go?
future2: Checks watch Yes!
This more efficient implementation hands a waker to each future when it is polled. When a future is not ready, it saves that waker for later. When something changes, the waker notifies the core of the executor that now would be a good time to re-check the futures. This allows the executor to not perform what is effectively a busy-wait.
The generic solution
When you have an operation that is blocking or long-running, the appropriate thing to do is to move that work out of the async loop. See What is the best approach to encapsulate blocking I/O in future-rs? for details and examples.
It should be simple, but I have no idea how to do it. I want to run a ScalaZ Task in the current thread. I was surprised that task.run doesn't run on the current thread, since it is synchronous.
Is it possible to run it on the current thread, and if so, how?
There were some updates and deprecations since http://timperrett.com/2014/07/20/scalaz-task-the-missing-documentation/.
Right now the recommended way of calling task synchronously is:
task.unsafePerformSync // returns result or throws exception
task.unsafePerformSyncAttempt // returns -\/(error) or \/-(result)
Keep in mind, though, that it is not exactly done in the caller's thread: the execution is performed in the thread pool defined for the task, but the caller's thread blocks until the execution is finished. There is no way of making the task run exactly in the same thread.
In general, if Task.async is used, there is no way to make a composite Task always stay in the same thread, as cb (the callback) can be called from any place (any thread). So in a chain like:
Task
  .delay("aaa")
  .map(_ + "bbb")
  .flatMap(x => Task.async(cb => completeCallBackSomewhereElse(cb, x)))
  .map(_ + "ccc")
  .unsafePerformSync
_ + "bbb" is gonna be executed in a caller's thread
_ + "ccc" is gonna be executed in Somewhereelse's thread as scalaz have no control over it.
Basically, this allows a Task to be a powerful instrument for asynchronous operations, so it might not even know about underlying thread pools or even implement behavior without pure threads and wait/notify.
However, there are special cases where it might work as caller-runs:
1) No Strategy/Task.async related stuff:
Task.delay("aaa").map(_ + "bbb").unsafePerformSync
unsafePerformSync uses a CountDownLatch to await the result of runAsync, so if there are no async/non-deterministic operations on the way, runAsync will use the caller's thread:
/**
 * Run this `Future`, passing the result to the given callback once available.
 * Any pure, non-asynchronous computation at the head of this `Future` will
 * be forced in the calling thread. At the first `Async` encountered, control
 * switches to whatever thread backs the `Async` and this function returns.
 */
def runAsync(cb: A => Unit): Unit =
  listen(a => Trampoline.done(cb(a)))
2) You have control over execution strategies, so this simple Java trick will help. Besides, it's already implemented in scalaz and called Strategy.sequential.
P.S.
1) If you simply want to start a computation as soon as possible, use task.now/Task.unsafeStart.
2) If you want something less heavily tied to asynchronous execution but still lazy and stack-safe, you might take a look here (it's for the Cats library): http://eed3si9n.com/herding-cats/Eval.html
3) If you just need to encapsulate side effects, take a look at scalaz.effect.
I came across a problem I have not found an answer for yet.
I am running on Play Framework 2 with Scala.
I was required to write an Action method that performs multiple Future calls.
My questions:
1) Is the attached code non-blocking, and hence looking the way it should?
2) Is there a guarantee that both DAO results are caught at any given time?
def index = Action.async {
  val t2: Future[Tuple2[List[PlayerCol], List[CreatureCol]]] = for {
    p <- PlayerDAO.findAll()
    c <- CreatureDAO.findAlive()
  } yield (p, c)

  t2.map(t => Ok(views.html.index(t._1, t._2)))
}
Thanks for your feedback.
Is the attached code non-blocking, and hence looking the way it should?
That depends on a few things. First, I'm going to assume that PlayerDAO.findAll() and CreatureDAO.findAlive() return Future[List[PlayerCol]] and Future[List[CreatureCol]] respectively. What matters most is what these functions are actually calling themselves. Are they making JDBC calls, or using an asynchronous DB driver?
If the answer is JDBC (or some other synchronous db driver), then you're still blocking, and there's no way to make it fully "non-blocking". Why? Because JDBC calls block their current thread, and wrapping them in a Future won't fix that. In this situation, the most you can do is have them block a different ExecutionContext than the one Play is using to handle requests. This is generally a good idea, because if you have several db requests running concurrently, they can block Play's internal thread pool used for handling HTTP requests, and suddenly your server will have to wait to handle other requests (even ones that don't require database calls).
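A minimal sketch of that idea, where jdbcContext and findAllBlocking are hypothetical names and the pool size is arbitrary (in Play you would more likely configure a dispatcher in application.conf and look it up):
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}

// A dedicated pool for blocking JDBC calls, so they can't starve
// Play's request-handling threads.
val jdbcContext: ExecutionContext =
  ExecutionContext.fromExecutor(Executors.newFixedThreadPool(10))

def findAllBlocking(): List[PlayerCol] = ??? // the blocking JDBC query (stub)
def findAll(): Future[List[PlayerCol]] =
  Future(findAllBlocking())(jdbcContext)     // blocks the dedicated pool, not Play's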
For more on different ExecutionContexts see the thread pools documentation and this answer.
If your answer is an asynchronous database driver like Reactive Mongo (there's also scalike-jdbc, and maybe some others), then you're in good shape, and I probably made you read a little more than you had to. In that scenario, your index controller function would be fully non-blocking.
Is there a guarantee that both DAO results are caught at any given time?
I'm not quite sure what you mean by this. In your current code, you're actually making these calls in sequence: CreatureDAO.findAlive() isn't executed until PlayerDAO.findAll() has returned. Since they are not dependent on each other, it seems like this isn't intentional. To make them run in parallel, you should instantiate the Futures before combining them in a for-comprehension:
def index = Action.async {
  val players: Future[List[PlayerCol]] = PlayerDAO.findAll()
  val creatures: Future[List[CreatureCol]] = CreatureDAO.findAlive()

  val t2: Future[(List[PlayerCol], List[CreatureCol])] = for {
    p <- players
    c <- creatures
  } yield (p, c)

  t2.map(t => Ok(views.html.index(t._1, t._2)))
}
The only thing you can guarantee about both results being completed is that the yield isn't executed until both Futures have completed (or never, if they fail), and likewise the body of t2.map(...) isn't executed until t2 has completed.
Further reading:
Are there any benefits in using non-async actions in Play Framework 2.2?
Understanding the Difference Between Non-Blocking Web Service Calls vs Non-Blocking JDBC