PureScript - Replace `launchAff_` with `launchAff`

Consider the following working code example, which uses launchAff_ to copy a JSON file:
module Main where

import Prelude

import Effect (Effect)
import Effect.Aff (Aff, launchAff_)
import Node.Encoding (Encoding(..))
import Node.FS.Aff (readTextFile, writeTextFile)
import Node.Path (FilePath)

duplicateCustomerData :: FilePath -> FilePath -> Aff Unit
duplicateCustomerData filePath1 filePath2 = do
  customer_data <- readTextFile UTF8 filePath1
  writeTextFile UTF8 filePath2 customer_data

main :: Effect Unit
main = launchAff_ do
  duplicateCustomerData "database/customer-1.json" "database/customer-2.json"
However, if the main function is changed to use launchAff (without the underscore), like so:
main :: Effect Unit
main = launchAff do
  duplicateCustomerData "database/customer-1.json" "database/customer-2.json"
The compiler reports the following error:
Could not match type
  Fiber Unit
with type
  Unit
So my questions are:
What is a Fiber Unit?
What is the use case for this Fiber Unit? Is it meant to be discarded? What does it represent?
Why are there two launchAff functions?

Fiber represents a running async computation.
The difference between Fiber and Aff is the "running" part. An Aff is not an async computation that is running, but rather a way to start one. It doesn't start until it is launched, either directly (e.g. with launchAff) or as part of a larger computation that gets launched. And you can start the same Aff multiple times.
A Fiber, on the other hand, is an async computation that has already been started; it's already in progress. You can kill it with killFiber, wait for it to complete with joinFiber, and do a few other things. See the docs.
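And now that we know that (1) an Aff is a way to start an async computation, and (2) a Fiber is an already-running async computation, it should become obvious that a Fiber is the result of starting an Aff. That is exactly what you observe: launchAff has the type forall a. Aff a -> Effect (Fiber a), so it takes an Aff and returns an effect that starts it and hands you the resulting Fiber. Since main is declared as Effect Unit, but launchAff do ... has type Effect (Fiber Unit), the compiler complains that it cannot match Fiber Unit with Unit.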
But sometimes (in my experience, most of the time) we don't actually need the resulting Fiber: there is nothing we're going to do with it. For those cases there is a convenience shortcut, launchAff_. It does the same thing as launchAff but throws away the resulting Fiber (it is essentially void <<< launchAff). See it for yourself in the source code.
Adding an underscore at the end to mean "returns unit" is a common pattern. For example, there are for and for_, there are traverse and traverse_, there are runAff and runAff_.
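As a loose analogy in Scala (the language used in the rest of this page), here is a sketch in which a hypothetical Recipe[A] = () => Future[A] plays the role of an Aff (a description that is not running yet) and the running Future plays the role of a Fiber. The names launch and launch_ are hypothetical, chosen to mirror launchAff and launchAff_; they are not any library's API. A Scala Future cannot be killed the way a Fiber can, so the analogy only covers starting a computation and optionally keeping a handle to it.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical stand-ins, not PureScript's API:
// a Recipe is like an Aff (a description), a running Future is like a Fiber.
type Recipe[A] = () => Future[A]

// ~ launchAff: start the recipe and hand back the running computation.
def launch[A](recipe: Recipe[A]): Future[A] = recipe()

// ~ launchAff_: start it the same way, but throw the handle away.
def launch_[A](recipe: Recipe[A]): Unit = {
  launch(recipe)
  ()
}

val runs = new AtomicInteger(0)
val copyFile: Recipe[Int] = () => Future { runs.incrementAndGet() }

// Defining copyFile started nothing yet.
val beforeLaunch = runs.get()

val handle: Future[Int] = launch(copyFile) // keep the handle so we can wait on it
val firstRun = Await.result(handle, 5.seconds)

launch_(copyFile) // fire and forget: there is nothing to wait on
```

The underscore variant is convenient precisely when, as in the question's main, nobody is going to use the handle.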

Related

Regarding Scala Futures being eager

I have been trying to understand why Scala Futures are regarded as eager and violate referential transparency. I think I understand this part reasonably well. However, I have trouble understanding what this means:
(A => Unit) => Unit
With respect to a Future.
I am not sure if this is the right forum, but ELI5 answers are appreciated.
The reason why Future is regarded as eager (and as such violates referential transparency) is that it starts evaluating as soon as the value is defined. Below are an ELI5 and a non-ELI5 explanation of this.
As for (A => Unit) => Unit, it's a signature for the callback-driven asynchronous computation. In a synchronous computation, you evaluate the Future[A] to A, even if it means sitting in place and waiting a long time for the evaluation to finish. But with asynchronous computation, you don't sit and wait; instead, you pass a function of type A => Unit, and you immediately get the Unit back. Later, when the computation has finished in the background and value A has been produced, function A => Unit will be applied to it. So basically you tell the Future "once you obtain A, here's what I want you to do with it", and it responds "OK, will do, here's a Unit for you, leave now and do other stuff".
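In the standard library this shape shows up as Future#foreach, which registers a callback and returns Unit immediately. The aliases Callback and CallbackAsync and the helper asCallbackStyle are hypothetical names used only to spell out the (A => Unit) => Unit type; the Promise is there only so that the example can wait for the callback at the end.

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical aliases naming the shape from the question.
type Callback[A] = A => Unit
type CallbackAsync[A] = Callback[A] => Unit

// Future#foreach registers a callback and returns Unit right away,
// so a Future[A] can be viewed as an (A => Unit) => Unit.
def asCallbackStyle[A](fut: Future[A]): CallbackAsync[A] =
  (cb: Callback[A]) => fut.foreach(cb)

val answer: Future[Int] = Future { 21 * 2 }
val received = Promise[Int]()

// "Once you obtain A, here's what I want you to do with it."
// This call returns Unit immediately; the callback runs later.
asCallbackStyle(answer)(a => received.success(a))

val delivered = Await.result(received.future, 5.seconds)
```

Note that foreach only fires on success; onComplete is the variant whose callback receives a Try and therefore also sees failures.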
TBH I wouldn't overthink this signature too much, because that's not what your mental model of working with Future should be. Instead, just become familiar with the notions of mapping and flatMapping. When you have a value wrapped in a Future, you shouldn't try to get that value out of the Future context, because that would be a blocking synchronous operation. What you can do is map over it and say "alright Future, I don't need this value A right now, I just want to describe to you a function A => B which turns it into another value B, and you make sure to apply it once you have the original A". And if B is wrapped in yet another Future, meaning your function is not A => B but A => Future[B], you should use flatMap instead of map. This is how you chain asynchronous operations: imagine a database query that needs, as a parameter, something returned by the previous query.
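A minimal sketch of that chaining, assuming two hypothetical "database queries" (findUserId and loadUser) faked with immediately computed Futures. The second query needs the first query's result, so it goes through flatMap; the final transformation is a plain map. The single Await at the end plays the role of the "end of the world".

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical "database queries"; the second one needs the first one's result.
final case class User(id: Int, name: String)
def findUserId(name: String): Future[Int] = Future { 7 }
def loadUser(id: Int): Future[User] = Future { User(id, "alice") }

// flatMap takes A => Future[B]; map takes A => B. Nothing here blocks.
val greeting: Future[String] =
  findUserId("alice")
    .flatMap(id => loadUser(id))         // next query depends on the previous result
    .map(user => s"hello, ${user.name}") // plain transformation of the value

// Only at the "end of the world" do we wait for the final value.
val message = Await.result(greeting, 5.seconds)
```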
And that's it. Somewhere at the end of the world, e.g. when you're done processing an http request and are ready to send some response payload over the wire, you will finally unwrap that future in a synchronous way (you can't send a payload if you don't know what to put in it).
Now, about referential transparency in Future:
ELI5:
Imagine you have two daughters, Anna and Betty. You tell them that their task will be to count to 20 out loud. You also tell them that Betty should start only after Anna is done. The whole process is therefore expected to take about 40 seconds.
But if they evaluate their task eagerly (like Future does), they will each start counting as soon as you explain the task to them. The whole process will therefore last only about 20 seconds.
In the context of programming, referential transparency says that you should always be able to replace (pseudocode):
// imagine >> as a pipe operator which starts the next function
// only after previous one has terminated
count(20) >> count(20)
with
anna = count(20)
betty = count(20)
anna >> betty
but that's not true in this situation because of eager evaluation (the girls start counting as soon as their task is explained to them, so in the second case the program will last only 20 seconds regardless of the pipe).
non-ELI5:
Let's prepare an execution context for Future and a function that will be evaluated. It simply sleeps for two seconds before printing "hi".
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global

def f = {
  Thread.sleep(2000)
  println("hi")
}
Let's now write a for comprehension which will create two Futures one after another:
val done = for {
  f1 <- Future(f)
  f2 <- Future(f)
} yield (f1, f2)

import scala.concurrent.duration._
Await.result(done, 5000.millis)
As expected, after two seconds we'll get the first "hi" (from f1), and after additional two seconds we'll get the second "hi" (from f2).
Now let's do a small modification; we will first define two Future values, and then we'll use those in the for comprehension:
val future1 = Future(f)
val future2 = Future(f)

val done = for {
  f1 <- future1
  f2 <- future2
} yield (f1, f2)

import scala.concurrent.duration._
Await.result(done, 5000.millis)
What happens this time is that after approximately two seconds you get two simultaneous "hi" printouts. This is because both future1 and future2 started getting evaluated as soon as they were defined. By the time they got chained in the for comprehension, they were already running alongside each other on the given execution context.
This is why referential transparency is broken; normally you should be able to replace:
doStuff(foo)
with
val f = foo
doStuff(f)
without having any consequence on the behaviour of the program, but in the case of Future, as you can see above, that's not the case.
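The same effect can be shown deterministically by counting how many times a body has started instead of printing. In this sketch (all names are illustrative), a Future stored in a val starts running immediately, while a thunk of type () => Future[Int] only describes the work. Wrapping a Future in a thunk or a def is a common workaround rather than something from the answer above; in this example it gives back the sequential meaning you would expect from the refactoring.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

val starts = new AtomicInteger(0)

// Eager: the body is scheduled the moment the val is defined.
val eagerTask: Future[Int] = Future { starts.incrementAndGet() }
Await.ready(eagerTask, 5.seconds)
val afterEagerDefinition = starts.get() // 1: it ran although nothing chained it

// Lazy: the thunk only describes the work; nothing runs until it is called.
val lazyTask: () => Future[Int] = () => Future { starts.incrementAndGet() }
val afterLazyDefinition = starts.get() // still 1

// Naming the thunk first and chaining it afterwards keeps the sequential meaning:
// the second run starts only after the first one has finished.
val both = for {
  a <- lazyTask()
  b <- lazyTask()
} yield (a, b)
val results = Await.result(both, 5.seconds) // (2, 3)
```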

Cats-effect and asynchronous IO specifics

For a few days I have been wrapping my head around cats-effect and IO, and I feel I either have some misconceptions about this effect or have simply missed its point.
First of all - if IO can replace Scala's Future, how can we create an async IO task? Using IO.shift? Using IO.async? Is IO.delay sync or async? Can we make a generic async task with code like this Async[F].delay(...)? Or async happens when we call IO with unsafeToAsync or unsafeToFuture?
What's the point of Async and Concurrent in cats-effect? Why are they separated?
Is IO a green thread? If yes, why is there a Fiber object in cats-effect? As I understand the Fiber is the green thread, but docs claim we can think of IOs as green threads.
I would appreciate some clarification on any of this, as I have failed to comprehend the cats-effect docs on these topics and the internet was not that helpful...
if IO can replace Scala's Future, how can we create an async IO task
First, we need to clarify what is meant by an async task. Usually async means "does not block the OS thread", but since you're mentioning Future, it's a bit blurry. Say I wrote:
Future { (1 to 1000000).foreach(println) }
it would not be async, as it's a blocking loop and blocking output, but it would potentially execute on a different OS thread, as managed by an implicit ExecutionContext. The equivalent cats-effect code would be:
for {
  _ <- IO.shift
  _ <- IO.delay { (1 to 1000000).foreach(println) }
} yield ()
(this is not the shortest way to write it)
So,
IO.shift is used to (possibly) switch to another thread or thread pool. Future does this on every operation, but it's not free performance-wise.
IO.delay { ... } (a.k.a. IO { ... }) does NOT make anything async and does NOT switch threads. It's used to create simple IO values from synchronous side-effecting APIs.
Now, let's get back to true async. The thing to understand here is this:
Every async computation can be represented as a function taking callback.
Whether you're using an API that returns a Future or Java's CompletableFuture, or something like NIO's CompletionHandler, all of it can be converted to callbacks. This is what IO.async is for: you can convert any function taking a callback to an IO. And in a case like:
for {
  _ <- IO.async { ... }
  _ <- IO(println("Done"))
} yield ()
Done will be only printed when (and if) the computation in ... calls back. You can think of it as blocking the green thread, but not OS thread.
So,
IO.async is for converting any already asynchronous computation to IO.
IO.delay is for converting any completely synchronous computation to IO.
The code with truly asynchronous computations behaves like it's blocking a green thread.
The closest analogy when working with Futures is creating a scala.concurrent.Promise and returning p.future.
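That Promise analogy can be sketched with the standard library alone. fetchWithCallback is a hypothetical callback-based API (it simply reports a result from a background thread after a short sleep); fetchAsFuture adapts it by completing a Promise from inside the callback, which is roughly the Future counterpart of what IO.async does for IO.

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._

// Hypothetical callback-based API: it returns Unit immediately and reports
// its result later by invoking the callback on another thread.
def fetchWithCallback(cb: Either[Throwable, Int] => Unit): Unit = {
  val worker = new Thread(() => {
    Thread.sleep(100) // stands in for real asynchronous work
    cb(Right(42))
  })
  worker.start()
}

// Future counterpart of IO.async: complete a Promise from inside the callback.
def fetchAsFuture(): Future[Int] = {
  val p = Promise[Int]()
  fetchWithCallback {
    case Right(value) => p.success(value)
    case Left(error)  => p.failure(error)
  }
  p.future
}

val fetched = Await.result(fetchAsFuture(), 5.seconds)
```

No thread is blocked while waiting for the callback; the Await at the end is only there so the example can observe the result.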
Or async happens when we call IO with unsafeToAsync or unsafeToFuture?
Sort of. With IO, nothing happens unless you call one of these (or use IOApp). But IO does not guarantee that it will execute on a different OS thread, or even asynchronously, unless you ask for this explicitly with IO.shift or IO.async.
You can guarantee thread switching at any time with e.g. (IO.shift *> myIO).unsafeRunAsyncAndForget(). This is possible exactly because myIO is not executed until asked to be, whether you have it as val myIO or def myIO.
You cannot magically transform blocking operations into non-blocking ones, however. That's not possible with either Future or IO.
What's the point of Async and Concurrent in cats-effect? Why are they separated?
Async and Concurrent (and Sync) are type classes. They are designed so that library authors can avoid being locked into cats.effect.IO and can give you an API that supports whatever you choose instead, such as Monix Task or Scalaz 8 ZIO, or even a monad transformer type such as OptionT[Task, *something*]. Libraries like fs2, Monix and http4s use them to give you more choice of which effect type to use them with.
Concurrent adds extra things on top of Async, most important of them being .cancelable and .start. These do not have a direct analogy with Future, since that does not support cancellation at all.
.cancelable is a version of .async that also lets you specify some logic to cancel the operation you're wrapping. A common example is network requests: if you're no longer interested in the results, you can abort them without waiting for the server's response and avoid wasting sockets or processing time on reading the response. You might never use it directly, but it has its place.
But what good are cancelable operations if you can't cancel them? Key observation here is that you cannot cancel an operation from within itself. Somebody else has to make that decision, and that would happen concurrently with the operation itself (which is where the type class gets its name). That's where .start comes in. In short,
.start is an explicit fork of a green thread.
Doing someIO.start is akin to doing val t = new Thread(someRunnable); t.start(), except it's green now. And Fiber is essentially a stripped-down version of the Thread API: you can do .join, which is like Thread#join() but does not block an OS thread, and .cancel, which is a safe version of .interrupt().
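For comparison, this is the plain OS-thread version of the fork / cancel / join pattern that Fiber is being compared to, using only java.lang.Thread. It is a sketch of the analogy, not cats-effect code: interrupt only works here because Thread.sleep reacts to it, and join really does block the calling OS thread.

```scala
import java.util.concurrent.atomic.AtomicReference

val outcome = new AtomicReference[String]("running")

val worker = new Thread(() => {
  try {
    Thread.sleep(60000) // stands in for some long-running work
    outcome.set("finished")
  } catch {
    case _: InterruptedException => outcome.set("interrupted")
  }
})

worker.start()     // ~ someIO.start: fork (here an OS thread, not a green one)
worker.interrupt() // ~ fiber.cancel: only effective if the code reacts to interrupts
worker.join()      // ~ fiber.join: but this blocks the calling OS thread
```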
Note that there are other ways to fork green threads. For example, doing parallel operations:
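import cats.effect.IO
import cats.syntax.all._ // parTraverse_ syntax; needs Parallel[IO] (in cats-effect 2, an implicit ContextShift[IO] must be in scope)

val ids: List[Int] = List.range(1, 1000)
def processId(id: Int): IO[Unit] = ???
val processAll: IO[Unit] = ids.parTraverse_(processId)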
will fork the processing of all IDs into green threads and then join them all. Or, using .race:
val fetchFromS3: IO[String] = ???
val fetchFromOtherNode: IO[String] = ???
val fetchWhateverIsFaster = IO.race(fetchFromS3, fetchFromOtherNode).map(_.merge)
will execute the fetches in parallel, give you the first result to complete, and automatically cancel the slower fetch. So doing .start and using Fiber is not the only way to fork more green threads, just the most explicit one. And that answers:
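Roughly comparable operations exist for plain Futures in the standard library, which may help map these ideas onto familiar code. In this sketch, processId, fetchFromS3 and fetchFromOtherNode are hypothetical stand-ins with made-up delays; Future.traverse plays the role of parTraverse and Future.firstCompletedOf the role of IO.race. The important difference is that a Future cannot be cancelled, so the slower fetch keeps running in the background. The sleeps are wrapped in blocking so the global pool can add threads instead of starving.

```scala
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical stand-ins for the IO examples above.
def processId(id: Int): Future[Int] = Future(id * 2)
def fetchFromS3: Future[String] = Future { blocking(Thread.sleep(2000)); "from S3" }
def fetchFromOtherNode: Future[String] = Future { blocking(Thread.sleep(100)); "from other node" }

val ids: List[Int] = List.range(1, 1000)

// ~ parTraverse: each processId call starts its Future right away,
// so they run concurrently; traverse then collects all the results.
val processed: List[Int] = Await.result(Future.traverse(ids)(processId), 5.seconds)

// ~ IO.race(...).map(_.merge): the first to complete wins.
// Unlike IO.race, the slower Future is NOT cancelled; it keeps running.
val fetchWhateverIsFaster: Future[String] =
  Future.firstCompletedOf(List(fetchFromS3, fetchFromOtherNode))
val winner = Await.result(fetchWhateverIsFaster, 5.seconds)
```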
Is IO a green thread? If yes, why is there a Fiber object in cats-effect? As I understand the Fiber is the green thread, but docs claim we can think of IOs as green threads.
IO is like a green thread, meaning you can have lots of them running in parallel without the overhead of OS threads, and the code in a for-comprehension behaves as if it were blocking for the result to be computed.
Fiber is a tool for controlling green threads explicitly forked (waiting for completion or cancelling).

Task#apply versus Task#delay

Given the following scalaz.concurrent.Task instance, created via Task#delay:
val t =
  Task.delay { println(Thread.currentThread); Thread.sleep(5000); 42 }
I wrote a method that will run t asynchronously.
def f = t.runAsync {
  case \/-(x) => println(x)
  case -\/(e) => println(e.getMessage)
}
Running it shows that f evaluates entirely, i.e. waits 5 seconds, before it evaluates again. In other words, the second f appears to wait until the first f has completed:
scala> {f; f; }
Thread[run-main-0,5,run-main-group-0]
42
Thread[run-main-0,5,run-main-group-0]
42
Then, I re-wrote t using Task#apply:
val u =
  Task { println(Thread.currentThread); Thread.sleep(5000); 42 }
Again, I defined a method that executes u with runAsync:
def g = u.runAsync {
  case \/-(x) => println(x)
  case -\/(e) => println(e.getMessage)
}
Finally, I ran two g's.
scala> {g; g}
Thread[pool-3-thread-2,5,run-main-group-0]
Thread[pool-3-thread-3,5,run-main-group-0]
scala> 42
42
However, in the above result, the g's, more or less, ran at the same time.
I had expected that {f; f; } would run asynchronously, i.e. in the same way as {g; g}. But it seems that calling f blocked.
EDIT
Task's docs note on runAsync:
Any pure, non-asynchronous computation at the head of this Future will be forced in the calling thread.
Since t's body is non-asynchronous, I suppose that the above comment explains why it blocked, i.e. "forced in the calling thread."
When is the right time to use Task#delay versus Task#apply?
You can think of Task.delay as a fancy version of something like () => Try[A]. It suspends evaluation of the computation, but doesn't have anything to say about what thread that evaluation is eventually going to run on, etc. (which means it's just going to run on the current thread).
This is often exactly what you want. Consider a definition like this:
val currentTime: Task[Long] = Task.xxx(System.currentTimeMillis)
We can't use now because that would evaluate the time immediately (and only once, on definition). We could use apply, but forcing an asynchronous boundary for this computation is wasteful and unnecessary: we actually want it to run in the current thread, just not right now. This is exactly what delay provides.
In general, when you're modeling your computations, if something is always going to be computationally expensive, you might want to consider Task.apply, which means the evaluation will always happen on a thread determined by the current implicit ExecutorService. This may make usage a little cleaner, at the expense of flexibility: you're baking something you know about the runtime characteristics of the computation into its definition.
The nice thing about using delay to define your asynchronous computations is that you can always force an asynchronous boundary by wrapping your Task with Task.fork, which gets you essentially the same thing you'd have if you'd defined the computation with Task.apply. It's not possible to go in the other direction: if you use Task.apply, the implicit strategy is going to determine where the computation is evaluated, and that's all there is to it.
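To make the now / delay / fork distinction concrete without depending on scalaz, here is a toy model (a hypothetical ToyTask, not scalaz's API) in which a task is just a () => Try[A], the representation suggested above. now evaluates its argument once, at definition; delay re-evaluates on every run, on the caller's thread; and fork (returning a plain Future here, for simplicity) pushes the work onto a thread pool. fakeClock is a hypothetical deterministic stand-in for System.currentTimeMillis.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Try

// A toy model of the semantics described above; NOT scalaz's Task API.
final case class ToyTask[A](run: () => Try[A])

object ToyTask {
  // ~ Task.now: the argument is evaluated once, at the call site.
  def now[A](a: A): ToyTask[A] = ToyTask(() => Try(a))
  // ~ Task.delay: the argument is suspended and re-evaluated on each run, on the caller's thread.
  def delay[A](a: => A): ToyTask[A] = ToyTask(() => Try(a))
  // ~ Task.fork: force an asynchronous boundary (returns a Future here for simplicity).
  def fork[A](t: ToyTask[A]): Future[A] = Future(t.run().get)
}

// Hypothetical deterministic clock standing in for System.currentTimeMillis.
var ticks = 0L
def fakeClock(): Long = { ticks += 1; ticks }

val timeNow = ToyTask.now(fakeClock())       // clock read right here, once
val timeDelayed = ToyTask.delay(fakeClock()) // clock not read yet

val nowReads = List(timeNow.run().get, timeNow.run().get)             // same value twice
val delayedReads = List(timeDelayed.run().get, timeDelayed.run().get) // fresh value on each run

// delay runs on the calling thread; fork moves the work to the pool.
val caller = Thread.currentThread
val delayedThread = ToyTask.delay(Thread.currentThread).run().get
val forkedThread = Await.result(ToyTask.fork(ToyTask.delay(Thread.currentThread)), 5.seconds)
```

As in the answer, going from delay to a forked run is a one-line wrapper, while a task that already hard-codes its thread pool offers no way back.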

Mixing in asynchronous code within plain synchronous flow in Scala

What is in your view the best scala solution for the case that you have some plain chain of synchronous function calls, and you need to add an asynchronous action in the middle of it without blocking?
I think going for futures means either refactoring existing code into callback spaghetti, threading futures all the way through the formerly synchronous chain of function calls, or polling the promise at intervals. None of that is optimal, but maybe I am missing some simple, suitable option here.
Refactoring to (Akka) actors seems to entail a whole lot of boilerplate for such a simple feat of engineering.
How would you plug in an asynchronous function within an existing synchronous flow, without blocking, and without going into a lot of boilerplate?
Right now in my code I block with Await.result, which just means a thread is sleeping...
One simple dirty trick:
Let's say you have:
def f1Sync: T1
def f2Sync: T2
...
def fAsynchronous: Future[TAsync]
import scala.concurrent.{ Future, Promise }

object FutureHelper {
  // A value class is much cheaper than an implicit conversion.
  implicit class FutureConverter[T](val t: T) extends AnyVal {
    def future: Future[T] = Promise.successful(t).future
  }
}
Then you can use a for/yield:
import FutureHelper._
import scala.concurrent.ExecutionContext.Implicits.global // map/flatMap on Future need an ExecutionContext

def completeChain: Future[Whatever] = {
  for {
    r1 <- f1Sync.future
    r2 <- f2Sync.future
    // ... all your other sync calls
    rn <- fAsynchronous // this should also return a future
    rnn <- f50Sync(rn).future // you can even pass the result of the async call to the next function
  } yield rn
}
There is minimal boilerplate: you convert your sync calls to immediately resolved futures. Don't be tempted to do that with Future.apply[T](t), or simply Future(t), as that schedules a task on the executor for every value; it also won't compile without an implicit ExecutionContext.
With promises you pay the price of a few extra Promise allocations, which is negligible, and you get what you want. The for/yield acts as a sequencer: it waits for every result in turn, so you can even do something with your async result after it has been produced.
It will "wait" for every sync call to complete as well, which will happen immediately by design, except for the async call where normal async processing will be automatically employed.
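A runnable version of the same idea, with hypothetical steps parse, double, lookupRemote and shout standing in for f1Sync ... fAsynchronous. It uses Future.successful instead of the FutureHelper converter; both give an already-completed Future without scheduling any work on the executor.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical synchronous steps and one asynchronous step.
def parse(s: String): Int = s.trim.toInt
def double(n: Int): Int = n * 2
def lookupRemote(n: Int): Future[String] = Future { s"item-$n" } // the async call
def shout(s: String): String = s.toUpperCase

def pipeline(input: String): Future[String] =
  for {
    a <- Future.successful(parse(input)) // already completed, no work scheduled
    b <- Future.successful(double(a))
    c <- lookupRemote(b)                 // genuinely asynchronous step
    d <- Future.successful(shout(c))     // uses the async result
  } yield d

val out = Await.result(pipeline(" 21 "), 5.seconds)
```

One design caveat: with Future.successful (and with the FutureHelper converter), a sync step that throws does so before any Future exists, so the exception escapes instead of producing a failed Future. Future.fromTry(Try(step)) captures it if that matters.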
You could also use the async/await library, with the caveat that you wind up with one big Future out of it that you still have to deal with:
http://docs.scala-lang.org/sips/pending/async.html
But it results in code almost identical to the synchronous code: where you were previously blocking, you add:
await { theAsyncFuture }
and then carry on with synchronous code.

How to wait for result of Scala Futures in for comprehension

I am having an issue with the code below. I want the combine method to be triggered only after groundCoffee, heatedWater and frothedMilk have all completed. They would be triggered concurrently. All four methods, grind, heatWater, frothMilk and brew, are executed concurrently using futures.
def prepareCappuccino(): Future[Cappuccino] = {
  val groundCoffee = grind("arabica beans")
  val heatedWater = heatWater(Water(20))
  val frothedMilk = frothMilk("milk")
  for {
    ground <- groundCoffee
    water <- heatedWater
    foam <- frothedMilk
    espresso <- brew(ground, water)
  } yield combine(espresso, foam)
}
When I execute the above method, this is the output I get:
start grinding...
heating the water now
milk frothing system engaged!
And the program exits after this. I got this example from a site while I was trying to learn about futures. How can the program be made to wait so that the combine method gets triggered after all the futures return?
The already-posted solution of using Await on a future is appropriate when you deliberately want to block execution on that thread. Two common reasons to do this are testing, when you want to wait for the outcome before making an assertion, and cases where otherwise all threads would exit (as with toy examples).
However in a proper long lived application Await is generally to be avoided.
Your question already contains one of the correct ways to compose futures: a for-comprehension. Bear in mind that for-comprehensions are converted to flatMap, map and withFilter calls, so any futures you create inside the for-comprehension will only be created after the previous ones complete, i.e. serially.
If you want a bunch of futures to run concurrently, create them before entering the for-comprehension, as you have done.
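A self-contained toy version of the question's code, with hypothetical String/Int stand-ins for the coffee types and short sleeps in place of real work. The three futures are created before the for-comprehension, so they run concurrently, and a single Await.result at the very end keeps the toy program alive until combine has run, which is the "toy example" use of Await described above.

```scala
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical, simplified stand-ins for the question's steps.
def grind(beans: String): Future[String] = Future { blocking(Thread.sleep(200)); s"ground $beans" }
def heatWater(temp: Int): Future[Int] = Future { blocking(Thread.sleep(200)); 85 } // pretend it reached 85 degrees
def frothMilk(milk: String): Future[String] = Future { blocking(Thread.sleep(200)); s"frothed $milk" }
def brew(coffee: String, water: Int): Future[String] = Future(s"espresso($coffee @ $water)")
def combine(espresso: String, foam: String): String = s"$espresso + $foam"

def prepareCappuccino(): Future[String] = {
  // Created here, before the for-comprehension, so these three run concurrently.
  val groundCoffee = grind("arabica beans")
  val heatedWater = heatWater(20)
  val frothedMilk = frothMilk("milk")
  for {
    ground <- groundCoffee
    water <- heatedWater
    foam <- frothedMilk
    espresso <- brew(ground, water) // created only once ground and water exist
  } yield combine(espresso, foam)
}

// Toy program: block the main thread once, at the very end,
// so the JVM doesn't exit before the futures finish.
val cup = Await.result(prepareCappuccino(), 5.seconds)
```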
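You can use Await here:
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global // Future.sequence needs an ExecutionContext
import scala.concurrent.duration.Duration

val f = Future.sequence(futures.toList)
Await.ready(f, Duration.Inf)
I assume you have all the futures packed in a list. Await.ready then does all the waiting.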