Execute two fs2 tasks concurrently (non-deterministically) - scala

With Scalaz Task I do this with scalaz.Nondeterminism.both:
Nondeterminism[Task]
  .both(
    Task.now("Hello"),
    Task.now("world")
  )
or with Nondeterminism[Task].gatherUnordered().
How can I do the same thing with fs2 0.9.x Tasks?

I'm assuming you're on fs2 version 0.9.x.
To execute several Tasks in parallel, you can simply call Task.start.
Here's an example from the docs:
for {
  f <- Task.start { expensiveTask1 }
  // at this point, `expensiveTask1` is evaluating in the background
  g <- Task.start { expensiveTask2 }
  // now both `expensiveTask1` and `expensiveTask2` are running
  result1 <- f
  // we have forced `f`, so now only `expensiveTask2` may be running
  result2 <- g
  // we have forced `g`, so now nothing is running and we have both results
} yield (result1 + result2)
So in your case it would look like this:
for {
  ta <- Task.start(Task.now("Hello"))
  tb <- Task.start(Task.now("World"))
  a <- ta
  b <- tb
} yield (a, b)
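If you want something closer to Nondeterminism.both as a reusable combinator, you can wrap that pattern in a small helper. A minimal sketch built only on the Task.start pattern shown above (the helper name both is my own, and I'm assuming fs2 0.9's Task.start wants an implicit Strategy in scope):
def both[A, B](ta: Task[A], tb: Task[B])(implicit S: fs2.Strategy): Task[(A, B)] =
  for {
    fa <- Task.start(ta) // fork ta so it evaluates in the background
    fb <- Task.start(tb) // fork tb as well; both are now running
    a  <- fa             // wait for the first result
    b  <- fb             // wait for the second result
  } yield (a, b)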
Note that in the future it might be possible to do something like this with much less boilerplate. There's a PR in the works to add a Parallel type class, which would allow us to write something like this:
(taskA, taskB).parMapN((a, b) => ...)

Related

How to discard other Futures if the critical Future is finished in Scala?

Let's say I have three remote calls to construct my page. One of them (X) is critical for the page, and the other two (A, B) are just used to enhance the experience.
Because criticalFutureX is too important to be affected by futureA and futureB, I want the overall latency of all remote calls to be no more than X's.
That means that as soon as criticalFutureX finishes, I want to discard futureA and futureB.
val criticalFutureX = ...
val futureA = ...
val futureB = ...

// the overall latency of this for-comprehension depends on the longest among X, A and B
for {
  x <- criticalFutureX
  a <- futureA
  b <- futureB
} ...
In the above example, even though they are executed in parallel, the overall latency depends on the longest among X, A and B, which is not what I want.
Latencies:
X: |----------|
A: |---------------|
B: |---|
O: |---------------| (overall latency)
There is firstCompletedOf, but it cannot be used to say explicitly "when criticalFutureX completes".
Is there something like the following?
val criticalFutureX = ...
val futureA = ...
val futureB = ...
for {
  x <- criticalFutureX
  a <- futureA // discard when criticalFutureX finished
  b <- futureB // discard when criticalFutureX finished
} ...
X: |----------|
A: |-----------... discarded
B: |---|
O: |----------| (overall latency)
You can achieve this with a Promise:
import scala.concurrent.{Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success}

def completeOnMain[A, B](main: Future[A], secondary: Future[B]): Future[Option[B]] = {
  val promise = Promise[Option[B]]()
  main.onComplete {
    case Failure(_) =>
    case Success(_) => promise.trySuccess(None)
  }
  secondary.onComplete {
    case Failure(exception) => promise.tryFailure(exception)
    case Success(value)     => promise.trySuccess(Option(value))
  }
  promise.future
}
Some testing code
import scala.concurrent.Await
import scala.concurrent.duration._

private def runFor(first: Int, second: Int) = {
  def run(millis: Int) = Future {
    Thread.sleep(millis)
    millis
  }
  val start = System.currentTimeMillis()
  val combined = for {
    _ <- Future.unit
    f1 = run(first)
    f2 = completeOnMain(f1, run(second))
    r1 <- f1
    r2 <- f2
  } yield (r1, r2)
  val result = Await.result(combined, 10.seconds)
  println(s"It took: ${System.currentTimeMillis() - start}: $result")
}

runFor(3000, 4000)
runFor(3000, 1000)
Produces
It took: 3131: (3000,None)
It took: 3001: (3000,Some(1000))
This kind of task is very hard to achieve efficiently, reliably and safely with Scala standard library Futures. There is no way to interrupt a Future that hasn't completed yet, meaning that even if you choose to ignore its result, it will still keep running and waste memory and CPU time. And even if there was a method to interrupt a running Future, there is no way to ensure that resources that were allocated (network connections, open files etc.) will be properly released.
I would like to point out that the implementation given by Ivan Stanislavciuc has a bug: if the main Future fails, then the promise will never be completed, which is unlikely to be what you want.
I would therefore strongly suggest looking into modern concurrent effect systems like ZIO or cats-effect. These are not only safer and faster, but also much easier. Here's an implementation with ZIO that doesn't have this bug:
import zio.{Exit, Task}
import Function.tupled

def completeOnMain[A, B](
    main: Task[A], secondary: Task[B]): Task[(A, Exit[Throwable, B])] =
  (main.forkManaged zip secondary.forkManaged).use {
    tupled(_.join zip _.interrupt)
  }
Exit is a type that describes how the secondary task ended, i.e. by successfully returning a B, failing with an error (of type Throwable), or being interrupted.
Note that this function can be given a much more sophisticated signature that tells you a lot more about what's going on, but I wanted to keep it simple here.
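For illustration, a hypothetical way to run it with ZIO 1.x's default Runtime (the example tasks and values below are made up):
import zio.Runtime

// Runs both tasks; the secondary fiber is interrupted once the main task finishes.
// The result pairs the main value with an Exit describing how the secondary fiber ended.
val (mainResult, secondaryExit) =
  Runtime.default.unsafeRun(
    completeOnMain(Task.succeed("critical"), Task.succeed(42))
  )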

How do I compose multiple monads? IO / Future and maybe even State and Option

I'm trying to compose monads in Scala while making some requests to a server.
Here is the code snippet I'm using. I'm trying to avoid flatMap where possible and to stick to for-comprehensions. Any ideas? I know about monad transformers, but I don't know how to compose multiple monads. Can anyone help me out?
for {
  session <- getSession(ticker) // IO[Future[Response]]
  crumbF = session.flatMap(response => Future(parseCrumb(response.body))) // Future[String]
  cookiesF = session.flatMap(response => Future(response.cookies)) // Future[Seq[Cookies]]
  crumb = Await.result(crumbF, 5 seconds)
  cookies = Await.result(cookiesF, 5 seconds)
  data <- getData(ticker, startDate, endDate, interval, crumb, cookies.head) // IO[Future[Response]]
  stocksF = data.flatMap { response =>
    import DefaultBodyReadables._
    Future {
      StockDf.mapDataToDf(response.body)
    }
  }
} yield stocksF
So, a few things.
If you launch Futures inside a for-comprehension, they will run in sequence rather than in parallel. If that is your intention, fine; if not, instantiate them outside the for-comprehension (see the sketch after the example below).
You cannot mix monadic contexts inside a for comprehension.
// Yes
for {
  a <- Some(5)
  b <- Some(10)
} yield 5 * 10

// No
for {
  a <- Some(5)
  b <- Future(10)
} yield 5 * 10
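And a minimal sketch of the first point, assuming plain scala.concurrent.Future with the global execution context (remoteCallA and remoteCallB are placeholder names):
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

def remoteCallA(): Future[Int] = Future(5)
def remoteCallB(): Future[Int] = Future(10)

// Started outside the for-comprehension, so both Futures run in parallel.
val fa = remoteCallA()
val fb = remoteCallB()

val combined: Future[Int] = for {
  a <- fa
  b <- fb
} yield a * b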

List to multiple anonymous/underscore parameters in for-comprehension

I'm kind of new to Scala/functional programming, so I'm not yet able to use technical language.
I'm having problems with a for-comprehension:
val queries =
  for {
    _ <- createBanco
    _ <- createBancoMedio
    bankInsertions <- Update[Banco](insertStr).updateMany(NonEmptyList.fromList(createBankList(1, maxBanks)).get)
    mediumInsertions <- Update[BancoMedio](mediumInsert).updateMany(NonEmptyList.fromList(mediumList).get)
    bankCount <- BancoStatements.getCount().unique
    bankGetIds <- BancoStatements.getIds(0, maxBanks).to[List]
    bankSome <- BancoStatements.getSome(halfBanks).to[List]
  } yield (bankCount, bankGetIds, bankSome)

// Execute the database queries and save the results in a tuple
val transactionResults: (Int, List[String], List[Banco]) =
  queries.transact(h2Transactor).unsafeRunSync()
I'm trying to refactor the _ <- createBanco and _ <- createBancoMedio lines, which are both ConnectionIO[Int] values.
I'd like to convert those to a single List(createBanco, createBancoMedio) and then execute transact.
However, I'd be altering the return type of the for-comprehension by doing that. I'd like to know if there is any way of doing that without affecting the for-comprehension's output value.
Basically, treat the list as if I were writing multiple anonymous parameters manually.
You can use .sequence to turn a List[G[A]] into a G[List[A]] if G has an Applicative instance, which ConnectionIO does:
val queries =
  for {
    _ <- List(createBanco, createBancoMedio).sequence
    ...
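If .sequence doesn't resolve, the cats syntax import is probably missing. A minimal, self-contained sketch of the idea (the two CREATE TABLE statements below are placeholders standing in for the question's createBanco and createBancoMedio):
import cats.implicits._
import doobie._
import doobie.implicits._

// Placeholder statements; the real ones come from the question.
val createBanco: ConnectionIO[Int] = sql"CREATE TABLE banco (id INT)".update.run
val createBancoMedio: ConnectionIO[Int] = sql"CREATE TABLE banco_medio (id INT)".update.run

// Both statements run in the same ConnectionIO; their Int results are collected in a List.
val setup: ConnectionIO[List[Int]] = List(createBanco, createBancoMedio).sequence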
Just solved it, I did another for-comprehension for the List:
val createList = for {
  m <- createBancoMedio
  b <- createBanco
} yield List(b, m)

val queries =
  for {
    _ <- createList
    ...
This way I had a ConnectionIO[List[Int]].

Concurrent for-comprehensions

According to this blog post there's a potential performance issue with for comprehensions. For example:
for {
  a <- remoteCallA()
  b <- remoteCallB()
} yield {
  (a, b)
}
has remoteCallB blocked until remoteCallA is completed. The blog post suggests that we do this instead:
val futureA = remoteCallA()
val futureB = remoteCallB()

for {
  a <- futureA
  b <- futureB
} yield {
  (a, b)
}
which will ensure that the two remote calls can start at the same time.
My question: is the above (and therefore the blog writer) correct?
I've not seen people using this pattern, which has got me wondering whether there are alternative patterns that are generally used instead.
With thanks
The for comprehension
for {
  a <- remoteCallA()
  b <- remoteCallB()
} yield {
  (a, b)
}
translates to:
remoteCallA().flatMap(a => remoteCallB().map(b => (a, b)))
So, yes, I believe the blogger is correct in that the calls will be sequential, not concurrent, to one another.
The common pattern for executing several futures simultaneously is to use zip, Future.sequence or Future.traverse. Here are a few examples:
for {
  (a, b) <- remoteCallA() zip remoteCallB()
} yield f(a, b)
This becomes a bit cumbersome when there are more than 2 futures:
for {
  ((a, b), c) <- remoteCallA() zip remoteCallB() zip remoteCallC()
} yield (a, b, c)
In those cases you can use Future.sequence:
for {
  Seq(a, b, c) <- Future.sequence(Seq(remoteCallA(), remoteCallB(), remoteCallC()))
} yield (a, b, c)
or Future.traverse, in case you have a sequence of arguments and want to apply the same Future-returning function to each of them.
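A minimal sketch of Future.traverse, with a made-up fetchUser function standing in for the remote call:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// Hypothetical remote call: one Future per argument.
def fetchUser(id: Int): Future[String] = Future(s"user-$id")

// All three Futures are started as the sequence is traversed; the results are collected in order.
val allUsers: Future[Seq[String]] = Future.traverse(Seq(1, 2, 3))(fetchUser)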
But both approaches have an issue: if one of the Futures fails early, before the others finish, naturally you may want the resulting Future to fail immediately at that moment. But that's not what happens. The result Future is failed only after all the futures have completed. See this question for details: How to implement Future as Applicative in Scala?

How to use Applicative for concurrency?

This is a follow-up to my previous question. I copied the example below from Haxl
Suppose I am fetching data from a blog server to render a blog page, which contains the recent posts, popular posts and post topics.
I have the following Data Fetching API:
val getRecent : Server => Seq[Post] = ...
val getPopular : Server => Seq[Post] = ...
val getTopics : Server => Seq[Topic] = ...
Now I need to compose them to implement a new function getPageData
val getPageData: Server => (Seq[Post], Seq[Post], Seq[Topic])
Haxl suggests using a new monad Fetch to make the API composable.
val getRecent : Fetch[Seq[Post]] = ...
val getPopular : Fetch[Seq[Post]] = ...
val getTopics : Fetch[Seq[Topic]] = ...
Now I can define my getPageData: Fetch[A] with monadic composition
val getPageData = for {
  recent <- getRecent
  popular <- getPopular
  topics <- getTopics
} yield (recent, popular, topics)
but it does not run getRecent, getPopular, and getTopics concurrently.
Haxl suggests using applicative composition <*> to compose "concurrent" functions (i.e. the functions that can run concurrently). So my questions are:
How to implement getPageData assuming Fetch[A] is an Applicative ?
How to implement Fetch as an Applicative but not a Monad ?
How to implement getPageData assuming Fetch[A] is an Applicative ?
All we need to do is drop the monadic bind >>= in favour of the applicative <*>. So instead of
val getPageData = for {
  recent <- getRecent
  popular <- getPopular
  topics <- getTopics
} yield (recent, popular, topics)
we would write something like (in Haskell syntax; sorry, I can't do Scala off the top of my head):
getPageData = makeTriple <$> getRecent <*> getPopular <*> getTopics
  where
    makeTriple x y z = (x, y, z)
But whether this has the desired effect is contingent upon the second question!
How to implement Fetch as an Applicative but not a Monad ?
The key distinction between monadic and applicative sequencing is that a monadic computation can depend on the value produced by a previous one, whereas the applicative <*> cannot. Notice how the monadic expression for getPageData above binds the names recent and popular before reaching getTopics. Those names could have been used to change the structure of the expression, for example by consulting some other data source in case recent is empty. But with the applicative expression, the results of getRecent and getPopular are not factors in the structure of the expression itself. This property allows us to fire off each term in the applicative expression concurrently, because we know the structure of the expression statically.
So, using the observation above, and obviously the particular shape of the Fetch datatype, we can come up with a suitable definition for <*>. I think the following illustrates the general idea:
import Control.Concurrent.Async (async, wait)

data Fetch a = Fetch { runFetch :: IO a }

fetchF <*> fetchX = Fetch $ do
  -- Fire off both IOs concurrently.
  resultF <- async $ runFetch fetchF
  resultX <- async $ runFetch fetchX
  -- Wait for both results to be ready.
  f <- wait resultF
  x <- wait resultX
  return $ f x
For comparison, suppose we tried to do monadic bind with concurrent evaluation:
fetchF >>= fetchK = Fetch $ do
  resultF <- async $ runFetch fetchF
  -- Oh no, we need resultF in order to produce the next
  -- Fetch value! We just have to wait...
  f <- wait resultF
  resultX <- async $ runFetch (fetchK f)
  x <- wait resultX
  return x
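Since the answer is written in Haskell, here is a rough Scala sketch of the same contrast, under the assumption that Fetch simply wraps a thunk producing a Future (all names below are my own, not Haxl's):
import scala.concurrent.{ExecutionContext, Future}

// A toy Fetch: a description of a computation that, when run, yields a Future.
final case class Fetch[A](runFetch: () => Future[A])

// Applicative-style combination: both fetches are started before either result
// is awaited, so they can run concurrently.
def map2[A, B, C](fa: Fetch[A], fb: Fetch[B])(f: (A, B) => C)(
    implicit ec: ExecutionContext): Fetch[C] =
  Fetch { () =>
    val runningA = fa.runFetch() // fire off both computations
    val runningB = fb.runFetch()
    runningA.zipWith(runningB)(f) // combine once both are done
  }

// Monadic bind, by contrast, must wait for the first result before it can even
// construct the second Fetch, so the two cannot overlap.
def flatMap[A, B](fa: Fetch[A])(k: A => Fetch[B])(
    implicit ec: ExecutionContext): Fetch[B] =
  Fetch { () => fa.runFetch().flatMap(a => k(a).runFetch()) }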