Most idiomatic way to mix synchronous, asynchronous, and parallel computation in a Scala for comprehension of futures

Suppose I have 4 future computations to do. The first two can be done in parallel, but the third must be done after the first two (even though the values of the first two are not used in the third -- think of each computation as a command that performs some db operation). Finally, there is a 4th computation that must occur after all of the first 3. Additionally, there is a side effect that can be started after the first 3 complete (think of this as kicking off a periodic runnable). In code, this could look like the following:
for {
  _ <- async1 // not done in parallel with async2 :( is there
  _ <- async2 // any way of achieving this cleanly inside of for?
  _ <- async3
  _ = sideEffect // do I need "=" here??
  _ <- async4
} yield ()
The comments show my doubts about the quality of the code:
What's the cleanest way to do two operations in parallel in a for comprehension?
Is there a way to achieve this result without so many "_" characters (and without assigning a named reference, at least in the case of sideEffect)?
What's the cleanest and most idiomatic way to do this?

You can use zip to combine two futures, including the result of zip itself. You'll end up with tuples holding tuples, but if you use infix notation for Tuple2 it is easy to take them apart. Below I define a synonym ~ for succinctness (this is what the parser combinator library does, except its ~ is a different class that behaves similarly to Tuple2).
As an alternative to _ = for the side effect, you can either move it into the yield or combine it with the following statement using braces and a semicolon. I would still consider _ = to be more idiomatic, at least insofar as having a side-effecting statement in the for is idiomatic at all.
val ~ = Tuple2

for {
  a ~ b ~ c <- async1 zip
               async2 zip
               async3
  d <- { sideEffect; async4 }
} yield (a, b, c, d)
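For reference, here is a self-contained sketch of this technique; the asyncN commands and sideEffect are hypothetical stand-ins for the question's operations:
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

val ~ = Tuple2 // Tuple2's unapply powers the a ~ b infix pattern

def async1 = Future(1) // hypothetical db-command stand-ins
def async2 = Future(2)
def async3 = Future(3)
def async4 = Future(4)
def sideEffect(): Unit = println("side effect kicked off")

val result = for {
  a ~ b ~ c <- async1 zip async2 zip async3 // all three dispatched in parallel
  d <- { sideEffect(); async4 }             // runs after the first three complete
} yield (a, b, c, d)

println(Await.result(result, 1.second)) // (1,2,3,4)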

for-comprehensions represent monadic operations, and monadic operations are sequenced. There's a superclass of monad, applicative, where computations don't depend on the results of prior computations and thus may be run in parallel.
Scalaz has a |@| operator for combining applicatives, so you can use (future1 |@| future2)(proc(_, _)) to dispatch two futures in parallel and then run "proc" on the results of both of them, as opposed to the sequential computation of for {a <- future1; b <- future2(a)} yield b (or just future1 flatMap future2).
There's already a method on stdlib Futures called .zip that combines Futures in parallel, and indeed the scalaz impl uses this: https://github.com/scalaz/scalaz/blob/scalaz-seven/core/src/main/scala/scalaz/std/Future.scala#L36
And .zip and for-comprehensions may be intermixed to have parallel and sequential parts, as appropriate.
So just using the stdlib syntax, your above example could be written as:
for {
  _ <- async1 zip async2
  _ <- async3
  _ = sideEffect
  _ <- async4
} yield ()
Alternatively, written without a for comprehension:
async1 zip async2 flatMap (_ => async3) flatMap { _ => sideEffect; async4 }
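Here is a runnable sketch of that stdlib-only version; step is a hypothetical stand-in that logs when each command runs, making the ordering visible:
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

def step(n: Int): Future[Unit] = Future(println(s"async$n done"))

val done = for {
  _ <- step(1) zip step(2)  // 1 and 2 run in parallel
  _ <- step(3)              // 3 runs only after both complete
  _ = println("sideEffect") // kicked off after 3
  _ <- step(4)              // 4 runs last
} yield ()

Await.result(done, 1.second)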

Just as an FYI, it's really simple to get two futures to run in parallel and still process them via a for comprehension. The suggested solutions using zip can certainly work, but I find that when I want to handle a couple of futures and do something when they are all done, and two or more of them are independent of each other, I do something like this:
val f1 = async1
val f2 = async2
// First two futures now running in parallel

for {
  r1 <- f1
  r2 <- f2
  _ <- async3
  _ = sideEffect
  _ <- async4
} yield {
  ...
}
Now, the way the for comprehension is structured, it certainly waits on f1 before checking on the completion status of f2, but the logic behind these two futures is running at the same time. This is a little simpler than some of the suggestions, but it still might give you what you need.
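To see the parallelism, here is a minimal sketch with a hypothetical slowTask standing in for async1/async2; the total elapsed time is about one second, not two:
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

def slowTask(label: String): Future[String] = Future {
  Thread.sleep(1000) // simulate a one-second operation
  label
}

val start = System.nanoTime()

val f1 = slowTask("one") // both futures are already
val f2 = slowTask("two") // running at this point

val combined = for {
  r1 <- f1
  r2 <- f2
} yield s"$r1 $r2"

Await.result(combined, 5.seconds)
println((System.nanoTime() - start) / 1e9) // ~1.0, not ~2.0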

Your code already looks well structured, minus computing the futures in parallel.
Use helper functions, ideally writing a code generator to print out helpers for all tuple arities.
As far as I know, you need to either name the result or assign it to _.
Example code with helpers:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object Example {
  def run: Future[Unit] = {
    for {
      (a, b, c) <- par(
        Future.successful(1),
        Future.successful(2),
        Future.successful(3)
      )
      constant = 100
      (d, e) <- par(
        Future.successful(a + 10),
        Future.successful(b + c)
      )
    } yield {
      println(constant)
      println(d)
      println(e)
    }
  }

  def par[A, B](a: Future[A], b: Future[B]): Future[(A, B)] = {
    for {
      a <- a
      b <- b
    } yield (a, b)
  }

  def par[A, B, C](a: Future[A], b: Future[B], c: Future[C]): Future[(A, B, C)] = {
    for {
      a <- a
      b <- b
      c <- c
    } yield (a, b, c)
  }
}

Example.run
Edit:
generated code for 1 to 20 futures: https://gist.github.com/nanop/c448db7ac1dfd6545967#file-parhelpers-scala
parPrinter script: https://gist.github.com/nanop/c448db7ac1dfd6545967#file-parprinter-scala
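As a side note (not part of the original answer): when all the futures share one result type, the stdlib's Future.sequence can replace the generated helpers:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// Future.sequence turns a List[Future[A]] into a Future[List[A]];
// the futures are already running, so they proceed in parallel.
val parAll: Future[List[Int]] =
  Future.sequence(List(
    Future.successful(1),
    Future.successful(2),
    Future.successful(3)
  ))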

Related

Scala filter by nested Option/Try monads

In Scala, I have an Array[Option[(String,String,Try[String])]] and would like to find all the Failure error codes.
If the inner monad is an Option[String] instead, I can access the Some(x) contents with a clean little for comprehension, like so:
for {
  Some(row) <- row
  (a, b, c) = row
  x <- c
} yield x
But if the inner monad is a Failure, then I'm struggling to see how to pattern match it, since I can't put Failure(x) <- c in the for statement. This feels like a really simple thing I'm missing, but any guidance would be very valuable.
Many thanks!
EDIT - Mis-specified the array. It's actually an array of option-tuple3s, not just tuple3s.
Will a.map(_._3).filter(_.isFailure) do?
EDIT: after having seen the edit and your comment, I think you can also do
val tries = for {
  x <- a
  z <- x
} yield z._3

tries.filter(_.isFailure)
In order to combine different types of "monads" you will need what's called a monad transformer. To put it simply, Scala doesn't let you mix different monad types within the same for comprehension - this makes sense, since a for comprehension is just syntactic sugar for combinations of map / flatMap / withFilter.
Assuming the first one is always an Option then you could transform the Try into an Option and get the desired result:
for {
  Some((a, b, c)) <- row
  x <- c.toOption
} yield x
If you don't really care about what's inside the Try, that's fine; but if you do, be careful that you lose the failure information when converting with toOption. If the pattern match fails for an element, that element is simply filtered out.
I hope that helps you.
This returns an Array[Throwable].
for {
  Some((_, _, Failure(e))) <- rows
} yield e
Or, perhaps, an un-sugared version.
rows.collect { case Some((_, _, Failure(e))) => e }
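For illustration, here is a self-contained sketch of the collect version with hypothetical sample data:
import scala.util.{Failure, Success, Try}

val rows: Array[Option[(String, String, Try[String])]] = Array(
  Some(("a", "b", Success("ok"))),
  None,
  Some(("c", "d", Failure(new RuntimeException("boom"))))
)

// Keep only the Throwables inside Failure values.
val errors: Array[Throwable] =
  rows.collect { case Some((_, _, Failure(e))) => e }

errors.foreach(e => println(e.getMessage)) // prints "boom"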

Difference between applicative and monadic computation in Scala

Given this simple computation, I cannot clearly see the difference between using the applicative style over the monadic style. Are there better examples out there (in Scala) of when to use one over the other?
import scalaz._, Scalaz._

println( (3.some |@| none[Int] |@| 4.some)((a: Int, b: Int, c: Int) => a + b + c) ) // prints None

println( for {
  a <- Some(3)
  b <- none[Int]
  c <- Some(4)
} yield a + b + c ) // prints None
Both computations end up as None, so the end result is the same. The only difference I can see is that with the applicative syntax there is no access to the intermediate values, as there is in the for comprehension.
Furthermore, having one None value stops the whole computation. I thought applicative meant "not dependent on the result of the previous computation".
The applicative builder syntax will evaluate each term and can not use the result of a prior computation. However, even if the first result is None, all the other expressions will still be evaluated.
Whereas, with the for comprehension, it will 'fail fast' (it will not evaluate any further expressions after a None, in your case), plus you can access the results of previous computations.
Don't think of these things as simply different styles - they call different functions with different behaviours: flatMap vs. apply.
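To make the difference concrete without Scalaz, here is a sketch using Scala 2.13's Option.zip as the applicative combination and a hypothetical term helper that logs when each expression is evaluated:
def term(label: String, v: Option[Int]): Option[Int] = {
  println(s"evaluating $label")
  v
}

// Applicative-style combination: every term is evaluated up front.
val applicative = term("a", Some(3)).zip(term("b", None)).zip(term("c", Some(4)))
  .map { case ((a, b), c) => a + b + c }
// prints "evaluating a", "evaluating b", "evaluating c"; value is None

// Monadic style fails fast: "c" is never evaluated.
val monadic = for {
  a <- term("a", Some(3))
  b <- term("b", None)
  c <- term("c", Some(4))
} yield a + b + c
// prints "evaluating a", "evaluating b"; value is None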
Monads represent sequential computations where each computation depends on the previous ones (if a previous computation is empty you can't proceed, so you "fail fast"). A more generic example of a monadic computation:
println( for {
  a <- Some(1)
  b <- Some(a)
  c <- Some(a + b)
} yield a + b + c ) //=> Some(4)
Applicative is just fmap on steroids, where not only the argument but also the mapping function itself can be empty. In your case it can be rewritten as:
4.some <*>
  { none[Int] <*>
    { 3.some <*>
      { (_: Int) + (_: Int) + (_: Int) }.curried.some } }
At some step your function becomes Option[Int => Int] = None, but that doesn't stop it from being applied to 4.some; only the result is None, as expected. You still need to evaluate 4.some.

Transforming/repacking the results of a Slick query

I have what I hope is a simple question about Slick. Apologies if this is well documented - I may have overlooked something in my searching.
I have an aggregate query built as follows:
def doQuery(/* ... */) = for {
  a <- Query(TableA)
  b <- a.relationship.where // ...
  c <- b.relationship.where // ...
} yield (a, b, c)
This returns me a Query[(A, B, C)].
I also have a case class:
case class Aggregate(a: A, b: B, c: C)
I'd like to transform my query to a Query[Aggregate] so my fellow developers can call .list() or .firstOption() and get a List or Option as appropriate.
I naturally went for the .map() method on Query, but it has an implicit Shape argument that I'm not sure how to handle.
Is this straightforward in Slick? We're using v1.0.1 at the moment but upgrading to 2.0 is also a possibility.
Best regards,
Dave
After a lot of playing around, I have concluded that this is not possible in Slick 1.
In Slick 2 you can use the <> operator to transform a projection assembled in the yield portion of the for comprehension:
def doQuery(/* ... */) = for {
  a <- Query(TableA)
  b <- a.relationship.where // ...
  c <- b.relationship.where // ...
} yield (a, b, c) <> (Aggregate.tupled, Aggregate.unapply)
This works as expected in conjunction with .list and .firstOption. I'm unsure what the consequences are of trying to use .insert, .update and .delete.
If you can modify doQuery, then you just want to do yield Aggregate(a, b, c) instead of yield (a, b, c).
Or, if you want to transform the result without modifying doQuery, then you can call .map { case (a, b, c) => Aggregate(a, b, c) } on the result of doQuery.

Scala's "for comprehension" with futures

I am reading through the Scala Cookbook (http://shop.oreilly.com/product/0636920026914.do)
There is an example related to Future use that involves a for comprehension.
So far, my understanding of for comprehensions is that, when used with a collection, they produce another collection of the same type. For example, if each futureX is of type Future[Int], the following should also be of type Future[Int]:
for {
  r1 <- future1
  r2 <- future2
  r3 <- future3
} yield (r1 + r2 + r3)
Could someone explain to me what exactly is happening when <- is used in this code?
I know that if it were a generator, it would fetch each element by looping.
First, about for comprehensions. As has been answered on SO many times, a for comprehension is an abstraction over a couple of monadic operations: map, flatMap, withFilter. When you use <-, scalac desugars the line into a monadic flatMap:
r <- monad becomes monad.flatMap(r => ... )
It looks like an imperative computation (which is what a monad is all about): you bind the result of a computation to r. The yield part is desugared into a map call. The result type depends on the monad's type.
The Future trait has flatMap and map functions, so we can use for comprehensions with it. Your example can be desugared into the following code:
future1.flatMap(r1 =>
  future2.flatMap(r2 =>
    future3.map(r3 =>
      r1 + r2 + r3)))
Parallelism aside
It goes without saying that if the execution of future2 depends on r1, then you can't escape sequential execution. But if the future computations are independent, you have two choices: you can enforce sequential execution, or you can allow parallel execution. You can't enforce the latter, though, as the execution context handles the actual scheduling.
val res = for {
  r1 <- computationReturningFuture1(...)
  r2 <- computationReturningFuture2(...)
  r3 <- computationReturningFuture3(...)
} yield (r1 + r2 + r3)
will always run sequentially. It can be easily explained by the desugaring, after which the subsequent computationReturningFutureX calls are only invoked inside of the flatMaps, i.e.
computationReturningFuture1(...).flatMap(r1 =>
  computationReturningFuture2(...).flatMap(r2 =>
    computationReturningFuture3(...).map(r3 =>
      r1 + r2 + r3)))
However, this is able to run in parallel, and the for comprehension aggregates the results:
val future1 = computationReturningFuture1(...)
val future2 = computationReturningFuture2(...)
val future3 = computationReturningFuture3(...)

val res = for {
  r1 <- future1
  r2 <- future2
  r3 <- future3
} yield (r1 + r2 + r3)
To elaborate on the existing answers, here is a simple example that demonstrates how a for comprehension works. The functions are a bit lengthy, yet they are worth looking into.
A function that gives us a range of integers:
scala> import scala.concurrent.Future
scala> import scala.concurrent.ExecutionContext.Implicits.global
scala> import scala.collection.mutable.ListBuffer

scala> def createIntegers = Future {
         println("INT " + Thread.currentThread().getName + " Begin.")
         val returnValue = List.range(1, 256)
         println("INT " + Thread.currentThread().getName + " End.")
         returnValue
       }
createIntegers: scala.concurrent.Future[List[Int]]
A function that gives us a range of chars:
scala> def createAsciiChars = Future {
         println("CHAR " + Thread.currentThread().getName + " Begin.")
         val returnValue = new ListBuffer[Char]
         for (i <- 1 to 256) {
           returnValue += i.toChar
         }
         println("CHAR " + Thread.currentThread().getName + " End.")
         returnValue
       }
createAsciiChars: scala.concurrent.Future[scala.collection.mutable.ListBuffer[Char]]
Using these function calls within a for comprehension:
scala> import scala.concurrent.Await
scala> import scala.concurrent.duration.Duration

scala> val result = for {
         i <- createIntegers
         s <- createAsciiChars
       } yield i.zip(s)
result: scala.concurrent.Future[List[(Int, Char)]] = Future(<not completed>)

scala> Await.result(result, Duration.Inf)
From the lines below we can see that the function calls are synchronous, i.e. the createAsciiChars call is not executed until createIntegers completes its execution:
INT scala-execution-context-global-27 Begin.
INT scala-execution-context-global-27 End.
CHAR scala-execution-context-global-28 Begin.
CHAR scala-execution-context-global-28 End.
Making the createIntegers and createAsciiChars calls outside the for comprehension makes the execution asynchronous, with the two futures running in parallel.
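A minimal sketch of that parallel variant (reusing the hypothetical session above):
// Start both futures before the for comprehension so they run concurrently.
val ints = createIntegers
val chars = createAsciiChars

val result = for {
  i <- ints
  s <- chars
} yield i.zip(s)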
It allows r1, r2, r3 to run in parallel, if possible. It may not be possible, depending on things like how many threads are available to execute Future computations, but by using this syntax you are telling the compiler to run these computations in parallel if possible, then execute the yield() when all have completed.

Is there an Iteratee-like concept which pulls data from multiple sources?

It is possible to pull on demand from a number (say two for simplicity) of sources using streams (lazy lists). Iteratees can be used to process data coming from a single source.
Is there an Iteratee-like functional concept for processing multiple input sources? I could imagine an Iteratee whose state signals from which source it wants to pull.
To do this using pipes you nest the Pipe monad transformer within itself, once for each producer you wish to interact with. For example:
import Control.Monad
import Control.Monad.Trans
import Control.Pipe

producerA, producerB :: (Monad m) => Producer Int m ()
producerA = mapM_ yield [1,2,3]
producerB = mapM_ yield [4,5,6]

consumes2 :: (Show a, Show b) => Consumer a (Consumer b IO) r
consumes2 = forever $ do
    a <- await       -- await from outer producer
    b <- lift await  -- await from inner producer
    lift $ lift $ print (a, b)
Just like a Haskell curried function of multiple variables, you partially apply it to each source using composition and runPipe:
consumes1 :: (Show b) => Consumer b IO ()
consumes1 = runPipe $ consumes2 <+< producerA

fullyApplied :: IO ()
fullyApplied = runPipe $ consumes1 <+< producerB
When run, the above function outputs:
>>> fullyApplied
(1, 4)
(2, 5)
(3, 6)
This trick works for yielding or awaiting to any number of pipes upstream or downstream. It also works for proxies, the bidirectional analogs to pipes.
Edit: Note that this also works for any iteratee library, not just pipes. In fact, John Milikin and Oleg were the original advocates for this approach and I just stole the idea from them.
We're using Machines in Scala to pull in not just two, but an arbitrary number of sources.
Two examples of binary joins are provided by the library itself, on the Tee module: mergeOuterJoin (which assumes both streams are sorted) and hashJoin. Here is what the code for hashJoin looks like:
/**
 * A natural hash join according to keys of type `K`.
 */
def hashJoin[A, B, K](f: A => K, g: B => K): Tee[A, B, (A, B)] = {
  def build(m: Map[K, A]): Plan[T[A, B], Nothing, Map[K, A]] = (for {
    a  <- awaits(left[A])
    mp <- build(m + (f(a) -> a))
  } yield mp) orElse Return(m)

  for {
    m <- build(Map())
    r <- (awaits(right[B]) flatMap (b => {
      val k = g(b)
      if (m contains k) emit(m(k) -> b) else Return(())
    })) repeatedly
  } yield r
}
This code builds up a Plan which is "compiled" to a Machine with the repeatedly method. The type being built here is Tee[A, B, (A, B)] which is a machine with two inputs. You request inputs on the left and right with awaits(left) and awaits(right), and you output with emit.
There is also a Haskell version of Machines.
Conduit (and it can be built for Pipes, but that code hasn't been released yet) has a zip primitive that takes two upstreams and combines them as a stream of tuples.
Check out the pipes library, where vertical concatenation might do what you want. For example,
import Control.Pipe
import Control.Monad
import Control.Monad.State
import Data.Void

source0, source1 :: Producer Char IO ()
source0 = mapM_ yield "say"
source1 = mapM_ yield "what"

sink :: Show b => Consumer b IO ()
sink = forever $ await >>= \x -> lift $ print x

pipeline :: Pipe () Void IO ()
pipeline = sink <+< (source0 >> source1)
The sequencing operator (>>) vertically concatenates the sources, yielding the following output (on a runPipe):
's'
'a'
'y'
'w'
'h'
'a'
't'