Is it possible to run multiple queries in parallel, using Doobie?
I have the following (pseudo)queries:
def prepareForQuery(input: String): ConnectionIO[Unit] = ???
val gettAllResults: ConnectionIO[List[(String, BigDecimal)]] = ???
def program(input : String) : ConnectionIO[List[(String, BigDecimal)]] = for{
_ <- prepareForQuery(input)
r <- gettAllResults
} yield r
What I tried is the following:
import doobie._
import doobie.implicits._
import cats.implicits._
val xa = Transactor.fromDataSource[IO](myDataSource)
val result = (program(i1),program(i2)).parMapN{case (a,b) => a ++ b}
val rs = result.transact(xa).unsafeRunSync
However, no NonEmptyParallel instance is found for ConnectionIO.
Error:(107, 54) could not find implicit value for parameter p: cats.NonEmptyParallel[doobie.ConnectionIO,F]
val result = (program(i1),program(i2)).parMapN{case (a ,b) => a ++ b}
Am I missing something obvious or trying something that cannot be done?
Thanks
You cannot run queries in the ConnectionIO monad in parallel. But as soon as you turn them into your actual runtime monad (as long as it has a Parallel instance), you can.
For example, with the cats-effect IO runtime monad:
def prepareForQuery(input: String): ConnectionIO[Unit] = ???
val gettAllResults: ConnectionIO[List[(String, BigDecimal)]] = ???
def program(input : String) : ConnectionIO[List[(String, BigDecimal)]] = for{
_ <- prepareForQuery(input)
r <- gettAllResults
} yield r
Turn each ConnectionIO into an IO:
val program1IO: IO[List[(String, BigDecimal)]] = program(i1).transact(xa)
val program2IO: IO[List[(String, BigDecimal)]] = program(i2).transact(xa)
You now have values in a monad that can do things in parallel:
val result: IO[List[(String, BigDecimal)]] =
  (program1IO, program2IO).parMapN { case (a, b) => a ++ b }
To understand why ConnectionIO doesn't allow you to do things in parallel, I'll just quote tpolecat:
You can't run ConnectionIO in parallel. It's a language describing the use of a connection which is a linear sequence of operations.
Using parMapN in IO, yes, you can run two things at the same time because they're running on different connections.
There is no parMapN with ConnectionIO because it does not (and cannot) have a Parallel instance.
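tpolecat's point can be illustrated without Doobie. The following is only a standard-library analogy, not Doobie's API; FakeConnection, PerConnectionParallelism, program and runBoth are hypothetical names. The fake connection refuses overlapping use (standing in for a connection that executes a linear sequence of operations), each program runs its two steps in order on one connection, and runBoth plays the role of transact plus parMapN by giving each program its own connection so the two can run at the same time.

```scala
import java.util.concurrent.atomic.AtomicBoolean
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical stand-in for a database connection: it throws if two
// operations overlap, mimicking "a linear sequence of operations".
final class FakeConnection {
  private val busy = new AtomicBoolean(false)
  def run[A](op: => A): A = {
    if (!busy.compareAndSet(false, true))
      throw new IllegalStateException("connection used concurrently")
    try op finally busy.set(false)
  }
}

object PerConnectionParallelism {
  // Like program(input): two steps, one after the other, on one connection.
  def program(conn: FakeConnection, input: String): List[String] = {
    conn.run(Thread.sleep(10))           // plays the role of prepareForQuery(input)
    conn.run(List(s"result for $input")) // plays the role of gettAllResults
  }

  // Like transact(xa) followed by parMapN: each program gets its own
  // connection, so the two programs may safely run concurrently.
  def runBoth(i1: String, i2: String): List[String] = {
    val f1 = Future(program(new FakeConnection, i1))
    val f2 = Future(program(new FakeConnection, i2))
    Await.result(f1.zipWith(f2)(_ ++ _), 5.seconds)
  }
}
```

Sharing a single FakeConnection between both Futures would risk the "used concurrently" failure, which is the analogue of why ConnectionIO itself has no Parallel instance.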
I'm trying to build an example that uses the Stream.concurrently method in fs2. I'm implementing the producer/consumer pattern, with a Queue as the shared state:
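import cats.effect.{ExitCode, IO, IOApp}
import cats.effect.std.{Queue, Random}
import cats.syntax.all._
import fs2.Stream
import scala.concurrent.duration._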
object Fs2Tutorial extends IOApp {
val random: IO[Random[IO]] = Random.scalaUtilRandom[IO]
val queue: IO[Queue[IO, Int]] = Queue.bounded[IO, Int](10)
val producer: IO[Nothing] = for {
r <- random
q <- queue
p <-
r.betweenInt(1, 11)
.flatMap(q.offer)
.flatTap(_ => IO.sleep(1.second))
.foreverM
} yield p
val consumer: IO[Nothing] = for {
q <- queue
c <- q.take.flatMap { n =>
IO.println(s"Consumed $n")
}.foreverM
} yield c
val concurrently: Stream[IO, Nothing] = Stream.eval(producer).concurrently(Stream.eval(consumer))
override def run(args: List[String]): IO[ExitCode] = {
concurrently.compile.drain.as(ExitCode.Success)
}
}
I expect the program to print some lines of the form "Consumed n". However, it prints nothing to the console.
What's wrong with the above code?
You are not using the same Queue in the consumer and the producer; rather, each of them creates its own new, independent Queue (the same happens with Random, by the way).
This is a common mistake among newcomers who haven't yet grasped the main principle behind a data type like IO.
When you write val queue: IO[Queue[IO, Int]] = Queue.bounded[IO, Int](10), you are saying that queue is a program that, when evaluated, will produce a value of type Queue[IO, Int]; that is the whole point.
The program becomes a value, and like any value you can manipulate it in many ways to produce new values, for example with flatMap. So when both consumer and producer create a new program by flatMapping queue, they each create a new, independent program, and each of those programs allocates its own Queue when it runs.
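The same effect can be reproduced with a toy effect type. This is a minimal sketch, assuming a hypothetical Program[A] that merely wraps a thunk (it is not cats-effect's IO); makeQueue plays the role of Queue.bounded[IO, Int](10). In buggy, producer and consumer each map over makeQueue, so every run allocates a separate queue and the consumer finds nothing; in fixed, the queue is created once and passed to both sides, which is what the corrected fs2 code does with Stream.eval(resources).flatMap.

```scala
import scala.collection.mutable

// Hypothetical, deliberately tiny stand-in for IO: a description of a
// computation that only runs when unsafeRun() is called.
final case class Program[A](unsafeRun: () => A) {
  def map[B](f: A => B): Program[B] = Program(() => f(unsafeRun()))
  def flatMap[B](f: A => Program[B]): Program[B] =
    Program(() => f(unsafeRun()).unsafeRun())
}

object ProgramsAsValues {
  // A recipe for a new queue, not a queue.
  val makeQueue: Program[mutable.Queue[Int]] = Program(() => mutable.Queue.empty[Int])

  // Mirrors the question: each side runs the recipe itself, so each gets its own queue.
  val producer: Program[Unit] = makeQueue.map { q => q.enqueue(42); () }
  val consumer: Program[Option[Int]] =
    makeQueue.map(q => if (q.isEmpty) None else Some(q.dequeue()))
  val buggy: Program[Option[Int]] = producer.flatMap(_ => consumer)

  // Mirrors the fix: run the recipe once, then hand the same queue to both sides.
  def produceInto(q: mutable.Queue[Int]): Program[Unit] =
    Program { () => q.enqueue(42); () }
  def consumeFrom(q: mutable.Queue[Int]): Program[Option[Int]] =
    Program(() => if (q.isEmpty) None else Some(q.dequeue()))
  val fixed: Program[Option[Int]] =
    makeQueue.flatMap(q => produceInto(q).flatMap(_ => consumeFrom(q)))
}
```

Running buggy yields None, because the consumer reads from a freshly allocated, empty queue; running fixed yields Some(42).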
You can fix that code like this:
import cats.effect.{IO, IOApp}
import cats.effect.std.{Queue, Random}
import cats.syntax.all._
import fs2.Stream
import scala.concurrent.duration._
object Fs2Tutorial extends IOApp.Simple {
override final val run: IO[Unit] = {
val resources =
(
Random.scalaUtilRandom[IO],
Queue.bounded[IO, Int](10)
).tupled
val concurrently =
Stream.eval(resources).flatMap {
case (random, queue) =>
val producer =
Stream
.fixedDelay[IO](1.second)
.evalMap(_ => random.betweenInt(1, 11))
.evalMap(queue.offer)
val consumer =
Stream.fromQueueUnterminated(queue).evalMap(n => IO.println(s"Consumed $n"))
producer.concurrently(consumer)
}
concurrently.interruptAfter(10.seconds).compile.drain >> IO.println("Finished!")
}
}
PS: I recommend looking into the "Programs as Values" series by Fabio Labella: https://systemfw.org/archive.html
I am new to Scala and have a general question about Futures.
Say I have a list of elements, and for each element in the list I have to invoke a method that does some processing.
We can use Future to do the processing in parallel, but my question is: how can we control how many of those processing tasks run concurrently in the background?
For example, I want to cap the number of tasks running in parallel at 10. So at most, my futures should process 10 elements of the list at a time and wait for any of the running tasks to complete. Once a slot frees up, the next remaining element should be processed, never exceeding the maximum.
I searched on Google but could not find anything. In Unix the same can be done by running processes in the background and manually checking the count with the ps command. Since I don't know Scala well, please help me with this.
Thanks in advance.
Let us create two thread pools of different sizes:
val fiveThreadsEc = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(5))
val tenThreadsEc = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(10))
We can control which thread pool a future runs on by passing it as an argument to the future, like so:
Future(42)(tenThreadsEc)
This is equivalent to
Future.apply(body = 42)(executor = tenThreadsEc)
which corresponds to the signature of Future.apply:
def apply[T](body: => T)(implicit executor: ExecutionContext): Future[T]
Note how the executor parameter is declared as implicit. This means we could provide it implicitly, like so:
implicit val tenThreadsEc = ...
Future(42) // executor = tenThreadsEc argument passed in magically
Now, as per Luis' suggestion, consider the simplified signature of Future.traverse:
def traverse[A, B, M[X] <: IterableOnce[X]](in: M[A])(fn: A => Future[B])(implicit ..., executor: ExecutionContext): Future[M[B]]
Let us simplify it further by fixing the type constructor parameter M to, say, List:
def traverse[A, B]
(in: List[A]) // list of things to process in parallel
(fn: A => Future[B]) // function to process an element asynchronously
(implicit executor: ExecutionContext) // thread pool to use for parallel processing
: Future[List[B]] // returned result is a future of list of things instead of list of future things
Let's pass in the arguments
val tenThreadsEc = ...
val myList: List[Int] = List(11, 42, -1)
def myFun(x: Int)(implicit executor: ExecutionContext): Future[Int] = Future(x + 1)(executor)
Future.traverse[Int, Int, List](
in = myList)(
fn = myFun(_)(executor = tenThreadsEc))(
executor = tenThreadsEc,
bf = implicitly // ignore this
)
Relying on implicit resolution and type inference, we have simply
implicit val tenThreadsEc = ...
Future.traverse(myList)(myFun)
Putting it all together, here is a working example
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}
object FuturesExample extends App {
val fiveThreadsEc = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(5))
val tenThreadsEc = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(10))
val myList: List[Int] = List(11, 42, -1)
def myFun(x: Int)(implicit executor: ExecutionContext): Future[Int] = Future(x + 1)(executor)
Future(body = 42)(executor = fiveThreadsEc)
.andThen(v => println(v))(executor = fiveThreadsEc)
Future.traverse[Int, Int, List](
in = myList)(
fn = myFun(_)(executor = tenThreadsEc))(
executor = tenThreadsEc,
bf = implicitly
).andThen(v => println(v))(executor = tenThreadsEc)
// Using implicit execution context call-site simplifies to...
implicit val ec: ExecutionContext = tenThreadsEc
Future(42)
.andThen(v => println(v))
Future.traverse(myList)(myFun)
.andThen(v => println(v))
}
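which outputs the following (the order of the lines may vary because the futures run concurrently, and the program does not exit on its own because the fixed thread pools are never shut down):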
Success(42)
Success(List(12, 43, 0))
Success(42)
Success(List(12, 43, 0))
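To check that a fixed pool really caps concurrency the way the question asks (at most N tasks at once, the rest waiting for a free thread), here is a small sketch. ConcurrencyLimitDemo and maxObservedConcurrency are illustrative names, and the sleep only simulates work. It submits every task up front through Future.traverse, lets the pool queue the excess, and records the highest number of tasks it saw running simultaneously.

```scala
import java.util.concurrent.Executors
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ConcurrencyLimitDemo {
  // Runs `tasks` simulated jobs on a fixed pool of `poolSize` threads and
  // returns the highest number of jobs observed running at the same moment.
  def maxObservedConcurrency(poolSize: Int, tasks: Int): Int = {
    val pool = Executors.newFixedThreadPool(poolSize)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    val running = new AtomicInteger(0)
    val maxSeen = new AtomicInteger(0)
    try {
      // Every job is submitted immediately; the pool queues whatever does not
      // fit and starts the next job only when a thread becomes free.
      val all = Future.traverse((1 to tasks).toList) { i =>
        Future {
          val now = running.incrementAndGet()
          maxSeen.accumulateAndGet(now, (a: Int, b: Int) => math.max(a, b))
          Thread.sleep(20) // simulated work
          running.decrementAndGet()
          i
        }
      }
      Await.result(all, 30.seconds)
      maxSeen.get
    } finally {
      // Fixed pools use non-daemon threads, so shut down to let the JVM exit.
      pool.shutdown()
    }
  }

  def main(args: Array[String]): Unit =
    println(maxObservedConcurrency(poolSize = 10, tasks = 50))
}
```

With poolSize = 10 the reported maximum should never exceed 10; the exact value depends on scheduling.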
Alternatively, Scala provides a default execution context, scala.concurrent.ExecutionContext.Implicits.global, and we can control its parallelism with the system properties:
scala.concurrent.context.minThreads
scala.concurrent.context.numThreads
scala.concurrent.context.maxThreads
scala.concurrent.context.maxExtraThreads
For example, create the following file, ConfiguringGlobalExecutorParallelism.scala:
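import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
object ConfiguringGlobalExecutorParallelism extends App {
  println(scala.concurrent.ExecutionContext.Implicits.global.toString)
  val result = Future.traverse(List(11, 42, -1))(x => Future(x + 1))
    .andThen(v => println(v))
  // global uses daemon threads, so wait for the result before main returns
  Await.ready(result, 10.seconds)
}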
and run it with
scala -Dscala.concurrent.context.numThreads=10 -Dscala.concurrent.context.maxThreads=10 ConfiguringGlobalExecutorParallelism.scala
which should output
scala.concurrent.impl.ExecutionContextImpl$$anon$3@cb191ca[Running, parallelism = 10, size = 0, active = 0, running = 0, steals = 0, tasks = 0, submissions = 0]
Success(List(12, 43, 0))
Note how parallelism = 10.
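Another option is to use parallel collections (a separate module since Scala 2.13):
libraryDependencies += "org.scala-lang.modules" %% "scala-parallel-collections" % "0.2.0"
and configure parallelism via tasksupport, for example:
import java.util.concurrent.ForkJoinPool
import scala.collection.parallel.ForkJoinTaskSupport
import scala.collection.parallel.immutable.ParVector
val myParVector: ParVector[Int] = ParVector(11, 42, -1)
myParVector.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool(10))
myParVector.map(x => x + 1)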
Note that parallel collections are a separate facility from Futures:
parallel collection design in Scala has no notion of an ExecutionContext, that is strictly a property of Future. The parallel collection library has a notion of a TaskSupport which is responsible for scheduling inside the parallel collection
So we can map over the collection simply with x => x + 1 instead of x => Future(x + 1); there is no need for Future.traverse, and a regular map is sufficient.
I need to compile a function and then evaluate it with different parameters of type List[Map[String, AnyRef]].
I have the following code, which does not compile with such a type but does compile with a simple type like List[Int].
I found that there are only certain implementations of Liftable in scala.reflect.api.StandardLiftables.StandardLiftableInstances.
import scala.reflect.runtime.universe
import scala.reflect.runtime.universe._
import scala.tools.reflect.ToolBox
val tb = universe.runtimeMirror(getClass.getClassLoader).mkToolBox()
val functionWrapper =
"""
object FunctionWrapper {
def makeBody(messages: List[Map[String, AnyRef]]) = Map.empty
}""".stripMargin
val functionSymbol =
tb.define(tb.parse(functionWrapper).asInstanceOf[tb.u.ImplDef])
val list: List[Map[String, AnyRef]] = List(Map("1" -> "2"))
tb.eval(q"$functionSymbol.function($list)")
I get a compilation error for this. How can I make it work?
Error:(22, 38) Can't unquote List[Map[String,AnyRef]], consider using
... or providing an implicit instance of
Liftable[List[Map[String,AnyRef]]]
tb.eval(q"$functionSymbol.function($list)")
^
The problem comes not from the complicated type but from the attempt to use AnyRef. When you unquote a value, you are asking the infrastructure to create a valid syntax tree that rebuilds an object exactly matching the one you pass. Unfortunately, this is obviously not possible for all objects. For example, assume you've passed a reference to Thread.currentThread() as part of the Map. How could that possibly work? The compiler is simply not able to recreate such a complicated object (not to mention making it the current thread). So you have two obvious alternatives:
Make your argument a Tree as well, i.e. something like this:
def testTree() = {
val tb = universe.runtimeMirror(getClass.getClassLoader).mkToolBox()
val functionWrapper =
"""
| object FunctionWrapper {
|
| def makeBody(messages: List[Map[String, AnyRef]]) = Map.empty
|
| }
""".stripMargin
val functionSymbol =
tb.define(tb.parse(functionWrapper).asInstanceOf[tb.u.ImplDef])
//val list: List[Map[String, AnyRef]] = List(Map("1" -> "2"))
val list = q"""List(Map("1" -> "2"))"""
val res = tb.eval(q"$functionSymbol.makeBody($list)")
println(s"testTree = $res")
}
The obvious drawback of this approach is that you lose type safety at compile time and might need to provide a lot of context for the tree to work.
The other approach is not to pass anything containing AnyRef to the compiler infrastructure at all. Instead, you create a function-like Wrapper:
package so {
trait Wrapper {
def call(args: List[Map[String, AnyRef]]): Map[String, AnyRef]
}
}
Then make your generated code return a Wrapper instead of executing the logic directly, and call the Wrapper from ordinary Scala code rather than from inside the compiled code. Something like this:
def testWrapper() = {
val tb = universe.runtimeMirror(getClass.getClassLoader).mkToolBox()
val functionWrapper =
"""
|object FunctionWrapper {
| import scala.collection._
| import so.Wrapper /* <- here probably different package :) */
|
| def createWrapper(): Wrapper = new Wrapper {
| override def call(args: List[Map[String, AnyRef]]): Map[String, AnyRef] = Map.empty
| }
|}
| """.stripMargin
val functionSymbol = tb.define(tb.parse(functionWrapper).asInstanceOf[tb.u.ImplDef])
val list: List[Map[String, AnyRef]] = List(Map("1" -> "2"))
val tree: tb.u.Tree = q"$functionSymbol.createWrapper()"
val wrapper = tb.eval(tree).asInstanceOf[Wrapper]
val res = wrapper.call(list)
println(s"testWrapper = $res")
}
P.S. I'm not sure what you are doing, but beware of performance issues. Scala is expensive to compile, so it might easily take more time to compile your custom code than to run it. If performance becomes an issue, you might need other methods, such as full-blown macro code generation, or at least caching of the compiled code.
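The caching suggestion could look roughly like this. It is a minimal sketch, assuming the expensive step is wrapped in a function you supply: CompiledCache and its compile parameter are hypothetical, and in practice compile would be something like the ToolBox-based createWrapper() call above. Each distinct source string is compiled once and the resulting callable is reused.

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical cache of compiled functions, keyed by their source text.
// `compile` is a placeholder for the expensive step (e.g. ToolBox parsing,
// defining and evaluating a Wrapper); it is not a real library API.
final class CompiledCache[F](compile: String => F) {
  private val cache = TrieMap.empty[String, F]

  // Compiles `source` on first use and returns the cached result afterwards.
  // Under contention `compile` may run more than once, but only one result is stored.
  def get(source: String): F = cache.getOrElseUpdate(source, compile(source))
}
```

Keying by the exact source text is the simplest choice; if sources differ only in parameters, it is usually better to compile a Wrapper once and pass the parameters to call, as in the second approach above.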
Apologies in advance for the basic question. I am starting to learn Scala with http4s, and in a route handler I am trying to insert an entry into MongoDB. As far as I can tell, insertOne returns an Observable[Completed].
Any idea how I can wait for the observable to complete before returning the response?
My code is:
class Routes {
val service: HttpService = HttpService {
case r @ GET -> Root / "hello" => {
val mongoClient: MongoClient = MongoClient()
val database: MongoDatabase = mongoClient.getDatabase("scala")
val collection: MongoCollection[Document] = database.getCollection("tests")
val doc: Document = Document("_id" -> 0, "name" -> "MongoDB", "type" -> "database",
"count" -> 1, "info" -> Document("x" -> 203, "y" -> 102))
collection.insertOne(doc)
mongoClient.close()
Ok("Hello.")
}
}
}
class GomadApp(host: String, port: Int) {
private val pool = Executors.newCachedThreadPool()
println(s"Starting server on '$host:$port'")
val routes = new Routes().service
// Add some logging to the service
val service: HttpService = routes.local { req =>
val path = req.uri
val start = System.nanoTime()
val result = req
val time = ((System.nanoTime() - start) / 1000) / 1000.0
println(s"${req.remoteAddr.getOrElse("null")} -> ${req.method}: $path in $time ms")
result
}
// Construct the blaze pipeline.
def build(): ServerBuilder =
BlazeBuilder
.bindHttp(port, host)
.mountService(service)
.withServiceExecutor(pool)
}
object GomadApp extends ServerApp {
val ip = "127.0.0.1"
val port = envOrNone("HTTP_PORT") map (_.toInt) getOrElse (8787)
override def server(args: List[String]): Task[Server] =
new GomadApp(ip, port)
.build()
.start
}
I'd recommend https://github.com/haghard/mongo-query-streams, although you'll have to fork it and bump the dependencies a bit, since scalaz 7.1 and 7.2 aren't binary-compatible.
The less-streamy (and less referentially correct) way: https://github.com/Verizon/delorean
collection.insertOne(doc).toFuture().toTask.flatMap({res => Ok("Hello")})
The latter solution looks easier, but it has some hidden pitfalls. See https://www.reddit.com/r/scala/comments/3zofjl/why_is_future_totally_unusable/
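The underlying idea, producing the response only after the insert has signalled completion, can be sketched with the standard library alone. This is not the MongoDB driver or http4s API: insertOneAsync is a hypothetical callback-style operation standing in for the Observable, and insertOne bridges it into a Future with a Promise, roughly the role toFuture() plays in the one-liner above. Because handle maps over that Future, the "Hello." response cannot exist before the insert finishes; the same reasoning suggests closing the client only after completion, rather than immediately after calling insertOne as in the question's code.

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Success, Try}

object WaitBeforeResponding {
  // Hypothetical callback-style insert standing in for an Observable that
  // signals completion later; this is not the MongoDB driver API.
  def insertOneAsync(doc: Map[String, Any])(onComplete: Try[Unit] => Unit): Unit =
    Future(Thread.sleep(10)).onComplete(_ => onComplete(Success(())))

  // Bridge the callback into a Future with a Promise.
  def insertOne(doc: Map[String, Any]): Future[Unit] = {
    val promise = Promise[Unit]()
    insertOneAsync(doc) { result => promise.complete(result); () }
    promise.future
  }

  // The response value is produced only once the insert has completed.
  def handle(): Future[String] =
    insertOne(Map("_id" -> 0, "name" -> "MongoDB")).map(_ => "Hello.")
}
```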
This tweet made me wonder: https://twitter.com/timperrett/status/684584581048233984
Do you consider Futures "totally unusable" or is this just hyperbole? I've never had a major problem, but I'm willing to be enlightened. Doesn't the following code make Futures effectively "lazy"? def myFuture = Future { 42 }
And, finally, I've also heard rumblings that scalaz's Tasks have some failings as well, but I haven't found much on it. Anybody have more details?
Answer:
The fundamental problem is that constructing a Future with a side-effecting expression is itself a side-effect. You can only reason about Future for pure computations, which unfortunately is not how they are commonly used. Here is a demonstration of this operation breaking referential transparency:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.Random
val f1 = {
val r = new Random(0L)
val x = Future(r.nextInt)
for {
a <- x
b <- x
} yield (a, b)
}
// Same as f1, but I inlined `x`
val f2 = {
val r = new Random(0L)
for {
a <- Future(r.nextInt)
b <- Future(r.nextInt)
} yield (a, b)
}
f1.onComplete(println) // Success((-1155484576,-1155484576))
f2.onComplete(println) // Success((-1155484576,-723955400)) <-- not the same
However, this works fine with Task. Note that the interesting one is the non-inlined version, which manages to produce two distinct Int values. This is the important bit: Task has a constructor that captures side effects as values, and Future does not.
import scalaz.concurrent.Task
val task1 = {
val r = new Random(0L)
val x = Task.delay(r.nextInt)
for {
a <- x
b <- x
} yield (a, b)
}
// Same as task1, but I inlined `x`
val task2 = {
val r = new Random(0L)
for {
a <- Task.delay(r.nextInt)
b <- Task.delay(r.nextInt)
} yield (a, b)
}
println(task1.run) // (-1155484576,-723955400)
println(task2.run) // (-1155484576,-723955400)
Most of the commonly-cited differences like "a Task doesn't run until you ask it to" and "you can compose the same Task over and over" trace back to this fundamental distinction.
So the reason it's "totally unusable" is that once you're used to programming with pure values and relying on equational reasoning to understand and manipulate programs, it's hard to go back to the side-effecting world, where things are much harder to understand.
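The question also asks whether def myFuture = Future { 42 } makes a Future effectively lazy. A sketch reusing the f1 experiment (with Await added only so the results can be inspected; DefVsValFuture is an illustrative name) suggests that a def does restore substitution in this particular example, because every reference builds a new Future. It is still not the same as Task: the Future starts running the moment the def is called, so what you hold is an already-running computation rather than a description you can pass around and run later.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Random

object DefVsValFuture {
  // val: the Future is created (and started) once; both reads share its result,
  // just like f1 above.
  def withVal(): (Int, Int) = {
    val r = new Random(0L)
    val x = Future(r.nextInt())
    Await.result(for { a <- x; b <- x } yield (a, b), 5.seconds)
  }

  // def: every reference to x builds and starts a brand-new Future,
  // so this behaves like the inlined f2.
  def withDef(): (Int, Int) = {
    val r = new Random(0L)
    def x = Future(r.nextInt())
    Await.result(for { a <- x; b <- x } yield (a, b), 5.seconds)
  }
}
```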
I need to do something very similar to this: https://github.com/typesafehub/activator-akka-stream-scala/blob/master/src/main/scala/sample/stream/GroupLogFile.scala
My problem is that I have an unknown number of groups, and if the parallelism of the mapAsync is less than the number of groups, I get an error in the last sink:
Tearing down SynchronousFileSink(/Users/sam/dev/projects/akka-streams/target/log-ERROR.txt) due to upstream error (akka.stream.impl.StreamSubscriptionTimeoutSupport$$anon$2)
I tried putting a buffer in between, as suggested in the Akka Streams cookbook: http://doc.akka.io/docs/akka-stream-and-http-experimental/1.0/scala/stream-cookbook.html
groupBy {
case LoglevelPattern(level) => level
case other => "OTHER"
}.buffer(1000, OverflowStrategy.backpressure).
// write lines of each group to a separate file
mapAsync(parallelism = 2) {....
but got the same result.
Expanding on jrudolph's comment, which is completely correct:
You do not need a mapAsync in this instance. As a basic example, suppose you have a source of tuples:
import akka.stream.scaladsl.{Source, Sink}
def data() = List(("foo", 1),
("foo", 2),
("bar", 1),
("foo", 3),
("bar", 2))
val originalSource = Source(data())
You can then perform a groupBy to create a Source of Sources
def getID(tuple : (String, Int)) = tuple._1
//a Source of (String, Source[(String, Int),_])
val groupedSource = originalSource groupBy getID
Each one of the grouped Sources can be processed in parallel with just a map, no need for anything fancy. Here is an example of each grouping being summed in an independent stream:
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
implicit val actorSystem = ActorSystem()
implicit val mat = ActorMaterializer()
import actorSystem.dispatcher
def getValues(tuple : (String, Int)) = tuple._2
//does not have to be a def, we can re-use the same sink over-and-over
val sumSink = Sink.fold[Int,Int](0)(_ + _)
//a Source of (String, Future[Int])
val sumSource =
groupedSource map { case (id, src) =>
id -> {src map getValues runWith sumSink} //calculate sum in independent stream
}
Now all of the "foo" numbers are being summed in parallel with all of the "bar" numbers.
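For readers without Akka at hand, the same group-then-reduce-independently idea can be sketched with plain standard-library Futures. This is an analogy, not the Akka Streams API, and GroupAndSumInParallel is an illustrative name: the tuples are grouped by key and each group's sum is computed in its own Future, so the "foo" and "bar" sums can proceed concurrently without any mapAsync-style parallelism limit.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object GroupAndSumInParallel {
  val data: List[(String, Int)] =
    List(("foo", 1), ("foo", 2), ("bar", 1), ("foo", 3), ("bar", 2))

  // Group by key, then reduce each group in its own Future.
  def sums(): Map[String, Int] = {
    val perGroup: Map[String, Future[Int]] =
      data.groupBy(_._1).map { case (id, tuples) => id -> Future(tuples.map(_._2).sum) }
    // Collect the per-group results once every group has finished.
    val all: Future[List[(String, Int)]] =
      Future.traverse(perGroup.toList) { case (id, f) => f.map(id -> _) }
    Await.result(all, 5.seconds).toMap
  }
}
```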
mapAsync is used when you have an encapsulated function that returns a Future[T] and you want to emit a T instead, which is not the case in your question. Further, mapAsync involves waiting for results, which is not reactive...