Doobie and DB access composition within 1 transaction - scala

The Doobie book says it's good practice to return ConnectionIO from your repository layer. This makes it possible to chain calls and perform them in one transaction.
Nice and clear.
Now let's imagine we are working on a REST API service and our scenario is:
Find an object in the database.
Perform some async manipulation (using cats.effect.IO or monix.eval.Task) on this object.
Store the object in the database.
And we want to perform all these steps inside one transaction. The problem is that without the natural transformation given to us by transactor.trans() we are working in two monads, Task and ConnectionIO, which cannot be composed directly.
The question is: how can I mix doobie's ConnectionIO with any effect monad in one composition, so that everything runs in one transaction and all DB mutations can be committed or rolled back at the end of the world?
Thank you!
UPD:
A small example:
def getObject: ConnectionIO[Request] = ???
def saveObject(obj: Request): ConnectionIO[Request] = ???
def processObject(obj: Request): monix.eval.Task[Request] = ???

val transaction: ??? = for {
  obj       <- getObject             // ConnectionIO[Request]
  processed <- processObject(obj)    // monix.eval.Task[Request]
  updated   <- saveObject(processed) // ConnectionIO[Request]
} yield updated
UPD2: The correct answer, provided by @oleg-pyzhcov, is to lift your effect datatypes into ConnectionIO like this:
def getObject: ConnectionIO[Request] = ???
def saveObject(obj: Request): ConnectionIO[Request] = ???
def processObject(obj: Request): monix.eval.Task[Request] = ???

val transaction: ConnectionIO[Request] = for {
  obj       <- getObject                                           // ConnectionIO[Request]
  processed <- Async[ConnectionIO].liftIO(processObject(obj).toIO) // ConnectionIO[Request]
  updated   <- saveObject(processed)                               // ConnectionIO[Request]
} yield updated

val result: Task[Request] = transaction.transact(xa)

ConnectionIO in doobie has a cats.effect.Async instance, which, among other things, allows you to turn any cats.effect.IO into a ConnectionIO by means of the liftIO method:
import doobie.free.connection._
import cats.effect.{IO, Async}
val catsIO: IO[String] = ???
val cio: ConnectionIO[String] = Async[ConnectionIO].liftIO(catsIO)
For monix.eval.Task, your best bet is using Task#toIO and performing the same trick, but you'd need a monix Scheduler in scope.
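For example, a minimal sketch (an implicit monix Scheduler has to be in scope for Task#toIO to resolve its implicits; task is just a placeholder):

import cats.effect.Async
import doobie.free.connection._
import monix.eval.Task
import monix.execution.Scheduler.Implicits.global // Scheduler needed for Task#toIO

val task: Task[String] = ???
// Convert the Task to cats-effect IO, then lift the IO into ConnectionIO.
val cio: ConnectionIO[String] = Async[ConnectionIO].liftIO(task.toIO)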

Related

Scala Thread Pool - Invoking API's Concurrently

I have a use-case in Databricks where an API call has to be made for a dataset of URLs. The dataset has around 100K records.
The maximum allowed concurrency is 3.
I did the implementation in Scala and ran it in a Databricks notebook. Apart from the one element left pending in the queue, I feel something is missing here.
Are the blocking queue and thread pool the right way to tackle this problem?
In the code below I have modified it so that instead of reading from the dataset I am sampling from a Seq.
Any help/thoughts will be much appreciated.
import java.time.LocalDateTime
import java.util.concurrent.{ArrayBlockingQueue, BlockingQueue}
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

var inpQueue: BlockingQueue[(Int, String)] = new ArrayBlockingQueue[(Int, String)](1)

val inpDS = Seq(
  (1, "https://google.com/2X6barD"), (2, "https://google.com/3d9vCgW"),
  (3, "https://google.com/2M02Xz0"), (4, "https://google.com/2XOu2uL"),
  (5, "https://google.com/2AfBWF0"), (6, "https://google.com/36AEKsw"),
  (7, "https://google.com/3enBxz7"), (8, "https://google.com/36ABq0x"),
  (9, "https://google.com/2XBjmiF"), (10, "https://google.com/36Emlen"))

val pool = Executors.newFixedThreadPool(3)

var i = 0
inpDS.foreach { ix =>
  inpQueue.put(ix)
  val t = new ConsumerAPIThread()
  t.setName("MyThread-" + i + " ")
  pool.execute(t)
  i = i + 1
}

println("Final Queue Size = " + inpQueue.size + "\n")
class ConsumerAPIThread() extends Thread {
  var name = ""

  override def run() {
    val urlDetail = inpQueue.take()
    print(this.getName() + " " + Thread.currentThread().getName() + " popped " + urlDetail + " Queue Size " + inpQueue.size + " \n")
    triggerAPI((urlDetail._1, urlDetail._2))
  }

  def triggerAPI(params: (Int, String)) {
    try {
      val result = scala.io.Source.fromURL(params._2)
      println("" + result)
    } catch {
      case ex: Exception => println("Exception caught")
    }
  }

  def ConsumerAPIThread(s: String) {
    name = s
  }
}
So, you have two requirements: the functional one is that you want to process the items in a list asynchronously; the non-functional one is that you don't want to process more than three items at once.
Regarding the latter, the nice thing is that, as you have already shown in your question, Java natively exposes a nicely packaged Executor that runs tasks on a thread pool of fixed size, elegantly allowing you to cap the concurrency level if you work with threads.
Moving to the functional requirement, Scala helps by having something that does precisely that as part of its standard API. In particular it uses scala.concurrent.Future, so in order to use it we'll have to reframe triggerAPI in terms of Future. The content of the function is not particularly relevant, so we'll mostly focus on its (revised) signature for now:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext

def triggerAPI(params: (Int, String))(implicit ec: ExecutionContext): Future[Unit] =
  Future {
    // some code that takes some time to run...
  }
Notice that now triggerAPI returns a Future. A Future can be thought of as a read-handle to something that is going to be computed eventually. In particular, this is a Future[Unit], where Unit stands for "we don't particularly care about the output of this function, but mostly about its side effects".
Furthermore, notice that the method now takes an implicit parameter, namely an ExecutionContext. The ExecutionContext is used to provide Futures with some form of environment where the computation happens. Scala has an API to create an ExecutionContext from a java.util.concurrent.ExecutorService, so this will come in handy to run our computation on the fixed thread pool, running no more than three callbacks at any given time.
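As a quick illustration of that wiring (just a sketch; the full program is further down):

import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext

// Wrap a fixed pool of 3 threads in an ExecutionContext, so at most
// three Futures are executing at any given time.
implicit val ec: ExecutionContext =
  ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(3))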
Before moving forward, if you have questions about Futures, ExecutionContexts and implicit parameters, the Scala documentation is your best source of knowledge (here are a couple of pointers: 1, 2).
Now that we have the new triggerAPI method, we can use Future.traverse (here is the documentation for Scala 2.12 -- the latest version at the time of writing is 2.13 but to the best of my knowledge Spark users are stuck on 2.12 for the time being).
The tl;dr of Future.traverse is that it takes some form of container and a function that takes the items in that container and returns a Future of something else. The function will be applied to each item in the container and the result will be a Future of the container of the results. In your case: the container is a List, the items are (Int, String) and the something else you return is a Unit.
This means that you can simply call it like this:
Future.traverse(inpDS)(triggerAPI)
And triggerAPI will be applied to each item in inpDS.
By making sure that the execution context backed by the thread pool is in the implicit scope when calling Future.traverse, the items will be processed with the desired thread pool.
The result of the call is Future[List[Unit]], which is not very interesting and can simply be discarded (as you are only interested in the side effects).
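In a notebook-style script you may want to block until everything has finished (e.g. before shutting down the executor); a minimal sketch, assuming the fixed-pool ExecutionContext is in implicit scope:

import scala.concurrent.Await
import scala.concurrent.duration.Duration

// Block the calling thread until every Future produced by traverse completes.
Await.result(Future.traverse(inpDS)(triggerAPI), Duration.Inf)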
That was a lot of talk, if you want to play around with the code I described you can do so here on Scastie.
For reference, this is the whole implementation:
import java.util.concurrent.{ExecutorService, Executors}
import scala.concurrent.duration.DurationLong
import scala.concurrent.Future
import scala.concurrent.{ExecutionContext, ExecutionContextExecutorService}
import scala.util.control.NonFatal
import scala.util.{Failure, Success, Try}

val datasets = List(
  (1, "https://google.com/2X6barD"),
  (2, "https://google.com/3d9vCgW"),
  (3, "https://google.com/2M02Xz0"),
  (4, "https://google.com/2XOu2uL"),
  (5, "https://google.com/2AfBWF0"),
  (6, "https://google.com/36AEKsw"),
  (7, "https://google.com/3enBxz7"),
  (8, "https://google.com/36ABq0x"),
  (9, "https://google.com/2XBjmiF")
)

val executor: ExecutorService = Executors.newFixedThreadPool(3)

implicit val executionContext: ExecutionContextExecutorService =
  ExecutionContext.fromExecutorService(executor)

def triggerAPI(params: (Int, String))(implicit ec: ExecutionContext): Future[Unit] =
  Future {
    val (index, _) = params
    println(s"+ started processing $index")
    val start = System.nanoTime() / 1000000
    Iterator.from(0).map(_ + 1).drop(100000000).take(1).toList.head // a noticeably slow operation
    val end = System.nanoTime() / 1000000
    val duration = (end - start).millis
    println(s"- finished processing $index after $duration")
  }

Future.traverse(datasets)(triggerAPI).onComplete { case result =>
  println("* processing is over, shutting down the executor")
  executionContext.shutdown()
}
You need to shut down the Executor after your job is done, otherwise it will keep waiting.
Try adding pool.shutdown() at the end of your program.

Doobie - lifting arbitrary effect into ConnectionIO

I'm trying to send an email in the same transaction as inserting a user into a database with Doobie.
I know that I can lift IO into ConnectionIO by using Async[ConnectionIO].liftIO(catsIO) where catsIO: IO[String]
But in my code I don't operate on IO, I use F with constraints, for example F[_]: Async
So then I can replace F with my own monad for testing.
Is it possible to somehow lift an F[String] into ConnectionIO[String] without using IO type directly?
Here is an answer I found for IO type: Doobie and DB access composition within 1 transaction
Cats has something called FunctionK which is a natural transformation.
I did this:
At the top of the world, where everything is built, you will need this
val liftToConnIO: FunctionK[IO, ConnectionIO] = LiftIO.liftK[ConnectionIO]
In the class that needs to transform from F[String] to G[String] (F will be IO, G will be ConnectionIO when you construct everything) you can pass liftToConnIO and use it to transform F[A] to G[A] where needed.
The class that wants to stay abstract over IO and ConnectionIO can be passed the FunctionK to do the lifting:
class Stuff[F[_], G[_]](emailer: Emailer[F], store: Store[G], liftToG: FunctionK[F, G]) {
  def sendEmail: G[Unit] =
    for {
      _ <- doDatabaseThingsReturnStuffInG
      _ <- liftToG(emailer.sendEmail)
      _ <- doMoreDatabaseThingsReturnStuffInG
    } yield ()
}
(You might need context bounds (Sync?) on F and G)
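To make the wiring concrete, here is a sketch of the end-of-the-world setup (emailer, store and a Transactor xa are assumed to exist; Stuff and liftToConnIO are the class and FunctionK from above):

import doobie.implicits._

// Concrete effect types are chosen only here; Stuff itself stays abstract in F and G.
val stuff = new Stuff[IO, ConnectionIO](emailer, store, liftToConnIO)

// The whole sendEmail program runs inside a single transaction.
val program: IO[Unit] = stuff.sendEmail.transact(xa)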
A variation on Channing's answer,
class Stuff[F[_]: Effect, G[_]: LiftIO](emailer: Emailer[F], store: Store[G]) {
  def sendEmail: G[Unit] =
    for {
      _ <- doDatabaseThingsReturnStuffInG
      _ <- emailer.sendEmail.toIO.to[G]
      _ <- doMoreDatabaseThingsReturnStuffInG
    } yield ()
}
Effect[F] supports taking an F[A] to an IO[A] via toIO, and LiftIO[G] supports taking an IO[A] to a G[A] via to[G].
Yes, you can simply instantiate your F as ConnectionIO to get a ConnectionIO[String].
Given a function like:
def foo[F[_]: Async]: F[String] = ...
To instantiate it with ConnectionIO you can simply do this:
def fooCIO: ConnectionIO[String] = foo[ConnectionIO]
Since Doobie 1.x (with cats-effect 3), the LiftIO instance for ConnectionIO no longer exists (why).
Instead you want to use WeakAsync.liftK, as mentioned in Doobie - lifting arbitrary effect into ConnectionIO CE3.
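A sketch of what that looks like on doobie 1.x / cats-effect 3, reusing the names from the first example (processObject is assumed to return IO[Request] here, and xa is a Transactor[IO]; check the doobie docs for the exact import path of WeakAsync):

import cats.effect.IO
import doobie._
import doobie.implicits._

// WeakAsync.liftK yields a Resource managing a FunctionK from IO to ConnectionIO.
val result: IO[Request] =
  WeakAsync.liftK[IO, ConnectionIO].use { fk =>
    val transaction: ConnectionIO[Request] = for {
      obj       <- getObject
      processed <- fk(processObject(obj)) // IO[Request] lifted into ConnectionIO
      updated   <- saveObject(processed)
    } yield updated
    transaction.transact(xa)
  }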

Do own stuff in Slick transaction

I'm using Slick 3.1.1 and I would like to implement my own stuff inside a Slick transaction.
def findSomeProducts = table.getSomeProducts() //db operation
def productTableUpdate = doSomeStuff1() //db operation
def priceTableUpdate = doSomeStuff2() //db operation
def updateElasticCache = updateIndexOfProduct() //this is not a database operation
I have these sample functions. First I get some products from the DB, and afterwards I update the tables. At the end I need to run the updateElasticCache method. If updateElasticCache fails, I would like to roll back all the DB operations.
I can't use
(for { ... } yield ()).transactionally here because it's not applicable to my case: transactionally expects a DB action, but I want to add functionality that is not a DB operation.
Is it possible? How can I achieve it?
DBIO.from
Yes! It's possible to add non-DB logic in between DB logic in Slick, using DBIO action composition and DBIO.from.
Note that "your own stuff" should return a future, future can be converted to DBIO and can be composed along with usual db actions.
DBIO.from can help you with this. Here is how it works: DBIO.from takes a Future and converts it into a DBIOAction. Now you can compose these actions with the usual DB actions to perform non-DB operations alongside DB operations in a transaction.
def updateElasticCache: Future[Unit] = Future(doSomething())
Now let's say we have some DB actions:
def createUser(user: User): DBIO[Int] = ???
I want createUser to roll back if updating the cache fails, so I do the following:
val action = createUser(user).flatMap { _ => DBIO.from(updateElasticCache) }.transactionally
db.run(action)
Now if updateElasticCache fails, the whole transaction will fail and everything will roll back to its previous state.
Example
You can use a for comprehension to make it look good:
def updateStats: DBIO[Int] = ???

val rollbackActions =
  (for {
    cStatus <- createUser(user)
    uStatus <- updateStats
    result  <- DBIO.from(updateElasticCache)
  } yield result).transactionally

db.run(rollbackActions)
Everything rolls back if the updateElasticCache future fails.

Scala transactional block with for comprehension

Getting stuck with a DAO layer I've created; it works fine for the single case, but when I need to persist several bean instances in a transactional block, I find that I have coded myself into a corner. Why? Check out the DAO create method below:
def create(e: Entity): Option[Int] =
  db.handle withSession { implicit ss: Session =>
    catching( mapper.insert(e) ) option match {
      case Some(success) => Some(Query(sequenceID))
      case None          => None
    }
  }
Queries that occur within a session block are set to auto commit, so I can't wrap several persistence operations in a transactional block. For example, here's a simplified for comprehension that processes new member subscriptions
val result = for {
  u <- user.dao.create(ubean)
  m <- member.dao.create(mbean)
  o <- order.dao.create(obean)
} yield (u, m, o)

result match {
  case Some((a, b, c)) => // all good
  case _               => // failed, need to rollback here
}
I could manually perform the queries, but that gets ugly fast
db.handle withSession { implicit ss: Session =>
  ss.withTransaction {
    val result = for {
      u <- safe( UserMapper.insert(ubean) )
      ...
    }

    def safe(q: Query[_]) =
      catching( q ) option match {
        case Some(success) => Some(Query(sequenceID))
        case None          => None
      }
  }
}
because I then wind up duplicating error handling and have to supply the database, session, etc. all over the application, instead of encapsulating them in the DAO layer.
Anyone have some sage advice here for how to work around this problem? I really like the concision of the for comprehension, Scala rocks ;-), ideas appreciated!
OK, unless someone has a better idea, here's what I'm rolling with:
Since DAO to Entity is a 1-to-1 relationship, and ScalaQuery auto-commits queries executed in a session block, performing multiple inserts on separate entities is not possible via my DAO implementation.
The workaround is to create a GenericDAO, one not tied to a particular entity, which provides transactional query functionality, with the error handling extracted into a parent trait:
def createMember(...): Boolean = {
  db.handle withSession { implicit ss: Session =>
    ss.withTransaction {
      val result = for {
        u <- safeInsert( UserMapper.insert(ubean) )(ss)
        ...
      }
      ...
At the controller layer the implementation becomes dao.createMember(...), which is quite nice, IMO: transactional inserts using a safe for comprehension, cool stuff.

Memory consumption of a parallel Scala Stream

I have written a Scala (2.9.1-1) application that needs to process several million rows from a database query. I am converting the ResultSet to a Stream using the technique shown in the answer to one of my previous questions:
class Record(...)

val resultSet = statement.executeQuery(...)

new Iterator[Record] {
  def hasNext = resultSet.next()
  def next = new Record(resultSet.getString(1), resultSet.getInt(2), ...)
}.toStream.foreach { record => ... }
and this has worked very well.
Since the body of the foreach closure is very CPU intensive, and as a testament to the practicality of functional programming, if I add a .par before the foreach, the closures get run in parallel with no other effort, except to make sure that the body of the closure is thread safe (it is written in a functional style with no mutable data except printing to a thread-safe log).
However, I am worried about memory consumption. Is the .par causing the entire result set to load in RAM, or does the parallel operation load only as many rows as it has active threads? I've allocated 4G to the JVM (64-bit with -Xmx4g) but in the future I will be running it on even more rows and worry that I'll eventually get an out-of-memory.
Is there a better pattern for doing this kind of parallel processing in a functional manner? I've been showing this application to my co-workers as an example of the value of functional programming and multi-core machines.
If you look at the scaladoc of Stream, you will notice that the defining class of par is the Parallelizable trait... and, if you look at the source code of this trait, you will notice that it takes each element from the original collection and puts them into a combiner; thus, you will load each row into a ParSeq:
def par: ParRepr = {
  val cb = parCombiner
  for (x <- seq) cb += x
  cb.result
}

/** The default `par` implementation uses the combiner provided by this method
 *  to create a new parallel collection.
 *
 *  @return a combiner for the parallel collection of type `ParRepr`
 */
protected[this] def parCombiner: Combiner[A, ParRepr]
A possible solution is to explicitly parallelize your computation, for example with actors. You can take a look at this example from the Akka documentation, which might be helpful in your context.
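If you'd rather stay with the standard library than bring in actors, one way to bound memory (not from the original answer, just a sketch: records is the Iterator built in the question, process stands in for the CPU-intensive closure body) is to pull records off the iterator in fixed-size batches and parallelize only within each batch:

// Only one batch of 100 records is materialized at a time; .par fans the
// work in each batch out to the default parallel-collections thread pool.
records.grouped(100).foreach { batch =>
  batch.par.foreach { record =>
    process(record) // CPU-intensive work on a single record
  }
}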
The new akka stream library is the fix you're looking for:
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Source, Sink}

def iterFromQuery(): Iterator[Record] = {
  val resultSet = statement.executeQuery(...)
  new Iterator[Record] {
    def hasNext = resultSet.next()
    def next = new Record(...)
  }
}

def cpuIntensiveFunction(record: Record) = {
  ...
}

implicit val actorSystem = ActorSystem()
implicit val materializer = ActorMaterializer()
implicit val execContext = actorSystem.dispatcher

val poolSize = 10 // number of Records in memory at once

val stream =
  Source.fromIterator(() => iterFromQuery())
    .runWith(Sink.foreachParallel(poolSize)(cpuIntensiveFunction))

stream onComplete { _ => actorSystem.shutdown() }