Scala transactional block with for comprehension

I'm stuck with a DAO layer I've created; it works fine for the single case, but when I need to persist several bean instances in a transactional block, I find that I have coded myself into a corner. Why? Check out the DAO create method below:
def create(e: Entity): Option[Int] =
  db.handle withSession { implicit ss: Session =>
    catching( mapper.insert(e) ) option match {
      case Some(success) => Some(Query(sequenceID))
      case None => None
    }
  }
Queries that occur within a session block are set to auto commit, so I can't wrap several persistence operations in a transactional block. For example, here's a simplified for comprehension that processes new member subscriptions
val result = for {
  u <- user.dao.create(ubean)
  m <- member.dao.create(mbean)
  o <- order.dao.create(obean)
} yield (u, m, o)

result match {
  case Some((a, b, c)) => // all good
  case _ => // failed, need to rollback here
}
I could manually perform the queries, but that gets ugly fast
db.handle withSession { implicit ss: Session =>
  ss.withTransaction {
    val result = for {
      u <- safe( UserMapper.insert(ubean) )
      ...
    }

    def safe(q: Query[_]) =
      catching( q ) option match {
        case Some(success) => Some(Query(sequenceID))
        case None => None
      }
  }
}
because I then wind up duplicating error handling and have to supply the database, session, etc. all over the application instead of encapsulating them in the DAO layer.
Does anyone have some sage advice for how to work around this problem? I really like the concision of the for comprehension (Scala rocks ;-)), ideas appreciated!

OK, unless someone has a better idea, here's what I'm rolling with:
Since DAO to Entity is a 1-to-1 relationship, and ScalaQuery auto commits queries executed in a session block, performing multiple inserts on separate entities is not possible via my DAO implementation.
The workaround is to create a GenericDAO, one not tied to a particular entity, which provides transactional query functionality, with error handling extracted into a parent trait (a sketch of that trait follows the snippet below):
def createMember(...): Boolean = {
  db.handle withSession { implicit ss: Session =>
    ss.withTransaction {
      val result = for {
        u <- safeInsert( UserMapper.insert(ubean) )(ss)
        ...
      }
      ...
At the controller layer the call becomes dao.createMember(...), which is quite nice, IMO: transactional inserts using a safe for comprehension, cool stuff.
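For reference, a minimal sketch of what the extracted error-handling helper could look like. This is illustrative only: the trait name SafeQueries is made up, and it assumes the same Session type the DAO layer already imports; it simply wraps the insert so that a failure becomes None and the surrounding withTransaction block can roll back.
// Illustrative sketch: shared error handling pulled into a parent trait that the
// GenericDAO (and entity-specific DAOs) can mix in.
trait SafeQueries {
  /** Evaluates the insert by name; a failure becomes None, so the enclosing
    * for comprehension short-circuits and withTransaction rolls back. */
  def safeInsert[A](op: => A)(implicit ss: Session): Option[A] =
    try Some(op)
    catch { case e: Exception => None } // log e as appropriate
}
Calls then look like safeInsert( UserMapper.insert(ubean) )(ss) inside a single ss.withTransaction block, as in the snippet above.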

Related

Functional scala log accumulator

I'm working on a Scala project using mainly the cats library. In there, we have calls like:
for {
  _ <- initSomeServiceAndLog("something from a far away service")
  _ <- initSomeOtherServiceAndLog("something from another far away service")
  a <- b()
  c <- d(a)
} yield c
Imagine that b also logs something or might throw a business error (I know, we avoid throwing in Scala, but that's not the case right now). I'm looking for a solution to accumulate the logs and print them all at the end, in a single message.
For the happy path, I saw that the Writer monad from Cats might be an acceptable solution.
But what if the b method throws? The requirement is to log everything - all previous logs and the error message - in a single message, with some kind of unique trace ID.
Any thoughts? Thanks in advance
Implementing functional logging (in a way that preserves logs even if an error happened) using monad transformers like Writer (WriterT) or State (StateT) is hard. However, if we aren't too strict about the FP approach, we could do the following:
use some IO monad
with it, create something like an in-memory storage for logs
however, implement it in a functional way
Personally I would pick either cats.effect.concurrent.Ref or monix.eval.TaskLocal.
Example using Ref (and Task):
type Log = Ref[Task, Chain[String]]
type FunctionalLogger = String => Task[Unit]

val createLog: Task[Log] = Ref.of[Task, Chain[String]](Chain.empty)

def createAppender(log: Log): FunctionalLogger =
  entry => log.update(chain => chain.append(entry))

def outputLog(log: Log): Task[Chain[String]] = log.get
with helpers like that I could:
def doOperations(logger: FunctionalLogger) = for {
  _ <- operation1(logger) // logging is a side effect managed by the IO monad
  _ <- operation2(logger) // so it is referentially transparent
} yield result
createLog.flatMap { log =>
  doOperations(createAppender(log))
    .recoverWith(...)
    .flatMap { result =>
      outputLog(log)
      ...
    }
}
However, making sure that outputLog is called is a bit of a pain, so we could use some form of Bracket or Resource to handle it:
val loggerResource: Resource[Task, FunctionalLogger] = Resource.make {
  createLog // acquiring resource - IO operation that accesses something
} { log =>
  outputLog(log) // releasing resource - works like finally in try-catch, so it should
    .flatMap(... /* log entries or sth */) // be called no matter if an error occurred
}.map(createAppender)
loggerResource.use { logger =>
  doSomething(logger)
}
If you don't like passing this appender around explicitly you could use Kleisli to inject it:
type WithLogger[A] = Kleisli[Task, FunctionalLogger, A]

// def operation1: WithLogger[A]
// def operation2: WithLogger[B]

def doSomething: WithLogger[C] = for {
  a <- operation1
  b <- operation2
} yield c // build the C result from a and b here

loggerResource.use { logger =>
  doSomething(logger)
}
TaskLocal would be used in a very similar way.
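For what it's worth, a rough sketch of the TaskLocal variant (assuming Monix 3.x; the append helper and the log entries are illustrative): the accumulated Chain lives in a TaskLocal instead of a Ref, so it doesn't need to be threaded around explicitly, but local context propagation must be enabled when running the Task.
import cats.data.Chain
import monix.eval.{Task, TaskLocal}

// Rough sketch, not a drop-in implementation: keep the log in a TaskLocal.
val program: Task[Chain[String]] =
  for {
    local <- TaskLocal(Chain.empty[String])
    append = (entry: String) => local.read.flatMap(chain => local.write(chain.append(entry)))
    _     <- append("operation1 done") // illustrative log entries
    _     <- append("operation2 done")
    log   <- local.read
  } yield log

// TaskLocal only takes effect if local context propagation is enabled, e.g.:
// program.executeWithOptions(_.enableLocalContextPropagation)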
At the end of the day you would end up with:
a type that says that it is logging
mutability managed through IO, so referential transparency would not be lost
certainty that even if the IO fails, the log will be preserved and the results sent
I believe some purists would not like this solution, but it has all the benefits of FP, so I would personally use it.

Order of execution of Future - Making sequential inserts in a db non-blocking

A simple scenario here: I am using Akka Streams to read from Kafka and write to an external store, in my case Cassandra.
The Akka Streams (reactive-kafka) library equips me with backpressure and other nifty things to make this possible.
With Kafka as the Source and Cassandra as the Sink, I get a bunch of events through Kafka which are, in this case, Cassandra queries that are supposed to be executed sequentially (e.g. an INSERT, an UPDATE and a DELETE, in that order).
I cannot use mapAsync and execute the statements together: Future is eager and there is a chance that the DELETE or UPDATE might get executed before the INSERT.
I am forced to use Cassandra's execute as opposed to executeAsync which is non-blocking.
There is no way to make a completely async solution to this issue, but is there a more elegant way to do this?
For ex: Make the Future lazy and sequential and offload it to a different execution context of sorts.
mapAsync gives a parallelism option as well.
Can Monix Task be of help here?
This is a general design question: what approaches can one take?
UPDATE:
Flow[In].mapAsync(3)(input => {
  input match {
    case INSERT => // do insert - returns future
    case UPDATE => // do update - returns future
    case DELETE => // delete - returns future
  }
})
The scenario is a little more complex. There could be thousands of inserts, updates and deletes coming in order for specific key(s) (in Kafka).
I would ideally want to execute the 3 futures of a single key in sequence. I believe Monix's Task can help?
If you process things with parallelism of 1, they will get executed in strict sequence, which will solve your problem.
But that's not interesting. If you want, you can run operations for different keys in parallel - if processing for different keys is independent, which, I assume from your description, is possible. To do this, you have to buffer the incoming values and then regroup them. Let's see some code:
import monix.reactive.Observable
import monix.eval.Task

import scala.concurrent.duration._

// Your domain logic - I'll use these stubs
trait Event
trait Acknowledgement // whatever your DB functions return, if you need it

def toKey(e: Event): String = ???

def processOne(event: Event): Task[Acknowledgement] = Task.deferFuture {
  event match {
    case _ => ??? // insert/update/delete
  }
}

// Monix Task.traverse is strictly sequential, which is what you need
def processMany(evs: Seq[Event]): Task[Seq[Acknowledgement]] =
  Task.traverse(evs)(processOne)

def processEventStreamInParallel(source: Observable[Event]): Observable[Acknowledgement] =
  source
    // Process a bunch of events, but don't wait too long for the whole 100. Fine-tune for your data source
    .bufferTimedAndCounted(2.seconds, 100)
    .concatMap { batch =>
      Observable
        .fromIterable(batch.groupBy(toKey).values) // Standard collection methods FTW
        .mapAsync(3)(processMany) // processing up to 3 different keys in parallel - tho 3 is not necessary, probably depends on your DB throughput
        .flatMap(Observable.fromIterable) // flattening it back
    }
The concatMap operator here will ensure that your chunks are processed sequentially as well. So even if one buffer has key1 -> insert, key1 -> update and the other has key1 -> delete, that causes no problems. In Monix, this is the same as flatMap, but in other Rx libraries flatMap might be an alias for mergeMap which has no ordering guarantee.
This can be done with Futures too, though there's no standard "sequential traverse", so you have to roll your own, something like:
def processMany(evs: Seq[Event]): Future[Seq[Acknowledgement]] =
  evs.foldLeft(Future.successful(Vector.empty[Acknowledgement])) { (acksF, ev) =>
    for {
      acks <- acksF
      next <- processOne(ev)
    } yield acks :+ next
  }
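To also mirror the per-key grouping of the Monix version with plain Futures, something along these lines should work (a sketch only, assuming a Future-returning processOne behind the processMany above and an ExecutionContext in scope; processBatch is a made-up name):
import scala.concurrent.{ExecutionContext, Future}

// Sketch: events for the same key go through the sequential processMany above,
// while the groups for different keys are started concurrently by Future.traverse.
def processBatch(batch: Seq[Event])(implicit ec: ExecutionContext): Future[Seq[Acknowledgement]] =
  Future.traverse(batch.groupBy(toKey).values.toSeq)(processMany).map(_.flatten)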
You can use akka-streams subflows to group by key, then merge the substreams if you want to do something with what you get from your database operations:
def databaseOp(input: In): Future[Out] = input match {
  case INSERT => ...
  case UPDATE => ...
  case DELETE => ...
}

val databaseFlow: Flow[In, Out, NotUsed] =
  Flow[In].groupBy(Int.MaxValue, _.key).mapAsync(1)(databaseOp).mergeSubstreams
Note that the order of the input source won't be kept in the output (as it would be with a plain mapAsync), but all operations on the same key will still be in order.
You are looking for Future.flatMap:
def doSomething: Future[Unit]
def doSomethingElse: Future[Unit]
val result = doSomething.flatMap { _ => doSomethingElse }
This executes the first function, and then, when its Future is satisfied, starts the second one. The result is a new Future that completes when the result of the second execution is satisfied.
The result of the first future is passed into the function you give to .flatMap, so the second function can depend on the result of the first one. For example:
def getUserID: Future[Int]
def getUser(id: Int): Future[User]
val userName: Future[String] = getUserID.flatMap(getUser).map(_.name)
You can also write this as a for-comprehension:
for {
  id <- getUserID
  user <- getUser(id)
} yield user.name

Sharing database session between multiple methods in Slick 3

I have recently switched from Slick-2 to Slick-3. Everything is working very well with slick-3. However, I am having some issues when it comes to transaction.
I have seen different questions and sample code in which transactionally and withPinnedSession are used to handle the transaction. But my case is slightly different. Both transactionally and withPinnedSession can be applied to a Query. But what I want to do is pass the same session to another method which will do some operations, and wrap multiple methods in the same transaction.
I have the Slick 2 code below; I am not sure how this can be implemented with Slick 3.
def insertWithTransaction(row: TTable#TableElementType)(implicit session: Session) = {
  val entity = (query returning query.map(obj => obj) += row).asInstanceOf[TEntity]
  // do some operations after insert
  // e.g. invoke another method for sending the notification
  entity
}

override def insert(row: TTable#TableElementType) = {
  db.withSession {
    implicit session => {
      insertWithTransaction(row)
    }
  }
}
Now, if someone is not interested in having transactions, they can just invoke the insert() method.
If we need a transaction, it can be done by using insertWithTransaction() in a db.withTransaction block.
For example:
db.withTransaction { implicit session =>
  insertWithTransaction(row1)
  insertWithTransaction(row2)
  // check some condition, invoke session.rollback if something goes wrong
}
But with Slick 3, transactionally can be applied to a query only.
That means logic that should run after an insertion can no longer be handled centrally; every developer needs to handle those scenarios explicitly if they are using transactions, which I believe could potentially cause errors. I am trying to abstract the whole logic into the insert operation so that the implementors only need to worry about the transaction success/failure.
Is there any other way, in Slick 3, in which I can pass the same session to multiple methods so that everything can be done in a single db session?
You are missing something: .transactionally doesn't apply to a Query, but to a DBIOAction.
Then, a DBIOAction can be composed of multiple queries by using monadic composition.
Here is an example coming from the documentation:
val action = (for {
  ns <- coffees.filter(_.name.startsWith("ESPRESSO")).map(_.name).result
  _  <- DBIO.seq(ns.map(n => coffees.filter(_.name === n).delete): _*)
} yield ()).transactionally
action is composed of a select query and as many delete queries as there are rows returned by the first query. All of that creates a DBIOAction that will be executed in a transaction.
Then, to run the action against the database, you have to call db.run, so, like this:
val f: Future[Unit] = db.run(action)
Now, to come back to your example: let's say you want to apply an update query after your insert, you can create an action this way:
val action = (for {
  entity <- (query returning query.map(obj => obj) += row)
  _ <- query.map(_.foo).update(newFoo)
} yield entity).transactionally
Hope it helps.
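Applied to the generic repository in the question, the idea is to stop executing inside a session and instead return a DBIO that callers compose; a rough sketch (insertAction and insertBoth are made-up names, and it reuses the question's query, TTable, TEntity and db):
import scala.concurrent.{ExecutionContext, Future}

// Sketch only: the insert becomes a composable action instead of running eagerly.
def insertAction(row: TTable#TableElementType)(implicit ec: ExecutionContext): DBIO[TEntity] =
  ((query returning query.map(obj => obj)) += row).map(_.asInstanceOf[TEntity])

// Callers choose the transaction boundary by composing actions before running them:
def insertBoth(row1: TTable#TableElementType, row2: TTable#TableElementType)
              (implicit ec: ExecutionContext): Future[(TEntity, TEntity)] =
  db.run((for {
    e1 <- insertAction(row1)
    e2 <- insertAction(row2)
    // post-insert logic (e.g. sending notifications) can be added here as further DBIO steps
  } yield (e1, e2)).transactionally)
The key difference from the Slick 2 version is that nothing touches the database until db.run, so the same insertAction can take part in one transaction or be run on its own.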

Is there an easy way to get a Stream as output of a RowParser?

Given rowParser of type RowParser[Photo], this is how you would parse a list of rows coming from a table photo, according to the code samples I have seen so far:
def getPhotos(album: Album): List[Photo] = DB.withConnection { implicit c =>
  SQL("select * from photo where album = {album}").on(
    'album -> album.id
  ).as(rowParser *)
}
Where the * operator creates a parser of type ResultSetParser[List[Photo]]. Now, I was wondering if it was equally possible to get a parser that yields a Stream (thinking that being more lazy is always better), but I only came up with this:
def getPhotos(album: Album): Stream[Photo] = DB.withConnection { implicit c =>
  SQL("select * from photo where album = {album}").on(
    'album -> album.id
  )() collect (rowParser(_) match { case Success(photo) => photo })
}
It works, but it seems overly complicated. I could of course just call toStream on the List I get from the first function, but my goal was to only apply rowParser on rows that are actually read. Is there an easier way to achieve this?
EDIT: I know that limit should be used in the query, if the number of rows of interest is known beforehand. I am also aware that, in many cases, you are going to use the whole result anyway, so being lazy will not improve performance. But there might be a case where you save a few cycles, e.g. if for some reason, you have search criteria that you cannot or do not want to express in SQL. So I thought it was odd that, given the fact that anorm provides a way to obtain a Stream of SqlRow, I didn't find a straightforward way to apply a RowParser on that.
I ended up creating my own stream method which corresponds to the list method:
def stream[A](p: RowParser[A]) = new ResultSetParser[Stream[A]] {
  def apply(rows: SqlParser.ResultSet): SqlResult[Stream[A]] = rows.headOption.map(p(_)) match {
    case None => Success(Stream.empty[A])
    case Some(Success(a)) => {
      val s: Stream[A] = a #:: rows.tail.flatMap(r => p(r) match {
        case Success(r) => Some(r)
        case _ => None
      })
      Success(s)
    }
    case Some(Error(msg)) => Error(msg)
  }
}
Note that the Play SqlResult can only be either Success/Error while each row can also be Success/Error. I handle this for the first row only, assuming the rest will be the same. This may or may not work for you.
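For completeness, using it from the original query looks roughly like this (a sketch mirroring the getPhotos example above; as with the snippet in the question, the rows are consumed lazily, so the connection lifetime still deserves care):
// Sketch: pass the custom stream parser to .as, just like the built-in (rowParser *).
def getPhotos(album: Album): Stream[Photo] = DB.withConnection { implicit c =>
  SQL("select * from photo where album = {album}").on(
    'album -> album.id
  ).as(stream(rowParser))
}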
You're better off making smaller (paged) queries using limit and offset.
Anorm would need some modification if you're going to keep your (large) result around in memory and stream it from there. Then the other concern would be the new memory requirements for your JVM. And how would you deal with caching on the service level? See, previously you could easily cache something like photos?page=1&size=10, but now you just have photos, and the caching technology would have no idea what to do with the stream.
Even worse, and possibly on a JDBC level, you could wrap Stream around limited and offset-ed execute statements and just make multiple calls to the database behind the scenes, but this sounds like it would need a fair bit of work to port the Stream code that Scala generates to Java land (to work with Groovy, JRuby, etc), then get it approved for the JDBC 5 or 6 roadmap. This idea will probably be shunned as being too complicated, which it is.
You could wrap Stream around your entire DAO (where the limit and offset trickery would happen), but this almost sounds like more trouble than it's worth :-)
I was in a similar situation but ran into a stack overflow error when the built-in anorm function for converting to Streams attempted to parse the result set.
In order to get around this I elected to abandon the anorm ResultSetParser paradigm, and fall back to the java.sql.ResultSet object.
I wanted to use anorm's internal classes for parsing the result set rows, but, ever since version 2.4, they have made all of the pertinent classes and methods private to their package, and have deprecated several other methods that would have been more straightforward to use.
I used a combination of Promises and Futures to work around the ManagedResource that anorm now returns. I avoided all deprecated functions.
import anorm._
import java.sql.ResultSet
import scala.concurrent._

def SqlStream[T](sql: SqlQuery)(parse: ResultSet => T)(implicit ec: ExecutionContext): Future[Stream[T]] = {
  val conn = db.getConnection()
  val mr = sql.preparedStatement(conn, false)
  val p = Promise[Unit]()
  val p2 = Promise[ResultSet]()

  Future {
    mr.map({ stmt =>
      p2.success(stmt.executeQuery)
      Await.ready(p.future, duration.Duration.Inf)
    }).acquireAndGet(identity).andThen { case _ => conn.close() }
  }

  def _stream(rs: ResultSet): Stream[T] = {
    if (rs.next()) parse(rs) #:: _stream(rs)
    else {
      p.success(())
      Stream.empty
    }
  }

  p2.future.map { rs =>
    rs.beforeFirst()
    _stream(rs)
  }
}
A rather trivial usage of this function would be something like this:
def getText(implicit ec: ExecutionContext): Future[Stream[String]] = {
  SqlStream(SQL("select FIELD from TABLE")) { rs => rs.getString("FIELD") }
}
There are, of course, drawbacks to this approach; however, it got around my problem and did not require the inclusion of any other libraries.

Is there a FIFO stream in Scala?

I'm looking for a FIFO stream in Scala, i.e., something that provides the functionality of
immutable.Stream (a stream that can be finite and memorizes the elements that have already been read)
mutable.Queue (which allows for added elements to the FIFO)
The stream should be closable and should block access to the next element until the element has been added or the stream has been closed.
Actually I'm a bit surprised that the collection library does not (seem to) include such a data structure, since it is IMO a quite classical one.
My questions:
1) Did I overlook something? Is there already a class providing this functionality?
2) OK, if it's not included in the collection library then it might be just a trivial combination of existing collection classes. However, I tried to find this trivial code but my implementation still looks quite complex for such a simple problem. Is there a simpler solution for such a FifoStream?
import java.io.Closeable
import scala.collection.mutable.Queue

class FifoStream[T] extends Closeable {
  val queue = new Queue[Option[T]]

  lazy val stream = nextStreamElem

  private def nextStreamElem: Stream[T] = next() match {
    case Some(elem) => Stream.cons(elem, nextStreamElem)
    case None => Stream.empty
  }

  /** Returns next element in the queue (may wait for it to be inserted). */
  private def next() = {
    queue.synchronized {
      if (queue.isEmpty) queue.wait()
      queue.dequeue()
    }
  }

  /** Adds new elements to this stream. */
  def enqueue(elems: T*) {
    queue.synchronized {
      queue.enqueue(elems.map{ Some(_) }: _*)
      queue.notify()
    }
  }

  /** Closes this stream. */
  def close() {
    queue.synchronized {
      queue.enqueue(None)
      queue.notify()
    }
  }
}
Paradigmatic's solution (slightly modified)
Thanks for your suggestions. I slightly modified paradigmatic's solution so that toStream returns an immutable stream (allows for repeatable reads) so that it fits my needs. Just for completeness, here is the code:
import collection.JavaConversions._
import java.util.concurrent.{LinkedBlockingQueue, BlockingQueue}

class FIFOStream[A]( private val queue: BlockingQueue[Option[A]] = new LinkedBlockingQueue[Option[A]]() ) {
  lazy val toStream: Stream[A] = queue2stream

  private def queue2stream: Stream[A] = queue take match {
    case Some(a) => Stream cons ( a, queue2stream )
    case None    => Stream empty
  }

  def close() = queue add None
  def enqueue( as: A* ) = queue addAll as.map( Some(_) )
}
In Scala, streams are "functional iterators". People expect them to be pure (no side effects) and immutable. In your case, every time you iterate over the stream you modify the queue (so it's not pure). This can create a lot of misunderstanding, because iterating over the same stream twice will give two different results.
That being said, you should rather use Java's BlockingQueues than roll your own implementation. They are considered well implemented in terms of safety and performance. Here is the cleanest code I can think of (using your approach):
import java.util.concurrent.{BlockingQueue, LinkedBlockingQueue}
import scala.collection.JavaConversions._

class FIFOStream[A]( private val queue: BlockingQueue[Option[A]] ) {
  def toStream: Stream[A] = queue take match {
    case Some(a) => Stream cons ( a, toStream )
    case None    => Stream empty
  }
  def close() = queue add None
  def enqueue( as: A* ) = queue addAll as.map( Some(_) )
}

object FIFOStream {
  def apply[A]() = new FIFOStream( new LinkedBlockingQueue[Option[A]] )
}
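A quick usage sketch of the class above (the producer thread and the sample values are just illustrative):
// Sketch: one thread produces, the main thread consumes the blocking stream
// until close() enqueues the terminating None.
val fifo = FIFOStream[Int]()

new Thread(new Runnable {
  def run(): Unit = {
    fifo.enqueue(1, 2, 3)
    fifo.close()
  }
}).start()

fifo.toStream.foreach(println) // prints 1, 2, 3, then the stream ends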
I'm assuming you're looking for something like java.util.concurrent.BlockingQueue?
Akka has a BoundedBlockingQueue implementation of this interface. There are of course the implementations available in java.util.concurrent.
You might also consider using Akka's actors for whatever it is you are doing: use actors to be notified of, or pushed, new events or messages instead of pulling.
1) It seems you're looking for a dataflow stream seen in languages like Oz, which supports the producer-consumer pattern. Such a collection is not available in the collections API, but you could always create one yourself.
2) The dataflow stream relies on the concept of single-assignment variables (such that they don't have to be initialized at the declaration point, and reading them prior to initialization causes blocking):
// Hypothetical syntax: Scala does not actually support uninitialized single-assignment vals
val x: Int

startThread {
  println(x)
}

println("The other thread waits for the x to be assigned")

x = 1
It would be straightforward to implement such a stream if single-assignment (or dataflow) variables were supported in the language (see the link). Since they are not a part of Scala, you have to use the wait-synchronized-notify pattern just like you did.
Concurrent queues from Java can be used to achieve that as well, as the other user suggested.