Avoiding race condition in postgres updates using slick - postgresql

case class Item(id: String, count: Int).
class ItemRepo(db: Database) {
val query = TableQuery[ItemTable]
def updateAmount(id: String, incCount :Int) = {
val currentRow = db.run(query.filter(_.id === id).result).head
val updatedRow = Item(currentRow.id, currentRow.count + incCount)
db.run((query returning query).insertOrUpdate(updatedRow))
}
The code above has a race condition - if two threads run this in parallel they might both read the same count, and only the last updating thread will increment their incCount.
How can i avoid this case? I tried using .forUpdate in the line that does query.filter but it doesn't block the other thread. am i missing something?

There are a few tricks you can use to improve this situation.
First, you're sending two independent queries to the database (two db.run calls). You can improve that by composing them into a single action and send that to the database. For example:
// Danger: I've not tried to compile this. Please excuse typos.
val lookupAction = query.filter(_.id === id).result
val updateAction = lookupAction.flatMap { matchingRows =>
val newItem = matchingRows.headOption match {
case Some(Item(_, count)) => Item(id, count + incCount)
case None => Item(id, 1) // or whatever your default is
}
(query returning query).insertOrUpdate(newItem)
}
// and you db.run(updateAction.transactionally)
That will get you some way, depending on the transaction guarantees for your database. I mention it because combining actions in Slick is an important concept. With that, your forUpdate (that Laurenz Albe noted) may behave as expected.
However, you may prefer to send an update to the database. You'd need to do this using Slick's Plain SQL feature:
val action = sqlu"UPDATE items SET count = count + $incCount WHERE id = $id"
// And then you db.run(action)
...and allow you database to handle the concurrency (subject to database isolation levels).
If you really want to do this all client-side, in Scala code on the JVM, there are concurrency concepts such as locks, actors, and refs. There's nothing in Slick itself to do the JVM locking for you.

You should use SELECT ... FOR UPDATE when you fetch the data from the database, so that you have an exclusive lock on the row that prevents other sessions from updating the data until your transaction is done.
In Slick you can do that with the forUpdate construct available since version 3.2.0.

Related

Order of execution of Future - Making sequential inserts in a db non-blocking

A simple scenario here. I am using akka streams to read from kafka and write into an external source, in my case: cassandra.
Akka streams(reactive-kafka) library equips me with backpressure and other nifty things to make this possible.
kafka being a Source and Cassandra being a Sink, when I get bunch of events which are, for example be cassandra queries here through Kafka which are supposed to be executed sequentially (ex: it could be a INSERT, UPDATE and a DELETE and must be sequential).
I cannot use mayAsync and execute both the statement, Future is eager and there is a chance that DELETE or UPDATE might get executed first before INSERT.
I am forced to use Cassandra's execute as opposed to executeAsync which is non-blocking.
There is no way to make a complete async solution to this issue, but how ever is there a much elegant way to do this?
For ex: Make the Future lazy and sequential and offload it to a different execution context of sorts.
mapAsync gives a parallelism option as well.
Can Monix Task be of help here?
This a general design question and what are the approaches one can take.
UPDATE:
Flow[In].mapAsync(3)(input => {
input match {
case INSERT => //do insert - returns future
case UPDATE => //do update - returns future
case DELETE => //delete - returns future
}
The scenario is a little more complex. There could be thousands of insert, update and delete coming in order for specific key(s)(in kafka)
I would ideally want to execute the 3 futures of a single key in sequence. I believe Monix's Task can help?
If you process things with parallelism of 1, they will get executed in strict sequence, which will solve your problem.
But that's not interesting. If you want, you can run operations for different keys in parallel - if processing for different keys is independent, which, I assume from your description, is possible. To do this, you have to buffer the incoming values and then regroup it. Let's see some code:
import monix.reactive.Observable
import scala.concurrent.duration._
import monix.eval.Task
// Your domain logic - I'll use these stubs
trait Event
trait Acknowledgement // whatever your DB functions return, if you need it
def toKey(e: Event): String = ???
def processOne(event: Event): Task[Acknowledgement] = Task.deferFuture {
event match {
case _ => ??? // insert/update/delete
}
}
// Monix Task.traverse is strictly sequential, which is what you need
def processMany(evs: Seq[Event]): Task[Seq[Acknowledgement]] =
Task.traverse(evs)(processOne)
def processEventStreamInParallel(source: Observable[Event]): Observable[Acknowledgement] =
source
// Process a bunch of events, but don't wait too long for whole 100. Fine-tune for your data source
.bufferTimedAndCounted(2.seconds, 100)
.concatMap { batch =>
Observable
.fromIterable(batch.groupBy(toKey).values) // Standard collection methods FTW
.mapAsync(3)(processMany) // processing up to 3 different keys in parallel - tho 3 is not necessary, probably depends on your DB throughput
.flatMap(Observable.fromIterable) // flattening it back
}
The concatMap operator here will ensure that your chunks are processed sequentially as well. So even if one buffer has key1 -> insert, key1 -> update and the other has key1 -> delete, that causes no problems. In Monix, this is the same as flatMap, but in other Rx libraries flatMap might be an alias for mergeMap which has no ordering guarantee.
This can be done with Futures too, tho there's no standard "sequential traverse", so you have to roll your own, something like:
def processMany(evs: Seq[Event]): Future[Seq[Acknowledgement]] =
evs.foldLeft(Future.successful(Vector.empty[Acknowledgement])){ (acksF, ev) =>
for {
acks <- acksF
next <- processOne(ev)
} yield acks :+ next
}
You can use akka-streams subflows, to group by key, then merge substreams if you want to do something with what you get from your database operations:
def databaseOp(input: In): Future[Out] = input match {
case INSERT => ...
case UPDATE => ...
case DELETE => ...
}
val databaseFlow: Flow[In, Out, NotUsed] =
Flow[In].groupBy(Int.maxValues, _.key).mapAsync(1)(databaseOp).mergeSubstreams
Note that order from input source won't be kept in output as it is done in mapAsync, but all operations on the same key will still be in order.
You are looking for Future.flatMap:
def doSomething: Future[Unit]
def doSomethingElse: Future[Unit]
val result = doSomething.flatMap { _ => doSomethingElse }
This executes the first function, and then, when its Future is satisfied, starts the second one. The result is a new Future that completes when the result of the second execution is satisfied.
The result of the first future is passed into the function you give to .flatMap, so the second function can depend on the result of the first one. For example:
def getUserID: Future[Int]
def getUser(id: Int): Future[User]
val userName: Future[String] = getUserID.flatMap(getUser).map(_.name)
You can also write this as a for-comprehension:
for {
id <- getUserID
user <- getUser(id)
} yield user.name

What are good models to introduce concurrency for database inserts using actor model

More details:
I am new to Scala and Akka.
I am trying to build a concurrent system that does this essentially-
Read a CSV file
Parse it into groups
And then load into table.
The file cannot be split into smaller files and hence I am going with a normal standard serialized read. I pass the info to a Masterwriter(an actor). I dynamically create n number of actors called writers and pass them chunks of this info. Each writer is now actually responsible for reading the data, categorize them and then insert into appropriate table.
My doubt is that when two writers are writing concurrently onto the table, will it lead to a race condition. Also, how else could this problem be modeled in a better way to increase speed. Any help in any direction would be really useful. Thanks
Modelling the Data Access
I have found that the biggest key to designing this sort of task is to abstract away the database. You should treat any database updates as simple function that returns success or failure:
type UpdateResult = Boolean
val UpdateSuccess : UpdateResult = true
val UpdateFailure : UpdateResult = false
type Data = ???
type Updater = (Data) => UpdateResult
This allows you to write an Updater that goes to an actual db or an test updater that always returns success:
val statement : Statement = ???
val dbUpdater : Updater = (data) => {
statement.executeQuery(s"INSERT INTO ... ${data.toString}")
}
val testUpdater : Updater = _ => UpdateSuccess
Akka Stream Implementation
For this particular use case I recommend akka streams instead of raw Actors. A solution using the stream paradigm can be found here.
Akka Actor
An Actor solution is also possible:
val UpdateActor(updater : Updater) extends Actor {
override def receive = {
case data : Data => sender ! updater(data)
}
}
The problem with Actors is that you'll have to write an Actor to read the file, other Actors to group the rows, and finally use the UpdateActor to send data to the db. You'll also have to wire all of those Actors together...

Sharing database session between multiple methods in Slick 3

I have recently switched from Slick-2 to Slick-3. Everything is working very well with slick-3. However, I am having some issues when it comes to transaction.
I have seen different questions and sample code in which transactionally and withPinnedSession are used to handle the transaction. But my case is slightly different. Both transcationally and withPinnedSession can be applied on Query. But what I want to do is to pass the same session to another method which will do some operations and want to wrap multiple methods in same transaction.
I have the below slick-2 code, I am not sure how this can be implemented with Slick-3.
def insertWithTransaction(row: TTable#TableElementType)(implicit session: Session) = {
val entity = (query returning query.map(obj => obj) += row).asInstanceOf[TEntity]
// do some operations after insert
//eg: invoke another method for sending the notification
entity
}
override def insert(row: TTable#TableElementType) = {
db.withSession {
implicit session => {
insertWithTransaction(row)
}
}
}
Now, if someone is not interested in having transactions, they can just invoke the insert() method.
If we need to do some transaction, it can be done by using insertWithTransaction() in db.withTransaction block.
For eg :
db.withTransaction { implicit session =>
insertWithTransaction(row1)
insertWithTransaction(row2)
//check some condition, invoke session.rollback if something goes wrong
}
But with slick-3, the transactionally can be applied on query only.
That means, wherever we need to do some logic centrally after insertion, it is possible. Every developer needs to manually handle those scenarios explicitly, if they are using transactions. I believe this could potentially cause errors. I am trying to abstract the whole logic in insert operation so that the implementors need to worry only about the transaction success/failure
Is there any other way, in slick-3, in which I can pass the same session to multiple methods so that everything can be done in single db session.
You are missing something : .transactionally doesn't apply to a Query, but to a DBIOAction.
Then, a DBIOAction can be composed of multiple queries by using monadic composition.
Here is a exemple coming from the documentation :
val action = (for {
ns <- coffees.filter(_.name.startsWith("ESPRESSO")).map(_.name).result
_ <- DBIO.seq(ns.map(n => coffees.filter(_.name === n).delete): _*)
} yield ()).transactionally
action is composed of a select query and as many delete queries as rows returned by the first query. All that creates DBIOAction that be executed in a transaction.
Then, to run the action against the database, you have to call db.run, so, like this:
val f: Future[Unit] = db.run(action)
Now, to come back to your exemple, let's say you want to apply an update query after your insert, you can create an action this way
val action = (for {
entity <- (query returning query.map(obj => obj) += row)
_ <- query.map(_.foo).update(newFoo)
} yield entity).transactionally
Hope it helps.

Is there an easy way to get a Stream as output of a RowParser?

Given rowParser of type RowParser[Photo], this is how you would parse a list of rows coming from a table photo, according to the code samples I have seen so far:
def getPhotos(album: Album): List[Photo] = DB.withConnection { implicit c =>
SQL("select * from photo where album = {album}").on(
'album -> album.id
).as(rowParser *)
}
Where the * operator creates a parser of type ResultSetParser[List[Photo]]. Now, I was wondering if it was equally possible to get a parser that yields a Stream (thinking that being more lazy is always better), but I only came up with this:
def getPhotos(album: Album): Stream[Photo] = DB.withConnection { implicit c =>
SQL("select * from photo where album = {album}").on(
'album -> album.id
)() collect (rowParser(_) match { case Success(photo) => photo })
}
It works, but it seems overly complicated. I could of course just call toStream on the List I get from the first function, but my goal was to only apply rowParser on rows that are actually read. Is there an easier way to achieve this?
EDIT: I know that limit should be used in the query, if the number of rows of interest is known beforehand. I am also aware that, in many cases, you are going to use the whole result anyway, so being lazy will not improve performance. But there might be a case where you save a few cycles, e.g. if for some reason, you have search criteria that you cannot or do not want to express in SQL. So I thought it was odd that, given the fact that anorm provides a way to obtain a Stream of SqlRow, I didn't find a straightforward way to apply a RowParser on that.
I ended up creating my own stream method which corresponds to the list method:
def stream[A](p: RowParser[A]) = new ResultSetParser[Stream[A]] {
def apply(rows: SqlParser.ResultSet): SqlResult[Stream[A]] = rows.headOption.map(p(_)) match {
case None => Success(Stream.empty[A])
case Some(Success(a)) => {
val s: Stream[A] = a #:: rows.tail.flatMap(r => p(r) match {
case Success(r) => Some(r)
case _ => None
})
Success(s)
}
case Some(Error(msg)) => Error(msg)
}
}
Note that the Play SqlResult can only be either Success/Error while each row can also be Success/Error. I handle this for the first row only, assuming the rest will be the same. This may or may not work for you.
You're better off making smaller (paged) queries using limit and offset.
Anorm would need some modification if you're going to keep your (large) result around in memory and stream it from there. Then the other concern would be the new memory requirements for your JVM. And how would you deal with caching on the service level? See, previously you could easily cache something like photos?page=1&size=10, but now you just have photos, and the caching technology would have no idea what to do with the stream.
Even worse, and possibly on a JDBC-level, wrapping Stream around limited and offset-ed execute statements and just making multiple calls to the database behind the scenes, but this sounds like it would need a fair bit of work to port the Stream code that Scala generates to Java land (to work with Groovy, jRuby, etc), then get it on the approved for the JDBC 5 or 6 roadmap. This idea will probably be shunned as being too complicated, which it is.
You could wrap Stream around your entire DAO (where the limit and offset trickery would happen), but this almost sounds like more trouble than it's worth :-)
I ran into a similar situation but ran into a Call Stack Overflow exception when the built-in anorm function to convert to Streams attempted to parse the result set.
In order to get around this I elected to abandon the anorm ResultSetParser paradigm, and fall back to the java.sql.ResultSet object.
I wanted to use anorm's internal classes for the parsing result set rows, but, ever since version 2.4, they have made all of the pertinent classes and methods private to their package, and have deprecated several other methods that would have been more straight-forward to use.
I used a combination of Promises and Futures to work around the ManagedResource that anorm now returns. I avoided all deprecated functions.
import anorm._
import java.sql.ResultSet
import scala.concurrent._
def SqlStream[T](sql:SqlQuery)(parse:ResultSet => T)(implicit ec:ExecutionContext):Future[Stream[T]] = {
val conn = db.getConnection()
val mr = sql.preparedStatement(conn, false)
val p = Promise[Unit]()
val p2 = Promise[ResultSet]()
Future {
mr.map({ stmt =>
p2.success(stmt.executeQuery)
Await.ready(p.future, duration.Duration.Inf)
}).acquireAndGet(identity).andThen { case _ => conn.close() }
}
def _stream(rs:ResultSet):Stream[T] = {
if (rs.next()) parse(rs) #:: _stream(rs)
else {
p.success(())
Stream.empty
}
}
p2.future.map { rs =>
rs.beforeFirst()
_stream(rs)
}
}
A rather trivial usage of this function would be something like this:
def getText(implicit ec:ExecutionContext):Future[Stream[String]] = {
SqlStream(SQL("select FIELD from TABLE")) { rs => rs.getString("FIELD") }
}
There are, of course, drawbacks to this approach, however, this got around my problem and did not require inclusion of any other libraries.

Squeryl session management with 'using'

I'm learning Squeryl and trying to understand the 'using' syntax but can't find documentation on it.
In the following example two databases are created, A contains the word Hello, and B contains Goodbye. The intention is to query the contents of A, then append the word World and write the result to B.
Expected console output is Inserted Message(2,HelloWorld)
object Test {
def main(args: Array[String]) {
Class.forName("org.h2.Driver");
import Library._
val sessionA = Session.create(DriverManager.getConnection(
"jdbc:h2:file:data/dbA","sa","password"),new H2Adapter)
val sessionB = Session.create(DriverManager.getConnection(
"jdbc:h2:file:data/dbB","sa","password"),new H2Adapter)
using(sessionA){
drop; create
myTable.insert(Message(0,"Hello"))
}
using(sessionB){
drop; create
myTable.insert(Message(0,"Goodbye"))
}
using(sessionA){
val results = from(myTable)(s => select(s))//.toList
using(sessionB){
results.foreach(m => {
val newMsg = m.copy(msg = (m.msg+"World"))
myTable.insert(newMsg)
println("Inserted "+newMsg)
})
}
}
}
case class Message(val id: Long, val msg: String) extends KeyedEntity[Long]
object Library extends Schema { val myTable = table[Message] }
}
As it stands, the code prints Inserted Message(2,GoodbyeWorld), unless the toList is added on the end of the val results line.
Is there some way to bind the results query to use sessionA even when evaluated inside the using(sessionB)? This seems preferable to using toList to force the query to evaluate and store the contents in memory.
Update
Thanks to Dave Whittaker's answer, the following snippet fixes it without resorting to 'toList' and corrects my understanding of both 'using' and the running of queries.
val results = from(myTable)(s => select(s))
using(sessionA){
results.foreach(m => {
val newMsg = m.copy(msg = (m.msg+"World"))
using(sessionB){myTable.insert(newMsg)}
println("Inserted "+newMsg)
})
}
First off, I apologize for the lack of documentation. The using() construct is a new feature that is only available in SNAPSHOT builds. I actually talked to Max about some of the documentation issues for early adopters yesterday and we are working to fix them.
There isn't a way that I can think of to bind a specific Session to a Query. Looking at your example, it looks like an easy work around would be to invert your transactions. When you create a query, Squeryl doesn't actually access the DB, it just creates an AST representing the SQL to be performed, so you don't need to issue your using(sessionA) at that point. Then, when you are ready to iterate over the results you can wrap the query invocation in a using(sessionA) nested within your using(sessionB). Does that make sense?