InsertAll only new records with Slick - postgresql

PSQLException: ERROR: duplicate key value violates unique constraint "dictionary_word_idx"
  Detail: Key (word)=(odirane) already exists.
I have a unique index preventing any duplication. I wonder how to insertAll an Array with thousands of elements, but only the new ones? I'm using Slick 1.0.1 and PostgreSQL 9.1.
Edit:
I'm trying the following:
def run = {
  val source = scala.io.Source.fromFile("/home/user/dev/txt/test1.txt")
  val lines = source.mkString
  source.close()
  val words = lines.split("[^\\p{Ll}]").distinct
  database withTransaction {
    val q = for {
      w <- words.toList
      row <- Dictionary if row.word != w
    } yield w
    Dictionary.autoInc.insertAll(q: _*)
  }
  words.length
}
but it doesn't compile:
polymorphic expression cannot be instantiated to expected type;
[error]  found   : [G, T]scala.slick.lifted.Query[G,T]
[error]  required: scala.collection.GenTraversableOnce[?]
[error]     row <- Dictionary if row.word != w
Edit 2:
case class Word(id: Option[Long], word: String)

object Dictionary extends Table[Word]("dictionary") {
  def id = column[Long]("id", O.PrimaryKey, O.AutoInc)
  def word = column[String]("word")
  def * = id.? ~ word <> (Word, Word.unapply _)
  def dictionary_word_idx = index("dictionary_word_idx", word, unique = true)
  def autoInc = word returning id
}

Another alternative is to write raw SQL. Postgres doesn't have a built-in ON DUPLICATE IGNORE, but you can emulate it in a few different ways, shown here: https://dba.stackexchange.com/questions/30499/optimal-way-to-ignore-duplicate-inserts
Combine that with http://slick.typesafe.com/doc/1.0.0-RC2/sql.html
Edit:
Here's an example
def insert(c: String) =
  (Q.u + """INSERT INTO dictionary (word)
            SELECT """ +? c + """
            WHERE NOT EXISTS (
              SELECT word FROM dictionary WHERE word = """ +? c + ")"
  ).execute
val words = lines.split("[^\\p{Ll}]")
words.foreach(insert)
Is that what you mean by "at once"? I think that's going to be the most performant way of doing this without being crazy.
If it's too slow for you, there's another suggestion of creating a temporary table without the unique constraint, copy your current table into the temp table, insert the new words into the temp table, and then select distinct out of that table. That's shown here: https://stackoverflow.com/a/4070385/375874
But I think that's WAY overkill. Unless you have some crazy requirements or something.
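For reference, a rough sketch of that temp-table idea with Slick 1.0's StaticQuery; the DDL and the helper name insertAllNew are mine, not from the linked answer:

import scala.slick.jdbc.{StaticQuery => Q}
import scala.slick.session.Session

// Stage everything in an unconstrained temp table, then copy over
// only the words the dictionary doesn't already have.
def insertAllNew(words: Seq[String])(implicit session: Session): Unit = {
  Q.updateNA("CREATE TEMP TABLE dictionary_tmp (word varchar)").execute
  words.foreach { w =>
    (Q.u + "INSERT INTO dictionary_tmp (word) VALUES (" +? w + ")").execute
  }
  Q.updateNA("""INSERT INTO dictionary (word)
                SELECT DISTINCT word FROM dictionary_tmp
                WHERE word NOT IN (SELECT word FROM dictionary)""").execute
  Q.updateNA("DROP TABLE dictionary_tmp").execute
}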

Conceptually:
def insertAll[T](items: Seq[T]): Seq[Either[(T, Exception), (T, Int)]] = items.map { i =>
  try {
    // Perform an insert, supposing it returns an Int representing the PK of the row
    val pk = …
    Right(i, pk)
  } catch {
    case e: Exception => Left(i, e)
  }
}
You perform each insert operation and then, based on the result, you return a Left or Right object that keeps track of the end result and gives you detailed context for interpreting the operation.
EDIT
Let's suppose that your DAO object looks like:
object Dictionary extends Table[Word]("dictionary") {
  // ...
}
where Word is your object model, and given that you have provided the nuts and bolts (as I can deduce from your pasted code), it should be (where words is a Seq[Word]):
words.map { w =>
  try {
    Right(w, Dictionary.autoInc.insert(w))
  } catch {
    case e: Exception => Left(w, e)
  }
}
What you get is a sequence of Either that encapsulates the outcome for further processing.
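From there it is easy to split the outcomes for reporting; a small usage sketch (the results name is illustrative):

// results: Seq[Either[(Word, Exception), (Word, Long)]], as produced above
val (failures, successes) = results.partition(_.isLeft)
println(s"inserted: ${successes.size}, skipped: ${failures.size}")
failures.collect { case Left((w, e)) => s"${w.word} -> ${e.getMessage}" }.foreach(println)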
Considerations
My solution optimistically attempts the operation against the DB, without pre-filtering the list based on the state of the DB.
In general, pre-filtering is problematic in a heavily multiuser application, since you can't assume that nobody added a word matching your pre-filtered list after you performed the filter.
Stated more simply: the uniqueness constraint is a robust feature provided by the DBMS, which is better to exploit than to reinvent.
The solution you edited above is a non-solution, because you would still need to handle a possible PK violation exception.

Related

Passing result of one DBIO into another

I'm new to Slick and I am trying to rewrite the following two queries to work in one transaction. My goal is to:
1. check if the element exists
2. return the existing element, or create it, handling the auto-increment from MySQL
The two functions are:
def createEmail(email: String): DBIO[Email] = {
  // We create a projection of just the email column, since we're not inserting a value for the id column
  (emails.map(p => p.email)
    returning emails.map(_.id)
    into ((email, id) => Email(id, email))
  ) += email
}

def findEmail(email: String): DBIO[Option[Email]] =
  emails.filter(_.email === email).result.headOption
How can I safely chain them, i.e. run the existence check first, return the element if it already exists, and otherwise create and return the new one, all in one transaction?
You could use a for comprehension:
def findOrCreate(email: String) = {
  (for {
    found <- findEmail(email)
    em <- found match {
      case Some(e) => DBIO.successful(e)
      case None    => createEmail(email)
    }
  } yield em).transactionally
}

val result = db.run(findOrCreate("batman@gotham.gov"))
// Future[Email]
With a little help from the cats library:
def findOrCreate(email: String): DBIO[Email] =
  OptionT(findEmail(email)).getOrElseF(createEmail(email)).transactionally
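One caveat: OptionT (from cats.data) needs a cats Monad[DBIO] instance in scope. Interop libraries provide one, or you can hand-roll a minimal instance; a sketch (the tailRecM here is not stack-safe, which is fine for illustration):

import cats.Monad
import slick.dbio.DBIO
import scala.concurrent.ExecutionContext

implicit def dbioMonad(implicit ec: ExecutionContext): Monad[DBIO] = new Monad[DBIO] {
  def pure[A](a: A): DBIO[A] = DBIO.successful(a)
  def flatMap[A, B](fa: DBIO[A])(f: A => DBIO[B]): DBIO[B] = fa.flatMap(f)
  def tailRecM[A, B](a: A)(f: A => DBIO[Either[A, B]]): DBIO[B] =
    f(a).flatMap {
      case Left(next) => tailRecM(next)(f)
      case Right(b)   => DBIO.successful(b)
    }
}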

Scala's Slick with multiple PK insertOrUpdate() throws exception ERROR: syntax error at end of input

I am using Scala's Slick and PostgreSQL, and tables with a single PK work well for me.
Now I need to use a table with a composite primary key:
case class Report(f1: DateTime,
                  f2: String,
                  f3: Double)

class Reports(tag: Tag) extends Table[Report](tag, "Reports") {
  def f1 = column[DateTime]("f1")
  def f2 = column[String]("f2")
  def f3 = column[Double]("f3")
  def * = (f1, f2, f3) <> (Report.tupled, Report.unapply)
  def pk = primaryKey("pk_report", (f1, f2))
}

val reports = TableQuery[Reports]
When the table is empty, reports.insert(report) works well.
But when I use reports.insertOrUpdate(report) I receive an exception:
Exception in thread "main" org.postgresql.util.PSQLException: ERROR: syntax error at end of input
  Position: 76
    at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2102)
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1835)
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:500)
    at ...
What am I doing wrong? How to fix it?
Thanks in advance.
PS. I tried a workaround - implementing "if exists then update else insert" logic by:
val len = reports.withFilter(_.f1 === report.f1).withFilter(_.f2 === report.f2).length.run.toInt
if (len == 1) {
  println("Update: " + report)
  reports.update(report)
} else {
  println("Insert: " + report)
  reports.insert(report)
}
But I still get an exception on the update:
Exception in thread "main" org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "pk_report"
Detail: Key ("f1", f2)=(2014-01-31 04:00:00, addon_io.aha.connect) already exists.
Concerning your initial question: insertOrUpdate on a table with compound keys is broken in Slick (at least with PostgreSQL), so the error is not on your side. See the bug report here: https://github.com/slick/slick/issues/966
So you have to design a workaround. However, the "upsert" operation is very prone to race conditions and is hard to design properly, as PostgreSQL does not provide a native feature to perform it. See e.g. http://www.depesz.com/2012/06/10/why-is-upsert-so-complicated/
Anyway, a way to perform the operation that is a bit less prone to race conditions is to first update (which does nothing if the row does not exist), and then perform an "insert select" query, which inserts only if the row does not exist. This is the way Slick performs insertOrUpdate on PostgreSQL with a single PK. However, "insert select" cannot be done through Slick directly; you will have to fall back to raw SQL, e.g. as sketched below.
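A rough sketch of that two-step fallback in Slick 3 plain SQL, using the table from the question. The name upsertReport is mine; f1 is passed as a java.sql.Timestamp because plain SQL binds it out of the box, whereas the question's joda DateTime would need a hand-written SetParameter:

import java.sql.Timestamp
import slick.jdbc.PostgresProfile.api._

// The UPDATE is a no-op for a missing row; the INSERT ... SELECT ... WHERE NOT EXISTS
// only fires for a genuinely new row. Both run in one transaction.
def upsertReport(f1: Timestamp, f2: String, f3: Double): DBIO[Int] =
  (sqlu"""UPDATE "Reports" SET f3 = $f3 WHERE f1 = $f1 AND f2 = $f2""" andThen
   sqlu"""INSERT INTO "Reports" (f1, f2, f3)
          SELECT $f1, $f2, $f3
          WHERE NOT EXISTS (SELECT 1 FROM "Reports" WHERE f1 = $f1 AND f2 = $f2)""").transactionally

Note there is still a race window between the two statements unless you also take a lock or run at a serializable isolation level.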
For the second part, where you have:

val len = reports.withFilter(_.f1 === report.f1).withFilter(_.f2 === report.f2).length.run.toInt
if (len == 1) {
  println("Update: " + report)
  reports.update(report)
} else {
  println("Insert: " + report)
  reports.insert(report)
}

replace reports.update(report) with:

reports.filter(r => r.f1 === report.f1 && r.f2 === report.f2).update(report)

An unfiltered reports.update(report) tries to update every row in the table, which is what triggers the duplicate-key error (and since the table has no single id column, the filter has to use both key columns). Actually, you can just make one filter call, reusing it in place of the two withFilter calls above.
I've successfully applied the technique described here
so my upsert method looks like this:
def upsert(model: String, module: String, timestamp: Long) = {
  // see this article http://www.the-art-of-web.com/sql/upsert/
  // NB: the statement is built by plain string interpolation, so this is only safe for trusted input
  val insert = s"INSERT INTO $ModulesAffectedTableName (model, affected_module, timestamp) SELECT '$model','$module','$timestamp'"
  val upsert = s"UPDATE $ModulesAffectedTableName SET timestamp=$timestamp WHERE model='$model' AND affected_module='$module'"
  val finalStmnt = s"WITH upsert AS ($upsert RETURNING *) $insert WHERE NOT EXISTS (SELECT * FROM upsert)"
  conn.run(sqlu"#$finalStmnt")
}
Hopefully this issue will be fixed in 3.2.0
Currently, I work around this issue by creating a dummy table for table creation:
class ReportsDummy(tag: Tag) extends Table[Report](tag, "Reports") {
  def f1 = column[DateTime]("f1")
  def f2 = column[String]("f2")
  def f3 = column[Double]("f3")
  def * = (f1, f2, f3) <> (Report.tupled, Report.unapply)
  def pk = primaryKey("pk_report", (f1, f2))
}
and a "real" table for upsert
class Reports(tag: Tag) extends Table[Report](tag, "Reports") {
  def f1 = column[DateTime]("f1", O.PrimaryKey)
  def f2 = column[String]("f2", O.PrimaryKey) // two primary-key columns here, which would throw errors on table creation; hence the dummy table for that task
  def f3 = column[Double]("f3")
  def * = (f1, f2, f3) <> (Report.tupled, Report.unapply)
}
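A hypothetical call site for this trick (db and report wired up elsewhere): DDL goes through the dummy table, everything else through the real one.

val reportsDdl = TableQuery[ReportsDummy] // used only for schema creation
val reports    = TableQuery[Reports]      // used for queries and insertOrUpdate

db.run(reportsDdl.schema.create)          // creates "Reports" with the composite pk_report
db.run(reports.insertOrUpdate(report))    // upsert now targets the O.PrimaryKey-flagged columns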

Access database column names from a Table?

Let's say I have a table:
object Suppliers extends Table[(Int, String, String, String)]("SUPPLIERS") {
  def id = column[Int]("SUP_ID", O.PrimaryKey)
  def name = column[String]("SUP_NAME")
  def state = column[String]("STATE")
  def zip = column[String]("ZIP")
  def * = id ~ name ~ state ~ zip
}
Table's database name
The table's database name can be accessed by going: Suppliers.tableName
This is supported by the Scaladoc on AbstractTable.
For example, the above table's database name is "SUPPLIERS".
Columns' database names
Looking through AbstractTable, getLinearizedNodes and indexes looked promising. No column names in their string representations though.
I assume that * means "all the columns I'm usually interested in." * is a MappedProjection, which has this signature:
final case class MappedProjection[T, P <: Product](
    child: Node,
    f: (P) ⇒ T,
    g: (T) ⇒ Option[P])(proj: Projection[P])
  extends ColumnBase[T] with UnaryNode with Product with Serializable
*.getLinearizedNodes contains a huge sequence of numbers, and I realized that at this point I was just doing a brute-force inspection of everything in the API in the hope of finding the column names in a String.
Has anybody also encountered this problem before, or could anybody give me a better understanding of how MappedProjection works?
It requires you to rely on Slick internals, which may change between versions, but it is possible. Here is how it works for Slick 1.0.1: you have to go via the FieldSymbol. Then you can extract the information you want the way columnInfo(driver: JdbcDriver, column: FieldSymbol): ColumnInfo does it.
To get a FieldSymbol from a Column you can use fieldSym(node: Node): Option[FieldSymbol] and fieldSym(column: Column[_]): FieldSymbol.
To get the (qualified) column names you can simply do the following:
Suppliers.id.toString
Suppliers.name.toString
Suppliers.state.toString
Suppliers.zip.toString
It's not explicitly stated anywhere that the toString will yield the column name, so your question is a valid one.
Now, if you want to programmatically get all the column names, then that's a bit harder. You could try using reflection to get all the methods that return a Column[_] and call toString on them, but it wouldn't be elegant. Or you could hack a bit and get a select * SQL statement from a query like this:
val selectStatement = DB withSession {
Query(Suppliers).selectStatement
}
And then parse out the column names.
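For example, a crude parse could grab the quoted identifiers out of the select clause. This is fragile and assumes Slick's usual identifier quoting:

// selectStatement looks something like: select x2."SUP_ID", x2."SUP_NAME", ... from "SUPPLIERS" x2
val selectClause = selectStatement.split("(?i)\\bfrom\\b")(0)
val columnNames  = "\"([^\"]+)\"".r.findAllMatchIn(selectClause).map(_.group(1)).toList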
This is the best I could do. If someone knows a better way then please share - I'm interested too ;)
Code is based on Lightbend Activator "slick-http-app".
slick version: 3.1.1
Added this method to the BaseDal:
def getColumns(): mutable.Map[String, Type] = {
  val columns = mutable.Map.empty[String, Type]
  def selectType(t: Any): Option[Any] = t match {
    case t: TableExpansion => Some(t.columns)
    case t: Select => Some(t.field)
    case _ => None
  }
  def selectArray(t: Any): Option[ConstArray[Node]] = t match {
    case t: TypeMapping => Some(t.child.children)
    case _ => None
  }
  def selectFieldSymbol(t: Any): Option[FieldSymbol] = t match {
    case t: FieldSymbol => Some(t)
    case _ => None
  }
  val t = selectType(tableQ.toNode)
  val c = selectArray(t.get)
  for (se <- c.get) {
    val col = selectType(se)
    val fs = selectFieldSymbol(col.get)
    columns += (fs.get.name -> fs.get.tpe)
  }
  columns
}
This method gets the column names (the real names in the DB) plus their types from the TableQuery.
The imports used are:
import slick.ast._
import slick.util.ConstArray
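A hypothetical call site, assuming a dal instance that mixes in the BaseDal above:

val dal: BaseDal = ??? // wired up elsewhere in the app
dal.getColumns().foreach { case (name, tpe) =>
  println(s"$name: $tpe") // real DB column name and its slick.ast.Type
}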

Unique constraint for repeated values

I am trying to define a form in play! 2.0.4 with the following properties and constraints:
The form handles repeated values (let's conveniently assume that these values are of type number). So this will get us to something like this:
"numbers" -> list(number)
Each number must be unique, i.e. it must be unique with regard to all the other numbers submitted, and it must be unique with regard to the numbers already in the database (this can be checked via some function check(num: Int): Boolean).
The form error should be specific to the number that is not unique. I don't want a general form error saying "There is a duplicate number".
What would be the best way to go?
The trick here is to define a custom Constraint sort of like this example. The custom Constraint can then be used on the Mapping[T] to verify a field in the form with the verifying method.
The custom Constraint contains the logic to return a ValidationResult that is either Valid or Invalid. An error message can be passed to an Invalid result which is where you can specify what is duplicated or exists in the database.
See Play for Scala for a section on custom validation.
- Create the Constraint
// Make this lazy to prevent java.lang.ExceptionInInitializerError at runtime.
lazy val uniqueNumbersConstraint = Constraint[String](Some("Unique numbers constraint"), "")(checkNumbers)

// "Business logic".
// The important part is that the function returns a ValidationResult and matches the
// signature Constraint expects, i.e. f: (T) => ValidationResult.
// Return Valid if no number is already in the database and there are no duplicates.
// Otherwise return Invalid with an error message showing which numbers are in the database or duplicated.
def checkNumbers(numbers: String): ValidationResult = {
  val splitNums = numbers.split(" ").toList.map(_.toInt)
  val dbnums = splitNums.partition(database.contains(_))
  if (dbnums._1.isEmpty && uniquesAndDuplicates(splitNums)._2.isEmpty) {
    Valid
  } else {
    val duplicates = uniquesAndDuplicates(dbnums._2)._2
    val error = "Database contains: " + dbnums._1 + ", duplicated values: " + duplicates
    Invalid(error)
  }
}
- Validate Form Using Custom Constraint
val helloForm = Form(
  tuple(
    "numbers" -> nonEmptyText.verifying(uniqueNumbersConstraint)
  ))
- Utilities
// Return unique values on the left side and duplicate values on the right side
def uniquesAndDuplicates(numbers: List[Int]): Tuple2[List[Int], List[Int]] =
  numbers.partition(i => numbers.indexOf(i) == numbers.lastIndexOf(i))

def checkNum(num: Int) = database.contains(num)

val database = List(5, 6, 7)
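For instance, a quick check of the helper's behaviour:

uniquesAndDuplicates(List(1, 2, 2, 3)) // => (List(1, 3), List(2, 2))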
- etc
Note: I defined numbers as a String in the form. When I defined it as list(number) it kept evaluating to List(); I think that is a binding issue. It's a fairly simple change to use List(1,2,3) instead of "1 2 3" if list(number) works for you.
How about something like:
def validateUnique(input: List[Int]): ValidationResult = {
  // Assuming check returns true if the input num doesn't exist yet
  def check(num: Int): Boolean = num % 2 == 0
  val unique = input.toSet
  val dbDuplicates = unique.filterNot(check)
  val formDuplicates = input.diff(unique.toSeq)
  val duplicates = (dbDuplicates ++ formDuplicates).toList
  duplicates match {
    case List() => Valid
    case _      => Invalid("Duplicates: " + duplicates.mkString(", "))
  }
}

val uniqueConstraint = Constraint[List[Int]](validateUnique(_))
And then you can just use the new constraint with:
mapping(
  ...,
  "ints" -> list(number).verifying(uniqueConstraint)
  ...

Filling a Scala immutable Map from a database table

I have a SQL database table with the following structure:
create table category_value (
  category varchar(25),
  property varchar(25)
);
I want to read this into a Scala Map[String, Set[String]] where each entry in the map is a set of all of the property values that are in the same category.
I would like to do it in a "functional" style with no mutable data (other than the database result set).
Following the Clojure loop construct, here is what I have come up with:

def fillMap(statement: java.sql.Statement): Map[String, Set[String]] = {
  val resultSet = statement.executeQuery("select category, property from category_value")
  @tailrec
  def loop(m: Map[String, Set[String]]): Map[String, Set[String]] = {
    if (resultSet.next) {
      val category = resultSet.getString("category")
      val property = resultSet.getString("property")
      loop(m + (category -> (m.getOrElse(category, Set.empty) + property)))
    } else m
  }
  loop(Map.empty)
}
Is there a better way to do this, without using mutable data structures?
If you like, you could try something along these lines:

def fillMap(statement: java.sql.Statement): Map[String, Set[String]] = {
  val resultSet = statement.executeQuery("select category, property from category_value")
  Iterator.continually((resultSet, resultSet.next)).takeWhile(_._2).map(_._1).map { res =>
    val category = res.getString("category")
    val property = res.getString("property")
    (category, property)
  }.toIterable.groupBy(_._1).mapValues(_.map(_._2).toSet)
}
Untested, because I don’t have a proper sql.Statement. And the groupBy part might need some more love to look nice.
Edit: Added the requested changes.
There are two parts to this problem.
Getting the data out of the database and into a list of rows.
I would use a Spring SimpleJdbcOperations for the database access, so that things at least appear functional, even though the ResultSet is being changed behind the scenes.
First, a simple implicit conversion to let us use a closure to map each row:

implicit def rowMapper[T <: AnyRef](func: (ResultSet) => T) =
  new ParameterizedRowMapper[T] {
    override def mapRow(rs: ResultSet, row: Int): T = func(rs)
  }
Then let's define a data structure to store the results. (You could use a tuple, but defining a case class has the advantage of being a little clearer about the names of things.)

case class CategoryValue(category: String, property: String)
Now select from the database
val db: SimpleJdbcOperations = // get this somehow
val resultList: java.util.List[CategoryValue] =
  db.query("select category, property from category_value",
    { rs: ResultSet => CategoryValue(rs.getString(1), rs.getString(2)) })
Converting the data from a list of rows into the format that you actually want
import scala.collection.JavaConversions._

val result: Map[String, Set[String]] =
  resultList.groupBy(_.category).mapValues(_.map(_.property).toSet)
(You can omit the type annotations. I've included them to make it clear what's going on.)
Builders are built for this purpose. Get one via the desired collection type companion, e.g. HashMap.newBuilder[String, Set[String]].
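A minimal sketch of that approach (the builder mutates internally but hands back an immutable map):

import scala.collection.immutable.HashMap

val builder = HashMap.newBuilder[String, Set[String]]
builder += ("fruit"  -> Set("apple", "pear"))
builder += ("cheese" -> Set("brie"))
val m: Map[String, Set[String]] = builder.result()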
This solution is basically the same as my other solution, but it doesn't use Spring, and the logic for converting a ResultSet to some sort of list is simpler than Debilski's solution.
def streamFromResultSet[T](rs: ResultSet)(func: ResultSet => T): Stream[T] = {
  if (rs.next())
    func(rs) #:: streamFromResultSet(rs)(func)
  else {
    rs.close()
    Stream.empty
  }
}
def fillMap(statement: java.sql.Statement): Map[String, Set[String]] = {
  case class CategoryValue(category: String, property: String)
  val resultSet = statement.executeQuery("""
    select category, property from category_value
  """)
  val queryResult = streamFromResultSet(resultSet) { rs =>
    CategoryValue(rs.getString(1), rs.getString(2))
  }
  queryResult.groupBy(_.category).mapValues(_.map(_.property).toSet)
}
There is only one approach I can think of that does not include either mutable state or extensive copying*. It is actually a very basic technique I learnt in my first term studying CS. Here goes, abstracting from the database stuff:
def empty[K, V](k: K): Option[V] = None

def add[K, V](m: K => Option[V])(k: K, v: V): K => Option[V] = q => {
  if (k == q) {
    Some(v)
  } else {
    m(q)
  }
}

def build[K, V](input: TraversableOnce[(K, V)]): K => Option[V] =
  input.foldLeft(empty[K, V] _)((m, i) => add(m)(i._1, i._2))
Usage example:
val map = build(List(("a", 1), ("b", 2)))
println("a " + map("a"))
println("b " + map("b"))
println("c " + map("c"))

> a Some(1)
> b Some(2)
> c None
Of course, the resulting function does not have type Map (nor any of its benefits) and has linear lookup cost. I guess you could implement something in a similar way that mimics simple search trees.
(*) I am talking concepts here. In reality, things like value sharing might enable e.g. immutable list construction without memory overhead.