Slick: how to implement find by example i.e. findByExample generically? - scala

I'm exploring different ways to implement a generic DAO with the latest Slick 3.1.1 in order to boost productivity, and yes, there is a need for it: basing the service layer of my Play web application on TableQuery alone leads to a lot of boilerplate code. One of the methods I'd like to feature in my generic DAO implementation is findByExample, which is possible in JPA with the help of the Criteria API. In my case, I'm using the Slick code generator to generate the model classes from an SQL script.
To dynamically access the attribute names I need the following snippet, taken from the Scala question "Get field names list from case class":
import scala.reflect.runtime.universe._
def classAccessors[T: TypeTag]: List[MethodSymbol] = typeOf[T].members.collect {
  case m: MethodSymbol if m.isCaseAccessor => m
}.toList
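For a quick sanity check, here is what that yields on a hypothetical generated row class (the names below are purely illustrative). Note that typeOf[T].members gives no ordering guarantee, so the accessor list may need to be reversed or sorted before pairing it with productElement indices:
// Hypothetical row case class, for illustration only.
case class PersonRow(id: Long, firstName: Option[String], streetName: Option[String], streetNumber: Option[Int])

// Accessor names typically come back in reverse declaration order,
// so align them explicitly before using them positionally.
classAccessors[PersonRow].map(_.name.toString)
// e.g. List(streetNumber, streetName, firstName, id)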
A draft implementation for findByExample would be:
def findByExample[T, R](example: R): Future[Seq[R]] = {
  var qt = TableQuery[T].result
  val accessors = classAccessors[R]
  (0 until example.productArity).map { i =>
    example.productElement(i) match {
      case None => // ignore
      case 0 => // ignore
      // ... some more default values => // ignore
      // handle a populated case
      case Some(x) => {
        val columnName = accessors(i)
        qt = qt.filter(_.columnByName(columnName) == x)
      }
    }
  }
  qt.result
}
But this doesn't work because I need better Scala kung fu. T is the entity table type and R is the row type, which is generated as a case class and is therefore a valid Scala Product type.
The first problem with that code is that it would be too inefficient: instead of doing e.g.
qt.filter(p => p.firstName === "Juan" && p.streetName === "Rosedale Ave." && p.streetNumber === 5)
it is doing:
// find all
var qt = TableQuery[T].result
// then filter by each column at the time
qt = qt.filter(_.firstName === "Juan")
qt = qt.filter(_.streetName === "Rosedale Ave.")
qt = qt.filter(_.streetNumber === 5)
Second, I can't see how to dynamically access the column name in the filter method, i.e. instead of
qt.filter(_.firstName === "Juan")
I would need something like
qt.filter(_.columnByName("firstName") === "Juan")
but apparently there is no such possibility when using the filter function?

Probably the best ways to implement filters and sorting by dynamically provided column names would be either plain SQL or extending the code generator to generate extension methods, something like this:
implicit class DynamicPersonQueries[C[_]](q: Query[PersonTable, PersonRow, C]) {
  def dynamicFilter(column: String, value: String) = column match {
    case "firstName"    => q.filter(_.firstName === value)
    case "streetNumber" => q.filter(_.streetNumber === value.toInt)
    ...
  }
}
You might have to fiddle with the types a bit to get it to compile (and ideally update this post afterwards :)).
You can then filter by all the provided values like this:
val examples: Map[String, String] = ...
val t = TableQuery[PersonTable]
val query = examples.foldLeft(t){ case (q, (column, value)) => q.dynamicFilter(column, value) }
query.result
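To connect this back to findByExample: as a rough, untested sketch (PersonTable and PersonRow are assumed names, and the stringly-typed values follow the dynamicFilter signature above), one could derive the examples map from a populated row instance and fold it through dynamicFilter:
// Rough, untested sketch: build the examples map from a populated PersonRow and
// fold it through the generated dynamicFilter. Assumes optional columns are Option[_].
def findPersonByExample(example: PersonRow) = {
  val names = classAccessors[PersonRow].map(_.name.toString).reverse // align with productElement order
  val populated: Map[String, String] =
    names.zipWithIndex.flatMap { case (name, i) =>
      example.productElement(i) match {
        case Some(v) => Some(name -> v.toString) // keep only the columns the caller filled in
        case _       => None
      }
    }.toMap
  val base: Query[PersonTable, PersonRow, Seq] = TableQuery[PersonTable]
  val query = populated.foldLeft(base) { case (q, (column, value)) => q.dynamicFilter(column, value) }
  query.result
}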
Extending the code generator is explained here: http://slick.lightbend.com/doc/3.1.1/code-generation.html#customization

After further research I found the following blog post: Repository Pattern / Generic DAO Implementation.
There they declare and implement a generic filter method that works for any model entity type, and in my view it is therefore a valid functional replacement for JPA's findByExample.
Given the type bounds
T <: Table[E] with IdentifyableTable[PK]
E <: Entity[PK]
PK: BaseColumnType
the generic filter is declared as:
def filter[C <: Rep[_]](expr: T => C)(implicit wt: CanBeQueryCondition[C]): Query[T, E, Seq] =
  tableQuery.filter(expr)
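As a hypothetical usage example (the repository instance and column names are assumptions, not taken verbatim from the blog post), a by-example style lookup then becomes an ordinary call:
// Hypothetical usage of the generic repository filter.
val juans: Future[Seq[PersonRow]] =
  db.run(personRepo.filter(_.firstName === "Juan").result)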

Related

Scala: decompose the filter parameter on a Spark DataSet?

I have the following code and I need to type x._1. and x._2. a lot of times.
case class T (Field1: String, Field2: Int, ....)
val j: DataSet[(T, T)] = ...
j.filter(x => x._1.Field1 == x._2.Field1
&& x._1.Field2 == x._2.Field2
&& ....)
Is there a way to decompose x into (l, r) so the expression can be a little bit shorter?
The following doesn't work on Spark's Dataset. Why? How can Spark's Dataset not support this Scala language construct?
filter{ case (l,r) => ...
In F#, you can write something like
j.filter((l, r) -> ....)
even
j.filter(({Field1 = l1; Field2 = l2; ....}, {Field1 = r1; Field2 = r2; ....}) -> ....)
The trick is to use the fact that PartialFunction[A,B] is a subclass of Function1[A,B], so you can use partial function syntax everywhere a Function1 is expected (filter, map, flatMap, etc.):
j.filter {
  case (l, r) if l.Field1 == r.Field1 && l.Field2 == r.Field2 => true
  case _ => false
}
UPDATE
As mentioned in the comments, unfortunately this does not work with Spark's Dataset. This seems to be due to the fact that filter is overloaded in Dataset, and that throws the typer off (method overloads are generally discouraged in Scala and don't work very well with its other features).
One workaround for this is to define a method with a different name that you can tack onto Dataset with an implicit conversion, and then use that method instead of filter:
object PimpedDataset {
  implicit class It[T](val ds: Dataset[T]) extends AnyVal {
    def filtered(f: T => Boolean) = ds.filter(f)
  }
}
...
import PimpedDataset._
j.filtered {
  case (l, r) if l.Field1 == r.Field1 && l.Field2 == r.Field2 => true
  case _ => false
}
This will compile ...
Spark's Dataset class has multiple overloaded filter(...) methods, and the compiler isn't able to infer which one to use. You can explicitly specify the function type, but it's a bit ugly.
j.filter({
  case (l, r) => true
}: ((T, T)) => Boolean)
That syntax (without explicitly specifying the type) is still available for RDDs. Unfortunately, in the interest of supporting Python/R/etc., the Spark developers decided to forsake users who prefer to write idiomatic Scala. :(

Why does Scala type inference fail in one case but not the other?

Background: I'm using net.liftweb.record with MongoDB to access a database. At some point, I was in need of drawing a table of a collection of documents from the database (and render them as an ASCII table). I ran into very obscure type inference issues which are very easy to solve but nevertheless made me want to understand why they were happening.
Reproduction: For simplicity, I've reduced the code to (what I think is) an absolute minimum, so that it only depends on net.liftweb.record and none of the Mongo specific types. I've kept the real-life body of the function under question to make the example more realistic.
makeTable takes some apples, and some functions that map apples to columns. Columns can either be mapped to a real field on the apples, or a dynamically computed value (with a name). To be able to mix the two (real fields and dynamic values) in a single Seq, I defined a structural type Col.
To see how the code (below) behaves, try the following variants of the cols parameter to makeTable:
// OK:
cols = Seq(_.isDone)
cols = Seq(job => dynCol1)
cols = Seq(job => dynCol1, job => dynCol2)
// ERROR: found: Seq[Job => Object], required: Seq[Job => Test.Col]
cols = Seq(_.isDone, job => dynCol1)
cols = Seq(_.isDone, job => dynCol2)
cols = Seq(_.isDone, job => dynCol1, job => dynCol2)
...so whenever _.isDone (i.e. the column that maps to a physical field) is mixed with any other "flavor" of column, the error occurs (CASE 1). Alone it behaves well; other flavors of column also behave well when alone or mixed (CASE 2).
Intuitive workaround: marking cols as Seq[Job => Col] ALWAYS fixes the error.
Counter-intuitive workaround: explicitly marking any of the return values of the functions in the Seq as Col, or any of the functions as Job => Col, solves the issue.
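For concreteness, both workarounds look roughly like this (sketches, using the definitions from the code below):
// Intuitive workaround: annotate the whole Seq.
makeTable(jobs)(Seq[Job => Col](_.isDone, job => dynCol1))

// Counter-intuitive workaround: annotating a single element is enough to guide inference.
makeTable(jobs)(Seq((_.isDone): Job => Col, job => dynCol1))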
The code:
import net.liftweb.record.{ Record, MetaRecord }
import net.liftweb.record.field.IntField
import scala.language.reflectiveCalls
class Job extends Record[Job] {
def meta = Job
object isDone extends IntField(this)
}
object Job extends Job with MetaRecord[Job]
object Test extends App {
type Col = { def name: String; def get: Any }
def makeTable[T](xs: Seq[T])(cols: Seq[T => Col]) = {
assert(xs.size >= 1)
val rows = xs map { x => cols map { _(x).get } }
val header = cols map { _(xs.head).name }
(header +: rows)
}
val dynCol1 = new { def name = "dyncol1"; def get = "dyn1" }
val dynCol2 = new { def name = "dyncol2"; def get = "dyn2" }
val jobs = Seq(Job.createRecord, Job.createRecord)
makeTable(jobs)(Seq(
_.isDone,
job => dynCol1,
job => dynCol2
))
}
P.S. I'm not adding a lift or lift-record tag because I think this is not related to Lift and is simply a Scala question triggered by what happens to be a Lift-specific situation. Feel free to correct me if I'm wrong.

Access database column names from a Table?

Let's say I have a table:
object Suppliers extends Table[(Int, String, String, String)]("SUPPLIERS") {
def id = column[Int]("SUP_ID", O.PrimaryKey)
def name = column[String]("SUP_NAME")
def state = column[String]("STATE")
def zip = column[String]("ZIP")
def * = id ~ name ~ state ~ zip
}
Table's database name
The table's database name can be accessed by going: Suppliers.tableName
This is supported by the Scaladoc on AbstractTable.
For example, the above table's database name is "SUPPLIERS".
Columns' database names
Looking through AbstractTable, getLinearizedNodes and indexes looked promising. No column names in their string representations though.
I assume that * means "all the columns I'm usually interested in." * is a MappedProjection, which has this signature:
final case class MappedProjection[T, P <: Product](
child: Node,
f: (P) ⇒ T,
g: (T) ⇒ Option[P])(proj: Projection[P])
extends ColumnBase[T] with UnaryNode with Product with Serializable
*.getLinearizedNodes contains a huge sequence of numbers, and I realized that at this point I'm just doing a brute force inspection of everything in the API for possibly finding the column names in the String.
Has anybody also encountered this problem before, or could anybody give me a better understanding of how MappedProjection works?
It requires you to rely on Slick internals, which may change between versions, but it is possible. Here is how it works for Slick 1.0.1: you have to go via the FieldSymbol. Then you can extract the information you want, the same way columnInfo(driver: JdbcDriver, column: FieldSymbol): ColumnInfo does it.
To get a FieldSymbol from a Column you can use fieldSym(node: Node): Option[FieldSymbol] and fieldSym(column: Column[_]): FieldSymbol.
To get the (qualified) column names you can simply do the following:
Suppliers.id.toString
Suppliers.name.toString
Suppliers.state.toString
Suppliers.zip.toString
It's not explicitly stated anywhere that the toString will yield the column name, so your question is a valid one.
Now, if you want to programmatically get all the column names, then that's a bit harder. You could try using reflection to get all the methods that return a Column[_] and call toString on them, but it wouldn't be elegant. Or you could hack a bit and get a select * SQL statement from a query like this:
val selectStatement = DB withSession {
Query(Suppliers).selectStatement
}
And then parse out the column names.
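For illustration, here is a rough, untested sketch of that parsing hack; the exact shape of the generated SQL depends on the Slick version and driver, so the regex is an assumption:
// Untested sketch: pull the quoted column identifiers out of the generated select list.
// Assumes SQL roughly like: select x2."SUP_ID", x2."SUP_NAME", ... from "SUPPLIERS" x2
val selectStatement = DB withSession {
  Query(Suppliers).selectStatement
}
val selectList = selectStatement.split("(?i) from ")(0) // keep only the part before FROM
val columnNames = """"([^"]+)"""".r                     // quoted identifiers
  .findAllMatchIn(selectList)
  .map(_.group(1))
  .toList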
This is the best I could do. If someone knows a better way then please share - I'm interested too ;)
Code is based on Lightbend Activator "slick-http-app".
slick version: 3.1.1
Added this method to the BaseDal:
def getColumns(): mutable.Map[String, Type] = {
  val columns = mutable.Map.empty[String, Type]
  def selectType(t: Any): Option[Any] = t match {
    case t: TableExpansion => Some(t.columns)
    case t: Select => Some(t.field)
    case _ => None
  }
  def selectArray(t: Any): Option[ConstArray[Node]] = t match {
    case t: TypeMapping => Some(t.child.children)
    case _ => None
  }
  def selectFieldSymbol(t: Any): Option[FieldSymbol] = t match {
    case t: FieldSymbol => Some(t)
    case _ => None
  }
  val t = selectType(tableQ.toNode)
  val c = selectArray(t.get)
  for (se <- c.get) {
    val col = selectType(se)
    val fs = selectFieldSymbol(col.get)
    columns += (fs.get.name -> fs.get.tpe)
  }
  columns
}
This method gets the column names (the real names in the DB) and their types from the tableQ.
The imports used are:
import slick.ast._
import slick.util.ConstArray
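A quick usage sketch (the DAL instance name is hypothetical):
// Hypothetical usage: dump the real DB column names and their slick.ast.Type values.
val columns = personDal.getColumns()
columns.foreach { case (name, tpe) => println(s"$name -> $tpe") }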

How to write nested queries in select clause

I'm trying to produce this SQL with SLICK 1.0.0:
select
  cat.categoryId,
  cat.title,
  (
    select
      count(product.productId)
    from
      products product
      right join products_categories productCategory on productCategory.productId = product.productId
      right join categories c on c.categoryId = productCategory.categoryId
    where
      c.leftValue >= cat.leftValue and
      c.rightValue <= cat.rightValue
  ) as productCount
from
  categories cat
where
  cat.parentCategoryId = 2;
My most successful attempt is (I dropped the "joins" part, so it's more readable):
def subQuery(c: CategoriesTable.type) = (for {
  p <- ProductsTable
} yield (p.id.count))

for {
  c <- CategoriesTable
  if (c.parentId === 2)
} yield (c.id, c.title, (subQuery(c).asColumn))
which produces SQL lacking the parentheses around the subquery:
select
  x2.categoryId,
  x2.title,
  select count(x3.productId) from products x3
from
  categories x2
where x2.parentCategoryId = 2
which is obviously invalid SQL.
Any thoughts on how to have Slick put these parentheses in the right place? Or maybe there is a different way to achieve this?
I have never used Slick or ScalaQuery, so it was quite an adventure to find out how to achieve this. Slick is very extensible, but the documentation on extending it is a bit tricky. This capability might already exist, but here is what I came up with. If I have done something incorrectly, please correct me.
First we need to create a custom driver. I extended the H2Driver to be able to test easily.
trait CustomDriver extends H2Driver {
  // make sure we create our own query builder
  override def createQueryBuilder(input: QueryBuilderInput): QueryBuilder =
    new QueryBuilder(input)

  // extend the H2 query builder
  class QueryBuilder(input: QueryBuilderInput) extends super.QueryBuilder(input) {
    // we override the expr method in order to support the 'As' function
    override def expr(n: Node, skipParens: Boolean = false) = n match {
      // if we match our function we simply build the appropriate query
      case CustomDriver.As(column, LiteralNode(name: String)) =>
        b"("
        super.expr(column, skipParens)
        b") as ${name}"
      // we don't know how to handle this, so let super handle it
      case _ => super.expr(n, skipParens)
    }
  }
}

object CustomDriver extends CustomDriver {
  // simply define 'As' as a function symbol
  val As = new FunctionSymbol("As")

  // we override SimpleQL to add an extra implicit
  trait SimpleQL extends super.SimpleQL {
    // This is the part that makes it easy to use on queries. It's an enrichment class.
    implicit class RichQuery[T: TypeMapper](q: Query[Column[T], T]) {
      // here we redirect our as call to the As method we defined in our custom driver
      def as(name: String) =
        CustomDriver.As.column[T](Node(q.unpackable.value), name)
    }
  }

  // we need to override simple to use our version
  override val simple: SimpleQL = new SimpleQL {}
}
In order to use it we need to import specific things:
import CustomDriver.simple._
import Database.threadLocalSession
Then, to use it you can do the following (I used the tables from the official Slick documentation in my example).
// first create a function to create a count query
def countCoffees(supID: Column[Int]) =
  for {
    c <- Coffees
    if (c.supID === supID)
  } yield (c.length)

// create the query to combine name and count
val coffeesPerSupplier =
  for {
    s <- Suppliers
  } yield (s.name, countCoffees(s.id) as "test")

// print out the name and count
coffeesPerSupplier foreach { case (name, count) =>
  println(s"$name has $count type(s) of coffee")
}
The result is this:
Acme, Inc. has 2 type(s) of coffee
Superior Coffee has 2 type(s) of coffee
The High Ground has 1 type(s) of coffee

Filling a Scala immutable Map from a database table

I have a SQL database table with the following structure:
create table category_value (
category varchar(25),
property varchar(25)
);
I want to read this into a Scala Map[String, Set[String]] where each entry in the map is a set of all of the property values that are in the same category.
I would like to do it in a "functional" style with no mutable data (other than the database result set).
Following on the Clojure loop construct, here is what I have come up with:
import scala.annotation.tailrec

def fillMap(statement: java.sql.Statement): Map[String, Set[String]] = {
  val resultSet = statement.executeQuery("select category, property from category_value")
  @tailrec
  def loop(m: Map[String, Set[String]]): Map[String, Set[String]] = {
    if (resultSet.next) {
      val category = resultSet.getString("category")
      val property = resultSet.getString("property")
      loop(m + (category -> (m.getOrElse(category, Set.empty) + property)))
    } else m
  }
  loop(Map.empty)
}
Is there a better way to do this, without using mutable data structures?
If you like, you could try something along the lines of
def fillMap(statement: java.sql.Statement): Map[String, Set[String]] = {
  val resultSet = statement.executeQuery("select category, property from category_value")
  Iterator.continually((resultSet, resultSet.next)).takeWhile(_._2).map(_._1).map { res =>
    val category = res.getString("category")
    val property = res.getString("property")
    (category, property)
  }.toIterable.groupBy(_._1).mapValues(_.map(_._2).toSet)
}
Untested, because I don’t have a proper sql.Statement. And the groupBy part might need some more love to look nice.
Edit: Added the requested changes.
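If the groupBy/mapValues chain feels too heavy, one untested alternative sketch is to fold the (category, property) pairs straight into the map:
// Untested sketch: same row iteration as above, but folding directly into the result map.
def fillMap(statement: java.sql.Statement): Map[String, Set[String]] = {
  val resultSet = statement.executeQuery("select category, property from category_value")
  Iterator.continually((resultSet, resultSet.next)).takeWhile(_._2).map(_._1)
    .foldLeft(Map.empty[String, Set[String]]) { (m, res) =>
      val category = res.getString("category")
      val property = res.getString("property")
      m.updated(category, m.getOrElse(category, Set.empty[String]) + property)
    }
}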
There are two parts to this problem.
Getting the data out of the database and into a list of rows.
I would use a Spring SimpleJdbcOperations for the database access, so that things at least appear functional, even though the ResultSet is being changed behind the scenes.
First, a simple conversion to let us use a closure to map each row:
implicit def rowMapper[T<:AnyRef](func: (ResultSet)=>T) =
new ParameterizedRowMapper[T]{
override def mapRow(rs:ResultSet, row:Int):T = func(rs)
}
Then let's define a data structure to store the results. (You could use a tuple, but defining my own case class has advantage of being just a little bit clearer regarding the names of things.)
case class CategoryValue(category:String, property:String)
Now select from the database
val db:SimpleJdbcOperations = //get this somehow
val resultList:java.util.List[CategoryValue] =
db.query("select category, property from category_value",
{ rs:ResultSet => CategoryValue(rs.getString(1),rs.getString(2)) } )
Converting the data from a list of rows into the format that you actually want
import scala.collection.JavaConversions._
val result:Map[String,Set[String]] =
resultList.groupBy(_.category).mapValues(_.map(_.property).toSet)
(You can omit the type annotations. I've included them to make it clear what's going on.)
Builders are built for this purpose. Get one via the desired collection type companion, e.g. HashMap.newBuilder[String, Set[String]].
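The hint above is terse, so here is one way to read it, as an untested sketch: keep a Set builder per category, all local to the method, and only let the finished immutable map escape (the outer map could equally be assembled with HashMap.newBuilder[String, Set[String]] once the sets are complete):
// Untested sketch: one Set builder per category; only the finished immutable Map escapes.
import scala.collection.mutable

def fillMap(statement: java.sql.Statement): Map[String, Set[String]] = {
  val rs = statement.executeQuery("select category, property from category_value")
  val perCategory = mutable.Map.empty[String, mutable.Builder[String, Set[String]]]
  while (rs.next()) {
    val category = rs.getString("category")
    perCategory.getOrElseUpdate(category, Set.newBuilder[String]) += rs.getString("property")
  }
  perCategory.iterator.map { case (k, b) => k -> b.result() }.toMap
}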
This solution is basically the same as my other solution, but it doesn't use Spring, and the logic for converting a ResultSet to some sort of list is simpler than Debilski's solution.
def streamFromResultSet[T](rs: ResultSet)(func: ResultSet => T): Stream[T] = {
  if (rs.next())
    func(rs) #:: streamFromResultSet(rs)(func)
  else {
    rs.close()
    Stream.empty
  }
}
def fillMap(statement: java.sql.Statement): Map[String, Set[String]] = {
  case class CategoryValue(category: String, property: String)
  val resultSet = statement.executeQuery("""
    select category, property from category_value
  """)
  val queryResult = streamFromResultSet(resultSet) { rs =>
    CategoryValue(rs.getString(1), rs.getString(2))
  }
  queryResult.groupBy(_.category).mapValues(_.map(_.property).toSet)
}
There is only one approach I can think of that does not include either mutable state or extensive copying*. It is actually a very basic technique I learnt in my first term studying CS. Here goes, abstracting from the database stuff:
def empty[K, V](k: K): Option[V] = None

def add[K, V](m: K => Option[V])(k: K, v: V): K => Option[V] = q => {
  if (k == q) {
    Some(v)
  } else {
    m(q)
  }
}

def build[K, V](input: TraversableOnce[(K, V)]): K => Option[V] = {
  input.foldLeft(empty[K, V] _)((m, i) => add(m)(i._1, i._2))
}
Usage example:
val map = build(List(("a",1),("b",2)))
println("a " + map("a"))
println("b " + map("b"))
println("c " + map("c"))
> a Some(1)
> b Some(2)
> c None
Of course, the resulting function does not have type Map (nor any of its benefits) and has linear lookup costs. I guess you could implement something in a similar way that mimics simple search trees.
(*) I am talking concepts here. In reality, things like value sharing might enable e.g. mutable list constructions without memory overhead.