scala: map-like structure that doesn't require casting when fetching a value?

I'm writing a data structure that converts the results of a database query. The raw structure is a java ResultSet and it would be converted to a map or class which permits accessing different fields on that data structure by either a named method call or passing a string into apply(). Clearly different values may have different types. In order to reduce burden on the clients of this data structure, my preference is that one not need to cast the values of the data structure but the value fetched still has the correct type.
For example, suppose I'm doing a query that fetches two column values, one an Int, the other a String. The names of the result columns are "a" and "b" respectively. Some ideal syntax might be the following:
val javaResultSet = dbQuery("select a, b from table limit 1")
// with ResultSet, particular values can be accessed like this:
val a = javaResultSet.getInt("a")
val b = javaResultSet.getString("b")
// but this syntax is undesirable.
// since I want to convert this to a single data structure,
// the preferred syntax might look something like this:
val newStructure = toDataStructure[Int, String](javaResultSet)("a", "b")
// that is, I'm willing to state the types during the instantiation
// of such a data structure.
// then,
val a: Int = newStructure("a") // OR
val a: Int = newStructure.a
// in both cases, "val a" does not require asInstanceOf[Int].
I've been trying to determine what sort of data structure might allow this and I could not figure out a way around the casting.
The other requirement is obviously that I would like to define a single data structure used for all db queries. I realize I could easily define a case class or similar per call and that solves the typing issue, but such a solution does not scale well when many db queries are being written. I suspect some people are going to propose using some sort of ORM, but let us assume for my case that it is preferred to maintain the query in the form of a string.
Anyone have any suggestions? Thanks!

To do this without casting, one needs more information about the query, and one needs that information at compile time.
I suspect some people are going to propose using some sort of ORM, but let us assume for my case that it is preferred to maintain the query in the form of a string.
Your suspicion is right and you will not get around this. If current ORMs or DSLs like squeryl don't suit your fancy, you can create your own one. But I doubt you will be able to use query strings.

The basic problem is that you don't know how many columns there will be in any given query, and so you don't know how many type parameters the data structure should have and it's not possible to abstract over the number of type parameters.
There is, however, a data structure that exists in different variants for different numbers of type parameters: the tuple (e.g. Tuple2, Tuple3 etc.). You could define parameterized mapping functions for different numbers of parameters that return tuples, like this:
def toDataStructure2[T1, T2](rs: ResultSet)(c1: String, c2: String) =
(rs.getObject(c1).asInstanceOf[T1],
rs.getObject(c2).asInstanceOf[T2])
def toDataStructure3[T1, T2, T3](rs: ResultSet)(c1: String, c2: String, c3: String) =
(rs.getObject(c1).asInstanceOf[T1],
rs.getObject(c2).asInstanceOf[T2],
rs.getObject(c3).asInstanceOf[T3])
You would have to define these for as many columns as you expect to have in your tables (max 22).
This of course depends on it being safe to call getObject and cast the result to the given type.
In your example you could use the resulting tuple as follows:
val (a, b) = toDataStructure2[Int, String](javaResultSet)("a", "b")

If you decide to go the route of heterogeneous collections, there are some very interesting posts on heterogeneous typed lists.
One series, for instance, is
http://jnordenberg.blogspot.com/2008/08/hlist-in-scala.html
http://jnordenberg.blogspot.com/2008/09/hlist-in-scala-revisited-or-scala.html
with an implementation at
http://www.assembla.com/wiki/show/metascala
A second great series of posts starts with
http://apocalisp.wordpress.com/2010/07/06/type-level-programming-in-scala-part-6a-heterogeneous-list%C2%A0basics/
The series continues with parts "b", "c" and "d", linked from part "a".
Finally, there is a talk by Daniel Spiewak which touches on HOMaps:
http://vimeo.com/13518456
All this to say that perhaps you can build your solution from these ideas. Sorry that I don't have a specific example; I admit I haven't tried these out myself yet.
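As a rough illustration of the heterogeneous-map idea (not code from any of the linked posts), here is a minimal sketch in which each key carries the type of its value. The names Column and TypedRow are hypothetical. The single cast is hidden inside the wrapper, and it is only safe under the assumption that every value is stored through set and that a column name is never reused with a different type.

```scala
object TypedRowSketch {
  // A column key that remembers the type of the value stored under it.
  final case class Column[T](name: String)

  // Hypothetical wrapper: the only cast lives here, not in client code.
  final class TypedRow private (values: Map[String, Any]) {
    // The compiler checks that the value matches the column's type.
    def set[T](col: Column[T], value: T): TypedRow =
      new TypedRow(values + (col.name -> value))

    // Safe only because values enter exclusively through `set`.
    def apply[T](col: Column[T]): T =
      values(col.name).asInstanceOf[T]
  }

  object TypedRow {
    val empty: TypedRow = new TypedRow(Map.empty)
  }

  val a = Column[Int]("a")
  val b = Column[String]("b")

  val row: TypedRow = TypedRow.empty.set(a, 1).set(b, "word")

  // No asInstanceOf at the use site.
  val aValue: Int = row(a)
  val bValue: String = row(b)
}
```

The trade-off is that the type is stated once per column key rather than once per query, and fields are accessed as row(a) rather than row("a").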

Joshua Bloch has introduced a heterogeneous collection, which can be written in Java. I once adapted it a little; it now works as a value register. It is basically a wrapper around two maps. Here is the code and this is how you can use it. But this is just FYI, since you are interested in a Scala solution.
In Scala I would start by playing with tuples. Tuples are a kind of heterogeneous collection. Their elements can be, but do not have to be, accessed through fields like _1, _2, _3 and so on. But you don't want that; you want names. This is how you can assign names to them:
scala> val tuple = (1, "word")
tuple: (Int, String) = (1,word)
scala> val (a, b) = tuple
a: Int = 1
b: String = word
So as mentioned before I would try to build a ResultSetWrapper around tuples.

If you want "extract the column value by name" on a plain bean instance, you can probably:
use reflection and casts, which you (and I) don't like.
use a ResultSetToJavaBeanMapper provided by most ORM libraries, which is a little heavy and coupled.
write a scala compiler plugin, which is too complex to control.
so, I guess a lightweight ORM with following features may satisfy you:
support raw SQL
support a lightweight,declarative and adaptive ResultSetToJavaBeanMapper
nothing else.
I made an experimental project on that idea, but note it's still an ORM, and I just think it may be useful to you, or can bring you some hint.
Usage:
declare the model:
//declare DB schema
trait UserDef extends TableDef {
var name = property[String]("name", title = Some("姓名"))
var age1 = property[Int]("age", primary = true)
}
//declare model, and it mixes in properties as {var name = ""}
@BeanInfo class User extends Model with UserDef
//declare a object.
//it mixes in properties as {var name = Property[String]("name") }
//and, object User is a Mapper[User], thus, it can translate ResultSet to a User instance.
object `package`{
@BeanInfo implicit object User extends Table[User]("users") with UserDef
}
then call raw sql, the implicit Mapper[User] works for you:
val users = SQL("select name, age from users").all[User]
users.foreach{user => println(user.name)}
or even build a type safe query:
val users = User.q.where(User.age > 20).where(User.name like "%liu%").all[User]
for more, see unit test:
https://github.com/liusong1111/soupy-orm/blob/master/src/test/scala/mapper/SoupyMapperSpec.scala
project home:
https://github.com/liusong1111/soupy-orm
It uses "abstract Type" and "implicit" heavily to make the magic happen, and you can check source code of TableDef, Table, Model for detail.

Several million years ago I wrote an example showing how to use Scala's type system to push and pull values from a ResultSet. Check it out; it matches up with what you want to do fairly closely.
implicit val conn = connect("jdbc:h2:f2", "sa", "");
implicit val s: Statement = conn << setup;
val insertPerson = conn prepareStatement "insert into person(type, name) values(?, ?)";
for (val name <- names)
insertPerson<<rnd.nextInt(10)<<name<<!;
for (val person <- query("select * from person", rs => Person(rs,rs,rs)))
println(person.toXML);
for (val person <- "select * from person" <<! (rs => Person(rs,rs,rs)))
println(person.toXML);
Primitive types are used to guide the Scala compiler into selecting the right functions on the ResultSet.
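The general technique of letting the expected type pick the getter can be sketched with a type class. This is not the code from the example above. Row is a hypothetical stand-in for the typed getters of java.sql.ResultSet so that the sketch runs without a database, and ColumnReader, get and get2 are names made up for illustration.

```scala
object ColumnReaderSketch {
  // Hypothetical stand-in for the typed getters of java.sql.ResultSet.
  trait Row {
    def getInt(column: String): Int
    def getString(column: String): String
  }

  // Type class: knows how to read a column of type T.
  trait ColumnReader[T] {
    def read(row: Row, column: String): T
  }

  object ColumnReader {
    implicit val intReader: ColumnReader[Int] = new ColumnReader[Int] {
      def read(row: Row, column: String): Int = row.getInt(column)
    }
    implicit val stringReader: ColumnReader[String] = new ColumnReader[String] {
      def read(row: Row, column: String): String = row.getString(column)
    }
  }

  // The requested type T selects the reader at compile time; no cast needed.
  def get[T](row: Row, column: String)(implicit reader: ColumnReader[T]): T =
    reader.read(row, column)

  // Two-column variant returning a tuple, one such method per arity.
  def get2[A, B](row: Row, c1: String, c2: String)(
      implicit ra: ColumnReader[A], rb: ColumnReader[B]): (A, B) =
    (ra.read(row, c1), rb.read(row, c2))

  // In-memory row, for demonstration only.
  val demoRow: Row = new Row {
    def getInt(column: String): Int = Map("a" -> 1)(column)
    def getString(column: String): String = Map("b" -> "word")(column)
  }

  val a: Int = get[Int](demoRow, "a")
  val (a2, b2) = get2[Int, String](demoRow, "a", "b")
}
```

Asking for a type with no reader in scope is a compile error rather than a runtime ClassCastException, but the column names are still only checked at runtime.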

Related

Grouping by generic parameter in Slick

I am trying to implement generic grouping using Slick 3.2.3. By generic grouping I mean grouping the same query by different parameters or sets thereof.
Supposing I have a table:
class MyTable(tag: Tag) extends Table[MyEntry](tag, "my_table") {
def text1 = column[String]("text1")
def text2 = column[Option[String]]("text2")
def list = column[List[String]]("list") // I am using postgres+slick_pg
...
}
Then I have a complex query with several joins and I would like to be able to group it by text1, (text1, text2), list etc. One way to do it would be to define a generic function which performs grouping using extractor parameter:
private def getData[T](extractor: MyTable => T) = {
// supposing MyTable comes second in the list
// of joined tables in my complex query
val groupedQuery = myComplexQuery.groupBy(x => extractor(x._2))
...
// here goes aggregation functions, mapping etc.
}
where one of extractor implementations may be defined as
val extractor: MyTable => (Rep[String], Rep[Option[String]]) = me => me.text1 -> me.text2
However, since extractor is generic, groupBy cannot find matching Shape for T type, and it means that I will have to provide it as well. My question is how exactly to define such Shapes? Documentation for slick.lifted package lacks examples, and it is not exactly obvious what generic types K, T, G and P mean in Query#groupBy definition (or FlatShapeLevel for that matter). I would appreciate if somebody provided examples of such extractor functions at least for a primitive type (String) and a tuple2 (say, (String, Option[String])). Or perhaps there is a better way to achieve the same result which I have overlooked? Thanks.

Passing parameters to scala slick query [duplicate]

There is a similar question here but it doesn't actually answer the question.
Is it possible to use IN clause in plain sql Slick?
Note that this is actually part of a larger and more complex query, so I do need to use plain sql instead of slick's lifted embedding. Something like the following will be good:
val ids = List(2,4,9)
sql"SELECT * FROM coffee WHERE id IN ($ids)"
The sql prefix unlocks a StringContext where you can set SQL parameters. There is no SQL parameter for a list, so you can easily end up opening yourself up to SQL injection here if you're not careful. There are some good (and some dangerous) suggestions about dealing with this problem with SQLServer on this question. You have a few options:
Your best bet is probably to use the #$ operator together with mkString to interpolate dynamic SQL:
val sql = sql"""SELECT * FROM coffee WHERE id IN (#${ids.mkString(",")})"""
This doesn't properly use parameters and therefore might be open to SQL-injection and other problems.
Another option is to use regular string interpolation and mkString to build the statement:
val query = s"""SELECT * FROM coffee WHERE id IN (${ids.mkString(",")})"""
StaticQuery.queryNA[Coffee](query)
This is essentially the same approach as using #$, but might be more flexible in the general case.
If SQL-injection vulnerability is a major concern (e.g. if the elements of ids are user provided), you can build a query with a parameter for each element of ids. Then you'll need to provide a custom SetParameter instance so that slick can turn the List into parameters:
implicit val setStringListParameter = new SetParameter[List[String]]{
def apply(v1: List[String], v2: PositionedParameters): Unit = {
v1.foreach(v2.setString)
}
}
val idsInClause = List.fill(ids.length)("?").mkString("(", ",", ")")
val query = s"""SELECT * FROM coffee WHERE id IN ($idsInClause)"""
Q.query[List[String], String](query).apply(ids).list(s)
Since your ids are Ints, this is probably less of a concern, but if you prefer this method, you would just need to change the setStringListParameter to use Int instead of String:
val ids = List(610113193610210035L, 220702198208189710L)
implicit object SetListLong extends SetParameter[List[Long]] {
def apply(vList: List[Long], pp: PositionedParameters): Unit = {
vList.foreach(pp.setLong)
}
}
val select = sql"""
select idnum from idnum_0
where idnum in ($ids#${",?" * (ids.size - 1)})
""".as[Long]
@Ben Reich is right.
This is another code sample, tested on Slick 3.1.0.
($ids#${",?" * (ids.size - 1)})
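To make the placeholder arithmetic in that fragment explicit: as I read it, $ids contributes the first "?" (with the custom SetParameter binding every element), and the spliced ",?" repetitions supply the remaining placeholders. The sketch below only builds the resulting SQL text with plain string operations and does not use Slick; the helper name placeholders is hypothetical.

```scala
object InClauseSketch {
  // One "?" per element: the first stands for the bound parameter,
  // the rest are the literally spliced ",?" repetitions.
  def placeholders(count: Int): String = {
    require(count > 0, "an IN list needs at least one element")
    "?" + ",?" * (count - 1)
  }

  val ids = List(610113193610210035L, 220702198208189710L)

  // The statement text the database would see for two ids.
  val sqlText: String =
    s"select idnum from idnum_0 where idnum in (${placeholders(ids.size)})"
}
```

Note that an empty list needs separate handling, since "IN ()" is not valid SQL in most databases.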
Although this is not a universal answer and may not be what the author wanted, I still want to point this out to whoever views this question.
Some DB backends support array types, and there are extensions to Slick that allow setting these array types in the interpolations.
For example, Postgres has the syntax where column = any(array), and with slick-pg you can use this syntax like so:
def query(ids: Seq[Long]) = db.run(sql"select * from table where ids = any($ids)".as[Long])
This gives a much cleaner syntax, which is friendlier to the statement compiler cache and also safe from SQL injection and from the general danger of producing malformed SQL with the #$var interpolation syntax.
I have written a small extension to Slick that addresses exactly this problem: https://github.com/rtkaczyk/inslick
For the given example the solution would be:
import accode.inslick.syntax._
val ids = List(2,4,9)
sqli"SELECT * FROM coffee WHERE id IN *$ids"
Additionally InSlick works with iterables of tuples or case classes. It's available for all Slick 3.x versions and Scala versions 2.11 - 2.13. We've been using it in production for several months at the company I work for.
The interpolation is safe from SQL injection. It utilises a macro which rewrites the query in a way similar to trydofor's answer.
Ran into essentially this same issue in Slick 3.3.3 when trying to use a Seq[Long] in an IN query for MySQL. Kept getting a compilation error from Slick of:
could not find implicit value for parameter e: slick.jdbc.SetParameter[Seq[Long]]
The original question would have been getting something like:
could not find implicit value for parameter e: slick.jdbc.SetParameter[List[Int]]
Slick 3.3.X+ can handle binding the parameters for the IN query, as long as we provide the implicit definition of how Slick should do so for the types we're using. This means adding the implicit val definition somewhere at the class level. So, like:
class MyClass {
// THIS IS THE KEY LINE TO ENABLE SLICK TO BIND THE PARAMS
implicit val setListInt = SetParameter[List[Int]]((inputList, params) => inputList.foreach(params.setInt))
def queryByHardcodedIds() = {
val ids: List[Int] = List(2,4,9)
sql"SELECT * FROM coffee WHERE id IN ($ids)" // SLICK CAN AUTO-HANDLE BINDING NOW
}
}
Similar for the case of Seq[Long] & others. Just make sure your types/binding aligns to what you need Slick to handle:
implicit val setSeqLong = SetParameter[Seq[Long]]((inputList, params) => inputList.foreach(params.setLong))
// ^^Note the `SetParameter[Seq[Long]]` & `.setLong` for type alignment

How to pass data from closure without repeating yourself

I'm using Play 2 with Anorm to manage database access. A common pattern I find myself doing is this:
val (futureChecklists, jobsLookup) =
DB.withConnection { implicit connection =>
val futureChecklists = futureChecklistRepository.getAllHavingActiveTemplateAndNonNullNextRunDate()
val jobsLookup = futureChecklistJobRepository.getAllHavingActiveTemplateAndNonNullNextRunDate()
.groupBy(_.futureChecklist.id)
.withDefaultValue(List.empty)
(futureChecklists, jobsLookup)
}
Which seems kinda weird, because I have to repeat myself. It also gets a bit unruly if I have several variables I'll need in the outer scope, but I don't want to keep the connection open.
Is there an easy way to pass this information back without having to resort to using vars?
What I would like is something like:
val futureChecklists
val jobsLookup
DB.withConnection { implicit connection =>
futureChecklists = futureChecklistRepository.getAllHavingActiveTemplateAndNonNullNextRunDate()
jobsLookup = futureChecklistJobRepository.getAllHavingActiveTemplateAndNonNullNextRunDate()
.groupBy(_.futureChecklist.id)
.withDefaultValue(List.empty)
}
That way I don't have the same tuple at the beginning and end.
I am afraid there is no easy way not to duplicate the tuple declaration, but var is definitely not the way to go around it.
You mention that returning multiple values as a tuple gets awkward as their number grows. This can indeed become tricky and error-prone, especially when you end up with large N-tuples whose elements share the same types. In that scenario I would consider a dedicated container, i.e. a case class, where you can reference values by name rather than by position in the tuple. A side benefit is that you can assign the whole container to a variable and reference its fields in the natural way.
Last but not least, you don't say much about your particular use case, but it may be worth obtaining the two query results in separate withConnection blocks. If you are using a connection pooling mechanism, there is hardly any benefit in keeping them in the same withConnection block, and with separate blocks you even gain the flexibility to parallelize the DB queries using separate connections.
There are three ways that I came up with:
Return tuple immediately
val (users, posts) =
DB.withConnection { connection => (
connection.getUsers,
connection.getPosts
)}
I think this is OK for simple code and a small number of vals. For more complex code and more vals this can be error-prone. Someone can accidentally change the order of the elements in the tuple on just one side of the assignment and assign data to the wrong vals (which the compiler will report only if it also causes a type mismatch).
Use anonymous class
val dbResult =
DB.withConnection { connection =>
new {
val users = connection.getUsers
val posts = connection.getPosts
}
}
If you like to have users and posts variables instead of dbResult.users and dbResult.posts you can:
import dbResult._
This solution is a little exotic, but it works just fine and is quite clean.
Use case class
First define case class for your return value:
case class DBResult(users: List[User], posts: List[Post])
and then use it:
val DBResult(users: List[User], posts: List[Post]) =
DB.withConnection { connection =>
DBResult(
users = connection.getUsers,
posts = connection.getPosts
)
}
This is best if you intend to reuse this case class multiple times.

In Scala what is the easiest way to parse json and map to objects?

I'm looking for a super simple way to take a big JSON fragment, that is a long list with a bunch of big objects in it, and parse it, then pick out the same few values from each object and then map into a case class.
I have tried pretty hard to get lift-json (2.5) working for me, but I'm having trouble cleanly dealing with checking if a key is present, and if so, then map the whole object, but if not, then skip it.
I absolutely do not understand this syntax for Lift-JSON one bit:
case class Car(make: String, model: String)
...
val parsed = parse(jsonFragment)
val JArray(cars) = parsed \ "cars"
val carList = new MutableList[Car]
for (car <- cars) {
val JString(model) = car \ "model"
val JString(make) = car \ "make"
// i want to check if they both exist here, and if so
// then add to carList
carList += car
}
What on earth is that construct that makes it look like a case class is being created left of the assignment operator? I'm talking about the "JString" part.
Also how is it supposed to cope with the situation where a key is missing?
Can someone please explain to me what the right way to do this is?
And if I have nested values I'm looking for, I just want to skip the whole object and go on to try to map the next one.
Is there something more straightforward for this than Lift-JSON?
Would using extractOpt help?
I have looked at this a lot:
https://github.com/lift/framework/tree/master/core/json
and it's still not particularly clear to me.
Help is very much appreciated!!!!!
Since you are only looking to extract certain fields, you are on the right track. This modified version of your for-comprehension will loop through your car structure, extract the make and model and only yield your case class if both items exist:
for{
car <- cars
model <- (car \ "model").extractOpt[String]
make <- (car \ "make").extractOpt[String]
} yield Car(make, model)
You would add additional required fields the same way. If you want to also utilize optional parameters, let's say color - then you can call that in your yield section and the for comprehension won't unbox them:
for{
car <- cars
model <- (car \ "model").extractOpt[String]
make <- (car \ "make").extractOpt[String]
} yield Car(make, model, (car \ "color").extractOpt[String])
In both cases you will get back a List of Car case classes.
The weird looking assignment is pattern-matching used on val declaration.
When you see
val JArray(cars) = parsed \ "cars"
it extracts the "cars" subtree from the parsed JSON and matches the resulting value against the extractor pattern JArray(cars).
That is to say, the value is expected to have the form JArray(something), and the something is bound to the variable name cars.
It works much like the pattern matching on case classes you're probably already familiar with, such as Option, e.g.
//define a value with a class that can pattern match
val option = Some(1)
//do the matching on val assignment
val Some(number) = option
//use the extracted binding as a variable
println(number)
The following assignments are exactly the same stuff
//pattern match on a JSon String whose inner value is assigned to "model"
val JString(model) = car \ "model"
//pattern match on a JSon String whose inner value is assigned to "make"
val JString(make) = car \ "make"
References
The JSON types (e.g. JValue, JString, JDouble) are defined as aliases within the net.liftweb.json object here.
The aliases in turn point to corresponding inner case classes within the net.liftweb.json.JsonAST object, found here
The case classes have an unapply method for free, which lets you do the pattern-matching as explained in the above answer.
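To see the mechanism in isolation, and how to cope with a missing key, here is a minimal sketch with a made-up miniature JSON AST. JText, JObj, JMissing and field are hypothetical and are NOT the lift-json classes. The point is that a val pattern which does not match throws a MatchError at runtime, while a match expression lets you fall back to None and skip the object.

```scala
object ExtractorSketch {
  // Hypothetical miniature JSON AST, not the lift-json one.
  sealed trait Json
  final case class JText(value: String) extends Json
  final case class JObj(fields: Map[String, Json]) extends Json
  case object JMissing extends Json

  // Field lookup that yields JMissing instead of throwing.
  def field(json: Json, name: String): Json = json match {
    case JObj(fields) => fields.getOrElse(name, JMissing)
    case _            => JMissing
  }

  final case class Car(make: String, model: String)

  // Safe variant: a non-matching pattern gives None, not a MatchError.
  def toCar(json: Json): Option[Car] =
    (field(json, "make"), field(json, "model")) match {
      case (JText(make), JText(model)) => Some(Car(make, model))
      case _                           => None
    }

  val complete: Json = JObj(Map("make" -> JText("Saab"), "model" -> JText("900")))
  val incomplete: Json = JObj(Map("make" -> JText("Saab")))

  // Objects lacking a required key are simply skipped.
  val cars: List[Car] = List(complete, incomplete).flatMap(j => toCar(j).toList)
}
```

By contrast, writing val JText(model) = field(incomplete, "model") would compile but fail at runtime, which is why the Option-based forms are preferable when keys may be absent.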
I think this should work for you:
case class UserInfo(
name: String,
firstName: Option[String],
lastName: Option[String],
smiles: Boolean
)
val jValue: JValue
val extractedUserInfoClass: Option[UserInfo] = jValue.extractOpt[UserInfo]
val jsonArray: JArray
val listOfUserInfos: List[Option[UserInfo]] = jsonArray.arr.map(_.extractOpt[UserInfo])
I expect jValue to have smiles and name -- otherwise extracting will fail.
I don't expect jValue to necessarily have firstName and lastName -- so I write Option[T] in the case class.

What are good examples of: "operation of a program should map input values to output values rather than change data in place"

I came across this sentence in Scala in explaining its functional behavior.
operation of a program should map input of values to output values rather than change data in place
Could somebody explain it with a good example?
Edit: Please explain the sentence above, or give an example of it, in its context; please keep it simple so that it does not add to the confusion.
The most obvious pattern that this is referring to is the difference between how you would write code which uses collections in Java when compared with Scala. If you were writing scala but in the idiom of Java, then you would be working with collections by mutating data in place. The idiomatic scala code to do the same would favour the mapping of input values to output values.
Let's have a look at a few things you might want to do to a collection:
Filtering
In Java, if I have a List<Trade> and I am only interested in those trades executed with Deutsche Bank, I might do something like:
for (Iterator<Trade> it = trades.iterator(); it.hasNext();) {
Trade t = it.next();
if (t.getCounterparty() != DEUTSCHE_BANK) it.remove(); // MUTATION
}
Following this loop, my trades collection only contains the relevant trades. But, I have achieved this using mutation - a careless programmer could easily have missed that trades was an input parameter, an instance variable, or is used elsewhere in the method. As such, it is quite possible their code is now broken. Furthermore, such code is extremely brittle for refactoring for this same reason; a programmer wishing to refactor a piece of code must be very careful to not let mutated collections escape the scope in which they are intended to be used and, vice-versa, that they don't accidentally use an un-mutated collection where they should have used a mutated one.
Compare with Scala:
val db = trades filter (_.counterparty == DeutscheBank) //MAPPING INPUT TO OUTPUT
This creates a new collection! It doesn't affect anyone who is looking at trades and is inherently safer.
Mapping
Suppose I have a List<Trade> and I want to get a Set<Stock> for the unique stocks which I have been trading. Again, the idiom in Java is to create a collection and mutate it.
Set<Stock> stocks = new HashSet<Stock>();
for (Trade t : trades) stocks.add(t.getStock()); //MUTATION
Using scala the correct thing to do is to map the input collection and then convert to a set:
val stocks = (trades map (_.stock)).toSet //MAPPING INPUT TO OUTPUT
Or, if we are concerned about performance:
(trades.view map (_.stock)).toSet
(trades.iterator map (_.stock)).toSet
What are the advantages here? Well:
My code can never observe a partially-constructed result
The application of a function A => B to a Coll[A] to get a Coll[B] is clearer.
Accumulating
Again, in Java the idiom has to be mutation. Suppose we are trying to sum the decimal quantities of the trades we have done:
BigDecimal sum = BigDecimal.ZERO;
for (Trade t : trades) {
sum.add(t.getQuantity()); //MUTATION
}
Again, we must be very careful not to accidentally observe a partially-constructed result! In scala, we can do this in a single expression:
val sum = (0 /: trades)(_ + _.quantity) //MAPPING INPUT TO OUTPUT
Or the various other forms:
trades.foldLeft(0)(_ + _.quantity)
(trades.iterator map (_.quantity)).sum
(trades.view map (_.quantity)).sum
Oh, by the way, there is a bug in the Java implementation! Did you spot it?
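For anyone who wants to run the three Scala idioms from this answer together, here is a minimal self-contained sketch. The Trade case class and the sample data are assumptions made up for illustration; only the field names follow the snippets above. The fold starts from a BigDecimal zero, assuming quantities are BigDecimal as in the Java version.

```scala
object TradesSketch {
  // Hypothetical model; field names follow the snippets above.
  final case class Trade(counterparty: String, stock: String, quantity: BigDecimal)

  val DeutscheBank = "Deutsche Bank"

  val trades: List[Trade] = List(
    Trade(DeutscheBank, "VOD", BigDecimal("10.5")),
    Trade("Other Bank", "BP", BigDecimal("2")),
    Trade(DeutscheBank, "BP", BigDecimal("1.5"))
  )

  // Filtering: produces a new list; `trades` itself is unchanged.
  val db: List[Trade] = trades.filter(_.counterparty == DeutscheBank)

  // Mapping: derive the set of distinct stocks.
  val stocks: Set[String] = trades.map(_.stock).toSet

  // Accumulating: fold from a BigDecimal zero so the types line up.
  val sum: BigDecimal = trades.foldLeft(BigDecimal(0))(_ + _.quantity)
}
```

After all three operations, trades still holds its three original elements, so any other code holding a reference to it is unaffected.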
I'd say it's the difference between:
var counter = 0
def updateCounter(toAdd: Int): Unit = {
counter += toAdd
}
updateCounter(8)
println(counter)
and:
val originalValue = 0
def addToValue(value: Int, toAdd: Int): Int = value + toAdd
val firstNewResult = addToValue(originalValue, 8)
println(firstNewResult)
This is a gross oversimplification, but fuller examples are things like using a foldLeft to build up a result rather than doing the hard work yourself: foldLeft example
What it means is that if you write pure functions like this you always get the same output from the same input, and there are no side effects, which makes it easier to reason about your programs and ensure that they are correct.
so for example the function:
def times2(x:Int) = x*2
is pure, while
def add5ToList(xs: MutableList[Int]): Unit = {
xs += 5
}
is impure because it edits data in place as a side effect. This is a problem because that same list could be in use elsewhere in the program and now we can't guarantee the behaviour because it has changed.
A pure version would use immutable lists and return a new list
def add5ToList(xs: List[Int]) = {
5::xs
}
There are plenty of examples with collections, which are easy to come by but might give the wrong impression. This concept works at all levels of the language (it doesn't at the VM level, however). One example is case classes. Consider these two alternatives:
// Java-style
class Person(initialName: String, initialAge: Int) {
def this(initialName: String) = this(initialName, 0)
private var name = initialName
private var age = initialAge
def getName = name
def getAge = age
def setName(newName: String): Unit = { name = newName }
def setAge(newAge: Int): Unit = { age = newAge }
}
val employee = new Person("John")
employee.setAge(40) // we changed the object
// Scala-style
case class Person(name: String, age: Int) {
def this(name: String) = this(name, 0)
}
val employee = new Person("John")
val employeeWithAge = employee.copy(age = 40) // employee still exists!
This concept is applied to the construction of the immutable collections themselves: a List never changes. Instead, new List objects are created when necessary. The use of persistent data structures reduces the copying that would otherwise be needed, because new versions share structure with the old ones.
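The sharing can be observed directly with reference equality. In this small sketch (the names are made up for illustration), prepending to a List allocates a single new cell whose tail is the existing list, so nothing is copied and the original stays as it was.

```scala
object SharingSketch {
  val original: List[Int] = List(1, 2, 3)

  // Prepending allocates one new cell that points at the existing list.
  val extended: List[Int] = 0 :: original

  // `eq` is reference equality: the tail IS the original list, not a copy.
  val tailIsShared: Boolean = extended.tail eq original

  // The original list is untouched by the "update".
  val originalUnchanged: Boolean = original == List(1, 2, 3)
}
```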