Grouping by generic parameter in Slick - scala

I am trying to implement generic grouping using Slick 3.2.3. By generic grouping I mean grouping the same query by different parameters or sets thereof.
Supposing I have a table:
class MyTable(tag: Tag) extends Table[MyEntry](tag, "my_table") {
  def text1 = column[String]("text1")
  def text2 = column[Option[String]]("text2")
  def list = column[List[String]]("list") // I am using postgres+slick_pg
  ...
}
Then I have a complex query with several joins and I would like to be able to group it by text1, (text1, text2), list etc. One way to do it would be to define a generic function which performs grouping using an extractor parameter:
private def getData[T](extractor: MyTable => T) = {
  // supposing MyTable comes second in the list
  // of joined tables in my complex query
  val groupedQuery = myComplexQuery.groupBy(x => extractor(x._2))
  ...
  // here go aggregation functions, mapping etc.
}
where one of the extractor implementations may be defined as
val extractor: MyTable => (Rep[String], Rep[Option[String]]) = me => me.text1 -> me.text2
However, since extractor is generic, groupBy cannot find a matching Shape for the T type, which means that I will have to provide it as well. My question is how exactly to define such Shapes? Documentation for the slick.lifted package lacks examples, and it is not exactly obvious what the generic types K, T, G and P mean in the Query#groupBy definition (or FlatShapeLevel for that matter). I would appreciate it if somebody provided examples of such extractor functions at least for a primitive type (String) and a tuple2 (say, (String, Option[String])). Or perhaps there is a better way to achieve the same result which I have overlooked? Thanks.
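For what it's worth, here is a hedged sketch (my own reconstruction, not a verified answer) of how the Shape could be taken as an implicit parameter so that groupBy can resolve it for whatever key type the extractor returns; the type parameter names mirror Slick's groupBy signature:
import slick.lifted.{FlatShapeLevel, Shape}

// K = the Rep-level key the extractor produces (e.g. Rep[String] or a tuple of Reps)
// T = its unpacked value-level counterpart (e.g. String or (String, Option[String]))
// G = the packed group representation used in the resulting query
private def getData[K, T, G](extractor: MyTable => K)(
    implicit kshape: Shape[_ <: FlatShapeLevel, K, T, G]) = {
  val groupedQuery = myComplexQuery.groupBy(x => extractor(x._2))
  // aggregation functions, mapping etc. go here
  groupedQuery
}

// extractor for a single column:
val byText1: MyTable => Rep[String] = _.text1
// extractor for a pair of columns:
val byText1And2: MyTable => (Rep[String], Rep[Option[String]]) =
  me => (me.text1, me.text2)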

Related

Trying to keep types unwrapped when using refined

I am trying to use refined to create smart constructors based on primitives and avoid wrapping, since the same types might be used in large collections. Am I doing this right? It seems to work but feels a bit boilerplate-heavy.
type ONE_Pred = MatchesRegex[W....
type ONE = String @@ ONE_Pred
type TWO_Pred = OneOf[...
type TWO = String @@ TWO_Pred
and then
case class C(one: ONE, two: TWO)

object C {
  def apply(one: String, two: String): Either[String, C] =
    (
      refineT[ONE_Pred](one),
      refineT[TWO_Pred](two)
    ).mapN(C.apply)
}
Refined has a mechanism for creating companion-like objects that have some utilities already defined:
type ONE = String @@ MatchesRegex["\\d+"]
object ONE extends RefinedTypeOps[ONE, String]
Note that:
You don't need a predicate type to be mentioned separately
It works for both shapeless tags and refined's own newtypes. The flavor you get is based on the structure of type ONE.
You get:
ONE("literal") as alternative to refineMT/refineMV
ONE.from(string) as alternative to refineT/refineV
ONE.unapply so you can do string match { case ONE(taggedValue) => ... }
ONE.unsafeFrom for when your only option is to throw an exception.
With these "companions", it's possible to write much simpler code with no need to mention any predicate types:
object C {
  def apply(one: String, two: String): Either[String, C] =
    (ONE.from(one), TWO.from(two)).mapN(C.apply)
}
(example in scastie, using 2.13 with native literal types)
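As a small hedged illustration (mine, not from the answer) of the unapply utility listed above:
// assumes the ONE definition above is in scope
"12345" match {
  case ONE(digits) => println(s"all digits: $digits")
  case other       => println(s"rejected: $other")
}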

Dataset.groupByKey + untyped aggregation functions

Suppose I have types like these:
case class SomeType(id: String, x: Int, y: Int, payload: String)
case class Key(x: Int, y: Int)
Then suppose I did groupByKey on a Dataset[SomeType] like this:
val input: Dataset[SomeType] = ...
val grouped: KeyValueGroupedDataset[Key, SomeType] =
  input.groupByKey(s => Key(s.x, s.y))
Then suppose I have a function which determines which field I want to use in an aggregation:
val chooseDistinguisher: SomeType => String = _.id
And now I would like to run an aggregation function over the grouped dataset, for example, functions.countDistinct, using the field obtained by the function:
grouped.agg(
  countDistinct(<something which depends on chooseDistinguisher>).as[Long]
)
The problem is, I cannot create a UDF from chooseDistinguisher, because countDistinct accepts a Column, and to turn a UDF into a Column you need to specify the input column names, which I cannot do - I do not know which name to use for the "values" of a KeyValueGroupedDataset.
I think it should be possible, because KeyValueGroupedDataset itself does something similar:
def count(): Dataset[(K, Long)] = agg(functions.count("*").as(ExpressionEncoder[Long]()))
However, this method cheats a bit because it uses "*" as the column name, but I need to specify a particular column (i.e. the column of the "value" in a key-value grouped dataset). Also, when you use typed functions from the typed object, you do not need to specify the column name, and it works somehow.
So, is it possible to do this, and if it is, how to do it?
As far as I know, it's not possible with the agg transformation, which expects a TypedColumn that is constructed from a Column using the as method, so you need to start from a non-type-safe expression. If somebody knows a solution I would be interested to see it...
If you need type-safe aggregation you can use one of the approaches below:
mapGroups - where you can implement a Scala function responsible for aggregating the Iterator
implement your own custom Aggregator, as suggested above (a sketch follows the mapGroups example below)
The first approach needs less code, so below I'm showing a quick example:
def countDistinct[T](values: Iterator[T])(chooseDistinguisher: T => String): Long =
  values.map(chooseDistinguisher).toSeq.distinct.size

ds
  .groupByKey(s => Key(s.x, s.y))
  .mapGroups((k, vs) => (k, countDistinct(vs)(_.id)))
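For completeness, here is a hedged sketch of the second approach (a custom Aggregator); the encoder choices (kryo for the Set buffer) are my own assumptions, not part of the original answer:
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

// counts distinct values produced by chooseDistinguisher
def countDistinctAgg[T](chooseDistinguisher: T => String): Aggregator[T, Set[String], Long] =
  new Aggregator[T, Set[String], Long] {
    def zero: Set[String] = Set.empty
    def reduce(acc: Set[String], value: T): Set[String] = acc + chooseDistinguisher(value)
    def merge(a: Set[String], b: Set[String]): Set[String] = a ++ b
    def finish(acc: Set[String]): Long = acc.size.toLong
    def bufferEncoder: Encoder[Set[String]] = Encoders.kryo[Set[String]]
    def outputEncoder: Encoder[Long] = Encoders.scalaLong
  }

// usage: grouped.agg(countDistinctAgg[SomeType](_.id).toColumn)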
In my opinion the type-safe Spark Dataset API is still much less mature than the non-type-safe DataFrame API. Some time ago I was thinking that it could be a good idea to implement a simple-to-use type-safe aggregation API for Spark Dataset.
Currently, this use case is better handled with DataFrame, which you can later convert back into a Dataset[A].
// Code assumes SQLContext implicits are present
import org.apache.spark.sql.{functions => f}

val colName = "id"

ds.toDF
  .withColumn("key", f.concat('x, f.lit(":"), 'y))
  .groupBy('key)
  .agg(f.countDistinct(f.col(colName)).as("cntd"))
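For completeness, a hedged sketch (mine, not the answer's) of the conversion back to a typed Dataset mentioned above, assuming spark.implicits._ is in scope:
val typed: Dataset[(String, Long)] =
  ds.toDF
    .withColumn("key", f.concat('x, f.lit(":"), 'y))
    .groupBy('key)
    .agg(f.countDistinct(f.col(colName)).as("cntd"))
    .as[(String, Long)]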

Internal DSL in Scala tuple to a class conversion

I'm trying to build an internal DSL in Scala. I have the following types:
case class A(name:String)
case class Group(list:A*) // it can also be list:List[A]
Creating a group of A's using the normal syntax is as follows:
val group1 = Group(A("a1"), A("a2"), ...)
which is quite ugly. I would like to present a group as (A("a1"), A("a2"), ...) and possibly later ("a1", "a2", ...), if possible.
I could not figure out myself how to convert (A("a1"), A("a2"), ...) to Group(A("a1"), A("a2"), ...). It would be nice if we could convert (A("a1"), A("a2"), ...) to an instance of class Group. (I don't care if I can't specify an unlimited number of A's inside. A maximum of 8 A's will suffice.)
So my question is: Is there a way to convert a tuple to a specific instance of a class? If not, how would you solve this problem?
First of all, solving something for tuples of any arity in Scala is nasty because the tuple classes are distinct, so you'd have to write a conversion for each arity, 8 or 22 or however many you want to support.
But anyway, tuples are for heterogeneously typed things, whereas here you require a collection of a common type. So while tuple syntax may look nice, I would recommend not trying to use it for this case in your DSL. Stick to collections or just alias your Group type, even at the price of an additional character, e.g.
object A {
  implicit def fromString(name: String) = A(name)
}
case class A(name: String)
case class Group(elem: A*)

val G = Group
G("a1", "a2")
If you really want to support tuples, the following will do:
object Group {
  implicit def fromTuple2[A1 <% A, A2 <% A](t: (A1, A2)) = Group(t._1, t._2)
}
case class Group(elem: A*)

("a1", "a2"): Group

Call distinct on an immutable list with a custom equality relation in Scala

I'd like to extract the distinct elements from a Scala list, but I don't want to use the natural equality relation. How can I specify it?
Do I have to rewrite the function or is there any way (maybe using some implicit definition that I am missing) to invoke the distinct method with a custom equality relation?
distinct does not accept a custom equality relation - it uses the equals method (source).
One way to achieve what you want is to create your own ordering and pass it to a SortedSet, which expects an Ordering:
implicit val ord = new Ordering[Int] {
  def compare(i: Int, j: Int) = /* your implementation here */
}

val sortedList = collection.immutable.SortedSet(list: _*)/*(ord)*/.toList
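For instance, a hedged concrete example of the trick above (mine, not the answerer's): case-insensitive distinct on strings:
val words = List("Foo", "bar", "FOO", "Bar")

val caseInsensitive: Ordering[String] = Ordering.by[String, String](_.toLowerCase)

// elements the ordering considers equal are collapsed; the first occurrence is kept
val distinctWords = collection.immutable.SortedSet(words: _*)(caseInsensitive).toList
// -> List("bar", "Foo") (sorted by the ordering)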

scala: map-like structure that doesn't require casting when fetching a value?

I'm writing a data structure that converts the results of a database query. The raw structure is a java ResultSet and it would be converted to a map or class which permits accessing different fields on that data structure by either a named method call or passing a string into apply(). Clearly different values may have different types. In order to reduce burden on the clients of this data structure, my preference is that one not need to cast the values of the data structure but the value fetched still has the correct type.
For example, suppose I'm doing a query that fetches two column values, one an Int, the other a String. The names of the columns are "a" and "b" respectively. Some ideal syntax might be the following:
val javaResultSet = dbQuery("select a, b from table limit 1")
// with ResultSet, particular values can be accessed like this:
val a = javaResultSet.getInt("a")
val b = javaResultSet.getString("b")
// but this syntax is undesirable.
// since I want to convert this to a single data structure,
// the preferred syntax might look something like this:
val newStructure = toDataStructure[Int, String](javaResultSet)("a", "b")
// that is, I'm willing to state the types during the instantiation
// of such a data structure.
// then,
val a: Int = newStructure("a") // OR
val a: Int = newStructure.a
// in both cases, "val a" does not require asInstanceOf[Int].
I've been trying to determine what sort of data structure might allow this and I could not figure out a way around the casting.
The other requirement is obviously that I would like to define a single data structure used for all db queries. I realize I could easily define a case class or similar per call and that solves the typing issue, but such a solution does not scale well when many db queries are being written. I suspect some people are going to propose using some sort of ORM, but let us assume for my case that it is preferred to maintain the query in the form of a string.
Anyone have any suggestions? Thanks!
To do this without casting, one needs more information about the query and one needs that information at compile time.
I suspect some people are going to propose using some sort of ORM, but let us assume for my case that it is preferred to maintain the query in the form of a string.
Your suspicion is right and you will not get around this. If current ORMs or DSLs like squeryl don't suit your fancy, you can create your own one. But I doubt you will be able to use query strings.
The basic problem is that you don't know how many columns there will be in any given query, and so you don't know how many type parameters the data structure should have and it's not possible to abstract over the number of type parameters.
There is, however, a data structure that exists in different variants for different numbers of type parameters: the tuple. (E.g. Tuple2, Tuple3, etc.) You could define parameterized mapping functions for different numbers of parameters that return tuples like this:
def toDataStructure2[T1, T2](rs: ResultSet)(c1: String, c2: String) =
  (rs.getObject(c1).asInstanceOf[T1],
   rs.getObject(c2).asInstanceOf[T2])

def toDataStructure3[T1, T2, T3](rs: ResultSet)(c1: String, c2: String, c3: String) =
  (rs.getObject(c1).asInstanceOf[T1],
   rs.getObject(c2).asInstanceOf[T2],
   rs.getObject(c3).asInstanceOf[T3])
You would have to define these for as many columns as you expect to have in your tables (max 22).
This of course depends on getObject returning a value that can safely be cast to the given type.
In your example you could use the resulting tuple as follows:
val (a, b) = toDataStructure2[Int, String](javaResultSet)("a", "b")
If you decide to go the route of heterogeneous collections, there are some very interesting posts on heterogeneously typed lists:
one for instance is
http://jnordenberg.blogspot.com/2008/08/hlist-in-scala.html
http://jnordenberg.blogspot.com/2008/09/hlist-in-scala-revisited-or-scala.html
with an implementation at
http://www.assembla.com/wiki/show/metascala
a second great series of posts starts with
http://apocalisp.wordpress.com/2010/07/06/type-level-programming-in-scala-part-6a-heterogeneous-list%C2%A0basics/
the series continues with parts "b,c,d" linked from part a
finally, there is a talk by Daniel Spiewak which touches on HOMaps
http://vimeo.com/13518456
So all this to say that perhaps you can build your solution from these ideas. Sorry that I don't have a specific example, but I admit I haven't tried these out yet myself!
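As a tiny hedged taste of the HList idea (using shapeless, which is my choice here and postdates some of those links): each element keeps its own static type, so no casting is needed:
import shapeless._

val row = 1 :: "hello" :: HNil   // Int :: String :: HNil
val a: Int = row.head            // no cast
val b: String = row.tail.head    // still no cast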
Joshua Bloch has introduced a heterogeneous collection, which can be written in Java. I once adapted it a little. It now works as a value register. It is basically a wrapper around two maps. Here is the code and this is how you can use it. But this is just FYI, since you are interested in a Scala solution.
In Scala I would start by playing with tuples. Tuples are kind of heterogeneous collections. The results can be, but do not have to be, accessed through fields like _1, _2, _3 and so on. But you don't want that, you want names. This is how you can assign names to those:
scala> val tuple = (1, "word")
tuple: ([Int], [String]) = (1, word)
scala> val (a, b) = tuple
a: Int = 1
b: String = word
So as mentioned before I would try to build a ResultSetWrapper around tuples.
If you want "extract the column value by name" on a plain bean instance, you can probably:
use reflects and CASTs, which you(and me) don't like.
use a ResultSetToJavaBeanMapper provided by most ORM libraries, which is a little heavy and coupled.
write a scala compiler plugin, which is too complex to control.
so, I guess a lightweight ORM with following features may satisfy you:
support raw SQL
support a lightweight,declarative and adaptive ResultSetToJavaBeanMapper
nothing else.
I made an experimental project based on that idea, but note it's still an ORM; I just think it may be useful to you, or may give you some hints.
Usage:
declare the model:
//declare DB schema
trait UserDef extends TableDef {
  var name = property[String]("name", title = Some("姓名"))
  var age1 = property[Int]("age", primary = true)
}

//declare model, and it mixes in properties as {var name = ""}
@BeanInfo class User extends Model with UserDef

//declare an object.
//it mixes in properties as {var name = Property[String]("name") }
//and, object User is a Mapper[User], thus, it can translate a ResultSet to a User instance.
object `package` {
  @BeanInfo implicit object User extends Table[User]("users") with UserDef
}
Then call raw SQL; the implicit Mapper[User] works for you:
val users = SQL("select name, age from users").all[User]
users.foreach{user => println(user.name)}
or even build a type safe query:
val users = User.q.where(User.age > 20).where(User.name like "%liu%").all[User]
for more, see unit test:
https://github.com/liusong1111/soupy-orm/blob/master/src/test/scala/mapper/SoupyMapperSpec.scala
project home:
https://github.com/liusong1111/soupy-orm
It uses "abstract Type" and "implicit" heavily to make the magic happen, and you can check source code of TableDef, Table, Model for detail.
Several million years ago I wrote an example showing how to use Scala's type system to push and pull values from a ResultSet. Check it out; it matches up with what you want to do fairly closely.
implicit val conn = connect("jdbc:h2:f2", "sa", "");
implicit val s: Statement = conn << setup;

val insertPerson = conn prepareStatement "insert into person(type, name) values(?, ?)";
for (name <- names)
  insertPerson << rnd.nextInt(10) << name <<!;

for (person <- query("select * from person", rs => Person(rs, rs, rs)))
  println(person.toXML);

for (person <- "select * from person" <<! (rs => Person(rs, rs, rs)))
  println(person.toXML);
Primitive types are used to guide the Scala compiler into selecting the right functions on the ResultSet.
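A hedged reconstruction of that idea (mine, not the linked example's actual code): a small wrapper keeps a column cursor, and implicit conversions pick the right getter based on each expected parameter type:
import java.sql.ResultSet

class RsReader(rs: ResultSet) {
  private var col = 0
  def nextInt: Int = { col += 1; rs.getInt(col) }
  def nextString: String = { col += 1; rs.getString(col) }
}

object RsReader {
  implicit def pullInt(r: RsReader): Int = r.nextInt
  implicit def pullString(r: RsReader): String = r.nextString
}

// illustrative fields; with the implicits in scope, Person(r, r, r) compiles
// because each parameter's declared type selects the matching conversion
// and advances the column cursor, e.g.
//   val r = new RsReader(resultSet); val p = Person(r, r, r)
case class Person(id: Int, kind: Int, name: String)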