Grouping in Slick - scala

I am trying to get a group by result from a Slick table.
SQL: SELECT * FROM Jobs GROUP BY category;
Class
case class JobEntity(id:Option[Long],category:String,properties:String)
My Slick function:
def getJobsByCategory(): (String, Future[Seq[JobsEntity]]) =
  db.run(jobs.groupBy(_.category).map { case (category, res) =>
    (category, res)
  }.result)
Error:
No matching Shape found.
[ERROR] Slick does not know how to map the given types.
[ERROR] Possible causes: T in Table[T] does not match your * projection,
[ERROR] you use an unsupported type in a Query (e.g. scala List),
[ERROR] or you forgot to import a driver api into scope.
[ERROR] Required level: slick.lifted.FlatShapeLevel
[ERROR] Source type: (slick.lifted.Rep[String], slick.sql.FixedSqlStreamingAction[Seq[org.exadatum.xstream.persistence.models.SparkConfigEntity],org.exadatum.xstream.persistence.models.SparkConfigEntity,slick.dbio.Effect.Read])
[ERROR] Unpacked type: T
[ERROR] Packed type: G
[ERROR]
There is probably some issue with the return type, but I am not sure what, as the IDE reports the error as:
Expression of type Future[Seq[Nothing]] doesn't conform to expected type (String,Future[Seq[JobsEntity]])

SQL: SELECT * FROM Jobs GROUP BY category;
This query would only work (even in SQL) if your table only consists of the category field.
With a GROUP BY statement, every field in the SELECT clause that is not part of the GROUP BY (in your case everything (*) aside from category) needs to be aggregated in some way, since standard SQL only supports flat result tables.
The same goes for Slick. In the map call following the groupBy call, you'll have to define aggregation functions for everything aside from your category. Otherwise Slick does not know how to map the result (as stated in the exception).
case class JobEntity(id:Option[Long],category:String,properties:String)
db.run(jobs.groupBy(_.category).map { case (category, res) => (category, res) }.result)
Does not work as it is.
Something like:
db.run(jobs.groupBy(_.category).map { case (category, res) => (category, res.map(_.id).sum) }.result)
Would work, since it results in a flat shape: your category and the sum of the ids with that category. I know this does not make sense for you as is, but hopefully it illustrates the problem.
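A more meaningful aggregation for this schema might be a count per category; here is a sketch (not from the original answer) reusing the jobs query from the question:

// Hypothetical sketch: number of jobs per category.
val jobsPerCategory: Future[Seq[(String, Int)]] =
  db.run(jobs.groupBy(_.category).map { case (category, group) => (category, group.length) }.result)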
If you really just want to group your jobs by category, I would do it in Scala after fetching them from the database:
val groupedJobs: Future[Map[String, Seq[JobEntity]]] = db.run(jobs.result).map { allJobs =>
  allJobs.groupBy(_.category)
}
If you tell me what you want to achieve exactly, I can propose another solution for you.

Related

Slick generic query with distinctOn

I want to make a generic Slick query using distinctOn on a table to count distinct elements in a column.
def countDistinct(table: TableQuery[_], column: Rep[_]): DBIO[Int] =
table.distinctOn(_ => column).length.result
This code above doesn't compile because:
No matching Shape found.
[error] Slick does not know how to map the given types.
[error] Possible causes: T in Table[T] does not match your * projection,
[error] you use an unsupported type in a Query (e.g. scala List),
[error] or you forgot to import a driver api into scope.
[error] Required level: slick.lifted.FlatShapeLevel
[error] Source type: slick.lifted.Rep[_]
[error] Unpacked type: T
[error] Packed type: Any
[error] table.distinctOn(_ => column).length.result
Using FlatShapeLevel instead of Rep[_] also doesn't work. I'm using Slick 3.
distinctOn doesn't work properly in Slick due to an incorrect projection to a particular field/column. The bug was raised five years ago and, surprisingly, it still hasn't been resolved.
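Until that is fixed, one workaround (a sketch for a concrete table and column rather than the generic signature in the question) is to count distinct values via groupBy, which Slick compiles reliably:

// Hypothetical sketch: count distinct categories without distinctOn.
def countDistinctCategories: DBIO[Int] =
  jobs.map(_.category).groupBy(identity).map { case (value, _) => value }.length.result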

Cast a DataFrame Entry to a Case Class with Any-type Member [duplicate]

I recently moved from Spark 1.6 to Spark 2.X and I would like to move, where possible, from DataFrames to Datasets as well. I tried code like this:
case class MyClass(a : Any, ...)
val df = ...
df.map(x => MyClass(x.get(0), ...))
As you can see MyClass has a field of type Any, as I do not know at compile time the type of the field I retrieve with x.get(0). It may be a long, string, int, etc.
However, when I try to execute code similar to what you see above, I get an exception:
java.lang.ClassNotFoundException: scala.Any
With some debugging, I realized that the exception is raised, not because my data is of type Any, but because MyClass has a type Any. So how can I use Datasets then?
Unless you're interested in limited and ugly workarounds like Encoders.kryo:
import org.apache.spark.sql.Encoders
case class FooBar(foo: Int, bar: Any)
spark.createDataset(
  sc.parallelize(Seq(FooBar(1, "a")))
)(Encoders.kryo[FooBar])
or
spark.createDataset(
  sc.parallelize(Seq(FooBar(1, "a"))).map(x => (x.foo, x.bar))
)(Encoders.tuple(Encoders.scalaInt, Encoders.kryo[Any]))
you don't. All fields / columns in a Dataset have to be of a known, homogeneous type for which there is an implicit Encoder in scope. There is simply no place for Any there.
The UDT API provides a bit more flexibility and allows for limited polymorphism, but it is private, not fully compatible with the Dataset API, and comes with a significant performance and storage penalty.
If for a given execution all values are of the same type, you can of course create specialized classes and make a decision about which one to use at run time.
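A minimal sketch of that last option (the case class names and the runtime flag are illustrative, not from the question):

import org.apache.spark.sql.{DataFrame, Dataset}

// Hypothetical sketch: one specialized case class per concrete type,
// chosen at run time once you know what the data actually contains.
case class IntFooBar(foo: Int, bar: Int)
case class StringFooBar(foo: Int, bar: String)

def toTypedDataset(df: DataFrame, barIsInt: Boolean): Dataset[_] = {
  val spark = df.sparkSession
  import spark.implicits._
  if (barIsInt) df.as[IntFooBar] else df.as[StringFooBar]
}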

What's the meaning of "$" in Dataset's operators (like select or filter)?

I am a bit confused about using $ to reference columns in DataFrame operators like select or filter.
The following statements work:
df.select("app", "renders").show
df.select($"app", $"renders").show
But, only the first statement in the following works:
df.filter("renders = 265").show // <-- this works
df.filter($"renders" = 265).show // <-- this does not work (!) Why?!
However, this again works:
df.filter($"renders" > 265).show
Basically, what is this $ in DataFrame's operators and when/how should I use it?
Implicits are a major feature of the Scala language and take a lot of different forms, like the implicit classes we will see shortly. They have different purposes, and they all come with varying levels of debate regarding how useful or dangerous they are. Ultimately though, implicits generally come down to simply having the compiler convert one class to another when you bring them into scope.
Why does this matter? Because in Spark there is an implicit class called StringToColumn that endows a StringContext with additional functionality. As you can see, StringToColumn adds the $ method to the Scala class StringContext. This method produces a ColumnName, which extends Column.
The end result of all this is that the $ method allows you to treat the name of a column, represented as a String, as if it were the Column itself. Implicits, when used wisely, can produce convenient conversions like this to make development easier.
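Concretely, something like this (a small sketch, assuming a SparkSession named spark is in scope):

import org.apache.spark.sql.Column
import spark.implicits._           // brings StringToColumn, and with it the $ method, into scope

val renders: Column = $"renders"   // a ColumnName, equivalent to df("renders") or col("renders")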
So let's use this to understand what you found:
df.select("app","renders").show -- succeeds because select takes multiple Strings
df.select($"app",$"renders").show -- succeeds because select takes multiple Columnss that result after the implicit conversions are applied
df.filter("renders = 265").show -- succeeds because Spark supports SQL-like filters
df.filter($"renders" = 265).show -- fails because $"renders" is of type Column after implicit conversion, and Columns use the custom === operator for equality (unlike the case in SQL).
df.filter($"renders" > 265).show -- succeeds because you're using a Column after implicit conversion and > is a function on Column.
$ is a way to convert a string to the column with that name.
Both options of select work originally because select can receive either a column or a string.
When you do the filter, $"renders" = 265 is an attempt to assign a number to the column. >, on the other hand, is a comparison method. You should be using === instead of =.
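For reference, the equality filter the question was reaching for would be written with === (same df as above):

df.filter($"renders" === 265).show   // === builds an equality Column; = would be an assignment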

How to define a union type that works at runtime?

Following on from this excellent set of answers on how to define union types in Scala, I've been using the Miles Sabin definition of union types, but one question remains.
How do you work with these if the type isn't known until runtime? For example:
trait inv[-A] {}
type Or[A, B] = {
  type check[X] = (inv[A] with inv[B]) <:< inv[X]
}
case class Foo[A : (Int Or String)#check](a: A)
Foo(1) // Foo[Int] = Foo(1)
Foo("hi") // Foo[String] = Foo(hi)
Foo(2.0) // Error!
This example works since the parameter A is known at compile time, and calling Foo(1) is really calling Foo[Int](1). However, what do you do if parameter A isn't known until runtime? Maybe you're parsing a file that contains the data for Foos, in which case the type parameter of Foo isn't known until you read the data. There's no easy way to set parameter A in this case.
The best solutions I've been able to come up with are:
Pattern match on the data you've read and then create different Foos based on that type. In my case this isn't feasible because my case class actually contains dozens of union types, so there'd be hundreds of combinations of types to pattern match.
Cast the type you've just read to (String or Int), so you have a single type to pass around that satisfies the type-class constraint when you create Foo with it, then return Foo[_] instead. This puts the onus back on the Foo user to work out the type of each field (since they'll appear to be of type Any), but at least it defers having to know the type until the field is actually used, in which case a pattern match seems more tractable.
The second solution looks like this:
def parseLine: Any // Parses a data point, but it can be either a String or
                   // an Int, so it returns Any.

def mkFoo: Foo[_] = {
  val a = parseLine.asInstanceOf[Int with String]
  Foo(a) // Passes the type constraint now
}
In practice I've ended up using the second solution, but I'm wondering if there's something better I can do?
Another way to state the problem is: What does it mean to return a Union Type? Functions can only return a single type, and the trickery we use with Miles Sabin union types is only useful for the types you pass in, not for the types you return.
PS. For context, the reason this is a problem in my case is that I'm generating a set of case classes from a JSON schema file. JSON naturally supports union types, so I would like my case classes to reflect that too. This works great in one direction: users creating case classes to be serialized out to JSON. But it gets sticky in the other direction: users parsing JSON files to get a set of populated case classes back.
The "standard" Scala solution to this problem is to use an ordinary discriminated-union type (ie, to forego true union types altogether):
sealed trait Foo
case class IntFoo(x: Int) extends Foo
case class StringFoo(x: String) extends Foo
This reflects the fact that, as you observe, the particular type of the member is a runtime value; the JVM type-tag of the Foo instance provides this runtime value.
Miles Sabin's implementation of union types is very clever, but I'm not sure it provides any practical benefit, because it only restricts the type of thing that can go into a Foo, while providing the user of a Foo with no computable version of that restriction, the way a match provides you with a computable version of the sealed trait. In general, for a restriction to be useful, it needs two sides: a check that only the right things are put in, and an extractor (aka an eliminator) that allows the same right things to come out the other end.
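As a sketch of that second side, the match on the sealed trait above is the eliminator, and the compiler can even check it for exhaustiveness:

def describe(f: Foo): String = f match {
  case IntFoo(x)    => s"got an Int: $x"
  case StringFoo(s) => s"got a String: $s"
}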
Perhaps if you gave some explanation of why you're looking for a purer union type it would illuminate whether regular discriminated unions are sufficient or if you really need something more.
There's a reason every JSON parser for Scala requires well defined types into which the JSON will be converted, even if some fields have to be dropped: you cannot work with something you don't know the type of.
To give an example, say you have a, and maybe a is a String, maybe it's an Int, but you don't know what it is. What computation could you possibly make with a without knowing its type? Why would your code compute the sum of all a's, for instance, if you didn't know in advance that it was a number?
Generally, the answer to that is to perform user-provided data manipulation at runtime over data with unknown characteristics: the user sees that it's a number and decides they want to know the sum of that field. Fine, but if that's the case, you are going about it the wrong way.
There is a well-defined way to represent JSON data in Scala (and, for that matter, any data with the same characteristics as JSON), which is to use a hierarchy of classes. A JSON value may be a JSON object, an array, or one of a number of primitives. A JSON object contains a list of key/value pairs, whose keys are JSON strings and whose values are JSON values. And so on. This is easy to represent, and there are many libraries doing so already. In fact, there are so many that there's a project called Json4s which presents a unified API that is implemented by many of the aforementioned libraries.
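A rough sketch of such a hierarchy (illustrative names, not the actual Json4s AST):

sealed trait JsonValue
case object JsonNull extends JsonValue
case class JsonBoolean(value: Boolean) extends JsonValue
case class JsonNumber(value: BigDecimal) extends JsonValue
case class JsonString(value: String) extends JsonValue
case class JsonArray(items: List[JsonValue]) extends JsonValue
case class JsonObject(fields: Map[String, JsonValue]) extends JsonValue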
Things like the records that Miles Sabin's Shapeless library provides are intended to be used when the input doesn't have a well-defined schema, but the program knows what it needs from that input. And, yes, the program might know what to do with a if it is an Int or a String, but not for every possible value.
The next Scala 3 (mid-2020), based on Dotty, will implement the union type proposal from September 2018.
You can see it in "A Tour of Scala 3" (June 2019):
Union Types Provide ad-hoc combinations of types
Subsetting = Subtyping
No boxing overhead
case class UserName(name: String)
case class Password(hash: Hash)

def help(id: UserName | Password) = {
  val user = id match {
    case UserName(name) => lookupName(name)
    case Password(hash) => lookupPassword(hash)
  }
  ...
}
Union types also work with singleton types
Great for JS interop
type Command = "Click" | "Drag" | "KeyPressed"

def handleEvent(kind: Command) = kind match {
  case "Click"      => MouseClick()
  case "Drag"       => MoveTo()
  case "KeyPressed" => KeyPressed()
}

Adding Strings to an Array out of a List in a Scala template

I have a List of objects and want to iterate over every element in this list to get their id Strings. Those Strings must be saved into another List. I always get this compiler error:
[error] FilePath:line:64: illegal start of simple expression
[error] @var formNameList : Array[String] = new Array[String](formList.size())
[error] ^
[error] File Path:69: ')' expected but '}' found.
[error] }
[error] ^
[error] two errors found
[error] (compile:compile) Compilation failed
[error] Total time: 3 s, completed 05.12.2013 14:03:37
So please help, guys, before I go insane.
My Code:
@var formNameList : Array[String] = new Array[String](formList.size())
@for(i <- 0 until formList.size()) {
  @formNameList.add(formList.get(i).getFormId())
}
@views.html.formmanager.showresults(formNameList, formManager)
I'm a newbie in Scala; this is a very simple task in Java, but Scala is such a tough language. It's also very hard to read: what do .:::, ::: or <++= mean?
Short answer:
@views.html.formmanager.showresults(formList.map(_.getFormId).toArray, formManager)
Long answer:
Scala templates are templates: they should be used to generate a presentation of data, not as placeholders for general code. I would strongly advise against doing any mutable or complex computations inside templates. If you have complex code, you should either pass its result as a parameter or create a helper object like this:
# in helper.scala:
object Helper {
  def toArrayOfIds(formList: List[Form]) = formList.map(_.getFormId).toArray
}

# in view.scala.html:
@Helper.toArrayOfIds(formList)
Another thing: prefer List to Array. I almost never use Array in my Scala programs. Also notice the use of the higher-order function map instead of creating an array and manually populating it; this is highly recommended. Just see how short the first example is.
.:::, ::: and <++= can mean different things in different contexts. Usually the first two operators mean the same thing, which is concatenation of two lists. You can read about this in "Programming in Scala" by Martin Odersky; the first edition is available for free.
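For example:

List(1, 2) ::: List(3, 4)      // List(1, 2, 3, 4): ::: concatenates two lists
List(3, 4).:::(List(1, 2))     // the same operation written as a plain method call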
And if you need to introduce a new variable in the template, you can do it like this:
@defining(user.firstName + " " + user.lastName) { fullName =>
  <div>Hello @fullName</div>
}
See the Play documentation.