I have a Slick 3.0 table definition similar to the following:
case class Simple(a: String, b: Int, c: Option[String])
trait Tables { this: JdbcDriver =>
  import api._

  class Simples(tag: Tag) extends Table[Simple](tag, "simples") {
    def a = column[String]("a")
    def b = column[Int]("b")
    def c = column[Option[String]]("c")
    def * = (a, b, c) <> (Simple.tupled, Simple.unapply)
  }

  lazy val simples = TableQuery[Simples]
}
object DB extends Tables with MyJdbcDriver
I would like to be able to do 2 things:
Get a list of the column names as Seq[String]
For an instance of Simple, generate a Seq[String] that would correspond to how the data would be inserted into the database using a raw query (e.g. Simple("hello", 1, None) becomes Seq("'hello'", "1", "NULL"))
What would be the best way to do this using the Slick table definition?
First of all, it is not possible to trick Slick and change the order on the left side of the <> operator in the * method without also changing the order of values in Simple, the row type of Simples (i.e. what Ben assumed is not possible). The ProvenShape return type of the * projection method ensures that there is a Shape available for translating between the Column-based type in * and the client-side type. If you write def * = (c, b, a) <> (Simple.tupled, Simple.unapply) with Simple defined as case class Simple(a: String, b: Int, c: Option[String]), Slick will complain with the error "No matching Shape found. Slick does not know how to map the given types...". Ergo, you can iterate over all the elements of an instance of Simple with its productIterator.
Secondly, you already have the definition of the Simples table in your code, and querying meta tables to get information you already have is not sensible. You can get all your column names with a one-liner: simples.baseTableRow.create_*.map(_.name). Note that the * projection of the table also defines the columns generated when you create the table schema, so columns not mentioned in the projection are not created, and the statement above is guaranteed to return exactly what you need without dropping anything.
To recap briefly:
To get a list of the column names of the Simples table as Seq[String], use simples.baseTableRow.create_*.map(_.name).toSeq.
To generate a Seq[String] that corresponds to how the data would be inserted into the database using a raw query for aSimple, an instance of Simple, use aSimple.productIterator.toSeq (a sketch for turning those values into SQL literals follows below).
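If you also need the literal formatting from the question (Simple("hello", 1, None) becoming Seq("'hello'", "1", "NULL")), a minimal sketch on top of productIterator could look like the following. The toSqlLiteral helper is a hypothetical name and the quoting is deliberately naive (no escaping), so treat it as an illustration only:
// Hypothetical helper: render one product element as a raw SQL literal.
// Only covers the value kinds used by Simple (String, Int, Option[_]).
def toSqlLiteral(value: Any): String = value match {
  case None      => "NULL"
  case Some(v)   => toSqlLiteral(v)
  case s: String => s"'$s'" // naive quoting, no escaping
  case other     => other.toString
}

val aSimple = Simple("hello", 1, None)
val literals: Seq[String] = aSimple.productIterator.map(toSqlLiteral).toSeq
// literals == Seq("'hello'", "1", "NULL")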
To get the column names, try this:
db.run(for {
  metaTables <- slick.jdbc.meta.MTable.getTables("simples")
  columns    <- metaTables.head.getColumns
} yield columns.map(_.name)) foreach println
This will print
Vector(a, b, c)
And for the case class values, you can use productIterator:
Simple("hello", 1, None).productIterator.toVector
is
Vector(hello, 1, None)
You still have to do the value mapping, and guarantee that the order of the columns in the table and the values in the case class are the same.
I need to iterate over a data frame in a specific order and apply some complex logic to calculate a new column.
Also, my strong preference is to do it in a generic way so I do not have to list all the columns of a row and do df.as[my_record] or case Row(...) => as shown here. Instead, I want to access row columns by their names and just add the result column(s) to the source row.
The approach below works just fine, but I'd like to avoid specifying the schema twice: once so that I can access columns by name while iterating, and a second time to process the output.
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
val q = """
select 2 part, 1 id
union all select 2 part, 4 id
union all select 2 part, 3 id
union all select 2 part, 2 id
"""
val df = spark.sql(q)
def f_row(iter: Iterator[Row]): Iterator[Row] = {
  if (iter.hasNext) {
    def complex_logic(p: Int): Integer = if (p == 3) null else p * 10

    val head = iter.next
    val schema = StructType(head.schema.fields :+ StructField("result", IntegerType))
    val r =
      new GenericRowWithSchema((head.toSeq :+ complex_logic(head.getAs("id"))).toArray, schema)

    iter.scanLeft(r)((r1, r2) =>
      new GenericRowWithSchema((r2.toSeq :+ complex_logic(r2.getAs("id"))).toArray, schema)
    )
  } else iter
}
val schema = StructType(df.schema.fields :+ StructField("result", IntegerType))
val encoder = RowEncoder(schema)
df.repartition($"part").sortWithinPartitions($"id").mapPartitions(f_row)(encoder).show
What information is lost after applying mapPartitions so output cannot be processed without explicit encoder? How to avoid specifying it?
What information is lost after applying mapPartitions so output cannot be processed without explicit encoder?
The information is hardly lost: it wasn't there from the beginning. Subclasses of Row or InternalRow are basically untyped, variable-shape containers which don't provide any useful type information that could be used to derive an Encoder.
The schema in GenericRowWithSchema is inconsequential, as it describes the content in terms of metadata, not types.
How to avoid specifying it?
Sorry, you're out of luck. If you want to use dynamically typed constructs (a bag of Any) in a statically typed language you have to pay the price, which here is providing an Encoder.
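One way to make paying that price less intrusive (a sketch only, reusing the schema and f_row already defined in the question) is to declare the encoder once as an implicit value, so that mapPartitions[U: Encoder] resolves it without an explicit argument:
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.encoders.{ExpressionEncoder, RowEncoder}

// The schema still has to be spelled out once, but the encoder no longer
// needs to be threaded through every mapPartitions call explicitly.
implicit val resultEncoder: ExpressionEncoder[Row] = RowEncoder(schema)

df.repartition($"part").sortWithinPartitions($"id").mapPartitions(f_row).show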
OK - I have checked some of my spark code and using .mapPartitions with the Dataset API does not require me to explicitly build/pass an encoder.
You need something like:
case class Before(part: Int, id: Int)
case class After(part: Int, id: Int, newCol: String)
import spark.implicits._
// Note column names/types must match case class constructor parameters.
val beforeDS = <however you obtain your input DF>.as[Before]
def f_row(it: Iterator[Before]): Iterator[After] = ???
beforeDS.repartition($"part").sortWithinPartitions($"id").mapPartitions(f_row).show
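To make the sketch above concrete, a hypothetical f_row for the typed version could look like the following; the "multiply by 10, null when id is 3" rule from the original question is approximated with a placeholder string here, since After.newCol is a String in this example:
def f_row(it: Iterator[Before]): Iterator[After] =
  it.map { b =>
    // Stand-in for the original complex_logic; change newCol's type
    // (e.g. to Option[Int]) if you need a real null-like value instead.
    val computed = if (b.id == 3) "n/a" else (b.id * 10).toString
    After(b.part, b.id, computed)
  }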
I found below explanation sufficient, maybe it will be useful for others.
mapPartitions requires an Encoder because otherwise it cannot construct a Dataset from an iterator of Rows. Even though each row has a schema, that schema cannot be derived (used) by the constructor of Dataset[U].
def mapPartitions[U : Encoder](func: Iterator[T] => Iterator[U]): Dataset[U] = {
  new Dataset[U](
    sparkSession,
    MapPartitions[T, U](func, logicalPlan),
    implicitly[Encoder[U]])
}
On the other hand, without calling mapPartitions, Spark can use the schema derived from the initial query because the structure (metadata) of the original columns is not changed.
I described alternatives in this answer: https://stackoverflow.com/a/53177628/7869491.
I am trying to implement generic grouping using Slick 3.2.3. By generic grouping I mean grouping the same query by different parameters or sets thereof.
Supposing I have a table:
class MyTable(tag: Tag) extends Table[MyEntry](tag, "my_table") {
  def text1 = column[String]("text1")
  def text2 = column[Option[String]]("text2")
  def list = column[List[String]]("list") // I am using postgres+slick_pg
  ...
}
Then I have a complex query with several joins, and I would like to be able to group it by text1, (text1, text2), list etc. One way to do it would be to define a generic function which performs the grouping using an extractor parameter:
private def getData[T](extractor: MyTable => T) = {
  // supposing MyTable comes second in the list
  // of joined tables in my complex query
  val groupedQuery = myComplexQuery.groupBy(x => extractor(x._2))
  ...
  // aggregation functions, mapping etc. go here
}
where one of extractor implementations may be defined as
val extractor: MyTable => (Rep[String], Rep[Option[String]]) = me => me.text1 -> me.text2
However, since extractor is generic, groupBy cannot find a matching Shape for the type T, which means I will have to provide it as well. My question is how exactly to define such Shapes. The documentation for the slick.lifted package lacks examples, and it is not exactly obvious what the generic types K, T, G and P mean in the Query#groupBy definition (or FlatShapeLevel, for that matter). I would appreciate it if somebody provided examples of such extractor functions, at least for a primitive type (String) and a Tuple2 (say, (String, Option[String])). Or perhaps there is a better way to achieve the same result which I have overlooked? Thanks.
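For reference, a sketch of how the missing Shape might be threaded through such a generic function (untested against the actual joined query, so take it as a starting point rather than a definitive answer):
import slick.lifted.{FlatShapeLevel, Shape}

// K is the lifted key type returned by the extractor, T its unpacked
// (client-side) type, G its packed (group-key) type; the implicit Shape
// is what groupBy needs to relate the three.
private def getData[K, T, G](extractor: MyTable => K)(
    implicit shape: Shape[_ <: FlatShapeLevel, K, T, G]) = {
  val groupedQuery = myComplexQuery.groupBy(x => extractor(x._2))
  // aggregation functions, mapping etc. go here
  groupedQuery
}
With the Shape abstracted this way, the String and (String, Option[String]) extractors from the question should have their Shapes resolved automatically at the call site, e.g. getData(_.text1) or getData(me => (me.text1, me.text2)).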
I have a database table for objects called Campaigns containing three fields:
Id (int, not nullable)
Version (int, not nullable)
Stuff (Text, nullable)
Let's call CampaignsRow the corresponding Slick entity class.
When I select rows from Campaigns, I don't always need to read Stuff, which contains big chunks of text.
However, I'd very much like to work in the codebase with the class CampaignsRow instead of a tuple, and so be able to sometimes just drop the Stuff column while retaining the original type.
Basically, I'm trying to write the following function :
// Force dropping the Stuff column from the current Query
def smallCampaign(campaigns: Query[Campaigns, CampaignsRow, Seq]): Query[Campaigns, CampaignsRow, Seq] = {
  val smallCampaignQuery = campaigns.map {
    row => CampaignsRow(row.id, row.version, None: Option[String])
  }
  smallCampaignQuery /* Fails because the type is now wrong: I have a Query[(Rep[Int], Rep[Int], Rep[Option[String]]), (Int, Int, Option[String]), Seq] */
}
Any idea how to do this? I suspect this has to do with Shape in Slick, but I can't find a resource to start understanding this class, and the Slick source code is proving too complex for me to follow.
You're actually already doing almost what you want in def *, the default mapping. You can use the same tools in the map method. Your two tools are mapTo and <>.
As you've found, there is the mapTo method which you can only use if your case class exactly matches the shape of the tuple, so if you wanted a special case class just for this purpose:
case class CampaignLite(id: Int, version: Int)
val smallCampaignQuery = campaigns.map {
  row => (row.id, row.version).mapTo[CampaignLite]
}
As you want to reuse your existing class, you can write your own convert functions instead of using the standard tupled and unapply and pass those to <>:
object CampaignRow {
  def tupleLite(t: (Int, Int)) = CampaignRow(t._1, t._2, None)
  def unapplyLite(c: CampaignRow) = Some((c.id, c.version))
}

val smallCampaignQuery = campaigns.map {
  row => (row.id, row.version) <> (CampaignRow.tupleLite, CampaignRow.unapplyLite)
}
This gives you the most flexibility, as you can do whatever you like in your convert functions, but it's a bit more wordy.
As row is an instance of the Campaigns table you could always define it there alongside *, if you need to use it regularly.
class Campaigns ... {
  ...
  def * = (id, version, stuff).mapTo[CampaignRow]
  def liteMapping = (id, version) <> (CampaignRow.tupleLite, CampaignRow.unapplyLite)
}
val liteCampaigns = campaigns.map(_.liteMapping)
Reference: Essential Slick 3, section 5.2.1
If I understand your requirement correctly, you could consider making CampaignRow a case class that models your Campaigns table class by having Campaigns extend Table[CampaignRow] and providing the bidirectional mapping for the * projection:
case class CampaignRow(id: Int, version: Int, stuff: Option[String])
class Campaigns(tag: Tag) extends Table[CampaignRow](tag, "CAMPAIGNS") {
  // ...
  def * = (id, version, stuff) <> (CampaignRow.tupled, CampaignRow.unapply)
}
You should then be able to do something like below:
val campaigns = TableQuery[Campaigns]
val smallCampaignQuery = campaigns.map( _.copy(stuff = None) )
For a relevant example, here's a Slick doc.
I'm experimenting with scalikejdbc (trying to move from Slick), and I'm stuck on creating my schema from the entities (read: case classes).
// example Slick equivalent
case class X(id: Int, ...)

class XTable(tag: Tag) extends Table[X](tag, "x") {
  def id = column[Int]("id")
  ... // more columns
  def * = (id, ...) <> (X.tupled, X.unapply)
}

val xTable = TableQuery[XTable]
db.run(xTable.schema.create) // creates in the DB a table named "x", with "id" and all other columns
It seemed like using SQLSyntaxSupport could be a step in the right direction, with something like
// scalikejdbc
case class X(id: Int, ...)

object X extends SQLSyntaxSupport[X] {
  def apply(x: ResultName[X])(rs: WrappedResultSet): X = new X(id = rs.get(x.id), ...)
}

X.table.??? // what to do now?
but could not figure out the next step.
What I'm looking for is the opposite of the reverse-engineering tool described at http://scalikejdbc.org/documentation/reverse-engineering.html.
Any help/ideas, in particular pointers to a relevant part of the documentation, will be appreciated.
You can use the statements method to get the SQL code, like for most other SQL-based Actions. Schema Actions are currently the only Actions that can produce more than one statement.
schema.create.statements.foreach(println)
schema.drop.statements.foreach(println)
http://slick.typesafe.com/doc/3.0.0/schemas.html
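If the goal is to reuse that generated DDL from scalikejdbc, a rough sketch (assuming the Slick table definition is kept around purely as a schema description and a scalikejdbc ConnectionPool is already configured) could feed the statements into plain SQL execution:
import scalikejdbc._

// Sketch: execute the DDL strings produced by Slick's schema action
// through scalikejdbc. xTable is the Slick TableQuery from the question.
DB autoCommit { implicit session =>
  xTable.schema.create.statements.foreach { ddl =>
    SQL(ddl).execute.apply()
  }
}
This keeps the schema definition in one place (the Slick table) while the runtime data access goes through scalikejdbc.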
I'm new to Slick (2.1) and am lost creating my first query using union. Because the parameters will eventually be provided externally (via a web interface), I set them as optional. Please see the comment in the code below. How do I create an appropriate query?
My actual class is more complex which I simplified for the sake of this question.
case class MyStuff(id: Int, value: Int, text: String)
class MyTable(tag: Tag) extends Table[MyStuff](tag, "MYSTUFF") {
  def id = column[Int]("ID", O.NotNull)
  def value = column[Int]("VALUE", O.NotNull)
  def text = column[String]("TEXT", O.NotNull)

  def * = (id, value, text).shaped <> ((MyStuff.apply _).tupled, MyStuff.unapply)
}
object myTable extends TableQuery(new MyTable(_)) {
  def getStuff(ids: Option[List[Int]], values: Option[List[Int]])(implicit session: Session): Option[List[MyStuff]] = {
    /*
      1) If 'ids' are given, retrieve all matching entries, if any.
      2) If 'values' are given, retrieve all matching entries (if any), union with the results of the previous step, and remove duplicate entries.
      3) If neither 'ids' nor 'values' are given, retrieve all entries.
    */
  }
}
getStuff is called like this:
(db: Database) withSession { implicit session => val myStuff = myTable.getStuff(...) }
You can use inSet if the values are Some, fall back to a literal false otherwise, and only apply the filter at all when at least one of the options is defined.
if (ids.isDefined || values.isDefined)
  myTable.filter(row =>
    ids.map(row.id inSet _).getOrElse(slick.lifted.LiteralColumn(false))
  ) union myTable.filter(row =>
    values.map(row.value inSet _).getOrElse(slick.lifted.LiteralColumn(false))
  )
else myTable
If I understand you correctly, you want to build a filter at runtime from the given input. Have a look at the extended docs for 3.0 (http://slick.typesafe.com/doc/3.0.0-RC1/queries.html#sorting-and-filtering) under "building criteria using a 'dynamic filter' e.g. from a webform". That part of the docs is also valid for version 2.1.
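Adapted to the table in the question, that dynamic-filter pattern might look roughly like this (a sketch returning a plain List rather than the Option[List[MyStuff]] from the original signature; LiteralColumn lives in Slick's lifted package):
def getStuff(ids: Option[List[Int]], values: Option[List[Int]])
            (implicit session: Session): List[MyStuff] = {
  val query = myTable.filter { row =>
    List(
      ids.map(row.id inSet _),      // only applied when ids is Some
      values.map(row.value inSet _) // only applied when values is Some
    ).collect { case Some(criterion) => criterion }
     .reduceLeftOption(_ || _)
     .getOrElse(LiteralColumn(true)) // no criteria given: keep every row
  }
  query.list
}
Combining the two criteria with || returns the same rows as the union approach above, with duplicates avoided by construction.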