I am trying to implement a simple DB query with optional pagination. My attempt:
def getEntities(limit: Option[Int], offset: Option[Int]) = {
  // MyTable is a Slick definition of the db table
  val withLimit = limit.fold(MyTable)(l => MyTable.take(l)) // Error here:
  // MyTable and MyTable.take(l) have different types
  val withOffset = offset.fold(withLimit)(o => withLimit.drop(o))
  val query = withOffset.result
  db.run(query)
}
The problem is an error:
type mismatch:
found: slick.lifted.Query
required: slick.lifted.TableQuery
How can I make this code compile? And maybe make it a little bit prettier?
My current fix to get a Query from a TableQuery is to add .filter(_ => true), but IMHO this is not nice:
val withLimit = limit.fold(MyTable.filter(_ => true))(l => MyTable.take(l))
Try replacing
val MyTable = TableQuery[SomeTable]
with
val MyTable: Query[SomeTable, SomeTable#TableElementType, Seq] = TableQuery[SomeTable]
i.e. specify the type explicitly (statically upcast the TableQuery to a Query).
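For illustration, here is a minimal sketch of the paginated query with that upcast applied. It assumes the names from the question (SomeTable, db) and that the profile's api._ import is in scope:

import slick.jdbc.PostgresProfile.api._ // assumption: substitute your actual profile

// Upcast the TableQuery to a plain Query so both branches of fold share the same type.
val MyTable: Query[SomeTable, SomeTable#TableElementType, Seq] = TableQuery[SomeTable]

def getEntities(limit: Option[Int], offset: Option[Int]) = {
  // Note: for typical pagination you would usually apply drop (offset) before take (limit).
  val withLimit = limit.fold(MyTable)(l => MyTable.take(l))
  val withOffset = offset.fold(withLimit)(o => withLimit.drop(o))
  db.run(withOffset.result)
}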
I am using an Aggregator to apply some custom merge on a DataFrame after grouping its records by their primary key:
case class Player(
  pk: String,
  ts: String,
  first_name: String,
  date_of_birth: String
)

case class PlayerProcessed(
  var ts: String,
  var first_name: String,
  var date_of_birth: String
)
// Imports needed for the custom Aggregator
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

// Custom Aggregator - this is just for the example; the actual one is more complex
object BatchDedupe extends Aggregator[Player, PlayerProcessed, PlayerProcessed] {
  def zero: PlayerProcessed = PlayerProcessed("0", null, null)
  def reduce(bf: PlayerProcessed, in: Player): PlayerProcessed = {
    bf.ts = in.ts
    bf.first_name = in.first_name
    bf.date_of_birth = in.date_of_birth
    bf
  }
  def merge(bf1: PlayerProcessed, bf2: PlayerProcessed): PlayerProcessed = {
    bf1.ts = bf2.ts
    bf1.first_name = bf2.first_name
    bf1.date_of_birth = bf2.date_of_birth
    bf1
  }
  def finish(reduction: PlayerProcessed): PlayerProcessed = reduction
  def bufferEncoder: Encoder[PlayerProcessed] = Encoders.product
  def outputEncoder: Encoder[PlayerProcessed] = Encoders.product
}
val ply1 = Player("12121212121212", "10000001", "Rogger", "1980-01-02")
val ply2 = Player("12121212121212", "10000002", "Rogg", null)
val ply3 = Player("12121212121212", "10000004", null, "1985-01-02")
val ply4 = Player("12121212121212", "10000003", "Roggelio", "1982-01-02")
val seq_users = sc.parallelize(Seq(ply1, ply2, ply3, ply4)).toDF.as[Player]
val grouped = seq_users.groupByKey(_.pk)
val non_sorted = grouped.agg(BatchDedupe.toColumn.name("deduped"))
non_sorted.show(false)
This returns:
+--------------+--------------------------------+
|key |deduped |
+--------------+--------------------------------+
|12121212121212|{10000003, Roggelio, 1982-01-02}|
+--------------+--------------------------------+
Now, I would like to order the records based on ts before aggregating them. From here I understand that .sortBy("ts") does not guarantee the order after the .groupByKey(_.pk). So I was trying to apply the .sortBy between the .groupByKey and the .agg.
The output of .groupByKey(_.pk) is a KeyValueGroupedDataset[String, Player], the second element being an Iterator. So to apply some sorting logic there I convert it into a Seq:
val sorted = grouped.mapGroups{case(k, iter) => (k, iter.toSeq.sortBy(_.ts))}.agg(BatchDedupe.toColumn.name("deduped"))
sorted.show(false)
However, the output of .mapGroups after adding the sorting logic is a Dataset[(String, Seq[Player])]. So when I try to invoke .agg on it I get the following exception:
Caused by: ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast to ...Player
How could I convert back the output of my .mapGroups(...) into a KeyValueGroupedDataset[String,Player]?
I tried to cast back to Iterator as follows:
val sorted = grouped.mapGroups{case(k, iter) => (k, iter.toSeq.sortBy(_.ts).toIterator)}.agg(BatchDedupe.toColumn.name("deduped"))
But this approach produced the following exception:
UnsupportedOperationException: No Encoder found for Iterator[Player]
- field (class: "scala.collection.Iterator", name: "_2")
- root class: "scala.Tuple2"
How else can I add the sort logic between the .groupByKey and .agg methods?
Based on the discussion above, the purpose of the Aggregator is to get the latest field values per Player by ts, ignoring null values.
This can be achieved fairly easily by aggregating all fields individually with max_by. With that there is no need for a custom Aggregator nor the mutable aggregation buffer.
import org.apache.spark.sql.functions._
val players: Dataset[Player] = ...
// aggregate all columns except the key individually by ts
// NULLs will be ignored (SQL standard)
val aggColumns = players.columns
.filterNot(_ == "pk")
.map(colName => expr(s"max_by($colName, if(isNotNull($colName), ts, null))").as(colName))
val aggregatedPlayers = players
.groupBy(col("pk"))
.agg(aggColumns.head, aggColumns.tail: _*)
.as[Player]
On more recent versions of Spark you can also use the built-in max_by function:
import org.apache.spark.sql.functions._
val players: Dataset[Player] = ...
// aggregate all columns except the key individually by ts
// NULLs will be ignored (SQL standard)
val aggColumns = players.columns
.filterNot(_ == "pk")
.map(colName => max_by(col(colName), when(col(colName).isNotNull, col("ts"))).as(colName))
val aggregatedPlayers = players
.groupBy(col("pk"))
.agg(aggColumns.head, aggColumns.tail: _*)
.as[Player]
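For completeness, if one wants to stay on the original groupByKey/mapGroups route from the question, a minimal sketch (an assumption of mine, mirroring the toy reduce above, which simply keeps the last value of each field) is to do both the sort and the fold inside mapGroups, so no second groupByKey/agg is needed. It assumes the Player/PlayerProcessed case classes and the usual spark implicits in scope:

// Sort each group by ts and fold it the same way the toy reduce does.
val dedupedViaMapGroups = grouped.mapGroups { case (pk, players) =>
  val acc = PlayerProcessed("0", null, null)
  players.toSeq.sortBy(_.ts).foreach { p =>
    acc.ts = p.ts
    acc.first_name = p.first_name
    acc.date_of_birth = p.date_of_birth
  }
  (pk, acc)
}
dedupedViaMapGroups.show(false)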
I am trying to construct a temporary column from an expensive UDF that I need to run on each row of my Dataset[Row]. Currently it looks something like:
val myUDF = udf((values: Array[Byte], schema: String) => {
val list = new MyDecoder(schema).decode(values)
val myMap = list.map(
(value: SomeStruct) => (value.field1, value.field2)
).toMap
val field3 = list.head.field3
return (myMap, field3)
})
val decoded = myDF.withColumn("decoded_tmp", myUDF(col("data"), lit(schema)))
.withColumn("myMap", col("decoded_tmp._1"))
.withColumn("field3", col("decoded_tmp._2"))
.drop("decoded_tmp")
However, when I try to compile this, I get a type mismatch error:
type mismatch;
found : (scala.collection.immutable.Map[String,Double], String)
required: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
How can I get around this, or will I have to have 2 expensive UDF functions, one to produce myMap and the other to produce the field3 column?
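A minimal sketch of one way around this, under the assumption (based on the code shown) that the culprit is the return inside the lambda: in Scala a return in an anonymous function tries to return from the enclosing method, which is what produces the Dataset[Row] mismatch here. Dropping it lets a single UDF return the tuple, which Spark exposes as a struct column; MyDecoder, SomeStruct, myDF and schema are carried over from the question:

import org.apache.spark.sql.functions.{col, lit, udf}

val myUDF = udf((values: Array[Byte], schema: String) => {
  val list = new MyDecoder(schema).decode(values)
  val myMap = list.map((value: SomeStruct) => (value.field1, value.field2)).toMap
  val field3 = list.head.field3
  (myMap, field3) // no `return`: the last expression is the lambda's result
})

val decoded = myDF
  .withColumn("decoded_tmp", myUDF(col("data"), lit(schema)))
  .withColumn("myMap", col("decoded_tmp._1"))
  .withColumn("field3", col("decoded_tmp._2"))
  .drop("decoded_tmp")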
Whenever I get an update request for a given id, I am trying to update the masterId and updatedDtTm columns in a DB table (I don't want to update createdDtTm). The following is my code:
case class Master(id:Option[Long] = None,masterId:String,createdDtTm:Option[java.util.Date],
updatedDtTm:Option[java.util.Date])
/**
* This is my Slick Mapping table
* with the default projection
*/
class MappingMaster(tag: Tag) extends Table[Master](tag, "master") {
  implicit val DateTimeColumnType = MappedColumnType.base[java.util.Date, java.sql.Timestamp](
    { ud => new Timestamp(ud.getTime) },
    { sd => new java.util.Date(sd.getTime) }
  )
  def id = column[Long]("id", O.PrimaryKey, O.AutoInc)
  def masterId = column[String]("master_id")
  def createdDtTm = column[java.util.Date]("created_dttm")
  def updatedDtTm = column[java.util.Date]("updated_dttm")
  def * = (id.?, masterId, createdDtTm.?, updatedDtTm.?) <>
    ((Master.apply _).tupled, Master.unapply _)
}
/**
* Somewhere in the DAO update call
*/
db.run(masterRecords.filter(_.id === id).map(rw => (rw.masterId, rw.updatedDtTm)).
  update(("new_master_id", new Date())))
// I also tried the following
db.run(masterRecords.filter(_id === id).map(rw => (rw.masterId, rw.updatedDtTm).shaped[(String, java.util.Date)]).update(("new_master_id", new Date())))
The Slick documentation states that in order to update multiple columns one needs to use map to select the corresponding columns and then update them.
The problem here is the following: the update method seems to be expecting a value of type Nothing.
I also tried the following which was doing the same thing as above:
val t = for {
ms <- masterRecords if (ms.id === "1234")
} yield (ms.masterId, ms.updatedDtTm)
db.run(t.update(("new_master_id",new Date())))
When I compile the code, it gives me the following compilation error:
Slick does not know how to map the given types.
[error] Possible causes: T in Table[T] does not match your * projection. Or you use an unsupported type in a Query (e.g. scala List).
[error] Required level: slick.lifted.FlatShapeLevel
[error] Source type: (slick.lifted.Rep[String], slick.lifted.Rep[java.util.Date])
[error] Unpacked type: (String, java.util.Date)
[error] Packed type: Any
[error] db.run(masterRecords.filter(_id === id).map(rw => (rw.masterId,rw.updatedDtTm).shaped[(String,java.util.Date)]).update(("new_master_id",new Date()))
I am using Scala 2.11 with Slick 3.0.1 and IntelliJ as the IDE. I would really appreciate it if you could shed some light on this.
Cheers,
Sathish
(Replaces original answer.) It seems the implicit column mapping has to be in scope where the queries are built; this compiles:
case class Master(id:Option[Long] = None,masterId:String,createdDtTm:Option[java.util.Date],
updatedDtTm:Option[java.util.Date])
implicit val DateTimeColumnType = MappedColumnType.base[java.util.Date, java.sql.Timestamp](
{
ud => new Timestamp(ud.getTime)
}, {
sd => new java.util.Date(sd.getTime)
})
class MappingMaster(tag: Tag) extends Table[Master](tag, "master") {
  def id = column[Long]("id", O.PrimaryKey, O.AutoInc)
  def masterId = column[String]("master_id")
  def createdDtTm = column[java.util.Date]("created_dttm")
  def updatedDtTm = column[java.util.Date]("updated_dttm")
  def * = (id.?, masterId, createdDtTm.?, updatedDtTm.?) <> ((Master.apply _).tupled, Master.unapply _)
}
private val masterRecords = TableQuery[MappingMaster]
val id: Long = 123
db.run(masterRecords.filter(_.id === id).map(rw => (rw.masterId, rw.updatedDtTm)).update(("new_master_id", new Date())))
val t = for {
ms <- masterRecords if (ms.id === id)
} yield (ms.masterId , ms.updatedDtTm)
db.run(t.update(("new_master_id",new Date())))
I have methods in my Play app that query database tables with over a hundred columns. I can't define a case class for each such query, because it would be ridiculously big and would have to be changed with every alteration of the table in the database.
I'm using this approach, where the result of the query looks like this:
Map(columnName1 -> columnVal1, columnName2 -> columnVal2, ...)
Example of the code:
implicit val getListStringResult = GetResult[List[Any]] (
  r => (1 to r.numColumns).map(_ => r.nextObject).toList
)

def getSomething(): Map[String, Any] = DB.withSession {
  val columns = MTable.getTables(None, None, None, None).list
    .filter(_.name.name == "myTable").head.getColumns.list.map(_.column)
  sql"""SELECT * FROM myTable LIMIT 1""".as[List[Any]].firstOption.map(columns zip _ toMap).get
}
This is not a problem when the query runs on only a single database and a single table. But I need to be able to use multiple tables and databases in my query, like this:
def getSomething(): Map[String, Any] = DB.withSession {
  // The line below is no longer valid because of multiple tables/databases
  val columns = MTable.getTables(None, None, None, None).list
    .filter(_.name.name == "table1").head.getColumns.list.map(_.column)
  sql"""
    SELECT *
    FROM db1.table1
    LEFT JOIN db2.table2 ON db2.table2.col1 = db1.table1.col1
    LIMIT 1
  """.as[List[Any]].firstOption.map(columns zip _ toMap).get
}
The same approach can no longer be used to retrieve column names. This problem doesn't exist when using something like PHP PDO or Java JDBCTemplate - these retrieve column names without any extra effort needed.
My question is: how do I achieve this with Slick?
import scala.slick.jdbc.{GetResult, PositionedResult}

object ResultMap extends GetResult[Map[String, Any]] {
  def apply(pr: PositionedResult) = {
    val rs = pr.rs // <- jdbc result set
    val md = rs.getMetaData
    val res = (1 to pr.numColumns).map { i => md.getColumnName(i) -> rs.getObject(i) }.toMap
    pr.nextRow // <- use Slick's advance method to avoid endless loop
    res
  }
}
val result = sql"select * from ...".as(ResultMap).firstOption
Another variant that produces a map containing only the non-null columns (keys in lowercase):
private implicit val getMap = GetResult[Map[String, Any]](r => {
  val metadata = r.rs.getMetaData
  (1 to r.numColumns).flatMap { i =>
    val columnName = metadata.getColumnName(i).toLowerCase
    val columnValue = r.nextObjectOption
    columnValue.map(columnName -> _)
  }.toMap
})
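For illustration, a hypothetical usage sketch: with the implicit GetResult[Map[String, Any]] above in scope (and a session in scope, e.g. inside DB.withSession as in the question), .as picks it up directly, so the join query can be read as a map without touching table metadata:

// Assumes the implicit GetResult[Map[String, Any]] defined above is in scope.
val row: Option[Map[String, Any]] =
  sql"""
    SELECT *
    FROM db1.table1
    LEFT JOIN db2.table2 ON db2.table2.col1 = db1.table1.col1
    LIMIT 1
  """.as[Map[String, Any]].firstOption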
I have a simple method to retrieve a user from a DB with Slick's plain SQL method:
object Data {
  implicit val getListStringResult = GetResult[List[String]] (
    prs => (1 to prs.numColumns).map(_ => prs.nextString).toList
  )

  def getUser(id: Int): Option[List[String]] = DB.withSession {
    sql"""SELECT * FROM "user" WHERE "id" = $id""".as[List[String]].firstOption
  }
}
The result is a List[String], but I would like it to be something like Map[String, String] - a map of column name/value pairs. Is this possible? If so, how?
My stack is Play Framework 2.2.1, Slick 1.0.1, Scala 2.10.3, Java 8 64bit
import scala.slick.jdbc.meta._
val columns = MTable.getTables(None, None, None, None)
.list.filter(_.name.name == "USER") // <- upper case table name
.head.getColumns.list.map(_.column)
val user = sql"""SELECT * FROM "user" WHERE "id" = $id""".as[List[String]].firstOption
.map( columns zip _ toMap )
It's possible to do this without querying the table metadata as follows:
implicit val resultAsStringMap = GetResult[Map[String,String]] ( prs =>
(1 to prs.numColumns).map(_ =>
prs.rs.getMetaData.getColumnName(prs.currentPos+1) -> prs.nextString
).toMap
)
It's also possible to build a Map[String,Any] in the same way but that's obviously more complicated.
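For illustration, a minimal sketch of one such Map[String,Any] variant, assuming the same PositionedResult API as above and simply falling back to nextObject (the extra complexity alluded to would come from mapping each column to a more specific type):

implicit val resultAsAnyMap = GetResult[Map[String, Any]] ( prs =>
  (1 to prs.numColumns).map(_ =>
    prs.rs.getMetaData.getColumnName(prs.currentPos + 1) -> prs.nextObject
  ).toMap
)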