Filling a Scala immutable Map from a database table - scala

I have a SQL database table with the following structure:
create table category_value (
category varchar(25),
property varchar(25)
);
I want to read this into a Scala Map[String, Set[String]] where each entry in the map is a set of all of the property values that are in the same category.
I would like to do it in a "functional" style with no mutable data (other than the database result set).
Following on the Clojure loop construct, here is what I have come up with:
def fillMap(statement: java.sql.Statement): Map[String, Set[String]] = {
val resultSet = statement.executeQuery("select category, property from category_value")
#tailrec
def loop(m: Map[String, Set[String]]): Map[String, Set[String]] = {
if (resultSet.next) {
val category = resultSet.getString("category")
val property = resultSet.getString("property")
loop(m + (category -> m.getOrElse(category, Set.empty)))
} else m
}
loop(Map.empty)
}
Is there a better way to do this, without using mutable data structures?

If you like, you could try something around
def fillMap(statement: java.sql.Statement): Map[String, Set[String]] = {
val resultSet = statement.executeQuery("select category, property from category_value")
Iterator.continually((resultSet, resultSet.next)).takeWhile(_._2).map(_._1).map{ res =>
val category = res.getString("category")
val property = res.getString("property")
(category, property)
}.toIterable.groupBy(_._1).mapValues(_.map(_._2).toSet)
}
Untested, because I don’t have a proper sql.Statement. And the groupBy part might need some more love to look nice.
Edit: Added the requested changes.

There are two parts to this problem.
Getting the data out of the database and into a list of rows.
I would use a Spring SimpleJdbcOperations for the database access, so that things at least appear functional, even though the ResultSet is being changed behind the scenes.
First, some a simple conversion to let us use a closure to map each row:
implicit def rowMapper[T<:AnyRef](func: (ResultSet)=>T) =
new ParameterizedRowMapper[T]{
override def mapRow(rs:ResultSet, row:Int):T = func(rs)
}
Then let's define a data structure to store the results. (You could use a tuple, but defining my own case class has advantage of being just a little bit clearer regarding the names of things.)
case class CategoryValue(category:String, property:String)
Now select from the database
val db:SimpleJdbcOperations = //get this somehow
val resultList:java.util.List[CategoryValue] =
db.query("select category, property from category_value",
{ rs:ResultSet => CategoryValue(rs.getString(1),rs.getString(2)) } )
Converting the data from a list of rows into the format that you actually want
import scala.collection.JavaConversions._
val result:Map[String,Set[String]] =
resultList.groupBy(_.category).mapValues(_.map(_.property).toSet)
(You can omit the type annotations. I've included them to make it clear what's going on.)

Builders are built for this purpose. Get one via the desired collection type companion, e.g. HashMap.newBuilder[String, Set[String]].

This solution is basically the same as my other solution, but it doesn't use Spring, and the logic for converting a ResultSet to some sort of list is simpler than Debilski's solution.
def streamFromResultSet[T](rs:ResultSet)(func: ResultSet => T):Stream[T] = {
if (rs.next())
func(rs) #:: streamFromResultSet(rs)(func)
else
rs.close()
Stream.empty
}
def fillMap(statement:java.sql.Statement):Map[String,Set[String]] = {
case class CategoryValue(category:String, property:String)
val resultSet = statement.executeQuery("""
select category, property from category_value
""")
val queryResult = streamFromResultSet(resultSet){rs =>
CategoryValue(rs.getString(1),rs.getString(2))
}
queryResult.groupBy(_.category).mapValues(_.map(_.property).toSet)
}

There is only one approach I can think of that does not include either mutable state or extensive copying*. It is actually a very basic technique I learnt in my first term studying CS. Here goes, abstracting from the database stuff:
def empty[K,V](k : K) : Option[V] = None
def add[K,V](m : K => Option[V])(k : K, v : V) : K => Option[V] = q => {
if ( k == q ) {
Some(v)
}
else {
m(q)
}
}
def build[K,V](input : TraversableOnce[(K,V)]) : K => Option[V] = {
input.foldLeft(empty[K,V]_)((m,i) => add(m)(i._1, i._2))
}
Usage example:
val map = build(List(("a",1),("b",2)))
println("a " + map("a"))
println("b " + map("b"))
println("c " + map("c"))
> a Some(1)
> b Some(2)
> c None
Of course, the resulting function does not have type Map (nor any of its benefits) and has linear lookup costs. I guess you could implement something in a similar way that mimicks simple search trees.
(*) I am talking concepts here. In reality, things like value sharing might enable e.g. mutable list constructions without memory overhead.

Related

request timeout from flatMapping over cats.effect.IO

I am attempting to transform some data that is encapsulated in cats.effect.IO with a Map that also is in an IO monad. I'm using http4s with blaze server and when I use the following code the request times out:
def getScoresByUserId(userId: Int): IO[Response[IO]] = {
implicit val formats = DefaultFormats + ShiftJsonSerializer() + RawShiftSerializer()
implicit val shiftJsonReader = new Reader[ShiftJson] {
def read(value: JValue): ShiftJson = value.extract[ShiftJson]
}
implicit val shiftJsonDec = jsonOf[IO, ShiftJson]
// get the shifts
var getDbShifts: IO[List[Shift]] = shiftModel.findByUserId(userId)
// use the userRoleId to get the RoleId then get the tasks for this role
val taskMap : IO[Map[String, Double]] = taskModel.findByUserId(userId).flatMap {
case tskLst: List[Task] => IO(tskLst.map((task: Task) => (task.name -> task.standard)).toMap)
}
val traversed: IO[List[Shift]] = for {
shifts <- getDbShifts
traversed <- shifts.traverse((shift: Shift) => {
val lstShiftJson: IO[List[ShiftJson]] = read[List[ShiftJson]](shift.roleTasks)
.map((sj: ShiftJson) =>
taskMap.flatMap((tm: Map[String, Double]) =>
IO(ShiftJson(sj.name, sj.taskType, sj.label, sj.value.toString.toDouble / tm.get(sj.name).get)))
).sequence
//TODO: this flatMap is bricking my request
lstShiftJson.flatMap((sjLst: List[ShiftJson]) => {
IO(Shift(shift.id, shift.shiftDate, shift.shiftStart, shift.shiftEnd,
shift.lunchDuration, shift.shiftDuration, shift.breakOffProd, shift.systemDownOffProd,
shift.meetingOffProd, shift.trainingOffProd, shift.projectOffProd, shift.miscOffProd,
write[List[ShiftJson]](sjLst), shift.userRoleId, shift.isApproved, shift.score, shift.comments
))
})
})
} yield traversed
traversed.flatMap((sLst: List[Shift]) => Ok(write[List[Shift]](sLst)))
}
as you can see the TODO comment. I've narrowed down this method to the flatmap below the TODO comment. If I remove that flatMap and merely return "IO(shift)" to the traversed variable the request does not timeout; However, that doesn't help me much because I need to make use of the lstShiftJson variable which has my transformed json.
My intuition tells me I'm abusing the IO monad somehow, but I'm not quite sure how.
Thank you for your time in reading this!
So with the guidance of Luis's comment I refactored my code to the following. I don't think it is optimal (i.e. the flatMap at the end seems unecessary, but I couldnt' figure out how to remove it. BUT it's the best I've got.
def getScoresByUserId(userId: Int): IO[Response[IO]] = {
implicit val formats = DefaultFormats + ShiftJsonSerializer() + RawShiftSerializer()
implicit val shiftJsonReader = new Reader[ShiftJson] {
def read(value: JValue): ShiftJson = value.extract[ShiftJson]
}
implicit val shiftJsonDec = jsonOf[IO, ShiftJson]
// FOR EACH SHIFT
// - read the shift.roleTasks into a ShiftJson object
// - divide each task value by the task.standard where task.name = shiftJson.name
// - write the list of shiftJson back to a string
val traversed = for {
taskMap <- taskModel.findByUserId(userId).map((tList: List[Task]) => tList.map((task: Task) => (task.name -> task.standard)).toMap)
shifts <- shiftModel.findByUserId(userId)
traversed <- shifts.traverse((shift: Shift) => {
val lstShiftJson: List[ShiftJson] = read[List[ShiftJson]](shift.roleTasks)
.map((sj: ShiftJson) => ShiftJson(sj.name, sj.taskType, sj.label, sj.value.toString.toDouble / taskMap.get(sj.name).get ))
shift.roleTasks = write[List[ShiftJson]](lstShiftJson)
IO(shift)
})
} yield traversed
traversed.flatMap((t: List[Shift]) => Ok(write[List[Shift]](t)))
}
Luis mentioned that mapping my List[Shift] to a Map[String, Double] is a pure operation so we want to use a map instead of flatMap.
He mentioned that I'm wrapping every operation that comes from the database in IO which is causing a great deal of recomputation. (including DB transactions)
To solve this issue I moved all of the database operations inside of my for loop, using the "<-" operator to flatMap each of the return values allows the variables being used to preside within the IO monads, hence preventing the recomputation experienced before.
I do think there must be a better way of returning my return value. flatMapping the "traversed" variable to get back inside of the IO monad seems to be unnecessary recomputation, so please anyone correct me.

How to functionally handle a logging side effect

I want to log in the event that a record doesn't have an adjoining record. Is there a purely functional way to do this? One that separates the side effect from the data transformation?
Here's an example of what I need to do:
val records: Seq[Record] = Seq(record1, record2, ...)
val accountsMap: Map[Long, Account] = Map(record1.id -> account1, ...)
def withAccount(accountsMap: Map[Long, Account])(r: Record): (Record, Option[Account]) = {
(r, accountsMap.get(r.id))
}
def handleNoAccounts(tuple: (Record, Option[Account]) = {
val (r, a) = tuple
if (a.isEmpty) logger.error(s"no account for ${record.id}")
tuple
}
def toRichAccount(tuple: (Record, Option[Account]) = {
val (r, a) = tuple
a.map(acct => RichAccount(r, acct))
}
records
.map(withAccount(accountsMap))
.map(handleNoAccounts) // if no account is found, log
.flatMap(toRichAccount)
So there are multiple issues with this approach that I think make it less than optimal.
The tuple return type is clumsy. I have to destructure the tuple in both of the latter two functions.
The logging function has to handle the logging and then return the tuple with no changes. It feels weird that this is passed to .map even though no transformation is taking place -- maybe there is a better way to get this side effect.
Is there a functional way to clean this up?
I could be wrong (I often am) but I think this does everything that's required.
records
.flatMap(r =>
accountsMap.get(r.id).fold{
logger.error(s"no account for ${r.id}")
Option.empty[RichAccount]
}{a => Some(RichAccount(r,a))})
If you're using scala 2.13 or newer you could use tapEach, which takes function A => Unit to apply side effect on every element of function and then passes collection unchanged:
//you no longer need to return tuple in side-effecting function
def handleNoAccounts(tuple: (Record, Option[Account]): Unit = {
val (r, a) = tuple
if (a.isEmpty) logger.error(s"no account for ${record.id}")
}
records
.map(withAccount(accountsMap))
.tapEach(handleNoAccounts) // if no account is found, log
.flatMap(toRichAccount)
In case you're using older Scala, you could provide extension method (updated according to Levi's Ramsey suggestion):
implicit class SeqOps[A](s: Seq[A]) {
def tapEach(f: A => Unit): Seq[A] = {
s.foreach(f)
s
}
}

Slick: how to implement find by example i.e. findByExample generically?

I'm exploring the different possibilities on how to implement a generic DAO using the latest Slick 3.1.1 to boost productivity and yes there is need for it because basing the service layer of my Play Web application on TableQuery alone leads to a lot of boilerplate code. One of the methods I'd like to feature in my generic DAO implementation is the findByExample, possible in JPA with the help of the Criteria API. In my case, I'm using the Slick Code Generator to generate the model classes from a sql script.
I need the following to be able to dynamically access the attribute names, taken from Scala. Get field names list from case class:
import scala.reflect.runtime.universe._
def classAccessors[T: TypeTag]: List[MethodSymbol] = typeOf[T].members.collect {
case m: MethodSymbol if m.isCaseAccessor => m
}.toList
A draft implementation for findByExample would be:
def findByExample[T, R](example: R) : Future[Seq[R]] = {
var qt = TableQuery[T].result
val accessors = classAccessors[R]
(0 until example.productArity).map { i =>
example.productElement(i) match {
case None => // ignore
case 0 => // ignore
// ... some more default values => // ignore
// handle a populated case
case Some(x) => {
val columnName = accessors(i)
qt = qt.filter(_.columnByName(columnName) == x)
}
}
}
qt.result
}
But this doesn't work because I need better Scala Kungfu. T is the entity table type and R is the row type that is generated as a case class and therefore a valid Scala Product type.
The first problem in that code is that would be too inefficient because instead of doing e.g.
qt.filter(_.firstName === "Juan" && _.streetName === "Rosedale Ave." && _.streetNumber === 5)
is doing:
// find all
var qt = TableQuery[T].result
// then filter by each column at the time
qt = qt.filter(_.firstName === "Juan")
qt = qt.filter(_.streetName === "Rosedale Ave.")
qt = qt.filter(_.streetNumber === 5)
Second I can't see how to dynamically access the column name in the filter method i.e.
qt.filter(_.firstName == "Juan")
I need to instead have
qt.filter(_.columnByName("firstName") == "Juan")
but apparently there is no such possibility while using the filter function?
Probably the best ways to implement filters and sorting by dynamically provided column names would be either plain SQL or extending the code generator to generate extension methods, something like this:
implicit class DynamicPersonQueries[C[_]](q: Query[PersonTable, PersonRow, C]){
def dynamicFilter( column: String, value: String ) = column {
case "firstName" => q.filter(_.firstName === value)
case "streetNumber" => q.filter(_.streetNumber === value.toInt)
...
}
}
You might have to fiddle with the types a bit to get it to compile (and ideally update this post afterwards :)).
You can then filter by all the provided values like this:
val examples: Map[String, String] = ...
val t = TableQuery[PersonTable]
val query = examples.foldLeft(t){case (t,(column, value)) => t.dynamicFilter(column, value)
query.result
Extending the code generator is explained here: http://slick.lightbend.com/doc/3.1.1/code-generation.html#customization
After further researching found the following blog post Repository Pattern / Generic DAO Implementation.
There they declare and implement a generic filter method that works for any Model Entity type and therefore it is in my view a valid functional replacement to the more JPA findByExample.
i.e.
T <: Table[E] with IdentifyableTable[PK]
E <: Entity[PK]
PK: BaseColumnType
def filter[C <: Rep[_]](expr: T => C)(implicit wt: CanBeQueryCondition[C]) : Query[T, E, Seq] = tableQuery.filter(expr)

Access database column names from a Table?

Let's say I have a table:
object Suppliers extends Table[(Int, String, String, String)]("SUPPLIERS") {
def id = column[Int]("SUP_ID", O.PrimaryKey)
def name = column[String]("SUP_NAME")
def state = column[String]("STATE")
def zip = column[String]("ZIP")
def * = id ~ name ~ state ~ zip
}
Table's database name
The table's database name can be accessed by going: Suppliers.tableName
This is supported by the Scaladoc on AbstractTable.
For example, the above table's database name is "SUPPLIERS".
Columns' database names
Looking through AbstractTable, getLinearizedNodes and indexes looked promising. No column names in their string representations though.
I assume that * means "all the columns I'm usually interested in." * is a MappedProjection, which has this signature:
final case class MappedProjection[T, P <: Product](
child: Node,
f: (P) ⇒ T,
g: (T) ⇒ Option[P])(proj: Projection[P])
extends ColumnBase[T] with UnaryNode with Product with Serializable
*.getLinearizedNodes contains a huge sequence of numbers, and I realized that at this point I'm just doing a brute force inspection of everything in the API for possibly finding the column names in the String.
Has anybody also encountered this problem before, or could anybody give me a better understanding of how MappedProjection works?
It requires you to rely on Slick internals, which may change between versions, but it is possible. Here is how it works for Slick 1.0.1: You have to go via the FieldSymbol. Then you can extract the information you want like how columnInfo(driver: JdbcDriver, column: FieldSymbol): ColumnInfo does it.
To get a FieldSymbol from a Column you can use fieldSym(node: Node): Option[FieldSymbol] and fieldSym(column: Column[_]): FieldSymbol.
To get the (qualified) column names you can simply do the following:
Suppliers.id.toString
Suppliers.name.toString
Suppliers.state.toString
Suppliers.zip.toString
It's not explicitly stated anywhere that the toString will yield the column name, so your question is a valid one.
Now, if you want to programmatically get all the column names, then that's a bit harder. You could try using reflection to get all the methods that return a Column[_] and call toString on them, but it wouldn't be elegant. Or you could hack a bit and get a select * SQL statement from a query like this:
val selectStatement = DB withSession {
Query(Suppliers).selectStatement
}
And then parse our the column names.
This is the best I could do. If someone knows a better way then please share - I'm interested too ;)
Code is based on Lightbend Activator "slick-http-app".
slick version: 3.1.1
Added this method to the BaseDal:
def getColumns(): mutable.Map[String, Type] = {
val columns = mutable.Map.empty[String, Type]
def selectType(t: Any): Option[Any] = t match {
case t: TableExpansion => Some(t.columns)
case t: Select => Some(t.field)
case _ => None
}
def selectArray(t:Any): Option[ConstArray[Node]] = t match {
case t: TypeMapping => Some(t.child.children)
case _ => None
}
def selectFieldSymbol(t:Any): Option[FieldSymbol] = t match {
case t: FieldSymbol => Some(t)
case _ => None
}
val t = selectType(tableQ.toNode)
val c = selectArray(t.get)
for (se <- c.get) {
val col = selectType(se)
val fs = selectFieldSymbol(col.get)
columns += (fs.get.name -> fs.get.tpe)
}
columns
}
this method gets the column names (real names in DB) + types form the TableQ
used imports are:
import slick.ast._
import slick.util.ConstArray

Efficient groupwise aggregation on Scala collections

I often need to do something like
coll.groupBy(f(_)).mapValues(_.foldLeft(x)(g(_,_)))
What is the best way to achieve the same effect, but avoid explicitly constructing the intermediate collections with groupBy?
You could fold the initial collection over a map holding your intermediate results:
def groupFold[A,B,X](as: Iterable[A], f: A => B, init: X, g: (X,A) => X): Map[B,X] =
as.foldLeft(Map[B,X]().withDefaultValue(init)){
case (m,a) => {
val key = f(a)
m.updated(key, g(m(key),a))
}
}
You said collection and I wrote Iterable, but you have to think whether order matters in the fold in your question.
If you want efficient code, you will probably use a mutable map as in Rex' answer.
You can't really do it as a one-liner, so you should be sure you need it before writing something more elaborate like this (written from a somewhat performance-minded view since you asked for "efficient"):
final case class Var[A](var value: A) { }
def multifold[A,B,C](xs: Traversable[A])(f: A => B)(zero: C)(g: (C,A) => C) = {
import scala.collection.JavaConverters._
val m = new java.util.HashMap[B, Var[C]]
xs.foreach{ x =>
val v = {
val fx = f(x)
val op = m.get(fx)
if (op != null) op
else { val nv = Var(zero); m.put(fx, nv); nv }
}
v.value = g(v.value, x)
}
m.asScala.mapValues(_.value)
}
(Depending on your use case you may wish to pack into an immutable map instead in the last step.) Here's an example of it in action:
scala> multifold(List("salmon","herring","haddock"))(_(0))(0)(_ + _.length)
res1: scala.collection.mutable.HashMap[Char,Int] = Map(h -> 14, s -> 6)
Now, you might notice something weird here: I'm using a Java HashMap. This is because Java's HashMaps are 2-3x faster than Scala's. (You can write the equivalent thing with a Scala HashMap, but it doesn't actually make things any faster than your original.) Consequently, this operation is 2-3x faster than what you posted. But unless you're under severe memory pressure, creating the transient collections doesn't actually hurt you much.