Build a dynamic query with Slick 2.1.0 - Scala

The goal is to filter Items by optional keywords and/or shopId.
If neither is defined, all Items should be returned.
My first attempt is:
case class ItemSearchParameters(keywords: Option[String], shopId: Option[Long])

def search(params: ItemSearchParameters): Either[Failure, List[Item]] = {
  try {
    db withDynSession {
      val q = Items.query
      if (params.keywords.isDefined) {
        q.filter { i =>
          ((i.title like "%" + params.keywords + "%")
            || (i.description like "%" + params.keywords + "%"))
        }
      }
      if (params.shopId.isDefined) {
        q.filter { i =>
          i.shopId === params.shopId
        }
      }
      Right(q.run.toList)
    }
  } catch {
    case e: SQLException =>
      Left(databaseError(e))
  }
}
Even when params.keywords or params.shopId is defined, this function returns all Items. Can someone please explain what is wrong?
Update: second attempt
def search(params: ItemSearchParameters): Either[Failure, List[Item]] = {
  try {
    db withDynSession {
      var q = Items.query
      q = params.keywords.map { k => q.filter(_.title like "%" + k + "%") } getOrElse q
      q = params.keywords.map { k => q.filter(_.description like "%" + k + "%") } getOrElse q
      q = params.shopId.map { sid => q.filter(_.shopId === sid) } getOrElse q
      Right(q.run.toList)
    }
  } catch {
    case e: SQLException =>
      Left(databaseError(e))
  }
}
For this second attempt, how can I express (title OR description) when keywords is defined? As written, the two separate filter calls are ANDed together.
Update: third attempt with MaybeFilter (not working):
case class MaybeFilter[X, Y](val query: scala.slick.lifted.Query[X, Y, Seq]) {
  def filteredBy(op: Option[_])(f: (X) => Column[Option[Boolean]]) = {
    op map { o => MaybeFilter(query.filter(f)) } getOrElse { this }
  }
}

class ItemDAO extends Configuration {
  implicit def maybeFilterConversor[X, Y](q: Query[X, Y, Seq]) = new MaybeFilter(q)

  def search(params: ItemSearchParameters): Either[Failure, List[Item]] = {
    try {
      db withDynSession {
        val q = Items
          .filteredBy(params.keywords) { i =>
            ((i.title like "%" + params.keywords + "%")
              || (i.description like "%" + params.keywords + "%"))
          }
          .filteredBy(params.shopId) { _.shopId === params.shopId }
          .query
        Right(q.list)
      }
    } catch {
      case e: SQLException =>
        Left(databaseError(e))
    }
  }
}
The third attempt returns an empty list whenever keywords is given.

def search(params: ItemSearchParameters): Either[Failure, List[Item]] = {
  try {
    db withDynSession {
      var q = Items.query
      q = params.keywords.map { k =>
        q.filter(i =>
          (i.title like "%" + k + "%")
            || (i.description like "%" + k + "%"))
      } getOrElse q
      q = params.shopId.map { sid =>
        q.filter(_.shopId === sid)
      } getOrElse q
      Right(q.run.toList)
    }
  } catch {
    case e: SQLException =>
      Left(databaseError(e))
  }
}
I am not sure this is the best answer, though, because of the var q.
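The var can be avoided by threading each optional criterion through the query with Option.fold (Scala 2.10+). A minimal var-free sketch, assuming the same Items.query and schema as above:

def search(params: ItemSearchParameters): Either[Failure, List[Item]] =
  try {
    db withDynSession {
      // Each step keeps the previous query when the parameter is empty.
      val base = Items.query
      val withKeywords = params.keywords.fold(base) { k =>
        base.filter(i =>
          (i.title like "%" + k + "%") || (i.description like "%" + k + "%"))
      }
      val withShop = params.shopId.fold(withKeywords) { sid =>
        withKeywords.filter(_.shopId === sid)
      }
      Right(withShop.run.toList)
    }
  } catch {
    case e: SQLException => Left(databaseError(e))
  }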

If I understood you correctly, you want to filter by optional fields.
Your second attempt is quite close; the first one has incorrect matching: you compare Option fields to non-Option values, so "%" + params.keywords + "%" builds the pattern from Some(...).toString instead of from the keyword itself. Also, Query is immutable, so the result of q.filter(...) in the first attempt is simply discarded, which is why all Items come back. You answered your own question while I was writing this response :)
I'd like to recommend you this MaybeFilter: https://gist.github.com/cvogt/9193220
Or here is a modified version: https://github.com/neowinx/hello-slick-2.1-dynamic-filter/blob/master/src/main/scala/HelloSlick.scala#L3-L7
Maybe this can help you solve your problem in a more generic way.
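For completeness, here is a sketch of how the third attempt changes with that approach. The essential fix is that the callback receives the unwrapped value, so the LIKE pattern is built from the String rather than from Some(...).toString. Names mirror the question; the CanBeQueryCondition bound is how the gist accepts both Column[Boolean] and Column[Option[Boolean]] conditions:

import scala.slick.lifted.{CanBeQueryCondition, Query}

case class MaybeFilter[X, Y](query: Query[X, Y, Seq]) {
  // op.map hands the unwrapped value v to the filter function.
  def filteredBy[T, R: CanBeQueryCondition](op: Option[T])(f: T => X => R): MaybeFilter[X, Y] =
    op.map(v => MaybeFilter(query.filter(f(v)))).getOrElse(this)
}

val q = MaybeFilter(Items.query)
  .filteredBy(params.keywords) { k => i =>
    (i.title like "%" + k + "%") || (i.description like "%" + k + "%")
  }
  .filteredBy(params.shopId) { sid => _.shopId === sid }
  .query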

Related

Advice needed implementing direct and inverse Dijkstra algorithm in Scala/Spark

I'm trying to implement both direct Dijkstra and its inverse version (finding longest paths instead of shortest ones), but I'm having trouble: I get infinite distances for nodes that are not disconnected in the weighted undirected graph (and zero distances in the inverse version).
So far, I have trusted and modified this implementation I found on the web: http://note.yuhc.me/2015/03/graphx-pregel-shortest-path/
My implementations for both functions are as follows:
Direct Dijkstra:
// Implementation of Dijkstra's algorithm using the Pregel API
def computeMinDistance(u: VertexId, k1: VertexId): Double = {
  val g: Graph[(Double, VertexId), Double] = this.uncertainGraph.mapVertices((id, _) =>
    if (id == u) (0.0, id) else (Double.PositiveInfinity, id)
  )
  println("Computing Dijkstra distance info for id: " + u.toString)
  val sssp: Graph[(Double, VertexId), Double] =
    g.pregel[(Double, VertexId)]((Double.PositiveInfinity, Long.MaxValue), Int.MaxValue, EdgeDirection.Either)(
      (id, dist, newDist) => { // Vertex Program
        if (dist._1 < newDist._1) (dist._1, id) else (newDist._1, id)
      },
      triplet => { // Send Message
        if (triplet.srcAttr._1 + triplet.attr < triplet.dstAttr._1) {
          println("triplet.srcAttr._1 = " + triplet.srcAttr._1.toString)
          println("triplet.dstAttr._1 = " + triplet.dstAttr._1.toString)
          Iterator((triplet.dstId, (triplet.srcAttr._1 + triplet.attr, triplet.srcId)))
        } else {
          Iterator.empty
        }
      },
      (a, b) => (math.min(a._1, b._1), a._2) // Merge Message
    )
  sssp.vertices.take(20).foreach(println(_))
  sssp.vertices.filter(element => element._1 == k1).map(element => element._2._1).collect()(0)
}
Inverse Dijkstra:
def computeMaxDistance(node: VertexId, center: VertexId): Double = {
  val g: Graph[(Double, VertexId), Double] = this.uncertainGraph.mapVertices((id, _) =>
    if (id != node) (0.0, id) else (Double.PositiveInfinity, id)
  )
  val sslp: Graph[(Double, VertexId), Double] =
    g.pregel[(Double, VertexId)]((Double.PositiveInfinity, Long.MaxValue), Int.MaxValue, EdgeDirection.Either)(
      (id, dist, newDist) => { // Vertex Program
        if (dist._1 > newDist._1) (dist._1, id) else (newDist._1, id)
      },
      triplet => { // Send Message
        if (triplet.srcAttr._1 + triplet.attr > triplet.dstAttr._1) {
          println("triplet.srcAttr._1 = " + triplet.srcAttr._1.toString)
          println("triplet.dstAttr._1 = " + triplet.dstAttr._1.toString)
          Iterator((triplet.dstId, (triplet.srcAttr._1 + triplet.attr, triplet.srcId)))
        } else {
          Iterator.empty
        }
      },
      (a, b) => (math.max(a._1, b._1), a._2) // Merge Message
    )
  sslp.vertices.take(20).foreach(println(_))
  sslp.vertices.filter(element => element._1 == center).map(element => element._2._1).collect()(0)
}
Any help is deeply appreciated. I'm not really that experienced with Scala and Spark. Thanks in advance.

Apache Spark: dealing with Option/Some/None in RDDs

I'm mapping over an HBase table, generating one RDD element per HBase row. However, sometimes the row has bad data (throwing a NullPointerException in the parsing code), in which case I just want to skip it.
I have my initial mapper return an Option to indicate that it returns 0 or 1 elements, then filter for Some, then get the contained value:
// myRDD is RDD[(ImmutableBytesWritable, Result)]
val output = myRDD.
  map( tuple => getData(tuple._2) ).
  filter( { case Some(y) => true; case None => false } ).
  map( _.get ).
  // ... more RDD operations with the good data

def getData(r: Result) = {
  val key = r.getRow
  var id = "(unk)"
  var x = -1L
  try {
    id = Bytes.toString(key, 0, 11)
    x = Long.MaxValue - Bytes.toLong(key, 11)
    // ... more code that might throw exceptions
    Some( ( id, ( List(x),
      // more stuff ...
    ) ) )
  } catch {
    case e: NullPointerException => {
      logWarning("Skipping id=" + id + ", x=" + x + "; \n" + e)
      None
    }
  }
}
Is there a more idiomatic way to do this that's shorter? I feel like this looks pretty messy, both in getData() and in the map.filter.map dance I'm doing.
Perhaps a flatMap could work (generate 0 or 1 items in a Seq), but I don't want it to flatten the tuples I'm creating in the map function, just eliminate empties.
An alternative, and often overlooked, way would be using collect(PartialFunction pf), which is meant to 'select' or 'collect' specific elements of the RDD for which the partial function is defined.
The code would look like this:
import scala.util.{Try, Success}

val output = myRDD.map(tuple => getData(tuple._2)).collect { case Success(tuple) => tuple }

def getData(r: Result): Try[(String, List[Long])] = Try {
  val key = r.getRow
  val id = Bytes.toString(key, 0, 11)
  val x = Long.MaxValue - Bytes.toLong(key, 11)
  (id, List(x))
}
If you change your getData to return a scala.util.Try then you can simplify your transformations considerably. Something like this could work:
def getData(r: Result) = {
  val key = r.getRow
  var id = "(unk)"
  var x = -1L
  val tr = util.Try {
    id = Bytes.toString(key, 0, 11)
    x = Long.MaxValue - Bytes.toLong(key, 11)
    // ... more code that might throw exceptions
    ( id, ( List(x)
      // more stuff ...
    ) )
  }
  tr.failed.foreach(e => logWarning("Skipping id=" + id + ", x=" + x + "; \n" + e))
  tr
}
Then your transform could start like so:
myRDD.
  flatMap(tuple => getData(tuple._2).toOption)
If your Try is a Failure, it will be turned into a None via toOption and then removed as part of the flatMap logic. At that point, the next step in the transform will only be working with the successful cases, i.e. whatever the underlying type returned by getData is, without the Option wrapping.
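A quick way to see that behavior with plain Scala collections (same idea as on an RDD):

import scala.util.Try
// The failing parse becomes None and is dropped; successes come through unwrapped.
Seq("1", "oops", "3").flatMap(s => Try(s.toInt).toOption)
// => List(1, 3)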
If you are ok with dropping the data then you can just use mapPartitions. Here is a sample:
import scala.util._
val mixedData = sc.parallelize(List(1, 2, 3, 4, 0))
mixedData.mapPartitions { x =>
  val foo = for (y <- x) yield Try(1 / y)
  for (goodVals <- foo.partition(_.isSuccess)._1)
    yield goodVals.get
}
If you want to see the bad values, then you can use an accumulator or just log as you have been.
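For the accumulator route, a minimal sketch (hypothetical names; Spark 1.x accumulator API; reuses the Try-returning getData from the previous answer):

// Count rows whose parse failed, while still dropping them from the output.
val badRows = sc.accumulator(0L, "bad rows")
val output = myRDD.flatMap { tuple =>
  val tried = getData(tuple._2)
  if (tried.isFailure) badRows += 1L
  tried.toOption
}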
Your code would look something like this:
val output = myRDD.
  mapPartitions( tupleIter => getCleanData(tupleIter) )
  // ... more RDD operations with the good data

// iter elements are the (ImmutableBytesWritable, Result) tuples from myRDD
def getCleanData(iter: Iterator[(ImmutableBytesWritable, Result)]) = {
  val triedData = getDataInTry(iter)
  for (goodVals <- triedData.partition(_.isSuccess)._1)
    yield goodVals.get
}

def getDataInTry(iter: Iterator[(ImmutableBytesWritable, Result)]) = {
  for (r <- iter) yield Try {
    val key = r._2.getRow
    val id = Bytes.toString(key, 0, 11)
    val x = Long.MaxValue - Bytes.toLong(key, 11)
    // ... more code that might throw exceptions
    (id, List(x))
  }
}

Scala Stream/Iterator that Generates Excel Column Names?

I would like a Scala Stream/Iterator that generates Excel column names.
e.g. the first would be 'A' second would be 'B' and onwards to 'AA' and beyond.
I have a function (shown below) that builds a name from an index, but it seems wasteful to generate from an index each time when all I'll ever do is generate the names in order. In practice this isn't a problem, so I'm fine using this method; I just thought I'd ask in case anyone has something nicer.
val charArray = ('A' to 'Z').toArray

def indexToExcelColumnName(i: Int): String = {
  if (i < 0) {
    ""
  } else {
    indexToExcelColumnName((i / 26) - 1) + charArray(i % 26)
  }
}
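For context, spot-checking the mapping this function produces:

indexToExcelColumnName(0)   // "A"
indexToExcelColumnName(25)  // "Z"
indexToExcelColumnName(26)  // "AA"
indexToExcelColumnName(27)  // "AB"
indexToExcelColumnName(701) // "ZZ"
indexToExcelColumnName(702) // "AAA"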
Something like this?
class ExcelColumnIterator extends Iterator[String] {
  private var currentColumnName = "A"

  private def nextColumn(str: String): String = str.last match {
    case 'Z' if str.length == 1 => "AA"
    case 'Z' => nextColumn(str.init) + 'A'
    case c => str.init + (c + 1).toChar
  }

  override def hasNext = true

  override def next() = {
    val t = currentColumnName
    currentColumnName = nextColumn(currentColumnName)
    t
  }
}
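Usage sketch (the iterator is infinite, so take only what you need):

val cols = new ExcelColumnIterator
cols.take(28).toList
// => List(A, B, ..., Z, AA, AB)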
First I'd write something generating names of a fixed size.
val namesOfLength: Int => Iterator[String] = {
  case 1 => ('A' to 'Z').iterator.map(_.toString)
  case n => ('A' to 'Z').iterator.flatMap(a => namesOfLength(n - 1).map(a + _))
}
or
def namesOfLength(n: Int) =
  (1 until n).foldLeft[Iterable[String]](('A' to 'Z').view.map(_.toString)) {
    case (it, _) => ('A' to 'Z').view.flatMap(a => it.map(a + _))
  }
Then chain them together.
Iterator.iterate(1)(_ + 1).flatMap(namesOfLength).take(100).toStream.force
Here's a one-liner solution:
Stream.iterate(List(""))(_.flatMap(s => ('A' to 'Z').map(s + _)))
.flatten.tail
If you'd prefer to get an Iterator out, substitute Iterator.iterate for Stream.iterate and drop(1) for tail.
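That substitution would look like:

Iterator.iterate(List(""))(_.flatMap(s => ('A' to 'Z').map(s + _)))
  .flatten.drop(1)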
And here's an alternate solution you might find amusing:
Stream.from(0)
.map(n => Integer.toString(n, 36))
.map(_.toUpperCase)
.filterNot(_.exists(_.isDigit))
😜

Value * is not a member of AnyVal

This is a fold that I wrote and I get this error:
Error:(26, 42) value * is not a member of AnyVal
(candE.intersect(candR), massE * massR)
^
allAssignmentsTable is a List[Map[Set[Candidate[A]],Double]]
val allAssignmentsTable = hypothesis.map(h => {
  allAssignments.map(copySet => {
    if (h.getAssignment.keySet.contains(copySet))
      (copySet -> h.getAssignment(copySet))
    else
      (copySet -> 0.0)
  }).toMap
})
val aggregated = allAssignmentsTable.foldLeft(initialFold) { (res, element) =>
  val allIntersects = element.map {
    case (candE, massE) =>
      res.map {
        case (candR, massR) => (candE.intersect(candR), massE * massR)
      }.toList
  }.toList.flatten
  val normalizer = allIntersects.groupBy(_._1).filter(_._1.size == 0).map {
    case (key, value) => value.foldLeft(0.0)((e, i) => i._2 + e)
  }.head
  allIntersects.groupBy(_._1).map {
    case (key, value) => key -> value.foldLeft(0.0)((e, i) => i._2 + e)
  }
}
If I do this instead: case (candE, massE: Double), then I don't get the compile error, but I get a MatchError at runtime.
The problem that you get here:

val aggregated = allAssignmentsTable.foldLeft(initialFold) { (res, element) =>
  val allIntersects = element.map {
    case (candE, massE) =>
      res.map {
        case (candR, massR) => (candE.intersect(candR), massE * massR)
      }.toList
  }.toList.flatten
is most probably arising from the previous code block:

val allAssignmentsTable = hypothesis.map(h => {
  allAssignments.map(copySet => {
    if (h.getAssignment.keySet.contains(copySet))
      (copySet -> h.getAssignment(copySet))
    else
      (copySet -> 0.0)
  }).toMap
})
My hypothesis is that h.getAssignment(copySet) returns something other than Double, which seems to be confirmed by the error message quoted in the OP. Since the two branches of the if produce h.getAssignment(copySet) and 0.0, the common type inferred for the map's values would then be AnyVal rather than Double. Therefore allAssignmentsTable is, under the covers, probably not a List[Map[Set[Candidate[A]], Double]] but something else, e.g. with AnyVal instead of Double, and that is why the * operator cannot be applied.
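One cheap way to test that hypothesis is to pin the intended type with an explicit annotation, so the compiler reports the mismatch at its true source instead of at the *. A sketch reusing the OP's names:

// If this fails to compile, h.getAssignment(copySet) is not a Double.
val allAssignmentsTable: List[Map[Set[Candidate[A]], Double]] =
  hypothesis.map { h =>
    allAssignments.map { copySet =>
      if (h.getAssignment.keySet.contains(copySet))
        copySet -> (h.getAssignment(copySet): Double) // type ascription forces the check
      else
        copySet -> 0.0
    }.toMap
  }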

Pattern matching in conjunction with filter

Given the following code that I'd like to refactor: I'm only interested in lines matching the first pattern that occurs. Is there a way of shortening this, let's say by using it in conjunction with filter?
With best regards,
Stefan
def processHybridLinks(it: Iterator[String]): Unit = {
  for (line <- it) {
    val lineSplit = lineSplitAndFilter(line)
    lineSplit match {
      case Array(TaggedString(origin), TaggedString(linkName), TaggedString(target), ".") =>
        println("trying to find pages " + origin + " and " + target)
        val originPageOpt = Page.findOne(MongoDBObject("name" -> decodeUrl(origin)))
        val targetPageOpt = Page.findOne(MongoDBObject("name" -> decodeUrl(target)))
        (originPageOpt, targetPageOpt) match {
          case (Some(origin), Some(target)) =>
            createHybridLink(origin, linkName, target)
            Logger.info(" creating Hybrid Link")
          case _ => Logger.info(" couldn't create Hybrid Link")
        }
      case _ =>
    }
  }
}
Have a look at the collect method. It allows you to use a PartialFunction[A, B], defined via an incomplete pattern match, as a sort of combined map and filter:
it.map(lineSplitAndFilter) collect {
  case Array(TaggedString(o), TaggedString(n), TaggedString(t), ".") =>
    (n, Page.findOne(...), Page.findOne(...))
} foreach {
  case (n, Some(o), Some(t)) => ...
  case _ =>
}