What is the best way, if at all, to implement a sorting function with internal state? - scala

What is the best way to implement a sorting function that has an internal state received else where?
Something like:
type Sorter[Item] = (Item, Item) => Boolean
type StringSorter = Sorter[String]
def customSorter : StringSorter = (i1,i2) =>
{
val i1Cnt = itemCountMap.get(i1)
val i2Cnt = itemCountMap.get(i2)
if (i1Cnt==None || i2Cnt==None ) {
i1<i2
} else {
i1Cnt.get<i2Cnt.get
}
}
And here is an example:
val l1 = List("a","b","c")
val itemCountMap = Map("a"->2,"b"->3,"c"1)
l1.sortWith(customSorter)
//The returned list will be ["c","a","b"]
I am pretty new to Scala, and in general, lambda functions are not suppose to have states (right?).
Why you ask? 'cause I am using a generic type lists in spark which later, deep in the code of the executors, I want to analyze based on a specific order, and this order may depend on some static list, and I also want to control this order function.

First, this i1Cnt==null will never be true because get on Map never returns null, it returns an Option WHICH IS NOT A NULL, it is very different.
Second, there is no problem with state in a lambda, there is a problem with mutable shared state (and not only in a lambda, but everywhere).
Third, your function would be better if it receives the Map to use instead of relying on a global variable.
Fourth, here is the fixed code.
type Sorter[Item] = (Item, Item) => Boolean
def customSorter(map: Map[String, Int]): Sorter[String] = { (s1, s2) =>
(map.get(s1), map.get(s2)) match {
case (Some(i1), Some(i2)) => i1 < i2
case _ => s1 < s2
}
}
You can see it running here)

Related

How to functionally handle a logging side effect

I want to log in the event that a record doesn't have an adjoining record. Is there a purely functional way to do this? One that separates the side effect from the data transformation?
Here's an example of what I need to do:
val records: Seq[Record] = Seq(record1, record2, ...)
val accountsMap: Map[Long, Account] = Map(record1.id -> account1, ...)
def withAccount(accountsMap: Map[Long, Account])(r: Record): (Record, Option[Account]) = {
(r, accountsMap.get(r.id))
}
def handleNoAccounts(tuple: (Record, Option[Account]) = {
val (r, a) = tuple
if (a.isEmpty) logger.error(s"no account for ${record.id}")
tuple
}
def toRichAccount(tuple: (Record, Option[Account]) = {
val (r, a) = tuple
a.map(acct => RichAccount(r, acct))
}
records
.map(withAccount(accountsMap))
.map(handleNoAccounts) // if no account is found, log
.flatMap(toRichAccount)
So there are multiple issues with this approach that I think make it less than optimal.
The tuple return type is clumsy. I have to destructure the tuple in both of the latter two functions.
The logging function has to handle the logging and then return the tuple with no changes. It feels weird that this is passed to .map even though no transformation is taking place -- maybe there is a better way to get this side effect.
Is there a functional way to clean this up?
I could be wrong (I often am) but I think this does everything that's required.
records
.flatMap(r =>
accountsMap.get(r.id).fold{
logger.error(s"no account for ${r.id}")
Option.empty[RichAccount]
}{a => Some(RichAccount(r,a))})
If you're using scala 2.13 or newer you could use tapEach, which takes function A => Unit to apply side effect on every element of function and then passes collection unchanged:
//you no longer need to return tuple in side-effecting function
def handleNoAccounts(tuple: (Record, Option[Account]): Unit = {
val (r, a) = tuple
if (a.isEmpty) logger.error(s"no account for ${record.id}")
}
records
.map(withAccount(accountsMap))
.tapEach(handleNoAccounts) // if no account is found, log
.flatMap(toRichAccount)
In case you're using older Scala, you could provide extension method (updated according to Levi's Ramsey suggestion):
implicit class SeqOps[A](s: Seq[A]) {
def tapEach(f: A => Unit): Seq[A] = {
s.foreach(f)
s
}
}

Scala: Find and update one element in a list

I am trying to find an elegant way to do:
val l = List(1,2,3)
val (item, idx) = l.zipWithIndex.find(predicate)
val updatedItem = updating(item)
l.update(idx, updatedItem)
Can I do all in one operation ? Find the item, if it exist replace with updated value and keep it in place.
I could do:
l.map{ i =>
if (predicate(i)) {
updating(i)
} else {
i
}
}
but that's pretty ugly.
The other complexity is the fact that I want to update only the first element which match predicate .
Edit: Attempt:
implicit class UpdateList[A](l: List[A]) {
def filterMap(p: A => Boolean)(update: A => A): List[A] = {
l.map(a => if (p(a)) update(a) else a)
}
def updateFirst(p: A => Boolean)(update: A => A): List[A] = {
val found = l.zipWithIndex.find { case (item, _) => p(item) }
found match {
case Some((item, idx)) => l.updated(idx, update(item))
case None => l
}
}
}
I don't know any way to make this in one pass of the collection without using a mutable variable. With two passes you can do it using foldLeft as in:
def updateFirst[A](list:List[A])(predicate:A => Boolean, newValue:A):List[A] = {
list.foldLeft((List.empty[A], predicate))((acc, it) => {acc match {
case (nl,pr) => if (pr(it)) (newValue::nl, _ => false) else (it::nl, pr)
}})._1.reverse
}
The idea is that foldLeft allows passing additional data through iteration. In this particular implementation I change the predicate to the fixed one that always returns false. Unfortunately you can't build a List from the head in an efficient way so this requires another pass for reverse.
I believe it is obvious how to do it using a combination of map and var
Note: performance of the List.map is the same as of a single pass over the list only because internally the standard library is mutable. Particularly the cons class :: is declared as
final case class ::[B](override val head: B, private[scala] var tl: List[B]) extends List[B] {
so tl is actually a var and this is exploited by the map implementation to build a list from the head in an efficient way. The field is private[scala] so you can't use the same trick from outside of the standard library. Unfortunately I don't see any other API call that allows to use this feature to reduce the complexity of your problem to a single pass.
You can avoid .zipWithIndex() by using .indexWhere().
To improve complexity, use Vector so that l(idx) becomes effectively constant time.
val l = Vector(1,2,3)
val idx = l.indexWhere(predicate)
val updatedItem = updating(l(idx))
l.updated(idx, updatedItem)
Reason for using scala.collection.immutable.Vector rather than List:
Scala's List is a linked list, which means data are access in O(n) time. Scala's Vector is indexed, meaning data can be read from any point in effectively constant time.
You may also consider mutable collections if you're modifying just one element in a very large collection.
https://docs.scala-lang.org/overviews/collections/performance-characteristics.html

Functional Breadth First Search in Scala with the State Monad

I'm trying to implement a functional Breadth First Search in Scala to compute the distances between a given node and all the other nodes in an unweighted graph. I've used a State Monad for this with the signature as :-
case class State[S,A](run:S => (A,S))
Other functions such as map, flatMap, sequence, modify etc etc are similar to what you'd find inside a standard State Monad.
Here's the code :-
case class Node(label: Int)
case class BfsState(q: Queue[Node], nodesList: List[Node], discovered: Set[Node], distanceFromSrc: Map[Node, Int]) {
val isTerminated = q.isEmpty
}
case class Graph(adjList: Map[Node, List[Node]]) {
def bfs(src: Node): (List[Node], Map[Node, Int]) = {
val initialBfsState = BfsState(Queue(src), List(src), Set(src), Map(src -> 0))
val output = bfsComp(initialBfsState)
(output.nodesList,output.distanceFromSrc)
}
#tailrec
private def bfsComp(currState:BfsState): BfsState = {
if (currState.isTerminated) currState
else bfsComp(searchNode.run(currState)._2)
}
private def searchNode: State[BfsState, Unit] = for {
node <- State[BfsState, Node](s => {
val (n, newQ) = s.q.dequeue
(n, s.copy(q = newQ))
})
s <- get
_ <- sequence(adjList(node).filter(!s.discovered(_)).map(n => {
modify[BfsState](s => {
s.copy(s.q.enqueue(n), n :: s.nodesList, s.discovered + n, s.distanceFromSrc + (n -> (s.distanceFromSrc(node) + 1)))
})
}))
} yield ()
}
Please can you advice on :-
Should the State Transition on dequeue in the searchNode function be a member of BfsState itself?
How do I make this code more performant/concise/readable?
First off, I suggest moving all the private defs related to bfs into bfs itself. This is the convention for methods that are solely used to implement another.
Second, I suggest simply not using State for this matter. State (like most monads) is about composition. It is useful when you have many things that all need access to the same global state. In this case, BfsState is specialized to bfs, will likely never be used anywhere else (it might be a good idea to move the class into bfs too), and the State itself is always run, so the outer world never sees it. (In many cases, this is fine, but here the scope is too small for State to be useful.) It'd be much cleaner to pull the logic of searchNode into bfsComp itself.
Third, I don't understand why you need both nodesList and discovered, when you can just call _.toList on discovered once you've done your computation. I've left it in in my reimplementation, though, in case there's more to this code that you haven't displayed.
def bfsComp(old: BfsState): BfsState = {
if(old.q.isEmpty) old // You don't need isTerminated, I think
else {
val (currNode, newQ) = old.q.dequeue
val newState = old.copy(q = newQ)
adjList(curNode)
.filterNot(s.discovered) // Set[T] <: T => Boolean and filterNot means you don't need to write !s.discovered(_)
.foldLeft(newState) { case (BfsState(q, nodes, discovered, distance), adjNode) =>
BfsState(
q.enqueue(adjNode),
adjNode :: nodes,
discovered + adjNode,
distance + (adjNode -> (distance(currNode) + 1)
)
}
}
}
def bfs(src: Node): (List[Node], Map[Node, Int]) = {
// I suggest moving BfsState and bfsComp into this method
val output = bfsComp(BfsState(Queue(src), List(src), Set(src), Map(src -> 0)))
(output.nodesList, output.distanceFromSrc)
// Could get rid of nodesList and say output.discovered.toList
}
In the event that you think you do have a good reason for using State here, here are my thoughts.
You use def searchNode. The point of a State is that it is pure and immutable, so it should be a val, or else you reconstruct the same State every use.
You write:
node <- State[BfsState, Node](s => {
val (n, newQ) = s.q.dequeue
(n, s.copy(q = newQ))
})
First off, Scala's syntax was designed so that you don't need to have both a () and {} surrounding an anonymous function:
node <- State[BfsState, Node] { s =>
// ...
}
Second, this doesn't look quite right to me. One benefit of using for-syntax is that the anonymous functions are hidden from you and there is minimal indentation. I'd just write it out
oldState <- get
(node, newQ) = oldState.q.dequeue
newState = oldState.copy(q = newQ)
Footnote: would it make sense to make Node an inner class of Graph? Just a suggestion.

Scala - How to group a list of tuples without pattern matching?

Consider the following structure (in reality the structure is a bit more complex):
case class A(id:String,name:String) {
override def equals(obj: Any):Boolean = {
if (obj == null || !obj.isInstanceOf[A]) return false
val a = obj.asInstanceOf[A]
name == a.name
}
override def hashCode() = {
31 + name.hashCode
}
}
val a1 = A("1","a")
val a2 = A("2","a")
val a3 = A("3","b")
val list = List((a1,a2),(a1,a3),(a2,a3))
Now let's say I want to group all tuples with equal A's. I could implement it like this
list.groupBy {
case (x,y) => (x,y)
}
But, I don't like to use pattern matching here, because it's not adding anything here. I want something simple, like this:
list.groupBy(_)
Unfortunately, this doesn't compile. Not even when I do:
list.groupBy[(A,A)](_)
Any suggestions how to simplify my code?
list.groupBy { case (x,y) => (x,y) }
Here you are deconstructing the tuple into its two constituent parts, just to immediately reassemble them exactly like they were before. In other words: you aren't actually doing anything useful. The input and output are identical. This is just the same as
list.groupBy { t => t }
which is of course just the identity function, which Scala helpfully provides for us:
list groupBy identity
If you want to group the elements of a list accoding to their own equals method, you only need to pass the identity function to groupBy:
list.groupBy(x=>x)
It's not enough to write list.groupBy(_) because of the scope of _, that is it would be desugared to x => list.groupBy(x), which is of course not what you want.

Filling a Scala immutable Map from a database table

I have a SQL database table with the following structure:
create table category_value (
category varchar(25),
property varchar(25)
);
I want to read this into a Scala Map[String, Set[String]] where each entry in the map is a set of all of the property values that are in the same category.
I would like to do it in a "functional" style with no mutable data (other than the database result set).
Following on the Clojure loop construct, here is what I have come up with:
def fillMap(statement: java.sql.Statement): Map[String, Set[String]] = {
val resultSet = statement.executeQuery("select category, property from category_value")
#tailrec
def loop(m: Map[String, Set[String]]): Map[String, Set[String]] = {
if (resultSet.next) {
val category = resultSet.getString("category")
val property = resultSet.getString("property")
loop(m + (category -> m.getOrElse(category, Set.empty)))
} else m
}
loop(Map.empty)
}
Is there a better way to do this, without using mutable data structures?
If you like, you could try something around
def fillMap(statement: java.sql.Statement): Map[String, Set[String]] = {
val resultSet = statement.executeQuery("select category, property from category_value")
Iterator.continually((resultSet, resultSet.next)).takeWhile(_._2).map(_._1).map{ res =>
val category = res.getString("category")
val property = res.getString("property")
(category, property)
}.toIterable.groupBy(_._1).mapValues(_.map(_._2).toSet)
}
Untested, because I don’t have a proper sql.Statement. And the groupBy part might need some more love to look nice.
Edit: Added the requested changes.
There are two parts to this problem.
Getting the data out of the database and into a list of rows.
I would use a Spring SimpleJdbcOperations for the database access, so that things at least appear functional, even though the ResultSet is being changed behind the scenes.
First, some a simple conversion to let us use a closure to map each row:
implicit def rowMapper[T<:AnyRef](func: (ResultSet)=>T) =
new ParameterizedRowMapper[T]{
override def mapRow(rs:ResultSet, row:Int):T = func(rs)
}
Then let's define a data structure to store the results. (You could use a tuple, but defining my own case class has advantage of being just a little bit clearer regarding the names of things.)
case class CategoryValue(category:String, property:String)
Now select from the database
val db:SimpleJdbcOperations = //get this somehow
val resultList:java.util.List[CategoryValue] =
db.query("select category, property from category_value",
{ rs:ResultSet => CategoryValue(rs.getString(1),rs.getString(2)) } )
Converting the data from a list of rows into the format that you actually want
import scala.collection.JavaConversions._
val result:Map[String,Set[String]] =
resultList.groupBy(_.category).mapValues(_.map(_.property).toSet)
(You can omit the type annotations. I've included them to make it clear what's going on.)
Builders are built for this purpose. Get one via the desired collection type companion, e.g. HashMap.newBuilder[String, Set[String]].
This solution is basically the same as my other solution, but it doesn't use Spring, and the logic for converting a ResultSet to some sort of list is simpler than Debilski's solution.
def streamFromResultSet[T](rs:ResultSet)(func: ResultSet => T):Stream[T] = {
if (rs.next())
func(rs) #:: streamFromResultSet(rs)(func)
else
rs.close()
Stream.empty
}
def fillMap(statement:java.sql.Statement):Map[String,Set[String]] = {
case class CategoryValue(category:String, property:String)
val resultSet = statement.executeQuery("""
select category, property from category_value
""")
val queryResult = streamFromResultSet(resultSet){rs =>
CategoryValue(rs.getString(1),rs.getString(2))
}
queryResult.groupBy(_.category).mapValues(_.map(_.property).toSet)
}
There is only one approach I can think of that does not include either mutable state or extensive copying*. It is actually a very basic technique I learnt in my first term studying CS. Here goes, abstracting from the database stuff:
def empty[K,V](k : K) : Option[V] = None
def add[K,V](m : K => Option[V])(k : K, v : V) : K => Option[V] = q => {
if ( k == q ) {
Some(v)
}
else {
m(q)
}
}
def build[K,V](input : TraversableOnce[(K,V)]) : K => Option[V] = {
input.foldLeft(empty[K,V]_)((m,i) => add(m)(i._1, i._2))
}
Usage example:
val map = build(List(("a",1),("b",2)))
println("a " + map("a"))
println("b " + map("b"))
println("c " + map("c"))
> a Some(1)
> b Some(2)
> c None
Of course, the resulting function does not have type Map (nor any of its benefits) and has linear lookup costs. I guess you could implement something in a similar way that mimicks simple search trees.
(*) I am talking concepts here. In reality, things like value sharing might enable e.g. mutable list constructions without memory overhead.