I have an expensive function that I want to run as few times as possible, with the following requirements:
I have several input values to try.
If the function returns a value below a given threshold, I don't want to try the other inputs.
If no result is below the threshold, I want to take the result with the minimal output.
I could not find a nice solution using Iterator's takeWhile/dropWhile, because I want the first matching element to be included. I ended up with the following solution:
import scala.collection.mutable
import scala.util.control.Breaks.{break, breakable}
val pseudoResult = Map("a" -> 0.6,"b" -> 0.2, "c" -> 1.0)
def expensiveFunc(s:String) : Double = {
pseudoResult(s)
}
val inputsToTry = Seq("a","b","c")
val inputIt = inputsToTry.iterator
val results = mutable.ArrayBuffer.empty[(String, Double)]
val earlyAbort = 0.5 // threshold
breakable {
while (inputIt.hasNext) {
val name = inputIt.next()
val res = expensiveFunc(name)
results += Tuple2(name,res)
if (res<earlyAbort) break()
}
}
println(results) // ArrayBuffer((a,0.6), (b,0.2))
val (name, bestResult) = results.minBy(_._2) // (b, 0.2)
If I set val earlyAbort = 0.1, the result should still be (b, 0.2), without evaluating all the cases again.
You can use a Stream to achieve this. A Stream is a lazy collection that evaluates its elements on demand.
See the Scala Stream documentation.
You only need to do this:
val pseudoResult = Map("a" -> 0.6,"b" -> 0.2, "c" -> 1.0)
val earlyAbort = 0.5
def expensiveFunc(s: String): Double = {
println(s"Evaluating for $s")
pseudoResult(s)
}
val inputsToTry = Seq("a","b","c")
val results = inputsToTry.toStream.map(input => input -> expensiveFunc(input))
val finalResult = results.find { case (k, res) => res < earlyAbort }.getOrElse(results.minBy(_._2))
If find returns nothing, you can reuse the same stream to compute the minimum without evaluating the function again. This works because of memoization:
The Stream class also employs memoization such that previously computed values are converted from Stream elements to concrete values of type A
Note that this code will fail if the original collection is empty. To support empty collections, replace minBy with sortBy(_._2).headOption and getOrElse with orElse:
val finalResultOpt = results.find { case (k, res) => res < earlyAbort }.orElse(results.sortBy(_._2).headOption)
And the output for this is:
Evaluating for a
Evaluating for b
finalResult: (String, Double) = (b,0.2)
finalResultOpt: Option[(String, Double)] = Some((b,0.2))
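On Scala 2.13 and later, Stream is deprecated in favour of LazyList, which memoizes its elements in the same way. The following is a minimal sketch of the same idea, assuming Scala 2.13+. The helper name firstBelowOrMin and the calls counter are made up for illustration and are not part of the answer above.

```scala
val pseudoResult = Map("a" -> 0.6, "b" -> 0.2, "c" -> 1.0)

// Illustrative counter so the number of evaluations can be observed.
var calls = 0
def expensiveFunc(s: String): Double = { calls += 1; pseudoResult(s) }

// Hypothetical helper: the first result below `threshold`, otherwise the
// smallest result, otherwise None for empty input.
def firstBelowOrMin(inputs: Seq[String], threshold: Double): Option[(String, Double)] = {
  // LazyList computes each element at most once and remembers it.
  val results = inputs.to(LazyList).map(in => in -> expensiveFunc(in))
  results
    .find(_._2 < threshold)
    .orElse(results.reduceOption((x, y) => if (y._2 < x._2) y else x))
}
```

With the sample data, a threshold of 0.5 evaluates only a and b. A threshold of 0.1 evaluates all three inputs once and then reuses the memoized values to pick the minimum.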
The clearest, simplest thing to do is fold over the input, passing forward only the current best result.
val inputIt :Iterator[String] = inputsToTry.iterator
val earlyAbort = 0.5 // threshold
inputIt.foldLeft(("",Double.MaxValue)){ case (low,name) =>
if (low._2 < earlyAbort) low
else Seq(low, (name, expensiveFunc(name))).minBy(_._2)
}
//res0: (String, Double) = (b,0.2)
It calls expensiveFunc() only as many times as needed, but it does walk through the entire input iterator. If that's still too onerous (lots of input), then I'd go with a tail-recursive method.
val inputIt :Iterator[String] = inputsToTry.iterator
val earlyAbort = 0.5 // threshold
def bestMin(low :(String,Double) = ("",Double.MaxValue)) :(String,Double) = {
if (inputIt.hasNext) {
val name = inputIt.next()
val res = expensiveFunc(name)
if (res < earlyAbort) (name, res)
else if (res < low._2) bestMin((name,res))
else bestMin(low)
} else low
}
bestMin() //res0: (String, Double) = (b,0.2)
Try the following, which folds over the inputs and returns early as soon as a result falls below the threshold:
val pseudoResult = Map("a" -> 0.6, "b" -> 0.2, "c" -> 1.0)
def expensiveFunc(s: String): Double = {
println(s"executed for ${s}")
pseudoResult(s)
}
val inputsToTry = Seq("a", "b", "c")
val earlyAbort = 0.5 // threshold
def doIt(): List[(String, Double)] = {
inputsToTry.foldLeft(List[(String, Double)]()) {
case (n, name) =>
val res = expensiveFunc(name)
if(res < earlyAbort) {
return n++List((name, res))
}
n++List((name, res))
}
}
val (name, bestResult) = doIt().minBy(_._2)
println(name)
println(bestResult)
The output:
executed for a
executed for b
b
0.2
As you can see, only a and b are evaluated, and not c.
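Non-local return from inside a closure, as used in doIt above, is deprecated in Scala 3. One way to avoid it is a small takeUntil helper on Iterator. This is only a sketch: takeUntil is not a standard library method, and the names costly, evaluated, sampleResults and best are illustrative.

```scala
// Hypothetical helper (not in the standard library): like takeWhile, but the
// first element satisfying `stop` is still emitted before the iterator ends.
def takeUntil[A](it: Iterator[A])(stop: A => Boolean): Iterator[A] = new Iterator[A] {
  private var done = false
  def hasNext: Boolean = !done && it.hasNext
  def next(): A = {
    val a = it.next()
    if (stop(a)) done = true
    a
  }
}

val sampleResults = Map("a" -> 0.6, "b" -> 0.2, "c" -> 1.0)

// Records which inputs were actually evaluated.
var evaluated = List.empty[String]
def costly(s: String): Double = { evaluated = evaluated :+ s; sampleResults(s) }

// Evaluates lazily, stops after the first result below `threshold`,
// and keeps the smallest result seen so far.
def best(inputs: Seq[String], threshold: Double): Option[(String, Double)] =
  takeUntil(inputs.iterator.map(n => n -> costly(n)))(_._2 < threshold)
    .reduceOption((x, y) => if (y._2 < x._2) y else x)
```

Because the iterator is consumed only once, nothing is memoized and no intermediate list is built.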
This is one of the use-cases for tail-recursion:
import scala.annotation.tailrec
val pseudoResult = Map("a" -> 0.6,"b" -> 0.2, "c" -> 1.0)
def expensiveFunc(s:String) : Double = {
pseudoResult(s)
}
val inputsToTry = Seq("a","b","c")
val earlyAbort = 0.5 // threshold
@tailrec
def f(s: Seq[String], result: Map[String, Double] = Map()): Map[String, Double] = s match {
case Nil => result
case h::t =>
val expensiveCalculation = expensiveFunc(h)
val intermediateResult = result + (h -> expensiveCalculation)
if(expensiveCalculation < earlyAbort) {
intermediateResult
} else {
f(t, intermediateResult)
}
}
val result = f(inputsToTry)
println(result) // Map(a -> 0.6, b -> 0.2)
val (name, bestResult) = result.minBy(_._2) // ("b", 0.2); reuses result instead of calling f again
If you implement takeUntil and use it, you'd still have to go through the list once more to get the lowest one if you don't find what you are looking for. Probably a better approach would be to have a function that combines find with reduceOption, returning early if something is found or else returning the result of reducing the collection to a single item (in your case, finding the smallest one).
The result is comparable with what you could achieve using a Stream, as highlighted in a previous reply, but avoids leveraging memoization, which can be cumbersome for very large collections.
A possible implementation could be the following:
import scala.annotation.tailrec
def findOrElse[A](it: Iterator[A])(predicate: A => Boolean,
orElse: (A, A) => A): Option[A] = {
@tailrec
def loop(elseValue: Option[A]): Option[A] = {
if (!it.hasNext) elseValue
else {
val next = it.next()
if (predicate(next)) Some(next)
else loop(Option(elseValue.fold(next)(orElse(_, next))))
}
}
loop(None)
}
Let's add our inputs to test this:
def f1(in: String): Double = {
println("calling f1")
Map("a" -> 0.6, "b" -> 0.2, "c" -> 1.0, "d" -> 0.8)(in)
}
def f2(in: String): Double = {
println("calling f2")
Map("a" -> 0.7, "b" -> 0.6, "c" -> 1.0, "d" -> 0.8)(in)
}
val inputs = Seq("a", "b", "c", "d")
As well as our helpers:
def apply[IN, OUT](in: IN, f: IN => OUT): (IN, OUT) =
in -> f(in)
def threshold[A](a: (A, Double)): Boolean =
a._2 < 0.5
def compare[A](a: (A, Double), b: (A, Double)): (A, Double) =
if (a._2 < b._2) a else b
We can now run this and see how it goes:
val r1 = findOrElse(inputs.iterator.map(apply(_, f1)))(threshold, compare)
val r2 = findOrElse(inputs.iterator.map(apply(_, f2)))(threshold, compare)
val r3 = findOrElse(Map.empty[String, Double].iterator)(threshold, compare)
r1 is Some((b, 0.2)), r2 is Some((b, 0.6)), and r3 is (reasonably) None. In the first case, since we use a lazy iterator and terminate early, we only invoke f1 twice.
You can have a look at the results and can play with this code here on Scastie.
For example, I have a Map[Int,String] like
val map = Map(1 -> "a", 2 -> "b", 3 -> "c", 5 -> "d", 9 -> "e", 100 -> "z")
If the given key is 2, then "b" should be returned.
If the given key is 50, then "e" and "z" should be returned.
If the given key is 0, then "a" should be returned.
In other words, if the key exists in the Map the corresponding value should be returned. Otherwise the values of the closest smaller and larger keys should be returned (in the case no other key is smaller only the value of the closest larger key should be returned and vice versa).
How can this be accomplished?
Map doesn't preserve order, so I would suggest creating a method that:
converts the Map into a TreeMap
generates the lower/upper Map entries as Options in a list using to(key).lastOption and from(key).headOption, respectively
flattens the list and extracts the Map values:
Sample code as follows:
val map = Map(1->"a", 2->"b", 100->"z", 9->"e", 3->"c", 5->"d")
def closestValues(m: Map[Int, String], key: Int): Seq[String] = {
import scala.collection.immutable.TreeMap
val tm = TreeMap(m.toSeq: _*)
Seq( tm.to(key).lastOption, tm.from(key).headOption ).
flatten.distinct.map{ case (k, v) => v }
}
closestValues(map, 0)
// res1: Seq[String] = List(a)
closestValues(map, 2)
// res2: Seq[String] = List(b)
closestValues(map, 50)
// res3: Seq[String] = List(e, z)
closestValues(map, 101)
// res4: Seq[String] = List(z)
UPDATE:
Starting Scala 2.13, methods to and from for TreeMap are replaced with rangeTo and rangeFrom, respectively.
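For reference, here is a sketch of the same method written against the Scala 2.13+ API, using rangeTo and rangeFrom as named above and TreeMap.from to build the sorted map. The name closestValues213 is only illustrative.

```scala
import scala.collection.immutable.TreeMap

// Same logic as closestValues, assuming Scala 2.13 or later.
def closestValues213(m: Map[Int, String], key: Int): Seq[String] = {
  val tm = TreeMap.from(m)
  // rangeTo(key) keeps keys <= key; rangeFrom(key) keeps keys >= key.
  // If the key exists, both sides yield the same entry, which distinct collapses.
  Seq(tm.rangeTo(key).lastOption, tm.rangeFrom(key).headOption)
    .flatten.distinct.map(_._2)
}

val sample = Map(1 -> "a", 2 -> "b", 100 -> "z", 9 -> "e", 3 -> "c", 5 -> "d")
```

If many lookups are made against the same map, building the TreeMap once outside the method avoids re-sorting on every call.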
My 2-cents worth.
def getClose[K](m: Map[Int,K], k: Int): Seq[K] =
if (m.get(k).nonEmpty) Seq(m(k))
else {
val (below,above) = m.keys.partition(_ < k)
Seq( if (below.isEmpty) None else Some(below.max)
, if (above.isEmpty) None else Some(above.min)
).flatten.map(m)
}
I would recommend first converting the Map to a SortedMap since the order of the keys needs to be taken into account.
val map = Map(1->"a",2->"b",3->"c",5->"d",9->"e",100->"z")
val sortedMap = SortedMap[Int, String]() ++ map
After that, use the following method to get the closest values. The result is returned as a List.
def getClosestValue(num: Int) = {
if (sortedMap.contains(num)) {
List(sortedMap(num))
} else {
lazy val larger = sortedMap.filterKeys(_ > num)
lazy val lower = sortedMap.filterKeys(_ < num)
if (larger.isEmpty) {
List(sortedMap.last._2)
} else if (lower.isEmpty) {
List(sortedMap.head._2)
} else {
List(lower.last._2, larger.head._2)
}
}
}
Testing it with the following values:
println(getClosestValue(2))
println(getClosestValue(50))
println(getClosestValue(0))
println(getClosestValue(101))
will give
List(b)
List(e, z)
List(a)
List(z)
This is not an efficient solution, but you can do something like the following:
val map =Map(1->"a",2->"b",3->"c",5->"d",9->"e",100->"z")
val keyset = map.keySet
def getNearestValues(key: Int) : Array[String] = {
if(keyset.contains(key)) Array(map(key))
else{
var array = Array.empty[String]
val less = keyset.filter(_ < key)
if(!less.isEmpty) array = array ++ Array(map(less.toList.sortWith(_ < _).last))
val greater = keyset.filter(_ > key)
if(!greater.isEmpty) array = array ++ Array(map(greater.toList.sortWith(_ < _).head))
array
}
}
A slightly more functional version:
val map =Map(1->"a",2->"b",3->"c",5->"d",9->"e",100->"z")
val keyset = map.keySet
def getNearestValues(key: Int) : Array[String] = keyset.contains(key) match {
case true => Array(map(key))
case false => {
val (lower, upper) = keyset.toList.sortWith(_ < _).span(x => x < key)
val lowArray = if(lower.isEmpty) Array.empty[String] else Array(map(lower.last))
val upperArray = if(upper.isEmpty) Array.empty[String] else Array(map(upper.head))
lowArray ++ upperArray
}
}
getNearestValues(0) should return Array(a), getNearestValues(50) should return Array(e, z), and getNearestValues(9) should return Array(e).
You can solve this problem with lower complexity than any of the solutions proposed above. So, if performance is critical, check this answer.
Another Scala solution
val m = Map(1 -> "a", 2 -> "b", 3 -> "c", 5 -> "d", 9 -> "e", 100 -> "z")
List(0, 2, 50, 101).foreach { i => {
val inp = i
val (mn, mx) = if (m.get(inp).nonEmpty) (Map(inp -> m(inp)), Map(inp -> m(inp))) else m.partition(x => x._1 > inp)
(mn, mx) match {
case (x, y) if y.isEmpty => println(m(mn.keys.min))
case (x, y) if x.isEmpty => println(m(mx.keys.max))
case (x, y) if y == x => println(m(inp))
case (x, y) => println(m(mn.keys.min), m(mx.keys.max))
}
}
}
Results:
a
b
(z,e)
z
I have a PartialFunction[String,String] and a Map[String,String].
I want to apply the partial function to the map values and collect the entries for which it is applicable.
i.e. given:
val m = Map( "a"->"1", "b"->"2" )
val pf : PartialFunction[String,String] = {
case "1" => "11"
}
I'd like to somehow combine _._2 with pf and be able to do this:
val composedPf : PartialFunction[(String,String),(String,String)] = /*someMagicalOperator(_._2,pf)*/
val collected : Map[String,String] = m.collect( composedPf )
// collected should be Map( "a"->"11" )
So far, the best I've got is this:
val composedPf = new PartialFunction[(String,String),(String,String)]{
override def isDefinedAt(x: (String, String)): Boolean = pf.isDefinedAt(x._2)
override def apply(v1: (String, String)): (String,String) = v1._1 -> pf(v1._2)
}
Is there a better way?
Here is the magical operator:
val composedPf: PartialFunction[(String, String), (String, String)] =
{case (k, v) if pf.isDefinedAt(v) => (k, pf(v))}
Another option, without creating a composed function, is this:
m.filter(e => pf.isDefinedAt(e._2)).mapValues(pf)
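A reusable variant can be built from the standard library's PartialFunction.lift and Function.unlift. The sketch below is one possible shape for the "magical operator"; the helper name onValues is hypothetical and not a library function.

```scala
// Hypothetical helper: turns a partial function on values into a partial
// function on (key, value) pairs, leaving the key untouched.
def onValues[K, A, B](pf: PartialFunction[A, B]): PartialFunction[(K, A), (K, B)] =
  Function.unlift((kv: (K, A)) => pf.lift(kv._2).map(b => kv._1 -> b))

val m = Map("a" -> "1", "b" -> "2")
val pf: PartialFunction[String, String] = { case "1" => "11" }

val composedPf = onValues[String, String, String](pf)
val collected: Map[String, String] = m.collect(composedPf)
```

The type arguments are spelled out here to keep type inference simple; the helper works for any key and value types.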
There is a function in Scalaz, that does exactly that: second
scala> m collect pf.second
res0: scala.collection.immutable.Map[String,String] = Map(a -> 11)
This works, because PartialFunction is an instance of Arrow (a generalized function) typeclass, and second is one of the common operations defined for arrows.
In Clojure, the diff function can be applied to maps. That doesn't seem to be the case in Scala. Is anyone aware of something in Scala that makes it easy to obtain what Clojure's diff returns when it is applied to maps?
Here's the Clojure diff function explained for reference.
http://clojuredocs.org/clojure_core/clojure.data/diff
This is equivalent to Clojure's diff:
import collection.generic.CanBuildFrom
def diff[T, Col](x: Col with TraversableOnce[T], y: Col with TraversableOnce[T])
(implicit cbf: CanBuildFrom[Col, T, Col]): (Col, Col, Col) = {
val xs = x.toSet
val ys = y.toSet
def convert(s: Set[T]) = (cbf(x) ++= s).result
(convert(xs diff ys), convert(ys diff xs), convert(xs intersect ys))
}
It can operate on any kind of TraversableOnce and will return results with the same type as its parameters:
scala> diff(Map(1 -> 2), Map(1 -> 2))
res35: (scala.collection.immutable.Map[Int,Int], scala.collection.immutable.Map[Int,Int], scala.collection.immutable.Map[Int,Int]) = (Map(),Map(),Map(1 -> 2))
As others have said there isn't something exactly like that, but you can build it anyways. Here's my attempt that is added on as a companion to the map class. It produces the same result as the clojure diff example.
object MapsDemo extends App{
implicit class MapCompanionOps[A,B](val a: Map[A,B]) extends AnyVal {
def diff(b: Map[A,B]): (Map[A,B],Map[A,B],Map[A,B]) = {
(a.filter(p => !b.exists(_ == p)), //things-only-in-a
b.filter(p => !a.exists(_ == p)), //things-only-in-b
a.flatMap(p => b.find(_ == p) )) //things-in-both
}
}
val uno = Map("same" ->"same","different" -> "one")
val dos = Map("same" ->"same","different" -> "two","onlyhere"->"whatever")
println(uno diff dos) //(Map(different -> one),Map(different -> two, onlyhere -> whatever),Map(same -> same))
println( Map("a"->1).diff(Map("a"->1,"b"->2)) ) //(Map(),Map(b -> 2),Map(a -> 1))
}
You can achieve that by converting the maps to lists first. For example:
scala> val a = Map(1->2, 2-> 3).toList
scala> val b = Map(1->2, 3-> 4).toList
scala> val closureDiff = List(a.diff(b), b.diff(a), a.intersect(b))
closureDiff: List[List[(Int, Int)]] = List(List((2,3)), List((3,4)), List((1,2)))
There is no function in the standard library that does exactly what you need. However, an unoptimized version can easily be implemented like this (sorry for the "span" mistake in my first attempt):
def diff[K,V](m1: Map[K,V], m2: Map[K,V]): (Map[K,V],Map[K,V],Map[K,V]) = {
val (both,left) = m1.partition({case (k,v) => m2.get(k) == Some(v) })
val right = m2.filter({case (k,v) => both.get(k) != Some(v) })
(both,left,right)
}
Also, a map can be converted to a set with a single method (toSet), and then you can use the intersect, union and diff operations of Set.
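The set-based approach can be sketched as follows. The function name diffViaSets is made up for illustration, and the tuple order follows Clojure's diff (only in first, only in second, in both).

```scala
// Sketch: treat each map as a set of (key, value) entries and use Set operations.
def diffViaSets[K, V](m1: Map[K, V], m2: Map[K, V]): (Map[K, V], Map[K, V], Map[K, V]) = {
  val s1 = m1.toSet
  val s2 = m2.toSet
  // Entries only in m1, entries only in m2, entries common to both.
  ((s1 diff s2).toMap, (s2 diff s1).toMap, (s1 intersect s2).toMap)
}

val uno = Map("same" -> "same", "different" -> "one")
val dos = Map("same" -> "same", "different" -> "two", "onlyhere" -> "whatever")
val (onlyUno, onlyDos, both) = diffViaSets(uno, dos)
```

Since keys are unique within each input map, each of the three entry sets converts back to a map without losing entries.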
I think this might be a common operation. So maybe it's inside the API but I can't find it. Also I'm interested in an efficient functional/simple solution if not.
Given a sequence of tuples ("a" -> 1, "b" ->2, "c" -> 3) I want to turn it into a map. That's easy using TraversableOnce.toMap. But I want to fail this construction if the resulting map "would contain a contradiction", i.e. different values assigned to the same key. Like in the sequence ("a" -> 1, "a" -> 2). But duplicates shall be allowed.
Currently I have this (very imperative) code:
def buildMap[A,B](in: TraversableOnce[(A,B)]): Option[Map[A,B]] = {
val map = new scala.collection.mutable.HashMap[A,B]
val it = in.toIterator
var fail = false
while(it.hasNext && !fail){ // stop at the first contradiction so that fail is not overwritten
val next = it.next()
val old = map.put(next._1, next._2)
fail = old.isDefined && old.get != next._2
}
if(fail) None else Some(map.toMap)
}
Side Question
Is the final toMap really necessary? I get a type error when omitting it, but I think it should work. The implementation of toMap constructs a new map which I want to avoid.
As always when working with Seq[A], the optimal solution performance-wise depends on the concrete collection type.
A general but not very efficient solution would be to fold over an Option[Map[A,B]]:
def optMap[A,B](in: Iterable[(A,B)]): Option[Map[A,B]] =
in.iterator.foldLeft(Option(Map[A,B]())) {
case (Some(m),e @ (k,v)) if m.getOrElse(k, v) == v => Some(m + e)
case _ => None
}
If you restrict yourself to List[(A,B)]s, an optimized version would be:
import scala.annotation.tailrec
@tailrec
def rmap[A,B](in: List[(A,B)], out: Map[A,B] = Map[A,B]()): Option[Map[A,B]] = in match {
case (e @ (k,v)) :: tail if out.getOrElse(k,v) == v =>
rmap(tail, out + e)
case Nil =>
Some(out)
case _ => None
}
Additionally a less idiomatic version using mutable maps could be implemented like this:
def mmap[A,B](in: Iterable[(A,B)]): Option[Map[A,B]] = {
val dest = collection.mutable.Map[A,B]()
for (e @ (k,v) <- in) {
if (dest.getOrElse(k, v) != v) return None
dest += e
}
Some(dest.toMap)
}
Here is a fail-slowly solution (if creating the entire map and then discarding it is okay):
def uniqueMap[A,B](s: Seq[(A,B)]) = {
val m = s.toMap
if (m.size == s.length) Some(m) else None
}
Here is a mutable fail-fast solution (bail out as soon as the error is detected):
def uniqueMap[A,B](s: Seq[(A,B)]) = {
val h = new collection.mutable.HashMap[A,B]
val i = s.iterator.takeWhile(x => !(h contains x._1)).foreach(h += _)
if (h.size == s.length) Some(h) else None
}
And here's an immutable fail-fast solution:
def uniqueMap[A,B](s: Seq[(A,B)]) = {
def mapUniquely(i: Iterator[(A,B)], m: Map[A,B]): Option[Map[A,B]] = {
if (i.hasNext) {
val j = i.next
if (m contains j._1) None
else mapUniquely(i, m + j)
}
else Some(m)
}
mapUniquely(s.iterator, Map[A,B]())
}
Edit: and here's a solution using put for speed (hopefully):
def uniqueMap[A,B](s: Seq[(A,B)]) = {
val h = new collection.mutable.HashMap[A,B]
val okay = s.iterator.forall(x => {
val y = (h put (x._1,x._2))
y.isEmpty || y.get == x._2
})
if (okay) Some(h) else None
}
Edit: now tested, and it's ~2x as fast on input that works (returns true) than Moritz' or my straightforward solution.
Scala 2.9 is near, so why not take advantage of the combinations method (inspired by Moritz's answer):
def optMap[A,B](in: List[(A,B)]) = {
if (in.combinations(2).exists {
case List((a,b),(c,d)) => a == c && b != d
case _ => false
}) None else Some(in.toMap)
}
scala> val in = List(1->1,2->3,3->4,4->5,2->3)
in: List[(Int, Int)] = List((1,1), (2,3), (3,4), (4,5), (2,3))
scala> optMap(in)
res29: Option[scala.collection.immutable.Map[Int,Int]] = Some(Map(1 -> 1, 2 -> 3, 3 -> 4, 4 -> 5))
scala> val in = List(1->1,2->3,3->4,4->5,2->3,1->2)
in: List[(Int, Int)] = List((1,1), (2,3), (3,4), (4,5), (2,3), (1,2))
scala> optMap(in)
res30: Option[scala.collection.immutable.Map[Int,Int]] = None
You can also use groupBy as follows:
val pList = List(1 -> "a", 1 -> "b", 2 -> "c", 3 -> "d")
def optMap[A,B](in: Iterable[(A,B)]): Option[Map[A,B]] = {
Option(in.groupBy(_._1).map{case(_, list) => if(list.size > 1) return None else list.head})
}
println(optMap(pList))
Its efficiency is competitive with the above solutions.
In fact, if you examine the groupBy implementation, you will see that it is very similar to some of the solutions suggested.
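Note that the groupBy version above returns None for any repeated key, even when the repeated entries agree. The question asks for exact duplicates to be allowed, so here is a sketch of a groupBy variant that only rejects genuine contradictions. The name optMapAllowingDuplicates is illustrative.

```scala
// Sketch: group by key and accept a key only if all of its values agree.
def optMapAllowingDuplicates[A, B](in: Iterable[(A, B)]): Option[Map[A, B]] = {
  val grouped = in.groupBy(_._1)
  // A key is consistent when the set of its values has exactly one element.
  val consistent = grouped.values.forall(_.map(_._2).toSet.size == 1)
  if (consistent) Some(grouped.map { case (k, entries) => k -> entries.head._2 })
  else None
}
```

This variant is not fail-fast: it groups the whole input before checking for contradictions, and it avoids the non-local return used above.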