idiomatic "get or else update" for immutable.Map? - scala

What is the idiomatic way of a getOrElseUpdate for immutable.Map instances?. I use the snippet below, but it seems verbose and inefficient
var map = Map[Key, Value]()
def foo(key: Key) = {
val value = map.getOrElse(key, new Value)
map += key -> value
value
}

I would probably implement a getOrElseUpdated method like this:
def getOrElseUpdated[K, V](m: Map[K, V], key: K, op: => V): (Map[K, V], V) =
m.get(key) match {
case Some(value) => (m, value)
case None => val newval = op; (m.updated(key, newval), newval)
}
which either returns the original map if m has a mapping for key or another map with the mapping key -> op added. The definition of this method is similar to getOrElseUpdate of mutable.Map.

Let me summarise your problem:
You want to call a method on a immutable data structure
You want it to return some value and reassign a var
Because the data structure is immutable, you’ll need to
return a new immutable data structure, or
do the assignment inside the method, using a supplied closure
So, either your signature has to look like
def getOrElseUpdate(key: K): Tuple2[V, Map[K,V]]
//... use it like
val (v, m2) = getOrElseUpdate(k)
map = m2
or
def getOrElseUpdate(key: K, setter: (Map[K,V]) => Unit): V
//... use it like
val v = getOrElseUpdate(k, map = _)
If you can live with one of these solutions, you could add your own version with an implicit conversion but judging by the signatures alone, i wouldn’t think any of these is in the standard library.

There's no such way - map mutation (update), when you're getting a map value, is a side effect (which contradicts to immutability/functional style of programming).
When you want to make a new immutable map with the default value, if another value for the specified key doesn't exist, you can do the following:
map + (key -> map.getOrElse(key, new Value))

Why not use withDefault or withDefaultValue if you have an immutable map?

Related

Extend / Replicate Scala collections syntax to create your own collection?

I want to build a map however I want to discard all keys with empty values as shown below:
#tailrec
def safeFiltersMap(
map: Map[String, String],
accumulator: Map[String,String] = Map.empty): Map[String, String] = {
if(map.isEmpty) return accumulator
val curr = map.head
val (key, value) = curr
safeFiltersMap(
map.tail,
if(value.nonEmpty) accumulator + (key->value)
else accumulator
)
}
Now this is fine however I need to use it like this:
val safeMap = safeFiltersMap(Map("a"->"b","c"->"d"))
whereas I want to use it like the way we instantiate a map:
val safeMap = safeFiltersMap("a"->"b","c"->"d")
What syntax can I follow to achieve this?
The -> syntax isn't a special syntax in Scala. It's actually just a fancy way of constructing a 2-tuple. So you can write your own functions that take 2-tuples as well. You don't need to define a new Map type. You just need a function that filters the existing one.
def safeFiltersMap(args: (String, String)*): Map[String, String] =
Map(args: _*).filter {
result => {
val (_, value) = result
value.nonEmpty
}
}
Then call using
safeFiltersMap("a"->"b","c"->"d")

Scala: Grouping list of tuples

I need to group list of tuples in some unique way.
For example, if I have
val l = List((1,2,3),(4,2,5),(2,3,3),(10,3,2))
Then I should group the list with second value and map with the set of first value
So the result should be
Map(2 -> Set(1,4), 3 -> Set(2,10))
By so far, I came up with this
l groupBy { p => p._2 } mapValues { v => (v map { vv => vv._1 }).toSet }
This works, but I believe there should be a much more efficient way...
This is similar to this question. Basically, as #serejja said, your approach is correct and also the most concise one. You could use collection.breakOut as builder factory argument to the last map and thereby save the additional iteration to get the Set type:
l.groupBy(_._2).mapValues(_.map(_._1)(collection.breakOut): Set[Int])
You shouldn't probably go beyond this, unless you really need to squeeze the performance.
Otherwise, this is how a general toMultiMap function could look like which allows you to control the values collection type:
import collection.generic.CanBuildFrom
import collection.mutable
def toMultiMap[A, K, V, Values](xs: TraversableOnce[A])
(key: A => K)(value: A => V)
(implicit cbfv: CanBuildFrom[Nothing, V, Values]): Map[K, Values] = {
val b = mutable.Map.empty[K, mutable.Builder[V, Values]]
xs.foreach { elem =>
b.getOrElseUpdate(key(elem), cbfv()) += value(elem)
}
b.map { case (k, vb) => (k, vb.result()) } (collection.breakOut)
}
What it does is, it uses a mutable Map during building stage, and values gathered in a mutable Builder first (the builder is provided by the CanBuildFrom instance). After the iteration over all input elements has completed, that mutable map of builder values is converted into an immutable map of the values collection type (again using the collection.breakOut trick to get the desired output collection straight away).
Ex:
val l = List((1,2,3),(4,2,5),(2,3,3),(10,3,2))
val v = toMultiMap(l)(_._2)(_._1) // uses Vector for values
val s: Map[Int, Set[Int] = toMultiMap(l)(_._2)(_._1) // uses Set for values
So your annotated result type directs the type inference of the values type. If you do not annotate the result, Scala will pick Vector as default collection type.

Scala: Why does SortedMap's mapValues returns a Map and not a SortedMap?

I'm new to Scala.
I'm using SortedMap in my code, and I wanted to use mapValues to create a new map with some transformation on the values.
Instead of returning a new SortedMap, the mapValues function returns a new Map, which I then have to convert to a SortedMap.
For example
val my_map = SortedMap(1 -> "one", 0 -> "zero", 2 -> "two")
val new_map = my_map.mapValues(name => name.toUpperCase)
// returns scala.collection.immutable.Map[Int,java.lang.String] = Map(0 -> ZERO, 1 -> ONE, 2 -> TWO)
val sorted_new_map = SortedMap(new_map.toArray:_ *)
This looks inefficient - the last convertion probably sorts the keys again, or at least verify that they are sorted.
I could use the normal map function which operates both on the keys and the values, and deliberately not change the keys in my transformation function. This looks inefficient too, since the implementation of Map probably assume that the transformation may change the order of the keys (like in the case: my_map.map(tup => (-tup._1, tup._2)) - so it probably "re-sorts" them too.
Is anyone familiar with the internal implementations of Map and SortedMap, and could tell me if my assumptions are correct? Can the compiler recognize automatically that the keys have not been reordered? Is there an internal reason for why mapValues should not return a SortedMap? Is there a better way to transform the map's values without loosing the order of the keys?
Thanks
You've stumbled upon a tricky feature of Scala's Map implementation. The catch that you are missing is that mapValues does not actually return a new Map: it returns a view of a Map. In other words, it wraps your original map in such a way that whenever you access a value it will compute .toUpperCase before returning the value to you.
The upside to this behavior is that Scala won't compute the function for values that aren't accessed, and it won't spend time copying all the data into a new Map. The downside is that the function is re-computed every time that value is accessed. So you might end up doing extra computation if you access the same values many times.
So why does SortedMap not return a SortedMap? Because it's actually returning a Map-wrapper. The underlying Map, then one that is wrapped, is still a SortedMap, so if you were to iterate through, it would still be in sorted order. You and I know that, but the type-checker doesn't. It certainly seems like they could have written it in such a way that it still maintains the SortedMap trait, but they didn't.
You can see in the code that it's not returning a SortedMap, but that the iteration behavior is still going to be sorted:
// from MapLike
override def mapValues[C](f: B => C): Map[A, C] = new DefaultMap[A, C] {
def iterator = for ((k, v) <- self.iterator) yield (k, f(v))
...
The solution to your problem is the same as the solution to getting around the view issue: use .map{ case (k,v) => (k,f(v)) }, as you mentioned in your question.
If you really want that convenience method though, you can do what I do, and write you own, better, version of mapValues:
class EnrichedWithMapVals[T, U, Repr <: GenTraversable[(T, U)]](self: GenTraversableLike[(T, U), Repr]) {
/**
* In a collection of pairs, map a function over the second item of each
* pair. Ensures that the map is computed at call-time, and not returned
* as a view as 'Map.mapValues' would do.
*
* #param f function to map over the second item of each pair
* #return a collection of pairs
*/
def mapVals[R, That](f: U => R)(implicit bf: CanBuildFrom[Repr, (T, R), That]) = {
val b = bf(self.asInstanceOf[Repr])
b.sizeHint(self.size)
for ((k, v) <- self) b += k -> f(v)
b.result
}
}
implicit def enrichWithMapVals[T, U, Repr <: GenTraversable[(T, U)]](self: GenTraversableLike[(T, U), Repr]): EnrichedWithMapVals[T, U, Repr] =
new EnrichedWithMapVals(self)
Now when you call mapVals on a SortedMap you get back a non-view SortedMap:
scala> val m3 = m1.mapVals(_ + 1)
m3: SortedMap[String,Int] = Map(aardvark -> 2, cow -> 6, dog -> 10)
It actually works on any collection of pairs, not just Map implementations:
scala> List(('a,1),('b,2),('c,3)).mapVals(_+1)
res8: List[(Symbol, Int)] = List(('a,2), ('b,3), ('c,4))

Nearest keys in a SortedMap

Given a key k in a SortedMap, how can I efficiently find the largest key m that is less than or equal to k, and also the smallest key n that is greater than or equal to k. Thank you.
Looking at the source code for 2.9.0, the following code seems about to be the best you can do
def getLessOrEqual[A,B](sm: SortedMap[A,B], bound: A): B = {
val key = sm.to(x).lastKey
sm(key)
}
I don't know exactly how the splitting of the RedBlack tree works, but I guess it's something like a O(log n) traversal of the tree/construction of new elements and then a balancing, presumable also O(log n). Then you need to go down the new tree again to get the last key. Unfortunately you can't retrieve the value in the same go. So you have to go down again to fetch the value.
In addition the lastKey might throw an exception and there is no similar method that returns an Option.
I'm waiting for corrections.
Edit and personal comment
The SortedMap area of the std lib seems to be a bit neglected. I'm also missing a mutable SortedMap. And looking through the sources, I noticed that there are some important methods missing (like the one the OP asks for or the ones pointed out in my answer) and also some have bad implementation, like 'last' which is defined by TraversableLike and goes through the complete tree from first to last to obtain the last element.
Edit 2
Now the question is reformulated my answer is not valid anymore (well it wasn't before anyway). I think you have to do the thing I'm describing twice for lessOrEqual and greaterOrEqual. Well you can take a shortcut if you find the equal element.
Scala's SortedSet trait has no method that will give you the closest element to some other element.
It is presently implemented with TreeSet, which is based on RedBlack. The RedBlack tree is not visible through methods on TreeSet, but the protected method tree is protected. Unfortunately, it is basically useless. You'd have to override methods returning TreeSet to return your subclass, but most of them are based on newSet, which is private.
So, in the end, you'd have to duplicate most of TreeSet. On the other hand, it isn't all that much code.
Once you have access to RedBlack, you'd have to implement something similar to RedBlack.Tree's lookup, so you'd have O(logn) performance. That's actually the same complexity of range, though it would certainly do less work.
Alternatively, you'd make a zipper for the tree, so that you could actually navigate through the set in constant time. It would be a lot more work, of course.
Using Scala 2.11.7, the following will give what you want:
scala> val set = SortedSet('a', 'f', 'j', 'z')
set: scala.collection.SortedSet[Char] = TreeSet(a, f, j, z)
scala> val beforeH = set.to('h').last
beforeH: Char = f
scala> val afterH = set.from('h').head
afterH: Char = j
Generally you should use lastOption and headOption as the specified elements may not exist. If you are looking to squeeze a little more efficiency out, you can try replacing from(...).head with keysIteratorFrom(...).head
Sadly, the Scala library only allows to make this type of query efficiently:
and also the smallest key n that is greater than or equal to k.
val n = TreeMap(...).keysIteratorFrom(k).next
You can hack this by keeping two structures, one with normal keys, and one with negated keys. Then you can use the other structure to make the second type of query.
val n = - TreeMap(...).keysIteratorFrom(-k).next
Looks like I should file a ticket to add 'fromIterator' and 'toIterator' methods to 'Sorted' trait.
Well, one option is certainly using java.util.TreeMap.
It has lowerKey and higherKey methods, which do excatly what you want.
I had a similar problem: I wanted to find the closest element to a given key in a SortedMap. I remember the answer to this question being, "You have to hack TreeSet," so when I had to implement it for a project, I found a way to wrap TreeSet without getting into its internals.
I didn't see jazmit's answer, which more closely answers the original poster's question with minimum fuss (two method calls). However, those method calls do more work than needed for this application (multiple tree traversals), and my solution provides lots of hooks where other users can modify it to their own needs.
Here it is:
import scala.collection.immutable.TreeSet
import scala.collection.SortedMap
// generalize the idea of an Ordering to metric sets
trait MetricOrdering[T] extends Ordering[T] {
def distance(x: T, y: T): Double
def compare(x: T, y: T) = {
val d = distance(x, y)
if (d > 0.0) 1
else if (d < 0.0) -1
else 0
}
}
class MetricSortedMap[A, B]
(elems: (A, B)*)
(implicit val ordering: MetricOrdering[A])
extends SortedMap[A, B] {
// while TreeSet searches for an element, keep track of the best it finds
// with *thread-safe* mutable state, of course
private val best = new java.lang.ThreadLocal[(Double, A, B)]
best.set((-1.0, null.asInstanceOf[A], null.asInstanceOf[B]))
private val ord = new MetricOrdering[(A, B)] {
def distance(x: (A, B), y: (A, B)) = {
val diff = ordering.distance(x._1, y._1)
val absdiff = Math.abs(diff)
// the "to" position is a key-null pair; the object of interest
// is the other one
if (absdiff < best.get._1)
(x, y) match {
// in practice, TreeSet always picks this first case, but that's
// insider knowledge
case ((to, null), (pos, obj)) =>
best.set((absdiff, pos, obj))
case ((pos, obj), (to, null)) =>
best.set((absdiff, pos, obj))
case _ =>
}
diff
}
}
// use a TreeSet as a backing (not TreeMap because we need to get
// the whole pair back when we query it)
private val treeSet = TreeSet[(A, B)](elems: _*)(ord)
// find the closest key and return:
// (distance to key, the key, its associated value)
def closest(to: A): (Double, A, B) = {
treeSet.headOption match {
case Some((pos, obj)) =>
best.set((ordering.distance(to, pos), pos, obj))
case None =>
throw new java.util.NoSuchElementException(
"SortedMap has no elements, and hence no closest element")
}
treeSet((to, null.asInstanceOf[B])) // called for side effects
best.get
}
// satisfy the contract (or throw UnsupportedOperationException)
def +[B1 >: B](kv: (A, B1)): SortedMap[A, B1] =
new MetricSortedMap[A, B](
elems :+ (kv._1, kv._2.asInstanceOf[B]): _*)
def -(key: A): SortedMap[A, B] =
new MetricSortedMap[A, B](elems.filter(_._1 != key): _*)
def get(key: A): Option[B] = treeSet.find(_._1 == key).map(_._2)
def iterator: Iterator[(A, B)] = treeSet.iterator
def rangeImpl(from: Option[A], until: Option[A]): SortedMap[A, B] =
new MetricSortedMap[A, B](treeSet.rangeImpl(
from.map((_, null.asInstanceOf[B])),
until.map((_, null.asInstanceOf[B]))).toSeq: _*)
}
// test it with A = Double
implicit val doubleOrdering =
new MetricOrdering[Double] {
def distance(x: Double, y: Double) = x - y
}
// and B = String
val stuff = new MetricSortedMap[Double, String](
3.3 -> "three",
1.1 -> "one",
5.5 -> "five",
4.4 -> "four",
2.2 -> "two")
println(stuff.iterator.toList)
println(stuff.closest(1.5))
println(stuff.closest(1000))
println(stuff.closest(-1000))
println(stuff.closest(3.3))
println(stuff.closest(3.4))
println(stuff.closest(3.2))
I've been doing:
val m = SortedMap(myMap.toSeq:_*)
val offsetMap = (m.toSeq zip m.keys.toSeq.drop(1)).map {
case ( (k,v),newKey) => (newKey,v)
}.toMap
When I want the results of my map off-set by one key. I'm also looking for a better way, preferably without storing an extra map.

How to set and get keys from scala TreeMap?

Suppose I have
import scala.collection.immutable.TreeMap
val tree = new TreeMap[String, List[String]]
Now after above declaration, I want to assign key "k1" to List("foo", "bar")
and then how do i get or read back the key "k1" and also read back non-existent key "k2"?
what happens if I try to read non-existent key "k2" ?
The best way to "mutate" the immutable map is by referring to it in a variable (var as opposed to val):
var tree = TreeMap.empty[String, List[String]]
tree += ("k1" -> List("foo", "bar")) //a += b is sugar for "c = a + b; a = c"
It can be accessed directly using the apply method, where scala syntactic sugar kicks in so you can just access using parens:
val l = tree("k1") //equivalent to tree.apply("k1")
However, I rarely access maps like this because the method will throw a MatchError is the key is not present. Use get instead, which returns an Option[V] where V is the value-type:
val l = tree.get("k1") //returns Option[List[String]] = Some(List("foo", "bar"))
val m = tree.get("k2") //returns Option[List[String]] = None
In this case, the value returned for an absent key is None. What can I do with an optional result? Well, you can make use of methods map, flatMap, filter, collect and getOrElse. Try and avoid pattern-matching on it, or using the Option.get method directly!
For example:
val wordLen : List[Int] = tree.get("k1").map(l => l.map(_.length)) getOrElse Nil
EDIT: one way of building a Map without declaring it as a var, and assuming you are doing this by transforming some separate collection, is to do it via a fold. For example:
//coll is some collection class CC[A]
//f : A => (K, V)
val m = (TreeMap.empty[K, V] /: coll) { (tree, c) => tree + f(c) }
This may not be possible for your use case