Scala: Why does mapValues produce a view, and are there any stable alternatives?

I was just surprised to learn that mapValues produces a view. The consequence is shown in the following example:
case class thing(id: Int)
val rand = new java.util.Random
val distribution = Map(thing(0) -> 0.5, thing(1) -> 0.5)
val perturbed = distribution mapValues { _ + 0.1 * rand.nextGaussian }
val sumProbs = perturbed.map{_._2}.sum
val newDistribution = perturbed mapValues { _ / sumProbs }
The idea is that I have a distribution, which is perturbed with some randomness, and then I renormalize it. The code actually fails to do what was intended: since mapValues produces a view, _ + 0.1 * rand.nextGaussian is re-evaluated every time perturbed is used.
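The re-evaluation can be observed directly by counting how often the mapping function runs. This is a minimal sketch; the object name `ViewReevaluation` and the `calls` counter are illustrative and not part of the original post, and on Scala 2.13 the call to mapValues additionally emits a deprecation warning:

```scala
object ViewReevaluation {
  // Returns how many times the mapping function ran after two lookups
  // of the same key in the result of mapValues.
  def lookupsTriggerCalls(): Int = {
    var calls = 0
    // Nothing is computed here: mapValues only wraps the original map.
    val doubled = Map("a" -> 1, "b" -> 2).mapValues { v => calls += 1; v * 2 }
    val first = doubled("a")  // runs the function for key "a"
    val second = doubled("a") // runs it again for the same key
    calls
  }
}
```

With a pure function this only costs time; with a random or side-effecting function, as in the question, every access can yield a different value.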
I am now doing something like distribution map { case (s, p) => (s, p + 0.1 * rand.nextGaussian) }, but that's just a little bit verbose. So the purpose of this question is:
Remind people who are unaware of this fact.
Look for reasons why they make mapValues output views.
Find out whether there is an alternative method that produces a concrete Map.
Find out whether any other commonly used collection methods have this trap.
Thanks.

There's a ticket about this, SI-4776 (by YT).
The commit that introduces it has this to say:
Following a suggestion of jrudolph, made filterKeys and mapValues
transform abstract maps, and duplicated functionality for immutable
maps. Moved transform and filterNot from immutable to general maps.
Review by phaller.
I have not been able to find the original suggestion by jrudolph, but I assume it was done to make mapValues more efficient. Given the question, that may come as a surprise, but mapValues is more efficient if you are not likely to iterate over the values more than once.
As a work-around, one can do mapValues(...).view.force to produce a new Map.
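Another option that does not depend on the view machinery is a small strict helper built on map. The sketch below is an assumption-laden illustration: `mapValuesStrict` and `perturbAndNormalize` are hypothetical names, not standard library methods, and the second function simply restates the question's perturb-then-renormalize idea on top of the helper:

```scala
object StrictMapValues {
  // Hypothetical helper: applies f to every value exactly once, eagerly,
  // and returns a concrete immutable Map rather than a wrapper.
  def mapValuesStrict[K, V, W](m: Map[K, V])(f: V => W): Map[K, W] =
    m.map { case (k, v) => (k, f(v)) }

  // Perturb-then-renormalize from the question, using the strict helper so
  // that the random draw for each key happens only once.
  def perturbAndNormalize[K](dist: Map[K, Double], rand: java.util.Random): Map[K, Double] = {
    val perturbed = mapValuesStrict(dist)(_ + 0.1 * rand.nextGaussian())
    val total = perturbed.values.sum
    mapValuesStrict(perturbed)(_ / total)
  }
}
```

Because `perturbed` is a concrete Map here, the sum and the division see the same perturbed values.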

The Scaladoc says:
a map view which maps every key of this map to f(this(key)). The resulting map wraps the original map without copying any elements.
So this should be expected, but it scares me a lot; I'll have to review a bunch of code tomorrow. I wasn't expecting behavior like that :-(
Just another workaround:
You can call toSeq to get a copy, and toMap if you need it back as a map, but this creates objects unnecessarily and has a performance cost compared with using map.
One can relatively easily write a mapValues that doesn't create a view. I'll do it tomorrow and post the code here if no one does it before me ;)
EDIT:
I found an easy way to 'force' the view, use '.map(identity)' after mapValues (so no need of implementing a specific function):
scala> val xs = Map("a" -> 1, "b" -> 2)
xs: scala.collection.immutable.Map[java.lang.String,Int] = Map(a -> 1, b -> 2)
scala> val ys = xs.mapValues(_ + Random.nextInt).map(identity)
ys: scala.collection.immutable.Map[java.lang.String,Int] = Map(a -> 1315230132, b -> 1614948101)
scala> ys
res7: scala.collection.immutable.Map[java.lang.String,Int] = Map(a -> 1315230132, b -> 1614948101)
It's a shame the type returned isn't actually a view! Otherwise one would have been able to call 'force' ...

This is improved in Scala 2.13: mapValues on Map is deprecated there and now returns a MapView:
API Doc
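On Scala 2.13 or later, the laziness can be made explicit by going through view and then materializing with toMap. The following is a sketch under that version assumption (it will not compile on 2.12 and earlier); `MapViewDemo` and the `calls` counter are illustrative names:

```scala
object MapViewDemo {
  // Assumes Scala 2.13+: view.mapValues is lazy, and toMap forces it once.
  // Returns the strict map and the number of times the function ran.
  def strictCopy(): (Map[String, Int], Int) = {
    var calls = 0
    val strict: Map[String, Int] =
      Map("a" -> 1, "b" -> 2).view.mapValues { v => calls += 1; v + 1 }.toMap
    // These lookups read stored values; the function is not run again.
    val again = strict("a") + strict("a")
    (strict, calls)
  }
}
```

Here the static type (MapView versus Map) shows which values are lazy and which are stored.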

Related

Is there a way to maintain ordering with Scala's breakOut?

I recently discovered breakOut and love how elegant it is, but noticed that it doesn't maintain order.
eg (from REPL):
scala> val list = List("apples", "bananas", "oranges")
list: List[String] = List(apples, bananas, oranges)
scala> val hs: HashMap[String, Int] = list.map{x => (x -> x.length)}(breakOut)
hs: scala.collection.mutable.HashMap[String,Int] = Map(bananas -> 7, oranges -> 7, apples -> 6)
I like using breakOut since it's really clean and neat but ordering does matter to me. Is there a way to get it to maintain order or do I have to add elements to my hashmap one at a time?
You see this behavior because HashMap is a data structure with undefined order. Even if you see some ordering of the elements in the hash map and it is consistent across runs, you shouldn't depend on it. If you really need the order, consider using LinkedHashMap.
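As a minimal sketch of the LinkedHashMap suggestion (built with the companion's varargs constructor instead of breakOut; `OrderedLengths` and `lengthsInOrder` are illustrative names):

```scala
import scala.collection.mutable

object OrderedLengths {
  // LinkedHashMap iterates in insertion order, so the result keeps
  // the order of the input list.
  def lengthsInOrder(words: List[String]): mutable.LinkedHashMap[String, Int] =
    mutable.LinkedHashMap(words.map(w => w -> w.length): _*)
}
```

Calling `OrderedLengths.lengthsInOrder(List("apples", "bananas", "oranges"))` yields the entries in the order apples, bananas, oranges.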

Converting Function1 to Map in Scala

I am trying to compose two maps and get a composed map in Scala. I am doing the following:
val m1=Map(1->'A',2->'C',3->'Z')
val m2=Map('A' -> "Andy", 'Z'->"Zee",'D'->"David")
val m3=m2.compose(m1)
Now m3 is m1;m2, but its type is Function1.
Question:
How can I convert Function1 to Map?
ps.
Here, instead of m2.compose(m1), it is suggested to use val m3 = m1 andThen m2. Anyways, the result is of type Function1.
What do you expect for non-existent keys? If you want to remove these values from the map, use
val m1: Map[Int, Char] = Map(1 -> 'A', 2 -> 'C', 3 -> 'Z')
val m2: Map[Char, String] = Map('A' -> "Andy", 'Z' -> "Zee", 'D' -> "David")
val m3: Map[Int, String] = m1.filter(x => m2.contains(x._2)).map(x => x._1 -> m2(x._2))
//Map(1 -> Andy, 3 -> Zee)
Skipping .filter(x => m2.contains(x._2)) will throw an exception in these situations.
You can also do
val m3: Map[Int, String] = m1.mapValues(m2)
There is a caveat that despite the return type this will recalculate m2(m1(x)) on every access to m3(x), just like m2.compose(m1).
Since I updated the title of the question, the answer no longer directly addresses the question subject. I will wait to see if somebody provides a direct answer (that is, how to convert Function1 to Map).
def funToMap[A, B](f: A => B): Map[A, B] = Map.empty[A, B].withDefault(f)
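One caveat with this approach: the resulting Map contains no entries, and withDefault only affects apply. The sketch below repeats funToMap inside an illustrative object (`FunToMapCaveat` and the sample function are assumptions made for the demonstration):

```scala
object FunToMapCaveat {
  def funToMap[A, B](f: A => B): Map[A, B] = Map.empty[A, B].withDefault(f)

  // Sample function chosen only for illustration.
  val m: Map[Int, String] = funToMap((n: Int) => "n" + n)

  val viaApply: String = m(3)              // "n3": apply falls back to the default
  val viaGet: Option[String] = m.get(3)    // None: get ignores the default
  val hasKey: Boolean = m.contains(3)      // false
  val entries: List[(Int, String)] = m.toList // empty: nothing to iterate over
}
```

So the result behaves like a function wearing a Map type rather than like a populated Map.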
Why composing two Maps gives you a Function1
A Map offers the semantics of a Function1 and then some. When you compose two Map functions, you get a new Function1 because compose is an operation on functions.
So you should not expect to get a Map by composing two Maps -- you are just composing the function aspects of the Maps, not the Maps themselves.
How to get a composed Map
To actually compose two Maps into another Map, you can use mapValues, but be careful. If you do it like this:
val m3 = m1 mapValues m2
you will silently get a view (see this bug) rather than a simple map, so as you do lookups in m3 it will really do lookups in m1 and then in m2. For large Maps and few lookups, that's a win because you didn't process a lot of entries you didn't need to. But if you do lots of lookups, this can cause a serious performance problem.
To really make a new single Map data structure that represents the composition of m1 and m2 you need to do something like
val m3 = (m1 mapValues m2).view.force
That is kind of unfortunate. In a perfect world m1 mapValues m2 would give you the new single composed Map -- and very quickly, by creating a new data structure with shape identical to m1 but with values run through m2 -- and (m1 mapValues m2).view would give you the view (and the type would reflect that).
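A version-independent way to build the composed Map eagerly is collect, which also drops keys of m1 whose value has no entry in m2. This is a sketch of one possible approach, not the method used in the answer above; `composeStrict` is a hypothetical name:

```scala
object ComposeMaps {
  // Builds a concrete Map of k -> m2(m1(k)), keeping only those keys
  // whose intermediate value is actually present in m2.
  def composeStrict[A, B, C](m1: Map[A, B], m2: Map[B, C]): Map[A, C] =
    m1.collect { case (k, v) if m2.contains(v) => k -> m2(v) }
}
```

For the maps from the question this gives Map(1 -> "Andy", 3 -> "Zee"), and later lookups no longer touch m1 or m2.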

How to get a subset of a map?

How do I get a subset of a map?
Assume we have
val m: Map[Int, String] = ...
val k: List[Int]
Where all keys in k exist in m.
Now I would like to get a subset of the Map m with only the pairs whose key is in the list k.
Something like m.intersect(k), but intersect is not defined on a map.
One way is to use filterKeys: m.filterKeys(k.contains). But this might be a bit slow, because for each key in the original map a search in the list has to be done.
Another way I could think of is k.map(l => (l, m(l))).toMap. Here we just iterate through the keys we are really interested in and do not perform a search.
Is there a better (built-in) way ?
m filterKeys k.toSet
because a Set is a Function.
On performance:
filterKeys itself is O(1), since it works by producing a new map with overridden foreach, iterator, contains and get methods. The overhead comes when elements are accessed. It means that the new map uses no extra memory, but also that memory for the old map cannot be freed.
If you need to free up the memory and have fastest possible access, a fast way would be to fold the elements of k into a new Map without producing an intermediate List[(Int,String)]:
k.foldLeft(Map[Int,String]()){ (acc, x) => acc + (x -> m(x)) }
val s = Map(k.map(x => (x, m(x))): _*)
I think this is most readable and good performer:
k zip (k map m) toMap
Or, method invocation style would be:
k.zip(k.map(m)).toMap
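The variants above call m(x) and therefore rely on every key in k being present in m, as the question states. If that assumption may not hold, a sketch using get skips missing keys instead of throwing (`MapSubset` and `subset` are illustrative names):

```scala
object MapSubset {
  // Looks up only the requested keys; keys absent from m are skipped
  // instead of raising NoSuchElementException.
  def subset[K, V](m: Map[K, V], keys: Iterable[K]): Map[K, V] =
    keys.flatMap(k => m.get(k).map(v => k -> v)).toMap
}
```

Like the fold and zip versions, this produces a concrete Map whose size is bounded by the number of keys in k.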

Representing a graph (adjacency list) with HashMap[Int, Vector[Int]] (Scala)?

I was wondering how (if possible) I can go about making an adjacency list representation of a (mutable) graph via HashMap[Int, Vector[Int]]. HashMap would be mutable of course.
Currently I have it set as HashMap[Int, ArrayBuffer[Int]], but the fact that I can change each cell in the ArrayBuffer makes me uncomfortable, even though I'm fairly certain I'm not doing that. I would use a ListBuffer[Int] but I would like fast random access to neighbors due to my need to do fast random walks on the graphs. A Vector[Int] would solve this problem, but is there any way to do this?
To my knowledge (tried this in the REPL), this won't work:
scala> val x = new mutable.HashMap[Int, Vector[Int]]
x: scala.collection.mutable.HashMap[Int,Vector[Int]] = Map()
scala> x(3) = Vector(1)
scala> x(3) += 4 // DOES NOT WORK
I need to be able to both append to it at any time and also access any element within it randomly (given the index). Is this possible?
Thanks!
-kstruct
Using the Vector:
x += 3 -> (x(3) :+ 4) //x.type = Map(3 -> Vector(1, 4))
You might notice that this will fail if there's no existing key, so you might like to set up your map as
val x = new mutable.HashMap[Int, Vector[Int]] withDefaultValue Vector.empty
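The same idea can be wrapped in small helpers that use getOrElse, so the map keeps its plain HashMap type. This is a sketch; `AdjacencyList`, `addEdge` and `neighbour` are hypothetical names introduced for illustration:

```scala
import scala.collection.mutable

object AdjacencyList {
  type Graph = mutable.HashMap[Int, Vector[Int]]

  // Appends `to` to the neighbours of `from`, creating the entry if needed.
  // The Vector itself is never mutated; only the map entry is replaced.
  def addEdge(g: Graph, from: Int, to: Int): Unit =
    g(from) = g.getOrElse(from, Vector.empty[Int]) :+ to

  // Indexed access to the i-th neighbour, as needed for a random-walk step.
  def neighbour(g: Graph, node: Int, i: Int): Int = g(node)(i)
}
```

Each append replaces the stored Vector with a new one, which avoids the in-place cell mutation the question was worried about.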

Scala, make my loop more functional

I'm trying to reduce the extent to which I write Scala (2.8) like Java. Here's a simplification of a problem I came across. Can you suggest improvements on my solutions that are "more functional"?
Transform the map
val inputMap = mutable.LinkedHashMap(1->'a',2->'a',3->'b',4->'z',5->'c')
by discarding any entries with value 'z' and indexing the characters as they are encountered
First try
var outputMap = new mutable.HashMap[Char,Int]()
var counter = 0
for(kvp <- inputMap){
  val character = kvp._2
  if(character !='z' && !outputMap.contains(character)){
    outputMap += (character -> counter)
    counter += 1
  }
}
Second try (not much better, but uses an immutable map and a 'foreach')
var outputMap = new immutable.HashMap[Char,Int]()
var counter = 0
inputMap.foreach{
  case(number,character) => {
    if(character !='z' && !outputMap.contains(character)){
      outputMap += (character -> counter)
      counter += 1
    }
  }
}
Nicer solution:
inputMap.toList.filter(_._2 != 'z').map(_._2).distinct.zipWithIndex.toMap
I find this solution slightly simpler than arjan's:
inputMap.values.filter(_ != 'z').toSeq.distinct.zipWithIndex.toMap
The individual steps:
inputMap.values // Iterable[Char] = MapLike(a, a, b, z, c)
.filter(_ != 'z') // Iterable[Char] = List(a, a, b, c)
.toSeq.distinct // Seq[Char] = List(a, b, c)
.zipWithIndex // Seq[(Char, Int)] = List((a,0), (b,1), (c,2))
.toMap // Map[Char, Int] = Map((a,0), (b,1), (c,2))
Note that your problem doesn't inherently involve a map as input, since you're just discarding the keys. If I were coding this, I'd probably write a function like
def buildIndex[T](s: Seq[T]): Map[T, Int] = s.distinct.zipWithIndex.toMap
and invoke it as
buildIndex(inputMap.values.filter(_ != 'z').toSeq)
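For comparison, the same index can be built in a single pass with foldLeft, which stays close to the original loop but without mutable variables. This is a sketch of an alternative, not a replacement for buildIndex; `buildIndexFold` is a hypothetical name:

```scala
object IndexByFold {
  // Assigns each distinct element the next free index in encounter order.
  // acc.size plays the role of the `counter` variable in the loop version.
  def buildIndexFold[T](s: Iterable[T]): Map[T, Int] =
    s.foldLeft(Map.empty[T, Int]) { (acc, x) =>
      if (acc.contains(x)) acc else acc + (x -> acc.size)
    }
}
```

It would be invoked the same way, for example `IndexByFold.buildIndexFold(inputMap.values.filter(_ != 'z'))`.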
First, if you're doing this functionally, you should use an immutable map.
Then, to get rid of something, you use the filter method:
inputMap.filter(_._2 != 'z')
and finally, to do the remapping, you can just use the values (but as a set) with zipWithIndex, which will count up from zero, and then convert back to a map:
inputMap.filter(_._2 != 'z').values.toSet.zipWithIndex.toMap
Since the order of values isn't going to be preserved anyway*, presumably it doesn't matter that the order may have been shuffled yet again with the set transformation.
Edit: There's a better solution in a similar vein; see Arjan's. Assumption (*) is wrong, since it was a LinkedHashMap. So you do need to preserve order, which Arjan's solution does.
I would create some "pipeline" like this, but it has a lot of operations and could probably be shortened. The two List.map calls could be merged into one, but I think you get the general idea.
inputMap
.toList // List((5,c), (1,a), (2,a), (3,b), (4,z))
.sorted // List((1,a), (2,a), (3,b), (4,z), (5,c))
.filterNot((x) => {x._2 == 'z'}) // List((1,a), (2,a), (3,b), (5,c))
.map(_._2) // List(a, a, b, c)
.zipWithIndex // List((a,0), (a,1), (b,2), (c,3))
.map((x)=>{(x._2+1 -> x._1)}) // List((1,a), (2,a), (3,b), (4,c))
.toMap // Map((1,a), (2,a), (3,b), (4,c))
Performing these operations on lists keeps the ordering of elements.
EDIT: I misread the OP question - thought you wanted run length encoding. Here's my take on your actual question:
val values = inputMap.values.filterNot(_ == 'z').toSet.zipWithIndex.toMap
EDIT 2: As noted in the comments, use toSeq.distinct or similar if preserving order is important.
val values = inputMap.values.filterNot(_ == 'z').toSeq.distinct.zipWithIndex.toMap
In my experience, maps and functional languages do not play nice. You'll note that all answers so far in one way or another involve turning the map into a list, filtering the list, and then turning the list back into a map.
I think this is due to maps being mutable data structures by nature. Consider that when building a list, the underlying structure of the list does not change when you append a new element, and if it is a true list then an append is a constant O(1) operation. For a map, on the other hand, the internal structure can change vastly when a new element is added, i.e. when the load factor becomes too high and the add algorithm resizes the map. In this way a functional language cannot just create a series of values and pop them into a map as it goes along, due to the possible side effects of introducing a new key/value pair.
That said, I still think there should be better support for filtering, mapping and folding/reducing maps. Since we start with a map, we know the maximum size of the map and it should be easy to create a new one.
If you want to get to grips with functional programming then I'd recommend steering clear of maps to start with. Stick with the things that functional languages were designed for -- list manipulation.