TreeMap Keys and Iteration in Scala

I am using TreeMap and it behaves strangely in the following code:
import scala.collection.immutable.TreeMap

object TreeMapTest extends App {
  val mp = TreeMap((0, 1) -> "a", (0, 2) -> "b", (1, 3) -> "c", (3, 4) -> "f")
  mp.keys.foreach(println) // A
  println("****")
  mp.map(x => x._1).foreach(println) // B
}
I expected the two print lines (A and B) to print the same thing, but the result is as follows:
(0,1)
(0,2)
(1,3)
(3,4)
****
(0,2)
(1,3)
(3,4)
Why is this happening here? The interesting thing is even the IDE believes that one can use these two interchangeably and suggests the replacement.

The Scala collection library generally tries to return the same kind of collection it starts with, so that e.g. val seq: Seq[Int] = ...; seq.map(...) will return a Seq, val list: List[Int] = ...; list.map(...) will return a List, etc. This isn't always possible: e.g. a String is considered to be a collection of Char, but "ab".map(x => x.toInt) obviously can't return a String. Similarly for Map: if you map each pair to a non-pair, you can't get a Map back. In B, however, you map each entry to its key, which is itself a pair (Int, Int), so Scala builds a map from Int to Int. That map can't contain both (0, 1) and (0, 2): they would be two entries for the same key 0, so the later one replaces the earlier one.
To avoid this problem, convert your map to Seq[((Int, Int), String)] first: mp.toSeq.map(x => x._1) (or mp.keySet.toSeq).
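A minimal sketch of the difference, using the map from the question. The object name KeysVersusMap and the value names collapsed and allKeys are made up for this illustration.

```scala
import scala.collection.immutable.TreeMap

// KeysVersusMap is a hypothetical name used only for this sketch.
object KeysVersusMap {
  val mp = TreeMap((0, 1) -> "a", (0, 2) -> "b", (1, 3) -> "c", (3, 4) -> "f")

  // Each entry is mapped to a pair, so a new map is built from those pairs.
  // (0, 1) and (0, 2) both use key 0, and the later one wins.
  val collapsed = mp.map(x => x._1)

  // Converting to a Seq first means the result is a Seq, so every key tuple survives.
  val allKeys: Seq[(Int, Int)] = mp.toSeq.map(x => x._1)
}
```

Here collapsed holds three entries while allKeys holds all four key tuples, which matches the two outputs in the question.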

Related

Tuple keys of a map are converted to another map when using map function to retrieve the keys

When I ran the following piece of code, I got an unexpected result:
val a = Map(
  ("1", "2") -> 1,
  ("1", "4") -> 2,
  ("2", "2") -> 3,
  ("2", "4") -> 4
)
println(a.size)
val b = a.map(_._1)
println(b.size)
val c = a.keySet
println(c.size)
The result is:
res0: Int = 4
b: scala.collection.immutable.Map[String,String] = Map(1 -> 4, 2 -> 4)
res1: Int = 2
c: scala.collection.immutable.Set[(String, String)] = Set((1,2), (1,4), (2,2), (2,4))
res2: Int = 4
I expected the content of b to be the same as that of c. Is this expected in Scala, or is it some kind of side effect?
Yes, this is expected behaviour. As a general rule, Scala's collection map methods try to make the output collection the same type as the collection on which the method is called.
So, for Map.map, the output collection can be a Map whenever the result type of the function you pass to map is a pair (a Tuple2). This is precisely the case when you call val b = a.map(_._1). At this point, another rule comes into play: a Map's keys must be unique. So, as Scala traverses a during the call to a.map(_._1), it inserts the result of _._1 into the new map that it's building (to become b). The second entry replaces the first, and the fourth entry replaces the third, because they have the same key.
If this is not the behaviour you want, you can get round it by converting a to something other than a Map before mapping. Declaring the type of b alone is not enough, because a.map(_._1) does not produce a Seq.
e.g.
val b: Seq[(String, String)] = a.toSeq.map(_._1)
I think you want to use .keys instead. You are mapping a Map[(String, String), Int] into a Map[String, String], and since keys need to be unique, entries with a repeated key are discarded.
This happens because each key you return is itself a pair (String, String), so Scala turns the result into a Map[String, String].
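The "later entry replaces the earlier one" step can be made explicit with a fold. This is only a rough sketch of what happens, not the library's actual implementation; KeyCollision and rebuilt are names invented here.

```scala
// KeyCollision is a hypothetical name used only for this sketch.
object KeyCollision {
  val a = Map(
    ("1", "2") -> 1,
    ("1", "4") -> 2,
    ("2", "2") -> 3,
    ("2", "4") -> 4
  )

  // Roughly what a.map(_._1) amounts to: treat each key tuple as a (key, value)
  // pair and insert it into a growing map; a repeated key overwrites the old value.
  val rebuilt: Map[String, String] =
    a.foldLeft(Map.empty[String, String]) { case (acc, ((k, v), _)) =>
      acc.updated(k, v)
    }

  // keySet never rebuilds a map, so all four tuples are kept.
  val keys: Set[(String, String)] = a.keySet
}
```

With this input, rebuilt ends up with two entries, the same as b in the question, while keys has four elements.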

Why do .map.flatten and flatMap on a Scala Map return different results?

I have heard many people say flatMap is similar to map + flatten. For example, this answer:
Quite a difference, right?
Because flatMap treats a String as a sequence of Char, it flattens the resulting list of strings into a sequence of characters (Seq[Char]).
flatMap is a combination of map and flatten, so it first runs map on the sequence, then runs flatten giving the result shown.
But I ran into a problem with my code today: the results of map followed by flatten and of flatMap seem to be different. Here is my code:
object ListDemo {
  def main(args: Array[String]): Unit = {
    val map1 = Map("a" -> List(1 -> 11, 1 -> 111), "b" -> List(2 -> 22, 2 -> 222)).map(_._2).flatten
    val map2 = Map("a" -> List(1 -> 11, 1 -> 111), "b" -> List(2 -> 22, 2 -> 222)).flatMap(_._2)
    map1.foreach(println)
    println()
    map2.foreach(println)
  }
}
The result is not what I expected:
(1,11)
(1,111)
(2,22)
(2,222)
(1,111)
(2,222)
Why does this happen?
When calling .map(f) on a Map with an f that returns (K, V) for some K and V (not necessarily the same K and V types of the original Map), the result will be a Map[K, V]. Otherwise, if f returns some other (non-pair) type T, the result will be an Iterable[T]. So it will be a Map if the function returns a pair and an Iterable otherwise.
In your case the function returns List[(Int, Int)], so the result is an Iterable[List[(Int, Int)]] - not a Map. The .flatten then turns this into an Iterable[(Int, Int)].
When using flatMap directly, you directly end up with the (Int, Int) pairs, so the result will be a Map[Int, Int], not an Iterable[(Int, Int)]. And since Maps don't allow duplicate keys, the Map contains fewer elements than the Iterable.
TL;DR: the two calls have different result types. The call to .map().flatten returns an Iterable[(Int, Int)] and the call to .flatMap() returns a Map[Int, Int]. Since a map may not contain the same key twice, the first entry per key is overwritten by the second entry.
Consider a Map as an Iterable[(Key, Value)]. For .map to give you a Map back, the function you pass must return a tuple (Key, Value) (the actual types may be different from the original Key and Value).
In your example, Value happens to be a List[(Int, Int)]. When calling .map and returning the Value of the original Map, you end up with an Iterable[List[(Int, Int)]], which your call to .flatten turns into an Iterable[(Int, Int)] by concatenating the 'inner' lists together. If you were to turn that into a map (by calling .toMap), you would see the same result as with flatMap.
Now, flatMap is different in that it expects the function to return a collection of pairs, such as a Seq[(Key, Value)], rather than just a single (Key, Value). It then uses the returned pairs as the entries of a newly constructed Map.
In your case, your original Value of List[(Int, Int)] satisfies the expected return type, converting your original Map[String, List[(Int, Int)]] into a Map[Int, Int]. Since a map cannot contain two entries with the same key, the second occurrence of a key replaces the earlier occurrence.
To see this behavior, it helps to use the REPL (just run scala) instead of writing a main class, so you can see the intermediate values and their types.
scala> val theMap = Map("a" -> List(1 ->11,1->111), "b" -> List(2 -> 22, 2 ->222))
theMap: scala.collection.immutable.Map[String,List[(Int, Int)]] = Map(a -> List((1,11), (1,111)), b -> List((2,22), (2,222)))
scala> val mapped = theMap.map(_._2)
mapped: scala.collection.immutable.Iterable[List[(Int, Int)]] = List(List((1,11), (1,111)), List((2,22), (2,222)))
scala> val flattened = mapped.flatten
flattened: scala.collection.immutable.Iterable[(Int, Int)] = List((1,11), (1,111), (2,22), (2,222))
scala> val flatMapped = theMap.flatMap(_._2)
flatMapped: scala.collection.immutable.Map[Int,Int] = Map(1 -> 111, 2 -> 222)
scala> val flattenedToMap = flattened.toMap
flattenedToMap: scala.collection.immutable.Map[Int,Int] = Map(1 -> 111, 2 -> 222)
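If the goal is to keep all four pairs while still using flatMap, one option (a sketch, not the only way) is to leave the Map world before flattening. The names FlatMapOnMap, asMap and asSeq are made up for this example.

```scala
// FlatMapOnMap is a hypothetical name used only for this sketch.
object FlatMapOnMap {
  val theMap = Map("a" -> List(1 -> 11, 1 -> 111), "b" -> List(2 -> 22, 2 -> 222))

  // flatMap on the Map itself builds a Map[Int, Int]; duplicate keys collapse.
  val asMap = theMap.flatMap(_._2)

  // flatMap on a Seq builds a Seq[(Int, Int)]; nothing is overwritten.
  val asSeq: Seq[(Int, Int)] = theMap.toSeq.flatMap(_._2)
}
```

Calling asSeq.toMap would again collapse the duplicates, which is the same effect shown by flattenedToMap in the REPL session.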
You are kind of misunderstanding things here.
Let's say you have x: M[A] and f: A => N[B] for any monads M and N; then x.flatMap(f) should be the same as x.map(f).flatten.
But what you have here is a kind of nested monad, map: M[N[A]], and your function is f: A => B, with the following aliasing:
scala> type MapWithStringKey[A] = Map[String, A]
// defined type alias MapWithStringKey
scala> type TupleOfInt = (Int, Int)
// defined type alias TupleOfInt
scala> val map: MapWithStringKey[List[TupleOfInt]] = Map("a" -> List(1 ->11,1->111), "b" -> List(2 -> 22, 2 ->222))
// map: MapWithStringKey[List[TupleOfInt]] = Map(a -> List((1,11), (1,111)), b -> List((2,22), (2,222)))
This case is entirely different from the above-mentioned standard definition, which connects flatMap to map and flatten.
It is just one of the non-standard cases where you can use either of the two depending on what you want. And when we add the special key-uniqueness property of Map (already discussed in the answer by @sepp2k), things become even more unpredictable.
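For comparison, the standard identity does hold in the plain case where the collection is a List and the function returns a List. The following is a small sketch with made-up names (FlatMapLaw, xs, f).

```scala
// FlatMapLaw is a hypothetical name used only for this sketch.
object FlatMapLaw {
  val xs = List(1, 2, 3)

  // f returns a List for every element, matching the shape A => N[B].
  def f(n: Int): List[Int] = List(n, n * 10)

  val viaFlatMap = xs.flatMap(f)
  val viaMapFlatten = xs.map(f).flatten
}
```

Both values are the same list here, because neither step changes the kind of collection being built and a List has no uniqueness constraint.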

Convert list to a map with key being an index in Scala

I have a list of strings
val list = List("a", "b", "c", "d", "e")
and I want to have a map with keys as indexes of items in list. So I did the following:
def mapByIndexes(list: List[String]): Map[Int, String] = (1 to list.size).zip(list).toMap
However, the resulting map does not preserve the index order and I'm getting this as a result:
Map(5 -> "e", 1 -> "a", 2 -> "b", 3 -> "c", 4 -> "d")
How can I modify the code above so I'm getting a map with the following, natural order?
Map(1 -> "a", 2 -> "b", 3 -> "c", 4 -> "d", 5 -> "e")
Note: I'm aware that I can just sort the resulting map, but can I avoid that step and create a map that already preserves the order?
Edit: Solution with ListMap described at Scala LinkedHashMap.toMap preserves order? works, but I don't like additional parentheses and _* for so simple thing. Isn't there anything else so I can just have a chaining? If not, I will accept @pamu's answer.
I'm aware that I can just sort the resulting map
No, you can't. Sorting a Map doesn't make sense. But there are Map implementations which store keys in natural order, such as TreeMap (IntMap also does, IIRC). Note that it is not the same as preserving insertion order, as ListMap and LinkedHashMap do.
Solution with ListMap described at Scala LinkedHashMap.toMap preserves order? works, but I don't like additional parentheses and _* for so simple thing. Isn't there anything else so I can just have a chaining?
No (at least, I don't think so), but you can easily define it:
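import scala.collection.immutable.ListMap

implicit class ToListMap[A, B](x: Seq[(A, B)]) {
  // Builds a ListMap that keeps the insertion order of the pairs.
  def toListMap: ListMap[A, B] = ListMap(x: _*)
}
// somewhere where ToListMap is in scope or imported:
val list = List(1 -> 2, 3 -> 4)
list.toListMap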
Be aware that ListMap is basically a list (as the name says), so lookups in it are slower than any reasonable map implementation.
Of course, you can do exactly the same with TreeMap.
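The TreeMap variant mentioned above might look like the following sketch. It keeps the mapByIndexes name from the question; the enclosing object IndexedByKey is a made-up name.

```scala
import scala.collection.immutable.TreeMap

// IndexedByKey is a hypothetical name used only for this sketch.
object IndexedByKey {
  // TreeMap keeps entries sorted by key, so iteration follows 1, 2, 3, ...
  // regardless of how many elements the list has.
  def mapByIndexes(list: List[String]): TreeMap[Int, String] =
    TreeMap((1 to list.size).zip(list): _*)
}
```

Since the keys are the indexes themselves, sorted-by-key order and insertion order coincide here, so TreeMap gives the requested order while keeping logarithmic lookups.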
Use ListMap. After zipping, instead of calling toMap, just construct a ListMap, which preserves the insertion order of elements. You can build a ListMap using its companion object, which accepts varargs of tuples.
def mapByIndexes(list: List[String]): ListMap[Int, String] = ListMap((1 to list.size).zip(list): _*)
Scala REPL
scala> import scala.collection.immutable._
import scala.collection.immutable._
scala> def mapByIndexes(list: List[String]): ListMap[Int, String] = ListMap((1 to list.size).zip(list): _*)
mapByIndexes: (list: List[String])scala.collection.immutable.ListMap[Int,String]
scala> mapByIndexes(list)
res10: scala.collection.immutable.ListMap[Int,String] = Map(1 -> a, 2 -> b, 3 -> c, 4 -> d, 5 -> e)

Treating an Array of (String, Int) Tuples like a Dictionary

I have a list of tuples that looks like the following:
(("String1", Value1), ("String2", Value2), ...)
Where each string is a String and each value is a Double. Is there a method in Scala to accomplish the following:
1) Search the list for a particular string value.
2) If we have a hit, return the value associated with the string.
3) If we have a miss, return -1.
This tuple sequence was created using collect on an RDD of format RDD[K, V] where the keys were strings and the vals were the doubles. Originally I planned on using lookup on the RDD, but it appears that this work needs to be done on the driver (hence the collect).
Maybe you can try to convert it to a map first then:
scala> val collection = Map(("hello", 1), ("world", 2))
collection: scala.collection.immutable.Map[String,Int] = Map(hello -> 1, world -> 2)
scala> collection getOrElse ("hello", -1)
res3: Int = 1
scala> collection getOrElse ("scala", -1)
res4: Int = -1
val m = list.toMap.withDefaultValue(-1d)
// 1 and 2
m("String1") // Value1
// 3
m("Some other") // -1d
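Putting the three requirements together, a minimal sketch could look like this. The sample values 1.5 and 2.5 stand in for Value1 and Value2, and the names TupleLookup, lookup and lookupLinear are invented for the example.

```scala
// TupleLookup is a hypothetical name used only for this sketch.
object TupleLookup {
  // Stand-in for the collected RDD contents; the numbers are made up.
  val pairs: Array[(String, Double)] = Array("String1" -> 1.5, "String2" -> 2.5)

  // Build the map once, then every lookup is a hash lookup.
  val byName: Map[String, Double] = pairs.toMap

  // Hit: the associated value. Miss: -1.
  def lookup(name: String): Double = byName.getOrElse(name, -1.0)

  // Alternative without building a map: a linear scan over the array.
  def lookupLinear(name: String): Double =
    pairs.find(_._1 == name).map(_._2).getOrElse(-1.0)
}
```

The map version is likely the better fit when many lookups are made against the same collected data; the linear scan avoids the extra structure for a one-off search.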

What is the structure that is only enclosed by parentheses in scala?

Here's the problem:
I intend to retrieve a (Int, Int) object from a function, but I don't know how to get the second element. I've tried the following commands so as to retrieve the second value, or convert it to a Seq or List, but with no luck.
scala> val s = (1,2)
s: (Int, Int) = (1,2)
scala> s(1)
<console>:9: error: (Int, Int) does not take parameters
s(1)
^
scala> val ss = List(s)
ss: List[(Int, Int)] = List((1,2))
scala> ss(0)
res10: (Int, Int) = (1,2)
Could anyone give me some idea? Thanks a lot!
val s = (1, 2)
is syntactic sugar and creates a Tuple2, or in other words is equivalent to new Tuple2(1, 2). You can access elements in tuples with
s._1 // => 1
s._2 // => 2
Likewise, (1, 2, 3) would create a Tuple3, which also has a method _3 to access the third element.
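Besides the _1 and _2 accessors, a tuple can be taken apart with a pattern, and turned into a List by naming its elements. A small sketch follows; TupleAccess and the value names are made up here.

```scala
// TupleAccess is a hypothetical name used only for this sketch.
object TupleAccess {
  val s: (Int, Int) = (1, 2)

  // Accessor method for the second element.
  val second: Int = s._2

  // Pattern matching binds both elements at once.
  val (a, b) = s

  // Explicit conversion to a List keeps the element type Int.
  val asList: List[Int] = List(s._1, s._2)
}
```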