Aggregate/Reduce by key function for a map in scala - scala

I have a map as given below in scala.
Map("x"-> "abc", "y"->"adc","z"->"abc", "l"-> "ert","h"->"dfg", "p"-> "adc")
I want the output as follows:
Map("abc"->["x","z"],"adc"->["y" , "p"], "ert"->"l", "dfg"->"h")
So, the output has the array as the value of those those keys which had same values in inital map. How can I get that done optimally?

A groupBy followed by some manipulation of the values it outputs should do.
scala> m.groupBy(x => x._2).mapValues(_.keys.toList)
res10: scala.collection.immutable.Map[String,List[String]]
= Map(abc -> List(x, z), dfg -> List(h), ert -> List(l), adc -> List(y, p))

Related

groupBy on String in Scala

I am doing a groupBy on a string as follows:
"message".groupBy("message".count(_.toChar))
I was expecting it to yield a map as:
{1 => "mag" , 2 => "es"}
However the above code doesn't even compile , where am I going wrong. I want to produce a map based on the discriminator function count of chars.
You can do:
("message".groupBy(identity).mapValues(_.size)
.groupBy(_._2).mapValues(_.foldLeft("")(_+_._1)))
// res8: scala.collection.immutable.Map[Int,String] = Map(2 -> es, 1 -> amg)

Scala - map function - Only returned last element of a Map

I am new to Scala and trying out the map function on a Map.
Here is my Map:
scala> val map1 = Map ("abc" -> 1, "efg" -> 2, "hij" -> 3)
map1: scala.collection.immutable.Map[String,Int] =
Map(abc -> 1, efg -> 2, hij -> 3)
Here is a map function and the result:
scala> val result1 = map1.map(kv => (kv._1.toUpperCase, kv._2))
result1: scala.collection.immutable.Map[String,Int] =
Map(ABC -> 1, EFG -> 2, HIJ -> 3)
Here is another map function and the result:
scala> val result1 = map1.map(kv => (kv._1.length, kv._2))
result1: scala.collection.immutable.Map[Int,Int] = Map(3 -> 3)
The first map function returns all the members as expected however the second map function returns only the last member of the Map. Can someone explain why this is happening?
Thanks in advance!
In Scala, a Map cannot have duplicate keys. When you add a new key -> value pair to a Map, if that key already exists, you overwrite the previous value. If you're creating maps from functional operations on collections, then you're going to end up with the value corresponding to the last instance of each unique key. In the example you wrote, each string key of the original map map1 has the same length, and so all your string keys produce the same integer key 3 for result1. What's happening under the hood to calculate result1 is:
A new, empty map is created
You map "abc" -> 1 to 3 -> 3 and add it to the map. Result now contains 1 -> 3.
You map "efg" -> 2 to 3 -> 2 and add it to the map. Since the key is the same, you overwrite the existing value for key = 3. Result now contains 2 -> 3.
You map "hij" -> 3 to 3 -> 3 and add it to the map. Since the key is the same, you overwrite the existing value for key = 3. Result now contains 3 -> 3.
Return the result, which is Map(3 -> 3)`.
Note: I made a simplifying assumption that the order of the elements in the map iterator is the same as the order you wrote in the declaration. The order is determined by hash bin and will probably not match the order you added elements, so don't build anything that relies on this assumption.

Understanding foldLeft with Map instead of List

I fould like to understand how foldLeft works for Maps. I do understand how it works if I have a List and call on it foldLeft with a zero-element and a function:
val list1 = List(1,2,3)
list1.foldLeft(0)((a,b) => a + b)
Where I add the zero-element 0 with the first element of list1 and then add the second element of list1 and so on. So output becomes the new input and the first input is the zero-element.
Now I got the code
val map1 = Map(1 -> 2.0, 3 -> 4.0, 5 -> 6.2) withDefaultValue 0.0
val map2 = Map(0 -> 3.0, 3 -> 7.0) withDefaultValue 0.0
def myfct(terms: Map[Int, Double], term: (Int, Double)): Map[Int, Double] = ???
map1.foldLeft(map2)(myfct)
So my first element here is a Tuple2, but since map2 is a Map and not a Tuple2, what is the zero-element?
When we had a List, namely list1, we always "took the next element in list1". What is the "next element in map1? Is it another pair of map1?
In this context, you can think of a Map as a list of tuples. You can create a list like this: List(1 -> 2.0, 3 -> 4.0, 5 -> 6.2), and call foldLeft on it (that is more or less exactly what Map.foldLeft does). If you understand how foldLeft works with lists, then now you know how it works with Maps too :)
To answer your specific questions:
The first parameter of foldLeft can be of any type. You could pass in a Map instead of an Int in your first example too. It does not have to be of the same type as elements of the collection you are processing (although, it could be), as you have in your first example, nor does it need to be the same type as the collection itself, as you have it in the last example.
Consider this for the sake of example:
List(1,2,3,4,5,6).foldLeft(Map.empty[String,Int]) { case(map,elem) =>
map + (elem.toString -> elem)
}
This produces the same result as list.map { x => x.toString -> x }.toMap. As you can see, the first parameter here is a Map, that is neither List nor an Int.
The type you pass to foldLeft is also the type it returns, and the type that the function you pass in returns. It is not "element zero".
foldLeft will pass that parameter to your reducer function, along with the first element of the list. Your function will combine the two elements, and produce a new value of the same type as the first param. That value is passed in again, again with the second element ... etc.
Perhaps, inspecting the signature of foldLeft would be helpful:
foldLeft[B](z: B)(op: (B, A) ⇒ B): B
Here A is the type of your collection elements, and B can be anything, the only requirement is that the four places where it appears have the same type.
Here is another example, that is (almost) equivalent to list.mkString(","):
List(1,2,3,4,5,6).foldLeft("") {
case("", i) => i.toString
case(s,i) => s + "," + i
}
As I explained in the beginning, a map in this context is a kind of a list (a sequence rather). Just like "we always took the next element of the list" when we dealt with lists, we will take the "next element of the map" in this case. You said it yourself, the elements of the map are tuples, so that's what the type of the next element will be:
Map("one" -> 1, "two" -> 2, "three" -> 3)
.foldLeft("") {
case("", (key,value)) => key + "->" + value
case(s, (key,value)) => s + ", " + key + "->" + value
}
As mentioned above, the signature of the foldLeft operation is as shown:
foldLeft[B](z: B)(op: (B, A) ⇒ B): B
The resulting type of the operation is B. So, this answers your first question:
So my first element here is a Tuple2, but since map2 is a Map and not
a Tuple2, what is the zero-element?
The zero-element is whatever you want to output from the foldLeft. It could be an Int or a Map or anything else for that matter.
op in the function signature (i.e the second argument), is how you would like to operate on that for each element of map1, which is a (key,value) pair, or another way of writing it is as key -> value.
Let's try and understand it by building it up with more simpler operations.
val map1 = Map(1 -> 1.0, 4 -> 4.0, 5 -> 5.0) withDefaultValue 0.0
val map2 = Map(0 -> 0.0, 3 -> 3.0) withDefaultValue 0.0
// Here we're just making the output an Int
// So we just add the first element of the key-value pair.
def myOpInt(z: Int, term: (Int, Double)): Int = {
z + term._1
}
// adds all the first elements of the key-value pairs of map1 together
map1.foldLeft(0)(myOpInt) // ... output is 10
// same as above, but for map2 ...
map2.foldLeft(0)(myOpInt) // ... output is 3
Now, taking it to the next step by using a zero element (z) as an existing map...we would essentially add elements to that map we are using as z.
val map1 = Map(1 -> 1.0, 4 -> 4.0, 5 -> 5.0) withDefaultValue 0.0
val map2 = Map(0 -> 0.0, 3 -> 3.0) withDefaultValue 0.0
// Here z is a Map of Int -> Double
// We expect a pair of (Int, Double) as the terms to fold left with z
def myfct(z: Map[Int, Double], term: (Int, Double)): Map[Int, Double] = {
z + term
}
map1.foldLeft(map2)(myfct)
// output is Map(0 -> 0.0, 5 -> 5.0, 1 -> 1.0, 3 -> 3.0, 4 -> 4.0)
// i.e. same elements of both maps concatenated together, albeit out of order, since Maps don't guarantee order ...
When we had a List, namely list1, we always "took the next element in
list1". What is the "next element in map1? Is it another pair of map1?
Yes, it's another key-value pair (k,v) in map1.

How to find the number of (key , value) pairs in a map in scala?

I need to find the number of (key , value) pairs in a Map in my Scala code. I can iterate through the map and get an answer but I wanted to know if there is any direct function for this purpose or not.
you can use .size
scala> val m=Map("a"->1,"b"->2,"c"->3)
m: scala.collection.immutable.Map[String,Int] = Map(a -> 1, b -> 2, c -> 3)
scala> m.size
res3: Int = 3
Use Map#size:
The size of this traversable or iterator.
The size method is from TraversableOnce so, barring infinite sequences or sequences that shouldn't be iterated again, it can be used over a wide range - List, Map, Set, etc.

Count occurrences of each element in a List[List[T]] in Scala

Suppose you have
val docs = List(List("one", "two"), List("two", "three"))
where e.g. List("one", "two") represents a document containing terms "one" and "two", and you want to build a map with the document frequency for every term, i.e. in this case
Map("one" -> 1, "two" -> 2, "three" -> 1)
How would you do that in Scala? (And in an efficient way, assuming a much larger dataset.)
My first Java-like thought is to use a mutable map:
val freqs = mutable.Map.empty[String,Int]
for (doc <- docs)
for (term <- doc)
freqs(term) = freqs.getOrElse(term, 0) + 1
which works well enough but I'm wondering how you could do that in a more "functional" way, without resorting to a mutable map?
Try this:
scala> docs.flatten.groupBy(identity).mapValues(_.size)
res0: Map[String,Int] = Map(one -> 1, two -> 2, three -> 1)
If you are going to be accessing the counts many times, then you should avoid mapValues since it is "lazy" and, thus, would recompute the size on every access. This version gives you the same result but won't require the recomputations:
docs.flatten.groupBy(identity).map(x => (x._1, x._2.size))
The identity function just means x => x.
docs.flatten.foldLeft(new Map.WithDefault(Map[String,Int](),Function.const(0))){
(m,x) => m + (x -> (1 + m(x)))}
What a train wreck!
[Edit]
Ah, that's better!
docs.flatten.foldLeft(Map[String,Int]() withDefaultValue 0){
(m,x) => m + (x -> (1 + m(x)))}
Starting Scala 2.13, after flattening the list of lists, we can use groupMapReduce which is a one-pass alternative to groupBy/mapValues:
// val docs = List(List("one", "two"), List("two", "three"))
docs.flatten.groupMapReduce(identity)(_ => 1)(_ + _)
// Map[String,Int] = Map("one" -> 1, "three" -> 1, "two" -> 2)
This:
flattens the List of Lists as a List
groups list elements (identity) (group part of groupMapReduce)
maps each grouped value occurrence to 1 (_ => 1) (map part of groupMapReduce)
reduces values within a group of values (_ + _) by summing them (reduce part of groupMapReduce).