Convert Map of mutable Set of strings to Map of immutable set of strings in Scala

I have a "dirtyMap" which is an immutable.Map[String, collection.mutable.Set[String]]. I want to convert dirtyMap to an immutable Map[String, Set[String]]. Could you please let me know how to do this? I tried a couple of approaches, but neither produced the result I wanted.
Method 1: Using map function
dirtyMap.toSeq.map(e => {
val key = e._1
val value = e._2.to[Set]
e._1 -> e._2
}).toMap()
I'm getting a syntax error.
Method 2: Using foreach
dirtyMap.toSeq.foreach(e => {
val key = e._1
val value = e._2.to[Set]
e._1 -> e._2
}).toMap()
I cannot apply toMap to the output of foreach.
Disclaimer: I am a Scala noob if you couldn't tell.
UPDATE: Method 1 works when I remove the parentheses from the toMap() call. However, the following is a more elegant solution:
dirtyMap.mapValues(v => v.toSet)
Thank you, Gabriele, for providing an answer with a great explanation. Thanks to Duelist and Debojit for your answers as well.

You can simply do:
dirtyMap.mapValues(_.toSet)
mapValues will apply the function to only the values of the Map, and .toSet converts a mutable Set to an immutable one.
(I'm assuming dirtyMap is a collection.immutable.Map. If it's a mutable one, just add toMap at the end.)
If you're not familiar with the underscore syntax for lambdas, it's a shorthand for:
dirtyMap.mapValues(v => v.toSet)
Now, your first example doesn't compile because of the (). toMap takes no explicit arguments, but it takes an implicit argument. If you want the implicit argument to be inferred automatically, just remove the ().
The second example doesn't work because foreach returns Unit. This means that foreach executes side effects, but it doesn't return a value. If you want to chain transformations on a value, never use foreach, use map instead.
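To make the two points above concrete, here is a minimal sketch. The sample data and the object name are hypothetical, chosen only for illustration. It uses map rather than mapValues because, from Scala 2.13 on, mapValues on a Map is deprecated and returns a lazy MapView, whereas map builds a strict immutable Map on every version.

```scala
import scala.collection.mutable

object DirtyMapConversion {
  // Hypothetical sample data standing in for the question's dirtyMap.
  val dirtyMap: Map[String, mutable.Set[String]] = Map(
    "a" -> mutable.Set("x", "y"),
    "b" -> mutable.Set("z")
  )

  // Method 1 from the question, repaired: map returns a new collection,
  // so it can be chained, and toMap is called without parentheses.
  val viaToSeq: Map[String, Set[String]] =
    dirtyMap.toSeq.map { case (key, value) => key -> value.toSet }.toMap

  // Mapping over the Map directly avoids the detour through Seq.
  val clean: Map[String, Set[String]] =
    dirtyMap.map { case (key, value) => key -> value.toSet }
}
```

Because toSet copies the elements, later changes to the mutable sets should not show up in the converted map.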

You can use
dirtyMap.map({case (k,v) => (k,v.toSet)})

You can use flatMap for it:
dirtyMap.flatMap(entry => Map[String, Set[String]](entry._1 -> entry._2.toSet)).toMap
First, each entry is mapped to a single-entry immutable.Map in which the value is now an immutable.Set.
flatMap then flattens these single-entry maps into one map whose values are immutable.Set, and toMap makes sure the result is an immutable.Map (if dirtyMap is already immutable, this final toMap is redundant).
This variant is a bit complicated; you can simply use dirtyMap.map(...).toMap as Debojit Paul mentioned.
Another variant is foldLeft:
dirtyMap.foldLeft(Map[String, Set[String]]())(
  (map, entry) => map + (entry._1 -> entry._2.toSet)
)
You specify an accumulator, which is an immutable.Map, and you add each entry to it with its Set converted.
Personally, I think foldLeft is the more efficient way.
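As a rough check that the flatMap and foldLeft variants described above produce the same result, here is a small sketch. The sample data and the object name are hypothetical; no claim about relative performance is implied.

```scala
import scala.collection.mutable

object FlatMapAndFoldLeft {
  // Hypothetical sample data standing in for the question's dirtyMap.
  val dirtyMap: Map[String, mutable.Set[String]] = Map(
    "a" -> mutable.Set("x"),
    "b" -> mutable.Set("y", "z")
  )

  // Each entry becomes a single-entry Map; flatMap merges them into one Map.
  val viaFlatMap: Map[String, Set[String]] =
    dirtyMap.flatMap(entry => Map[String, Set[String]](entry._1 -> entry._2.toSet))

  // The accumulator starts empty and gains one converted entry per step.
  val viaFoldLeft: Map[String, Set[String]] =
    dirtyMap.foldLeft(Map.empty[String, Set[String]]) {
      case (acc, (key, value)) => acc + (key -> value.toSet)
    }
}
```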

Related

Char count in string

I'm new to scala and FP in general and trying to practice it on a dummy example.
val counts = ransomNote.map(e=>(e,1)).reduceByKey{case (x,y) => x+y}
The following error is raised:
Line 5: error: value reduceByKey is not a member of IndexedSeq[(Char, Int)] (in solution.scala)
The above example looks similar to the introductory FP primer on word count. I'd appreciate it if you could point out my mistake.

It looks like you are trying to use a Spark method on a Scala collection. The two APIs have a few similarities, but reduceByKey is not one of them.
In pure Scala you can do it like this:
val counts =
  ransomNote.foldLeft(Map.empty[Char, Int].withDefaultValue(0)) {
    (counts, c) => counts.updated(c, counts(c) + 1)
  }
foldLeft iterates over the collection from the left, using an empty map of counts as the accumulated state (the map returns 0 if no value is found). The function passed as the second argument updates that state by incrementing the count stored for each character it encounters.
Note that accessing a map directly (counts(c)) is likely to be unsafe in most situations (since it will throw an exception if no item is found). In this situation it's fine because in this scope I know I'm using a map with a default value. When accessing a map you will more often than not want to use get, which returns an Option. More on that on the official Scala documentation (here for version 2.13.2).
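The difference between direct access, get, and a map with a default value can be sketched as follows. The object name, the countChars helper and the sample strings are made up for illustration.

```scala
object CharCount {
  // Count characters with foldLeft; the default value makes counts(c) safe
  // for characters that have not been seen yet.
  def countChars(s: String): Map[Char, Int] =
    s.foldLeft(Map.empty[Char, Int].withDefaultValue(0)) {
      (counts, c) => counts.updated(c, counts(c) + 1)
    }

  val counts: Map[Char, Int] = countChars("hello")

  // A plain map without a default: direct access to a missing key throws,
  // while get returns an Option.
  val plain: Map[Char, Int] = Map('h' -> 1)
  val missing: Option[Int] = plain.get('z')
}
```

Note that get ignores the default value, so counts.get('z') is still None even though counts('z') returns 0.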
You can play around with this code here on Scastie.

On Scala 2.13 you can use the new groupMapReduce
ransomNote.groupMapReduce(identity)(_ => 1)(_ + _)

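val str = "hello"
// map (rather than mapValues) builds a strict Map on both Scala 2.12 and 2.13
val countsMap: Map[Char, Int] = str
  .groupBy(identity)
  .map { case (c, occurrences) => c -> occurrences.length }
println(countsMap)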

Scala practices: lists and case classes

I've just started using Scala/Spark. Coming from a Java background, I'm still trying to wrap my head around the concept of immutability and other Scala best practices.
This is a very small segment of code from a larger program:
intersections is RDD(Key, (String, String))
obs is (Key, (String, String))
Data is just a case class I've defined above.
val intersections = map1 join map2
var listOfDatas = List[Data]()
intersections take NumOutputs foreach (obs => {
listOfDatas ::= ParseInformation(obs._1.key, obs._2._1, obs._2._2)
})
listOfDatas foreach println
This code works and does what I need it to do, but I was wondering if there was a better way of making this happen. I'm using a variable list and rewriting it with a new list every single time I iterate, and I'm sure there has to be a better way to create an immutable list that's populated with the results of the ParseInformation method call. Also, I remember reading somewhere that instead of accessing the tuple values directly, the way I have done, you should use case classes within functions (as partial functions I think?) to improve readability.
Thanks in advance for any input!

This might work locally, but only because take brings the results back to the driver as a local collection. The same pattern will not work on a distributed RDD, because listOfDatas is passed to each worker as a copy. The better way of doing this, IMO, is:
val processedData = intersections map { case (key, (item1, item2)) =>
  // ParseInformation is the method from the question
  ParseInformation(key.key, item1, item2)
}
processedData foreach println
A note for developers new to functional programming: if all you are trying to do is transform data in an iterable (such as a List), forget foreach. Use map instead, which runs your transformation on each item and returns a new iterable of the results.
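Here is a minimal sketch of that advice on a plain Scala collection, with no Spark involved. Key, Data, parseInformation and the sample values are hypothetical stand-ins for the question's types, which are not shown in full.

```scala
object MapInsteadOfForeach {
  // Hypothetical stand-ins for the question's Key, Data and ParseInformation.
  final case class Key(key: String)
  final case class Data(id: String, left: String, right: String)
  def parseInformation(id: String, left: String, right: String): Data =
    Data(id, left, right)

  val intersections: Seq[(Key, (String, String))] = Seq(
    (Key("k1"), ("a", "b")),
    (Key("k2"), ("c", "d")),
    (Key("k3"), ("e", "f"))
  )
  val numOutputs = 2

  // No var and no foreach: map builds the result, and the pattern match
  // names the tuple parts instead of using _1 and _2.
  val listOfDatas: List[Data] =
    intersections.take(numOutputs).map { case (key, (item1, item2)) =>
      parseInformation(key.key, item1, item2)
    }.toList
}
```

Unlike prepending with ::= inside foreach, this keeps the elements in their original order.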

What's the type of intersections? It looks like you can replace foreach with map:
val listOfDatas: List[Data] =
  intersections take NumOutputs map (obs => {
    ParseInformation(obs._1.key, obs._2._1, obs._2._2)
  })

trouble understanding with Map and map in scala

I tried to write a convenience class (below) that holds a folder and lets me get a file by its filename (a string). This works as expected, but one thing I don't understand is the map part: Map(folder.listFiles map {file => file.getName -> file}:_*).
I placed :_* there to get rid of some kind of type incompatibility, but I don't know what it really does. Also, what is _*, and could I replace it with anything more specific?
Thanks
import java.io.File

class FolderAsMap(val folderName: String) {
  val folder = new File(folderName)
  private val filesAsMap: Map[String, File] =
    Map(folder.listFiles map { file => file.getName -> file }: _*)
  def get(fileName: String): Option[File] = {
    filesAsMap.get(fileName)
  }
}

: _* is correct. Alternatively, you can use toMap (with dot notation, so that toMap applies to the result of map rather than to the function literal):
folder.listFiles.map(file => file.getName -> file).toMap
Map(...) is method apply in object Map: def apply [A, B] (elems: (A, B)*): Map[A, B]. It has a repeated parameter. It is expected to be called with multiple parameters. The : _* is used to signal you are passing all the parameters as just one Seq argument.
It avoids ambiguities. In Java (where the equivalent varargs are arrays instead of Seqs) there is a possible ambiguity: given a method f(Object... args) called as f(someArray), it could mean either that args has just one item, which is someArray (so f receives an array of one element), or that args is someArray itself (so f receives someArray directly). Java chooses the second interpretation. In Scala, with a richer type system and Seq rather than Array, the ambiguity could arise much more often, so the rule is that you always have to write : _* when passing all arguments as one sequence, even when no ambiguity is possible, as here. This is simpler than a complex rule for deciding when an actual ambiguity exists.
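The effect of : _* can be seen with a small sketch. The count method and the sample pairs are hypothetical, made up only to show the two ways a sequence can reach a repeated parameter.

```scala
object VarargsDemo {
  // A method with a repeated parameter, similar in shape to Map.apply.
  def count(items: Any*): Int = items.length

  val pairs: Seq[(String, Int)] = Seq("a" -> 1, "b" -> 2)

  // Without : _* the whole Seq is passed as a single argument.
  val asOneArgument: Int = count(pairs)

  // With : _* each element of the Seq becomes its own argument.
  val expanded: Int = count(pairs: _*)

  // The same mechanism is what makes Map(pairs: _*) work; toMap is equivalent here.
  val viaApply: Map[String, Int] = Map(pairs: _*)
  val viaToMap: Map[String, Int] = pairs.toMap
}
```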

The _* makes the compiler pass each element of folder.listFiles map { file => file.getName -> file } as its own argument to Map, instead of passing all of it as one argument.
In this case the map function creates an Array (because folder.listFiles returns that type). So if you write:
val files = folder.listFiles map { file => file.getName -> file }
...the returned type will be Array[(String, File)]. To convert this to a Map you need to pass the elements of files one by one to Map's apply method using _* (or use the toMap method, as @didierd wrote):
val filesAsMap = Map(files : _*)

Converting a Scala Map to a List

I have a map that I need to map to a different type, and the result needs to be a List. I have two ways (seemingly) to accomplish what I want, since calling map on a map seems to always result in a map. Assuming I have some map that looks like:
val input = Map[String, List[Int]]("rk1" -> List(1,2,3), "rk2" -> List(4,5,6))
I can either do:
val output = input.map{ case(k,v) => (k.getBytes, v) } toList
Or:
val output = input.foldRight(List[(Array[Byte], List[Int])]()) { (el, res) =>
  (el._1.getBytes, el._2) :: res
}
In the first example I convert the type, and then call toList. I assume the runtime is something like O(n*2) and the space required is n*2. In the second example, I convert the type and generate the list in one go. I assume the runtime is O(n) and the space required is n.
My question is, are these essentially identical or does the second conversion cut down on memory/time/etc? Additionally, where can I find information on storage and runtime costs of various scala conversions?
Thanks in advance.

My favorite way to do this kind of thing is like this:
input.map { case (k,v) => (k.getBytes, v) }(collection.breakOut): List[(Array[Byte], List[Int])]
With this syntax, you are passing to map the builder it needs to reconstruct the resulting collection. (Actually, not a builder, but a builder factory. Read more about Scala's CanBuildFroms if you are interested.) collection.breakOut can exactly be used when you want to change from one collection type to another while doing a map, flatMap, etc. — the only bad part is that you have to use the full type annotation for it to be effective (here, I used a type ascription after the expression). Then, there's no intermediary collection being built, and the list is constructed while mapping.
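Note that collection.breakOut was removed in Scala 2.13. As a sketch of one alternative that also avoids building an intermediate collection, and that should compile on 2.12 and 2.13 alike, the mapping can be done over the map's iterator. The object name and the explicit "UTF-8" charset are choices made for this example, not part of the original answer.

```scala
object MapToList {
  val input: Map[String, List[Int]] =
    Map("rk1" -> List(1, 2, 3), "rk2" -> List(4, 5, 6))

  // The iterator transforms entries lazily, one at a time; toList then
  // builds the final List directly, without an intermediate Map.
  val output: List[(Array[Byte], List[Int])] =
    input.iterator.map { case (k, v) => (k.getBytes("UTF-8"), v) }.toList
}
```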

Mapping over a view in the first example could cut down on the space requirement for a large map:
val output = input.view.map{ case(k,v) => (k.getBytes, v) } toList

Scala: What is the most efficient way convert a Map[K,V] to an IntMap[V]?

Let's say I have a class Point with a toInt method, and I have an immutable Map[Point,V], for some type V. What is the most efficient way in Scala to convert it to an IntMap[V]? Here is my current implementation:
def pointMap2IntMap[T](points: Map[Point,T]): IntMap[T] = {
  var result: IntMap[T] = IntMap.empty[T]
  for(t <- points) {
    result += (t._1.toInt, t._2)
  }
  result
}
[EDIT] I meant primarily faster, but I would also be interested in shorter versions, even if they are not obviously faster.

IntMap has a built-in factory method (apply) for this:
IntMap(points.map(p => (p._1.toInt, p._2)).toSeq: _*)
If speed is an issue, you may use:
points.foldLeft(IntMap.empty[T])((m, p) => m.updated(p._1.toInt, p._2))
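Both variants can be tried out with a small self-contained sketch. The Point class below, including its toInt encoding, is a hypothetical stand-in, since the question only says that Point has a toInt method. The method names viaApply and viaFold are likewise made up.

```scala
import scala.collection.immutable.IntMap

object PointToIntMap {
  // Hypothetical Point; the toInt encoding is an assumption for this example.
  final case class Point(x: Int, y: Int) {
    def toInt: Int = x * 1000 + y
  }

  // Variant 1: build a Seq of (Int, T) pairs and expand it into IntMap.apply.
  def viaApply[T](points: Map[Point, T]): IntMap[T] =
    IntMap(points.toSeq.map { case (p, v) => (p.toInt, v) }: _*)

  // Variant 2: fold over the entries, updating an initially empty IntMap.
  def viaFold[T](points: Map[Point, T]): IntMap[T] =
    points.foldLeft(IntMap.empty[T]) { case (acc, (p, v)) =>
      acc.updated(p.toInt, v)
    }
}
```

If two points share the same toInt value, either variant keeps only one of their bindings.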

A one liner that uses breakOut to obtain an IntMap. It does a map to a new collection, using a custom builder factory CanBuildFrom which the breakOut call resolves:
Map[Int, String](1 -> "").map(kv => kv)(breakOut[Map[Int, String], (Int, String), immutable.IntMap[String]])
In terms of performance, it's hard to tell, but it creates a new IntMap, goes through all the bindings and adds them to the IntMap. A handwritten iterator while loop (preceded with a pattern match to check if the source map is an IntMap) would possibly result in somewhat better performance.