trouble understanding with Map and map in scala - scala

I tried to make some sort of convenient class (below) to hold folder and get file by using filename (string). This work as expect but one thing I don't understand is map part Map(folder.listFiles map {file => file.getName -> file}:_*).
I place :_* there to prevent some kind of type incompatible but I don't know what does it really do. Also, what is _* and could I replace is with anything more specific ?
thanks
class FolderAsMap (val folderName:String){
val folder = new File(folderName)
private val filesAsMap: Map[String, File] = Map(folder.listFiles map
{file => file.getName -> file}:_*)
def get(fileName:String): Option[File] = {
filesAsMap.get(fileName)
}
}

: _* is correct. Alternatively, you can use toMap:
folder.listFiles map {file => file.getName -> file}.toMap
Map(...) is method apply in object Map: def apply [A, B] (elems: (A, B)*): Map[A, B]. It has a repeated parameter. It is expected to be called with multiple parameters. The : _* is used to signal you are passing all the parameters as just one Seq argument.
It avoids ambiguities. In java, (where equivalent varargs are arrays instead of Seqs) there is a possible ambiguity, if a method f(Object... args) and you call it with f(someArray), it could mean that args has just one item, with is someArray (so f receives an array of just one element, which its someArray), or args is someArray and f receives someArray directly). Java choose the second version. In scala, with a richer type system and Seq rather than Array the ambiguity may arise much more often, and the rule is that you always have to write : _* when passing all arguments as one, even when no ambiguity is possible, as in here, rather than a complex rule to tell when there is an actual ambiguity.

The _* makes the compiler pass each element of folder.listFiles map { file => file.getName -> file} as an own argument to Map instead of all of it as one argument.
In this case the map function creates a Array (because folder.listFiles returns that type). So if you write:
val files = folder.listFiles map { file => file.getName -> file }
...the returned type will be Array[(String, File)]. To convert this to a Map you will need to pass files one by one to the maps constructor using the _* (or use the method toMap like #didierd wrote):
val filesAsMap = Map(files : _*)

Related

Convert Map of mutable Set of strings to Map of immutable set of strings in Scala

I have a "dirtyMap" which is immutable.Map[String, collection.mutable.Set[String]]. I want to convert dirtyMap to immutable Map[String, Set[String]]. Could you please let me know how to do this. I tried couple of ways that didn't produce positive result
Method 1: Using map function
dirtyMap.toSeq.map(e => {
val key = e._1
val value = e._2.to[Set]
e._1 -> e._2
}).toMap()
I'm getting syntax error
Method 2: Using foreach
dirtyMap.toSeq.foreach(e => {
val key = e._1
val value = e._2.to[Set]
e._1 -> e._2
}).toMap()
cannot apply toMap to output of foreach
Disclaimer: I am a Scala noob if you couldn't tell.
UPDATE: Method 1 works when I remove parenthesis from toMap() function. However, following is an elegant solution
dirtyMap.mapValues(v => v.toSet)
Thank you Gabriele for providing answer with a great explanation. Thanks Duelist and Debojit for your answer as well
You can simply do:
dirtyMap.mapValues(_.toSet)
mapValues will apply the function to only the values of the Map, and .toSet converts a mutable Set to an immutable one.
(I'm assuming dirtyMap is a collection.immutable.Map. In case it's a mutable one, just add toMap in the end)
If you're not familiar with the underscore syntax for lambdas, it's a shorthand for:
dirtyMap.mapValues(v => v.toSet)
Now, your first example doesn't compile because of the (). toMap takes no explicit arguments, but it takes an implicit argument. If you want the implicit argument to be inferred automatically, just remove the ().
The second example doesn't work because foreach returns Unit. This means that foreach executes side effects, but it doesn't return a value. If you want to chain transformations on a value, never use foreach, use map instead.
You can use
dirtyMap.map({case (k,v) => (k,v.toSet)})
You can use flatMap for it:
dirtyMap.flatMap(entry => Map[String, Set[String]](entry._1 -> entry._2.toSet)).toMap
Firstly you map each entry to immutable.Map(entry) with updated entry, where value is immutable.Set now. Your map looks like this: mutable.Map.
And then flatten is called, so you get mutable.Map with each entry with immutable.Set. And then toMap converts this map to to immutable.
This variant is complicated a bit, you simply can use dirtyMap.map(...).toMap as Debojit Paul mentioned.
Another variant is foldLeft:
dirtyMap.foldLeft(Map[String, Set[String]]())(
(map, entry) => map + (entry._1 -> entry._2.toSet)
)
You specify accumulator, which is immutable.Map and you add each entry to this map with converted Set.
As for me, I think using foldLeft is more effective way.

Scala: list to set using flatMap

I have class with field of type Set[String]. Also, I have list of objects of this class. I'd like to collect all strings from all sets of these objects into one set. Here is how I can do it already:
case class MyClass(field: Set[String])
val list = List(
MyClass(Set("123")),
MyClass(Set("456", "798")),
MyClass(Set("123", "798"))
)
list.flatMap(_.field).toSet // Set(123, 456, 798)
It works, but I think, I can achieve the same using only flatMap, without toSet invocation. I tried this, but it had given compilation error:
// error: Cannot construct a collection of type Set[String]
// with elements of type String based on a collection of type List[MyClass].
list.flatMap[String, Set[String]](_.field)
If I change type of list to Set (i.e., val list = Set(...)), then such flatMap invocation works.
So, can I use somehow Set.canBuildFrom or any other CanBuildFrom object to invoke flatMap on List object, so that I'll get Set as a result?
The CanBuildFrom instance you want is called breakOut and has to be provided as a second parameter:
import scala.collection.breakOut
case class MyClass(field: Set[String])
val list = List(
MyClass(Set("123")),
MyClass(Set("456", "798")),
MyClass(Set("123", "798"))
)
val s: Set[String] = list.flatMap(_.field)(breakOut)
Note that explicit type annotation on variable s is mandatory - that's how the type is chosen.
Edit:
If you're using Scalaz or cats, you can use foldMap as well:
import scalaz._, Scalaz._
list.foldMap(_.field)
This does essentially what mdms answer proposes, except the Set.empty and ++ parts are already baked in.
The way flatMap work in Scala is that it can only remove one wrapper for the same type of wrappers i.e. List[List[String]] -> flatMap -> List[String]
if you apply flatMap on different wrapper data types then you will always get the final outcome as higher level wrapper data type i.e.List[Set[String]] -> flatMap -> List[String]
if you want to apply the flatMap on different wrapper type i.e. List[Set[String]] -> flatMap -> Set[String] in you have 2 options :-
Explicitly cast the one datatype wrapper to another i.e. list.flatMap(_.field).toSet or
By providing implicit converter ie. implicit def listToSet(list: List[String]): Set[String] = list.toSet and the you can get val set:Set[String] = list.flatMap(_.field)
only then what you are trying to achieve will be accomplished.
Conclusion:- if you apply flatMap on 2 wrapped data type then you will always get the final result as op type which is on top of wrapper data type i.e. List[Set[String]] -> flatMap -> List[String] and if you want to convert or cast to different datatype then either you need to implicitly or explicitly cast it.
You could maybe provide a specific CanBuildFrom, but why not to use a fold instead?
list.foldLeft(Set.empty[String]){case (set, myClass) => set ++ myClass.field}
Still just one pass through the collection, and if you are sure the list is not empty, you could even user reduceLeft instead.

Is there an implicit in this call to flatMap?

In this code :
import java.io.File
def recursiveListFiles(f: File): Array[File] = {
val these = f.listFiles
these ++ these.filter(_.isDirectory).flatMap(recursiveListFiles)
}
taken from : How do I list all files in a subdirectory in scala?
Why does flatMap(recursiveListFiles) compile ? as recursiveListFiles accepts a File parameter ? Is file parameter implicitly passed to recursiveListFiles ?
No because the expanded flatMap looks like:
flatMap(file => recursiveListFiles(file))
So each file in these is getting mapped to an Array[File], which gets flattened in flatMap. No implicit magic here (in the way that you're asking).
Sort of. flatMap quite explicitly passes an argument to its own argument. This is the nature of a higher-order function -- you basically give it a callback, it calls the callback, and then it does something with the result. The only implicit thing happening is conversion of a method to a function type.
Any method can be converted to an equivalent function type. So def recursiveListFiles(f: File): Array[File] is equivalent to File => Array[File], which is great, because on an Array[File], you have flatMap[B](f: File => Array[B]): Array[B], and your method's function type fits perfectly: the type parameter B is chosen to be File.
As mentioned in another answer, you could explicitly create that function by doing:
these.filter(_.isDirectory).flatMap(file => recursiveListFiles(file)) or
these.filter(_.isDirectory).flatMap(recursiveListFiles(_)) or
these.filter(_.isDirectory).flatMap(recursiveListFiles _)
In more complex circumstances, you might need to work with one of those more verbose options, but in your case, no need to bother.
flatMap takes a function f: (A) ⇒ GenTraversableOnce[B] to return List[B].
In your case, it is taking recursiveListFiles which is a File ⇒ Array[File] to therefore return a List[File]. This resulting List[File] is then concatenated to these.

Scala SortedMap.map method returns non-sorted map when static type is Map

I encountered some unauthorized strangeness working with Scala's SortedMap[A,B]. If I declare the reference to SortedMap[A,B] "a" to be of type Map[A,B], then map operations on "a" will produce a non-sorted map implementation.
Example:
import scala.collection.immutable._
object Test extends App {
val a: Map[String, String] = SortedMap[String, String]("a" -> "s", "b" -> "t", "c" -> "u", "d" -> "v", "e" -> "w", "f" -> "x")
println(a.getClass+": "+a)
val b = a map {x => x} // identity
println(b.getClass+": "+b)
}
The output of the above is:
class scala.collection.immutable.TreeMap: Map(a -> s, b -> t, c -> u, d -> v, e -> w, f -> x)
class scala.collection.immutable.HashMap$HashTrieMap: Map(e -> w, f -> x, a -> s, b -> t, c -> u, d -> v)
The order of key/value pairs before and after the identity transformation is not the same.
The strange thing is that removing the type declaration from "a" makes this issue go away. That's fine in a toy example, but makes SortedMap[A,B] unusable for passing to methods that expect Map[A,B] parameters.
In general, I would expect higher order functions such as "map" and "filter" to not change the fundamental properties of the collections they are applied to.
Does anyone know why "map" is behaving like this?
The map method, like most of the collection methods, isn't defined specifically for SortedMap. It is defined on a higher-level class (TraversableLike) and uses a "builder" to turn the mapped result into the correct return type.
So how does it decide what the "correct" return type is? Well, it tries to give you back the return type that it started out as. When you tell Scala that you have a Map[String,String] and ask it to map, then the builder has to figure out how to "build" the type for returning. Since you told Scala that the input was a Map[String,String], the builder decides to build a Map[String,String] for you. The builder doesn't know that you wanted a SortedMap, so it doesn't give you one.
The reason it works when you leave off the the Map[String,String] type annotation is that Scala infers that the type of a is SortedMap[String,String]. Thus, when you call map, you are calling it on a SortedMap, and the builder knows to construct a SortedMap for returning.
As far as your assertion that methods shouldn't change "fundamental properties", I think you're looking at it from the wrong angle. The methods will always give you back an object that conforms to the type that you specify. It's the type that defines the behavior of the builder, not the underlying implementation. When you think about like that, it's the type that forms the contract for how methods should behave.
Why might we want this?
Why is this the preferred behavior? Let's look at a concrete example. Say we have a SortedMap[Int,String]
val sortedMap = SortedMap[Int, String](1 -> "s", 2 -> "t", 3 -> "u", 4 -> "v")
If I were to map over it with a function that modifies the keys, I run the risk of losing elements when their keys clash:
scala> sortedMap.map { case (k, v) => (k / 2, v) }
res3: SortedMap[Int,String] = Map(0 -> s, 1 -> u, 2 -> v)
But hey, that's fine. It's a Map after all, and I know it's a Map, so I should expect that behavior.
Now let's say we have a function that accepts an Iterable of pairs:
def f(iterable: Iterable[(Int, String)]) =
iterable.map { case (k, v) => (k / 2, v) }
Since this function has nothing to do with Maps, it would be very surprising if the result of this function ever had fewer elements than the input. After all, map on a Iterable should produce the mapped version of each element. But a Map is an Iterable of pairs, so we can pass it into this function. So what happens in Scala when we do?
scala> f(sortedMap)
res4: Iterable[(Int, String)] = List((0,s), (1,t), (1,u), (2,v))
Look at that! No elements lost! In other words, Scala won't surprise us by violating our expectations about how map on an Iterable should work. If the builder instead tried to produce a SortedMap based on the fact that the input was a SortedMap, then our function f would have surprising results, and this would be bad.
So the moral of the story is: Use the types to tell the collections framework how to deal with your data. If you want your code to be able to expect that a map is sorted, then you should type it as SortedMap.
The signature of map is:
def
map[B, That](f: ((A, B)) ⇒ B)(implicit bf: CanBuildFrom[Map[A, B], B, That]): That
The implicit parameter bf is used to build the resulting collection. So in your example, since the type of a is Map[String, String], the type of bf is:
val cbf = implicitly[CanBuildFrom[Map[String, String], (String, String), Map[String, String]]]
Which just builds a Map[String, String] which doesn't have any of the properties of the SortedMap. See:
cbf() ++= List("b" -> "c", "e" -> "g", "a" -> "b") result
For more information, see this excellent article: http://docs.scala-lang.org/overviews/core/architecture-of-scala-collections.html
As dyross points out, it's the Builder, which is chosen (via the CanBuildFrom) on the basis of the target type, which determines the class of the collection that you get out of a map operation. Now this might not be the behaviour that you wanted, but it does for example allow you select the target type:
val b: SortedMap[String, String] = a.map(x => x)(collection.breakOut)
(breakOut gives a generic CanBuildFrom whose type is determined by context, i.e. our type annotation.)
So you could add some type parameters that allow you accept any sort of Map or Traversable (see this question), which would allow you do do a map operation in your method while retaining the correct type information, but as you can see it's not straightforward.
I think a much simpler approach is instead to define functions that you apply to your collections using the collections' map, flatMap etc methods, rather than by sending the collection itself to a method.
i.e. instead of
def f[Complex type parameters](xs: ...)(complex implicits) = ...
val result = f(xs)
do
val f: X => Y = ...
val results = xs map f
In short: you explicitly declared a to be of type Map, and the Scala collections framework tries very hard for higher order functions such as map and filter to not change the fundamental properties of the collections they are applied to, therefore it will also return a Map since that is what you explicitly told it you wanted.

Converting a Scala Map to a List

I have a map that I need to map to a different type, and the result needs to be a List. I have two ways (seemingly) to accomplish what I want, since calling map on a map seems to always result in a map. Assuming I have some map that looks like:
val input = Map[String, List[Int]]("rk1" -> List(1,2,3), "rk2" -> List(4,5,6))
I can either do:
val output = input.map{ case(k,v) => (k.getBytes, v) } toList
Or:
val output = input.foldRight(List[Pair[Array[Byte], List[Int]]]()){ (el, res) =>
(el._1.getBytes, el._2) :: res
}
In the first example I convert the type, and then call toList. I assume the runtime is something like O(n*2) and the space required is n*2. In the second example, I convert the type and generate the list in one go. I assume the runtime is O(n) and the space required is n.
My question is, are these essentially identical or does the second conversion cut down on memory/time/etc? Additionally, where can I find information on storage and runtime costs of various scala conversions?
Thanks in advance.
My favorite way to do this kind of things is like this:
input.map { case (k,v) => (k.getBytes, v) }(collection.breakOut): List[(Array[Byte], List[Int])]
With this syntax, you are passing to map the builder it needs to reconstruct the resulting collection. (Actually, not a builder, but a builder factory. Read more about Scala's CanBuildFroms if you are interested.) collection.breakOut can exactly be used when you want to change from one collection type to another while doing a map, flatMap, etc. — the only bad part is that you have to use the full type annotation for it to be effective (here, I used a type ascription after the expression). Then, there's no intermediary collection being built, and the list is constructed while mapping.
Mapping over a view in the first example could cut down on the space requirement for a large map:
val output = input.view.map{ case(k,v) => (k.getBytes, v) } toList