I fould like to understand how foldLeft works for Maps. I do understand how it works if I have a List and call on it foldLeft with a zero-element and a function:
val list1 = List(1,2,3)
list1.foldLeft(0)((a,b) => a + b)
Where I add the zero-element 0 with the first element of list1 and then add the second element of list1 and so on. So output becomes the new input and the first input is the zero-element.
Now I got the code
val map1 = Map(1 -> 2.0, 3 -> 4.0, 5 -> 6.2) withDefaultValue 0.0
val map2 = Map(0 -> 3.0, 3 -> 7.0) withDefaultValue 0.0
def myfct(terms: Map[Int, Double], term: (Int, Double)): Map[Int, Double] = ???
map1.foldLeft(map2)(myfct)
So my first element here is a Tuple2, but since map2 is a Map and not a Tuple2, what is the zero-element?
When we had a List, namely list1, we always "took the next element in list1". What is the "next element in map1? Is it another pair of map1?
In this context, you can think of a Map as a list of tuples. You can create a list like this: List(1 -> 2.0, 3 -> 4.0, 5 -> 6.2), and call foldLeft on it (that is more or less exactly what Map.foldLeft does). If you understand how foldLeft works with lists, then now you know how it works with Maps too :)
To answer your specific questions:
The first parameter of foldLeft can be of any type. You could pass in a Map instead of an Int in your first example too. It does not have to be of the same type as elements of the collection you are processing (although, it could be), as you have in your first example, nor does it need to be the same type as the collection itself, as you have it in the last example.
Consider this for the sake of example:
List(1,2,3,4,5,6).foldLeft(Map.empty[String,Int]) { case(map,elem) =>
map + (elem.toString -> elem)
}
This produces the same result as list.map { x => x.toString -> x }.toMap. As you can see, the first parameter here is a Map, that is neither List nor an Int.
The type you pass to foldLeft is also the type it returns, and the type that the function you pass in returns. It is not "element zero".
foldLeft will pass that parameter to your reducer function, along with the first element of the list. Your function will combine the two elements, and produce a new value of the same type as the first param. That value is passed in again, again with the second element ... etc.
Perhaps, inspecting the signature of foldLeft would be helpful:
foldLeft[B](z: B)(op: (B, A) ⇒ B): B
Here A is the type of your collection elements, and B can be anything, the only requirement is that the four places where it appears have the same type.
Here is another example, that is (almost) equivalent to list.mkString(","):
List(1,2,3,4,5,6).foldLeft("") {
case("", i) => i.toString
case(s,i) => s + "," + i
}
As I explained in the beginning, a map in this context is a kind of a list (a sequence rather). Just like "we always took the next element of the list" when we dealt with lists, we will take the "next element of the map" in this case. You said it yourself, the elements of the map are tuples, so that's what the type of the next element will be:
Map("one" -> 1, "two" -> 2, "three" -> 3)
.foldLeft("") {
case("", (key,value)) => key + "->" + value
case(s, (key,value)) => s + ", " + key + "->" + value
}
As mentioned above, the signature of the foldLeft operation is as shown:
foldLeft[B](z: B)(op: (B, A) ⇒ B): B
The resulting type of the operation is B. So, this answers your first question:
So my first element here is a Tuple2, but since map2 is a Map and not
a Tuple2, what is the zero-element?
The zero-element is whatever you want to output from the foldLeft. It could be an Int or a Map or anything else for that matter.
op in the function signature (i.e the second argument), is how you would like to operate on that for each element of map1, which is a (key,value) pair, or another way of writing it is as key -> value.
Let's try and understand it by building it up with more simpler operations.
val map1 = Map(1 -> 1.0, 4 -> 4.0, 5 -> 5.0) withDefaultValue 0.0
val map2 = Map(0 -> 0.0, 3 -> 3.0) withDefaultValue 0.0
// Here we're just making the output an Int
// So we just add the first element of the key-value pair.
def myOpInt(z: Int, term: (Int, Double)): Int = {
z + term._1
}
// adds all the first elements of the key-value pairs of map1 together
map1.foldLeft(0)(myOpInt) // ... output is 10
// same as above, but for map2 ...
map2.foldLeft(0)(myOpInt) // ... output is 3
Now, taking it to the next step by using a zero element (z) as an existing map...we would essentially add elements to that map we are using as z.
val map1 = Map(1 -> 1.0, 4 -> 4.0, 5 -> 5.0) withDefaultValue 0.0
val map2 = Map(0 -> 0.0, 3 -> 3.0) withDefaultValue 0.0
// Here z is a Map of Int -> Double
// We expect a pair of (Int, Double) as the terms to fold left with z
def myfct(z: Map[Int, Double], term: (Int, Double)): Map[Int, Double] = {
z + term
}
map1.foldLeft(map2)(myfct)
// output is Map(0 -> 0.0, 5 -> 5.0, 1 -> 1.0, 3 -> 3.0, 4 -> 4.0)
// i.e. same elements of both maps concatenated together, albeit out of order, since Maps don't guarantee order ...
When we had a List, namely list1, we always "took the next element in
list1". What is the "next element in map1? Is it another pair of map1?
Yes, it's another key-value pair (k,v) in map1.
Related
Why doesn't this work:
val m = Map( 1-> 2, 2-> 4, 3 ->6)
def h(k: Int, v: Int) = if (v > 2) Some(k->v) else None
m.flatMap { case(k,v) => h(k,v) }
m.flatMap { (k,v) => h(k,v) }
The one with the case statement gives me:
res1: scala.collection.immutable.Map[Int,Int] = Map(2 -> 4, 3 -> 6)
but the other one fails and says MIssing Type parameter v, and expected: Int, actual:(Int, Int)
The case keyword signifies pattern matching, so the Tuple2 (a Mapis an Iterable ofTuple2 elements) that you are flatMapping "over" gets decomposed into k and v. (The fact that flatMap works when the h function is producing an Option rather than a Map or Iterable is the Scala collections library being perhaps overly permissive.)
Without the case keyword, you are providing a function that requires two arguments, but flatMap needs a function that accepts a single argument (a Tuple2). So the second version does not typecheck.
For second one you can do this, if you don't want to use case.
m.flatMap { x => h(x._1, x._2) } // x is (key,value) pair here(each element in map), hence accessing the key , value as _1,_2 respectively
I need to check if a Traversable (which I already know to be nonEmpty) has a single element or more.
I could use size, but (tell me if I'm wrong) I suspect that this could be O(n), and traverse the collection to compute it.
I could check if tail.nonEmpty, or if .head != .last
Which are the pros and cons of the two approaches? Is there a better way? (for example, will .last do a full iteration as well?)
All approaches that cut elements from beginning of the collection and return tail are inefficient. For example tail for List is O(1), while tail for Array is O(N). Same with drop.
I propose using take:
list.take(2).size == 1 // list is singleton
take is declared to return whole collection if collection length is less that take's argument. Thus there will be no error if collection is empty or has only one element. On the other hand if collection is huge take will run in O(1) time nevertheless. Internally take will start iterating your collection, take two steps and break, putting elements in new collection to return.
UPD: I changed condition to exactly match the question
Not all will be the same, but let's take a worst case scenario where it's a List. last will consume the entire List just to access that element, as will size.
tail.nonEmpty is obtained from a head :: tail pattern match, which doesn't need to consume the entire List. If you already know the list to be non-empty, this should be the obvious choice.
But not all tail operations take constant time like a List: Scala Collections Performance
You can take a view of a traversable. You can slice the TraversableView lazily.
The initial star is because the REPL prints some output.
scala> val t: Traversable[Int] = Stream continually { println("*"); 42 }
*
t: Traversable[Int] = Stream(42, ?)
scala> t.view.slice(0,2).size
*
res1: Int = 2
scala> val t: Traversable[Int] = Stream.fill(1) { println("*"); 42 }
*
t: Traversable[Int] = Stream(42, ?)
scala> t.view.slice(0,2).size
res2: Int = 1
The advantage is that there is no intermediate collection.
scala> val t: Traversable[_] = Map((1 to 10) map ((_, "x")): _*)
t: Traversable[_] = Map(5 -> x, 10 -> x, 1 -> x, 6 -> x, 9 -> x, 2 -> x, 7 -> x, 3 -> x, 8 -> x, 4 -> x)
scala> t.take(2)
res3: Traversable[Any] = Map(5 -> x, 10 -> x)
That returns an unoptimized Map, for instance:
scala> res3.getClass
res4: Class[_ <: Traversable[Any]] = class scala.collection.immutable.HashMap$HashTrieMap
scala> Map(1->"x",2->"x").getClass
res5: Class[_ <: scala.collection.immutable.Map[Int,String]] = class scala.collection.immutable.Map$Map2
What about pattern matching?
itrbl match { case _::Nil => "one"; case _=>"more" }
I need to find the number of (key , value) pairs in a Map in my Scala code. I can iterate through the map and get an answer but I wanted to know if there is any direct function for this purpose or not.
you can use .size
scala> val m=Map("a"->1,"b"->2,"c"->3)
m: scala.collection.immutable.Map[String,Int] = Map(a -> 1, b -> 2, c -> 3)
scala> m.size
res3: Int = 3
Use Map#size:
The size of this traversable or iterator.
The size method is from TraversableOnce so, barring infinite sequences or sequences that shouldn't be iterated again, it can be used over a wide range - List, Map, Set, etc.
Suppose you have
val docs = List(List("one", "two"), List("two", "three"))
where e.g. List("one", "two") represents a document containing terms "one" and "two", and you want to build a map with the document frequency for every term, i.e. in this case
Map("one" -> 1, "two" -> 2, "three" -> 1)
How would you do that in Scala? (And in an efficient way, assuming a much larger dataset.)
My first Java-like thought is to use a mutable map:
val freqs = mutable.Map.empty[String,Int]
for (doc <- docs)
for (term <- doc)
freqs(term) = freqs.getOrElse(term, 0) + 1
which works well enough but I'm wondering how you could do that in a more "functional" way, without resorting to a mutable map?
Try this:
scala> docs.flatten.groupBy(identity).mapValues(_.size)
res0: Map[String,Int] = Map(one -> 1, two -> 2, three -> 1)
If you are going to be accessing the counts many times, then you should avoid mapValues since it is "lazy" and, thus, would recompute the size on every access. This version gives you the same result but won't require the recomputations:
docs.flatten.groupBy(identity).map(x => (x._1, x._2.size))
The identity function just means x => x.
docs.flatten.foldLeft(new Map.WithDefault(Map[String,Int](),Function.const(0))){
(m,x) => m + (x -> (1 + m(x)))}
What a train wreck!
[Edit]
Ah, that's better!
docs.flatten.foldLeft(Map[String,Int]() withDefaultValue 0){
(m,x) => m + (x -> (1 + m(x)))}
Starting Scala 2.13, after flattening the list of lists, we can use groupMapReduce which is a one-pass alternative to groupBy/mapValues:
// val docs = List(List("one", "two"), List("two", "three"))
docs.flatten.groupMapReduce(identity)(_ => 1)(_ + _)
// Map[String,Int] = Map("one" -> 1, "three" -> 1, "two" -> 2)
This:
flattens the List of Lists as a List
groups list elements (identity) (group part of groupMapReduce)
maps each grouped value occurrence to 1 (_ => 1) (map part of groupMapReduce)
reduces values within a group of values (_ + _) by summing them (reduce part of groupMapReduce).
I have a Map[String, String] and want to concatenate the values to a single string.
I can see how to do this using a List...
scala> val l = List("te", "st", "ing", "123")
l: List[java.lang.String] = List(te, st, ing, 123)
scala> l.reduceLeft[String](_+_)
res8: String = testing123
fold* or reduce* seem to be the right approach I just can't get the syntax right for a Map.
Folds on a map work the same way they would on a list of pairs. You can't use reduce because then the result type would have to be the same as the element type (i.e. a pair), but you want a string. So you use foldLeft with the empty string as the neutral element. You also can't just use _+_ because then you'd try to add a pair to a string. You have to instead use a function that adds the accumulated string, the first value of the pair and the second value of the pair. So you get this:
scala> val m = Map("la" -> "la", "foo" -> "bar")
m: scala.collection.immutable.Map[java.lang.String,java.lang.String] = Map(la -> la, foo -> bar)
scala> m.foldLeft("")( (acc, kv) => acc + kv._1 + kv._2)
res14: java.lang.String = lalafoobar
Explanation of the first argument to fold:
As you know the function (acc, kv) => acc + kv._1 + kv._2 gets two arguments: the second is the key-value pair currently being processed. The first is the result accumulated so far. However what is the value of acc when the first pair is processed (and no result has been accumulated yet)? When you use reduce the first value of acc will be the first pair in the list (and the first value of kv will be the second pair in the list). However this does not work if you want the type of the result to be different than the element types. So instead of reduce we use fold where we pass the first value of acc as the first argument to foldLeft.
In short: the first argument to foldLeft says what the starting value of acc should be.
As Tom pointed out, you should keep in mind that maps don't necessarily maintain insertion order (Map2 and co. do, but hashmaps do not), so the string may list the elements in a different order than the one in which you inserted them.
The question has been answered already, but I'd like to point out that there are easier ways to produce those strings, if that's all you want. Like this:
scala> val l = List("te", "st", "ing", "123")
l: List[java.lang.String] = List(te, st, ing, 123)
scala> l.mkString
res0: String = testing123
scala> val m = Map(1 -> "abc", 2 -> "def", 3 -> "ghi")
m: scala.collection.immutable.Map[Int,java.lang.String] = Map((1,abc), (2,def), (3,ghi))
scala> m.values.mkString
res1: String = abcdefghi