Scala set vs map in for comprehension

Playing around with Scala I'm facing these two similar pieces of code that puzzle me:
val m = Map("a"->2D, "b"->3D)
for((k, v) <- m) yield (v, k) // Yields Map(2.0 -> a, 3.0 -> b)
for(k <- m.keys) yield (m(k), k) // Yields Set((2.0,a), (3.0,b))
Why the different behavior?
Is it possible to change the second comprehension so that it yields a Map instead of a Set?
I sense there is something good to learn here; any additional pointers are appreciated.

Recall that a for comprehension is de-sugared into map() and flatMap() (and withFilter()) calls. In this case, because each of your examples has a single generator (<-), each one becomes a single map() call.
Also recall that map() will return the same monad (wrapper type) that it was called on.
In the 1st example you're mapping over a Map so you get a Map back: from Map[String,Double] to Map[Double,String]. The tuples are transformed into key->value pairs.
In the 2nd example you're mapping over a Set of elements from the keys of a Map, so you get a Set back. No tuple transformation takes place. They are left as tuples.
To get a Map out of the 2nd example, i.e. to get the tuples transformed, wrap the entire for in parentheses and tack a .toMap on the end.
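For example, a minimal sketch of that suggestion, using the same m as in the question:
(for (k <- m.keys) yield (m(k), k)).toMap // Map(2.0 -> a, 3.0 -> b)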

Related

Is the map generator from the EPFL online course able to generate every possible map?

The Week 1 assignment of https://www.coursera.org/learn/progfun2 shows, as an example, a generator for maps of type Map[Int, Int]:
lazy val genMap: Gen[Map[Int,Int]] = oneOf(
  const(Map.empty[Int,Int]),
  for {
    k <- arbitrary[Int]
    v <- arbitrary[Int]
    m <- oneOf(const(Map.empty[Int,Int]), genMap)
  } yield m.updated(k, v)
)
I'm new to Scala, but I'm familiar with generators in imperative programming languages. My understanding of the generator's execution flow is as follows:
arbitrary[Int] is called; it returns a generator yielding an endless sequence of Ints, and the first generated value is assigned to k
arbitrary[Int] is called again; it returns a new generator, and the first generated value is assigned to v
A random map is created recursively, updated with k->v, and yielded to the consumer
When the next value from the generator is requested, execution resumes at the m <- ... definition, proceeding with a new random m and the same k->v mapping
Neither const nor the recursive genMap ever run out of values, meaning that the "loop" for m never terminates, so new values for v and k are never requested from the corresponding arbitrary generators.
My conclusion is that all generated maps would either be empty or include the k->v mapping generated in the first iteration of the outermost invocation, i.e. genMap can never generate a non-empty map without such a mapping.
Q1: are my analysis and my conclusion correct?
Q2: if they are, how can I implement a generator which, after generating a first map, would have a non-zero chance of generating any possible map?
Q3: if I simplify the last definition in the for-expression to m <- genMap, does that change the generator's behaviour in any way?
In short, your analysis and conclusion aren't correct.
I suspect the root of the misunderstanding is interpreting for as a loop. It isn't one in general, and specifically not in this context; when dealing with things that are more obviously collections, thinking of for as a loop is close enough, I guess, but not here.
I'll explain from the top down.
oneOf, given one or more generators, will create a generator which, when asked to generate a value, will defer to one of the given generators by random selection. So
oneOf(
  const(Map.empty[Int, Int]),
  k: Gen[Map[Int, Int]] // i.e. some generator for Map[Int, Int]
)
The output might be
someMapFromK, Map.empty, someMapFromK, someMapFromK, Map.empty, Map.empty...
In this case, our k is
for {
  k <- arbitrary[Int]
  v <- arbitrary[Int]
  m <- oneOf(const(Map.empty[Int, Int]), genMap) // genMap being the name the outermost generator will be bound to
} yield m.updated(k, v)
for is syntactic sugar for calls to flatMap and map:
arbitrary[Int].flatMap { k =>
  arbitrary[Int].flatMap { v =>
    oneOf(const(Map.empty[Int, Int]), genMap).map { m =>
      m.updated(k, v)
    }
  }
}
For something like List, map and flatMap consume the entire collection. Gen is lazier:
flatMap basically means generate a value, and feed that value to a function that results in a Gen
map basically means generate a value, and transform it
If we imagined a method on Gen named sample which gave us the "next" generated value (for this purpose, we'll say that for a Gen[T] it will result in T and never throw an exception, etc.), genMap is exactly analogous to:
trait SimpleGen[T] { def sample: T }

lazy val genMap: SimpleGen[Map[Int, Int]] = new SimpleGen[Map[Int, Int]] {
  def sample: Map[Int, Int] =
    if (scala.util.Random.nextBoolean) Map.empty
    else {
      val k = arbitrary[Int].sample
      val v = arbitrary[Int].sample
      val m =
        if (scala.util.Random.nextBoolean) Map.empty[Int, Int]
        else genMap.sample // Since genMap is lazy, we can recurse
      m.updated(k, v)
    }
}
Regarding the third question, in the original definition, the extra oneOf serves to bound the recursion depth to prevent the stack from being blown. For that definition, there's a 1/4 chance of going recursive, while replacing the inner oneOf with genMap would have a 1/2 chance of going recursive. Thus (ignoring the chance of a collision in the ks), for the first:
50% chance of empty (50% chance of 1+)
37.5% chance of size 1 (12.5% chance of 2+)
9.375% chance of size 2 (3.125% chance of 3+)
2.34375% chance of size 3 (0.78125% chance of 4+)...
While for the second:
50% chance of empty
25% chance of size 1
12.5% chance of size 2
6.25% chance of size 3...
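Put differently (a quick generalisation of the numbers above, ignoring key collisions): if r is the chance of going recursive at each level, then P(size = 0) = 1/2 and P(size = n) = 1/2 · r^(n-1) · (1 - r) for n ≥ 1; plugging in r = 1/4 and r = 1/2 reproduces the two lists.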
Technically the possibility of stack overflow implies that depending on how many recursions you can make there's a maximum number of k -> v pairs in the Map you can generate, so there are almost certainly Maps that could not be generated.

Removing duplicates (ints) in an array and replacing them with chars

So I'm trying to make a basic Hitori solver, but I am not sure where I should start. I'm still new to Scala.
My first issue is that I have an array of some ints (1,2,3,4,2)
and I want the program to output them like this: (1,2,3,4,B)
Notice that the duplicate has become a char B.
Where do I start? Here is what I already did, but it didn't do exactly what I need.
val s = lines.split(" ").toSet;
var jetSet = s
for(i <- jetSet){
  print(i);
}
One way is to fold over the numbers, left to right, building the Set[Int] for the uniqueness test and the list of output as you go along.
val arr = Array(1,2,3,4,2)
arr.foldLeft((Set[Int](), List[String]())){ case ((s, l), n) =>
  if (s(n)) (s, "B" :: l)
  else (s + n, n.toString :: l)
}._2.reverse // res0: List[String] = List(1, 2, 3, 4, B)
From here you can use mkString() to format the output as desired.
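For instance (a small sketch, assuming the result of the fold above is bound to a value named out):
out.mkString("(", ",", ")") // res1: String = (1,2,3,4,B)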
What I'd suggest is to break your program into a number of steps and try to solve those.
As a first step you could transform the list into tuples of the numbers and the number of times they have appeared so far ...
(1,2,3,4,2) becomes ((1,1),(2,1),(3,1),(4,1),(2,2))
In the next step it's easy to map over this list, returning the number if the count is 1 or the letter if it is greater.
That first step is a little bit tricky because as you walk through the list you need to keep track of how many you've seen so far of each letter.
When you want to process a sequence and maintain some changing state as you do, you should use a fold. If you're not familiar with fold, it has the following signature:
def foldLeft[B](z: B)(op: (B, A) => B): B
Note that the type of z (the initial value) has to match the type of the return value from the fold (B).
So one way to do this would be for type B to be a tuple of (outputList, seenSoFarCounts).
outputList would accumulate in each step by adding the next number paired with its count, while seenSoFarCounts would be a map from each number to how many times it has been seen so far.
So what you get out of the foldLeft is a tuple like (List((1,1),(2,1),(3,1),(4,1),(2,2)), Map(1 -> 1, 2 -> 2, 3 -> 1, 4 -> 1)).
Now you can map over that first element of the tuple as described above.
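A minimal sketch of that two-step approach might look like this (the names nums, withCounts and seen are my own, not from the question):
val nums = List(1, 2, 3, 4, 2)
val (withCounts, _) =
  nums.foldLeft((List[(Int, Int)](), Map[Int, Int]())) { case ((acc, seen), n) =>
    val count = seen.getOrElse(n, 0) + 1          // how many times we've seen n, including now
    ((n, count) :: acc, seen.updated(n, count))   // prepend the pair, update the counts map
  }
withCounts.reverse.map { case (n, count) =>
  if (count == 1) n.toString else "B"
} // List(1, 2, 3, 4, B)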
Once it's working you could avoid the last step by updating the numbers to letters as you work through the fold.
Usually this technique of breaking things into steps makes it simple to reason about, then when it's working you may see that some steps trivially collapse into each other.
Hope this helps.

Can we replace map with flatMap?

I was trying to find the line with the maximum number of words, and I wrote the following lines to run in spark-shell:
import java.lang.Math
val counts = textFile.map(line => line.split(" ").size).reduce((a, b) => Math.max(a, b))
But since map is one-to-one and flatMap is one-to-zero-or-more, I tried replacing map with flatMap in the above code. But it gives this error:
<console>:24: error: type mismatch;
found : Int
required: TraversableOnce[?]
val counts = F1.flatMap(s => s.split(" ").size).reduce((a,b)=> Math.max(a,b))
If anybody could make me understand the reason, it will really be helpful.
flatMap must return an Iterable, which is clearly not what you want. You do want a map, because you want to map a line to the number of words, so you want a one-to-one function that takes a line and maps it to the number of words (though you could create a collection with one element, being the size, of course...).
flatMap is meant to associate a collection to each input; for instance, if you wanted to map a line to all its words, you would do:
val words = textFile.flatMap(x => x.split(" "))
and that would return an RDD[String] containing all the words.
In the end, map transforms an RDD of size N into another RDD of size N (e.g. your lines to their length) whereas flatMap transforms an RDD of size N into an RDD of size P (actually an RDD of size N into an RDD of size N made of collections, all these collections are then flattened to produce the RDD of size P).
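A small illustration of that size difference, sketched with plain Scala collections rather than RDDs (the idea carries over):
val lines = List("a b c", "d e")
lines.map(_.split(" ").length)  // List(3, 2): size 2 in, size 2 out
lines.flatMap(_.split(" "))     // List(a, b, c, d, e): size 2 in, size 5 out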
P.S.: one last word that has nothing to do with your problem: it is more efficient to do (for a string s)
val nbWords = s.split(" ").length
than to call .size. Indeed, the split method returns an Array[String], and arrays do not have a size method, so when you call .size an implicit conversion from Array[String] to SeqLike[String] kicks in and creates new objects. But Array[T] does have a length field, so there's no conversion when calling length. (It's a detail, but I think it's a good habit.)
Any use of map can be replaced by flatMap, but the function argument has to be changed to return a single-element List: textFile.flatMap(line => List(line.split(" ").size)). This isn't a good idea: it just makes your code less understandable and less efficient.
After reading the part of "Tired of Null Pointer Exceptions? Consider Using Java SE 8's Optional!" about why to use flatMap() rather than map(), I realized the real reason flatMap() cannot replace map(): map() is not a special case of flatMap().
It's true that flatMap() means one-to-many, but that's not the only thing flatMap() does. Put simply, it can also strip away the outer Stream().
See the definitions of map and flatMap:
Stream<R> map(Function<? super T, ? extends R> mapper)
Stream<R> flatMap(Function<? super T, ? extends Stream<? extends R>> mapper)
The only difference is the return type of the inner function: what map() returns is Stream<'what the inner function returned'>, while what flatMap() returns is just 'what the inner function returned'.
So you can say that flatMap() can strip the outer Stream() away, but map() can't. This is the main difference in my opinion, and also why map() is not just a special case of flatMap().
P.S.:
If you really want to do a one-to-one mapping with flatMap, you have to change it into one-to-List(one). That means you add an outer Stream() manually, which flatMap() will strip away later. After that you get the same effect as using map(). (Certainly, it's clumsy, so don't do that.)
Here are examples in Java 8, but the same applies to Scala:
use map():
list.stream().map(line -> line.split(" ").length)
discouraged use of flatMap():
list.stream().flatMap(line -> Arrays.asList(line.split(" ").length).stream())

Flattening a Set of pairs of sets to one pair of sets

I have a for-comprehension with a generator from a Set[MyType].
This MyType has a lazy val called factsPair which returns a pair of sets:
(Set[MyFact], Set[MyFact]).
I wish to loop through all of them and unify the facts into one flattened pair (Set[MyFact], Set[MyFact]) as follows; however, I am getting "No implicit view available ..." and "not enough arguments for flatten: implicit (asTraversable ..." errors. (I am a bit new to Scala, so I'm still trying to get used to the errors.)
lazy val allFacts =
  (for {
    mytype <- mytypeList
  } yield mytype.factsPair).flatten
What do I need to specify to flatten for this to work?
Scala's flatten works on nested collections of the same type. You have a Seq[(Set[MyFact], Set[MyFact])], which can't be flattened.
I would recommend learning the foldLeft function, because it's very general and quite easy to use as soon as you get the hang of it:
lazy val allFacts = myTypeList.foldLeft((Set[MyFact](), Set[MyFact]())) {
  case (accumulator, next) =>
    val pairs1 = accumulator._1 ++ next.factsPair._1
    val pairs2 = accumulator._2 ++ next.factsPair._2
    (pairs1, pairs2)
}
The first parameter is the initial element that the other elements will be appended to. We start with a pair of empty sets, initialized like this: (Set[MyFact](), Set[MyFact]()).
Next we specify the function that takes the accumulator, appends the next element to it, and returns the new accumulator. Because of all the tuples it doesn't look nice, but it works.
You won't be able to use flatten for this, because flatten on a collection returns a collection, and a tuple is not a collection.
You can, of course, just split, flatten, and join again:
val pairs = for {
  mytype <- mytypeList
} yield mytype.factsPair

val (first, second) = pairs.unzip
val allFacts = (first.flatten, second.flatten)
A tuple isn't traversable, so you can't flatten over it. You need to return something that can be iterated over, like a List, for example:
List((1,2), (3,4)).flatten // bad
List(List(1,2), List(3,4)).flatten // good
I'd like to offer a more algebraic view. What you have here can be nicely solved using monoids. For each monoid there is a zero element and an operation to combine two elements into one.
In this case, sets form a monoid: the zero element is the empty set and the operation is union. And if we have two monoids, their Cartesian product is also a monoid, where the operations are defined pairwise (see examples on Wikipedia).
Scalaz defines monoids for sets as well as tuples, so we don't need to do anything there. We'll just need a helper function that combines multiple monoid elements into one, which is implemented easily using folding:
def msum[A](ps: Iterable[A])(implicit m: Monoid[A]): A =
  ps.foldLeft(m.zero)(m.append(_, _))
(perhaps there already is such a function in Scala, I didn't find it). Using msum we can easily define
def pairs(ps: Iterable[MyType]): (Set[MyFact], Set[MyFact]) =
  msum(ps.map(_.factsPair))
using Scalaz's implicit monoids for tuples and sets.
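A hypothetical usage sketch, assuming Scalaz's standard instances are in scope via import scalaz._, Scalaz._ and mytypeList as in the question:
lazy val allFacts: (Set[MyFact], Set[MyFact]) = pairs(mytypeList)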

How to get a subset of a map?

How do I get a subset of a map?
Assume we have
val m: Map[Int, String] = ...
val k: List[Int]
Where all keys in k exist in m.
Now I would like to get a subset of the Map m with only the pairs whose key is in the list k.
Something like m.intersect(k), but intersect is not defined on a map.
One way is to use filterKeys: m.filterKeys(k.contains). But this might be a bit slow, because for each key in the original map a search in the list has to be done.
Another way I could think of is k.map(l => (l, m(l))).toMap. Here we just iterate through the keys we are really interested in and do not do a search.
Is there a better (built-in) way?
m filterKeys k.toSet
because a Set is a Function.
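For example (a quick sketch with made-up values; on newer Scala versions filterKeys returns a lazy view, so you may want a .toMap at the end to materialise it):
val m = Map(1 -> "a", 2 -> "b", 3 -> "c")
val k = List(1, 3)
m filterKeys k.toSet // Map(1 -> a, 3 -> c)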
On performance:
filterKeys itself is O(1), since it works by producing a new map with overridden foreach, iterator, contains and get methods. The overhead comes when elements are accessed. It means that the new map uses no extra memory, but also that memory for the old map cannot be freed.
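A small sketch of that lazy behaviour (the names calls, wanted and filtered are mine, not from the answer):
var calls = 0
val wanted = k.toSet
val filtered = m filterKeys { x => calls += 1; wanted(x) }
// calls is still 0 here: building the filtered map evaluated nothing
filtered.get(1) // only now is the predicate invoked, for key 1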
If you need to free up the memory and have fastest possible access, a fast way would be to fold the elements of k into a new Map without producing an intermediate List[(Int,String)]:
k.foldLeft(Map[Int,String]()){ (acc, x) => acc + (x -> m(x)) }
val s = Map(k.map(x => (x, m(x))): _*)
I think this is the most readable and a good performer:
k zip (k map m) toMap
Or, in method invocation style:
k.zip(k.map(m)).toMap