In my DAO I receive tuples of type (String, String), where _1 is non-unique and _2 is unique. I groupBy on _1 to get this:
val someCache : Map[String, List[(String, String)]]
This is obviously wasteful since _1 is being repeated for all values of the Map. Since _2 is unique, what I want is something like -
val someCache : Map[String, Set[String]]
i.e. group by _1, use it as the key, and use the paired _2 values as a value of type Set[String].
def foo(ts: Seq[(String, String)]): Map[String, Set[String]] = {
  ts.foldLeft(Map[String, Set[String]]()) { (agg, t) =>
    agg + (t._1 -> (agg.getOrElse(t._1, Set()) + t._2))
  }
}
scala> foo(List(("1","2"),("1","3"),("2","3")))
res4: Map[String,Set[String]] = Map(1 -> Set(2, 3), 2 -> Set(3))
A straightforward solution is to map over all entries and convert each list to a set:
someCache.map{ case (a, l) => a -> l.map{ _._2 }.toSet }
You could also use mapValues, but note that it creates a lazy view and re-applies the transformation on every access to a value.
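A minimal sketch of the whole pipeline, assuming the DAO result is available as a plain Seq of pairs. The names GroupToSet, pairs and cache are illustrative and not from the question. It uses map rather than mapValues, so each Set is built exactly once:

```scala
object GroupToSet {
  // Hypothetical input: _1 is non-unique, _2 is unique.
  val pairs: Seq[(String, String)] = Seq(("1", "2"), ("1", "3"), ("2", "3"))

  // groupBy keeps the whole tuple in each group, so drop _1 and
  // collect the remaining _2 values into a Set, eagerly.
  val cache: Map[String, Set[String]] =
    pairs.groupBy(_._1).map { case (k, group) => k -> group.map(_._2).toSet }

  def main(args: Array[String]): Unit = println(cache)
}
```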
I want to use reduceByKey, but when I try to use it, it shows an error:
type mismatch; required: Nothing
Question: how can I create a custom function for reduceByKey?
{(key,value)}
key:string
value: map
example:
rdd = {("a", "weight"->1), ("a", "weight"->2)}
expect{("a"->3)}
def combine(x: mutable.map[string,Int],y:mutable.map[string,Int]):mutable.map[String,Int]={
x.weight = x.weithg+y.weight
x
}
rdd.reducebykey((x,y)=>combine(x,y))
Let's say you have an RDD[(K, V)] (or PairRDD[K, V], to be more accurate) and you want to combine the values that share a key. You can use reduceByKey, which expects a function (V, V) => V and gives you back a reduced RDD[(K, V)] (or PairRDD[K, V]).
Here, your rdd = {("a", "weight"->1), ("a", "weight"->2)} is not real Scala, and similarly the whole combine function is wrong both syntactically and logically (it will not compile). But I am guessing that what you have is something like the following:
val rdd = sc.parallelize(List(
("a", "weight"->1),
("a", "weight"->2)
))
This means that your rdd is of type RDD[(String, (String, Int))] (or PairRDD[String, (String, Int)]), so reduceByKey wants a function of type ((String, Int), (String, Int)) => (String, Int).
// Keep the label from the first value and add the two weights.
def combine(x: (String, Int), y: (String, Int)): (String, Int) =
  (x._1, x._2 + y._2)
val rdd2 = rdd.reduceByKey(combine)
If your problem is something else then please update the question to share your problem with real code, so that others can actually understand it.
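To see what the combine function does without a Spark cluster, here is a rough local stand-in that uses plain Scala collections. It is not Spark's implementation: groupBy followed by reduce only mimics the result of reduceByKey, and the names ReduceByKeyLocal, data and reduced are made up for this sketch.

```scala
object ReduceByKeyLocal {
  // Same shape as the RDD elements above: (key, (label, weight)).
  val data: List[(String, (String, Int))] =
    List(("a", "weight" -> 1), ("a", "weight" -> 2))

  // The (V, V) => V function: keep the label, add the weights.
  def combine(x: (String, Int), y: (String, Int)): (String, Int) =
    (x._1, x._2 + y._2)

  // Local stand-in for reduceByKey: group the values by key, then reduce each group.
  val reduced: Map[String, (String, Int)] =
    data.groupBy(_._1).map { case (k, kvs) =>
      k -> kvs.map(_._2).reduce((x, y) => combine(x, y))
    }
}
```

The important part is the type of combine: both arguments and the result have the value type V, here (String, Int).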
I am not able to understand the functioning of flatMap function on Map objects.
Use flatMap when you want to flatten the result of your map function.
Keep in mind:
flatMap(something)
is identical to
map(something).flatten
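A small sketch of that equivalence on a List (the object and value names are illustrative):

```scala
object FlatMapVsFlatten {
  val lines = List("a b", "c d e")

  // map alone yields a nested List[List[String]].
  val nested: List[List[String]] = lines.map(_.split(" ").toList)

  // flatMap applies the function and flattens in one step.
  val viaFlatMap: List[String] = lines.flatMap(_.split(" ").toList)

  // Flattening the mapped result gives the same list.
  val viaFlatten: List[String] = nested.flatten
}
```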
I think it is a good question, because a Map cannot be flattened the way other collections can. First of all, we should look at the signature of this method:
def flatMap[B](f: (A) ⇒ GenTraversableOnce[B]): Map[B]
So the documentation says that it returns a Map, but that is not the whole truth: the function can return any GenTraversableOnce, and the result is not always a Map. We can see this in the provided examples:
def getWords(lines: Seq[String]): Seq[String] = lines flatMap (line => line split "\\W+")
// lettersOf will return a Seq[Char] of likely repeated letters, instead of a Set
def lettersOf(words: Seq[String]) = words flatMap (word => word.toSet)
// lettersOf will return a Set[Char], not a Seq
def lettersOf(words: Seq[String]) = words.toSet flatMap (word => word.toSeq)
// xs will be an Iterable[Int]
val xs = Map("a" -> List(11,111), "b" -> List(22,222)).flatMap(_._2)
// ys will be a Map[Int, Int]
val ys = Map("a" -> List(1 -> 11,1 -> 111), "b" -> List(2 -> 22,2 -> 222)).flatMap(_._2)
So let's look at the full signature:
def flatMap[B, That](f: ((K, V)) ⇒ GenTraversableOnce[B])(implicit bf: CanBuildFrom[Map[K, V], B, That]): That
Now we see that it returns That, which is whatever the implicit CanBuildFrom can provide for us.
You can find many explanations of how CanBuildFrom works.
But the main idea is that you pass a function from your key -> value pair to a GenTraversableOnce (it can be a Map, a Seq or even an Option), and the result is mapped and flattened. You can also provide your own CanBuildFrom.
If the keys or values of a Map are themselves collections, the Map can be flatMap-ed.
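As a rough illustration of how the result type follows the function you pass, here is a sketch with a made-up stock map (MapFlatMap, stock, inStock and labels are not from the question). When the function yields key/value pairs the result can be a Map again; when it yields anything else you get a plain Iterable:

```scala
object MapFlatMap {
  val stock: Map[String, Int] = Map("apple" -> 3, "pear" -> 0, "plum" -> 5)

  // The function returns an Option of a pair: the results are still pairs,
  // so a Map can be built. Entries mapped to None simply disappear.
  val inStock: Map[String, Int] =
    stock.flatMap { case (name, n) => if (n > 0) Some(name -> n) else None }

  // The function returns a List[String]: the results are no longer pairs,
  // so the result is an Iterable[String], not a Map.
  val labels: Iterable[String] =
    stock.flatMap { case (name, n) => List.fill(n)(name) }
}
```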
Example:
val a = Map(1->List(1,2),2->List(2,3))
a.map(_._2) gives List(List(1,2),List(2,3))
you can flatten this using flatMap => a.flatMap(_._2) or a.map(_._2).flatten gives List(1,2,2,3)
src: http://www.scala-lang.org/old/node/12158.html
Not sure about any other way of using flatMap on a Map though.
How can a list of words be counted into a Map[String, Int], where the String is the word and the Int is its count?
I'm attempting to use a fold for this, but this is the closest I've got:
val links = List("word1" , "word2" , "word3")
links.fold(Map.empty[String, Int]) ((count : Int, word : String) => count + (word -> (count.getOrElse(word, 0) + 1)))
Which causes the error:
value getOrElse is not a member of Int
If you take a look at the signature of fold, you can see that
links.fold(Map.empty[String, Int]) ((count : Int, word : String) => ???)
won't compile
fold on List[A] has type fold[A1 >: A](z: A1)(op: (A1, A1) ⇒ A1): A1
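That's not something you can use here: A1 must be a supertype of String, and Map[String, Int] is not one. The only A1 that fits both is a common supertype such as Any, and then op would have to be (Any, Any) ⇒ Any, which your function is not.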
What you need is foldLeft: foldLeft[B](z: B)(op: (B, A) ⇒ B): B
Your A is String. Your B is Map[String, Int], but then in your second parameter list you have (Int, String) => ??? That doesn't conform to the signature. It should be (Map[String, Int], String) => Map[String, Int]
A solution immediately presents itself:
(map: Map[String, Int], next: String) => map + (next -> (map.getOrElse(next, 0) + 1))
Putting it all together, you'll have
links.foldLeft(Map.empty[String, Int]) { (map: Map[String, Int], next: String) =>
  map + (next -> (map.getOrElse(next, 0) + 1))
}
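To make the difference between the two signatures concrete, here is a small sketch (FoldVsFoldLeft and its values are illustrative names). With fold the accumulator has to share a type with the elements; with foldLeft it can be anything:

```scala
object FoldVsFoldLeft {
  val links = List("word1", "word2", "word3", "word3")

  // fold: accumulator and elements share one type (String here),
  // so op is (String, String) => String.
  val joined: String = links.fold("")(_ + _)

  // foldLeft: the accumulator type B (Int) differs from the element type A (String),
  // so op is (Int, String) => Int.
  val totalChars: Int = links.foldLeft(0)((n, word) => n + word.length)
}
```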
Maybe not the most efficient, but the clearest way for me would be the following (the printed output assumes links = List("word1", "word2", "word2")):
val grouped = links groupBy { identity } // Map[String, List[String]]
val summed = grouped mapValues { _.length } // Map[String, Int]
println(grouped) // Map(word2 -> List(word2, word2), word1 -> List(word1))
println(summed) // Map(word2 -> 2, word1 -> 1)
You need to use a foldLeft:
val links = List("word1" , "word2" , "word3", "word3")
val wordCount = links.foldLeft(Map.empty[String, Int])((map, word) => map + (word -> (map.getOrElse(word,0) + 1)))
This is an example where some of the abstractions of a library like cats or scalaz are useful and provide a nice solution.
We can represent a word "foo" as Map("foo" -> 1). If we can combine these maps for all our words, we end up with the word count. The keyword here is combine, which is a function defined in Semigroup. We can use this function to combine all the maps of our word list by using combineAll (which is defined in Foldable and does the folding for you).
import cats.implicits._
val words = List("a", "a", "b", "c", "c", "c")
words.map(i => Map(i -> 1)).combineAll
// Map[String,Int] = Map(b -> 1, a -> 2, c -> 3)
Or in one step using foldMap :
words.foldMap(i => Map(i -> 1))
// Map[String,Int] = Map(b -> 1, a -> 2, c -> 3)
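For readers without cats on the classpath, the following is a hand-written approximation of what combining the maps amounts to. It is only a sketch in plain Scala, not how cats implements Semigroup or Foldable, and the names CombineMaps, combine and counts are made up here:

```scala
object CombineMaps {
  // Hand-written stand-in for combining two Map[String, Int] values:
  // keys are merged and the counts of shared keys are added.
  def combine(a: Map[String, Int], b: Map[String, Int]): Map[String, Int] =
    b.foldLeft(a) { case (acc, (k, v)) => acc + (k -> (acc.getOrElse(k, 0) + v)) }

  val words = List("a", "a", "b", "c", "c", "c")

  // Turn every word into a one-entry map, then combine them all,
  // starting from the empty map.
  val counts: Map[String, Int] =
    words.map(w => Map(w -> 1)).foldLeft(Map.empty[String, Int])(combine)
}
```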
Can someone explain why I get the compiler error below, and what the best way to do this type of conversion is?
Thanks
Des
case class A[T](i: Int, x: T)
val set: Set[A[_]] = Set(A(1, 'x'), A(2, 3))
val map: Map[Int, A[_]] = set.map(a => a.i -> a)
type mismatch; found : scala.collection.immutable.Set[(Int, A[_$19]) forSome { type _$19 }] required: Map[Int,A[_]]
There are a couple of things here. First, I assume that A is a case class (otherwise you would need the new keyword). Second, map on a Set returns a Set of tuples, not a Map. Third, if the value type is declared as A[_], the tuple must contain an A; a.x is the wrapped value (typed as Any here), not an A:
scala> case class A[T](i: Int, x: T)
defined class A
scala> val set: Set[A[_]] = Set(A(1, 'x'), A(2, 3))
set: Set[A[_]] = Set(A(1,x), A(2,3))
To match the type signature you can use toMap and change Map[Int, A[_]] to Map[Int, _]
scala> val map: Map[Int, _] = set.map(a => a.i -> a.x).toMap
map: Map[Int, _] = Map(1 -> x, 2 -> 3)
If you want to keep the original signature (Map[Int, A[_]]) you need to return an A in the tuple:
scala> val map: Map[Int, A[_]] = set.map(a => a.i -> a).toMap
map: Map[Int,A[_]] = Map(1 -> A(1,x), 2 -> A(2,3))
What would be a functional way to zip two dictionaries in Scala?
val map1 = scala.collection.immutable.HashMap("A" -> 1, "B" -> 2)
val map2 = scala.collection.immutable.HashMap("B" -> 22, "D" -> 4) // B is the only common key
zipper(map1,map2) should give something similar to
Seq( ("A",1,0), // no A in second map, so third value is zero
("B",2,22),
("D",0,4)) // no D in first map, so second value is zero
If not functional, any other style is also appreciated
def zipper(map1: Map[String, Int], map2: Map[String, Int]) = {
  for (key <- map1.keys ++ map2.keys)
    yield (key, map1.getOrElse(key, 0), map2.getOrElse(key, 0))
}
scala> val map1 = scala.collection.immutable.HashMap("A" -> 1, "B" -> 2)
map1: scala.collection.immutable.HashMap[String,Int] = Map(A -> 1, B -> 2)
scala> val map2 = scala.collection.immutable.HashMap("B" -> 22, "D" -> 4)
map2: scala.collection.immutable.HashMap[String,Int] = Map(B -> 22, D -> 4)
scala> :load Zipper.scala
Loading Zipper.scala...
zipper: (map1: Map[String,Int], map2: Map[String,Int])Iterable[(String, Int, Int)]
scala> zipper(map1, map2)
res1: Iterable[(String, Int, Int)] = Set((A,1,0), (B,2,22), (D,0,4))
Note that using get is probably preferable to getOrElse in this case: None then signals that a value does not exist, instead of 0.
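A possible sketch of that Option-based variant (zipperOpt and ZipperWithOption are names invented for this example). It uses keySet so that the combined keys are de-duplicated by construction:

```scala
object ZipperWithOption {
  // Variant of zipper that reports a missing key as None instead of 0.
  def zipperOpt(
      map1: Map[String, Int],
      map2: Map[String, Int]
  ): Set[(String, Option[Int], Option[Int])] =
    (map1.keySet ++ map2.keySet).map(key => (key, map1.get(key), map2.get(key)))
}
```

With this shape a caller can tell a key that is missing apart from a key that is genuinely mapped to 0.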
As an alternative to Brian's answer, this can be used to enhance the map class by way of implicit methods:
implicit class MapUtils[K, +V](map: collection.Map[K, V]) {
  def zipAllByKey[B >: V, C >: V](that: collection.Map[K, C], thisElem: B, thatElem: C): Iterable[(K, B, C)] =
    for (key <- map.keys ++ that.keys)
      yield (key, map.getOrElse(key, thisElem), that.getOrElse(key, thatElem))
}
The naming and API are similar to the sequence zipAll.
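A usage sketch, assuming the implicit class is placed inside an enclosing object (in Scala 2 an implicit class cannot be top-level). The wrapper name ZipAllByKeyDemo and the sample maps are illustrative:

```scala
object ZipAllByKeyDemo {
  // The extension from the answer above, repeated so the example is self-contained.
  implicit class MapUtils[K, +V](map: collection.Map[K, V]) {
    def zipAllByKey[B >: V, C >: V](that: collection.Map[K, C], thisElem: B, thatElem: C): Iterable[(K, B, C)] =
      for (key <- map.keys ++ that.keys)
        yield (key, map.getOrElse(key, thisElem), that.getOrElse(key, thatElem))
  }

  val map1 = Map("A" -> 1, "B" -> 2)
  val map2 = Map("B" -> 22, "D" -> 4)

  // 0 is the filler on both sides, as in the question.
  val zipped: Iterable[(String, Int, Int)] = map1.zipAllByKey(map2, 0, 0)
}
```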