Word count using fold - scala

How can a list of words be counted into a Map where the key is the word (String) and the value is its count (Int)?
I'm attempting to use a fold for this, but this is the closest I've got:
val links = List("word1" , "word2" , "word3")
links.fold(Map.empty[String, Int]) ((count : Int, word : String) => count + (word -> (count.getOrElse(word, 0) + 1)))
Which causes the error:
value getOrElse is not a member of Int

If you take a look at the signature of fold, you can see that
links.fold(Map.empty[String, Int]) ((count : Int, word : String) => ???)
won't compile
fold on List[A] has type fold[A1 >: A](z: A1)(op: (A1, A1) ⇒ A1): A1
That's not something you can use here: the zero element must have a type A1 that is a supertype of the element type A, and Map[String, Int] is not a supertype of String.
What you need is foldLeft: foldLeft[B](z: B)(op: (B, A) ⇒ B): B
Your A is String. Your B is Map[String, Int], but in your second parameter list you have (Int, String) => ???, which doesn't conform to the signature; it should be (Map[String, Int], String) => Map[String, Int].
A solution immediately presents itself:
(map: Map[String, Int], next: String) => map + (next -> (map.getOrElse(next, 0) + 1))
Putting it all together, you'll have
links.foldLeft(Map.empty[String, Int])(
  (map: Map[String, Int], next: String) => map + (next -> (map.getOrElse(next, 0) + 1)))

Maybe not the most efficient, but the clearest way for me would be (shown here for a list with a repeated word, e.g. List("word1", "word2", "word2")):
val grouped = links groupBy { identity } // Map[String, List[String]]
val summed = grouped mapValues { _.length } // Map[String, Int]
println(grouped) // Map(word2 -> List(word2, word2), word1 -> List(word1))
println(summed) // Map(word2 -> 2, word1 -> 1)

You need to use a foldLeft:
val links = List("word1" , "word2" , "word3", "word3")
val wordCount = links.foldLeft(Map.empty[String, Int])((map, word) => map + (word -> (map.getOrElse(word,0) + 1)))
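With the sample list above, this yields the expected counts (the printed entry order may vary):
// wordCount: Map[String,Int] = Map(word1 -> 1, word2 -> 1, word3 -> 2)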

This is an example where some of the abstractions of a library like cats or scalaz are useful and provide a nice solution.
We can represent a word "foo" as Map("foo" -> 1). If we can combine these maps for all our words, we end up with the word count. The keyword here is combine, which is a function defined on Semigroup. We can use this function to combine all the maps of our word list by using combineAll (which is defined on Foldable and does the folding for you).
import cats.implicits._
val words = List("a", "a", "b", "c", "c", "c")
words.map(i => Map(i -> 1)).combineAll
// Map[String,Int] = Map(b -> 1, a -> 2, c -> 3)
Or in one step, using foldMap:
words.foldMap(i => Map(i -> 1))
// Map[String,Int] = Map(b -> 1, a -> 2, c -> 3)
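To make the combine step concrete, here is a small sketch (my own illustration, not part of the original answer) of what the Map Semigroup instance does for two of those singleton maps, given the same cats.implicits._ import:
Map("a" -> 1) |+| Map("a" -> 1, "b" -> 1)
// Map(a -> 2, b -> 1): values for the common key "a" are summed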

Related

Calculate the number of occurrences of letters, and put the dictionaries in the list

I'm trying to count occurrences of letters in a Stream of strings, and then put a map for each string ("letter" -> count) into a List.
def checksum(ipt: Stream[String]) = ipt.foldLeft(List(Map("x" -> 1)))((n: List[Map[String, Int]], m: String) =>
  n ++ m.split("").groupBy(identity).mapValues(_.size).toMap)
It gives this error:
Expression of type List[Equals] doesn't conform to expected type List[Map[String, Int]]
What's wrong? There is no problem doing it for each string:
def checksum(ipt: Stream[String]) = ipt.foreach( (m: String) => println(m.split("").groupBy(identity).mapValues(_.size)))
It gives something like this for
val s = "bababc"
val d = "abbcde"
checksum(List(s,d).toStream)
out:
Map(b -> 3, a -> 2, c -> 1)
Map(e -> 1, a -> 1, b -> 2, c -> 1, d -> 1)
But how do I stash all these maps in a List now? I can't use vars and need to do it in one expression.
If you need a map for each string, you can achieve it with a map over the stream as follows:
def checksums(ipt: Stream[String]): Stream[Map[Char, Int]] = {
  ipt.map(checksum)
}

def checksum(ipt: String): Map[Char, Int] = ipt.foldLeft(Map.empty[Char, Int]) { case (acc, ch) =>
  acc.get(ch) match {
    case Some(q) => acc + (ch -> (q + 1))
    case None    => acc + (ch -> 1)
  }
}
Going back to your code, the operator to add an element to a List is :+, not ++.
++ is used to concatenate collections, so with ++ the map on the right is treated as a collection of (String, Int) pairs and the result's element type widens to Equals, which is exactly the error you see.
So you can fix your code like this:
def checksumFixed(ipt: Stream[String]) = {
  ipt.foldLeft(List(Map("x" -> 1))) { (n: List[Map[String, Int]], m: String) =>
    n :+ m.split("").groupBy(identity).mapValues(_.length)
  }
}
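For the two sample strings from the question, this returns something like the following (note that the Map(x -> 1) seed stays in the result; starting the fold from List.empty[Map[String, Int]] would avoid that):
checksumFixed(List(s, d).toStream)
// List(Map(x -> 1), Map(b -> 3, a -> 2, c -> 1), Map(e -> 1, a -> 1, b -> 2, c -> 1, d -> 1))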

Zip two HashMaps(or dictionaries)

What would be a functional way to zip two dictionaries in Scala?
map1 = new HashMap("A"->1,"B"->2)
map2 = new HashMap("B"->22,"D"->4) // B is the only common key
zipper(map1,map2) should give something similar to
Seq( ("A",1,0), // no A in second map, so third value is zero
("B",2,22),
("D",0,4)) // no D in first map, so second value is zero
If not functional, any other style is also appreciated
def zipper(map1: Map[String, Int], map2: Map[String, Int]) = {
  for (key <- map1.keys ++ map2.keys)
    yield (key, map1.getOrElse(key, 0), map2.getOrElse(key, 0))
}
scala> val map1 = scala.collection.immutable.HashMap("A" -> 1, "B" -> 2)
map1: scala.collection.immutable.HashMap[String,Int] = Map(A -> 1, B -> 2)
scala> val map2 = scala.collection.immutable.HashMap("B" -> 22, "D" -> 4)
map2: scala.collection.immutable.HashMap[String,Int] = Map(B -> 22, D -> 4)
scala> :load Zipper.scala
Loading Zipper.scala...
zipper: (map1: Map[String,Int], map2: Map[String,Int])Iterable[(String, Int, Int)]
scala> zipper(map1, map2)
res1: Iterable[(String, Int, Int)] = Set((A,1,0), (B,2,22), (D,0,4))
Note that using get is probably preferable to getOrElse in this case: None indicates that a value does not exist, rather than encoding absence as 0.
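For example, a hypothetical zipperOpt along those lines (my own sketch, not from the answer above) could look like this:
def zipperOpt(map1: Map[String, Int], map2: Map[String, Int]): Iterable[(String, Option[Int], Option[Int])] =
  for (key <- map1.keys ++ map2.keys)
    yield (key, map1.get(key), map2.get(key))
// zipperOpt(map1, map2) would give (A,Some(1),None), (B,Some(2),Some(22)), (D,None,Some(4))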
As an alternative to Brian's answer, this can be used to enhance the map class by way of implicit methods:
implicit class MapUtils[K, +V](map: collection.Map[K, V]) {
  def zipAllByKey[B >: V, C >: V](that: collection.Map[K, C], thisElem: B, thatElem: C): Iterable[(K, B, C)] =
    for (key <- map.keys ++ that.keys)
      yield (key, map.getOrElse(key, thisElem), that.getOrElse(key, thatElem))
}
The naming and API are similar to the sequence zipAll.
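For example, with map1 and map2 from above in scope (a sketch assuming the implicit class has been defined):
map1.zipAllByKey(map2, 0, 0)
// Iterable[(String, Int, Int)] containing (A,1,0), (B,2,22), (D,0,4)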

Use map elements as method arguments

I've been stuck on this for a while now: I have this method
Method(a: (A,B)*): Unit
and I have a map of type Map[A,B], is there a way to convert this map so that I can directly use it as an argument?
Something like:
Method(map.convert)
Thank you.
You can build a sequence of pairs from a Map by using .toSeq.
You can also pass a sequence of type Seq[T] as varargs by "casting" it with : _*.
Chain the conversions to achieve what you want:
scala> val m = Map('a' -> 1, 'b' -> 2, 'z' -> 26)
m: scala.collection.immutable.Map[Char,Int] = Map(a -> 1, b -> 2, z -> 26)
scala> def foo[A,B](pairs : (A,B)*) = pairs.foreach(println)
foo: [A, B](pairs: (A, B)*)Unit
scala> foo(m.toSeq : _*)
(a,1)
(b,2)
(z,26)
I'm not sure, but you can try converting the map into an array of tuples and then passing it as varargs, like so:
Method(map.toArray: _*)
Just convert the map to a sequence of pairs using ".toSeq" and pass it to the var-args method postfixed with ":_*" (which is Scala syntax to allow passing a sequence as arguments to a var-args method).
Example:
def m(a: (String, Int)*) { for ((k, v) <- a) println(k+"->"+v) }
val x= Map("a" -> 1, "b" -> 2)
m(x.toSeq:_*)
In repl:
scala> def m(a: (String, Int)*) { for ((k, v) <- a) println(k+"->"+v) }
m: (a: (String, Int)*)Unit
scala> val x= Map("a" -> 1, "b" -> 2)
x: scala.collection.immutable.Map[java.lang.String,Int] = Map(a -> 1, b -> 2)
scala> m(x.toSeq:_*)
a->1
b->2

Cleaner tuple groupBy

I have a sequence of key-value pairs (String, Int), and I want to group them by key into a sequence of values (i.e. Seq[(String, Int)] => Map[String, Iterable[Int]]).
Obviously, toMap isn't useful here, and groupBy maintains the values as tuples. The best I managed to come up with is:
val seq: Seq[( String, Int )]
// ...
seq.groupBy( _._1 ).mapValues( _.map( _._2 ) )
Is there a cleaner way of doing this?
Here's a pimp that adds a toMultiMap method to traversables. Would it solve your problem?
import collection._
import mutable.Builder
import generic.CanBuildFrom
class TraversableOnceExt[CC, A](coll: CC, asTraversable: CC => TraversableOnce[A]) {

  def toMultiMap[T, U, That](implicit ev: A <:< (T, U), cbf: CanBuildFrom[CC, U, That]): immutable.Map[T, That] =
    toMultiMapBy(ev)

  def toMultiMapBy[T, U, That](f: A => (T, U))(implicit cbf: CanBuildFrom[CC, U, That]): immutable.Map[T, That] = {
    val mutMap = mutable.Map.empty[T, mutable.Builder[U, That]]
    for (x <- asTraversable(coll)) {
      val (key, value) = f(x)
      val builder = mutMap.getOrElseUpdate(key, cbf(coll))
      builder += value
    }
    val mapBuilder = immutable.Map.newBuilder[T, That]
    for ((k, v) <- mutMap)
      mapBuilder += ((k, v.result))
    mapBuilder.result
  }
}

implicit def commomExtendTraversable[A, C[A] <: TraversableOnce[A]](coll: C[A]): TraversableOnceExt[C[A], A] =
  new TraversableOnceExt[C[A], A](coll, identity)
Which can be used like this:
val map = List(1 -> 'a', 1 -> 'à', 2 -> 'b').toMultiMap
println(map) // Map(1 -> List(a, à), 2 -> List(b))
val byFirstLetter = Set("abc", "aeiou", "cdef").toMultiMapBy(elem => (elem.head, elem))
println(byFirstLetter) // Map(c -> Set(cdef), a -> Set(abc, aeiou))
If you add the following implicit defs, it will also work with collection-like objects such as Strings and Arrays:
implicit def commomExtendStringTraversable(string: String): TraversableOnceExt[String, Char] =
  new TraversableOnceExt[String, Char](string, implicitly)
implicit def commomExtendArrayTraversable[A](array: Array[A]): TraversableOnceExt[Array[A], A] =
  new TraversableOnceExt[Array[A], A](array, implicitly)
Then:
val withArrays = Array(1 -> 'a', 1 -> 'à', 2 -> 'b').toMultiMap
println(withArrays) // Map(1 -> [C@377653ae, 2 -> [C@396fe0f4)
val byLowercaseCode = "Mama".toMultiMapBy(c => (c.toLower.toInt, c))
println(byLowercaseCode) // Map(97 -> aa, 109 -> Mm)
There's no method or data structure in the standard library to do this, and your solution looks about as concise as you'll get. If you use this in more than one place, you might like to factor it out into a utility method
def groupTuples[A, B](seq: Seq[(A, B)]) =
  seq groupBy (_._1) mapValues (_ map (_._2))
which you then obviously just call with groupTuples(seq). This might not be the most efficient possible in terms of CPU clock cycles, but I don't think it's particularly inefficient either.
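For instance (my own quick example of calling it):
groupTuples(Seq("a" -> 1, "a" -> 2, "b" -> 3))
// e.g. Map(a -> List(1, 2), b -> List(3))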
I did a rough benchmark against Jean-Philippe's solution on a list of 9 tuples and this is marginally faster. Both were about twice as fast as folding the sequence into a map (effectively re-implementing groupBy to give the output you want).
I don't know if you consider it cleaner:
seq.groupBy(_._1).map { case (k,v) => (k,v.map(_._2))}
Starting with Scala 2.13, most collections provide the groupMap method, which is (as its name suggests) a more efficient equivalent of a groupBy followed by mapValues:
List(1 -> 'a', 1 -> 'b', 2 -> 'c').groupMap(_._1)(_._2)
// Map[Int,List[Char]] = Map(2 -> List(c), 1 -> List(a, b))
This:
groups elements based on the first part of tuples (Map(2 -> List((2,c)), 1 -> List((1,a), (1,b))))
maps grouped values (List((1,a), (1,b))) by taking their second tuple part (List(a, b)).

Convert List of tuple to map (and deal with duplicate key ?)

I was thinking about a nice way to convert a List of tuples with duplicate keys [("a","b"),("c","d"),("a","f")] into a map ("a" -> ["b", "f"], "c" -> ["d"]). Normally (in Python), I'd create an empty map and for-loop over the list and check for duplicate keys. But I am looking for a more Scala-ish and clever solution here.
btw, actual type of key-value I use here is (Int, Node) and I want to turn into a map of (Int -> NodeSeq)
For Googlers that don't expect duplicates or are fine with the default duplicate handling policy:
List("a" -> 1, "b" -> 2, "a" -> 3).toMap
// Result: Map(a -> 3, b -> 2)
As of 2.12, the default policy reads:
Duplicate keys will be overwritten by later keys: if this is an unordered collection, which key is in the resulting map is undefined.
Group and then project:
scala> val x = List("a" -> "b", "c" -> "d", "a" -> "f")
//x: List[(java.lang.String, java.lang.String)] = List((a,b), (c,d), (a,f))
scala> x.groupBy(_._1).map { case (k,v) => (k,v.map(_._2))}
//res1: scala.collection.immutable.Map[java.lang.String,List[java.lang.String]] = Map(c -> List(d), a -> List(b, f))
A more Scala-ish way is to use a fold directly (skipping the map f step).
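A minimal sketch of what that fold could look like (my own illustration, not taken from the linked answer):
val x = List("a" -> "b", "c" -> "d", "a" -> "f")
x.foldLeft(Map.empty[String, List[String]]) { case (acc, (k, v)) =>
  acc + (k -> (v :: acc.getOrElse(k, Nil)))
}
// Map(a -> List(f, b), c -> List(d)) -- values end up in reverse order of appearance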
Here's another alternative:
x.groupBy(_._1).mapValues(_.map(_._2))
For Googlers that do care about duplicates:
implicit class Pairs[A, B](p: List[(A, B)]) {
def toMultiMap: Map[A, List[B]] = p.groupBy(_._1).mapValues(_.map(_._2))
}
> List("a" -> "b", "a" -> "c", "d" -> "e").toMultiMap
> Map("a" -> List("b", "c"), "d" -> List("e"))
Starting with Scala 2.13, most collections provide the groupMap method, which is (as its name suggests) a more efficient equivalent of a groupBy followed by mapValues:
List("a" -> "b", "c" -> "d", "a" -> "f").groupMap(_._1)(_._2)
// Map[String,List[String]] = Map(a -> List(b, f), c -> List(d))
This:
groups elements based on the first part of tuples (group part of groupMap)
maps grouped values by taking their second tuple part (map part of groupMap)
This is an equivalent of list.groupBy(_._1).mapValues(_.map(_._2)) but performed in one pass through the List.
Below you can find a few solutions (groupBy, foldLeft, aggregate, Spark).
val list: List[(String, String)] = List(("a","b"),("c","d"),("a","f"))
GroupBy variation
list.groupBy(_._1).map(v => (v._1, v._2.map(_._2)))
Fold Left variation
list.foldLeft[Map[String, List[String]]](Map())((acc, value) => {
  acc.get(value._1).fold(acc ++ Map(value._1 -> List(value._2))) { v =>
    acc ++ Map(value._1 -> (value._2 :: v))
  }
})
Aggregate Variation - Similar to fold Left
list.aggregate[Map[String, List[String]]](Map())(
  (acc, value) => acc.get(value._1).fold(acc ++ Map(value._1 -> List(value._2))) { v =>
    acc ++ Map(value._1 -> (value._2 :: v))
  },
  (l, r) => l ++ r
)
Spark variation - for big data sets (conversion to an RDD and back to a plain Map)
import org.apache.spark.rdd._
import org.apache.spark.{SparkContext, SparkConf}

val conf: SparkConf = new SparkConf().setAppName("Spark").setMaster("local")
val sc: SparkContext = new SparkContext(conf)

// This gives you an RDD with the same result
val rdd: RDD[(String, List[String])] = sc.parallelize(list).combineByKey(
  (value: String) => List(value),
  (acc: List[String], value) => value :: acc,
  (accLeft: List[String], accRight: List[String]) => accLeft ::: accRight
)

// To convert this RDD back to a Map[String, List[String]] you can do the following
rdd.collect().toMap
Here is a more idiomatic Scala way to convert a list of tuples to a map while handling duplicate keys. You want to use a fold.
val x = List("a" -> "b", "c" -> "d", "a" -> "f")
x.foldLeft(Map.empty[String, Seq[String]]) { case (acc, (k, v)) =>
  acc.updated(k, acc.getOrElse(k, Seq.empty[String]) ++ Seq(v))
}
res0: scala.collection.immutable.Map[String,Seq[String]] = Map(a -> List(b, f), c -> List(d))
You can try this:
scala> val b = Array(1, 2, 3)
// b: Array[Int] = Array(1, 2, 3)
scala> val c = b.map(x => (x -> x * 2))
// c: Array[(Int, Int)] = Array((1,2), (2,4), (3,6))
scala> val d = Map(c : _*)
// d: scala.collection.immutable.Map[Int,Int] = Map(1 -> 2, 2 -> 4, 3 -> 6)