Convert List[(Int,String)] into List[Int] in scala - scala

My goal is to to map every word in a text (Index, line) to a list containing the indices of every line the word occurs in. I managed to write a function that returns a list of all words assigned to a index.
The following function should do the rest (map a list of indices to every word):
def mapIndicesToWords(l:List[(Int,String)]):Map[String,List[Int]] = ???
If I do this:
l.groupBy(x => x._2)
it returns a Map[String, List[(Int,String)]. Now I just want to change the value to type List[Int].
I thought of using .mapValues(...) and fold the list somehow, but I'm new to scala and don't know the correct approach for this.
So how do I convert the list?

Also you can use foldLeft, you need just specify accumulator (in your case Map[String, List[Int]]), which will be returned as a result, and write some logic inside. Here is my implementation.
def mapIndicesToWords(l:List[(Int,String)]): Map[String,List[Int]] =
l.foldLeft(Map[String, List[Int]]())((map, entry) =>
map.get(entry._2) match {
case Some(list) => map + (entry._2 -> (entry._1 :: list))
case None => map + (entry._2 -> List(entry._1))
}
)
But with foldLeft, elements of list will be in reversed order, so you can use foldRight. Just change foldLeft to foldRight and swap input parameters, (map, entry) to (entry, map).
And be careful, foldRight works 2 times slower. It is implemented using method reverse list and foldLeft.

scala> val myMap: Map[String,List[(Int, String)]] = Map("a" -> List((1,"line1"), (2, "line")))
myMap: Map[String,List[(Int, String)]] = Map(a -> List((1,line1), (2,line)))
scala> myMap.mapValues(lst => lst.map(pair => pair._1))
res0: scala.collection.immutable.Map[String,List[Int]] = Map(a -> List(1, 2))

Related

Scala concatenate maps from a list

Having val mapList: List[Map[String, Int]], I want to do something like:
val map = mapList foldLeft (Map[String, Int]()) ( _ ++ _ )
or
val map = mapList foldLeft (Map[String, Int]())
( (m1: Map[String, Int], m2: Map[String, Int]) => m1 ++ m2 )
Neither option is compiled (first says "missing parameter type for expanded function (x, y) => x ++ y" and second says "type mismatch; found (Map[String, Int], Map[String, Int]) => Map[String, Int]; required: String").
I want to achieve a classical solution for concatenating a list of immutable maps such as List( Map("apple" -> 5, "pear" -> 7), Map("pear" -> 3, "apricot" -> 0) ) would produce a Map("apple" -> 5, "pear" -> 10, "apricot" -> 0).
Using scala 2.10.5.
You need to add a dot before foldLeft. You can only use spaces instead of dots under specialized conditions, such as for methods with exactly 1 parameter (arity-1 methods):
val map = mapList.foldLeft(Map[String, Int]()) ( _ ++ _ )
You can read more about method invocation best practices here.
You might also be interested in the reduce methods, which are specialized versions of the fold methods, where the return type is the same as the type of the elements of the collection. For example reduceLeft uses the first element of the collection as a seed for the foldLeft. Of course, since this relies on the first element's existence, it will throw an exception if the collection is empty. Since reduceLeft takes only 1 parameter, you can more easily use a space to invoke the method:
mapList.reduceLeft( _ ++ _)
mapList reduceLeft(_ ++ _)
Finally, you should note that all you are doing here is merging the maps. When using ++ to merge the maps, you will just override keys that are already present in the map – you won't be adding the values of duplicate keys. If you wanted to do that, you could follow the answers provided here, and apply them to the foldLeft or reduceLeft. For example:
mapList reduceLeft { (acc, next) =>
(acc.toList ++ next.toList).groupBy(_._1).toMap.mapValues(_.map(_._2).sum)
}
Or slightly differently:
mapList.map(_.toSeq).reduceLeft(_ ++ _).groupBy(_._1).toMap.mapValues(_.map(_._2).sum)
And, if you're using Scalaz, then most concisely:
mapList reduceLeft { _ |+| _ }

In Scala, is there an equivalent of Haskell's "fromListWith" for Map?

In Haskell, there is a function called fromListWith which can generate a Map from a function (used to merge values with the same key) and a list:
fromListWith :: Ord k => (a -> a -> a) -> [(k, a)] -> Map k a
The following expression will be evaluated to true:
fromListWith (++) [(5,"a"), (5,"b"), (3,"b"), (3,"a"), (5,"a")] == fromList [(3, "ab"), (5, "aba")]
In Scala, there is a similar function called toMap on List objects , which can also convert a list to a Map, but it can't have a parameter of function to deal with duplicated keys.
Does anyone have ideas about this?
Apart from using scalaz you could also define one yourself:
implicit class ListToMapWith[K, V](list: List[(K, V)]) {
def toMapWith(op: (V, V) => V) =
list groupBy (_._1) mapValues (_ map (_._2) reduce op)
}
Here is a usage example:
scala> val testList = List((5,"a"), (5,"b"), (3,"b"), (3,"a"), (5,"a"))
scala> testList toMapWith (_ + _)
res1: scala.collection.immutable.Map[Int,String] = Map(5 -> aba, 3 -> ba)
The stdlib doesn't have such a feature, however, there is a port of Data.Map available in scalaz that does have this function available.

Scala: Count Words

I am trying to write the function of countWords(ws) that counts the frequency of words in a list of words ws returning a map from words to occurrences.
that ws is a List[String], using the List data type I should produce a Map[String,Int] using the Map data type. an example of what should the function do:
def test{
expect (Map("aa" -> 2, "bb" -> 1)) {
countWords(List("aa", "bb"))
}
}
This is just a perpetration for a test and its not an assignment. I have been stuck on this function for while now. This is what I have so far:
object Solution {
// define function countWords
def countWords(ws : List[String]) : Map[String,Int] = ws match {
case List() => List()
}
}//
which gives type mismatch. I am not quite sure how to use the scala Map Function, for example when ws is Empty list what should it return that passed by Map[String,Int] I have been trying, and thats why I post it here to get some help. thank you.
Another way to do it is using groupBy which outputs Map(baz -> List(baz, baz, baz), foo -> List(foo, foo), bar -> List(bar)). Then you can map the values of the Map with mapValues to get a count of the number of times each word appears.
scala> List("foo", "foo", "bar", "baz", "baz", "baz")
res0: List[String] = List(foo, foo, bar, baz, baz, baz)
scala> res0.groupBy(x => x).mapValues(_.size)
res0: scala.collection.immutable.Map[String,Int] = Map(baz -> 3, foo -> 2, bar -> 1)
Regarding the type mismatch in your program countWords is expecting a Map[String, Int] as the return type and the first(and only) match you have returns an empty List with type Nothing. If you change the match to case List() => Map[String, Int]() it will no longer give a type error. It also gives a warning about an in-exhaustive pattern match obviously won't return the correct output.
The easiest solution is this
def countWords(ws: List[String]): Map[String, Int] = {
ws.toSet.map((word: String) => (word, ws.count(_ == word))).toMap
}
But it's not the fastest one since it searches through the list several times.
edit:
The fastest way is to use a mutable HashMap
def countWords(ws: List[String]): Map[String, Int] = {
val map = scala.collection.mutable.HashMap.empty[String, Int]
for(word <- ws) {
val n = map.getOrElse(word, 0)
map += (word -> (n + 1))
}
map.toMap
}
Use fold to go through your list starting with an empty map
ws.foldLeft(Map.empty[String, Int]){
(count, word) => count + (word -> (count.getOrElse(word, 0) + 1))
}

Create a Map of Iterables only using immutable collections

I have an iterable val pairs: Iterable[Pair[Key, Value]], that has some key=>value pairs.
Now, I want to create a Map[Key, Iterable[Value]], that has for each key an Iterable of all values of given key in pairs. (I don't actually need a Seq, any Iterable is fine).
I can do it using mutable Map and/or using mutable ListBuffers.
However, everyone tells me that the "right" scala is without using mutable collections. So, is it possible to do this only with immutable collections? (for example, with using map, foldLeft, etc.)
I have found out a really simple way to do this
pairs.groupBy{_._1}.mapValues{_.map{_._2}}
And that's it.
Anything that you can do with a non-cyclic mutable data structure you can also do with an immutable data structure. The trick is pretty simple:
loop -> recursion or fold
mutating operation -> new-copy-with-change-made operation
So, for example, in your case you're probably looping through the Iterable and adding a value each time. If we apply our handy trick, we
def mkMap[K,V](data: Iterable[(K,V)]): Map[K, Iterable[V]] = {
#annotation.tailrec def mkMapInner(
data: Iterator[(K,V)],
map: Map[K,Vector[V]] = Map.empty[K,Vector[V]]
): Map[K,Vector[V]] = {
if (data.hasNext) {
val (k,v) = data.next
mkMapInner(data, map + (k -> map.get(k).map(_ :+ v).getOrElse(Vector(v))))
}
else map
}
mkMapInner(data.iterator)
}
Here I've chosen to implement the loop-replacement by declaring a recursive inner method (with #annotation.tailrec to check that the recursion is optimized to a while loop so it won't break the stack)
Let's test it out:
val pairs = Iterable((1,"flounder"),(2,"salmon"),(1,"halibut"))
scala> mkMap(pairs)
res2: Map[Int,Iterable[java.lang.String]] =
Map(1 -> Vector(flounder, halibut), 2 -> Vector(salmon))
Now, it turns out that Scala's collection libraries also contain something useful for this:
scala> pairs.groupBy(_._1).mapValues{ _.map{_._2 } }
with the groupBy being the key method, and the rest cleaning up what it produces into the form you want.
For the record, you can write this pretty cleanly with a fold. I'm going to assume that your Pair is the one in the standard library (aka Tuple2):
pairs.foldLeft(Map.empty[Key, Seq[Value]]) {
case (m, (k, v)) => m.updated(k, m.getOrElse(k, Seq.empty) :+ v)
}
Although of course in this case the groupBy approach is more convenient.
val ps = collection.mutable.ListBuffer(1 -> 2, 3 -> 4, 1 -> 5)
ps.groupBy(_._1).mapValues(_ map (_._2))
// = Map(1 -> ListBuffer(2, 5), 3 -> ListBuffer(4))
This gives a mutable ListBuffer in the output map. If you want your output to be immutable (not sure if this is quite what you're asking), use collection.breakOut:
ps.groupBy(_._1).mapValues(_.map(_._2)(collection.breakOut))
// = Map(1 -> Vector(2, 5), 3 -> Vector(4))
It seems like Vector is the default for breakOut, but to be sure, you can specify the return type on the left hand side: val myMap: Map[Int,Vector[Int]] = ....
More info on breakOut here.
As a method:
def immutableGroup[A,B](xs: Traversable[(A,B)]): Map[A,Vector[B]] =
xs.groupBy(_._1).mapValues(_.map(_._2)(collection.breakOut))
I perform this function so often that I have an implicit written called groupByKey that does precisely this:
class EnrichedWithGroupByKey[A, Repr <: Traversable[A]](self: TraversableLike[A, Repr]) {
def groupByKey[T, U, That](implicit ev: A <:< (T, U), bf: CanBuildFrom[Repr, U, That]): Map[T, That] =
self.groupBy(_._1).map { case (k, vs) => k -> (bf(self.asInstanceOf[Repr]) ++= vs.map(_._2)).result }
}
implicit def enrichWithGroupByKey[A, Repr <: Traversable[A]](self: TraversableLike[A, Repr]) = new EnrichedWithGroupByKey[A, Repr](self)
And you use it like this:
scala> List(("a", 1), ("b", 2), ("b", 3), ("a", 4)).groupByKey
res0: Map[java.lang.String,List[Int]] = Map(a -> List(1, 4), b -> List(2, 3))
Note that I use .map { case (k, vs) => k -> ... } instead of mapValues because mapValues creates a view, instead of just performing the map immediately. If you plan on accessing those values many times, you'll want to avoid the view approach because it will mean recomputing the .map(_._2) every time.

Scala best way of turning a Collection into a Map-by-key?

If I have a collection c of type T and there is a property p on T (of type P, say), what is the best way to do a map-by-extracting-key?
val c: Collection[T]
val m: Map[P, T]
One way is the following:
m = new HashMap[P, T]
c foreach { t => m add (t.getP, t) }
But now I need a mutable map. Is there a better way of doing this so that it's in 1 line and I end up with an immutable Map? (Obviously I could turn the above into a simple library utility, as I would in Java, but I suspect that in Scala there is no need)
You can use
c map (t => t.getP -> t) toMap
but be aware that this needs 2 traversals.
You can construct a Map with a variable number of tuples. So use the map method on the collection to convert it into a collection of tuples and then use the : _* trick to convert the result into a variable argument.
scala> val list = List("this", "maps", "string", "to", "length") map {s => (s, s.length)}
list: List[(java.lang.String, Int)] = List((this,4), (maps,4), (string,6), (to,2), (length,6))
scala> val list = List("this", "is", "a", "bunch", "of", "strings")
list: List[java.lang.String] = List(this, is, a, bunch, of, strings)
scala> val string2Length = Map(list map {s => (s, s.length)} : _*)
string2Length: scala.collection.immutable.Map[java.lang.String,Int] = Map(strings -> 7, of -> 2, bunch -> 5, a -> 1, is -> 2, this -> 4)
In addition to #James Iry's solution, it is also possible to accomplish this using a fold. I suspect that this solution is slightly faster than the tuple method (fewer garbage objects are created):
val list = List("this", "maps", "string", "to", "length")
val map = list.foldLeft(Map[String, Int]()) { (m, s) => m(s) = s.length }
This can be implemented immutably and with a single traversal by folding through the collection as follows.
val map = c.foldLeft(Map[P, T]()) { (m, t) => m + (t.getP -> t) }
The solution works because adding to an immutable Map returns a new immutable Map with the additional entry and this value serves as the accumulator through the fold operation.
The tradeoff here is the simplicity of the code versus its efficiency. So, for large collections, this approach may be more suitable than using 2 traversal implementations such as applying map and toMap.
Another solution (might not work for all types)
import scala.collection.breakOut
val m:Map[P, T] = c.map(t => (t.getP, t))(breakOut)
this avoids the creation of the intermediary list, more info here:
Scala 2.8 breakOut
What you're trying to achieve is a bit undefined.
What if two or more items in c share the same p? Which item will be mapped to that p in the map?
The more accurate way of looking at this is yielding a map between p and all c items that have it:
val m: Map[P, Collection[T]]
This could be easily achieved with groupBy:
val m: Map[P, Collection[T]] = c.groupBy(t => t.p)
If you still want the original map, you can, for instance, map p to the first t that has it:
val m: Map[P, T] = c.groupBy(t => t.p) map { case (p, ts) => p -> ts.head }
Scala 2.13+
instead of "breakOut" you could use
c.map(t => (t.getP, t)).to(Map)
Scroll to "View": https://www.scala-lang.org/blog/2017/02/28/collections-rework.html
This is probably not the most efficient way to turn a list to map, but it makes the calling code more readable. I used implicit conversions to add a mapBy method to List:
implicit def list2ListWithMapBy[T](list: List[T]): ListWithMapBy[T] = {
new ListWithMapBy(list)
}
class ListWithMapBy[V](list: List[V]){
def mapBy[K](keyFunc: V => K) = {
list.map(a => keyFunc(a) -> a).toMap
}
}
Calling code example:
val list = List("A", "AA", "AAA")
list.mapBy(_.length) //Map(1 -> A, 2 -> AA, 3 -> AAA)
Note that because of the implicit conversion, the caller code needs to import scala's implicitConversions.
c map (_.getP) zip c
Works well and is very intuitiv
How about using zip and toMap?
myList.zip(myList.map(_.length)).toMap
For what it's worth, here are two pointless ways of doing it:
scala> case class Foo(bar: Int)
defined class Foo
scala> import scalaz._, Scalaz._
import scalaz._
import Scalaz._
scala> val c = Vector(Foo(9), Foo(11))
c: scala.collection.immutable.Vector[Foo] = Vector(Foo(9), Foo(11))
scala> c.map(((_: Foo).bar) &&& identity).toMap
res30: scala.collection.immutable.Map[Int,Foo] = Map(9 -> Foo(9), 11 -> Foo(11))
scala> c.map(((_: Foo).bar) >>= (Pair.apply[Int, Foo] _).curried).toMap
res31: scala.collection.immutable.Map[Int,Foo] = Map(9 -> Foo(9), 11 -> Foo(11))
This works for me:
val personsMap = persons.foldLeft(scala.collection.mutable.Map[Int, PersonDTO]()) {
(m, p) => m(p.id) = p; m
}
The Map has to be mutable and the Map has to be return since adding to a mutable Map does not return a map.
use map() on collection followed with toMap
val map = list.map(e => (e, e.length)).toMap