Scala groupBy + mapValues vs. groupBy + map + breakOut - scala

Let's say I have data like this:
scala> case class Foo(a: Int, b: Int)
defined class Foo
scala> val data: List[Foo] = Foo(1,10) :: Foo(2, 20) :: Foo(3,30) :: Nil
data: List[Foo] = List(Foo(1,10), Foo(2,20), Foo(3,30))
I know that in my data, there will be no instances of Foo with the same value of field a - and I want to transform it to Map[Int, Foo] (I don't want Map[Int, List[Foo]])
I can either:
scala> val m: Map[Int,Foo] = data.groupBy(_.a).mapValues(_.head)
m: Map[Int,Foo] = Map(2 -> Foo(2,20), 1 -> Foo(1,10), 3 -> Foo(3,30))
or:
scala> val m: Map[Int,Foo] = data.groupBy(_.a).map(e => e._1 -> e._2.head)(collection.breakOut)
m: Map[Int,Foo] = Map(2 -> Foo(2,20), 1 -> Foo(1,10), 3 -> Foo(3,30))
My questions:
1) How could I make the implementation with breakOut more concise / idiomatic?
2) What should I be aware of "under the covers" in each of the above-two solutions? I.e. hidden memory / compute costs. In particular, I am looking for a "layperson's" explanation of breakOut that does not necessarily involve an in-depth discussion of the signature of map.
3) Are there any other solutions I should be aware of (including, for example, using libraries such as ScalaZ)?

1) As pointed out by @Kigyo, the right answer, given that there are no duplicate values of a, wouldn't use groupBy:
val m: Map[Int,Foo] = data.map(e => e.a -> e)(breakOut)
Using groupBy is good when there could be duplicate values of a, but it is unnecessary for your problem.
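Since breakOut was removed in Scala 2.13, a version-independent sketch of the same idea might look like the following. It repeats the Foo and data definitions from the question and goes through an iterator so that no intermediate List of pairs is built.

```scala
// Definitions repeated from the question so the snippet is self-contained.
case class Foo(a: Int, b: Int)
val data: List[Foo] = Foo(1, 10) :: Foo(2, 20) :: Foo(3, 30) :: Nil

// Turn each element into a (key, value) pair lazily, then build the Map once.
val byA: Map[Int, Foo] = data.iterator.map(e => e.a -> e).toMap
```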
2) First, don't use mapValues if you plan on accessing values multiple times. The .mapValues method does not create a new Map (like the .map method does). Instead, it creates a view of a Map that recomputes the function (_.head in your case) every time it is accessed. If you plan on accessing things a lot, consider map{case (a,b) => a -> ??} instead.
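The recomputation can be observed with a counter. The sketch below is only an illustration (countViewCalls and countStrictCalls are made-up names). It relies on mapValues being lazy, which holds on 2.12 and earlier and also on 2.13, where the method is deprecated in favor of .view.mapValues.

```scala
val grouped: Map[Int, List[Int]] = List(1, 2, 3).groupBy(identity)

// mapValues: the function runs on every lookup, not when the view is created.
def countViewCalls(): Int = {
  var calls = 0
  val view = grouped.mapValues { xs => calls += 1; xs.head }
  view(1)
  view(1)
  calls // 2: one evaluation per lookup
}

// map: the function runs once per entry up front; lookups trigger no further calls.
def countStrictCalls(): Int = {
  var calls = 0
  val strict = grouped.map { case (k, xs) => calls += 1; k -> xs.head }
  strict(1)
  strict(1)
  calls // 3: one evaluation per entry, none per lookup
}
```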
Second, passing the breakOut function as the CanBuildFrom parameter does not incur additional costs. The reason for this is that the CanBuildFrom parameter is always present, just sometimes it's implicit. The true signature is this:
def map[B, That](f: (A) ⇒ B)(implicit bf: CanBuildFrom[List[A], B, That]): That
The purpose of the CanBuildFrom is to tell scala how to make a That out of the result of mapping (which is a collection of Bs). If you leave off breakOut, then it uses an implicit CanBuildFrom, but either way, there must be a CanBuildFrom so that there is some object that is able to build the That out of the Bs.
Finally, in your example with breakOut, the breakOut is completely redundant since groupBy produces a Map, so .map on a Map gives you back a Map by default.
val m: Map[Int,Foo] = data.groupBy(_.a).map(e => e._1 -> e._2.head)

Related

converting a list to map in scala

I am trying to convert a list to map in scala.
Input
val colNames = List("salary_new", "age_new", "loc_new")
Output
Map(salary_new -> salary, age_new -> age, loc_new -> loc)
Following code is working, but seems like I am over killing it.
val colRenameMap = colNames.flatMap(colname => Map(colname -> colname.split("_")(0))).toMap
I think map instead of flatMap would be more suitable for your case. Also you don't need to use the Map type internally, a single tuple should do the job.
For the sake of completeness, this is what the definition of toMap looks like:
toMap[T, U](implicit ev: A <:< (T, U)): immutable.Map[T, U]
As you can see, the method requires the elements to be of type (T, U), which is a Tuple2.
Finally, two options using map:
// option 1: key/value
colNames.map{c => c -> c.split("_")(0)}.toMap
// option 2: tuple
colNames.map{c => (c, c.split("_")(0))}.toMap
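One thing to watch for: split("_")(0) keeps only the text before the first underscore, so a name such as first_name_new would become first. If the intent is to drop a trailing _new (an assumption about the data, not something stated in the question), stripSuffix may be a closer fit:

```scala
// Sample names; "first_name_new" is a made-up entry to show the difference.
val colNames = List("salary_new", "age_new", "first_name_new")

// Remove only the trailing "_new"; names without that suffix are left unchanged.
val colRenameMap: Map[String, String] =
  colNames.map(c => c -> c.stripSuffix("_new")).toMap
```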

Iterable with two elements?

We have Option which is an Iterable over 0 or 1 elements.
I would like to have such a thing with two elements. The best I have is
Array(foo, bar).map{...}, while what I would like is:
(foo, bar).map{...}
(such that Scala recognized there are two elements in the Iterable).
Does such a construction exist in the standard library?
EDIT: another solution is to create a map method:
def map(a:Foo) = {...}
val (mappedFoo, mappedBar) = (map(foo), map(bar))
If all you want to do is map on tuples of the same type, a simple version is:
implicit class DupleOps[T](t: (T,T)) {
def map[B](f : T => B) = (f(t._1), f(t._2))
}
Then you can do the following:
val t = (0,1)
val (x,y) = t.map( _ +1) // x = 1, y = 2
There's no specific type in the scala standard library for mapping over exactly 2 elements.
I can suggest the following (assuming foo and bar have the same type T):
(foo, bar) // -> Tuple2[T,T]
.productIterator // -> Iterator[Any]
.map(_.asInstanceOf[T]) // -> Iterator[T]
.map(x => x /* do some work with x here */)
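With concrete values this might look as follows (a sketch that assumes both elements are Int; the cast is unchecked, so a tuple of mixed types would fail at runtime):

```scala
val pair: (Int, Int) = (0, 1)

// productIterator yields Iterator[Any], so the element type has to be restored by a cast.
val elems: Iterator[Int] = pair.productIterator.map(_.asInstanceOf[Int])

// The actual work: increment each element.
val incremented: List[Int] = elems.map(_ + 1).toList
```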
No, it doesn't.
You could
Make one yourself.
Write an implicit conversion from 2-tuples to a Seq of the common supertype. But this won't yield 2-tuples from operations.
object TupleOps {
implicit def tupleToSeq[C, A <: C, B <: C](tuple: (A, B)): Seq[C] = Seq(tuple._1,tuple._2)
}
import TupleOps._
(0, 1).map(_ + 1)
Use HLists from shapeless. These provide operations on heterogeneous lists, whereas you (probably?) have a homogeneous list, but it should work.

Scala - Iterate over an Iterator of type Product[K,V]

I am a newbie to Scala and I am trying to understand collections. I have some sample Scala code in which a method is defined as follows:
override def write(records: Iterator[Product2[K, V]]): Unit = {...}
From what I understand, this function is passed an argument records, which is an Iterator of Product2[K, V]. What I don't understand is whether Product2 is a user-defined class or a built-in data structure. Moreover, how do I explore the key-value contents of a Product2, and how do I iterate over them?
Chances are Product2 is the built-in class. You can easily check this in a modern IDE (hover over it with Ctrl pressed) or by inspecting the file header: if there is no related import, such as some.custom.package.Product2, it's the built-in one.
What is Product2 and where is it defined? You can easily find out such things by consulting Scala's ScalaDoc.
In the case of the built-in class, you can treat it like a tuple of 2 elements (in fact, Tuple2 extends Product2), which has ._1 and ._2 accessor methods.
scala> val x: Product2[String, Int] = ("foo", 1)
// x: Product2[String,Int] = (foo,1)
scala> x._1
// res0: String = foo
scala> x._2
// res1: Int = 1
See How should I think about Scala's Product classes? for more.
Iteration is also hassle free, for example here is the map operation:
scala> val xs: Iterator[Product2[String, Int]] = List("foo" -> 1, "bar" -> 2, "baz" -> 3).iterator
xs: Iterator[Product2[String,Int]] = non-empty iterator
scala> val keys = xs.map(kv => kv._1)
keys: Iterator[String] = non-empty iterator
scala> val keys = xs.map(kv => kv._1).toList
keys: List[String] = List(foo, bar, baz)
scala> xs
res2: Iterator[Product2[String,Int]] = empty iterator
Keep in mind, though, that once an iterator has been consumed, it is empty and can't be reused.
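If the records need to be traversed more than once, one option is to copy them into a strict collection first. A small sketch (the sample data is made up):

```scala
val records: Iterator[Product2[String, Int]] =
  List("foo" -> 1, "bar" -> 2, "baz" -> 3).iterator

// Materialize once; the List can be traversed as often as needed.
val materialized: List[Product2[String, Int]] = records.toList
val keys: List[String] = materialized.map(_._1)
val values: List[Int] = materialized.map(_._2)

// The original iterator is now exhausted.
val exhausted: Boolean = !records.hasNext
```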
Product2 is just two values of type K and V.
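Use it like this (write expects an Iterator, so call .iterator on the List):
write(List((1, "one"), (2, "two")).iterator)
Provided the method being overridden is declared with the same parameter type, the prototype can also be written as: override def write(records: Iterator[(K, V)]): Unit = {...}
You can then access the values k of type K and v of type V by pattern matching. Use foreach rather than map, because map on an Iterator is lazy and would never run the side effect (w stands for whatever you want to do with each pair):
override def write(records: Iterator[(K, V)]): Unit = {
records.foreach { case (k, v) => w(k, v) }
}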
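Note that Iterator.map is lazy: it returns a new iterator and runs nothing until that iterator is consumed, so a side-effecting write is better expressed with foreach. The sketch below illustrates the difference; record, writeWithMap and writeWithForeach are hypothetical names standing in for the real sink and method.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical sink that remembers what was written.
val written = ListBuffer.empty[String]
def record(k: Int, v: String): Unit = { written += s"$k=$v"; () }

// map only describes the transformation; the resulting iterator is never consumed.
def writeWithMap(records: Iterator[(Int, String)]): Unit = {
  records.map { case (k, v) => record(k, v) }
  ()
}

// foreach consumes the iterator and runs the side effect for every pair.
def writeWithForeach(records: Iterator[(Int, String)]): Unit =
  records.foreach { case (k, v) => record(k, v) }

writeWithMap(Iterator(1 -> "one", 2 -> "two"))
val afterMap: List[String] = written.toList // nothing has been written yet

writeWithForeach(Iterator(1 -> "one", 2 -> "two"))
val afterForeach: List[String] = written.toList // both pairs have been written
```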

Create a Map of Iterables only using immutable collections

I have an iterable val pairs: Iterable[Pair[Key, Value]], that has some key=>value pairs.
Now, I want to create a Map[Key, Iterable[Value]], that has for each key an Iterable of all values of given key in pairs. (I don't actually need a Seq, any Iterable is fine).
I can do it using mutable Map and/or using mutable ListBuffers.
However, everyone tells me that the "right" scala is without using mutable collections. So, is it possible to do this only with immutable collections? (for example, with using map, foldLeft, etc.)
I have found out a really simple way to do this
pairs.groupBy{_._1}.mapValues{_.map{_._2}}
And that's it.
Anything that you can do with a non-cyclic mutable data structure you can also do with an immutable data structure. The trick is pretty simple:
loop -> recursion or fold
mutating operation -> new-copy-with-change-made operation
So, for example, in your case you're probably looping through the Iterable and adding a value each time. If we apply our handy trick, we get:
def mkMap[K,V](data: Iterable[(K,V)]): Map[K, Iterable[V]] = {
@annotation.tailrec def mkMapInner(
data: Iterator[(K,V)],
map: Map[K,Vector[V]] = Map.empty[K,Vector[V]]
): Map[K,Vector[V]] = {
if (data.hasNext) {
val (k,v) = data.next
mkMapInner(data, map + (k -> map.get(k).map(_ :+ v).getOrElse(Vector(v))))
}
else map
}
mkMapInner(data.iterator)
}
Here I've chosen to implement the loop-replacement by declaring a recursive inner method (with @annotation.tailrec to check that the recursion is optimized to a while loop so it won't break the stack).
Let's test it out:
val pairs = Iterable((1,"flounder"),(2,"salmon"),(1,"halibut"))
scala> mkMap(pairs)
res2: Map[Int,Iterable[java.lang.String]] =
Map(1 -> Vector(flounder, halibut), 2 -> Vector(salmon))
Now, it turns out that Scala's collection libraries also contain something useful for this:
scala> pairs.groupBy(_._1).mapValues{ _.map{_._2 } }
with the groupBy being the key method, and the rest cleaning up what it produces into the form you want.
For the record, you can write this pretty cleanly with a fold. I'm going to assume that your Pair is the one in the standard library (aka Tuple2):
pairs.foldLeft(Map.empty[Key, Seq[Value]]) {
case (m, (k, v)) => m.updated(k, m.getOrElse(k, Seq.empty) :+ v)
}
Although of course in this case the groupBy approach is more convenient.
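To make the fold concrete, here is a runnable sketch with Int keys and String values (the sample data is the fish example used earlier in this thread), checked against a strict groupBy version. On Scala 2.13+ the standard library also offers groupMap for this kind of grouping.

```scala
val pairs = List((1, "flounder"), (2, "salmon"), (1, "halibut"))

// Fold: start from an empty map and append each value under its key.
val viaFold: Map[Int, Vector[String]] =
  pairs.foldLeft(Map.empty[Int, Vector[String]]) { case (m, (k, v)) =>
    m.updated(k, m.getOrElse(k, Vector.empty[String]) :+ v)
  }

// groupBy, followed by a strict map (instead of mapValues) so nothing is recomputed later.
val viaGroupBy: Map[Int, Vector[String]] =
  pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).toVector }
```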
val ps = collection.mutable.ListBuffer(1 -> 2, 3 -> 4, 1 -> 5)
ps.groupBy(_._1).mapValues(_ map (_._2))
// = Map(1 -> ListBuffer(2, 5), 3 -> ListBuffer(4))
This gives a mutable ListBuffer in the output map. If you want your output to be immutable (not sure if this is quite what you're asking), use collection.breakOut:
ps.groupBy(_._1).mapValues(_.map(_._2)(collection.breakOut))
// = Map(1 -> Vector(2, 5), 3 -> Vector(4))
It seems like Vector is the default for breakOut, but to be sure, you can specify the return type on the left hand side: val myMap: Map[Int,Vector[Int]] = ....
More info on breakOut here.
As a method:
def immutableGroup[A,B](xs: Traversable[(A,B)]): Map[A,Vector[B]] =
xs.groupBy(_._1).mapValues(_.map(_._2)(collection.breakOut))
I perform this operation so often that I have written an implicit called groupByKey that does precisely this:
import scala.collection.TraversableLike
import scala.collection.generic.CanBuildFrom
class EnrichedWithGroupByKey[A, Repr <: Traversable[A]](self: TraversableLike[A, Repr]) {
def groupByKey[T, U, That](implicit ev: A <:< (T, U), bf: CanBuildFrom[Repr, U, That]): Map[T, That] =
self.groupBy(_._1).map { case (k, vs) => k -> (bf(self.asInstanceOf[Repr]) ++= vs.map(_._2)).result }
}
implicit def enrichWithGroupByKey[A, Repr <: Traversable[A]](self: TraversableLike[A, Repr]) = new EnrichedWithGroupByKey[A, Repr](self)
And you use it like this:
scala> List(("a", 1), ("b", 2), ("b", 3), ("a", 4)).groupByKey
res0: Map[java.lang.String,List[Int]] = Map(a -> List(1, 4), b -> List(2, 3))
Note that I use .map { case (k, vs) => k -> ... } instead of mapValues because mapValues creates a view, instead of just performing the map immediately. If you plan on accessing those values many times, you'll want to avoid the view approach because it will mean recomputing the .map(_._2) every time.

Scala best way of turning a Collection into a Map-by-key?

If I have a collection c of type T and there is a property p on T (of type P, say), what is the best way to do a map-by-extracting-key?
val c: Collection[T]
val m: Map[P, T]
One way is the following:
m = new HashMap[P, T]
c foreach { t => m add (t.getP, t) }
But now I need a mutable map. Is there a better way of doing this so that it's in 1 line and I end up with an immutable Map? (Obviously I could turn the above into a simple library utility, as I would in Java, but I suspect that in Scala there is no need)
You can use
c map (t => t.getP -> t) toMap
but be aware that this needs 2 traversals.
You can construct a Map with a variable number of tuples. So use the map method on the collection to convert it into a collection of tuples and then use the : _* trick to convert the result into a variable argument.
scala> val list = List("this", "maps", "string", "to", "length") map {s => (s, s.length)}
list: List[(java.lang.String, Int)] = List((this,4), (maps,4), (string,6), (to,2), (length,6))
scala> val list = List("this", "is", "a", "bunch", "of", "strings")
list: List[java.lang.String] = List(this, is, a, bunch, of, strings)
scala> val string2Length = Map(list map {s => (s, s.length)} : _*)
string2Length: scala.collection.immutable.Map[java.lang.String,Int] = Map(strings -> 7, of -> 2, bunch -> 5, a -> 1, is -> 2, this -> 4)
In addition to @James Iry's solution, it is also possible to accomplish this using a fold. I suspect that this solution is slightly faster than the tuple method (fewer garbage objects are created):
val list = List("this", "maps", "string", "to", "length")
val map = list.foldLeft(Map[String, Int]()) { (m, s) => m.updated(s, s.length) }
This can be implemented immutably and with a single traversal by folding through the collection as follows.
val map = c.foldLeft(Map[P, T]()) { (m, t) => m + (t.getP -> t) }
The solution works because adding to an immutable Map returns a new immutable Map with the additional entry and this value serves as the accumulator through the fold operation.
The tradeoff here is the simplicity of the code versus its efficiency. So, for large collections, this approach may be more suitable than using 2 traversal implementations such as applying map and toMap.
Another solution (might not work for all types)
import scala.collection.breakOut
val m:Map[P, T] = c.map(t => (t.getP, t))(breakOut)
this avoids the creation of the intermediary list, more info here:
Scala 2.8 breakOut
What you're trying to achieve is a bit undefined.
What if two or more items in c share the same p? Which item will be mapped to that p in the map?
The more accurate way of looking at this is yielding a map between p and all c items that have it:
val m: Map[P, Collection[T]]
This could be easily achieved with groupBy:
val m: Map[P, Collection[T]] = c.groupBy(t => t.p)
If you still want the original map, you can, for instance, map p to the first t that has it:
val m: Map[P, T] = c.groupBy(t => t.p) map { case (p, ts) => p -> ts.head }
Scala 2.13+
instead of "breakOut" you could use
c.map(t => (t.getP, t)).to(Map)
Scroll to "View": https://www.scala-lang.org/blog/2017/02/28/collections-rework.html
This is probably not the most efficient way to turn a list to map, but it makes the calling code more readable. I used implicit conversions to add a mapBy method to List:
implicit def list2ListWithMapBy[T](list: List[T]): ListWithMapBy[T] = {
new ListWithMapBy(list)
}
class ListWithMapBy[V](list: List[V]){
def mapBy[K](keyFunc: V => K) = {
list.map(a => keyFunc(a) -> a).toMap
}
}
Calling code example:
val list = List("A", "AA", "AAA")
list.mapBy(_.length) //Map(1 -> A, 2 -> AA, 3 -> AAA)
Note that because of the implicit conversion, the defining code should import scala.language.implicitConversions to avoid a feature warning, and the caller needs list2ListWithMapBy in scope.
(c map (_.getP) zip c).toMap
Works well and is very intuitive. (The zip alone yields a collection of pairs; toMap turns it into the Map.)
How about using zip and toMap?
myList.zip(myList.map(_.length)).toMap
For what it's worth, here are two pointless ways of doing it:
scala> case class Foo(bar: Int)
defined class Foo
scala> import scalaz._, Scalaz._
import scalaz._
import Scalaz._
scala> val c = Vector(Foo(9), Foo(11))
c: scala.collection.immutable.Vector[Foo] = Vector(Foo(9), Foo(11))
scala> c.map(((_: Foo).bar) &&& identity).toMap
res30: scala.collection.immutable.Map[Int,Foo] = Map(9 -> Foo(9), 11 -> Foo(11))
scala> c.map(((_: Foo).bar) >>= (Pair.apply[Int, Foo] _).curried).toMap
res31: scala.collection.immutable.Map[Int,Foo] = Map(9 -> Foo(9), 11 -> Foo(11))
This works for me:
val personsMap = persons.foldLeft(scala.collection.mutable.Map[Int, PersonDTO]()) {
(m, p) => m(p.id) = p; m
}
The Map has to be mutable, and it has to be returned explicitly at the end of the function, since the update m(p.id) = p on a mutable Map does not return the map.
Use map() on the collection, followed by toMap:
val map = list.map(e => (e, e.length)).toMap