Create a map from a collection using a function - scala

I want to create a map from a collection by providing it a mapping function. It's basically equivalent to what a normal map method does, only I want it to return a Map, not a flat collection.
I would expect it to have a signature like
def toMap[T, S](f: T => S): Map[T, S]
when invoked like this
val collection = List(1, 2, 3)
val map: Map[Int, String] = collection.toMap(_.toString + " seconds")
the expected value of map would be
Map(1 -> "1 seconds", 2 -> "2 seconds", 3 -> "3 seconds")
The method would be equivalent to
val collection = List(1, 2, 3)
val map: Map[Int, String] = collection.map(x => (x, x.toString + " seconds")).toMap
Is there such a method in Scala?

Scalaz has an fproduct method for Functors, which returns things in the right shape for calling .toMap on the result:
scala> import scalaz._,Scalaz._
import scalaz._
import Scalaz._
scala> val collection = List(1, 2, 3)
collection: List[Int] = List(1, 2, 3)
scala> collection.fproduct(_.toString + " seconds").toMap
res0: scala.collection.immutable.Map[Int,String] = Map(1 -> 1 seconds, 2 -> 2 seconds, 3 -> 3 seconds)

There is no such single method. As you say, you can use map followed by toMap. If you are concerned about the intermediary list you are creating, you might consider using breakOut as the implicit second argument to map:
import scala.collection.breakOut
val map: Map[Int, String] = collection.map(x => (x, x.toString + " seconds"))(breakOut)
You can read more about breakOut and the implicit argument of map here.
This method allows you to construct other types that have a suitable CanBuildFrom implementation as well, without the intermediate step:
val arr: Array[(Int, String)] = collection.map(x => (x, x.toString + " seconds"))(breakOut)
You might also want to consider using views, which avoid creating intermediate collections:
val m = (List("A", "B", "C").view map (x => x -> x)).toMap
The differences between these approaches are described here.
Finally, there is the mapValues method, which might be suitable for your purposes, if you are only mapping the values of each key-value pair. Be careful, though, since this method actually returns a view, and might lead to unexpected performance hits.
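If this pattern comes up often, it can be wrapped in a small helper. The following is only a minimal sketch, not a standard-library method, and the name toMapWith is hypothetical. It goes through an iterator so that no intermediate list is built, and it avoids breakOut, so it should not depend on a particular collections version.

```scala
// Hypothetical helper: builds a Map from each element to f(element).
def toMapWith[T, S](xs: Iterable[T])(f: T => S): Map[T, S] =
  xs.iterator.map(x => x -> f(x)).toMap

val numbers = List(1, 2, 3)
val seconds: Map[Int, String] = toMapWith(numbers)(_.toString + " seconds")
// seconds == Map(1 -> "1 seconds", 2 -> "2 seconds", 3 -> "3 seconds")
```

If two elements are equal, they collapse into a single key, as with any Map.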

Related

Convert List to Map in Akka Stream

I want to convert list items to single map as a stage in my Akka Streams workflow. As an example, say I had the following class.
case class MyClass(myString: String, myInt: Int)
I want to convert a List of MyClass instances to a Map that keys them by myString.
So if I had List(MyClass("hello", 1), MyClass("world", 2), MyClass("hello", 3)), I would want a map of hello mapping to List(1, 3) and world mapping to List(2).
The following is what I have so far.
val flowIWant = {
  Flow[MyClass].map { entry =>
    entry.myString -> entry.myInt
  } ??? // How to combine tuples into a single map?
}
Also, it would be ideal for the flow to end up producing the individual map entries so I can work with each one individually in the next stage (I want to perform an operation on each map entry).
I am not sure if this is a fold-type operation or something else. Thanks for any help.
It is not really clear what you actually want to get. From the way you stated your problem, I see at least the following transformations you could have meant:
Flow[List[MyClass], Map[String, Int], _]
Flow[List[MyClass], Map[String, List[Int]], _]
Flow[MyClass, (String, Int), _]
Flow[MyClass, (String, List[Int]), _]
From your wording I suspect that most likely you want something like the last one, but it doesn't really make sense to have such a transformation, because it won't be able to emit anything - in order to combine all values corresponding to a key you need to read the entire input.
If you have an incoming stream of MyClass and want to get a Map[String, List[Int]] from it, then there is no other choice than to attach it to a folding sink and execute the stream until completion. For example:
val source: Source[MyClass, _] = ??? // your source of MyClass instances
val collect: Sink[MyClass, Future[Map[String, List[Int]]]] =
  Sink.fold[Map[String, List[Int]], MyClass](Map.empty[String, List[Int]].withDefaultValue(List.empty)) {
    (m, v) => m + (v.myString -> (v.myInt :: m(v.myString)))
  }
val result: Future[Map[String, List[Int]]] = source.toMat(collect)(Keep.right).run()
I think you want to scan it:
// assumes source emits (String, Int) tuples, as produced by the map stage in the question
source.scan((Map.empty[String, List[Int]], Option.empty[(String, List[Int])])) { case ((map, _), next) =>
  val values = next._2 :: map.getOrElse(next._1, Nil)
  (map.updated(next._1, values), Some(next._1 -> values))
}.collect { case (_, Some(entry)) => entry }
This way the updated entry is emitted for every incoming element, but the Map keeps growing until memory is exhausted. (The entry for the latest element is the second component of the accumulator tuple, wrapped in an Option.)
This may be what you are looking for:
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}
import scala.util.{Failure, Success}
object Stack {
  def main(args: Array[String]): Unit = {
    case class MyClass(myString: String, myInt: Int)
    implicit val actorSystem = ActorSystem("app")
    implicit val actorMaterializer = ActorMaterializer()
    import scala.concurrent.ExecutionContext.Implicits.global
    val list = List(MyClass("hello", 1), MyClass("world", 2), MyClass("hello", 3))
    val eventualMap = Source(list).fold(Map[String, List[Int]]())((m, e) => {
      val newValue = e.myInt :: m.get(e.myString).getOrElse(Nil)
      m + (e.myString -> newValue)
    }).runWith(Sink.head)
    eventualMap.onComplete{
      case Success(m) => {
        println(m)
        actorSystem.terminate()
      }
      case Failure(e) => {
        e.printStackTrace()
        actorSystem.terminate()
      }
    }
  }
}
With this code, you'll get the following output:
Map(hello -> List(3, 1), world -> List(2))
If you would like to have the following output instead:
Vector(Map(), Map(hello -> List(1)), Map(hello -> List(1), world -> List(2)), Map(hello -> List(3, 1), world -> List(2)))
just use scan instead of fold and run with Sink.seq.
The difference between fold and scan is that fold waits for the upstream to complete before pushing anything downstream, whereas scan pushes every update downstream as it happens.
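The same distinction exists on plain Scala collections as foldLeft and scanLeft, which makes it easy to experiment without an ActorSystem. The sketch below reuses MyClass from the question; the step function name addEntry is made up for illustration, and the collection methods only mirror the stream operators rather than replace them.

```scala
case class MyClass(myString: String, myInt: Int)

// Prepends the new Int to the list already stored under the same key.
def addEntry(m: Map[String, List[Int]], e: MyClass): Map[String, List[Int]] =
  m + (e.myString -> (e.myInt :: m.getOrElse(e.myString, Nil)))

val list = List(MyClass("hello", 1), MyClass("world", 2), MyClass("hello", 3))

// foldLeft yields only the final accumulator (like fold on a stream).
val folded = list.foldLeft(Map.empty[String, List[Int]])(addEntry)

// scanLeft yields the initial value and every intermediate accumulator (like scan).
val scanned = list.scanLeft(Map.empty[String, List[Int]])(addEntry)
```

Here folded is the single final map, while scanned has four elements: the empty map followed by one snapshot per input element.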

Scala - Iterate over an Iterator of type Product[K,V]

I am a newbie to Scala and I am trying to understand collections. I have some sample Scala code in which a method is defined as follows:
override def write(records: Iterator[Product2[K, V]]): Unit = {...}
From what I understand, this function is passed an argument records, which is an Iterator of type Product2[K,V]. What I don't understand is whether Product2 is a user-defined class or a built-in data structure. Moreover, how do I explore the key-value contents of a Product2, and how do I iterate over them?
Chances are Product2 is a built-in class. You can easily check this in a modern IDE (just hover over it with Ctrl pressed), or by inspecting the imports at the top of the file: if there is no related import, like some.custom.package.Product2, it is built in.
What is Product2 and where is it defined? You can easily find out such things by using Scala's ScalaDoc.
If it is the built-in class, you can treat it like a tuple of 2 elements (in fact, Tuple2 extends Product2), which has the ._1 and ._2 accessor methods, as you can see below.
scala> val x: Product2[String, Int] = ("foo", 1)
// x: Product2[String,Int] = (foo,1)
scala> x._1
// res0: String = foo
scala> x._2
// res1: Int = 1
See How should I think about Scala's Product classes? for more.
Iteration is also hassle-free. For example, here is the map operation:
scala> val xs: Iterator[Product2[String, Int]] = List("foo" -> 1, "bar" -> 2, "baz" -> 3).iterator
xs: Iterator[Product2[String,Int]] = non-empty iterator
scala> val keys = xs.map(kv => kv._1)
keys: Iterator[String] = non-empty iterator
scala> val keys = xs.map(kv => kv._1).toList
keys: List[String] = List(foo, bar, baz)
scala> xs
res2: Iterator[Product2[String,Int]] = empty iterator
Keep in mind, though, that once an iterator has been consumed, it is empty and cannot be reused.
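If the records are needed more than once, one option is to consume the iterator a single time into a strict collection and work with that. A minimal sketch, assuming the data fits in memory (the value names are illustrative):

```scala
val records: Iterator[Product2[String, Int]] = Iterator("foo" -> 1, "bar" -> 2, "baz" -> 3)

// Consume the iterator exactly once into a strict collection.
val materialized: List[Product2[String, Int]] = records.toList

// The list can now be traversed as many times as needed.
val keys = materialized.map(_._1)
val total = materialized.map(_._2).sum
```

After the toList call the original iterator is exhausted, so only materialized should be used from then on.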
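Product2 is just a pair of two values, of types K and V.
Use it like this:
write(List((1, "one"), (2, "two")).iterator)
If the method does not have to override the Product2 version, it can also be declared with tuples: def write(records: Iterator[(K, V)]): Unit = {...}
To access the values k of type K and v of type V, pattern match on each pair. Use foreach rather than map here, because map on an Iterator is lazy and would never run w:
def write(records: Iterator[(K, V)]): Unit = {
  records.foreach { case (k, v) => w(k, v) }
}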
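If the signature has to stay as Iterator[Product2[K, V]], the _1 and _2 accessors work without any pattern matching. A small sketch, where the function name describe is hypothetical and simply renders each record as text:

```scala
// Hypothetical function that takes the same parameter type as write.
def describe[K, V](records: Iterator[Product2[K, V]]): List[String] =
  records.map(r => s"${r._1} -> ${r._2}").toList

// Tuples can be passed directly because Tuple2 extends Product2.
val lines = describe[Int, String](Iterator(1 -> "one", 2 -> "two"))
// lines == List("1 -> one", "2 -> two")
```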

Filtering a Scala Multimap and outputting as a list of Tuples

I have a map using the multimap trait, like so
val multiMap = new HashMap[Foo, Set[Bar]] with MultiMap[Foo, Bar]
I would like to combine filtering this map on specific values
multiMap.values.filter(bar => barCondition)
with flattening the matching results into a list of tuples of the form
val fooBarPairs: List[(Foo, Bar)]
What would be the idiomatic way of doing this? I was hoping that Scala might provide something like an anamorphism to do this without looping, but as a complete newbie I am not sure what my options are.
Here's an example:
import collection.mutable.{HashMap, MultiMap, Set}
val mm = new HashMap[String, Set[Int]] with MultiMap[String, Int]
mm.addBinding("One", 1).addBinding("One",11).addBinding("Two",22).
addBinding("Two",222)
// mm.type = Map(Two -> Set(22, 222), One -> Set(1, 11))
I think the easiest way to get what you want is to use a for-expression:
for {
  (str, xs) <- mm.toSeq
  x <- xs
  if x > 10
} yield (str, x) // = ArrayBuffer((Two,222), (Two,22), (One,11))
You need the .toSeq or the output type will be a Map, which would mean each mapping is overridden by subsequent elements with the same key. Use toList on this output if you need a List specifically.
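The same for-expression also works on an ordinary immutable Map[String, Set[Int]], which is used below as a stand-in for the MultiMap (an assumption made to keep the sketch self-contained). Converting to List up front gives the List[(String, Int)] the question asks for.

```scala
// Stand-in for the MultiMap: each key maps to a set of values.
val bindings: Map[String, Set[Int]] =
  Map("One" -> Set(1, 11), "Two" -> Set(22, 222))

// Flatten to (key, value) pairs, keeping only values that pass the condition.
val pairs: List[(String, Int)] =
  for {
    (key, values) <- bindings.toList
    value <- values.toList
    if value > 10
  } yield (key, value)
```

The order of the resulting pairs follows the iteration order of the map and sets, so it should not be relied on.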
Here is an example of what I think you want to do:
scala> mm
res21: scala.collection.mutable.HashMap[String,scala.collection.mutable.Set[Int]] with scala.collection.mutable.MultiMap[String,Int]
= Map(two -> Set(6, 4, 5), one -> Set(2, 1, 3))
scala> mm.toList.flatMap(pair =>
         pair._2.toList.flatMap(bar =>
           if (bar%2==0)
             Some((pair._1, bar))
           else
             None))
res22: List[(String, Int)] = List((two,6), (two,4), (one,2))
Here is another, slightly more concise solution:
import collection.mutable.{HashMap, MultiMap, Set}
val mm = new HashMap[String, Set[Int]] with MultiMap[String, Int]
val f = (i: Int) => i > 10
mm.addBinding("One", 1)
.addBinding("One",11)
.addBinding("Two",22)
.addBinding("Two",222)
/* Map(Two -> Set(22, 222), One -> Set(1, 11)) */
mm.map{case (k, vs) => vs.filter(f).map((k, _))}.flatten
/* ArrayBuffer((Two,222), (Two,22), (One,11)) */

Create a Map of Iterables only using immutable collections

I have an iterable val pairs: Iterable[Pair[Key, Value]], that has some key=>value pairs.
Now, I want to create a Map[Key, Iterable[Value]], that has for each key an Iterable of all values of given key in pairs. (I don't actually need a Seq, any Iterable is fine).
I can do it using mutable Map and/or using mutable ListBuffers.
However, everyone tells me that the "right" Scala is written without mutable collections. So, is it possible to do this only with immutable collections (for example, using map, foldLeft, etc.)?
I have found out a really simple way to do this
pairs.groupBy{_._1}.mapValues{_.map{_._2}}
And that's it.
Anything that you can do with a non-cyclic mutable data structure you can also do with an immutable data structure. The trick is pretty simple:
loop -> recursion or fold
mutating operation -> new-copy-with-change-made operation
So, for example, in your case you're probably looping through the Iterable and adding a value each time. If we apply our handy trick, we get:
def mkMap[K,V](data: Iterable[(K,V)]): Map[K, Iterable[V]] = {
  @annotation.tailrec def mkMapInner(
    data: Iterator[(K,V)],
    map: Map[K,Vector[V]] = Map.empty[K,Vector[V]]
  ): Map[K,Vector[V]] = {
    if (data.hasNext) {
      val (k,v) = data.next()
      mkMapInner(data, map + (k -> map.get(k).map(_ :+ v).getOrElse(Vector(v))))
    }
    else map
  }
  mkMapInner(data.iterator)
}
Here I've chosen to implement the loop replacement by declaring a recursive inner method (with @annotation.tailrec to check that the recursion is optimized to a while loop, so it won't overflow the stack).
Let's test it out:
val pairs = Iterable((1,"flounder"),(2,"salmon"),(1,"halibut"))
scala> mkMap(pairs)
res2: Map[Int,Iterable[java.lang.String]] =
Map(1 -> Vector(flounder, halibut), 2 -> Vector(salmon))
Now, it turns out that Scala's collection libraries also contain something useful for this:
scala> pairs.groupBy(_._1).mapValues{ _.map{_._2 } }
with the groupBy being the key method, and the rest cleaning up what it produces into the form you want.
For the record, you can write this pretty cleanly with a fold. I'm going to assume that your Pair is the one in the standard library (aka Tuple2):
pairs.foldLeft(Map.empty[Key, Seq[Value]]) {
case (m, (k, v)) => m.updated(k, m.getOrElse(k, Seq.empty) :+ v)
}
Although of course in this case the groupBy approach is more convenient.
val ps = collection.mutable.ListBuffer(1 -> 2, 3 -> 4, 1 -> 5)
ps.groupBy(_._1).mapValues(_ map (_._2))
// = Map(1 -> ListBuffer(2, 5), 3 -> ListBuffer(4))
This gives a mutable ListBuffer in the output map. If you want your output to be immutable (not sure if this is quite what you're asking), use collection.breakOut:
ps.groupBy(_._1).mapValues(_.map(_._2)(collection.breakOut))
// = Map(1 -> Vector(2, 5), 3 -> Vector(4))
It seems like Vector is the default for breakOut, but to be sure, you can specify the return type on the left hand side: val myMap: Map[Int,Vector[Int]] = ....
More info on breakOut here.
As a method:
def immutableGroup[A,B](xs: Traversable[(A,B)]): Map[A,Vector[B]] =
xs.groupBy(_._1).mapValues(_.map(_._2)(collection.breakOut))
I perform this function so often that I have an implicit written called groupByKey that does precisely this:
// Scala 2.12 and earlier: TraversableLike and CanBuildFrom were removed in 2.13
import scala.collection.TraversableLike
import scala.collection.generic.CanBuildFrom
class EnrichedWithGroupByKey[A, Repr <: Traversable[A]](self: TraversableLike[A, Repr]) {
  def groupByKey[T, U, That](implicit ev: A <:< (T, U), bf: CanBuildFrom[Repr, U, That]): Map[T, That] =
    self.groupBy(_._1).map { case (k, vs) => k -> (bf(self.asInstanceOf[Repr]) ++= vs.map(_._2)).result }
}
implicit def enrichWithGroupByKey[A, Repr <: Traversable[A]](self: TraversableLike[A, Repr]) = new EnrichedWithGroupByKey[A, Repr](self)
And you use it like this:
scala> List(("a", 1), ("b", 2), ("b", 3), ("a", 4)).groupByKey
res0: Map[java.lang.String,List[Int]] = Map(a -> List(1, 4), b -> List(2, 3))
Note that I use .map { case (k, vs) => k -> ... } instead of mapValues because mapValues creates a view, instead of just performing the map immediately. If you plan on accessing those values many times, you'll want to avoid the view approach because it will mean recomputing the .map(_._2) every time.
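For a one-off use, the strict variant can also be written inline without any enrichment. A minimal sketch using only groupBy and map, reusing the fish example from the earlier answer:

```scala
val pairs = List(1 -> "flounder", 2 -> "salmon", 1 -> "halibut")

// groupBy collects the pairs per key; the strict map then strips the key
// from each group once, at construction time, rather than on every access.
val grouped: Map[Int, List[String]] =
  pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
// grouped == Map(1 -> List("flounder", "halibut"), 2 -> List("salmon"))
```

Values within each group keep the order in which they appeared in the input.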

Scala best way of turning a Collection into a Map-by-key?

If I have a collection c of type T and there is a property p on T (of type P, say), what is the best way to do a map-by-extracting-key?
val c: Collection[T]
val m: Map[P, T]
One way is the following:
m = new HashMap[P, T]
c foreach { t => m add (t.getP, t) }
But now I need a mutable map. Is there a better way of doing this so that it's in 1 line and I end up with an immutable Map? (Obviously I could turn the above into a simple library utility, as I would in Java, but I suspect that in Scala there is no need)
You can use
c map (t => t.getP -> t) toMap
but be aware that this needs 2 traversals.
You can construct a Map with a variable number of tuples. So use the map method on the collection to convert it into a collection of tuples and then use the : _* trick to convert the result into a variable argument.
scala> val list = List("this", "maps", "string", "to", "length") map {s => (s, s.length)}
list: List[(java.lang.String, Int)] = List((this,4), (maps,4), (string,6), (to,2), (length,6))
scala> val list = List("this", "is", "a", "bunch", "of", "strings")
list: List[java.lang.String] = List(this, is, a, bunch, of, strings)
scala> val string2Length = Map(list map {s => (s, s.length)} : _*)
string2Length: scala.collection.immutable.Map[java.lang.String,Int] = Map(strings -> 7, of -> 2, bunch -> 5, a -> 1, is -> 2, this -> 4)
In addition to @James Iry's solution, it is also possible to accomplish this using a fold. I suspect that this solution is slightly faster than the tuple method (fewer garbage objects are created):
val list = List("this", "maps", "string", "to", "length")
val map = list.foldLeft(Map[String, Int]()) { (m, s) => m.updated(s, s.length) }
This can be implemented immutably and with a single traversal by folding through the collection as follows.
val map = c.foldLeft(Map[P, T]()) { (m, t) => m + (t.getP -> t) }
The solution works because adding to an immutable Map returns a new immutable Map with the additional entry and this value serves as the accumulator through the fold operation.
The tradeoff here is the simplicity of the code versus its efficiency. So, for large collections, this approach may be more suitable than using 2 traversal implementations such as applying map and toMap.
Another solution (might not work for all types)
import scala.collection.breakOut
val m:Map[P, T] = c.map(t => (t.getP, t))(breakOut)
this avoids the creation of the intermediary list, more info here:
Scala 2.8 breakOut
What you're trying to achieve is a bit undefined.
What if two or more items in c share the same p? Which item will be mapped to that p in the map?
The more accurate way of looking at this is yielding a map between p and all c items that have it:
val m: Map[P, Collection[T]]
This could be easily achieved with groupBy:
val m: Map[P, Collection[T]] = c.groupBy(t => t.p)
If you still want the original map, you can, for instance, map p to the first t that has it:
val m: Map[P, T] = c.groupBy(t => t.p) map { case (p, ts) => p -> ts.head }
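To make the difference concrete, here is a small sketch with a made-up Item class in which two items share the same key. With map followed by toMap, the later item wins; with groupBy and head, the first one does.

```scala
// Hypothetical element type: p is the key property.
case class Item(p: Int, name: String)

val c = List(Item(1, "a"), Item(2, "b"), Item(1, "c"))

// toMap keeps the last pair seen for a duplicate key.
val lastWins: Map[Int, Item] = c.map(t => t.p -> t).toMap

// groupBy preserves input order inside each group, so head is the first item.
val firstWins: Map[Int, Item] = c.groupBy(_.p).map { case (p, ts) => p -> ts.head }
```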
In Scala 2.13+, instead of breakOut you can use:
c.map(t => (t.getP, t)).to(Map)
Scroll to "View": https://www.scala-lang.org/blog/2017/02/28/collections-rework.html
This is probably not the most efficient way to turn a list to map, but it makes the calling code more readable. I used implicit conversions to add a mapBy method to List:
implicit def list2ListWithMapBy[T](list: List[T]): ListWithMapBy[T] = {
  new ListWithMapBy(list)
}
class ListWithMapBy[V](list: List[V]){
  def mapBy[K](keyFunc: V => K) = {
    list.map(a => keyFunc(a) -> a).toMap
  }
}
Calling code example:
val list = List("A", "AA", "AAA")
list.mapBy(_.length) //Map(1 -> A, 2 -> AA, 3 -> AAA)
Note that, because of the implicit conversion, the caller needs the conversion in scope, and the code that defines it should import scala.language.implicitConversions to avoid a feature warning.
(c map (_.getP) zip c).toMap
Works well and is very intuitive.
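On Scala 2.10 and later, the same enrichment can also be written as an implicit class, which removes the need for a separate conversion method. The following is a sketch under that assumption; the names MapBySyntax and MapByOps are made up here, and it accepts any Iterable rather than only List.

```scala
object MapBySyntax {
  // Adds mapBy to any Iterable; keyFunc extracts the key for each element.
  implicit class MapByOps[V](xs: Iterable[V]) {
    def mapBy[K](keyFunc: V => K): Map[K, V] =
      xs.iterator.map(v => keyFunc(v) -> v).toMap
  }
}

import MapBySyntax._

val byLength = List("A", "AA", "AAA").mapBy(_.length)
// byLength == Map(1 -> "A", 2 -> "AA", 3 -> "AAA")
```

As with the original version, elements that produce the same key overwrite each other, and the last one wins.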
How about using zip and toMap?
myList.zip(myList.map(_.length)).toMap
For what it's worth, here are two pointless ways of doing it:
scala> case class Foo(bar: Int)
defined class Foo
scala> import scalaz._, Scalaz._
import scalaz._
import Scalaz._
scala> val c = Vector(Foo(9), Foo(11))
c: scala.collection.immutable.Vector[Foo] = Vector(Foo(9), Foo(11))
scala> c.map(((_: Foo).bar) &&& identity).toMap
res30: scala.collection.immutable.Map[Int,Foo] = Map(9 -> Foo(9), 11 -> Foo(11))
scala> c.map(((_: Foo).bar) >>= (Pair.apply[Int, Foo] _).curried).toMap
res31: scala.collection.immutable.Map[Int,Foo] = Map(9 -> Foo(9), 11 -> Foo(11))
This works for me:
val personsMap = persons.foldLeft(scala.collection.mutable.Map[Int, PersonDTO]()) {
(m, p) => m(p.id) = p; m
}
The Map has to be mutable, and it has to be returned explicitly, since updating a mutable Map does not return the map.
Use map() on the collection, followed by toMap:
val map = list.map(e => (e, e.length)).toMap