Convert List to Map in Akka Stream - scala

I want to convert list items into a single map as a stage in my Akka Streams workflow. As an example, say I had the following class.
case class MyClass(myString: String, myInt: Int)
I want to convert a List of MyClass instances to a Map that keys them by myString.
So if I had List(MyClass("hello", 1), MyClass("world", 2), MyClass("hello", 3)), I would want a map of hello mapping to List(1, 3) and world mapping to List(2).
The following is what I have so far.
val flowIWant = {
Flow[MyClass].map { entry =>
entry.myString -> entry.myInt
} ??? // How to combine tuples into a single map?
}
Also, it would be ideal for the flow to end up producing the individual map entries so I can work with each one in the next stage (I want to perform an operation on each map entry individually).
I am not sure whether this is a fold-type operation or something else. Thanks for any help.

It is not really clear what you actually want to get. From the way you stated your problem, I see at least the following transformations you could have meant:
Flow[List[MyClass], Map[String, Int], _]
Flow[List[MyClass], Map[String, List[Int]], _]
Flow[MyClass, (String, Int), _]
Flow[MyClass, (String, List[Int]), _]
From your wording I suspect that most likely you want something like the last one, but it doesn't really make sense to have such a transformation, because it won't be able to emit anything - in order to combine all values corresponding to a key you need to read the entire input.
If you have an incoming stream of MyClass and want to get a Map[String, List[Int]] from it, then there is no other choice than to attach it to a folding sink and execute the stream until completion. For example:
val source: Source[MyClass, _] = ??? // your source of MyClass instances
val collect: Sink[MyClass, Future[Map[String, List[Int]]]] =
Sink.fold[Map[String, List[Int]], MyClass](Map.empty[String, List[Int]].withDefaultValue(Nil)) {
(m, v) => m + (v.myString -> (v.myInt :: m(v.myString)))
}
val result: Future[Map[String, List[Int]]] = source.toMat(collect)(Keep.right).run()
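Outside a stream, the accumulator function passed to Sink.fold can be tried on a plain List. The following is a minimal sketch that uses only the standard library; FoldStep, step, and group are illustrative names, not part of Akka. Because the step prepends, values accumulate in reverse arrival order, so the sketch reverses each list at the end.

```scala
object FoldStep {
  final case class MyClass(myString: String, myInt: Int)

  // Same shape as the function given to Sink.fold: prepend the new value
  // to whatever has been collected for its key so far.
  def step(acc: Map[String, List[Int]], v: MyClass): Map[String, List[Int]] =
    acc.updated(v.myString, v.myInt :: acc.getOrElse(v.myString, Nil))

  // Fold the whole input, then reverse each list to restore arrival order.
  def group(items: List[MyClass]): Map[String, List[Int]] =
    items.foldLeft(Map.empty[String, List[Int]])(step).map { case (k, vs) => k -> vs.reverse }

  def main(args: Array[String]): Unit =
    println(group(List(MyClass("hello", 1), MyClass("world", 2), MyClass("hello", 3))))
}
```

Keeping the step as a named function makes it possible to test the grouping logic without materializing a stream.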

I think you want to scan it:
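// assumes `source` emits (String, Int) pairs, as produced by the map stage in the question
source.scan((Map.empty[String, List[Int]], Option.empty[(String, List[Int])])) { case ((map, _), (key, value)) =>
val values = value :: map.getOrElse(key, Nil) // prepend to the values already seen for this key
(map.updated(key, values), Some(key -> values)) // carry the full map plus the entry that just changed
}.collect { case (_, Some(entry)) => entry } // skip the initial element, whose Option is None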
This way you can inspect the contents of the Map as the stream runs, although the Map keeps growing until memory is exhausted. (The entry for the most recent element is the second component of the accumulator tuple, wrapped in an Option.)

This may be what you are looking for:
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}
import scala.util.{Failure, Success}
object Stack {
def main(args: Array[String]): Unit = {
case class MyClass(myString: String, myInt: Int)
implicit val actorSystem = ActorSystem("app")
implicit val actorMaterializer = ActorMaterializer()
import scala.concurrent.ExecutionContext.Implicits.global
val list = List(MyClass("hello", 1), MyClass("world", 2), MyClass("hello", 3))
val eventualMap = Source(list).fold(Map[String, List[Int]]())((m, e) => {
val newValue = e.myInt :: m.get(e.myString).getOrElse(Nil)
m + (e.myString -> newValue)
}).runWith(Sink.head)
eventualMap.onComplete{
case Success(m) => {
println(m)
actorSystem.terminate()
}
case Failure(e) => {
e.printStackTrace()
actorSystem.terminate()
}
}
}
}
With this code, you'll get the following output :
Map(hello -> List(3, 1), world -> List(2))
If you would like to have the following output :
Vector(Map(), Map(hello -> List(1)), Map(hello -> List(1), world -> List(2)), Map(hello -> List(3, 1), world -> List(2)))
Just use scan instead of fold and run with Sink.seq.
The difference between fold and scan is that fold waits for the upstream to complete before pushing its single result downstream, whereas scan pushes every intermediate update downstream.
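The same distinction exists on ordinary collections, which makes it easy to check without running a stream. Here is a minimal sketch that assumes only the standard library: foldLeft and scanLeft stand in for the stream operators, and FoldVsScan and its members are illustrative names.

```scala
object FoldVsScan {
  final case class MyClass(myString: String, myInt: Int)
  type Grouped = Map[String, List[Int]]

  // Prepend the element's value to the list stored under its key.
  private def step(m: Grouped, e: MyClass): Grouped =
    m + (e.myString -> (e.myInt :: m.getOrElse(e.myString, Nil)))

  val input = List(MyClass("hello", 1), MyClass("world", 2), MyClass("hello", 3))

  // One result, available only after the whole input has been consumed.
  val folded: Grouped = input.foldLeft(Map.empty[String, List[Int]])(step)

  // The initial value plus one intermediate result per input element.
  val scanned: List[Grouped] = input.scanLeft(Map.empty[String, List[Int]])(step)

  def main(args: Array[String]): Unit = {
    println(folded)
    scanned.foreach(println)
  }
}
```

The last element produced by the scan equals the fold result, and the scan yields one more element than the input because it starts with the initial value.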

Related

cats-effect: How to transform Map[x,IO[y]] to IO[Map[x,y]]

I have a map of string to IO like this Map[String, IO[String]], I want to transform it into IO[Map[String, String]]. How to do it?
It would be nice to use unorderedTraverse here, but as codenoodle pointed out, it doesn't work because IO is not a commutative applicative. However there is a type that is, and it's called IO.Par. Like the name suggests, its ap combinator won't execute things sequentially but in parallel, so it's commutative – doing a and then b is not the same as doing b and then a, but doing a and b concurrently is the same as doing b and a concurrently.
So you can use unorderedTraverse using a function that doesn't return IO but IO.Par. However the downside to that is that now you need to convert from IO to IO.Par and then back – hardly an improvement.
To solve this problem, I have added the parUnorderedTraverse method in cats 2.0 that will take care of these conversions for you. And because it all happens in parallel it will also be more efficient! There are also parUnorderedSequence, parUnorderedFlatTraverse and parUnorderedFlatSequence.
I should also point out that this works not only for IO but also for everything else with a Parallel instance, such as Either[A, ?] (where A is a CommutativeSemigroup). It should also be possible for List/ZipList, but nobody appears to have bothered to do it yet.
You'll have to be a little careful with this one. Maps in Scala are unordered, so if you try to use cats's sequence like this…
import cats.instances.map._
import cats.effect.IO
import cats.UnorderedTraverse
object Example1 {
type StringMap[V] = Map[String, V]
val m: StringMap[IO[String]] = Map("1" -> IO{println("1"); "1"})
val n: IO[StringMap[String]] = UnorderedTraverse[StringMap].unorderedSequence[IO, String](m)
}
you'll get the following error:
Error: could not find implicit value for evidence parameter of type cats.CommutativeApplicative[cats.effect.IO]
The issue here is that the IO monad is not actually commutative. Here is the definition of commutativity:
map2(u, v)(f) = map2(v, u)(flip(f)) // Commutativity (Scala)
This definition shows that the result is the same even when the effects happen in a different order.
You can make the above code compile by providing an instance of CommutativeApplicative[IO] but that still doesn't make the IO monad commutative. If you run the following code you can see the side effects are not processed in the same order:
import cats.effect.IO
import cats.{Applicative, CommutativeApplicative}
object Example2 {
implicit object FakeEvidence extends CommutativeApplicative[IO] {
override def pure[A](x: A): IO[A] = IO(x)
override def ap[A, B](ff: IO[A => B])(fa: IO[A]): IO[B] =
implicitly[Applicative[IO]].ap(ff)(fa)
}
def main(args: Array[String]): Unit = {
def flip[A, B, C](f: (A, B) => C) = (b: B, a: A) => f(a, b)
val fa = IO{println(1); 1}
val fb = IO{println(true); true}
val f = (a: Int, b: Boolean) => s"$a$b"
println(s"IO is not commutative: ${FakeEvidence.map2(fa, fb)(f).unsafeRunSync()} == ${FakeEvidence.map2(fb, fa)(flip(f)).unsafeRunSync()} (look at the side effects above^^)")
}
}
Which outputs the following:
1
true
true
1
IO is not commutative: 1true == 1true (look at the side effects above^^)
In order to get around this I would suggest making your map something with an order, like a List, where sequence will not require commutativity. The following example is just one way to do this:
import cats.effect.IO
import cats.implicits._
object Example3 {
val m: Map[String, IO[String]] = Map("1" -> IO {println("1"); "1"})
val l: IO[List[(String, String)]] = m.toList.traverse[IO, (String, String)] { case (s, io) => io.map(s2 => (s, s2))}
val n: IO[Map[String, String]] = l.map { _.toMap }
}

Create a map from a collection using a function

I want to create a map from a collection by providing it a mapping function. It's basically equivalent to what a normal map method does, only I want it to return a Map, not a flat collection.
I would expect it to have a signature like
def toMap[T, S](T => S): Map[T, S]
when invoked like this
val collection = List(1, 2, 3)
val map: Map[Int, String] = collection.toMap(_.toString + " seconds")
the expected value of map would be
Map(1 -> "1 seconds", 2 -> "2 seconds", 3 -> "3 seconds")
The method would be equivalent to
val collection = List(1, 2, 3)
val map: Map[Int, String] = collection.map(x => (x, x.toString + " seconds")).toMap
Is there such a method in Scala?
scalaz has a fproduct method for Functors which returns things in the right shape for calling .toMap on the result:
scala> import scalaz._,Scalaz._
import scalaz._
import Scalaz._
scala> val collection = List(1, 2, 3)
collection: List[Int] = List(1, 2, 3)
scala> collection.fproduct(_.toString + " seconds").toMap
res0: scala.collection.immutable.Map[Int,String] = Map(1 -> 1 seconds, 2 -> 2 seconds, 3 -> 3 seconds)
There is no such single method. As you say, you can use map followed by toMap. If you are concerned about the intermediary list you are creating, you might consider using breakOut as the implicit second argument to map:
import scala.collection.breakOut
val map: Map[Int, String] = collection.map(x => (x, x.toString + " seconds"))(breakOut)
You can read more about breakOut and the implicit argument of map here.
This method allows you to construct other types that have a suitable CanBuildFrom implementation as well, without the intermediate step:
val arr: Array[(Int, String)] = collection.map(x => (x, x.toString + " seconds"))(breakOut)
You might also want to consider using views, which avoid creating intermediary collections:
val m = (List("A", "B", "C").view map (x => x -> x)).toMap
The differences between these approaches are described here.
Finally, there is the mapValues method, which might be suitable for your purposes, if you are only mapping the values of each key-value pair. Be careful, though, since this method actually returns a view, and might lead to unexpected performance hits.
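The performance caveat about mapValues can be made visible by counting how often the mapping function runs. The following is a minimal sketch that assumes Scala 2.13 or later, where the lazy form is spelled .view.mapValues (on 2.12 the plain mapValues call is the lazy one). MapValuesViewDemo and countCalls are illustrative names.

```scala
object MapValuesViewDemo {
  // Returns the number of times the mapping function had run
  // (1) after two lookups on the view, and
  // (2) after additionally materialising the view and looking up twice more.
  def countCalls(): (Int, Int) = {
    var calls = 0
    val base = Map("a" -> 1, "b" -> 2)
    val lazyValues = base.view.mapValues { v => calls += 1; v * 10 }

    lazyValues("a")
    lazyValues("a")
    val afterViewLookups = calls // the function ran again on every lookup

    val strict = lazyValues.toMap // runs the function once per entry
    strict("a")
    strict("a")                   // no further calls: values are stored now
    (afterViewLookups, calls)
  }

  def main(args: Array[String]): Unit =
    println(countCalls())
}
```

If the mapping function is expensive and the result is read more than once, materializing with toMap pays the cost once per entry instead of once per access.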

How to convert Map[A,Future[B]] to Future[Map[A,B]]?

I've been working with the Scala Akka library and have come across a bit of a problem. As the title says, I need to convert Map[A, Future[B]] to Future[Map[A,B]]. I know that one can use Future.sequence for Iterables like Lists, but that doesn't work in this case.
I was wondering: is there a clean way in Scala to make this conversion?
See if this works for you:
val map = Map("a" -> future{1}, "b" -> future{2}, "c" -> future{3})
val fut = Future.sequence(map.map(entry => entry._2.map(i => (entry._1, i)))).map(_.toMap)
The idea is to map the map to an Iterable for a Tuple of the key of the map and the result of the future tied to that key. From there you can sequence that Iterable and then once you have the aggregate Future, map it and convert that Iterable of Tuples to a map via toMap.
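For reference, the same idea can be written as a complete program. This is a minimal sketch that assumes a recent Scala version, so it uses Future.apply rather than the older future { ... } helper; SequenceMapValues and sequenceValues are illustrative names.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object SequenceMapValues {
  // Pair each key with the eventual value, sequence the resulting list of
  // futures, then rebuild the map once all of them have completed.
  def sequenceValues[A, B](m: Map[A, Future[B]])(implicit ec: ExecutionContext): Future[Map[A, B]] =
    Future.sequence(m.toList.map { case (k, fv) => fv.map(k -> _) }).map(_.toMap)

  def main(args: Array[String]): Unit = {
    import ExecutionContext.Implicits.global
    val m = Map("a" -> Future(1), "b" -> Future(2), "c" -> Future(3))
    // Blocking here is only for the demo; real code would keep composing the Future.
    println(Await.result(sequenceValues(m), 5.seconds))
  }
}
```

If any of the input futures fails, the combined future fails as well, which is the usual behaviour of Future.sequence.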
Now, an alternative to this approach is to try and do something similar to what the sequence function is doing, with a couple of tweaks. You could write a sequenceMap function like so:
def sequenceMap[A, B](in: Map[B, Future[A]])(implicit executor: ExecutionContext): Future[Map[B, A]] = {
val mb = new MapBuilder[B,A, Map[B,A]](Map())
in.foldLeft(Promise.successful(mb).future) {
(fr, fa) => for (r <- fr; a <- fa._2.asInstanceOf[Future[A]]) yield (r += ((fa._1, a)))
} map (_.result)
}
And then use it in an example like this:
val map = Map("a" -> future{1}, "b" -> future{2}, "c" -> future{3})
val fut = sequenceMap(map)
fut onComplete{
case Success(m) => println(m)
case Failure(ex) => ex.printStackTrace()
}
This might be slightly more efficient than the first example, as it creates fewer intermediate collections and makes fewer calls to the ExecutionContext.
I think the most succinct we can be with core Scala 2.12.x is
val futureMap = Map("a" -> Future(1), "b" -> Future(2), "c" -> Future(3))
Future.traverse(futureMap.toList) { case (k, fv) => fv.map(k -> _) } map(_.toMap)
Update: You can actually get the nice .sequence syntax in Scalaz 7 without too much fuss:
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.{ Future, future }
import scalaz._, Scalaz.{ ToTraverseOps => _, _ }
import scalaz.contrib.std._
val m = Map("a" -> future(1), "b" -> future(2), "c" -> future(3))
And then:
scala> m.sequence.onSuccess { case result => println(result) }
Map(a -> 1, b -> 2, c -> 3)
In principle it shouldn't be necessary to hide ToTraverseOps like this, but for now it does the trick. See the rest of my answer below for more details about the Traverse type class, dependencies, etc.
As copumpkin notes in a comment above, Scalaz contains a Traverse type class with an instance for Map[A, _] that is one of the puzzle pieces here. The other piece is the Applicative instance for Future, which isn't in Scalaz 7 (which is still cross-built against pre-Future 2.9), but is in scalaz-contrib.
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future
import scalaz._, Scalaz._
import scalaz.contrib.std._
def sequence[A, B](m: Map[A, Future[B]]): Future[Map[A, B]] = {
type M[X] = Map[A, X]
(m: M[Future[B]]).sequence
}
Or:
def sequence[A, B](m: Map[A, Future[B]]): Future[Map[A, B]] =
Traverse[({ type L[X] = Map[A, X] })#L] sequence m
Or:
def sequence[A, B](m: Map[A, Future[B]]): Future[Map[A, B]] =
TraverseOpsUnapply(m).sequence
In a perfect world you'd be able to write m.sequence, but the TraverseOps machinery that should make this syntax possible isn't currently able to tell how to go from a particular Map instance to the appropriate Traverse instance.
This also works, where the idea is to use the sequence result (of the map's values) to fire a promise that says you can start retrieving values from your map. mapValues gives you a non-strict view of your map, so the value.get.get is only applied when you retrieve the value. That's right, you get to keep your map! Free ad for the puzzlers in that link.
import concurrent._
import concurrent.duration._
import scala.util._
import ExecutionContext.Implicits.global
object Test extends App {
def calc(i: Int) = { Thread sleep i * 1000L ; i }
val m = Map("a" -> future{calc(1)}, "b" -> future{calc(2)}, "c" -> future{calc(3)})
val m2 = m mapValues (_.value.get.get)
val k = Future sequence m.values
val p = Promise[Map[String,Int]]
k onFailure { case t: Throwable => p failure t }
k onSuccess { case _ => p success m2 }
val res = Await.result(p.future, Duration.Inf)
Console println res
}
Here's the REPL where you see it force the m2 map by printing all its values:
scala> val m2 = m mapValues (_.value.get.get)
m2: scala.collection.immutable.Map[String,Int] = Map(a -> 1, b -> 2, c -> 3)
This shows the same thing with futures that are still in the future:
scala> val m2 = m mapValues (_.value.get.get)
java.util.NoSuchElementException: None.get
Just make a new future which waits for all futures in the map values, then builds a map to return.
I would try to avoid using overengineered Scalaz based super-functional solutions (unless your project is already heavily Scalaz based and has tons of "computationally sophisticated" code; no offense on the "overengineered" remark):
// the map you have
val foo: Map[A, Future[B]] = ???
// get a Seq[Future[...]] so that we can run Future.sequence on it
val bar: Seq[Future[(A, B)]] = foo.toSeq.map { case (k, v) => v.map(k -> _) }
// here you go; convert back `toMap` once it completes
Future.sequence(bar).onComplete { data =>
// data is a Try[Seq[(A, B)]]; do something with data.map(_.toMap)
}
However, it should be safe to assume that your map values are somehow generated from the map keys, which initially reside in a Seq such as List, and that the part of code that builds the initial Map is under your control as opposed to being sent from elsewhere. So I would personally take an even simpler/cleaner approach instead by not starting out with Map[A, Future[B]] in the first place.
def fetchAgeFromDb(name: String): Future[Int] = ???
// no foo needed anymore
// no Map at all before the future completes
val bar = personNames.map { name => fetchAgeFromDb(name).map(name -> _) }
// just as above
Future.sequence(bar).onComplete { data =>
// data is a Try of the key/value pairs; do something with data.map(_.toMap)
}
Would this solution be acceptable?
It assumes an implicit ExecutionContext is in scope, since flatMap, map and Future.sequence all require one:
def removeMapFuture[A, B](in: Future[Map[A, Future[B]]]) = {
in.flatMap { k =>
Future.sequence(k.map(l =>
l._2.map(l._1 -> _)
)).map {
p => p.toMap
}
}
}

Is the ordering of members of a map, seeming to be by addition, reliable?

When I create an immutable map with a standard call to Map() or by concatenating the existing maps created that way, in all my tests I get that traversing its members provides them in the order of addition. That's exactly the way I need them to be sorted, but there's not a word in the documentation about the reliability of the ordering of the members of the map.
So I was wondering whether it is safe to expect the standard Map to return its items in the order of addition or I should look for some other implementations and which ones in that case.
I don't think it's safe, the order is not preserved starting from 5 elements (Scala 2.9.1):
scala> Map(1 -> 1, 2 -> 2, 3 -> 3, 4 -> 4, 5 -> 5)
res9: scala.collection.immutable.Map[Int,Int] =
Map(5 -> 5, 1 -> 1, 2 -> 2, 3 -> 3, 4 -> 4)
With bigger maps the order is completely "random", try Map((1 to 100) zip (1 to 100): _*).
Try LinkedHashMap for ordered entries and TreeMap to achieve sorted entries.
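As a quick illustration of the two suggestions, here is a minimal sketch using only the standard library. Note that LinkedHashMap is the mutable implementation; MapOrderDemo and its members are illustrative names.

```scala
import scala.collection.immutable.TreeMap
import scala.collection.mutable.LinkedHashMap

object MapOrderDemo {
  val pairs = List(5 -> "five", 1 -> "one", 4 -> "four", 2 -> "two", 3 -> "three")

  // LinkedHashMap iterates in the order in which entries were inserted.
  def insertionOrder: List[Int] = LinkedHashMap(pairs: _*).keys.toList

  // TreeMap iterates in key order, using the implicit Ordering[Int].
  def sortedOrder: List[Int] = TreeMap(pairs: _*).keys.toList

  def main(args: Array[String]): Unit = {
    println(insertionOrder)
    println(sortedOrder)
  }
}
```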
There is no promise about the order of a Map. There is a SortedMap in Scala's collection package, whose entries are ordered by an implicit Ordering on the keys. As a quick fix, I recommend keeping a separate list of keys to record the ordering of your Map.
var keyOrdering = List[Int]()
var unorderedMap = Map[Int, String]()
unorderedMap += (1 -> "one")
keyOrdering :+= 1
Edit
You could implement your own Ordering and pass it to a SortedMap as well.
Edit #2
A simple example would be the following:
scala> import scala.collection.SortedMap
import scala.collection.SortedMap
scala> implicit object IntOrdering extends Ordering[Int] {
| def compare(a: Int, b: Int) = b - a
| }
defined module IntOrdering
scala> var sm = SortedMap[Int, String]()
sm: scala.collection.SortedMap[Int,String] = Map()
scala> sm += (1 -> "one")
scala> sm += (2 -> "two")
scala> println(sm)
Map(2 -> two, 1 -> one)
The implicit Ordering is applied to the keys, so IntOrdering might be applied to a SortedMap[Int, Any].
Edit #3
A self ordering DataType like in my comment might look this way:
case class DataType[T](t: T, index: Int)
object DataType{
private var index = -1
def apply[T](t: T) = { index += 1; new DataType[T](t, index) }
}
Now we need to change the Ordering:
implicit object DataTypeOrdering extends Ordering[DataType[_]] {
def compare(a: DataType[_], b: DataType[_]) = a.index - b.index
}
I hope this is the kind of answer you expected.
After digging I've found out that there exists an immutable ListMap that behaves exactly as I want it, but according to this table its performance is just awful. So I wrote a custom immutable implementation that should perform effectively on all operations except removal, where it performs linearly. It does require a bit more memory as it's backed by a standard Map and a Queue, which itself utilizes a List twice, but in the current age it's not an issue, right?
import collection.immutable.Queue
object OrderedMap {
def apply[A, B](elems: (A, B)*) =
new OrderedMap(Map(elems: _*), Queue(elems: _*))
}
class OrderedMap[A, B](
map: Map[A, B] = Map[A, B](),
protected val queue: Queue[(A, B)] = Queue()
) extends Map[A, B] {
def get(key: A) =
map.get(key)
def iterator =
queue.iterator
def +[B1 >: B](kv: (A, B1)) =
new OrderedMap(
map + kv,
queue enqueue kv
)
def -(key: A) =
new OrderedMap(
map - key,
queue filter (_._1 != key)
)
override def hashCode() =
queue.hashCode
override def equals(that: Any) =
that match {
case that: OrderedMap[A, B] =>
queue.equals(that.queue)
case _ =>
super.equals(that)
}
}

Scala best way of turning a Collection into a Map-by-key?

If I have a collection c of type T and there is a property p on T (of type P, say), what is the best way to do a map-by-extracting-key?
val c: Collection[T]
val m: Map[P, T]
One way is the following:
m = new HashMap[P, T]
c foreach { t => m add (t.getP, t) }
But now I need a mutable map. Is there a better way of doing this so that it's in 1 line and I end up with an immutable Map? (Obviously I could turn the above into a simple library utility, as I would in Java, but I suspect that in Scala there is no need)
You can use
c map (t => t.getP -> t) toMap
but be aware that this needs 2 traversals.
You can construct a Map with a variable number of tuples. So use the map method on the collection to convert it into a collection of tuples and then use the : _* trick to convert the result into a variable argument.
scala> val list = List("this", "maps", "string", "to", "length") map {s => (s, s.length)}
list: List[(java.lang.String, Int)] = List((this,4), (maps,4), (string,6), (to,2), (length,6))
scala> val list = List("this", "is", "a", "bunch", "of", "strings")
list: List[java.lang.String] = List(this, is, a, bunch, of, strings)
scala> val string2Length = Map(list map {s => (s, s.length)} : _*)
string2Length: scala.collection.immutable.Map[java.lang.String,Int] = Map(strings -> 7, of -> 2, bunch -> 5, a -> 1, is -> 2, this -> 4)
In addition to @James Iry's solution, it is also possible to accomplish this using a fold. I suspect that this solution is slightly faster than the tuple method (fewer garbage objects are created):
val list = List("this", "maps", "string", "to", "length")
val map = list.foldLeft(Map[String, Int]()) { (m, s) => m + (s -> s.length) }
This can be implemented immutably and with a single traversal by folding through the collection as follows.
val map = c.foldLeft(Map[P, T]()) { (m, t) => m + (t.getP -> t) }
The solution works because adding to an immutable Map returns a new immutable Map with the additional entry and this value serves as the accumulator through the fold operation.
The tradeoff here is the simplicity of the code versus its efficiency. So, for large collections, this approach may be more suitable than using 2 traversal implementations such as applying map and toMap.
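A concrete version of this fold, as a minimal sketch: Person is a hypothetical element type standing in for T, with id playing the role of the property p, and KeyByFold and keyById are illustrative names.

```scala
object KeyByFold {
  // Hypothetical element type; `id` is the key property.
  final case class Person(id: Int, name: String)

  // Single traversal: each step returns a new immutable Map with one more
  // entry, and that Map becomes the accumulator for the next step.
  def keyById(people: Seq[Person]): Map[Int, Person] =
    people.foldLeft(Map.empty[Int, Person]) { (m, p) => m + (p.id -> p) }

  def main(args: Array[String]): Unit =
    println(keyById(Seq(Person(1, "Ann"), Person(2, "Bob"), Person(1, "Cy"))))
}
```

When two elements share a key, the later one replaces the earlier one, because adding an existing key to an immutable Map overwrites its value.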
Another solution (might not work for all types)
import scala.collection.breakOut
val m:Map[P, T] = c.map(t => (t.getP, t))(breakOut)
this avoids the creation of the intermediary list, more info here:
Scala 2.8 breakOut
What you're trying to achieve is a bit undefined.
What if two or more items in c share the same p? Which item will be mapped to that p in the map?
The more accurate way of looking at this is yielding a map between p and all c items that have it:
val m: Map[P, Collection[T]]
This could be easily achieved with groupBy:
val m: Map[P, Collection[T]] = c.groupBy(t => t.p)
If you still want the original map, you can, for instance, map p to the first t that has it:
val m: Map[P, T] = c.groupBy(t => t.p) map { case (p, ts) => p -> ts.head }
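To see how duplicate keys behave in both variants, here is a minimal sketch with a hypothetical Item type standing in for T. GroupByKey, all, and first are illustrative names.

```scala
object GroupByKey {
  // Hypothetical element type with a key property `p`.
  final case class Item(p: String, value: Int)

  // Every item is kept, grouped under its key.
  def all(c: List[Item]): Map[String, List[Item]] =
    c.groupBy(_.p)

  // One item per key: the first one encountered in the original list.
  // `head` is safe here because groupBy never produces an empty group.
  def first(c: List[Item]): Map[String, Item] =
    c.groupBy(_.p).map { case (p, ts) => p -> ts.head }

  def main(args: Array[String]): Unit = {
    val c = List(Item("x", 1), Item("y", 2), Item("x", 3))
    println(all(c))
    println(first(c))
  }
}
```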
Scala 2.13+
instead of "breakOut" you could use
c.map(t => (t.getP, t)).to(Map)
Scroll to "View": https://www.scala-lang.org/blog/2017/02/28/collections-rework.html
This is probably not the most efficient way to turn a list to map, but it makes the calling code more readable. I used implicit conversions to add a mapBy method to List:
implicit def list2ListWithMapBy[T](list: List[T]): ListWithMapBy[T] = {
new ListWithMapBy(list)
}
class ListWithMapBy[V](list: List[V]){
def mapBy[K](keyFunc: V => K) = {
list.map(a => keyFunc(a) -> a).toMap
}
}
Calling code example:
val list = List("A", "AA", "AAA")
list.mapBy(_.length) //Map(1 -> A, 2 -> AA, 3 -> AAA)
Note that because of the implicit conversion, the caller code needs to import scala's implicitConversions.
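On Scala 2.10 and later, the same extension can be written as an implicit class, which does not need the implicitConversions language import. This is a minimal sketch of that alternative, not the answer's original code; MapBySyntax and MapByDemo are illustrative names.

```scala
object MapBySyntax {
  // Wraps a List so that mapBy can be called as if it were a List method.
  implicit class ListWithMapBy[V](list: List[V]) {
    def mapBy[K](keyFunc: V => K): Map[K, V] =
      list.map(a => keyFunc(a) -> a).toMap
  }
}

object MapByDemo {
  import MapBySyntax._

  def main(args: Array[String]): Unit =
    println(List("A", "AA", "AAA").mapBy(_.length))
}
```

As with the original version, elements that produce the same key overwrite one another, so only the last one survives in the resulting Map.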
c map (_.getP) zip c
Works well and is very intuitive.
How about using zip and toMap?
myList.zip(myList.map(_.length)).toMap
For what it's worth, here are two pointless ways of doing it:
scala> case class Foo(bar: Int)
defined class Foo
scala> import scalaz._, Scalaz._
import scalaz._
import Scalaz._
scala> val c = Vector(Foo(9), Foo(11))
c: scala.collection.immutable.Vector[Foo] = Vector(Foo(9), Foo(11))
scala> c.map(((_: Foo).bar) &&& identity).toMap
res30: scala.collection.immutable.Map[Int,Foo] = Map(9 -> Foo(9), 11 -> Foo(11))
scala> c.map(((_: Foo).bar) >>= (Pair.apply[Int, Foo] _).curried).toMap
res31: scala.collection.immutable.Map[Int,Foo] = Map(9 -> Foo(9), 11 -> Foo(11))
This works for me:
val personsMap = persons.foldLeft(scala.collection.mutable.Map[Int, PersonDTO]()) {
(m, p) => m(p.id) = p; m
}
The Map has to be mutable, and it has to be returned explicitly, since updating a mutable Map does not return the map.
Use map() on the collection, followed by toMap:
val map = list.map(e => (e, e.length)).toMap