Flatmapping a nested Map in Scala

Suppose I have someMap of type Map[String, Map[String, String]], defined as such:
val someMap =
  Map(
    ("a1" -> Map( ("b1" -> "c1"), ("b2" -> "c2") ) ),
    ("a2" -> Map( ("b3" -> "c3"), ("b4" -> "c4") ) ),
    ("a3" -> Map( ("b5" -> "c5"), ("b6" -> "c6") ) )
  )
and I would like to flatten it to something that looks like
List(
("a1","b1","c1"),("a1","b2","c2"),
("a2","b3","c3"),("a2","b4","c4"),
("a3","b5","c5"),("a3","b6","c6")
)
What is the most efficient way of doing this? I was thinking about creating a helper function that processes each (a_i -> Map[String, String]) key-value pair and returns a collection of triples:
def helper(key: String, values: Map[String, String]): Iterable[(String, String, String)] = {
  val sublist = values.map(x => (key, x._1, x._2))
  sublist
}
and then flatMap this function over someMap. But this seems somewhat unnecessary to my novice Scala eyes, so I was wondering if there is a more efficient way to process this Map.

There's no need to create a helper function; just write a nested lambda:
val result = someMap.flatMap { case (k, v) => v.map { case (k1, v1) => (k, k1, v1) } }
Or
val y = someMap.flatMap(x => x._2.map(y => (x._1, y._1, y._2)))
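A minimal self-contained check of the nested-lambda version (the object name FlattenNested is just for illustration), together with an equivalent for-comprehension. Because the function returns 3-tuples rather than key/value pairs, flatMap on the Map produces an Iterable, which is converted to a List here. Scala's immutable Maps with up to four entries iterate in insertion order, so the triples should come out in the order shown in the question.

```scala
object FlattenNested {
  val someMap: Map[String, Map[String, String]] = Map(
    "a1" -> Map("b1" -> "c1", "b2" -> "c2"),
    "a2" -> Map("b3" -> "c3", "b4" -> "c4"),
    "a3" -> Map("b5" -> "c5", "b6" -> "c6")
  )

  // Nested lambdas: the outer flatMap concatenates the inner collections of triples.
  // The result is an Iterable (not a Map) because the elements are 3-tuples.
  val viaFlatMap: List[(String, String, String)] =
    someMap.flatMap { case (k, v) => v.map { case (k1, v1) => (k, k1, v1) } }.toList

  // The same traversal written as a for-comprehension over Lists.
  val viaFor: List[(String, String, String)] =
    for {
      (a, inner) <- someMap.toList
      (b, c)     <- inner.toList
    } yield (a, b, c)
}
```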

Since you're asking about efficiency, the most efficient yet functional approach I can think of is using foldLeft and foldRight.
You need foldRight because :: prepends to the immutable list, so folding from the left would build it in reverse.
someMap.foldRight(List.empty[(String, String, String)]) { case ((a, m), acc) =>
  m.foldRight(acc) {
    case ((b, c), acc) => (a, b, c) :: acc
  }
}
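Here, assuming the Map can be traversed in reverse without buffering, no intermediate collections are created. (Note that the standard library's foldRight, quoted in the update below, is implemented via reversed, which does build a temporary List.)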
Alternatively, you can use foldLeft and then reverse the result:
someMap.foldLeft(List.empty[(String, String, String)]) { case (acc, (a, m)) =>
  m.foldLeft(acc) {
    case (acc, (b, c)) => (a, b, c) :: acc
  }
}.reverse
This way a single intermediate List is created, but you don't rely on the implementation of the reverse iterator (foldLeft uses the forward iterator).
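A runnable sketch comparing the two folds on the map from the question (the object name FoldFlatten and the Triple alias are illustrative). Both should produce the same list as the flatMap one-liner.

```scala
object FoldFlatten {
  val someMap: Map[String, Map[String, String]] = Map(
    "a1" -> Map("b1" -> "c1", "b2" -> "c2"),
    "a2" -> Map("b3" -> "c3", "b4" -> "c4"),
    "a3" -> Map("b5" -> "c5", "b6" -> "c6")
  )

  type Triple = (String, String, String)

  // foldRight: visit elements from the end, so prepending with :: yields forward order.
  val viaFoldRight: List[Triple] =
    someMap.foldRight(List.empty[Triple]) { case ((a, m), acc) =>
      m.foldRight(acc) { case ((b, c), inner) => (a, b, c) :: inner }
    }

  // foldLeft: prepend in forward order (so the list is reversed), then reverse once at the end.
  val viaFoldLeft: List[Triple] =
    someMap.foldLeft(List.empty[Triple]) { case (acc, (a, m)) =>
      m.foldLeft(acc) { case (inner, (b, c)) => (a, b, c) :: inner }
    }.reverse
}
```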
Note: one-liners such as someMap.flatMap(x => x._2.map(y => (x._1, y._1, y._2))) are less efficient because, in addition to the temporary buffer that holds the intermediate results of flatMap, they create and discard an additional intermediate collection for each inner map.
Update:
Since there seems to be some confusion, I'll clarify what I mean. Here are the implementations of map and flatMap from TraversableLike, and of foldLeft and foldRight from TraversableOnce (Scala 2.12 and earlier):
def map[B, That](f: A => B)(implicit bf: CanBuildFrom[Repr, B, That]): That = {
  def builder = { // extracted to keep method size under 35 bytes, so that it can be JIT-inlined
    val b = bf(repr)
    b.sizeHint(this)
    b
  }
  val b = builder
  for (x <- this) b += f(x)
  b.result
}
def flatMap[B, That](f: A => GenTraversableOnce[B])(implicit bf: CanBuildFrom[Repr, B, That]): That = {
  def builder = bf(repr) // extracted to keep method size under 35 bytes, so that it can be JIT-inlined
  val b = builder
  for (x <- this) b ++= f(x).seq
  b.result
}
def foldLeft[B](z: B)(op: (B, A) => B): B = {
  var result = z
  this foreach (x => result = op(result, x))
  result
}
def foldRight[B](z: B)(op: (A, B) => B): B =
  reversed.foldLeft(z)((x, y) => op(y, x))
It's clear that map and flatMap create an intermediate buffer using the corresponding builder, while foldLeft and foldRight reuse the same user-supplied accumulator object and only use iterators.

Related

Programming a state monad in Scala

I borrow the theory of what a state monad looks like from Philip Wadler's "Monads for Functional Programming":
type M a = State → (a, State)
type State = Int
unit :: a → M a
unit a = λx. (a, x)
(*) :: M a → (a → M b) → M b
m * k = λx.
let (a, y) = m x in
let (b, z) = k a y in
(b, z)
The way I would like to use a state monad is as follows:
Given a list L, I want different parts of my code to get this list and update it by adding new elements at its end.
I guess the above would be modified as:
type M = State → (List[Data], State)
type State = List[Data]
def unit(a: List[Data]) = (x: State) => (a,x)
def star(m: M, k: List[Data] => M): M = {
(x: M) =>
val (a,y) = m(x)
val (b,z) = k(a)(y)
(b,z)
}
def get = ???
def update = ???
How do I fill in the details? Specifically:
How do I instantiate my hierarchy to work on a concrete list?
How do I implement get and update in terms of the above?
Finally, how would I do this using Scala's syntax with flatMap and unit?
Your M is defined incorrectly. It should take A as a type parameter, like so:
type M[A] = State => (A, State)
You've also missed that type parameter elsewhere.
unit should have a signature like this:
def unit[A](a: A): M[A]
star should have a signature like this:
def star[A, B](m: M[A], k: A => M[B]): M[B]
Hopefully, that makes the functions clearer.
Your implementation of unit was pretty much the same:
def unit[A](a: A): M[A] = x => (a, x)
However, in star, the parameter of your lambda (x) is of type State, not M, because M[B] is basically State => (B, State). The rest you got right:
def star[A, B](m: M[A])(k: A => M[B]): M[B] =
  (x: State) => {
    val (a, y) = m(x)
    val (b, z) = k(a)(y)
    (b, z)
  }
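Putting the corrected pieces together, here is a sketch of how get and update might look for a concrete list. It assumes Data = Int purely for illustration, and update (a hypothetical name taken from the question) appends an element at the end of the list, as the question asks; get returns the list as the result and leaves the state unchanged.

```scala
object ListState {
  type Data  = Int                 // assumption: stand-in for the question's Data type
  type State = List[Data]
  type M[A]  = State => (A, State)

  def unit[A](a: A): M[A] = x => (a, x)

  def star[A, B](m: M[A])(k: A => M[B]): M[B] =
    (x: State) => {
      val (a, y) = m(x)
      k(a)(y)
    }

  // Return the current list as the result, leaving the state unchanged.
  val get: M[State] = s => (s, s)

  // Hypothetical update: append an element at the end; the result carries no information.
  def update(d: Data): M[Unit] = s => ((), s :+ d)

  // Append 2, then 3, then read the whole list back.
  val program: M[State] =
    star(update(2))(_ => star(update(3))(_ => get))
}
```

Running ListState.program(List(1)) threads the list through both updates, so the result and the final state are both List(1, 2, 3).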
Edit: According to @Luis Miguel Mejia Suarez:
It would probably be easier to implement if you make your State a class and define flatMap inside it. And you can define unit in the companion object.
He suggested final class State[S, A](val run: S => (A, S)), which would also allow you to use infix functions like >>=.
Another way to do it would be to define State as a type alias for a function S => (A, S) and extend it using an implicit class.
// In Scala 2, these definitions must sit inside an object or package object,
// since type aliases and implicit classes cannot be top-level
type State[S, A] = S => (A, S)
object State {
  // This is basically "return"
  def unit[S, A](a: A): State[S, A] = s => (a, s)
}
implicit class StateOps[S, A](private val runState: S => (A, S)) {
  // You can rename this to ">>=" or "flatMap"
  def *[B](k: A => State[S, B]): State[S, B] = s => {
    val (a, s2) = runState(s)
    k(a)(s2)
  }
}
If your definition of get is
set the result value to the state and leave the state unchanged
(borrowed from Haskell Wiki), then you can implement it like this:
def get[S]: State[S, S] = s => (s, s)
If you mean that you want to extract the state (in this case a List[Data]), you can use execState (define it in StateOps):
def execState(s: S): S = runState(s)._2
Here's a terrible example of how you can add elements to a List.
// Unit is the type; () is its only value
def addToList(n: Int)(list: List[Int]): (Unit, List[Int]) = ((), n :: list)
def fillList(n: Int): State[List[Int], Unit] =
  n match {
    case 0 => s => ((), s)
    case n => fillList(n - 1) * (_ => (list: List[Int]) => addToList(n)(list))
  }
println(fillList(10)(List.empty)) gives us this (the second element can be extracted with execState):
((),List(10, 9, 8, 7, 6, 5, 4, 3, 2, 1))
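For reference, a self-contained sketch that assembles the pieces above inside an object (the object name StateAlias is illustrative), with execState added to StateOps and the lambda around addToList spelled out explicitly.

```scala
object StateAlias {
  type State[S, A] = S => (A, S)

  object State {
    // "return": yield a without touching the state
    def unit[S, A](a: A): State[S, A] = s => (a, s)
  }

  implicit class StateOps[S, A](private val runState: S => (A, S)) {
    // bind, spelled * as above
    def *[B](k: A => State[S, B]): State[S, B] = s => {
      val (a, s2) = runState(s)
      k(a)(s2)
    }

    // Run the computation and keep only the final state.
    def execState(s: S): S = runState(s)._2
  }

  // Result is the state itself; the state is left unchanged.
  def get[S]: State[S, S] = s => (s, s)

  def addToList(n: Int)(list: List[Int]): (Unit, List[Int]) = ((), n :: list)

  def fillList(n: Int): State[List[Int], Unit] =
    n match {
      case 0 => s => ((), s)
      case k => fillList(k - 1) * (_ => (list: List[Int]) => addToList(k)(list))
    }
}
```

With the implicit class in scope, fillList(10).execState(Nil) returns just the list, List(10, 9, ..., 1).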

Processing sequence with duplicates concurrently

Suppose I've got a function fab: A => B and a sequence of A, and I need to get a sequence of pairs (A, B) like this:
def foo(fab: A => B, as: Seq[A]): Seq[(A, B)] = as.zip(as.map(fab))
Now I want to run fab concurrently using scala.concurrent.Future but I want to run fab only once for all duplicate elements in as. For instance,
val fab: A => B = ...
val a1: A = ...
val a2: A = ...
val as = a1 :: a1 :: a2 :: a1 :: a2 :: Nil
foo(fab, as) // invokes fab twice and runs these invocations concurrently
How would you implement it?
import scala.concurrent.{ExecutionContext, Future}
def foo[A, B](as: Seq[A])(f: A => B)(implicit exc: ExecutionContext)
    : Future[Seq[(A, B)]] = {
  Future
    .traverse(as.toSet)(a => Future((a, (a, f(a)))))
    .map(abs => as map abs.toMap)
}
Explanation:
as.toSet ensures that f is invoked only once for each a
The (a, (a, f(a))) gives you a set with nested tuples of shape (a, (a, b))
Mapping the original sequence of as by a Map with pairs (a, (a, b)) gives you a sequence of (a, b)s.
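One way to check the deduplication: the hypothetical fab below counts its invocations with an AtomicInteger, so for a sequence with two distinct elements it should be called exactly twice. This sketch uses the global ExecutionContext and blocks with Await only to make the result easy to inspect.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object DedupFutures {
  def foo[A, B](as: Seq[A])(f: A => B)(implicit exc: ExecutionContext): Future[Seq[(A, B)]] =
    Future
      .traverse(as.toSet)(a => Future((a, (a, f(a)))))   // one Future per distinct element
      .map(abs => as map abs.toMap)                       // look each original element up again

  // Hypothetical fab: counts how often it is invoked.
  val calls = new AtomicInteger(0)
  def fab(s: String): Int = { calls.incrementAndGet(); s.length }

  def run(): Seq[(String, Int)] = {
    import ExecutionContext.Implicits.global
    val as = List("x", "x", "yy", "x", "yy")
    Await.result(foo(as)(fab), 5.seconds)
  }
}
```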
Since your f is not asynchronous anyway, and since you don't mind using futures, you might consider using parallel collections as well (in Scala 2.13 and later they live in the separate scala-parallel-collections module):
def foo2[A, B](as: Seq[A])(f: A => B): Seq[(A, B)] = {
as map as.toSet.par.map((a: A) => a -> (a, f(a))).seq.toMap
}

How to compose two different `State Monad`?

While learning the State monad, I'm not sure how to compose two functions with different State return types.
State Monad definition:
case class State[S, A](runState: S => (S, A)) {
  def flatMap[B](f: A => State[S, B]): State[S, B] = {
    State(s => {
      val (s1, a) = runState(s)
      val (s2, b) = f(a).runState(s1)
      (s2, b)
    })
  }
  def map[B](f: A => B): State[S, B] = {
    flatMap(a => {
      State(s => (s, f(a)))
    })
  }
}
Two different State types:
type AppendBang[A] = State[Int, A]
type AddOne[A] = State[String, A]
Two methods with different State return types:
def addOne(n: Int): AddOne[Int] = State(s => (s + ".", n + 1))
def appendBang(str: String): AppendBang[String] = State(s => (s + 1, str + " !!!"))
Define a function to use the two functions above:
def myAction(n: Int) = for {
a <- addOne(n)
b <- appendBang(a.toString)
} yield (a, b)
And I hope to use it like this:
println(myAction(1))
The problem is that myAction does not compile; it reports an error like this:
Error:(14, 7) type mismatch;
found : state_monad.State[Int,(Int, String)]
required: state_monad.State[String,?]
b <- appendBang(a.toString)
^
How can I fix it? Do I have to define some Monad transformers?
Update: The question may not be clear, so let me give an example.
Say I want to define another function, which uses addOne and appendBang internally. Since both need existing states, I have to pass them in:
def myAction(n: Int)(addOneState: String, appendBangState: Int): ((String, Int), String) = {
val (addOneState2, n2) = addOne(n).runState(addOneState)
val (appendBangState2, n3) = appendBang(n2.toString).runState(appendBangState)
((addOneState2, appendBangState2), n3)
}
I have to run addOne and appendBang one by one, passing and getting the states and result manually.
I found that it can return another State, but the code is not improved much:
def myAction(n: Int): State[(String, Int), String] = State {
case (addOneState: String, appendBangState: Int) =>
val (addOneState2, n2) = addOne(n).runState(addOneState)
val (appendBangState2, n3) = appendBang(n2.toString).runState( appendBangState)
((addOneState2, appendBangState2), n3)
}
Since I'm not quite familiar with them, I'm just wondering whether there is any way to improve it. My best hope is that I can use a for comprehension, but I'm not sure if that's possible.
As I mentioned in my first comment, it is impossible to use a for comprehension to do what you want, because it cannot change the type of the state (S).
Remember that a for comprehension can be translated to a combination of flatMaps, withFilter and one map. If we look at your State.flatMap, it takes a function f to change a State[S,A] into State[S, B]. We can use flatMap and map (and thus a for comprehension) to chain together operations on the same state, but we can't change the type of the state in this chain.
We could generalize your last definition of myAction to combine, compose, ... two functions using state of a different type. We can try to implement this generalized compose method directly in our State class (although it is probably so specific that it doesn't belong in State). If we look at State.flatMap and myAction we can see some similarities:
We first call runState on our existing State instance.
We then call runState again.
In myAction we first use the result n2 to create a State[Int, String] (AppendBang[String] or State[S2, B]) using the second function (appendBang or f), on which we then call runState. But our result n2 is of type Int (A) and our function appendBang needs a String (B), so we need a function to convert A into B.
case class State[S, A](runState: S => (S, A)) {
  // flatMap and map
  def compose[B, S2](f: B => State[S2, B], convert: A => B): State[(S, S2), B] =
    State(((s: S, s2: S2) => {
      val (sNext, a) = runState(s)
      val (s2Next, b) = f(convert(a)).runState(s2)
      ((sNext, s2Next), b)
    }).tupled)
}
You could then define myAction as:
def myAction(i: Int) = addOne(i).compose(appendBang, _.toString)
val twoStates = myAction(1)
// State[(String, Int),String] = State(<function1>)
twoStates.runState(("", 1))
// ((String, Int), String) = ((.,2),2 !!!)
If you don't want this function in your State class, you can create it as an external function:
def combineStateFunctions[S1, S2, A, B](
    a: A => State[S1, A],
    b: B => State[S2, B],
    convert: A => B
)(input: A): State[(S1, S2), B] = State(
  ((s1: S1, s2: S2) => {
    val (s1Next, temp) = a(input).runState(s1)
    val (s2Next, result) = b(convert(temp)).runState(s2)
    ((s1Next, s2Next), result)
  }).tupled
)
def myAction(i: Int) =
combineStateFunctions(addOne, appendBang, (_: Int).toString)(i)
Edit: Bergi's idea is to create two functions that lift a State[A, X] or a State[B, X] into a State[(A, B), X].
object State {
  def onFirst[A, B, X](s: State[A, X]): State[(A, B), X] = {
    val runState = (a: A, b: B) => {
      val (nextA, x) = s.runState(a)
      ((nextA, b), x)
    }
    State(runState.tupled)
  }
  def onSecond[A, B, X](s: State[B, X]): State[(A, B), X] = {
    val runState = (a: A, b: B) => {
      val (nextB, x) = s.runState(b)
      ((a, nextB), x)
    }
    State(runState.tupled)
  }
}
This way you can use a for comprehension, since the type of the state stays the same ((A, B)).
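def myAction(i: Int) = for {
  // Type arguments are explicit: the other half of the state pair does not
  // appear in the argument, so it cannot be inferred (it would become Nothing)
  x <- State.onFirst[String, Int, Int](addOne(i))
  y <- State.onSecond[String, Int, String](appendBang(x.toString))
} yield y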
myAction(1).runState(("", 1))
// ((String, Int), String) = ((.,2),2 !!!)
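onFirst and onSecond follow the same pattern: read one part of a larger state, run the smaller action on it, and write the new part back. As a sketch of that generalization (zoom is a hypothetical helper name, not a standard-library method), the same example can be written with a getter/setter pair. Because onFirst and onSecond are specialized here to the concrete (String, Int) state, no explicit type arguments are needed in the for-comprehension.

```scala
object ZoomState {
  final case class State[S, A](runState: S => (S, A)) {
    def flatMap[B](f: A => State[S, B]): State[S, B] =
      State[S, B] { s =>
        val (s1, a) = runState(s)
        f(a).runState(s1)
      }
    def map[B](f: A => B): State[S, B] = flatMap(a => State[S, B](s => (s, f(a))))
  }

  // Hypothetical helper: run an action on one part of a bigger state,
  // using get to read that part and set to write the updated part back.
  def zoom[Big, Small, A](get: Big => Small, set: (Big, Small) => Big)(st: State[Small, A]): State[Big, A] =
    State[Big, A] { big =>
      val (small2, a) = st.runState(get(big))
      (set(big, small2), a)
    }

  def addOne(n: Int): State[String, Int] = State[String, Int](s => (s + ".", n + 1))
  def appendBang(str: String): State[Int, String] = State[Int, String](s => (s + 1, str + " !!!"))

  type Both = (String, Int)

  // onFirst / onSecond expressed with zoom, specialized to the (String, Int) state
  def onFirst[A](st: State[String, A]): State[Both, A] =
    zoom[Both, String, A](_._1, (b, s) => (s, b._2))(st)
  def onSecond[A](st: State[Int, A]): State[Both, A] =
    zoom[Both, Int, A](_._2, (b, i) => (b._1, i))(st)

  def myAction(i: Int): State[Both, String] = for {
    x <- onFirst(addOne(i))
    y <- onSecond(appendBang(x.toString))
  } yield y
}
```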

Cleaner tuple groupBy

I have a sequence of key-value pairs (String, Int), and I want to group them by key into a sequence of values (i.e. Seq[(String, Int)] => Map[String, Iterable[Int]]).
Obviously, toMap isn't useful here, and groupBy maintains the values as tuples. The best I managed to come up with is:
val seq: Seq[( String, Int )]
// ...
seq.groupBy( _._1 ).mapValues( _.map( _._2 ) )
Is there a cleaner way of doing this?
Here's a pimp that adds a toMultiMap method to traversables. Would it solve your problem?
import collection._
import mutable.Builder
import generic.CanBuildFrom
class TraversableOnceExt[CC, A](coll: CC, asTraversable: CC => TraversableOnce[A]) {
  def toMultiMap[T, U, That](implicit ev: A <:< (T, U), cbf: CanBuildFrom[CC, U, That]): immutable.Map[T, That] =
    toMultiMapBy(ev)
  def toMultiMapBy[T, U, That](f: A => (T, U))(implicit cbf: CanBuildFrom[CC, U, That]): immutable.Map[T, That] = {
    val mutMap = mutable.Map.empty[T, mutable.Builder[U, That]]
    for (x <- asTraversable(coll)) {
      val (key, value) = f(x)
      val builder = mutMap.getOrElseUpdate(key, cbf(coll))
      builder += value
    }
    val mapBuilder = immutable.Map.newBuilder[T, That]
    for ((k, v) <- mutMap)
      mapBuilder += ((k, v.result))
    mapBuilder.result
  }
}
implicit def commomExtendTraversable[A, C[A] <: TraversableOnce[A]](coll: C[A]): TraversableOnceExt[C[A], A] =
  new TraversableOnceExt[C[A], A](coll, identity)
Which can be used like this:
val map = List(1 -> 'a', 1 -> 'à', 2 -> 'b').toMultiMap
println(map) // Map(1 -> List(a, à), 2 -> List(b))
val byFirstLetter = Set("abc", "aeiou", "cdef").toMultiMapBy(elem => (elem.head, elem))
println(byFirstLetter) // Map(c -> Set(cdef), a -> Set(abc, aeiou))
If you add the following implicit defs, it will also work with collection-like objects such as Strings and Arrays:
implicit def commomExtendStringTraversable(string: String): TraversableOnceExt[String, Char] =
new TraversableOnceExt[String, Char](string, implicitly)
implicit def commomExtendArrayTraversable[A](array: Array[A]): TraversableOnceExt[Array[A], A] =
new TraversableOnceExt[Array[A], A](array, implicitly)
Then:
val withArrays = Array(1 -> 'a', 1 -> 'à', 2 -> 'b').toMultiMap
println(withArrays) // Map(1 -> [C@377653ae, 2 -> [C@396fe0f4)
val byLowercaseCode = "Mama".toMultiMapBy(c => (c.toLower.toInt, c))
println(byLowercaseCode) // Map(97 -> aa, 109 -> Mm)
There's no method or data structure in the standard library (before Scala 2.13) to do this, and your solution looks about as concise as you'll get. If you use this in more than one place, you might like to factor it out into a utility method:
def groupTuples[A, B](seq: Seq[(A, B)]) =
seq groupBy (_._1) mapValues (_ map (_._2))
which you then obviously just call with groupTuples(seq). This might not be the most efficient possible in terms of CPU clock cycles, but I don't think it's particularly inefficient either.
I did a rough benchmark against Jean-Philippe's solution on a list of 9 tuples and this is marginally faster. Both were about twice as fast as folding the sequence into a map (effectively re-implementing groupBy to give the output you want).
I don't know if you consider it cleaner:
seq.groupBy(_._1).map { case (k,v) => (k,v.map(_._2))}
Starting with Scala 2.13, most collections provide the groupMap method, which is (as its name suggests) a more efficient equivalent of a groupBy followed by mapValues:
List(1 -> 'a', 1 -> 'b', 2 -> 'c').groupMap(_._1)(_._2)
// Map[Int,List[Char]] = Map(2 -> List(c), 1 -> List(a, b))
This:
groups elements based on the first part of tuples (Map(2 -> List((2,c)), 1 -> List((1,a), (1,b))))
maps grouped values (List((1,a), (1,b))) by taking their second tuple part (List(a, b)).
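Applied to a Seq[(String, Int)] like the one in the question, here is a minimal sketch (Scala 2.13+, sample data invented for illustration) of groupMap next to the groupBy form. In 2.13, mapValues on a Map is deprecated and returns a lazy MapView, so the groupBy version goes through .view and .toMap to get a strict Map.

```scala
object GroupTuples {
  val seq: Seq[(String, Int)] = Seq("a" -> 1, "b" -> 2, "a" -> 3)

  // Group by the first tuple element and keep only the second, in one pass.
  val viaGroupMap: Map[String, Seq[Int]] = seq.groupMap(_._1)(_._2)

  // The groupBy form, avoiding the deprecated strict-looking mapValues.
  val viaGroupBy: Map[String, Seq[Int]] =
    seq.groupBy(_._1).view.mapValues(_.map(_._2)).toMap
}
```

Both keep the values of each group in their original order, e.g. "a" maps to Seq(1, 3).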

Is there a cleaner way to pattern-match in Scala anonymous functions?

I find myself writing code like the following:
val b = a map (entry =>
  entry match {
    case ((x,y), u) => ((y,x), u)
  }
)
I would like to write it differently, if only this worked:
val c = a map (((x,y) -> u) =>
(y,x) -> u
)
Is there any way I can get something close to this?
Believe it or not, this works:
val b = List(1, 2)
b map {
case 1 => "one"
case 2 => "two"
}
You can skip the p => p match in simple cases. So this should work:
val c = a map {
  // -> is not an extractor, so the pattern uses plain tuple syntax
  case ((x, y), u) => (y, x) -> u
}
In your example, there are three subtly different semantics that you may be going for.
Map over the collection, transforming each element that matches a pattern. Throw an exception if any element does not match. These semantics are achieved with
val b = a map { case ((x, y), u) => ((y, x), u) }
Map over the collection, transforming each element that matches a pattern. Silently discard elements that do not match:
val b = a collect { case ((x, y), u) => ((y, x), u) }
Map over the collection, safely destructuring and then transforming each element. These are the semantics that I would expect for an expression like
val b = a map (((x, y), u) => ((y, x), u))
Unfortunately, there is no concise syntax to achieve these semantics in Scala.
Instead, you have to destructure yourself:
val b = a map { p => ((p._1._2, p._1._1), p._2) }
One might be tempted to use a value definition for destructuring:
val b = a map { p => val ((x,y), u) = p; ((y, x), u) }
However, this version is no safer than the one that uses explicit pattern matching. For this reason, if you want the safe destructuring semantics, the most concise solution is to explicitly type your collection to prevent unintended widening and use explicit pattern matching:
val a: List[((Int, Int), Int)] = // ...
// ...
val b = a map { case ((x, y), u) => ((y, x), u) }
If a's definition appears far from its use (e.g. in a separate compilation unit), you can minimize the risk by ascribing its type in the map call:
val b = (a: List[((Int, Int), Int)]) map { case ((x, y), u) => ((y, x), u) }
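To make the difference between the first two semantics concrete, here is a sketch with a deliberately widened List[Any] (sample data invented for illustration): collect drops the element that doesn't match, while map with the same pattern-matching function throws a MatchError.

```scala
object MapVsCollect {
  // Deliberately widened element type, so one element does not match the pattern.
  val mixed: List[Any] = List(((1, 2), 3), "oops", ((4, 5), 6))

  // collect: non-matching elements are silently discarded.
  val collected: List[((Any, Any), Any)] =
    mixed collect { case ((x, y), u) => ((y, x), u) }

  // map with a pattern-matching function: the non-matching "oops" throws a MatchError.
  def mapped(): List[((Any, Any), Any)] =
    mixed map { case ((x, y), u) => ((y, x), u) }
}
```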
In your quoted example, the cleanest solution is:
val xs = List((1,2)->3,(4,5)->6,(7,8)->9)
xs map { case (a,b) => (a.swap, b) }
val b = a map { case ((x,y), u) => ((y,x), u) }