Group values by a key with any Monoid

Group values by a key with any Monoid - scala

I would like to write a method mergeKeys that groups the values in an Iterable[(K, V)] by the keys. For example, I could write:
def mergeKeysList[K, V](iter: Iterable[(K, V)]) = {
iter.foldLeft(Map[K, List[V]]().withDefaultValue(List.empty[V])) {
case (map, (k, v)) =>
map + (k -> (v :: map(k)))
}
}
However, I would like to be able to use any Monoid instead of writing a method for List. For example, the values may be integers and I want to sum them instead of appending them in a list. Or they may be tuples (String, Int) where I want to accumulate the strings in a set but add the integers. How can I write such a method? Or is there something else I can use in scalaz to get this done?
Update: I wasn't as far away as I thought. I got a little bit closer, but I still don't know how to make it work if the values are tuples. Do I need to write yet another implicit conversion? I.e., one implicit conversion for each number of type parameters?
sealed trait SuperTraversable[T, U, F[_]]
extends scalaz.PimpedType[TraversableOnce[(T, F[U])]] {
def mergeKeys(implicit mon: Monoid[F[U]]): Map[T, F[U]] = {
value.foldLeft(Map[T, F[U]]().withDefaultValue(mon.zero)) {
case (map, (k, v)) =>
map + (k -> (map(k) |+| v))
}
}
}
implicit def superTraversable[T, U, F[_]](
as: TraversableOnce[(T, F[U])]
): SuperTraversable[T, U, F] =
new SuperTraversable[T, U, F] {
val value = as
}

First, while it's not relevant to your question, you are limiting your code's
generality by explicitly mentioning the type constructor F[_]. It works fine
without doing so:
sealed trait SuperTraversable[K, V]
extends scalaz.PimpedType[TraversableOnce[(K, V)]] {
def mergeKeys(implicit mon: Monoid[V]): Map[K, V] = {
value.foldLeft(Map[K, V]().withDefaultValue(mon.zero)) {
case (map, (k, v)) =>
map + (k -> (map(k) |+| v))
}
}
}
[...]
Now, for your actual question, there's no need to change mergeKeys to handle
funny kinds of combinations; just write a Monoid to handle whatever kind of
combining you want to do. Say you wanted to do your Strings+Ints example:
implicit def monoidStringInt = new Monoid[(String, Int)] {
val zero = ("", 0)
def append(a: (String, Int), b: => (String, Int)) = (a, b) match {
case ((a1, a2), (b1, b2)) => (a1 + b1, a2 + b2)
}
}
println {
List(
"a" -> ("Hello, ", 20),
"b" -> ("Goodbye, ", 30),
"a" -> ("World", 12)
).mergeKeys
}
gives
Map(a -> (Hello, World,32), b -> (Goodbye, ,30))

Related

flatmapping a nested Map in scala

Suppose I have val someMap = Map[String -> Map[String -> String]] defined as such:
val someMap =
Map(
("a1" -> Map( ("b1" -> "c1"), ("b2" -> "c2") ) ),
("a2" -> Map( ("b3" -> "c3"), ("b4" -> "c4") ) ),
("a3" -> Map( ("b5" -> "c5"), ("b6" -> "c6") ) )
)
and I would like to flatten it to something that looks like
List(
("a1","b1","c1"),("a1","b2","c2"),
("a2","b3","c3"),("a2","b4","c4"),
("a3","b5","c5"),("a3","b6","c6")
)
What is the most efficient way of doing this? I was thinking about creating some helper function that processes each (a_i -> Map(String,String)) key value pair and return
def helper(key: String, values: Map[String -> String]): (String,String,String)
= {val sublist = values.map(x => (key,x._1,x._2))
return sublist
}
then flatmap this function over someMap. But this seems somewhat unnecessary to my novice scala eyes, so I was wondering if there was a more efficient way to parse this Map.

No need to create helper function just write nested lambda:
val result = someMap.flatMap { case (k, v) => v.map { case (k1, v1) => (k, k1, v1) } }
Or
val y = someMap.flatMap(x => x._2.map(y => (x._1, y._1, y._2)))

Since you're asking about efficiency, the most efficient yet functional approach I can think of is using foldLeft and foldRight.
You need foldRight since :: constructs the immutable list in reverse.
someMap.foldRight(List.empty[(String, String, String)]) { case ((a, m), acc) =>
m.foldRight(acc) {
case ((b, c), acc) => (a, b, c) :: acc
}
}
Here, assuming Map.iterator.reverse is implemented efficiently, no intermediate collections are created.
Alternatively, you can use foldLeft and then reverse the result:
someMap.foldLeft(List.empty[(String, String, String)]) { case (acc, (a, m)) =>
m.foldLeft(acc) {
case (acc, (b, c)) => (a, b, c) :: acc
}
}.reverse
This way a single intermediate List is created, but you don't rely on the implementation of the reversed iterator (foldLeft uses forward iterator).
Note: one liners, such as someMap.flatMap(x => x._2.map(y => (x._1, y._1, y._2))) are less efficient, as, in addition to the temporary buffer to hold intermediate results of flatMap, they create and discard additional intermediate collections for each inner map.
UPD
Since there seems to be some confusion, I'll clarify what I mean. Here is an implementation of map, flatMap, foldLeft and foldRight from TraversibleLike:
def map[B, That](f: A => B)(implicit bf: CanBuildFrom[Repr, B, That]): That = {
def builder = { // extracted to keep method size under 35 bytes, so that it can be JIT-inlined
val b = bf(repr)
b.sizeHint(this)
b
}
val b = builder
for (x <- this) b += f(x)
b.result
}
def flatMap[B, That](f: A => GenTraversableOnce[B])(implicit bf: CanBuildFrom[Repr, B, That]): That = {
def builder = bf(repr) // extracted to keep method size under 35 bytes, so that it can be JIT-inlined
val b = builder
for (x <- this) b ++= f(x).seq
b.result
}
def foldLeft[B](z: B)(op: (B, A) => B): B = {
var result = z
this foreach (x => result = op(result, x))
result
}
def foldRight[B](z: B)(op: (A, B) => B): B =
reversed.foldLeft(z)((x, y) => op(y, x))
It's clear that map and flatMap create intermediate buffer using corresponding builder, while foldLeft and foldRight reuse the same user-supplied accumulator object, and only use iterators.

How to compose two different `State Monad`?

When I learn State Monad, I'm not sure how to compose two functions with different State return types.
State Monad definition:
case class State[S, A](runState: S => (S, A)) {
def flatMap[B](f: A => State[S, B]): State[S, B] = {
State(s => {
val (s1, a) = runState(s)
val (s2, b) = f(a).runState(s1)
(s2, b)
})
}
def map[B](f: A => B): State[S, B] = {
flatMap(a => {
State(s => (s, f(a)))
})
}
}
Two different State types:
type AppendBang[A] = State[Int, A]
type AddOne[A] = State[String, A]
Two methods with differnt State return types:
def addOne(n: Int): AddOne[Int] = State(s => (s + ".", n + 1))
def appendBang(str: String): AppendBang[String] = State(s => (s + 1, str + " !!!"))
Define a function to use the two functions above:
def myAction(n: Int) = for {
a <- addOne(n)
b <- appendBang(a.toString)
} yield (a, b)
And I hope to use it like this:
println(myAction(1))
The problem is myAction is not compilable, it reports some error like this:
Error:(14, 7) type mismatch;
found : state_monad.State[Int,(Int, String)]
required: state_monad.State[String,?]
b <- appendBang(a.toString)
^
How can I fix it? Do I have to define some Monad transformers?
Update: The question may be not clear, let me give an example
Say I want to define another function, which uses addOne and appendBang internally. Since they all need existing states, I have to pass some to it:
def myAction(n: Int)(addOneState: String, appendBangState: Int): ((String, Int), String) = {
val (addOneState2, n2) = addOne(n).runState(addOneState)
val (appendBangState2, n3) = appendBang(n2.toString).runState(appendBangState)
((addOneState2, appendBangState2), n3)
}
I have to run addOne and appendBang one by one, passing and getting the states and result manually.
Although I found it can return another State, the code is not improved much:
def myAction(n: Int): State[(String, Int), String] = State {
case (addOneState: String, appendBangState: Int) =>
val (addOneState2, n2) = addOne(n).runState(addOneState)
val (appendBangState2, n3) = appendBang(n2.toString).runState( appendBangState)
((addOneState2, appendBangState2), n3)
}
Since I'm not quite familiar with them, just wondering is there any way to improve it. The best hope is that I can use for comprehension, but not sure if that's possible

Like I mentioned in my first comment, it will be impossible to use a for comprehension to do what you want, because it can not change the type of the state (S).
Remember that a for comprehension can be translated to a combination of flatMaps, withFilter and one map. If we look at your State.flatMap, it takes a function f to change a State[S,A] into State[S, B]. We can use flatMap and map (and thus a for comprehension) to chain together operations on the same state, but we can't change the type of the state in this chain.
We could generalize your last definition of myAction to combine, compose, ... two functions using state of a different type. We can try to implement this generalized compose method directly in our State class (although this is probably so specific, it probably doesn't belong in State). If we look at State.flatMap and myAction we can see some similarities:
We first call runState on our existing State instance.
We then call runState again
In myAction we first use the result n2 to create a State[Int, String] (AppendBang[String] or State[S2, B]) using the second function (appendBang or f) on which we then call runState. But our result n2 is of type String (A) and our function appendBang needs an Int (B) so we need a function to convert A into B.
case class State[S, A](runState: S => (S, A)) {
// flatMap and map
def compose[B, S2](f: B => State[S2, B], convert: A => B) : State[(S, S2), B] =
State( ((s: S, s2: S2) => {
val (sNext, a) = runState(s)
val (s2Next, b) = f(convert(a)).runState(s2)
((sNext, s2Next), b)
}).tupled)
}
You then could define myAction as :
def myAction(i: Int) = addOne(i).compose(appendBang, _.toString)
val twoStates = myAction(1)
// State[(String, Int),String] = State(<function1>)
twoStates.runState(("", 1))
// ((String, Int), String) = ((.,2),2 !!!)
If you don't want this function in your State class you can create it as an external function :
def combineStateFunctions[S1, S2, A, B](
a: A => State[S1, A],
b: B => State[S2, B],
convert: A => B
)(input: A): State[(S1, S2), B] = State(
((s1: S1, s2: S2) => {
val (s1Next, temp) = a(input).runState(s1)
val (s2Next, result) = b(convert(temp)).runState(s2)
((s1Next, s2Next), result)
}).tupled
)
def myAction(i: Int) =
combineStateFunctions(addOne, appendBang, (_: Int).toString)(i)
Edit : Bergi's idea to create two functions to lift a State[A, X] or a State[B, X] into a State[(A, B), X].
object State {
def onFirst[A, B, X](s: State[A, X]): State[(A, B), X] = {
val runState = (a: A, b: B) => {
val (nextA, x) = s.runState(a)
((nextA, b), x)
}
State(runState.tupled)
}
def onSecond[A, B, X](s: State[B, X]): State[(A, B), X] = {
val runState = (a: A, b: B) => {
val (nextB, x) = s.runState(b)
((a, nextB), x)
}
State(runState.tupled)
}
}
This way you can use a for comprehension, since the type of the state stays the same ((A, B)).
def myAction(i: Int) = for {
x <- State.onFirst(addOne(i))
y <- State.onSecond(appendBang(x.toString))
} yield y
myAction(1).runState(("", 1))
// ((String, Int), String) = ((.,2),2 !!!)

How to "extract" type parameter to instantiate another class

The following Scala code works:
object ReducerTestMain extends App {
type MapOutput = KeyVal[String, Int]
def mapFun(s:String): MapOutput = KeyVal(s, 1)
val red = new ReducerComponent[String, Int]((a: Int, b: Int) => a + b)
val data = List[String]("a", "b", "c", "b", "c", "b")
data foreach {s => red(mapFun(s))}
println(red.mem)
// OUTPUT: Map(a -> 1, b -> 3, c -> 2)
}
class ReducerComponent[K, V](f: (V, V) => V) {
var mem = Map[K, V]()
def apply(kv: KeyVal[K, V]) = {
val KeyVal(k, v) = kv
mem += (k -> (if (mem contains k) f(mem(k), v) else v))
}
}
case class KeyVal[K, V](key: K, value:V)
My problem is I would like to instantiate ReducerComponent like this:
val red = new ReducerComponent[MapOutput, Int]((a: Int, b: Int) => a + b)
or even better:
val red = new ReducerComponent[MapOutput](_ + _)
That means a lot of things:
I would like to type-check that MapOutput is of the type KeyVal[K, C],
I want to type-check that C is the same type used in f,
I also need to "extract" K in order to instantiate mem, and type-check parameters from apply.
Is it a lot to ask? :) I wanted to write something like
class ReducerComponent[KeyVal[K,V]](f: (V, V) => V) {...}
By the time I will instantiate ReducerComponent all I have is f and MapOutput, so inferring V is OK. But then I only have KeyVal[K,V] as a type parameter from a class, which can be different from KeyVal[_,_].
I know what I'm asking is probably crazy if you understand how type inference works, but I don't! And I don't even know what would be a good way to proceed --- apart from making explicit type declarations all the way up in my high-level code. Should I just change all the architecture?

Just write a simple factory:
case class RC[M <: KeyVal[_, _]](){
def apply[K,V](f: (V,V) => V)(implicit ev: KeyVal[K,V] =:= M) = new ReducerComponent[K,V](f)
}
def plus(x: Double, y: Double) = x + y
scala> RC[KeyVal[Int, Double]].apply(plus)
res12: ReducerComponent[Int,Double] = ReducerComponent#7229d116
scala> RC[KeyVal[Int, Double]]()(plus)
res16: ReducerComponent[Int,Double] = ReducerComponent#389f65fe
As you can see, ReducerComponent has appropriate type. Implicit evidence is used here to catch K and V from your M <: KeyVal[_, _].
P.S. The version above requires to specify parameter types explicitly for your f, like (_: Double) + (_: Double). If you want to avoid this:
case class RC[M <: KeyVal[_, _]](){
def factory[K,V](implicit ev: KeyVal[K,V] =:= M) = new {
def apply(f: (V,V) => V) = new ReducerComponent[K,V](f)
}
}
scala> RC[KeyVal[Int, Double]].factory.apply(_ + _)
res5: ReducerComponent[Int,Double] = ReducerComponent#3dc04400
scala> val f = RC[KeyVal[Int, Double]].factory
f: AnyRef{def apply(f: (Double, Double) => Double): ReducerComponent[Int,Double]} = RC$$anon$1#19388ff6
scala> f(_ + _)
res13: ReducerComponent[Int,Double] = ReducerComponent#24d8ae83
Update. If you want to generelize keyval - use type function:
type KV[K,V] = KeyVal[K,V] //may be anything, may implement `type KV[K,V]` from some supertrait
case class RC[M <: KV[_, _]](){
def factory[K,V](implicit ev: KV[K,V] =:= M) = new {
def apply(f: (V,V) => V) = new ReducerComponent[K,V](f)
}
}
But keep in mind that apply from your question still takes KeyVal[K,V].
You can also pass KV into some class:
class Builder[KV[_,_]] {
case class RC[M <: KV[_, _]](){
def factory[K,V](implicit ev: KV[K,V] =:= M) = new {
def apply(f: (V,V) => V) = new ReducerComponent[K,V](f)
}
}
}
scala> val b = new Builder[KeyVal]
scala> val f = b.RC[KeyVal[Int, Double]].factory
scala> f(_ + _)
res2: ReducerComponent[Int,Double] = ReducerComponent#54d9c993

You will need path-dependent types for this. I recommend the following:
First, write a trait that has your relevant types as members, so you can access them within definitions:
trait KeyValAux {
type K
type V
type KV = KeyVal[K, V]
}
Now you can create a factory for ReducerComponent:
object ReducerComponent {
def apply[T <: KeyValAux](f: (T#V, T#V) => T#V) =
new ReducerComponent[T#K, T#V](f)
}
Note that here, we can simply access the members of the type. We can't do this for type parameters.
Now, define your MapOutput in terms of KeyValAux (maybe another name is more appropriate for your use case):
type MapOutput = KeyValAux { type K = String; type V = Int }
def mapFun(s:String): MapOutput#KV = KeyVal(s, 1)
val red = ReducerComponent[MapOutput](_ + _)
UPDATE
As #dk14 mentions in the comments, if you still want the type-parameter syntax, you could do the following:
trait OutputSpec[KK, VV] extends KeyValAux {
type K = KK
type V = VV
}
You can then write:
type MapOutput = OutputSpec[String, Int]
Alternatively, you can write OutputSpec as a type function:
type OutputSpec[KK, VV] = KeyValAux { type K = KK; type V = VV }
This will not generate an additional unused class.

Make scala map addition generic

Using the solution here, I'm adding two maps together and treating them as if they were sparse vectors. So
def addTwoVectors(map1: Map[Int, Double], map2: Map[Int, Double]) = {
map1 ++ map2.map{ case (k,v) => k -> (v + map1.getOrElse(k,0)) }
}
Now I'd like to make this generic so that
def addTwoMaps[I, D <% Numeric[D]](m1: Map[I, D], m2: Map[I, D]) = {
m1 ++ m2.map{ case (k,v) => k -> (v + m1.getOrElse(k, 0.asInstanceOf[D])) }
}
Unfortunately, it doesn't seem to work:
error: type mismatch;
found : D
required: String
So how do I make this function generic?

How about this one?
import scala.math.Numeric.Implicits._
def addTwoMaps[I, D](m1: Map[I, D], m2: Map[I, D])(implicit numeric: scala.math.Numeric[D]) = {
m1 ++ m2.map{ case (k: I,v: D) => k -> (v + m1.getOrElse(k, numeric.zero)) }
}
Cause I don't know which one Numeric I have, I'm taking this info implicitly, and then importing zero method, which is specific for every Numeric type.
Actually, I do believe, that scalaz solution will be much more cleaner and hope that somebody will post it.

General 'map' function for Scala tuples?

I would like to map the elements of a Scala tuple (or triple, ...) using a single function returning type R. The result should be a tuple (or triple, ...) with elements of type R.
OK, if the elements of the tuple are from the same type, the mapping is not a problem:
scala> implicit def t2mapper[A](t: (A,A)) = new { def map[R](f: A => R) = (f(t._1),f(t._2)) }
t2mapper: [A](t: (A, A))java.lang.Object{def map[R](f: (A) => R): (R, R)}
scala> (1,2) map (_ + 1)
res0: (Int, Int) = (2,3)
But is it also possible to make this solution generic, i.e. to map tuples that contain elements of different types in the same manner?
Example:
class Super(i: Int)
object Sub1 extends Super(1)
object Sub2 extends Super(2)
(Sub1, Sub2) map (_.i)
should return
(1,2): (Int, Int)
But I could not find a solution so that the mapping function determines the super type of Sub1 and Sub2. I tried to use type boundaries, but my idea failed:
scala> implicit def t2mapper[A,B](t: (A,B)) = new { def map[X >: A, X >: B, R](f: X => R) = (f(t._1),f(t._2)) }
<console>:8: error: X is already defined as type X
implicit def t2mapper[A,B](t: (A,B)) = new { def map[X >: A, X >: B, R](f: X => R) = (f(t._1),f(t._2)) }
^
<console>:8: error: type mismatch;
found : A
required: X
Note: implicit method t2mapper is not applicable here because it comes after the application point and it lacks an explicit result type
implicit def t2mapper[A,B](t: (A,B)) = new { def map[X >: A, X >: B, R](f: X => R) = (f(t._1),f(t._2)) }
Here X >: B seems to override X >: A. Does Scala not support type boundaries regarding multiple types? If yes, why not?

I think this is what you're looking for:
implicit def t2mapper[X, A <: X, B <: X](t: (A,B)) = new {
def map[R](f: X => R) = (f(t._1), f(t._2))
}
scala> (Sub1, Sub2) map (_.i)
res6: (Int, Int) = (1,2)
A more "functional" way to do this would be with 2 separate functions:
implicit def t2mapper[A, B](t: (A, B)) = new {
def map[R](f: A => R, g: B => R) = (f(t._1), g(t._2))
}
scala> (1, "hello") map (_ + 1, _.length)
res1: (Int, Int) = (2,5)

I’m not a scala type genius but maybe this works:
implicit def t2mapper[X, A<:X, B<:X](t: (A,B)) = new { def map[A, B, R](f: X => R) = (f(t._1),f(t._2)) }

This can easily be achieved using shapeless, although you'll have to define the mapping function first before doing the map:
object fun extends Poly1 {
implicit def value[S <: Super] = at[S](_.i)
}
(Sub1, Sub2) map fun // typed as (Int, Int), and indeed equal to (1, 2)
(I had to add a val in front of i in the definition of Super, this way: class Super(val i: Int), so that it can be accessed outside)

The deeper question here is "why are you using a Tuple for this?"
Tuples are hetrogenous by design, and can contain an assortment of very different types. If you want a collection of related things, then you should be using ...drum roll... a collection!
A Set or Sequence will have no impact on performance, and would be a much better fit for this kind of work. After all, that's what they're designed for.

For the case when the two functions to be applied are not the same
scala> Some((1, "hello")).map((((_: Int) + 1 -> (_: String).length)).tupled).get
res112: (Int, Int) = (2,5)
The main reason I have supplied this answer is it works for lists of tuples (just change Some to List and remove the get).

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Group values by a key with any Monoid - scala

Related

flatmapping a nested Map in scala

How to compose two different `State Monad`?

How to "extract" type parameter to instantiate another class

Make scala map addition generic

General 'map' function for Scala tuples?

Categories

Resources