Get rid of Option[T] when getting values from Map[T, T] - Scala

I am writing my own Semigroup for Map[T, T]. The logic of the function is as follows:
if a key is present in both maps, the values should be combined in the result map;
if a key is present in only one map, its value should be added to the result map unchanged.
I wrote this function, but I ran into a problem: the get(key) method returns not just T, but Option[T].
What are the ways to solve this problem?
trait Semigroup[T]:
  extension (left: T) def combine(right: T): T
given Semigroup[Int] with
  extension (left: Int) def combine(right: Int): Int = ???
given Semigroup[String] with
  extension (left: String) def combine(right: String): String = ???
given [T]: Semigroup[List[T]] with
  extension (left: List[T]) def combine(right: List[T]): List[T] = ???
given [T: Semigroup]: Semigroup[Map[String, T]] with
  extension (left: Map[String, T])
    def combine(right: Map[String, T]): Map[String, T] = {
      // Does not compile: left.get(key) and right.get(key) are Option[T], not T
      (left.keySet ++ right.keySet) map { key => key -> (summon[Semigroup[T]].combine(left.get(key)) { right.get(key)} ) } toMap
    }

You do not want to remove the Option: it is what tells you whether you need to combine the values (because the key is present in both Maps) or just preserve the single value.
(left.keySet | right.keySet).iterator.map { key =>
  val value = (left.get(key), right.get(key)) match {
    case (Some(v1), Some(v2)) => v1.combine(v2)
    case (None, Some(v)) => v
    case (Some(v), None) => v
    case (None, None) => ??? // Should never happen.
  }
  key -> value
}.toMap
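A self-contained version of the pattern-matching approach, written as a minimal Scala 3 sketch. The addition-based Semigroup[Int], the sample maps and the error message in the unreachable case are assumptions made only for illustration. Inside the Map instance it calls summon[Semigroup[T]].combine(v1)(v2) explicitly instead of v1.combine(v2), which avoids depending on how extension methods are looked up from inside another instance's own combine.

```scala
trait Semigroup[T]:
  extension (left: T) def combine(right: T): T

// Assumption for this sketch: Int values combine by addition.
given Semigroup[Int] with
  extension (left: Int) def combine(right: Int): Int = left + right

// Keys found in both maps get their values combined with T's semigroup;
// keys found in only one map keep their value unchanged.
given [T: Semigroup]: Semigroup[Map[String, T]] with
  extension (left: Map[String, T])
    def combine(right: Map[String, T]): Map[String, T] =
      (left.keySet | right.keySet).iterator.map { key =>
        val value = (left.get(key), right.get(key)) match
          case (Some(v1), Some(v2)) => summon[Semigroup[T]].combine(v1)(v2)
          case (None, Some(v))      => v
          case (Some(v), None)      => v
          case (None, None)         => sys.error(s"unreachable: $key is in neither map")
        key -> value
      }.toMap

val merged = summon[Semigroup[Map[String, Int]]]
  .combine(Map("a" -> 1, "b" -> 2))(Map("b" -> 10, "c" -> 3))
// merged == Map("a" -> 1, "b" -> 12, "c" -> 3)
```

Keys a and c pass through untouched, while b, present in both maps, is combined as 2 + 10.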

Option is a good thing :) It protects you from crashing when a key is not in the map.
(left.keySet ++ right.keySet)
  .map { key => key -> (left.get(key), right.get(key)) }
  .flatMap { case (k, (l, r)) =>
    l.zip(r)
      .map { case (x, y) => summon[Semigroup[T]].combine(x)(y) }
      .orElse(l)
      .orElse(r)
      .map(k -> _)
  }.toMap
Or maybe make a special semigroup for options (I am not sure about the Scala 3 syntax, but something like this, I think):
given [T: Semigroup]: Semigroup[Option[T]] with
  extension (left: Option[T]) def combine(right: Option[T]): Option[T] =
    left.zip(right).map { case (l, r) => summon[Semigroup[T]].combine(l)(r) }
      .orElse(left)
      .orElse(right)
Then you can just write
(left.keySet ++ right.keySet).flatMap { key =>
  summon[Semigroup[Option[T]]].combine(left.get(key))(right.get(key)).map(key -> _)
}.toMap
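A runnable Scala 3 sketch of the Option-semigroup idea. The String instance (combining by concatenation) is an illustrative assumption, and mergeMaps is a hypothetical helper name; it uses flatMap so that the combined Option, which is always defined for a key taken from either key set, is unwrapped into a plain Map[String, T].

```scala
trait Semigroup[T]:
  extension (left: T) def combine(right: T): T

// Assumption for this sketch: String values combine by concatenation.
given Semigroup[String] with
  extension (left: String) def combine(right: String): String = left + right

// Both sides defined: combine them; otherwise keep whichever side is defined.
given [T: Semigroup]: Semigroup[Option[T]] with
  extension (left: Option[T]) def combine(right: Option[T]): Option[T] =
    left.zip(right).map { case (l, r) => summon[Semigroup[T]].combine(l)(r) }
      .orElse(left)
      .orElse(right)

// Hypothetical helper: merges two maps key by key using the Option semigroup.
def mergeMaps[T: Semigroup](left: Map[String, T], right: Map[String, T]): Map[String, T] =
  (left.keySet ++ right.keySet).flatMap { key =>
    summon[Semigroup[Option[T]]].combine(left.get(key))(right.get(key)).map(key -> _)
  }.toMap
```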

I just thought about a better option :) You should not have to deal with merging map keys explicitly ...
(left.toSeq ++ right.toSeq)
  .groupMapReduce(_._1)(_._2)(summon[Semigroup[T]].combine(_)(_))
groupMapReduce feels a little like black magic, but it's basically a combination of groupBy(_._1), mapValues(_.map(_._2)) (dropping the key from the grouped lists), and reduce(combine) (combining the grouped values in each list).
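A minimal Scala 3 sketch of the groupMapReduce version wrapped in a complete Map instance. The List instance (combining by concatenation) is an assumption for illustration only. Because left.toSeq comes first in the concatenation and groupMapReduce reduces in encounter order, the left map's value ends up first for a shared key.

```scala
trait Semigroup[T]:
  extension (left: T) def combine(right: T): T

// Assumption for this sketch: lists combine by concatenation.
given [A]: Semigroup[List[A]] with
  extension (left: List[A]) def combine(right: List[A]): List[A] = left ++ right

given [T: Semigroup]: Semigroup[Map[String, T]] with
  extension (left: Map[String, T])
    def combine(right: Map[String, T]): Map[String, T] =
      // Put all pairs in one sequence, group them by key, keep only the
      // values, and reduce each group with T's combine.
      (left.toSeq ++ right.toSeq)
        .groupMapReduce(_._1)(_._2)((a, b) => summon[Semigroup[T]].combine(a)(b))

val combined = summon[Semigroup[Map[String, List[Int]]]]
  .combine(Map("x" -> List(1), "y" -> List(2)))(Map("y" -> List(3), "z" -> List(4)))
// combined == Map("x" -> List(1), "y" -> List(2, 3), "z" -> List(4))
```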

Related

Combine two Maps with same key type, but different value type in scala

I want to combine two Maps with the same key type but different value types.
The values in the result should contain both value types (each optional, because some keys may be present in only one of the input Maps).
The type signature is:
def combineMaps[T, U, V](map1: Map[T, U], map2: Map[T, V]): Map[T, (Option[U], Option[V])] = {
  ???
}
I know it could be achieved with complicated code like:
(map1.mapValues(Some(_) -> None).toList ++ map2.mapValues(None -> Some(_)).toList) // List[(T, (Option[U], Option[V]))]
  .groupBy(_._1) // Map[T, List[(T, (Option[U], Option[V]))]]
  .mapValues(_.map(_._2)) // Map[T, List[(Option[U], Option[V])]]
  .mapValues { list => (
    list.collectFirst { case (Some(u), _) => u },
    list.collectFirst { case (_, Some(v)) => v }
  ) } // Map[T, (Option[U], Option[V])]
Although the code works, it does not take advantage of the fact that every key in a Map is present only once. The .toList call drops this information.
I am looking for an elegant Scala way to do it (possibly with cats/scalaz, but preferably without them).
Yes, cats has you covered: this is exactly what the Align typeclass provides.
You only need to do:
m1.align(m2)
That will return a Map[T, Ior[U, V]], which is better than a pair of Options since Ior preserves the fact that at least one of the two elements must exist.
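If cats is not on the classpath, the idea behind align can be sketched in plain Scala 3. These, OnlyLeft, OnlyRight, Both and alignMaps are hypothetical names for this sketch, not the cats API (cats' real type is Ior, with Left, Right and Both cases). Like Ior, the three-case type cannot represent "neither side present".

```scala
// Hypothetical stand-in for cats' Ior: a value is on the left, on the right,
// or on both sides; "neither side" is not representable.
sealed trait These[+A, +B]
object These:
  final case class OnlyLeft[+A](a: A) extends These[A, Nothing]
  final case class OnlyRight[+B](b: B) extends These[Nothing, B]
  final case class Both[+A, +B](a: A, b: B) extends These[A, B]

// Hypothetical helper (not a cats API): pairs up the values of two maps by key.
def alignMaps[K, U, V](m1: Map[K, U], m2: Map[K, V]): Map[K, These[U, V]] =
  (m1.keySet ++ m2.keySet).iterator.map { k =>
    val aligned: These[U, V] = (m1.get(k), m2.get(k)) match
      case (Some(u), Some(v)) => These.Both(u, v)
      case (Some(u), None)    => These.OnlyLeft(u)
      case (None, Some(v))    => These.OnlyRight(v)
      case (None, None)       => sys.error(s"unreachable: $k is in neither key set")
    k -> aligned
  }.toMap

val example = alignMaps(Map('a' -> 1, 'b' -> 2), Map('a' -> 10.0, 'c' -> 30.0))
```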
A union of the keySets followed by mapping the keys to tuples of gets should do it:
def combineMaps[T, U, V](m1: Map[T, U], m2: Map[T, V]): Map[T, (Option[U], Option[V])] =
  (m1.keySet union m2.keySet).map(k => (k, (m1.get(k), m2.get(k)))).toMap
Example:
combineMaps(Map('a'->1, 'b'->2, 'c'->3), Map('a'->10.0, 'c'->30.0, 'd'->40.0))
// res1: Map[Char,(Option[Int], Option[Double])] = Map(
// a -> (Some(1),Some(10.0)), b -> (Some(2),None), c -> (Some(3),Some(30.0)), d -> (None,Some(40.0))
// )

Compile time guarantee a map has a key for each enum case

Given the following enum:
enum Connector:
  case CHAdeMO
  case Mennekes
  case CCS
  case Tesla
Is there a type like Map[Connector, Int], but which would produce a compile error for:
Map(
  Connector.CHAdeMO -> 1,
  Connector.Mennekes -> 2,
  Connector.CCS -> 3,
)
That is, the map does not contain a key for Connector.Tesla. In other words, at compile time the type should be akin to ((Connector.CHAdeMO, Int), (Connector.Mennekes, Int), (Connector.CCS, Int), (Connector.Tesla, Int)), but it should behave like a regular Map otherwise.
Here's a solution using tuples:
opaque type CheckedMap <: Map[Connector, Int] = Map[Connector, Int]
type Contains[E, T <: Tuple] <: Boolean = T match {
  case EmptyTuple => false
  case h *: t =>
    h match {
      case (E, _) => true
      case _ => Contains[E, t]
    }
}
type ContainsAll[S <: Tuple, T <: Tuple] = S match {
  case EmptyTuple => DummyImplicit
  case h *: t =>
    Contains[h, T] match {
      case true => ContainsAll[t, T]
      case false => Nothing
    }
}
type AllConnectors = (
  Connector.CHAdeMO.type,
  Connector.Mennekes.type,
  Connector.CCS.type,
  Connector.Tesla.type
)
def checkedMap[T <: Tuple](t: T)(using
  @annotation.implicitNotFound(
    "Not all Connector types given."
  ) c: ContainsAll[AllConnectors, T]
): CheckedMap = t.toList.asInstanceOf[List[(Connector, Int)]].toMap
This takes a tuple of (Connector, Int) tuples and checks whether it contains all the types in another tuple listing every Connector type. If the input contains every connector type at least once, ContainsAll reduces to DummyImplicit and the compiler looks for an implicit DummyImplicit; otherwise it reduces to Nothing, and an implicit Nothing obviously cannot be found. The resulting default error message is quite lengthy and unhelpful, so I put in a custom error message. Note that this doesn't check for duplicate keys, but it could be trivially modified to do so.
Unfortunately, I found myself having to explicitly annotate the key-value pairs' types at the use site:
// Errors because Tesla is missing
checkedMap(
  (
    Connector.CHAdeMO -> 1: (Connector.CHAdeMO.type, Int),
    Connector.Mennekes -> 2: (Connector.Mennekes.type, Int),
    Connector.CCS -> 3: (Connector.CCS.type, Int)
  )
)
// Valid
checkedMap(
  (
    Connector.CHAdeMO -> 1: (Connector.CHAdeMO.type, Int),
    Connector.Mennekes -> 2: (Connector.Mennekes.type, Int),
    Connector.CCS -> 3: (Connector.CCS.type, Int),
    Connector.Tesla -> 4: (Connector.Tesla.type, Int)
  )
)
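The type-level recursion in Contains and ContainsAll mirrors an ordinary recursive check over values. The following Scala 3 sketch is only that value-level analogue, checked at runtime rather than at compile time, meant as a reading aid for the match types; contains, containsAll, partial and full are hypothetical names, and the one-line enum declaration is equivalent to the question's Connector.

```scala
enum Connector:
  case CHAdeMO, Mennekes, CCS, Tesla

// Value-level analogue of Contains[E, T]: is e the key of some pair?
def contains(e: Connector, pairs: List[(Connector, Int)]): Boolean = pairs match
  case Nil            => false
  case (k, _) :: rest => k == e || contains(e, rest)

// Value-level analogue of ContainsAll[S, T]: does every element of `all` appear as a key?
def containsAll(all: List[Connector], pairs: List[(Connector, Int)]): Boolean = all match
  case Nil    => true
  case h :: t => contains(h, pairs) && containsAll(t, pairs)

val partial = List(Connector.CHAdeMO -> 1, Connector.Mennekes -> 2, Connector.CCS -> 3)
val full    = partial :+ (Connector.Tesla -> 4)
```

Where the value-level version returns false, the type-level one reduces to Nothing and the implicit search fails.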

subsets manipulation on vectors in spark scala

I have an RDD curRdd of the form
res10: org.apache.spark.rdd.RDD[(scala.collection.immutable.Vector[(Int, Int)], Int)] = ShuffledRDD[102]
with curRdd.collect() producing the following result.
Array((Vector((5,2)),1), (Vector((1,1)),2), (Vector((1,1), (5,2)),2))
Here the key is a vector of pairs of ints and the value is a count.
Now, I want to convert it into another RDD of the same form RDD[(scala.collection.immutable.Vector[(Int, Int)], Int)] by percolating down the counts.
That is, (Vector((1,1), (5,2)),2) will contribute its count of 2 to any key which is a subset of it, so (Vector((5,2)),1) becomes (Vector((5,2)),3).
For the example above, our new RDD will have
(Vector((5,2)),3), (Vector((1,1)),4), (Vector((1,1), (5,2)),2)
How do I achieve this? Kindly help.
First, you can introduce a subsets operation for Seq:
implicit class SubSetsOps[T](val elems: Seq[T]) extends AnyVal {
  def subsets: Vector[Seq[T]] = elems match {
    case Seq() => Vector(elems)
    case elem +: rest => {
      val recur = rest.subsets
      recur ++ recur.map(elem +: _)
    }
  }
}
The empty subset will always be the first element in the result vector, so you can omit it with .tail.
Now your task is a fairly obvious map-reduce, which in RDD terms is flatMap followed by reduceByKey:
val result = curRdd
  .flatMap { case (keys, count) => keys.subsets.tail.map(_ -> count) }
  .reduceByKey(_ + _)
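The flatMap/reduceByKey pipeline can be tried without a Spark cluster. In this local sketch (an assumption-laden stand-in, not Spark code), a plain Seq replaces curRdd, groupMapReduce(_._1)(_._2)(_ + _) plays the role of reduceByKey(_ + _), and subsets is written as a Scala 3 extension method instead of the implicit class. On the question's data it reproduces the expected percolated counts.

```scala
extension [T](elems: Seq[T])
  // All subsets of elems (element order preserved), with the empty subset first.
  def subsets: Vector[Seq[T]] = elems match
    case Seq() => Vector(elems)
    case elem +: rest =>
      val recur = rest.subsets
      recur ++ recur.map(elem +: _)

// Local stand-in for curRdd.collect() from the question.
val cur: Seq[(Vector[(Int, Int)], Int)] = Seq(
  Vector((5, 2)) -> 1,
  Vector((1, 1)) -> 2,
  Vector((1, 1), (5, 2)) -> 2
)

// flatMap: each key sends its count to every non-empty subset of itself
// (.tail drops the empty subset); groupMapReduce stands in for reduceByKey.
val percolated: Map[Seq[(Int, Int)], Int] = cur
  .flatMap { case (keys, count) => keys.subsets.tail.map(_ -> count) }
  .groupMapReduce(_._1)(_._2)(_ + _)
```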
Update
This implementation can introduce new sets in the result. If you want to keep only those that were present in the original collection, you can join the result with the original:
val result = curRdd
  .flatMap { case (keys, count) => keys.subsets.tail.map(_ -> count) }
  .reduceByKey(_ + _)
  .join(curRdd map identity[(Seq[(Int, Int)], Int)])
  .map { case (key, (v, _)) => (key, v) }
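To see why the join matters, here is a local sketch with plain collections standing in for RDDs and a filter standing in for the join. The hypothetical input contains only the largest set, so percolation alone produces two keys that were never in the input, and the filter drops them.

```scala
extension [T](elems: Seq[T])
  // All subsets of elems (element order preserved), with the empty subset first.
  def subsets: Vector[Seq[T]] = elems match
    case Seq() => Vector(elems)
    case elem +: rest =>
      val recur = rest.subsets
      recur ++ recur.map(elem +: _)

// Hypothetical input: only the largest set is present.
val original: Seq[(Vector[(Int, Int)], Int)] = Seq(Vector((1, 1), (5, 2)) -> 2)

// Percolation alone creates the keys Vector((1,1)) and Vector((5,2)).
val percolated: Map[Seq[(Int, Int)], Int] = original
  .flatMap { case (keys, count) => keys.subsets.tail.map(_ -> count) }
  .groupMapReduce(_._1)(_._2)(_ + _)

// Stand-in for the join with curRdd: keep only keys that occur in the original data.
val originalKeys: Set[Seq[(Int, Int)]] = original.map(_._1).toSet
val kept = percolated.filter { case (key, _) => originalKeys.contains(key) }
```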
Note that the map identity step is needed to convert the key type from Vector[_] to Seq[_] in the original RDD. You can instead modify the SubSetsOps definition, substituting all occurrences of Seq[T] with Vector[T], or change the definition in the following (hardcore scala.collection) way, which relies on SeqLike and CanBuildFrom from the Scala 2.12 collections (both were removed in 2.13):
import scala.collection.SeqLike
import scala.collection.generic.CanBuildFrom
implicit class SubSetsOps[T, F[e] <: SeqLike[e, F[e]]](val elems: F[T]) extends AnyVal {
  def subsets(implicit cbf: CanBuildFrom[F[T], T, F[T]]): Vector[F[T]] = elems match {
    case Seq() => Vector(elems)
    case elem +: rest => {
      val recur = rest.subsets
      recur ++ recur.map(elem +: _)
    }
  }
}

Specialization of Scala methods to specific tags

I have a generic map with values, some of which can be in turn lists of values.
I'm trying to process a given key and convert the results to the type expected by an outside caller, like this:
// A map with some values being other collections.
val map: Map[String, Any] = Map("foo" -> 1, "bar" -> Seq('a', 'b', 'a'))
// A generic method with a "specialization" for collections (pseudocode)
def cast[T](key: String) = map.get(key).map(_.asInstanceOf[T])
def cast[C <: Iterable[T]](key: String) = map.get(key).map(list => list.to[C].map(_.asInstanceOf[T]))
// Expected usage
cast[Int]("foo") // Should return 1:Int
cast[Set[Char]]("bar") // Should return Set[Char]('a', 'b')
This is to show what I would like to do, but it does not work. The compiler complains (correctly) about two possible matches. I've also tried to make this a single function with some sort of pattern match on the type, to no avail.
I've been reading about @specialized, TypeTag, CanBuildFrom and other Scala functionality, but I failed to find a simple way to put it all together. The separate examples I've found address different pieces or use ugly workarounds, but nothing would simply allow an external user to call cast and get an exception if the cast was invalid. Some stuff is also old; I'm using Scala 2.10.5.
This appears to work, but it has some problems.
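def cast[T](m: Map[String, Any], k: String): T = m(k) match {
  // Unchecked: T is erased at runtime, so this case always matches; a wrong type
  // may only surface later, typically as a ClassCastException where the result is used as a T.
  case x: T => x
}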
With the right input you get the correct output.
scala> cast[Int](map,"foo")
res18: Int = 1
scala> cast[Set[Char]](map,"bar")
res19: Set[Char] = Set(a, b)
But it throws if the type is wrong for the key or if the map has no such key (of course).
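One way to make the failure explicit instead of throwing is a ClassTag context bound, which turns case x: T into a real runtime class check. This is a swapped-in technique, not what the answer above uses, and castOpt is a hypothetical name. Only the outer class is checked: the Char in Set[Char] is still erased, so any Set would pass. scala.reflect.ClassTag exists since Scala 2.10, so the same idea should also be usable on 2.10.5, although this sketch was written against current Scala.

```scala
import scala.reflect.ClassTag

val map: Map[String, Any] = Map("foo" -> 1, "bar" -> Set('a', 'b'))

// With a ClassTag in scope, `case x: T` is a real runtime class check rather
// than an unchecked one; a missing key or a wrong class yields None.
def castOpt[T: ClassTag](m: Map[String, Any], k: String): Option[T] =
  m.get(k).collect { case x: T => x }

println(castOpt[Int](map, "foo"))       // Some(1)
println(castOpt[String](map, "foo"))    // None
println(castOpt[Set[Char]](map, "bar")) // Some(Set(a, b))
println(castOpt[Int](map, "missing"))   // None
```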
You can do this via implicit parameters:
val map: Map[String, Any] = Map("foo" -> 1, "bar" -> Set('a', 'b'))
abstract class Casts[B] { def cast(a: Any): B }
implicit val doubleCast = new Casts[Double] {
  override def cast(a: Any): Double = a match {
    case x: Int => x.toDouble
  }
}
implicit val intCast = new Casts[Int] {
  override def cast(a: Any): Int = a match {
    case x: Int => x
    case x: Double => x.toInt
  }
}
implicit val seqCharCast = new Casts[Seq[Char]] {
  override def cast(a: Any): Seq[Char] = a match {
    case x: Set[Char] => x.toSeq
    case x: Seq[Char] => x
  }
}
def cast[T](key: String)(implicit p: Casts[T]) = p.cast(map(key))
println(cast[Double]("foo")) // <- 1.0
println(cast[Int]("foo")) // <- 1
println(cast[Seq[Char]]("bar")) // <- ArrayBuffer(a, b) which is Seq(a, b)
But you still need to enumerate every source-to-target type combination. That is reasonable: Set('a', 'b').asInstanceOf[Seq[Char]] throws, so there is no universal cast and such cases have to be handled individually.
Still, this sounds like overkill, and you may want to review your approach from a broader perspective.

Scala - reduce/foldLeft

I have a nested map m which is like:
m = Map("email" -> "a@b.com", "background" -> Map("language" -> "english"))
I have an array arr = Array("background","language")
How do I foldLeft/reduce the array and find the string "english" from the map? I tried this:
arr.foldLeft(m) { (acc,x) => acc.get(x) }
But I get this error:
<console>:10: error: type mismatch;
found : Option[java.lang.Object]
required: scala.collection.immutable.Map[java.lang.String,java.lang.Object]
arr.foldLeft(m) { (acc,x) => acc.get(x) }
You should pay attention to types. Here, you start with m: Map[String, Any] as your acc. You combine it with a string x and call get, which returns an Option[Object]. To continue, you must check that there is a value, check whether this value is a Map, and cast it (an unchecked cast because of type erasure, hence dangerous).
I believe the real problem is that the type of your structure, Map[String, Any], represents what you have rather poorly.
Suppose you do instead
sealed trait Tree
case class Node(items: Map[String, Tree]) extends Tree
case class Leaf(s: String) extends Tree
You may add some helpers to make declaring a Tree easy
import scala.language.implicitConversions
object Tree {
  implicit def fromString(s: String): Leaf = Leaf(s)
  implicit def fromNamedString(nameAndValue: (String, String)): (String, Tree) =
    (nameAndValue._1, Leaf(nameAndValue._2))
}
object Node {
  def apply(items: (String, Tree)*): Node = Node(Map(items: _*))
}
Then declaring the tree is just as easy as your first version, but the type is much more precise:
m = Node("email" -> "a@b.com", "background" -> Node("language" -> "english"))
You can then add methods, for instance in trait Tree:
def get(path: String*): Option[Tree] = {
  if (path.isEmpty) Some(this)
  else this match {
    case Leaf(_) => None
    case Node(map) => map.get(path.head).flatMap(_.get(path.tail: _*))
  }
}
def getLeaf(path: String*): Option[String] =
  get(path: _*).collect { case Leaf(s) => s }
Or, if you would rather do it with a fold:
def get(path: String*) = path.foldLeft[Option[Tree]](Some(this)) {
  case (Some(Node(map)), p) => map.get(p)
  case _ => None
}
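A complete, runnable Scala 3 sketch of the Tree approach using the fold-based get. To stay self-contained, it builds leaves with Leaf(...) explicitly instead of relying on the implicit conversions above, and its Node companion calls new Node(...) to sidestep overload resolution against the synthesized apply. The sample data mirrors the question.

```scala
sealed trait Tree:
  // Follow the path one key at a time; stepping into a Leaf or a missing key gives None.
  def get(path: String*): Option[Tree] =
    path.foldLeft[Option[Tree]](Some(this)) {
      case (Some(Node(map)), p) => map.get(p)
      case _                    => None
    }

  // Like get, but succeeds only if the path ends at a Leaf.
  def getLeaf(path: String*): Option[String] =
    get(path*).collect { case Leaf(s) => s }

case class Node(items: Map[String, Tree]) extends Tree
case class Leaf(s: String) extends Tree

object Node:
  // Convenience constructor so a tree can be written as nested key -> value pairs.
  def apply(items: (String, Tree)*): Node = new Node(items.toMap)

val m: Tree = Node(
  "email" -> Leaf("a@b.com"),
  "background" -> Node("language" -> Leaf("english"))
)
val arr = Array("background", "language")
val english = m.getLeaf(arr.toSeq*) // Some(english)
```

Because each step returns an Option, a wrong or too-long path simply ends in None instead of requiring casts.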
Folding as an abstraction over nested maps isn't really supported. Also, you're approaching this in a way that is going to prevent the type system from giving you much help. But, if you insist, then you want a recursive function:
def lookup(m: Map[String, Object], a: Array[String]): Option[String] = {
  if (a.length == 0) None
  else m.get(a(0)).flatMap(_ match {
    case mm: Map[_, _] => lookup(mm.asInstanceOf[Map[String, Object]], a.tail)
    case s: String if (a.length == 1) => Some(s)
    case _ => None
  })
}