find set of keys in Scala map where values overlap - scala

I'm working with a map object in Scala where the key is a basket ID and the value is the set of item IDs contained in that basket. The goal is to ingest this map and compute, for each basket, the set of other basket IDs that contain at least one common item.
Say the input map object is
val basket = Map("b1" -> Set("i1", "i2", "i3"), "b2" -> Set("i2", "i4"), "b3" -> Set("i3", "i5"), "b4" -> Set("i6"))
Is it possible to perform the computation in Spark such that I get the intersecting basket information back? For example
val intersects = Map("b1" -> Set("b2", "b3"), "b2" -> Set("b1"), "b3" -> Set("b1"), "b4" -> Set())
Thanks!

Something like...
val basket = Map("b1" -> Set("i1", "i2", "i3"), "b2" -> Set("i2", "i4"), "b3" -> Set("i3", "i5"), "b4" -> Set("i6"))
def intersectKeys( set : Set[String], map : Map[String,Set[String]] ) : Set[String] = {
val checks = map.map { case (k, v) =>
if (set.intersect(v).nonEmpty) Some(k) else None
}
checks.collect { case Some(k) => k }.toSet
}
// each set picks up its own key, which we don't want, so we subtract it back out
val intersects = basket.map { case (k,v) => (k, intersectKeys(v, basket) - k) }
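For larger inputs, one possible alternative (a sketch only, not part of the answer above) is to invert the map into item -> baskets first, so that each basket only looks at baskets sharing one of its items instead of being compared against every other basket. The names BasketOverlap, overlaps and byItem are made up for illustration.

```scala
object BasketOverlap {
  // Hypothetical helper: for each basket, the other baskets sharing at least one item.
  def overlaps(baskets: Map[String, Set[String]]): Map[String, Set[String]] = {
    // Inverted index: item ID -> IDs of all baskets containing that item.
    val byItem: Map[String, Set[String]] =
      baskets.toSeq
        .flatMap { case (basketId, items) => items.toSeq.map(item => item -> basketId) }
        .groupBy(_._1)
        .map { case (item, pairs) => item -> pairs.map(_._2).toSet }

    // Union the baskets of every item, then remove the basket itself.
    baskets.map { case (basketId, items) =>
      basketId -> (items.flatMap(item => byItem(item)) - basketId)
    }
  }
}
```

Calling BasketOverlap.overlaps(basket) on the sample input should give the same map as the intersects value above.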

Related

Count occurrences of key values from several maps grouped by a key in Scala 2.11.x

Imagine the following list of maps (which could be potentially longer):
List(
Map[String,String]("wind"->"high", "rain"->"heavy", "class"->"very late"),
Map[String,String]("wind"->"none", "rain"->"slight", "class"->"on time"),
Map[String,String]("wind"->"high", "rain"->"none", "class"->"very late"),
...
)
How can I get to the following form:
Map("very late" -> Set(("wind",Map("high" -> 2)), ("rain",Map("heavy" -> 1, "none" -> 1))),
"on time" -> Set(("wind",Map("none" -> 1)), ("rain",Map("slight" -> 1))))
This will get you what you want.
val maps = List(...)
maps.groupBy(_.getOrElse("class","no-class"))
.mapValues(_.flatMap(_ - "class").groupBy(_._1)
.mapValues(_.map(_._2).groupBy(identity)
.mapValues(_.length)
).toSet
)
The problem is, what you want isn't a good place to be.
The result type is Map[String,Set[(String,Map[String,Int])]] which is a terrible hodgepodge of collection types. Why Set? What purpose does that serve? How is this useful? How do you retrieve meaningful data from it?
This looks like an XY problem.
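To make the retrieval concern concrete, here is a rough sketch that builds nested maps (class -> attribute -> value -> count) instead of a Set of tuples, so a single count can be looked up with chained get calls. The object name WeatherCounts and the value names are assumptions for illustration, and the input is the three sample rows from the question.

```scala
object WeatherCounts {
  val maps: List[Map[String, String]] = List(
    Map("wind" -> "high", "rain" -> "heavy", "class" -> "very late"),
    Map("wind" -> "none", "rain" -> "slight", "class" -> "on time"),
    Map("wind" -> "high", "rain" -> "none", "class" -> "very late")
  )

  // class -> attribute -> value -> number of occurrences
  val counts: Map[String, Map[String, Map[String, Int]]] =
    maps
      .filter(_.contains("class"))            // rows without a class are skipped
      .groupBy(row => row("class"))
      .map { case (cls, rows) =>
        val byAttribute = rows
          .flatMap(row => row - "class")      // List of (attribute, value) pairs
          .groupBy(_._1)
          .map { case (attr, pairs) =>
            attr -> pairs.groupBy(_._2).map { case (value, ps) => value -> ps.size }
          }
        cls -> byAttribute
      }

  // A direct lookup, which a Set of tuples would not offer without a search.
  val highWindWhenVeryLate: Int =
    counts.get("very late").flatMap(_.get("wind")).flatMap(_.get("high")).getOrElse(0)
}
```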
Here are two versions.
First using Set, which looks like what you want,
val grouped2 = maps.foldLeft(Map.empty[String, Set[(String, Map[String, Int])]]) {
case (acc, map) =>
map.get("class").fold(acc) { key =>
val keyOpt = acc.get(key)
if (keyOpt.isDefined) {
val updatedSet = (map - "class").foldLeft(Set.empty[(String, Map[String, Int])]) {
case (setAcc, (k1, v1)) =>
keyOpt.flatMap(_.find(_._1 == k1)).map { tup =>
setAcc + ((k1, tup._2.get(v1).fold(tup._2 ++ Map(v1 -> 1))(v => tup._2 + ((v1, v + 1)))))
}.getOrElse(setAcc + (k1 -> Map(v1 -> 1)))
}
acc.updated(key, updatedSet)
} else {
acc + (key -> (map - "class").map(tup => (tup._1, Map(tup._2 -> 1))).toSet)
}
}
}
and then using a Map,
val grouped1 = maps.foldLeft(Map.empty[String, Map[String, Map[String, Int]]]) {
case (acc, map) =>
map.get("class").fold(acc) { key =>
val keyOpt = acc.get(key)
if (keyOpt.isDefined) {
val updatedMap = (map - "class").foldLeft(Map.empty[String, Map[String, Int]]) {
case (mapAcc, (k1, v1)) =>
keyOpt.flatMap(_.get(k1)).map{ statMap =>
mapAcc + ((k1, statMap.get(v1).fold(statMap ++ Map(v1 -> 1))(v => statMap + ((v1, v + 1)))))
}.getOrElse(mapAcc + (k1 -> Map(v1 -> 1)))
}
acc.updated(key, updatedMap)
} else {
acc + (key -> (map - "class").map(tup => (tup._1, Map(tup._2 -> 1))))
}
}
}
I was playing with the Map version and changed it to use Set. I doubt I will understand everything above at a glance a few days from now, so I have tried to keep it as readable as I could. Adapt this to your own solution or wait for other answers.
Here goes another implementation, which uses the keySet method of Map to join two maps together.
Also, note that I changed the output type from a Map of Sets of tuples (whose second element is another Map) to three nested Maps, which, IMHO, makes more sense.
def groupMaps[K, V](groupingKey: K, data: List[Map[K, V]]): Map[V, Map[K, Map[V, Int]]] =
data.foldLeft(Map.empty[V, Map[K, Map[V, Int]]]) {
case (acc, map) =>
map.get(key = groupingKey).fold(ifEmpty = acc) { groupingValue =>
val newValues = (map - groupingKey).map {
case (key, value) =>
key -> Map(value -> 1)
}
val finalValues = acc.get(key = groupingValue).fold(ifEmpty = newValues) { oldValues =>
(oldValues.keySet | newValues.keySet).iterator.map { key =>
val oldMap = oldValues.getOrElse(key = key, default = Map.empty[V, Int])
val newMap = newValues.getOrElse(key = key, default = Map.empty[V, Int])
val finalMap = (oldMap.keySet | newMap.keySet).iterator.map { value =>
val oldCount = oldMap.getOrElse(key = value, default = 0)
val newCount = newMap.getOrElse(key = value, default = 0)
value -> (oldCount + newCount)
}.toMap
key -> finalMap
}.toMap
}
acc.updated(key = groupingValue, finalValues)
}
}
Which can be used like this:
val maps =
List(
Map("wind" -> "none", "rain" -> "none", "class" -> "on time"),
Map("wind" -> "none", "rain" -> "slight", "class" -> "on time"),
Map("wind" -> "none", "rain" -> "slight", "class" -> "late"),
Map("wind" -> "none", "rain" -> "slight")
)
val result = groupMaps(groupingKey = "class", maps)
// val result: Map[String, Map[String, Map[String, Int]]] =
// Map(
// on time -> Map(wind -> Map(none -> 2), rain -> Map(none -> 1, slight -> 1)),
// late -> Map(wind -> Map(none -> 1), rain -> Map(slight -> 1))
// )
If you need to keep the output type you asked for, you can just call .mapValues(_.toSet) on the result of the foldLeft.

Scala Map - Use map function to replace key->value

I want to change the keys and values for the keys key1 and key2 only when their values are val1 and val2 (both these mappings should be present for the transformation to take place). I am able to do it using the following code, but I do not think this is very elegant or efficient.
Is there a better way to do the same thing, perhaps using just one .map function applied over map?
Code:
val map = Map(
"key1" -> "val1",
"key2" -> "val2",
"otherkey1" -> "otherval1"
)
val requiredKeys = List("key1", "key2")
val interestingMap = map.filterKeys(requiredKeys.contains) // will give ("key1" -> "val1", "key2" -> "val2").
val changedIfMatched =
if (interestingMap.get("key1").get.equalsIgnoreCase("val1") && interestingMap.get("key2").get.equalsIgnoreCase("val2"))
Map("key1" -> "newval1", "key2" -> "newval2")
else
interestingMap
print(map ++ changedIfMatched) // to replace the old key->values with the new ones, if any.
Also, can the ++ operation used to update the old key->value mappings be made more efficient?
Just do the check ahead of time:
map
.get("key1").filter(_.equalsIgnoreCase("val1"))
.zip(map.get("key2").filter(_.equalsIgnoreCase("val2")))
.headOption
.fold(map) { _ =>
map ++ Map("key1" -> "newVal1", "key2" -> "newVal2")
}
Here's an approach that checks that both key value pairs match.
EDIT: Added a mapValues method to the Map class. This technique can be used to do further checks on the values of the map.
val m = Map("key1" -> "val1", "key2" -> "VAL2", "otherkey1" -> "otherval1")
val oldKVs = Map("key1" -> "val1", "key2" -> "val2")
val newKVs = Map("newkey1" -> "newval1", "newkey2" -> "newval2")
implicit class MapImp[T,S](m: Map[T,S]) {
def mapValues[R](f: S => R) = m.map { case (k,v) => (k, f(v)) }
def subsetOf(m2: Map[T,S]) = m.toSet subsetOf m2.toSet
}
def containsKVs[T](m: Map[T,String], sub: Map[T,String]) =
sub.mapValues(_.toLowerCase) subsetOf m.mapValues(_.toLowerCase)
val m2 = if (containsKVs(m, oldKVs)) m -- oldKVs.keys ++ newKVs else m
println(m2)
// Map(otherkey1 -> otherval1, newkey1 -> newval1, newkey2 -> newval2)
It takes advantage of the fact that you can convert Maps into Sets of Tuple2.
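As a small illustration of that conversion (the object and value names here are made up), a Map turned into a Set contains one (key, value) tuple per entry, so subsetOf only succeeds when both the key and the value match:

```scala
object MapAsSet {
  // Each Map entry becomes one (key, value) tuple in the Set.
  val small: Set[(String, Int)] = Map("a" -> 1).toSet
  val big: Set[(String, Int)]   = Map("a" -> 1, "b" -> 2).toSet
  val other: Set[(String, Int)] = Map("a" -> 99, "b" -> 2).toSet

  val contained: Boolean    = small.subsetOf(big)   // key and value both present
  val notContained: Boolean = small.subsetOf(other) // key present, but value differs
}
```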
I think this will be the most generic and reusable solution for the problem.
object Solution1 extends App {
val map = Map(
"key1" -> "val1",
"key2" -> "val2",
"otherkey1" -> "otherval1"
)
implicit class MapUpdate[T](map: Map[T, T]) {
def updateMapForGivenKeyValues: (Iterable[(T, T)], Iterable[(T, T)]) => Map[T, T] =
(fromKV: Iterable[(T, T)], toKV: Iterable[(T, T)]) => {
val isKeyValueExist: Boolean = fromKV.toIterator.forall {
(oldKV: (T, T)) =>
map.toIterator.contains(oldKV)
}
if (isKeyValueExist) map -- fromKV.map(_._1) ++ toKV else map
}
}
val updatedMap = map.updateMapForGivenKeyValues(List("key1" -> "val1", "key2" -> "val2"),
List("newKey1" -> "newVal1", "newKey2" -> "newVal2"))
println(updatedMap)
}
So the method updateMapForGivenKeyValues takes a list of old key-value tuples and a list of new key-value tuples. The map is updated with the new pairs from the second parameter only if every pair in the first parameter exists in the map. Because the method is generic, it can be used with any data type, such as String, Int, or a case class.
We can easily reuse the method for different types of maps without changing a single line of code.
Answer to modified question
val map = Map(
"key1" -> "val1",
"key2" -> "val2",
"otherkey1" -> "otherval1"
)
val requiredVals = List("key1"->"val1", "key2"->"val2")
val newVals = List("newval1", "newval2")
val result =
if (requiredVals.forall{ case (k, v) => map.get(k).exists(_.equalsIgnoreCase(v)) }) {
map ++ requiredVals.map(_._1).zip(newVals)
} else {
map
}
This solution uses forall to check that all the key/value pairs in requiredVals are found in the map by testing each pair in turn.
For each key/value pair (k, v) it does a get on the map using the key to retrieve the current value as an Option[String]. This will be None if the key is not found or Some(s) if the key is found.
The code then calls exists on the Option[String]. This method returns false if the value is None (the key is not found); otherwise it returns the result of the test that is passed to it. The test is _.equalsIgnoreCase(v), which does a case-insensitive comparison of the contents of the Option (_) and the value from the requiredVals list (v).
If this test fails for any pair, the original map is returned.
If it succeeds for every pair, a modified version of the map is returned. The expression requiredVals.map(_._1) returns the keys from the requiredVals list, and zip(newVals) associates the new values with the original keys. The resulting list of pairs is added to the map using ++, which replaces the existing values with the new ones.
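The behaviour of get followed by exists, and of the zip step, can be checked in isolation. This is only an illustrative sketch; the object name OptionExistsDemo and its sample map are not from the answer.

```scala
object OptionExistsDemo {
  val map = Map("key1" -> "VAL1")

  // Key found and value matches ignoring case: Some("VAL1").exists(...) is true.
  val matches: Boolean = map.get("key1").exists(_.equalsIgnoreCase("val1"))

  // Key found but value differs: the predicate runs and returns false.
  val wrongValue: Boolean = map.get("key1").exists(_.equalsIgnoreCase("val2"))

  // Key missing: get returns None, so exists is false without running the predicate.
  val missingKey: Boolean = map.get("key2").exists(_.equalsIgnoreCase("val2"))

  // zip pairs each original key with the new value at the same position.
  val rezipped: List[(String, String)] =
    List("key1", "key2").zip(List("newval1", "newval2"))
}
```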
Original answer
val map = Map(
"key1" -> "val1",
"key2" -> "val2",
"otherkey1" -> "otherval1"
)
val requiredVals = Map("key1"->"val1", "key2"->"val2")
val newVals = Map("newkey1" -> "newval1", "newkey2" -> "newval2")
val result =
if (requiredVals.forall{ case (k, v) => map.get(k).exists(_.equalsIgnoreCase(v)) }) {
map -- requiredVals.keys ++ newVals
} else {
map
}
Note that this replaces the old keys with the new keys, which appears to be what is described. If you want to keep the original keys and values, just delete "-- requiredVals.keys" and it will add the new keys without removing the old ones.
You can use the following code:
val interestingMap =
if(map.getOrElse("key1", "") == "val1" && map.getOrElse("key2", "") == "val2")
map - "key1" - "key2" + ("key1New" -> "val1New") + ("key2New" -> "val2New")
else map
The check (the if condition) can be tweaked to suit your specific need.
If either of these key-value pairs is not present in the map, the original map is returned; otherwise, you get a new map with both entries replaced.
Regarding efficiency, as long as there are only two keys to be updated, I do not think there is a real performance difference between using + to add elements directly and using ++ operator to overwrite the keys wholesale. If your map is huge though, maybe using a mutable map proves to be a better option in the long run.

How to create pair of keys of Map

For example:
mylist: Map("Start" -> 2015-05-30T00:00:00.000Z, "Daily" -> 2015-06-02T00:00:00.000Z, "Hourly" -> 2015-06-03T08:00:00.000Z, "End" -> 2015-06-04T15:00:00.000Z)
I want to output as following format:
myout: List( ("Start" -> 2015-05-30T00:00:00.000Z, "Daily" -> 2015-06-02T00:00:00.000Z), ("Daily" -> 2015-06-02T00:00:00.000Z, "Hourly" -> 2015-06-03T08:00:00.000Z), ("Hourly" -> 2015-06-03T08:00:00.000Z, "End" -> 2015-06-04T15:00:00.000Z) )
OR
myout: List( ("Start", "Daily"), ("Daily", "Hourly"), ("Hourly", "End"))
Case 1: Always start with the "Start" key and ignore anything that comes before it. Likewise, stop at the "End" key and ignore anything that comes after it.
mylist: Map(Hourly -> 2015-06-01T08:00:00.000Z, Start -> 2015-05-30T00:00:00.000Z, Daily -> 2015-06-02T00:00:00.000Z, End -> 2015-06-04T15:00:00.000Z, Weekly-> 2015-06-05T00:00:00.000Z)
The output should look like:
List((Start, Daily), (Daily, End))
I am looking for a way to produce this output in Scala.
import scala.collection.immutable.ListMap
val x = ListMap("Start" -> "x", "Daily" -> "y", "Hourly" -> "z", "End" -> "a")
x.toList.sliding(2).map( a => (a(0)._1, a(1)._1)).toList
List((Start,Daily), (Daily,Hourly), (Hourly,End))
Since a Map is not ordered, I have modified the input data to get stable results.
As for the 1st question
val m =
Map(
"1-Start" -> "2015-05-30T00:00:00.000Z",
"2-Daily" -> "2015-06-02T00:00:00.000Z",
"3-Hourly" -> "2015-06-03T08:00:00.000Z",
"4-End" -> "2015-06-04T15:00:00.000Z")
The basic idea is to zip the list of keys with its own tail to get the pairs:
scala> m.keys.toList.sorted.zip(m.keys.toList.sorted.tail)
res57: List[(String, String)] = List((1-Start,2-Daily), (2-Daily,3-Hourly),
(3-Hourly,4-End))
To simplify the expression a "pipe forward operator" is helpful:
object PipeForwardContainer {
implicit class PipeForward[T](val v: T) extends AnyVal {
def |>[R](f: T => R): R = {
f(v)
}
}
}
import PipeForwardContainer._
This operator provides a reference to the intermediate result. Therefore you can write:
scala> m.keys.toList.sorted |> { l => l.zip(l.tail) }
res97: List[(String, String)] = List((1-Start,2-Daily), (2-Daily,3-Hourly),
(3-Hourly,4-End))
As for the 2nd question
val m =
Map(
"1-Hourly" -> "2015-06-03T08:00:00.000Z",
"2-Start" -> "2015-05-30T00:00:00.000Z",
"3-Daily" -> "2015-06-02T00:00:00.000Z",
"4-End" -> "2015-06-04T15:00:00.000Z",
"5-Weekly"-> "2015-06-05T00:00:00.000Z")
To get the raw list you can slice out the relevant elements by index:
scala> m.keys.toList.sorted |> { l =>
l.slice(l.indexOf("2-Start"), l.indexOf("4-End") + 1) }
res96: List[String] = List(2-Start, 3-Daily, 4-End)
Again with zip to get the pairs:
scala> m.keys.toList.sorted |> { l =>
l.slice(l.indexOf("2-Start"), l.indexOf("4-End") + 1)
} |> { l => l.zip(l.tail) }
res98: List[(String, String)] = List((2-Start,3-Daily), (3-Daily,4-End))
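If the input can be held in a ListMap, which keeps insertion order as in the first answer, the Start/End trimming can be done without renaming the keys. The following is a minimal sketch under that assumption; KeyPairs and pairsBetween are hypothetical names.

```scala
import scala.collection.immutable.ListMap

object KeyPairs {
  // Hypothetical helper: consecutive key pairs from `start` up to and including `end`.
  // Returns Nil if `start` is missing or `end` does not occur at or after `start`.
  def pairsBetween[V](m: ListMap[String, V], start: String, end: String): List[(String, String)] = {
    val fromStart = m.keys.toList.dropWhile(_ != start) // ignore everything before `start`
    val endIndex  = fromStart.indexOf(end)
    val window    = if (endIndex >= 0) fromStart.take(endIndex + 1) else Nil
    window.zip(window.drop(1))                          // pair each key with its successor
  }
}
```

For example, KeyPairs.pairsBetween(ListMap("Hourly" -> 1, "Start" -> 2, "Daily" -> 3, "End" -> 4, "Weekly" -> 5), "Start", "End") should yield List((Start,Daily), (Daily,End)).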

How to use Reduce on Scala

I am using Scala to implement an algorithm. I have a case where I need to handle the following scenario:
test = Map(t -> List((t,2)), B -> List((B,3), (B,1)), D -> List((D,1)))
I need to sum the second member of all tuples that share a key.
The desired result :
Map((t,2),(B,4),(D,1))
val resReduce = test.foldLeft(Map.empty[String, List[Map.empty[String, Int]]){(count, tup) => count + (tup -> (count.getOrElse(tup, 0) + 1))
I am trying to use "Reduce": I have to go through every group I built and sum the second members. Any idea how to do that?
If you know that all lists are nonempty and start with the same key (e.g. they were produced by groupBy), then you can just
test.mapValues(_.map(_._2).sum).toMap
Alternatively, you might want an intermediate step that allows you to perform error-checking:
test.map{ case(k,xs) =>
val v = {
if (xs.exists(_._1 != k)) ??? // Handle key-mismatch case
else xs.reduceOption((l,r) => l.copy(_2 = l._2 + r._2))
}
v.getOrElse(??? /* Handle empty-list case */)
}
You could do something like this:
test collect{
case (key, many) => (key, many.map(_._2).sum)
}
wherein you do not have to assume that the list has any members. However, if you want to exclude empty lists, add a guard
case (key, many) if many.nonEmpty =>
like that.
scala> val test = Map("t" -> List(("t",2)), "B" -> List(("B",3), ("B",1)), "D" -> List(("D",1)))
test: scala.collection.immutable.Map[String,List[(String, Int)]] = Map(t -> List((t,2)), B -> List((B,3), (B,1)), D -> List((D,1)))
scala> test.map{case (k,v) => (k, v.map(t => t._2).sum)}
res32: scala.collection.immutable.Map[String,Int] = Map(t -> 2, B -> 4, D -> 1)
Yet another approach, in essence quite similar to what has already been suggested,
implicit class mapAcc(val m: Map[String,List[(String,Int)]]) extends AnyVal {
def mapCount() = for ( (k,v) <- m ) yield { (k,v.map {_._2}.sum) }
}
Then for a given
val test = Map("t" -> List(("t",2)), "B" -> List(("B",3), ("B",1)), "D" -> List(("D",1)))
a call
test.mapCount()
delivers
Map(t -> 2, B -> 4, D -> 1)
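Since the question asks about reduce specifically, here is a hedged sketch of two further variants on the same data: one calls reduce on each group's counts, the other uses a single foldLeft over the flattened tuples, which is roughly what the attempt in the question seems to be aiming for. The names SumByKey, viaReduce and viaFold are illustrative only.

```scala
object SumByKey {
  val test: Map[String, List[(String, Int)]] =
    Map("t" -> List(("t", 2)), "B" -> List(("B", 3), ("B", 1)), "D" -> List(("D", 1)))

  // reduce over each group's counts; the guard skips empty lists, where reduce would throw.
  val viaReduce: Map[String, Int] =
    test.collect { case (key, pairs) if pairs.nonEmpty => key -> pairs.map(_._2).reduce(_ + _) }

  // One foldLeft over all tuples, accumulating a running total per key.
  val viaFold: Map[String, Int] =
    test.values.flatten.foldLeft(Map.empty[String, Int]) { case (acc, (key, n)) =>
      acc.updated(key, acc.getOrElse(key, 0) + n)
    }
}
```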

Better and efficient way to group list of tuples2 in scala

I have a list of Scala Tuple2 values that I need to group. I currently do it the following way.
var matches:List[Tuple2[String,Int]]
var m = matches.toSeq.groupBy(i=>i._1).map(t=>(t._1,t._2)).toSeq.sortWith(_._2.size>_._2.size).sortWith(_._2.size>_._2.size)
The above grouping gives me
Seq[(String,Seq[(String,Int)])]
but i would like to have
Seq[(String,Seq[Int])]
I would like to know whether there is a better and more efficient way to do the same.
First off, some thoughts:
// You should use `val` instead of `var`
var matches: List[Tuple2[String, Int]] = List("a" -> 1, "a" -> 2, "b" -> 3, "c" -> 4, "c" -> 5)
var m = matches
.toSeq // This isn't necessary: it's already a Seq
.groupBy(i => i._1)
.map(t => (t._1, t._2)) // This isn't doing anything at all
.toSeq
.sortWith(_._2.size > _._2.size) // `sortBy` will reduce redundancy
.sortWith(_._2.size > _._2.size) // Not sure why you have this twice since clearly the
// second sorting isn't doing anything...
So try this:
val matches: List[Tuple2[String, Int]] = List("a" -> 1, "a" -> 2, "b" -> 3, "c" -> 4, "c" -> 5)
val m: Seq[(String, Seq[Int])] =
matches
.groupBy(_._1)
.map { case (k, vs) => k -> vs.map(_._2) } // Drop the String part of the value
.toVector
.sortBy(_._2.size) // Ascending by group size; use .sortBy(-_._2.size) to keep the question's descending order
println(m) // Vector((b,List(3)), (a,List(1, 2)), (c,List(4, 5)))
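The order of groups with the same size depends on the iteration order of the intermediate Map. If a reproducible, largest-first order is wanted, one option (a sketch, with GroupDescending as a made-up name) is to sort on a tuple of negated size and key:

```scala
object GroupDescending {
  val matches: List[(String, Int)] = List("a" -> 1, "a" -> 2, "b" -> 3, "c" -> 4, "c" -> 5)

  val grouped: Vector[(String, List[Int])] =
    matches
      .groupBy(_._1)
      .map { case (k, vs) => k -> vs.map(_._2) }   // keep only the Int part of each tuple
      .toVector
      .sortBy { case (k, vs) => (-vs.size, k) }    // largest group first, ties broken by key
}
```

With the sample data this gives Vector((a,List(1, 2)), (c,List(4, 5)), (b,List(3))).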