How to transform input data into following format? - groupby

What I have is the following input data for a function in a piece of scala code I'm writing:
List(
(1,SubScriptionState(CNN,ONLINE,Seq(12))),
(1,SubScriptionState(SKY,ONLINE,Seq(12))),
(1,SubScriptionState(FOX,ONLINE,Seq(12))),
(2,SubScriptionState(CNN,ONLINE,Seq(12))),
(2,SubScriptionState(SKY,ONLINE,Seq(12))),
(2,SubScriptionState(FOX,ONLINE,Seq(12))),
(2,SubScriptionState(CNN,OFFLINE,Seq(13))),
(2,SubScriptionState(SKY,ONLINE,Seq(13))),
(2,SubScriptionState(FOX,ONLINE,Seq(13))),
(3,SubScriptionState(CNN,OFFLINE,Seq(13))),
(3,SubScriptionState(SKY,ONLINE,Seq(13))),
(3,SubScriptionState(FOX,ONLINE,Seq(13)))
)
SubscriptionState is just a case class here:
case class SubscriptionState(channel: Channel, state: ChannelState, subIds: Seq[Long])
I want to transform it into this:
Map(
1 -> Map(
SubScriptionState(SKY,ONLINE,Seq(12)) -> 1,
SubScriptionState(CNN,ONLINE,Seq(12)) -> 1,
SubScriptionState(FOX,ONLINE,Seq(12)) -> 1),
2 -> Map(
SubScriptionState(SKY,ONLINE,Seq(12,13)) -> 2,
SubScriptionState(CNN,ONLINE,Seq(12)) -> 1,
SubScriptionState(FOX,ONLINE,Seq(12,13)) -> 2,
SubScriptionState(CNN,OFFLINE,Seq(13)) -> 1),
3 -> Map(
SubScriptionState(SKY,ONLINE,Seq(13)) -> 1,
SubScriptionState(FOX,ONLINE,Seq(13)) -> 1,
SubScriptionState(CNN,OFFLINE,Seq(13)) -> 1)
)
How would I go about doing this in scala?

Here is my approach to the problem. I think it may not be a perfect solution, but it works as you would expect.
val result: Map[Int, Map[SubscriptionState, Int]] = list
.groupBy(_._1)
.view
.mapValues { statesById =>
statesById
.groupBy { case (_, subscriptionState) => (subscriptionState.channel, subscriptionState.state) }
.map { case (_, groupedStatesById) =>
val subscriptionState = groupedStatesById.head._2 // groupedStatesById should contain at least one element
val allSubIds = groupedStatesById.flatMap(_._2.subIds)
val updatedSubscriptionState = subscriptionState.copy(subIds = allSubIds)
updatedSubscriptionState -> allSubIds.size
}
}.toMap

This is a "simple" solution using groupMap and groupMapReduce
list
.groupMap(_._1)(_._2)
.view
.mapValues{
_.groupMapReduce(ss => (ss.channel, ss.state))(_.subIds)(_ ++ _)
.map{case (k,v) => SubScriptionState(k._1, k._2, v) -> v.length}
}
.toMap
The groupMap converts the data to a Map[Int, List[SubScriptionState]] and the mapValues converts each List to the appropriate Map. (The view and toMap wrappers make mapValues more efficient and safe.)
The groupMapReduce converts the List[SubScriptionState] into a Map[(Channel, ChannelState), List[SubId]].
The map on this inner Map juggles these values around to make Map[SubScriptionState, Int] as required.
I'm not clear what the purpose of the inner Map is. The value is the length of the subIds field, so it could be obtained directly from the key rather than by looking it up in the Map.
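For anyone who wants to try the groupMap / groupMapReduce version end to end, here is a minimal self-contained sketch. It assumes Scala 2.13 or later. The question never shows Channel or ChannelState, so the sealed traits below are hypothetical stand-ins, and the object name and sample list (the first four entries for id 2) are only for illustration.

```scala
// Requires Scala 2.13+ for groupMap / groupMapReduce.
object SubscriptionGrouping {
  // Hypothetical stand-ins: the question does not show Channel or ChannelState.
  sealed trait Channel
  case object CNN extends Channel
  case object SKY extends Channel

  sealed trait ChannelState
  case object ONLINE extends ChannelState
  case object OFFLINE extends ChannelState

  final case class SubscriptionState(channel: Channel, state: ChannelState, subIds: Seq[Long])

  def transform(list: List[(Int, SubscriptionState)]): Map[Int, Map[SubscriptionState, Int]] =
    list
      .groupMap(_._1)(_._2) // Map[Int, List[SubscriptionState]]
      .view
      .mapValues { states =>
        states
          // merge the subIds of all states sharing the same (channel, state)
          .groupMapReduce(s => (s.channel, s.state))(_.subIds)(_ ++ _)
          .map { case ((channel, state), ids) =>
            SubscriptionState(channel, state, ids) -> ids.length
          }
      }
      .toMap

  // A subset of the entries for id 2 from the question.
  val sample: List[(Int, SubscriptionState)] = List(
    2 -> SubscriptionState(CNN, ONLINE, Seq(12L)),
    2 -> SubscriptionState(SKY, ONLINE, Seq(12L)),
    2 -> SubscriptionState(CNN, OFFLINE, Seq(13L)),
    2 -> SubscriptionState(SKY, ONLINE, Seq(13L))
  )

  val result: Map[Int, Map[SubscriptionState, Int]] = transform(sample)
}
```

With this sample, the two SKY/ONLINE entries collapse into one key with subIds Seq(12, 13) and value 2, while CNN/ONLINE and CNN/OFFLINE stay separate with value 1 each.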

An attempt using foldLeft:
list.foldLeft(Map.empty[Int, Map[SubscriptionState, Int]]) { (acc, next) =>
val subMap = acc.getOrElse(next._1, Map.empty[SubscriptionState, Int])
val channelSub = subMap.find { case (sub, _) => sub.channel == next._2.channel && sub.state == next._2.state }
acc + (next._1 -> channelSub.fold(subMap + (next._2 -> next._2.subIds.length)) { case (sub, _) =>
val subIds = sub.subIds ++ next._2.subIds
(subMap - sub) + (sub.copy(subIds = subIds) -> subIds.length)
})
}
I noticed that the count is not used while folding and can be calculated from subIds. Also, since subIds can vary, the inner Map is rather useless: you have to use find instead of get to fetch values from it. So if you have control over your ADTs, you could use an intermediary ADT like:
case class SubscriptionStateWithoutIds(channel: Channel, state: ChannelState)
then you can rewrite your foldLeft as follows:
list.foldLeft(Map.empty[Int, Map[SubscriptionStateWithoutIds, Seq[Long]]]) { (acc, next) =>
val subMap = acc.getOrElse(next._1, Map.empty[SubscriptionStateWithoutIds, Seq[Long]])
val withoutId = SubscriptionStateWithoutIds(next._2.channel, next._2.state)
val channelSub = subMap.get(withoutId)
acc + (next._1 -> (subMap + channelSub.fold(withoutId -> next._2.subIds) { seq => withoutId -> (seq ++ next._2.subIds) }))
}
The biggest advantage of the intermediary ADT is that you can have a cleaner groupMapReduce version:
list.groupMap(_._1)(sub => SubscriptionStateWithoutIds(sub._2.channel, sub._2.state) -> sub._2.subIds)
.map { case (key, value) => key -> value.groupMapReduce(_._1)(_._2)(_ ++ _) }

Related

Count occurence of key values from several maps grouped by a key in scala 2.11.x

Imagine the following list of maps (which could be potentially longer):
List(
Map[String,String]("wind"->"high", "rain"->"heavy", "class"->"very late"),
Map[String,String]("wind"->"none", "rain"->"slight", "class"->"on time"),
Map[String,String]("wind"->"high", "rain"->"none", "class"->"very late"),
...
)
How can I get to the following form:
Map("very late" -> Set(("wind",Map("high" -> 2)), ("rain",Map("heavy" -> 1, "none" -> 1))),
"on time" -> Set(("wind",Map("none" -> 1)), ("rain",Map("slight" -> 1))))

This will get you what you want.
val maps = List(...)
maps.groupBy(_.getOrElse("class","no-class"))
.mapValues(_.flatMap(_ - "class").groupBy(_._1)
.mapValues(_.map(_._2).groupBy(identity)
.mapValues(_.length)
).toSet
)
The problem is, what you want isn't a good place to be.
The result type is Map[String,Set[(String,Map[String,Int])]] which is a terrible hodgepodge of collection types. Why Set? What purpose does that serve? How is this useful? How do you retrieve meaningful data from it?
This looks like an XY problem.
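To make the point about the result type concrete, here is a small sketch of the same groupBy idea that returns three nested Maps instead of a Set of tuples. The object and value names are mine, the data is the three rows from the question, and it uses plain map rather than mapValues so it should behave the same on 2.11 and later versions.

```scala
object WeatherCounts {
  val maps: List[Map[String, String]] = List(
    Map("wind" -> "high", "rain" -> "heavy", "class" -> "very late"),
    Map("wind" -> "none", "rain" -> "slight", "class" -> "on time"),
    Map("wind" -> "high", "rain" -> "none", "class" -> "very late")
  )

  // class value -> attribute name -> attribute value -> number of occurrences
  val counts: Map[String, Map[String, Map[String, Int]]] =
    maps
      .groupBy(_.getOrElse("class", "no-class"))
      .map { case (cls, rows) =>
        val perAttribute = rows
          .flatMap(row => (row - "class").toList) // all (attribute, value) pairs of this class
          .groupBy(_._1)
          .map { case (attr, pairs) =>
            attr -> pairs.groupBy(_._2).map { case (value, same) => value -> same.size }
          }
        cls -> perAttribute
      }

  // Direct lookup by key, which the Set-of-tuples shape does not offer.
  val highWindWhenVeryLate: Int = counts("very late")("wind")("high") // 2
}
```

With nested Maps a lookup is just three key accesses; with the Set you would have to search the set for the tuple whose first element matches.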

Here are two versions.
First using Set, which looks like what you want,
val grouped2 = maps.foldLeft(Map.empty[String, Set[(String, Map[String, Int])]]) {
case (acc, map) =>
map.get("class").fold(acc) { key =>
val keyOpt = acc.get(key)
if (keyOpt.isDefined) {
val updatedSet = (map - "class").foldLeft(Set.empty[(String, Map[String, Int])]) {
case (setAcc, (k1, v1)) =>
keyOpt.flatMap(_.find(_._1 == k1)).map { tup =>
setAcc + ((k1, tup._2.get(v1).fold(tup._2 ++ Map(v1 -> 1))(v => tup._2 + ((v1, v + 1)))))
}.getOrElse(setAcc + (k1 -> Map(v1 -> 1)))
}
acc.updated(key, updatedSet)
} else {
acc + (key -> (map - "class").map(tup => (tup._1, Map(tup._2 -> 1))).toSet)
}
}
}
and then using a Map,
val grouped1 = maps.foldLeft(Map.empty[String, Map[String, Map[String, Int]]]) {
case (acc, map) =>
map.get("class").fold(acc) { key =>
val keyOpt = acc.get(key)
if (keyOpt.isDefined) {
val updatedMap = (map - "class").foldLeft(Map.empty[String, Map[String, Int]]) {
case (mapAcc, (k1, v1)) =>
keyOpt.flatMap(_.get(k1)).map{ statMap =>
mapAcc + ((k1, statMap.get(v1).fold(statMap ++ Map(v1 -> 1))(v => statMap + ((v1, v + 1)))))
}.getOrElse(mapAcc + (k1 -> Map(v1 -> 1)))
}
acc.updated(key, updatedMap)
} else {
acc + (key -> (map - "class").map(tup => (tup._1, Map(tup._2 -> 1))))
}
}
}
I was playing with Map version and changed it to Set. In a few days, I don't think I will understand everything above just by looking at it. So, I have tried to make it as understandable as possible for me. Adapt this to your own solution or wait for others.

Here goes another implementation, which uses the keySet method of Map to join two maps together.
Also, note that I changed the output type from a Map whose values are Sets of tuples (each tuple's second value being another Map) to three nested Maps, which, IMHO, makes more sense.
def groupMaps[K, V](groupingKey: K, data: List[Map[K, V]]): Map[V, Map[K, Map[V, Int]]] =
data.foldLeft(Map.empty[V, Map[K, Map[V, Int]]]) {
case (acc, map) =>
map.get(key = groupingKey).fold(ifEmpty = acc) { groupingValue =>
val newValues = (map - groupingKey).map {
case (key, value) =>
key -> Map(value -> 1)
}
val finalValues = acc.get(key = groupingValue).fold(ifEmpty = newValues) { oldValues =>
(oldValues.keySet | newValues.keySet).iterator.map { key =>
val oldMap = oldValues.getOrElse(key = key, default = Map.empty[V, Int])
val newMap = newValues.getOrElse(key = key, default = Map.empty[V, Int])
val finalMap = (oldMap.keySet | newMap.keySet).iterator.map { value =>
val oldCount = oldMap.getOrElse(key = value, default = 0)
val newCount = newMap.getOrElse(key = value, default = 0)
value -> (oldCount + newCount)
}.toMap
key -> finalMap
}.toMap
}
acc.updated(key = groupingValue, finalValues)
}
}
Which can be used like this:
val maps =
List(
Map("wind" -> "none", "rain" -> "none", "class" -> "on time"),
Map("wind" -> "none", "rain" -> "slight", "class" -> "on time"),
Map("wind" -> "none", "rain" -> "slight", "class" -> "late"),
Map("wind" -> "none", "rain" -> "slight")
)
val result = groupMaps(groupingKey = "class", maps)
// val result: Map[String, Map[String, Map[String, Int]]] =
// Map(
// on time -> Map(wind -> Map(none -> 2), rain -> Map(none -> 1, slight -> 1)),
// late -> Map(wind -> Map(none -> 1), rain -> Map(slight -> 1))
// )
If you need to maintain the output type you asked for, then you can just do a .mapValues(_.toSet) at the end of the foldLeft.

Scala Filter a map for based on unique values within Map values

In Scala, I'm trying to filter a map based on a unique property with the Map values.
case class Product(
item: Item,
)
productModels: Map[Int, Product]
How can I create a new Map (or filter productModels) to only contain values where Product.Item.someproperty is unique within the Map?
I've been trying foldLeft on productModels, but can't seem to get it. I'll keep trying but want to check with you all as well.
Thanks

You can do it the following way:
productModels
.groupBy(_._2) // produces Map[Product, Map[Int, Product]]
.filter {case (k,v) => v.size == 1} // filters unique values
.flatMap {case (_,v) => v}
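If "unique" means that the property value occurs exactly once in the whole map, a variant of the grouping idea is to count occurrences per property value first and then filter. This is only a sketch: the Item and Product shapes are hypothetical (the question elides the real fields), and the sample data is made up.

```scala
object UniqueByProperty {
  // Hypothetical shapes; the question does not show the real Item fields.
  final case class Item(someproperty: String)
  final case class Product(item: Item)

  val productModels: Map[Int, Product] = Map(
    1 -> Product(Item("a")),
    2 -> Product(Item("b")),
    3 -> Product(Item("c")),
    4 -> Product(Item("a"))
  )

  // How many products carry each property value.
  val occurrences: Map[String, Int] =
    productModels.values
      .groupBy(_.item.someproperty)
      .map { case (property, products) => property -> products.size }

  // Keep only entries whose property value occurs exactly once.
  val uniqueOnly: Map[Int, Product] =
    productModels.filter { case (_, product) => occurrences(product.item.someproperty) == 1 }
}
```

Here entries 1 and 4 share "a", so both are dropped and only 2 and 3 remain. If you would rather keep one representative per property value, that is a different requirement.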

The easiest way to do that is to transform your map into another map, where keys are desired fields of Item:
case class Product(item:String)
val productModels =
Map(
1 -> Product("a"),
2 -> Product("b"),
3 -> Product("c"),
4 -> Product("a")
)
// here I'm calculating distinct by Product.item for simplicity
productModels.map { case e @ (_, v) => v.item -> e }.values.toMap
Result:
Map(4 -> Product(a), 2 -> Product(b), 3 -> Product(c))
Note that the order of the elements is not guaranteed, as a generic Map has no particular key order. If you use a Map that keeps item order, such as ListMap, and want to preserve the order of elements, here is the necessary adjustment:
productModels.toList.reverse.map { case e @ (_, v) => v.item -> e }.toMap.values.toMap
Result:
res1: scala.collection.immutable.Map[Int,Product] = Map(1 -> Product(a), 3 -> Product(c), 2 -> Product(b))

case class Item(property:String)
case class Product(item:Item)
val xs = Map[Int, Product]() // your example has this data structure
// just filter the map based on the item property value
xs filter { case (k,v) => v.item.property == "some property value" }

Here is an implementation with foldLeft:
productModels.foldLeft(Map.empty[Int, Product]){
(acc, el) =>
if (acc.exists(_._2.item.someproperty == el._2.item.someproperty)) acc
else acc + el
}

How to use Reduce on Scala

I am using scala to implement an algorithm. I have a case where I need to implement such scenario:
test = Map(t -> List((t,2)), B -> List((B,3), (B,1)), D -> List((D,1)))
I need to sum the second member of all tuples that share a key.
The desired result :
Map((t,2),(B,4),(D,1))
val resReduce = test.foldLeft(Map.empty[String, List[Map.empty[String, Int]]){(count, tup) => count + (tup -> (count.getOrElse(tup, 0) + 1))
I am trying to use "Reduce", I have to go through every group I did and sum their second member. Any idea how to do that.

If you know that all lists are nonempty and start with the same key (e.g. they were produced by groupBy), then you can just
test.mapValues(_.map(_._2).sum).toMap
Alternatively, you might want an intermediate step that allows you to perform error-checking:
test.map{ case(k,xs) =>
val v = {
if (xs.exists(_._1 != k)) ??? // Handle key-mismatch case
else xs.reduceOption((l,r) => l.copy(_2 = l._2 + r._2))
}
v.getOrElse(??? /* Handle empty-list case */)
}
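One way to fill in the two ??? placeholders is to report problems as values instead of throwing. The sketch below uses Either with a String error message; the object name, function names and messages are my own choices, not something the question prescribes.

```scala
object CheckedSum {
  // Left describes the problem found for a key; Right carries the summed value.
  def sumGroup(key: String, xs: List[(String, Int)]): Either[String, Int] =
    if (xs.isEmpty) Left(s"empty list for key $key")
    else if (xs.exists(_._1 != key)) Left(s"key mismatch under $key")
    else Right(xs.map(_._2).sum)

  def sumAll(groups: Map[String, List[(String, Int)]]): Map[String, Either[String, Int]] =
    groups.map { case (k, xs) => k -> sumGroup(k, xs) }

  val test: Map[String, List[(String, Int)]] =
    Map("t" -> List(("t", 2)), "B" -> List(("B", 3), ("B", 1)), "D" -> List(("D", 1)))

  val summed: Map[String, Either[String, Int]] = sumAll(test)
}
```

For the sample data every entry is a Right, for example "B" -> Right(4); an empty or inconsistent list would show up as a Left for that key only.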

You could do something like this:
test collect{
case (key, many) => (key, many.map(_._2).sum)
}
wherein you do not have to assume that the list has any members. However, if you want to exclude empty lists, add a guard
case (key, many) if many.nonEmpty =>
like that.

scala> val test = Map("t" -> List(("t",2)), "B" -> List(("B",3), ("B",1)), "D" -> List(("D",1)))
test: scala.collection.immutable.Map[String,List[(String, Int)]] = Map(t -> List((t,2)), B -> List((B,3), (B,1)), D -> List((D,1)))
scala> test.map{case (k,v) => (k, v.map(t => t._2).sum)}
res32: scala.collection.immutable.Map[String,Int] = Map(t -> 2, B -> 4, D -> 1)

Yet another approach, in essence quite similar to what has already been suggested,
implicit class mapAcc(val m: Map[String,List[(String,Int)]]) extends AnyVal {
def mapCount() = for ( (k,v) <- m ) yield { (k,v.map {_._2}.sum) }
}
Then for a given
val test = Map("t" -> List(("t",2)), "B" -> List(("B",3), ("B",1)), "D" -> List(("D",1)))
a call
test.mapCount()
delivers
Map(t -> 2, B -> 4, D -> 1)

Scala iterate over map and turn singleton List into just the singleton

I am trying to extract a value of type List[T] to just T in a Map. So for instance:
val c = Map(1->List(1), 2-> List(2), 3->List(3));
would turn into
Map(1->1,2->2,3->3);
Here is what I have written so far:
val Some(values) = request.body.asFormUrlEncoded.foreach {
case (key,value) =>
Map(key->value.head);
};
and here is the error I am receiving:
constructor cannot be instantiated to expected type; found : (T1, T2) required: scala.collection.immutable.Map[String,Seq[String]]
EDIT: This is occurring with respect to this line:
case (key,value) =>
EDIT2:
request.body.asFormUrlEncoded example output
Some(Map(test -> List(324)))
Some(Map(SpO2 -> List(456), ETCO2 -> List(123)))

Are you sure that you will always have exactly one element in the list? If so, you should do this, which is clear, and has the benefit that it will throw an error if you get a bad list (doesn't have exactly one element) by accident.
c.map { case (k, List(v)) => k -> v }
// Map(1 -> 1, 2 -> 2, 3 -> 3)
If your lists can have more than one element, and you just want the first, you can do this (which will error on empty lists):
val d = Map(1 -> List(1), 2 -> List(2,4,6), 3 -> List(3))
d.map { case (k, List(v, _*)) => k -> v }
// Map(1 -> 1, 2 -> 2, 3 -> 3)
If your lists may not have exactly one element, and you want to ignore any non-singleton lists instead of throwing errors, use collect instead of map:
val e = Map(1 -> List(1), 2 -> List(2,4,6), 3 -> List(3), 4 -> List())
e.collect { case (k, List(v)) => k -> v }
// Map(1 -> 1, 3 -> 3)
As for your code:
val Some(values) = request.body.asFormUrlEncoded.foreach {
case (key,value) =>
Map(key->value.head);
};
This doesn't really make any sense.
First off, foreach doesn't return anything, so assigning its result to a variable will never work. You probably want this to be a map instead, so that it returns a collection.
Second, your use of Some makes it seem like you don't understand Options, so you might want to read up on that.
Third, if you want the result to be a Map (a collection of pairs), then you'll just want to return the pair, key->value.head, and not a Map.
Fourth, if you're getting errors matching on case (key,value), then probably asFormUrlEncoded doesn't actually return a collection of pairs. You should see what its type actually is.
Lastly, the semicolons are unnecessary. You should remove them.
EDIT based on your comment:
Since request.body.asFormUrlEncoded actually returns things like Some(Map("test" -> List(324))), here is how your code should look.
If asFormUrlEncoded might return None, and you don't have any way of handling that, then you should guard against it:
val a = Some(Map("test" -> List(324)))
val value = a match {
case Some(m) => m.collect { case (k, List(v)) => k -> v }
case None => sys.error("expected something, got nothing")
}
If you're sure that asFormUrlEncoded will already return Some, then you can just do this:
val a = Some(Map("test" -> List(324)))
val Some(value) = a.map(_.collect { case (k, List(v)) => k -> v })
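If neither throwing nor the partial val Some(value) pattern appeals, a third option is to fall back to an empty Map. This is a sketch under assumptions: body stands in for request.body.asFormUrlEncoded, and its type Option[Map[String, Seq[String]]] is inferred from the error message and sample output in the question. Unlike the List(v) pattern above, it takes the first value of each field rather than requiring exactly one.

```scala
object FormValues {
  // Stand-in for request.body.asFormUrlEncoded; the type is an assumption
  // based on the error message and sample output in the question.
  val body: Option[Map[String, Seq[String]]] =
    Some(Map("SpO2" -> Seq("456"), "ETCO2" -> Seq("123"), "empty" -> Seq.empty))

  // Take the first value of each field, skip fields without any value,
  // and fall back to an empty map when there is no form body at all.
  def firstValues(form: Option[Map[String, Seq[String]]]): Map[String, String] =
    form
      .map(_.collect { case (key, first +: _) => key -> first })
      .getOrElse(Map.empty)

  val values: Map[String, String] = firstValues(body)
}
```

The "empty" field is dropped because its Seq has no head, and firstValues(None) simply yields an empty Map.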

Filtering out keys of a map but keeping all values in scala

I'm trying to write a method with the following signature:
def buildSumMap(minInterval:Int, mappes:SortedMap[Int, Long]):SortedMap[Int, Long] = {...}
Within the method I want to return a new map by applying the following pseudo-code to each
(key:Int,value:Long)-pair of "mappes":
If(key + minInterval > nextKey) {
value += nextValue
}
else {
//Forget previous key(s) and return current key with sum of all previous values
return (key, value)
}
Example: If I had the source Map ((10 -> 5000), (20 -> 5000), (25 -> 7000), (40 -> 13000)) and defined a minInterval of 10, I'd expect the resulting Map:
((10 -> 5000), (25 -> 12000), (40 -> 13000))
I found a lot of examples for transforming keys and values or filtering keys and values separately, but none so far for dropping keys while preserving the values.

This solution uses a List as an intermediate structure. It traverses the map from left to right and prepends each key-value pair to the list if the interval is big enough; otherwise it replaces the head of the list with a new key-value pair. The TreeMap factory method puts the pairs back in ascending key order at the end.
import collection.immutable._
def buildSumMap(minInterval:Int, mappes:SortedMap[Int, Long]):SortedMap[Int, Long] =
TreeMap(
mappes.foldLeft[List[(Int, Long)]] (Nil) {
case (Nil, nextKV) => nextKV :: Nil
case (acc @ (key, value) :: accTail, nextKV @ (nextKey, nextValue)) =>
if (nextKey - key < minInterval)
(nextKey -> (value + nextValue)) :: accTail
else
nextKV :: acc
} : _*
)

To answer the question, basically there is no totally simple way of doing this, because the requirement isn't simple. You need to somehow iterate through the SortedMap while comparing adjacent elements and build a new Map. There are several ways to do it:
Use higher-order functions such as fold / reduce / scan / groupBy: generally the preferred way, and the most concise.
Recursion (see http://aperiodic.net/phil/scala/s-99/ for plenty of examples): what you resort to if using higher-order functions gets too complicated, or the exact function you need doesn't exist. May be faster than using higher-order functions.
Builders: a nice term for a brief foray into mutable-land. Best performance; often equivalent to the recursive version without the ceremony.
Here's my attempt using scanLeft:
def buildSumMap(minInterval: Int, mappes: SortedMap[Int, Long]) =
SortedMap.empty[Int, Long] ++ mappes.toSeq.tail.scanLeft(mappes.head){
case ((k1, v1), (k2, v2)) => if (k2 - k1 > minInterval) (k2,v2) else (k1,v2)
}.groupBy(_._1).mapValues(_.map(_._2).sum)
It looks complicated but it isn't really, once you understand what scanLeft and groupBy do, which you can look up elsewhere. It basically scans the sequence from the left and compares the keys, using the key to the left if the gap is too small, then groups the tuples together according to the keys.
TLDR: The key is to learn the built-in functions in the collections library, which takes some practice, but it's good fun.
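For comparison, the builder option from the list above could look roughly like the sketch below. It is not the scanLeft solution: it implements the rule implied by the question's example (a key closer than minInterval to the previous kept key replaces that key and inherits the running sum), and the object name and the pending variable are my own.

```scala
import scala.collection.immutable.SortedMap

object BuildSumMapBuilder {
  // Merge each key into its predecessor when they are closer than minInterval,
  // keeping the later key and the running sum.
  def buildSumMap(minInterval: Int, mappes: SortedMap[Int, Long]): SortedMap[Int, Long] = {
    val builder = SortedMap.newBuilder[Int, Long]
    var pending: Option[(Int, Long)] = None // group currently being accumulated

    mappes.foreach { case (key, value) =>
      pending = pending match {
        case Some((prevKey, prevSum)) if key - prevKey < minInterval =>
          Some(key -> (prevSum + value)) // too close: merge, keep the later key
        case Some(done) =>
          builder += done // gap is large enough: emit the finished group
          Some(key -> value)
        case None =>
          Some(key -> value)
      }
    }
    pending.foreach(builder += _) // emit the last group, if any
    builder.result()
  }
}
```

The mutable state stays local to the function, so callers still see a pure function from SortedMap to SortedMap. For the question's input with minInterval 10 this gives 10 -> 5000, 25 -> 12000, 40 -> 13000.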

import scala.collection.SortedMap
def buildSumMap(minInterval:Int, mappes:SortedMap[Int, Long]):SortedMap[Int, Long] = {
def _buildSumMap(map: List[(Int, Long)], buffer: List[(Int, Long)], result:SortedMap[Int, Long]): SortedMap[Int, Long] = {
def mergeBufferWithResult = {
val res = buffer.headOption.map { case (k, v) =>
(k, buffer.map(_._2).sum)
}
res.map(result + _).getOrElse(result)
}
map match {
case entry :: other =>
if(buffer.headOption.exists(entry._1 - _._1 < minInterval)) {
_buildSumMap(other, entry :: buffer, result)
} else {
_buildSumMap(other, entry :: Nil, mergeBufferWithResult)
}
case Nil =>
mergeBufferWithResult
}
}
_buildSumMap(mappes.toList, List.empty, SortedMap.empty)
}
val result = buildSumMap(10 , SortedMap(10 -> 5000L, 20 -> 5000L, 25 -> 7000L, 40 -> 13000L))
println(result)
//Map(10 -> 5000, 25 -> 12000, 40 -> 13000)

I tried to split the parts of the algorithm :
import scala.collection._
val myMap = SortedMap((10 -> 5000), (20 -> 5000), (25 -> 7000), (40 -> 13000)).mapValues(_.toLong)
def filterInterval(minInterval: Int, it: Iterable[Int]):List[Int] = {
val list = it.toList
val jumpMap = list.map(x => (x, list.filter( _ > x + minInterval))).toMap.
filter(_._2.nonEmpty).mapValues(_.min)
def jump(n:Int): Stream[Int] = jumpMap.get(n).map(j => Stream.cons(j, jump(j))).getOrElse(Stream.empty)
list.min :: jump(list.min).toList
}
def buildSumMap(minInterval:Int, mappes:Map[Int, Long]):Map[Int,Long] = {
val filteredKeys: List[Int] = filterInterval(minInterval, mappes.keys)
val agg:List[(Int, Long)] = filteredKeys.map(finalkey =>
(finalkey,mappes.filterKeys(_ <= finalkey).values.sum)
).sortWith(_._1 < _._1)
agg.zip((filteredKeys.min, 0L) :: agg ).map(st => (st._1._1, st._1._2 - st._2._2)).toMap
}
buildSumMap(10, myMap)

Here's another take:
def buildSumMap(map: SortedMap[Int, Int], diff: Int) =
map.drop(1).foldLeft(map.take(1)) { case (m, (k, v)) =>
val (_k, _v) = m.last
if (k - _k < diff) (m - _k) + (k -> (v + _v))
else m + (k -> v)
}

A much cleaner (than my first attempt) solution using Scalaz 7's State, and a List to store the state of the computation. Using a List makes it efficient to inspect, and modify if necessary, the head of the list at each step.
def f2(minInterval: Int): ((Int, Int)) => State[List[(Int, Int)], Unit] = {
case (k, v) => State {
case (floor, acc) :: tail if (floor + minInterval) > k =>
((k, acc + v) :: tail) -> ()
case state => ((k, v) :: state) -> ()
}
}
scala> mappes.toList traverseS f2(10) execZero
res1: scalaz.Id.Id[List[(Int, Int)]] = List((40,13000), (25,12000), (10,5000))
