How to combine all the values with the same key in Scala? - scala

I have a map like this:
val programming = Map(("functional", 1) -> "scala", ("functional", 2) -> "perl", ("orientedObject", 1) -> "java", ("orientedObject", 2) -> "C++")
in which the same first element of the key appears multiple times.
How can I regroup all the values that correspond to the same first element of the key? That would turn this map into:
Map("functional" -> List("scala","perl"), "orientedObject" -> List("java","C++"))

UPDATE: This answer is based upon your original question. If you need the more complex Map definition, using a tuple as the key, then the other answers will address your requirements. You may still find this approach simpler.
As has been pointed out, a map can't actually hold multiple entries with the same key. In the REPL, you'll note that your declaration becomes:
scala> val programming = Map("functional" -> "scala", "functional" -> "perl", "orientedObject" -> "java", "orientedObject" -> "C++")
programming: scala.collection.immutable.Map[String,String] = Map(functional -> perl, orientedObject -> C++)
So you end up missing some values. If you make this a List instead, you can get what you want as follows:
scala> val programming = List("functional" -> "scala", "functional" -> "perl", "orientedObject" -> "java", "orientedObject" -> "C++")
programming: List[(String, String)] = List((functional,scala), (functional,perl), (orientedObject,java), (orientedObject,C++))
scala> programming.groupBy(_._1).map(p => p._1 -> p._2.map(_._2)).toMap
res0: scala.collection.immutable.Map[String,List[String]] = Map(functional -> List(scala, perl), orientedObject -> List(java, C++))
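For what it's worth, on Scala 2.13 or later the "group, then strip the keys" step can probably be collapsed into a single groupMap call. A minimal sketch, assuming the same list of pairs as above:

```scala
// Assumes Scala 2.13+, where groupMap is available on collections.
val programming = List(
  "functional" -> "scala",
  "functional" -> "perl",
  "orientedObject" -> "java",
  "orientedObject" -> "C++"
)

// Group by the first element of each pair and keep only the second element.
val grouped: Map[String, List[String]] = programming.groupMap(_._1)(_._2)
```

Because the input is a List, the values inside each group keep the order in which they appeared.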

Based on your edit, you have a data structure that looks something like this
val programming = Map(("functional", 1) -> "scala", ("functional", 2) -> "perl",
("orientedObject", 1) -> "java", ("orientedObject", 2) -> "C++")
and you want to scrap the numerical indices and group by the string key. Fortunately, Scala provides a built-in that gets you close.
programming groupBy { case ((k, _), _) => k }
This will return a new map which contains submaps of the original, grouped by the key that we return from the "partial" function. But we want a map of lists, so let's ignore the keys in the submaps.
programming groupBy { case ((k, _), _) => k } mapValues { _.values }
This gets us a map of... some kind of Iterable. But we really want lists, so let's take the final step and convert to a list.
programming groupBy { case ((k, _), _) => k } mapValues { _.values.toList }

You should try the .groupBy method
programming.groupBy(_._1._1)
and you will get
scala> programming.groupBy(_._1._1)
res1: scala.collection.immutable.Map[String,scala.collection.immutable.Map[(String, Int),String]] = Map(functional -> Map((functional,1) -> scala, (functional,2) -> perl), orientedObject -> Map((orientedObject,1) -> java, (orientedObject,2) -> C++))
you can now "clean" by doing something like:
scala> res1.mapValues(m => m.values.toList)
res3: scala.collection.immutable.Map[String,List[String]] = Map(functional -> List(scala, perl), orientedObject -> List(java, C++))
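One caveat: on Scala 2.13 and later, calling mapValues directly on a Map is deprecated in favour of going through a view. A sketch of the same cleanup under that assumption (the value names are mine):

```scala
// Assumes Scala 2.13+; .view.mapValues(...).toMap yields a strict Map.
val programming = Map(
  ("functional", 1) -> "scala",
  ("functional", 2) -> "perl",
  ("orientedObject", 1) -> "java",
  ("orientedObject", 2) -> "C++"
)

val cleaned: Map[String, List[String]] =
  programming
    .groupBy { case ((kind, _), _) => kind } // group entries by the string part of the key
    .view
    .mapValues(_.values.toList)              // drop the tuple keys of each submap
    .toMap
```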

Read the CSV file and create a map from each key to its list of values.
import scala.io.Source

val fileStream = getClass.getResourceAsStream("/keyvaluepair.csv")
val lines = Source.fromInputStream(fileStream).getLines()
// Parse each line into a (key, value) pair
val codeMap: List[(String, String)] = lines.map { line =>
  val cols = line.split(",").map(_.trim)
  cols(0) -> cols(1)
}.toList
// Group the pairs by key and keep only the values
val res: Map[String, List[String]] =
  codeMap.groupBy(_._1).map(p => p._1 -> p._2.map(_._2))
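To try the idea without a file on the classpath, here is a small sketch that feeds in-memory lines instead. The sample lines are made up, and skipping lines that do not split into exactly two columns is my own assumption about how malformed input should be handled:

```scala
// Assumes Scala 2.13+ for groupMap; the sample data is hypothetical.
val lines = Iterator("fruit, apple", "veg, carrot", "fruit, pear", "malformed line")

val res: Map[String, List[String]] =
  lines
    .map(_.split(",").map(_.trim))                      // split each line into trimmed columns
    .collect { case Array(key, value) => key -> value } // keep only well-formed two-column lines
    .toList
    .groupMap(_._1)(_._2)                               // key -> list of values
```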

Since no one has put in the specific ordering he asked for:
programming.groupBy(_._1._1)
.mapValues(_.toSeq.map { case ((t, i), l) => (i, l) }.sortBy(_._1).map(_._2))
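A possible 2.13-style variant of the same idea: sort all entries by the numeric index first, then group. This is only a sketch; it relies on sortBy being stable and on groupMap keeping the encounter order inside each group:

```scala
// Assumes Scala 2.13+. Entries are deliberately listed out of order.
val programming = Map(
  ("functional", 2) -> "perl",
  ("functional", 1) -> "scala",
  ("orientedObject", 2) -> "C++",
  ("orientedObject", 1) -> "java"
)

val ordered: Map[String, List[String]] =
  programming.toList
    .sortBy { case ((_, index), _) => index }                                    // order by the numeric part of the key
    .groupMap { case ((kind, _), _) => kind } { case (_, language) => language } // group by the string part, keep values
```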

Related

How to transform input data into following format? - groupby

What I have is the following input data for a function in a piece of scala code I'm writing:
List(
(1,SubScriptionState(CNN,ONLINE,Seq(12))),
(1,SubScriptionState(SKY,ONLINE,Seq(12))),
(1,SubScriptionState(FOX,ONLINE,Seq(12))),
(2,SubScriptionState(CNN,ONLINE,Seq(12))),
(2,SubScriptionState(SKY,ONLINE,Seq(12))),
(2,SubScriptionState(FOX,ONLINE,Seq(12))),
(2,SubScriptionState(CNN,OFFLINE,Seq(13))),
(2,SubScriptionState(SKY,ONLINE,Seq(13))),
(2,SubScriptionState(FOX,ONLINE,Seq(13))),
(3,SubScriptionState(CNN,OFFLINE,Seq(13))),
(3,SubScriptionState(SKY,ONLINE,Seq(13))),
(3,SubScriptionState(FOX,ONLINE,Seq(13)))
)
SubscriptionState is just a case class here:
case class SubscriptionState(channel: Channel, state: ChannelState, subIds: Seq[Long])
I want to transform it into this:
Map(
1 -> Map(
SubScriptionState(SKY,ONLINE,Seq(12)) -> 1,
SubScriptionState(CNN,ONLINE,Seq(12)) -> 1,
SubScriptionState(FOX,ONLINE,Seq(12)) -> 1),
2 -> Map(
SubScriptionState(SKY,ONLINE,Seq(12,13)) -> 2,
SubScriptionState(CNN,ONLINE,Seq(12)) -> 1,
SubScriptionState(FOX,ONLINE,Seq(12,13)) -> 2,
SubScriptionState(CNN,OFFLINE,Seq(13)) -> 1),
3 -> Map(
SubScriptionState(SKY,ONLINE,Seq(13)) -> 1,
SubScriptionState(FOX,ONLINE,Seq(13)) -> 1,
SubScriptionState(CNN,OFFLINE,Seq(13)) -> 1)
)
How would I go about doing this in scala?
Here is my approach to the problem. I think it may not be a perfect solution, but it works as you would expect.
val result: Map[Int, Map[SubscriptionState, Int]] = list
.groupBy(_._1)
.view
.mapValues { statesById =>
statesById
.groupBy { case (_, subscriptionState) => (subscriptionState.channel, subscriptionState.state) }
.map { case (_, groupedStatesById) =>
val subscriptionState = groupedStatesById.head._2 // groupedStatesById should contain at least one element
val allSubIds = groupedStatesById.flatMap(_._2.subIds)
val updatedSubscriptionState = subscriptionState.copy(subIds = allSubIds)
updatedSubscriptionState -> allSubIds.size
}
}.toMap
This is a "simple" solution using groupMap and groupMapReduce (both available from Scala 2.13):
list
.groupMap(_._1)(_._2)
.view
.mapValues{
_.groupMapReduce(ss => (ss.channel, ss.state))(_.subIds)(_ ++ _)
.map{case (k,v) => SubscriptionState(k._1, k._2, v) -> v.length}
}
.toMap
The groupMap converts the data to a Map[Int, List[SubscriptionState]] and the mapValues converts each List to the appropriate Map. (The view and toMap wrappers make mapValues more efficient and safe.)
The groupMapReduce converts the List[SubscriptionState] into a Map[(Channel, ChannelState), Seq[Long]].
The map on this inner Map juggles these values around to make Map[SubscriptionState, Int] as required.
I'm not clear what the purpose of the inner Map is. The value is the length of the subIds field, so it could be obtained directly from the key rather than needing to look it up in the Map.
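For anyone who wants to run this, here is a self-contained sketch of the groupMap / groupMapReduce pipeline. The Channel and ChannelState definitions are stand-ins I made up (the question does not show them), and the input is a shortened version of the question's data:

```scala
// Hypothetical stand-ins for the question's Channel and ChannelState types.
sealed trait Channel
case object CNN extends Channel
case object SKY extends Channel
sealed trait ChannelState
case object ONLINE extends ChannelState
case object OFFLINE extends ChannelState

case class SubscriptionState(channel: Channel, state: ChannelState, subIds: Seq[Long])

val list: List[(Int, SubscriptionState)] = List(
  1 -> SubscriptionState(CNN, ONLINE, Seq(12L)),
  2 -> SubscriptionState(CNN, ONLINE, Seq(12L)),
  2 -> SubscriptionState(SKY, ONLINE, Seq(12L)),
  2 -> SubscriptionState(CNN, OFFLINE, Seq(13L)),
  2 -> SubscriptionState(SKY, ONLINE, Seq(13L))
)

val result: Map[Int, Map[SubscriptionState, Int]] =
  list
    .groupMap(_._1)(_._2) // Map[Int, List[SubscriptionState]]
    .view
    .mapValues {
      // Merge the subIds of all states sharing the same (channel, state) pair.
      _.groupMapReduce(ss => (ss.channel, ss.state))(_.subIds)(_ ++ _)
        .map { case ((channel, state), ids) => SubscriptionState(channel, state, ids) -> ids.length }
    }
    .toMap
```

With this input, the two SKY/ONLINE entries under key 2 are merged into one state carrying both sub ids.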
An attempt using foldLeft:
list.foldLeft(Map.empty[Int, Map[SubscriptionState, Int]]) { (acc, next) =>
val subMap = acc.getOrElse(next._1, Map.empty[SubscriptionState, Int])
val channelSub = subMap.find { case (sub, _) => sub.channel == next._2.channel && sub.state == next._2.state }
acc + (next._1 -> channelSub.fold(subMap + (next._2 -> next._2.subIds.length)) { case (sub, _) =>
val subIds = sub.subIds ++ next._2.subIds
(subMap - sub) + (sub.copy(subIds = subIds) -> subIds.length)
})
}
I noticed that the count is not used while folding and can be calculated from subIds. Also, as subIds can vary, the inner Map is rather useless, because you will have to use find instead of get to fetch values from it. So if you have control over your ADTs, you could use an intermediary ADT like:
case class SubscriptionStateWithoutIds(channel: Channel, state: ChannelState)
then you can rewrite your foldLeft as follows:
list.foldLeft(Map.empty[Int, Map[SubscriptionStateWithoutIds, Seq[Long]]]) { (acc, next) =>
val subMap = acc.getOrElse(next._1, Map.empty[SubscriptionStateWithoutIds, Seq[Long]])
val withoutId = SubscriptionStateWithoutIds(next._2.channel, next._2.state)
val channelSub = subMap.get(withoutId)
acc + (next._1 -> (subMap + channelSub.fold(withoutId -> next._2.subIds) { seq => withoutId -> (seq ++ next._2.subIds) }))
}
The biggest advantage of the intermediary ADT is that you can have a cleaner groupMapReduce version:
list.groupMap(_._1)(sub => SubscriptionStateWithoutIds(sub._2.channel, sub._2.state) -> sub._2.subIds)
.map { case (key, value) => key -> value.groupMapReduce(_._1)(_._2)(_ ++ _) }
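To illustrate the lookup benefit, here is a small sketch. Channel and ChannelState are simplified to String aliases purely for the example (an assumption, since the real types are not shown), and the sample data is made up:

```scala
// Assumption: String aliases stand in for the real Channel and ChannelState types.
type Channel = String
type ChannelState = String

case class SubscriptionState(channel: Channel, state: ChannelState, subIds: Seq[Long])
case class SubscriptionStateWithoutIds(channel: Channel, state: ChannelState)

val list: List[(Int, SubscriptionState)] = List(
  2 -> SubscriptionState("SKY", "ONLINE", Seq(12L)),
  2 -> SubscriptionState("SKY", "ONLINE", Seq(13L)),
  2 -> SubscriptionState("CNN", "OFFLINE", Seq(13L))
)

val byKey: Map[Int, Map[SubscriptionStateWithoutIds, Seq[Long]]] =
  list
    .groupMap(_._1) { case (_, s) => SubscriptionStateWithoutIds(s.channel, s.state) -> s.subIds }
    .view
    .mapValues(_.groupMapReduce(_._1)(_._2)(_ ++ _)) // merge sub ids per (channel, state)
    .toMap

// A plain `get` now works, because the key no longer contains the varying subIds.
val skyIds: Option[Seq[Long]] = byKey(2).get(SubscriptionStateWithoutIds("SKY", "ONLINE"))
val skyCount: Int = skyIds.map(_.length).getOrElse(0)
```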

Scala Map - Use map function to replace key->value

I want to change the keys and values for the keys key1 and key2 only when their values are val1 and val2 (both these mappings should be present for the transformation to take place). I am able to do it using the following code, but I do not think this is very elegant or efficient.
Is there a better way to do the same thing, perhaps using just one .map function applied over map?
Code:
val map = Map(
"key1" -> "val1",
"key2" -> "val2",
"otherkey1" -> "otherval1"
)
val requiredKeys = List("key1", "key2")
val interestingMap = map.filterKeys(requiredKeys.contains) // will give ("key1" -> "val1", "key2" -> "val2").
val changedIfMatched =
if (interestingMap.get("key1").get.equalsIgnoreCase("val1") && interestingMap.get("key2").get.equalsIgnoreCase("val2"))
Map("key1" -> "newval1", "key2" -> "newval2")
else
interestingMap
print(map ++ changedIfMatched) // to replace the old key->values with the new ones, if any.
Also can ++ operation to update the old key->value mappings be made more efficient?
Just do the check ahead of time:
map
.get("key1").filter(_.equalsIgnoreCase("val1"))
.zip(map.get("key2").filter(_.equalsIgnoreCase("val2")))
.headOption
.fold(map) { _ =>
map ++ Map("key1" -> "newVal1", "key2" -> "newVal2")
}
Here's an approach that checks that both key value pairs match.
EDIT: Added a mapValues method to the Map class. This technique can be used to do further checks on the values of the map.
val m = Map("key1" -> "val1", "key2" -> "VAL2", "otherkey1" -> "otherval1")
val oldKVs = Map("key1" -> "val1", "key2" -> "val2")
val newKVs = Map("newkey1" -> "newval1", "newkey2" -> "newval2")
implicit class MapImp[T,S](m: Map[T,S]) {
def mapValues[R](f: S => R) = m.map { case (k,v) => (k, f(v)) }
def subsetOf(m2: Map[T,S]) = m.toSet subsetOf m2.toSet
}
def containsKVs[T](m: Map[T,String], sub: Map[T,String]) =
sub.mapValues(_.toLowerCase) subsetOf m.mapValues(_.toLowerCase)
val m2 = if (containsKVs(m, oldKVs)) m -- oldKVs.keys ++ newKVs else m
println(m2)
// Map(otherkey1 -> otherval1, newkey1 -> newval1, newkey2 -> newval2)
It takes advantage of the fact that you can convert Maps into Sets of Tuple2.
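The Set-of-pairs trick on its own, as a minimal sketch (case-sensitive here, unlike the version above; the value names are mine):

```scala
val m = Map("key1" -> "val1", "key2" -> "val2", "otherkey1" -> "otherval1")
val present = Map("key1" -> "val1", "key2" -> "val2")
val absent = Map("key1" -> "val1", "key2" -> "something else")

// A Map converts to a Set of (key, value) tuples, so subsetOf checks
// that every key/value pair of one map also occurs in the other.
val containsPresent: Boolean = present.toSet.subsetOf(m.toSet)
val containsAbsent: Boolean = absent.toSet.subsetOf(m.toSet)
```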
I think this will be the most generic and reusable solution for the problem.
object Solution1 extends App {
val map = Map(
"key1" -> "val1",
"key2" -> "val2",
"otherkey1" -> "otherval1"
)
implicit class MapUpdate[T](map: Map[T, T]) {
def updateMapForGivenKeyValues: (Iterable[(T, T)], Iterable[(T, T)]) => Map[T, T] =
(fromKV: Iterable[(T, T)], toKV: Iterable[(T, T)]) => {
val isKeyValueExist: Boolean = fromKV.toIterator.forall {
(oldKV: (T, T)) =>
map.toIterator.contains(oldKV)
}
if (isKeyValueExist) map -- fromKV.map(_._1) ++ toKV else map
}
}
val updatedMap = map.updateMapForGivenKeyValues(List("key1" -> "val1", "key2" -> "val2"),
List("newKey1" -> "newVal1", "newKey2" -> "newVal2"))
println(updatedMap)
}
So the method updateMapForGivenKeyValues takes a list of old key/value tuples and a list of new key/value tuples. Only if all the key/value pairs in the first parameter exist in the map do we update the map with the new key/value pairs from the second parameter. As the method is generic, it can be used with any data type, such as String, Int, or some case class.
We can easily reuse the method for different types of maps without changing a single line of code.
Answer to modified question
val map = Map(
"key1" -> "val1",
"key2" -> "val2",
"otherkey1" -> "otherval1"
)
val requiredVals = List("key1"->"val1", "key2"->"val2")
val newVals = List("newval1", "newval2")
val result =
if (requiredVals.forall{ case (k, v) => map.get(k).exists(_.equalsIgnoreCase(v)) }) {
map ++ requiredVals.map(_._1).zip(newVals)
} else {
map
}
This solution uses forall to check that all the key/value pairs in requiredVals are found in the map by testing each pair in turn.
For each key/value pair (k, v) it does a get on the map using the key to retrieve the current value as Option[String]. This will be None if the key is not found or Some(s) if the key is found.
The code then calls exists on the Option[String]. This method will return false if the value is None (the key is not found), otherwise it will return the result of the test that is passed to it. The test is _.equalsIgnoreCase(v), which does a case-insensitive comparison of the contents of the Option (_) and the value from the requiredVals list (v).
If this test fails then the original value of map is returned.
If this test succeeds then a modified version of the map is returned. The expression requiredVals.map(_._1) returns the keys from the requiredVals list, and the zip(newVals) associates the new values with the original keys. The resulting list of pairs is added to the map using ++, which will replace the existing values with the new ones.
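The steps above could be wrapped in a small helper to make both outcomes easy to test. This is only a sketch; replaceIfAllMatch and the sample values are names I made up for illustration:

```scala
// Hypothetical helper: replaces the values of the required keys only if
// every required key/value pair is present (case-insensitive on the value).
def replaceIfAllMatch(
    map: Map[String, String],
    required: List[(String, String)],
    newVals: List[String]
): Map[String, String] = {
  val allMatch = required.forall { case (k, v) => map.get(k).exists(_.equalsIgnoreCase(v)) }
  if (allMatch) map ++ required.map(_._1).zip(newVals) else map
}

val original = Map("key1" -> "val1", "key2" -> "VAL2", "otherkey1" -> "otherval1")

// Both pairs match (ignoring case), so both values are replaced.
val updated = replaceIfAllMatch(original, List("key1" -> "val1", "key2" -> "val2"), List("newval1", "newval2"))

// The second pair does not match, so the map comes back unchanged.
val unchanged = replaceIfAllMatch(original, List("key1" -> "val1", "key2" -> "nope"), List("newval1", "newval2"))
```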
Original answer
val map = Map(
"key1" -> "val1",
"key2" -> "val2",
"otherkey1" -> "otherval1"
)
val requiredVals = Map("key1"->"val1", "key2"->"val2")
val newVals = Map("newkey1" -> "newval1", "newkey2" -> "newval2")
val result =
if (requiredVals.forall{ case (k, v) => map.get(k).exists(_.equalsIgnoreCase(v)) }) {
map -- requiredVals.keys ++ newVals
} else {
map
}
Note that this replaces the old keys with the new keys, which appears to be what is described. If you want to keep the original keys and values, just delete "-- requiredVals.keys" and it will add the new keys without removing the old ones.
You can use the following code:
val interestingMap =
if(map.getOrElse("key1", "") == "val1" && map.getOrElse("key2", "") == "val2")
map - "key1" - "key2" + ("key1New" -> "val1New") + ("key2New" -> "val2New")
else map
The check (the if condition) can be tweaked to suit your specific need.
If any of these key/value pairs is not present in the map, the original map is returned; otherwise, you get a new map in which the two old keys have been replaced by the new entries.
Regarding efficiency, as long as there are only two keys to be updated, I do not think there is a real performance difference between using + to add elements directly and using ++ operator to overwrite the keys wholesale. If your map is huge though, maybe using a mutable map proves to be a better option in the long run.

Best way to filter and sort a Map by set of keys

I have a Map instance (immutable):
val source = Map(
("foo", "spam"),
("bar", "hoge"),
("baz", "eggs"),
("qux", "corge"),
("quux", "grault")
)
and I have a number of keys (a Set or List) in some order, which may or may not exist in the source map:
baz
foo
quuuuux // does not exist in a source map
What is the best and cleanest way, in concise Scala style, to iterate over the source map, filter it by my keys, and place the filtered items into a resulting map in the same order as the keys?
Map(baz -> eggs, foo -> spam)
P.S. To clarify - order of keys in resulting map must be the same as in filtration keys list
If you have:
val source = Map(
"foo" -> "spam",
"bar" -> "hoge",
"baz" -> "eggs",
"qux" -> "corge",
"quux" -> "grault"
)
and
val keys = List( "baz", "foo", "quuuux" )
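Then you can do the following. Note that SortedMap orders entries by the natural ordering of the keys rather than by their position in keys; the two happen to coincide for this example: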
import scala.collection.immutable.SortedMap
SortedMap(source.toSeq:_*).filter{ case (k,v) => keys.contains(k) }
import scala.collection.immutable.ListMap
val keys = List("foo", "bar")
val map = Map("foo" -> "spam", "bar" -> "hoge", "baz" -> "eggs")
keys.foldLeft(ListMap.empty[String, String]){ (acc, k) =>
map.get(k) match {
case Some(v) => acc + (k -> v)
case None => acc
}
}
This will iterate over the keys, building a map containing only the matching keys.
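Please note that you need a ListMap to preserve the ordering of the keys. The implementation stores entries in reverse insertion order (keys are prepended at the head of the list), so depending on the Scala version you may get the elements back in the opposite order from how they were inserted; as far as I know, from Scala 2.12 onward iteration follows insertion order.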
LinkedHashMap would ensure exact insertion order, but it's a mutable data structure.
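If you want to sidestep the ordering question, one option is to build the ordered pairs as a List first and only then wrap them in a ListMap. A minimal sketch, assuming Scala 2.13+ (for ListMap.from) and the data from the question:

```scala
import scala.collection.immutable.ListMap

val source = Map(
  "foo" -> "spam",
  "bar" -> "hoge",
  "baz" -> "eggs",
  "qux" -> "corge",
  "quux" -> "grault"
)
val keys = List("baz", "foo", "quuuuux")

// Walk the keys in the requested order and drop those missing from source.
val orderedPairs: List[(String, String)] =
  keys.flatMap(k => source.get(k).map(v => k -> v))

// In Scala 2.13 a ListMap iterates in insertion order.
val result: ListMap[String, String] = ListMap.from(orderedPairs)
```

If all you need is to iterate in order, the plain orderedPairs list may already be enough.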
If you need an ordered Map, you could use something like a TreeMap with a custom key ordering. So given
import scala.collection.immutable.TreeMap
val source = Map(
("foo", "spam"),
("bar", "hoge"),
("baz", "eggs"),
("qux", "corge"),
("quux", "grault")
)
val order: IndexedSeq[String] = IndexedSeq("baz", "foo", "quuuuux")
implicit val keyOrdering: Ordering[String] = Ordering.by(order.indexOf)
You have a choice: either iterate over the ordered keys:
val result1: TreeMap[String, String] = order.collect {
case key if source.contains(key) => key -> source(key)
}(collection.breakOut)
// or a bit shorter
val result2: TreeMap[String, String] = order.flatMap { key => source.get(key).map(key -> _) }(collection.breakOut)
or filter from the source map:
val result3: TreeMap[String, String] = TreeMap.empty ++ source.filterKeys(order.contains)
I am not sure which one would be the most efficient, but I suspect the flatMap one might be the fastest, at least for your simple example. That said, IMHO the last example is more readable than the others.
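Note that collection.breakOut was removed in Scala 2.13. A rough equivalent of the flatMap variant for 2.13+, passing the ordering explicitly instead of implicitly (that change is my own choice for the sketch):

```scala
import scala.collection.immutable.TreeMap

val source = Map(
  "foo" -> "spam",
  "bar" -> "hoge",
  "baz" -> "eggs",
  "qux" -> "corge",
  "quux" -> "grault"
)
val order: IndexedSeq[String] = IndexedSeq("baz", "foo", "quuuuux")

// Keys are compared by their position in `order`. Keys that are not in
// `order` all get index -1 and would compare as equal to each other,
// so only insert keys taken from `order`.
val keyOrdering: Ordering[String] = Ordering.by[String, Int](key => order.indexOf(key))

val pairs: IndexedSeq[(String, String)] =
  order.flatMap(key => source.get(key).map(value => key -> value))

val result: TreeMap[String, String] = TreeMap.from(pairs)(keyOrdering)
```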

How to use Reduce on Scala

I am using scala to implement an algorithm. I have a case where I need to implement such scenario:
test = Map(t -> List((t,2)), B -> List((B,3), (B,1)), D -> List((D,1)))
I need to sum the second member of all the tuples that share a key.
The desired result :
Map((t,2),(B,4),(D,1))
val resReduce = test.foldLeft(Map.empty[String, List[Map.empty[String, Int]]){(count, tup) => count + (tup -> (count.getOrElse(tup, 0) + 1))
I am trying to use "Reduce": I have to go through every group I made and sum the second members. Any idea how to do that?
If you know that all lists are nonempty and start with the same key (e.g. they were produced by groupBy), then you can just
test.mapValues(_.map(_._2).sum).toMap
Alternatively, you might want an intermediate step that allows you to perform error-checking:
test.map{ case(k,xs) =>
val v = {
if (xs.exists(_._1 != k)) ??? // Handle key-mismatch case
else xs.reduceOption((l,r) => l.copy(_2 = l._2 + r._2))
}
v.getOrElse(??? /* Handle empty-list case */)
}
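One way to fill in the two ??? placeholders is to return an Either per group, so that the caller decides what to do with bad groups. This is just a sketch; sumGroup and the error messages are my own invention:

```scala
// Hypothetical helper: Right(key -> sum) for a well-formed group,
// Left(message) for a key mismatch or an empty list.
def sumGroup(key: String, xs: List[(String, Int)]): Either[String, (String, Int)] =
  if (xs.exists(_._1 != key)) Left(s"key mismatch in group '$key'")
  else xs.map(_._2).reduceOption(_ + _).map(total => key -> total).toRight(s"empty group '$key'")

val groups = Map("t" -> List(("t", 2)), "B" -> List(("B", 3), ("B", 1)), "D" -> List(("D", 1)))

val results: List[Either[String, (String, Int)]] =
  groups.toList.map { case (k, xs) => sumGroup(k, xs) }
```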
You could do something like this:
test collect{
case (key, many) => (key, many.map(_._2).sum)
}
wherein you do not have to assume that the list has any members. However, if you want to exclude empty lists, add a guard
case (key, many) if many.nonEmpty =>
like that.
scala> val test = Map("t" -> List(("t",2)), "B" -> List(("B",3), ("B",1)), "D" -> List(("D",1)))
test: scala.collection.immutable.Map[String,List[(String, Int)]] = Map(t -> List((t,2)), B -> List((B,3), (B,1)), D -> List((D,1)))
scala> test.map{case (k,v) => (k, v.map(t => t._2).sum)}
res32: scala.collection.immutable.Map[String,Int] = Map(t -> 2, B -> 4, D -> 1)
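If you still have the flat list of tuples from before the grouping, Scala 2.13's groupMapReduce can group and reduce in one pass. A minimal sketch, assuming the flat input looks like this:

```scala
// Assumes Scala 2.13+ and a flat list of (key, count) pairs.
val pairs = List("t" -> 2, "B" -> 3, "B" -> 1, "D" -> 1)

// Group by key, project out the count, and reduce each group with +.
val sums: Map[String, Int] = pairs.groupMapReduce(_._1)(_._2)(_ + _)
```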
Yet another approach, in essence quite similar to what has already been suggested,
implicit class mapAcc(val m: Map[String,List[(String,Int)]]) extends AnyVal {
def mapCount() = for ( (k,v) <- m ) yield { (k,v.map {_._2}.sum) }
}
Then for a given
val test = Map("t" -> List(("t",2)), "B" -> List(("B",3), ("B",1)), "D" -> List(("D",1)))
a call
test.mapCount()
delivers
Map(t -> 2, B -> 4, D -> 1)

Scala iterate over map and turn singleton List into just the singleton

I am trying to extract a value of type List[T] to just T in a Map. So for instance:
val c = Map(1->List(1), 2-> List(2), 3->List(3));
would turn into
Map(1->1,2->2,3->3);
Here is what I have written so far:
val Some(values) = request.body.asFormUrlEncoded.foreach {
case (key,value) =>
Map(key->value.head);
};
and here is the error I am receiving:
constructor cannot be instantiated to expected type; found : (T1, T2) required: scala.collection.immutable.Map[String,Seq[String]]
EDIT: This is occurring with respect to this line:
case (key,value) =>
EDIT2:
request.body.asFormUrlEncoded example output
Some(Map(test -> List(324)))
Some(Map(SpO2 -> List(456), ETCO2 -> List(123)))
Are you sure that you will always have exactly one element in the list? If so, you should do this, which is clear, and has the benefit that it will throw an error if you get a bad list (doesn't have exactly one element) by accident.
c.map { case (k, List(v)) => k -> v }
// Map(1 -> 1, 2 -> 2, 3 -> 3)
If your lists can have more than one element, and you just want the first, you can do this (which will error on empty lists):
val d = Map(1 -> List(1), 2 -> List(2,4,6), 3 -> List(3))
d.map { case (k, List(v, _*)) => k -> v }
// Map(1 -> 1, 2 -> 2, 3 -> 3)
If your lists may not have exactly one element, and you want to ignore any non-singleton lists instead of throwing errors, use collect instead of map:
val e = Map(1 -> List(1), 2 -> List(2,4,6), 3 -> List(3), 4 -> List())
e.collect { case (k, List(v)) => k -> v }
// Map(1 -> 1, 3 -> 3)
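A fourth variant, in case you want the first element of every non-empty list and want to silently skip only the empty ones. A small sketch along the same lines (the value names are mine):

```scala
val e = Map(1 -> List(1), 2 -> List(2, 4, 6), 3 -> List(3), 4 -> List.empty[Int])

// `v :: _` matches any non-empty list and binds its head; empty lists are dropped.
val firsts: Map[Int, Int] = e.collect { case (k, v :: _) => k -> v }
```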
As for your code:
val Some(values) = request.body.asFormUrlEncoded.foreach {
case (key,value) =>
Map(key->value.head);
};
This doesn't really make any sense.
First off, foreach doesn't return anything, so assigning its result to a variable will never work. You probably want this to be a map instead, so that it returns a collection.
Second, your use of Some makes it seem like you don't understand Options, so you might want to read up on that.
Third, if you want the result to be a Map (a collection of pairs), then you'll just want to return the pair, key->value.head, and not a Map.
Fourth, if you're getting errors matching on case (key,value), then probably asFormUrlEncoded doesn't actually return a collection of pairs. You should see what its type actually is.
Lastly, the semicolons are unnecessary. You should remove them.
EDIT based on your comment:
Since request.body.asFormUrlEncoded actually returns things like Some(Map("test" -> List(324))), here is how your code should look.
If asFormUrlEncoded might return None, and you don't have any way of handling that, then you should guard against it:
val a = Some(Map("test" -> List(324)))
val value = a match {
case Some(m) => m.collect { case (k, List(v)) => k -> v }
case None => sys.error("expected something, got nothing")
}
If you're sure that asFormUrlEncoded will already return Some, then you can just do this:
val a = Some(Map("test" -> List(324)))
val Some(value) = a.map(_.collect { case (k, List(v)) => k -> v })
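A third option, if an absent form body should simply be treated as "no fields", is to fall back to an empty map. A sketch, assuming the Option[Map[String, Seq[String]]] shape suggested by your error message; singleValues is a made-up name:

```scala
// Hypothetical helper: keeps only the fields that have exactly one value,
// and treats a missing form body as an empty map.
def singleValues(form: Option[Map[String, Seq[String]]]): Map[String, String] =
  form
    .map(_.collect { case (k, Seq(v)) => k -> v })
    .getOrElse(Map.empty[String, String])

val present = singleValues(Some(Map("SpO2" -> List("456"), "ETCO2" -> List("123", "124"))))
val missing = singleValues(None)
```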