Extended groupBy - scala

I have a collection of objects, each having a list inside, let us say:
case class Article(text: String, labels: List[String])
I need to construct a map, such that every key corresponds to one of the labels and its associated value is a list of articles having this label.
I hope the following example makes my goal clearer. I want to transform the following list
List(
Article("article1", List("label1", "label2")),
Article("article2", List("label2", "label3"))
)
into a Map[String,List[Article]]:
Map(
"label1" -> List(Article("article1", ...)),
"label2" -> List(Article("article1", ...), Article("article2", ...)),
"label3" -> List(Article("article2", ...))
)
Is there an elegant way to perform this transformation? (I mean using collection methods, without needing to use mutable collections directly.)

How about this?
val ls = List(
Article("article1", List("label1", "label2")),
Article("article2", List("label2", "label3"))
)
val m = ls
.flatMap { case a @ Article(_, labels) => labels.map(_ -> a) }
.groupBy(_._1)
.mapValues(_.map(_._2)) // Scala 2.13+: mapValues is deprecated and returns a lazy MapView
m.foreach(println)
// (label2,List(Article(article1,List(label1, label2)), Article(article2,List(label2, label3))))
// (label1,List(Article(article1,List(label1, label2))))
// (label3,List(Article(article2,List(label2, label3))))
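On Scala 2.13 or later, the groupBy/mapValues pair can be collapsed into groupMap, which groups and projects in one step. A minimal sketch, assuming Scala 2.13+ (groupMap does not exist in 2.12):

```scala
// Assumes Scala 2.13 or later: groupMap is not available in 2.12.
case class Article(text: String, labels: List[String])

val articles = List(
  Article("article1", List("label1", "label2")),
  Article("article2", List("label2", "label3"))
)

// Emit one (label, article) pair per label, then group the pairs by label
// and keep only the article half of each pair.
val byLabel: Map[String, List[Article]] =
  articles
    .flatMap(a => a.labels.map(label => label -> a))
    .groupMap(_._1)(_._2)
```

groupMap keeps the articles in their original traversal order within each label, and the result is a strict Map, so nothing is recomputed on access.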

Related

Scala create immutable nested map

I have the following situation. I have two strings:
val keyMap = "androidApp,key1;iosApp,key2;xyz,key3"
val tentMap = "androidApp,tenant1; iosApp,tenant1; xyz,tenant2"
What I want is to create an immutable nested map like this:
tenant1 -> (androidApp -> key1, iosApp -> key2),
tenant2 -> (xyz -> key3)
So basically I want to group by tenant and create a map of keyMap.
Here is what I tried, but it uses a mutable map, which I don't want. Is there a way to create this using an immutable map?
case class TenantSetting() {
val requesterKeyMapping = new mutable.HashMap[String, String]()
}
val requesterKeyMapping = keyMap.split(";")
.map { keyValueList => keyValueList.split(',')
.filter(_.size==2)
.map(keyValuePair => (keyValuePair[0],keyValuePair[1]))
.toMap
}.flatten.toMap
val config = new mutable.HashMap[String, TenantSetting]
tentMap.split(";")
.map { keyValueList => keyValueList.split(',')
.filter(_.size==2)
.map { keyValuePair =>
val requester = keyValuePair[0]
val tenant = keyValuePair[1]
if (!config.contains(tenant)) config.put(tenant, new TenantSetting)
config.get(tenant).get.requesterKeyMapping.put(requester, requesterKeyMapping.get(requester).get)
}
}
The logic to break the strings into a map can be the same for both, as they use the same syntax.
What you had for the first string was not quite right, because you were applying the filter to each string from the split result rather than to the resulting array itself. This also showed in your use of [] on keyValuePair, which was of type String and not Array[String] as I think you were expecting (and Scala indexes arrays with (), not [], in any case). You also needed a trim in there to cope with the spaces in the second string. You might also want to trim the key and value to avoid other whitespace issues.
Additionally, in this case the combination of map and filter can be done more succinctly with collect, as shown here:
How to convert an Array to a Tuple?
Using a pattern with exactly 2 elements filters out anything whose length is not 2, as you wanted.
The iterator makes the combination of map and collect more efficient by requiring only one traversal of the array returned from the first split.
With both strings turned into maps, it just needs the right use of groupBy to group the first map by the value the second map holds for the same key. This only works if every key in the first map is also present in the second map; otherwise tentMap(k) throws a NoSuchElementException.
def toMap(str: String): Map[String, String] =
str
.split(";")
.iterator
.map(_.trim.split(','))
.collect { case Array(key, value) => (key.trim, value.trim) }
.toMap
val keyMap = toMap("androidApp,key1;iosApp,key2;xyz,key3")
val tentMap = toMap("androidApp,tenant1; iosApp,tenant1; xyz,tenant2")
val finalMap = keyMap.groupBy { case (k, _) => tentMap(k) }
Printing out finalMap gives:
Map(tenant2 -> Map(xyz -> key3), tenant1 -> Map(androidApp -> key1, iosApp -> key2))
Which is what you wanted.
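If an app in keyMap has no tenant, tentMap(k) throws. A minimal sketch that drops such apps instead, assuming Scala 2.13+ for groupMap; the extra orphanApp entry is hypothetical input added only to show the behaviour:

```scala
// Same parsing helper as in the answer above.
def toMap(str: String): Map[String, String] =
  str
    .split(";")
    .iterator
    .map(_.trim.split(','))
    .collect { case Array(key, value) => (key.trim, value.trim) }
    .toMap

// "orphanApp" is hypothetical: an app with a key but no tenant entry.
val keyMap = toMap("androidApp,key1;iosApp,key2;xyz,key3;orphanApp,key4")
val tentMap = toMap("androidApp,tenant1; iosApp,tenant1; xyz,tenant2")

// Look up each app's tenant with get, so apps without a tenant are dropped
// instead of throwing; then group the (app -> key) pairs by tenant.
val finalMap: Map[String, Map[String, String]] =
  keyMap.toList
    .flatMap { case (app, key) => tentMap.get(app).map(tenant => (tenant, app -> key)) }
    .groupMap(_._1)(_._2)
    .map { case (tenant, pairs) => tenant -> pairs.toMap }
```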

Join 3 maps (2 master map, 1 resultant map), to map the composite key in the resultant map to have values from the master maps

I have one map containing some master data (id -> description):
val map1: Map[String, String] = Map("001" -> "ABCD", "002" -> "MNOP", "003" -> "WXYZ")
I have another map containing some other master data (id -> description):
val map2: Map[String, String] = Map("100" -> "Ref1", "200" -> "Ref2", "300" -> "Ref3")
I have a resultant map, derived from some data set, whose keys combine the ids from map1 and map2. To be precise, it was derived by grouping on the ids from both of the above maps and then accumulating the amounts:
val map3:Map[(String, String),Double] = Map(("001","200")->3452.30,("003","300")->78484.33,("002","777") -> 893.45)
I need an output in a Map as follows:
Map(("ABCD","Ref2") -> 3452.30, ("WXYZ","Ref3") -> 78484.33, ("MNOP","777") -> 893.45)
I have been trying this:
val map5 = map3.map(obj => {
(map1 getOrElse(obj._1._1, "noMatchMap1"))
(map2 getOrElse(obj._1._2, "noMatchMap2"))
} match {
case "noMatchMap1" => obj
case "noMatchMap2" => obj
case value => value -> obj._2
})
This should do it:
map3.map{
case((key1, key2), d) => ((map1.getOrElse(key1, key1), map2.getOrElse(key2, key2)),d)
}
Btw, I invite you to consult https://stackoverflow.com/help/how-to-ask for how to ask good questions, and in particular, please include what you tried. I'm happy to help you, but this isn't a site where you can just dump your homework/work and get it done :-D
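One thing to watch with the map-based answer: if two different ids translate to the same description pair, map keeps only one of the colliding entries and silently drops the other amount. Since the amounts were accumulated in the first place, summing them may be what you want. A sketch assuming Scala 2.13+ (groupMapReduce); describe is a hypothetical helper name:

```scala
val map1 = Map("001" -> "ABCD", "002" -> "MNOP", "003" -> "WXYZ")
val map2 = Map("100" -> "Ref1", "200" -> "Ref2", "300" -> "Ref3")
val map3: Map[(String, String), Double] =
  Map(("001", "200") -> 3452.30, ("003", "300") -> 78484.33, ("002", "777") -> 893.45)

// Hypothetical helper: translate each half of the composite key,
// falling back to the raw id when there is no description.
def describe(key: (String, String)): (String, String) =
  (map1.getOrElse(key._1, key._1), map2.getOrElse(key._2, key._2))

// Group entries by their translated key and add up the amounts of any
// entries that end up sharing the same translated key.
val map5: Map[(String, String), Double] =
  map3.groupMapReduce { case (k, _) => describe(k) } { case (_, amount) => amount }(_ + _)
```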

How to iterate values of map in Scala?

Given val m = Map(2 -> (3, 2), 1 -> (2, 1))
I want to add up the elements belonging to the same key, so the result is: Map(2 -> 5, 1 -> 3). Please help me solve this problem; I'll appreciate any help!
Consider
m.mapValues { case(x,y) => x+y }
which creates a new Map with same keys and computed values. Also consider
def f(t: (Int,Int)) = t._1+t._2
so a more concise approach is
m.mapValues(f)
Note: see "Decomposing tuples in function arguments" for details on declaring a function that can take the tuples from the Map.
Update: Following an important note by @KevinMeredith, mapValues returns a view of the underlying map rather than a new strict map, so the function is re-applied on every access and needs to be referentially transparent. Hence, as a standard (intuitive) approach, consider pattern-matching on the entire key-value pair using map, for instance like this:
m.map { case (x,(t1,t2)) => x -> (t1+t2) }
or
m.map { case (k,v) => (k,f(v)) }
or
for ( (x,(t1,t2)) <- m ) yield x -> (t1+t2)
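A minimal sketch contrasting the strict map approach with the view-based one, assuming Scala 2.13+, where Map.mapValues is deprecated in favour of .view.mapValues:

```scala
val m = Map(2 -> (3, 2), 1 -> (2, 1))

// Strict: builds a new Map immediately; the addition runs once per entry.
val sums: Map[Int, Int] = m.map { case (k, (a, b)) => k -> (a + b) }

// Scala 2.13+: go through a view and force it with toMap,
// so the addition is not re-run on every lookup.
val sumsViaView: Map[Int, Int] = m.view.mapValues { case (a, b) => a + b }.toMap
```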

Best way to filter and sort a Map by set of keys

I have a Map instance (immutable):
val source = Map(
("foo", "spam"),
("bar", "hoge"),
("baz", "eggs"),
("qux", "corge"),
("quux", "grault")
)
and I have a number of keys (a Set or List), in some order, which may or may not exist in the source map:
baz
foo
quuuuux // does not exist in the source map
What is the best and cleanest way, in concise Scala style, to iterate over the source map, filter it by my keys, and place the filtered items into a resulting map in the same order as the keys?
Map(baz -> eggs, foo -> spam)
P.S. To clarify: the order of keys in the resulting map must be the same as in the list of filter keys.
If you have:
val source = Map(
"foo" -> "spam",
"bar" -> "hoge",
"baz" -> "eggs",
"qux" -> "corge",
"quux" -> "grault"
)
and
val keys = List( "baz", "foo", "quuuux" )
Then, you can:
import scala.collection.immutable.SortedMap
SortedMap(source.toSeq:_*).filter{ case (k,v) => keys.contains(k) } // note: SortedMap orders keys alphabetically, not by their position in keys
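The SortedMap approach sorts by the natural String ordering, so it only matches the requested order when the keys happen to be listed alphabetically, as they are in the question. A quick check with a hypothetical non-alphabetical key list:

```scala
import scala.collection.immutable.SortedMap

val source = Map("foo" -> "spam", "bar" -> "hoge", "baz" -> "eggs")
val keys = List("foo", "baz") // hypothetical: requested order is not alphabetical

// SortedMap orders by the default String ordering, so "baz" comes before "foo"
// regardless of the order in `keys`.
val viaSorted = SortedMap(source.toSeq: _*).filter { case (k, _) => keys.contains(k) }
```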
import scala.collection.immutable.ListMap
val keys = List("foo", "bar")
val map = Map("foo" -> "spam", "bar" -> "hoge", "baz" -> "eggs")
keys.foldLeft(ListMap.empty[String, String]){ (acc, k) =>
map.get(k) match {
case Some(v) => acc + (k -> v)
case None => acc
}
}
This will iterate over the keys, building a map containing only the matching keys.
Please note that you need a ListMap to preserve the ordering of keys. ListMap stores its entries internally in reversed insertion order (each new key is prepended as the head of the list), but in current Scala versions (2.12 and later) iterating over it returns the elements in insertion order.
LinkedHashMap would ensure exact insertion order, but it's a mutable data structure.
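The same ListMap idea can be written without foldLeft by looking each key up and building the ListMap from the surviving pairs. A sketch assuming Scala 2.13+ (ListMap.from), using the question's key list:

```scala
import scala.collection.immutable.ListMap

val source = Map(
  "foo" -> "spam",
  "bar" -> "hoge",
  "baz" -> "eggs",
  "qux" -> "corge",
  "quux" -> "grault"
)
val keys = List("baz", "foo", "quuuuux")

// Walk the keys in the requested order, keep only those present in source,
// and build a ListMap, which iterates in insertion order.
val ordered: ListMap[String, String] =
  ListMap.from(keys.flatMap(k => source.get(k).map(v => k -> v)))
```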
If you need an ordered Map, you could use something like a TreeMap with a custom key ordering. So given
import scala.collection.immutable.TreeMap
val source = Map(
("foo", "spam"),
("bar", "hoge"),
("baz", "eggs"),
("qux", "corge"),
("quux", "grault")
)
val order: IndexedSeq[String] = IndexedSeq("baz", "foo", "quuuuux")
implicit val keyOrdering: Ordering[String] = Ordering.by(order.indexOf)
You have a choice: either iterate over the ordered keys (collection.breakOut only exists in Scala 2.12 and earlier):
val result1: TreeMap[String, String] = order.collect {
case key if source.contains(key) => key -> source(key)
}(collection.breakOut)
// or a bit shorter
val result2: TreeMap[String, String] = order.flatMap { key => source.get(key).map(key -> _) }(collection.breakOut)
or filter from the source map:
val result3: TreeMap[String, String] = TreeMap.empty ++ source.filterKeys(order.contains)
I am not sure which one would be the most efficient, but I suspect the flatMap one might be fastest, at least for your simple example. Still, imho, the last example is more readable than the others.
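collection.breakOut was removed in Scala 2.13. A sketch of the flatMap approach for 2.13+, assuming TreeMap.from with the ordering passed explicitly rather than declared as an implicit Ordering[String], which would otherwise replace the default String ordering wherever it is in scope; byPosition is a hypothetical name:

```scala
import scala.collection.immutable.TreeMap

val source = Map("foo" -> "spam", "bar" -> "hoge", "baz" -> "eggs", "qux" -> "corge", "quux" -> "grault")
val order: IndexedSeq[String] = IndexedSeq("baz", "foo", "quuuuux")

// Hypothetical name: orders keys by their position in `order`.
val byPosition: Ordering[String] = Ordering.by((k: String) => order.indexOf(k))

// Look up each ordered key, keep the ones present in source,
// and build the TreeMap with the positional ordering passed explicitly.
val result: TreeMap[String, String] =
  TreeMap.from(order.flatMap(k => source.get(k).map(k -> _)))(byPosition)
```

Keys missing from order all get index -1 and therefore compare equal to each other, which is why only keys drawn from order are inserted here.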

Distributed Map in Scala Spark

Does Spark support distributed Map collection types ?
So if I have a HashMap[String,String] of key-value pairs, can it be converted to a distributed Map collection type? To access an element I could use "filter", but I doubt this performs as well as a Map?
Since I found some new info, I thought I'd turn my comments into an answer. @maasg already covered the standard lookup function; I would like to point out that you should be careful, because if the RDD's partitioner is None, lookup just uses a filter anyway. As for a (K,V) store on top of Spark, it looks like this is in progress, but a usable pull request has been made here. Here is an example usage:
import org.apache.spark.rdd.IndexedRDD
// Create an RDD of key-value pairs with Long keys.
val rdd = sc.parallelize((1 to 1000000).map(x => (x.toLong, 0)))
// Construct an IndexedRDD from the pairs, hash-partitioning and indexing
// the entries.
val indexed = IndexedRDD(rdd).cache()
// Perform a point update.
val indexed2 = indexed.put(1234L, 10873).cache()
// Perform a point lookup. Note that the original IndexedRDD remains
// unmodified.
indexed2.get(1234L) // => Some(10873)
indexed.get(1234L) // => Some(0)
// Efficiently join derived IndexedRDD with original.
val indexed3 = indexed.innerJoin(indexed2) { (id, a, b) => b }.filter(_._2 != 0)
indexed3.collect // => Array((1234L, 10873))
// Perform insertions and deletions.
val indexed4 = indexed2.put(-100L, 111).delete(Array(998L, 999L)).cache()
indexed2.get(-100L) // => None
indexed4.get(-100L) // => Some(111)
indexed2.get(999L) // => Some(0)
indexed4.get(999L) // => None
It seems like the pull request was well received and will probably be included in future versions of Spark, so it is probably safe to use that pull request in your own code. Here is the JIRA ticket in case you are curious.
The quick answer: Partially.
You can transform a Map[A,B] into an RDD[(A,B)] by first forcing the map into a sequence of (k,v) pairs, but by doing so you lose the constraint that the keys of a map must form a set, i.e. you lose the semantics of the Map structure.
From a practical perspective, you can still resolve a key to its corresponding value using kvRdd.lookup(key), but the result will be a sequence, since, as explained above, there is no guarantee that there is a single value for that key.
A spark-shell example to make things clear:
val englishNumbers = Map(1 -> "one", 2 ->"two" , 3 -> "three")
val englishNumbersRdd = sc.parallelize(englishNumbers.toSeq)
englishNumbersRdd.lookup(1)
res: Seq[String] = WrappedArray(one)
val spanishNumbers = Map(1 -> "uno", 2 -> "dos", 3 -> "tres")
val spanishNumbersRdd = sc.parallelize(spanishNumbers.toList)
val bilingueNumbersRdd = englishNumbersRdd union spanishNumbersRdd
bilingueNumbersRdd.lookup(1)
res: Seq[String] = WrappedArray(one, uno)
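The semantic difference above can be seen without Spark using plain Scala collections. This is a local analogue, not Spark code; lookup here is a hypothetical helper that mimics the shape of what kvRdd.lookup returns:

```scala
val englishNumbers = Map(1 -> "one", 2 -> "two", 3 -> "three")
val spanishNumbers = Map(1 -> "uno", 2 -> "dos", 3 -> "tres")

// Map semantics: keys form a set, so ++ keeps a single value per key
// (the right-hand map wins).
val merged: Map[Int, String] = englishNumbers ++ spanishNumbers

// Pair-sequence semantics (what an RDD[(K, V)] gives you): duplicate keys survive,
// so a lookup has to return every matching value.
val pairs: Seq[(Int, String)] = englishNumbers.toSeq ++ spanishNumbers.toSeq
def lookup(key: Int): Seq[String] = pairs.collect { case (k, v) if k == key => v }
```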