Scala map key consists of 2 comma-separated values (a tuple). How to extract the first element of the key?

I have a Scala map collection that looks something like this:
var collection = Map((A,B) -> 1)
The key is (A,B) and the value is 1.
My question: If I use collection.head._1, the result is (A,B) which is correct. But I want to extract A only, without B, as I need to compare A with some other variable. So the final result should be A stored in a different variable.
I tried collection.head._1(0), which results in the error:
Any does not take parameters

You can try:
val collection = Map(("A","B") -> 1)
collection.map{ case ((a, b),v) => a -> v}
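The `(0)` call fails because, in Scala 2, a tuple has no `apply` method and cannot be indexed like an array; its elements are read with `_1`, `_2`, and so on. Applying `_1` twice therefore gives the first element of the first key. The sketch below assumes String keys, as in the answer above; note that for a general Map, which entry `head` returns is not guaranteed.

```scala
val collection = Map(("A", "B") -> 1)

// head is the first (key, value) pair; ._1 is the key ("A", "B"),
// and ._1 on that key tuple is its first element "A"
val firstOfKey: String = collection.head._1._1

// the same thing with a pattern match, which names every part of the entry
val ((first, second), value) = collection.head
```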

You can use keySet to get all the keys as a Set[(String, String)] and then map it into the first element of each:
val coll: Map[(String, String), Int] =
Map(
("one", "elephant") -> 1,
("two", "elephants") -> 2,
("three", "elephants") -> 3
)
// all three forms below are equivalent:
// val myKeys = coll.keySet.map { case (x, _) => x }
// val myKeys = coll.keySet.map(tup => tup._1)
val myKeys = coll.keySet.map(_._1) // Set(one, two, three)
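One caveat with this approach: `keySet.map` produces a Set, so keys that share the same first element collapse into a single result. If every occurrence matters, map over the keys as a List instead. The `("one", "mouse")` entry below is hypothetical data added only to show the difference.

```scala
val coll: Map[(String, String), Int] = Map(
  ("one", "elephant") -> 1,
  ("one", "mouse") -> 2,
  ("two", "elephants") -> 3
)

// Set result: the two keys starting with "one" yield a single "one"
val distinctFirsts: Set[String] = coll.keySet.map(_._1)

// List result: one first element per key, duplicates kept
val allFirsts: List[String] = coll.keys.toList.map(_._1)
```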

Related

How to delete a key from a Map inside an Option in Scala

Say I have case classes like this:
case class someClass0(content: someClass1)
case class someClass1(someContent: Option[Map[String, someClass2]])
case class someClass2(someKey: Array[Int])
I need to delete items from the Map (which is immutable) by value.
I get these values through iteration:
val keys_to_remove = new ListBuffer[String]()
val keys_to_keep: List[Int] = List(100, 200)
for (x <- keys_to_keep) {
content.someContent.get.foreach {
case (key: String, value: someClass2) => {
if (!value.someKey.contains(x)) {
keys_to_remove.append(key)
}
}
}
}
So how can I keep the whole structure and delete only the needed items by key?
I tried changing the type of the Map like this:
content.someContent.map(_.to(collection.mutable.Map))
But content.someContent.get.remove(key) does not work.
What am I doing wrong?
You don't need mutability for that.
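val keys_to_remove: List[String] = List("a", "b")
// keeps every entry whose key is NOT in keys_to_remove
// (on Scala 2.13+, filterKeys is deprecated; use _.view.filterKeys(...).toMap)
val res = content.someContent.map(
  _.filterKeys(k => !keys_to_remove.contains(k))
)
filterKeys filters a Map by testing each entry's key against a condition; here it keeps only the keys that are not in keys_to_remove.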
Also remember that calling contains on a List[Int] with a String argument always returns false, so the list of keys must have the same type as the Map's keys.
Furthermore, try looking up the Scala style guide:
Classes are usually named in upper camel case
Values and variables are usually named in lower camel case
You can do it using the - operator and a foldLeft over the keys to remove.
You are using get to obtain the value; to do it safely, use:
content.someContent.map(immutableMap =>
  keys_to_remove.foldLeft(immutableMap) { (map, key) =>
    map - key
  }
).getOrElse(Map.empty[String, someClass2])
This works as in the following example:
import scala.collection.mutable.ListBuffer
val immutableMap = Map("a" -> 1, "b" -> 2, "c" -> 3, "d" -> 4)
val keys_to_remove: ListBuffer[String] = ListBuffer("b", "d")
println(immutableMap) // Map(a -> 1, b -> 2, c -> 3, d -> 4)
val mapWithoutKeys = keys_to_remove.foldLeft(immutableMap){
(map, key) =>
map - key
}
println(mapWithoutKeys) //Map(a -> 1, c -> 3)
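For an immutable Map, the fold over `-` is equivalent to a single call to `--`, which removes every key contained in the given collection. A minimal sketch with the same data (using a List instead of a ListBuffer, which works the same way here):

```scala
val immutableMap = Map("a" -> 1, "b" -> 2, "c" -> 3, "d" -> 4)
val keysToRemove = List("b", "d")

// -- removes all listed keys at once; same result as folding with -
val mapWithoutKeys = immutableMap -- keysToRemove
val viaFold = keysToRemove.foldLeft(immutableMap)(_ - _)
```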
Here's how you can do it:
val optionalMap = someClass0.content.someContent.map {
contentMap => contentMap - keyToBeRemoved
}
val originalStructure = someClass0.copy(content = someClass1(optionalMap))
This removes all the given keys and keeps the structure (removedAll requires Scala 2.13 or later):
val someClass0_copy = someClass0.copy(content = someClass1(someContent = someClass0.content.someContent.map(_.removedAll(keysToRemove))))
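Putting the answers together, here is a self-contained sketch using the question's case classes. It keeps an entry only if its someKey contains every value in keysToKeep, which is what the question's loop computes (a key is added to keys_to_remove as soon as one of the values is missing). The helper name keepMatching and the sample data are hypothetical. Because the Option is mapped rather than unwrapped with get, a None stays None.

```scala
// the question's case classes, unchanged
case class someClass0(content: someClass1)
case class someClass1(someContent: Option[Map[String, someClass2]])
case class someClass2(someKey: Array[Int])

// hypothetical helper: rebuilds only the path that changes via copy
def keepMatching(obj: someClass0, keysToKeep: List[Int]): someClass0 =
  obj.copy(content = obj.content.copy(
    someContent = obj.content.someContent.map(_.filter { case (_, v) =>
      keysToKeep.forall(k => v.someKey.contains(k))
    })
  ))

// hypothetical sample data
val original = someClass0(someClass1(Some(Map(
  "x" -> someClass2(Array(100, 200)),
  "y" -> someClass2(Array(300)),
  "z" -> someClass2(Array(100))
))))

val result = keepMatching(original, List(100, 200))
```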

Scala: creating a map from a string

Hi, I have a string in the format shown below:
val str = "{a=10, b=20, c=30}"
All the parameters inside this string are unique and separated by a comma and a space. Also, the string always starts with '{' and ends with '}'. I want to create a Map out of this string, something like this:
val values = Map("a" -> 10, "b" -> 20, "c" -> 30)
What is the most efficient way I can achieve this?
scala> val str = "{a=10, b=20, c=30}"
str: String = {a=10, b=20, c=30}
scala> val P = """.*?(\w+)=(\d+).*""".r
P: scala.util.matching.Regex = .*?(\w+)=(\d+).*
scala> str.split(',').map{ case P(k, v) => (k, v.toInt) }.toMap
res2: scala.collection.immutable.Map[String,Int] = Map(a -> 10, b -> 20, c -> 30)
A regex can achieve this simply:
"(\\w+)=(\\w+)".r.findAllIn("{a=10, b=20, c=30}").matchData.map(i => {
(i.group(1), i.group(2))
}).toMap
The function you want to write is pretty easy:
def convert(str : String) : Map[String, String] = {
str.drop(1).dropRight(1).split(", ").map(_.split("=")).map(arr => arr(0)->arr(1)).toMap
}
With drop and dropRight, you remove the braces. Then you split the String on ", " (a comma followed by a space), which yields one String per key-value pair.
Then you split each of these strings on "=", which yields two-element arrays. Those are used to create the map.
I would do it like this (I think regex is not needed here):
val str = "{a=10, b=20, c=30}"
val values: Map[String, Int] = str.drop(1).dropRight(1) // drop braces
.split(",") // split into key-value pairs
.map { pair =>
val Array(k, v) = pair.split("=") // split key-value pair and parse to Int
(k.trim -> v.toInt)
}.toMap
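The drop/dropRight versions above assume at least one pair: for "{}" the body is "", splitting it gives Array(""), and indexing or destructuring that array fails at runtime. A sketch that also accepts an empty map, assuming integer values and that "{}" should mean Map.empty (parseIntMap is a hypothetical name):

```scala
// Parses "{k1=v1, k2=v2}" into Map[String, Int]; "{}" gives an empty map.
def parseIntMap(str: String): Map[String, Int] = {
  // strip the surrounding braces and whitespace
  val body = str.trim.stripPrefix("{").stripSuffix("}").trim
  if (body.isEmpty) Map.empty
  else body.split(",").iterator.map { pair =>
    // limit 2: split only on the first "="
    pair.split("=", 2) match {
      case Array(k, v) => k.trim -> v.trim.toInt
      case _           => throw new IllegalArgumentException(s"malformed pair: '$pair'")
    }
  }.toMap
}
```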

Can I call a method inside of sortWith when sorting a sequence (or a map)?

I have a map Map[String,Option[Seq[String]]], and I have values for each of the strings in a different map: Map[String,Option[Int]]. I am trying to map over the values and use sortWith on the sequence, but in what I have read online I don't see any examples of calling custom methods inside sortWith.
How can I sort my sequence using sortWith? If I wanted to implement a custom method that returns a boolean to tell me what object is considered greater, is this possible?
val fieldMap = Map("user1" -> Seq("field1_name", "field2_name"), "user2" -> Seq("field3_name"))
val fieldValues = Map("field1_name" -> 2, "field2_name" -> 1, "field3_name" -> 3)
val sortedMap = fieldMap.mapValues(fieldList => fieldList.sortWith(fieldValues(_) < fieldValues(_)) // Scala doesn't like this
I tried:
fieldList.sortWith{(x,y) =>
val x = fieldValues(x)
val y = fieldValues(y)
x < y
}
This gives me a type mismatch. Expected type:
(String,String) => Boolean
Actual type:
(String,String) => Any
EDIT Solution:
fieldList.sortWith{(a,b) =>
val x = fieldValues(a)
val y = fieldValues(b)
x.getOrElse(0) < y.getOrElse(0) // have to unwrap the Option.
}
You're using the wrong syntax. To use sortWith, you have to do something like:
fieldMap.mapValues(
fieldList => fieldList.sortWith(
(a,b) => fieldValues(a) > fieldValues(b)
)
)
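To answer the second part of the question: yes, sortWith accepts any (A, A) => Boolean, including a method of your own, which Scala eta-expands into a function. The sketch below uses the Option-wrapped types the question describes; valueOf, comesBefore, the "user3" entry, and treating missing or None values as 0 are hypothetical choices. It uses map rather than mapValues, which is deprecated in Scala 2.13.

```scala
val fieldMap: Map[String, Option[Seq[String]]] = Map(
  "user1" -> Some(Seq("field1_name", "field2_name")),
  "user2" -> Some(Seq("field3_name")),
  "user3" -> None
)
val fieldValues: Map[String, Option[Int]] =
  Map("field1_name" -> Some(2), "field2_name" -> Some(1), "field3_name" -> Some(3))

// missing field or None value counts as 0 (an assumption)
def valueOf(field: String): Int = fieldValues.get(field).flatten.getOrElse(0)

// custom comparison method returning Boolean: true if a sorts before b
def comesBefore(a: String, b: String): Boolean = valueOf(a) < valueOf(b)

// the method is passed directly to sortWith; None sequences stay None
val sortedMap: Map[String, Option[Seq[String]]] =
  fieldMap.map { case (user, fields) => user -> fields.map(_.sortWith(comesBefore)) }
```

When the comparison only looks at a derived key, `_.sortBy(valueOf)` is a shorter equivalent.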

Access a tuple inside a tuple for an anonymous map job in Spark

This post is essentially about how to build joint and marginal histograms from a (String, String) RDD. I posted the code that I eventually used below as the answer.
I have an RDD that contains a set of tuples of type (String,String), and since they aren't unique I want to see how many times each (String, String) combination occurs, so I use countByValue like so:
val PairCount = Pairs.countByValue().toSeq
which gives me tuples of the form ((String,String),Long), where the Long is the number of times that the (String, String) tuple appeared.
These Strings can be repeated in different combinations, and I essentially want to run a word count on this PairCount variable, so I tried something like this to start:
PairCount.map(x => (x._1._1, x._2))
But the output this produces is String1->1, String2->1, String3->1, etc.
How do I output a key-value pair from a map job in this case, where the key is one of the String values from the inner tuple and the value is the Long from the outer tuple?
Update:
@vitalii's answer gets me almost there: it gets me to a Seq[(String,Long)], but what I really need is to turn that into a map so that I can run reduceByKey on it afterwards. When I run
PairCount.flatMap{case((x,y),n) => Seq(x->n)}.toMap
for each unique x I get x->1.
For example, the above line of code generates mom->1 dad->1 even if the tuples out of the flatMap included (mom,30) (dad,59) (mom,2) (dad,14), in which case I would expect toMap to provide mom->30, dad->59 mom->2 dad->14. However, I'm new to Scala, so I might be misinterpreting the functionality.
How can I get the Tuple2 sequence converted to a map so that I can reduce on the map keys?
If I understand the question correctly, you need flatMap:
val pairCountRDD = pairs.countByValue() // RDD[((String, String), Int)]
val res : RDD[(String, Int)] = pairCountRDD.flatMap { case ((s1, s2), n) =>
Seq(s1 -> n, s2 -> n)
}
Update: I didn't quite understand what your final goal is, but here are a few more examples that may help you. By the way, the code above is incorrect: I missed the fact that countByValue returns a Map, not an RDD:
val pairs = sc.parallelize(
List(
"mom"-> "dad", "dad" -> "granny", "foo" -> "bar", "foo" -> "baz", "foo" -> "foo"
)
)
// don't use countByValue: if pairs is large, you will run out of memory
val pairCountRDD = pairs.map(x => (x, 1)).reduceByKey(_ + _)
val wordCount = pairs.flatMap { case (a,b) => Seq(a -> 1, b ->1)}.reduceByKey(_ + _)
wordCount.take(10)
// count in how many pairs each word occur, keys and values:
val wordPairCount = pairs.flatMap { case (a,b) =>
if (a == b) {
Seq(a->1)
} else {
Seq(a -> 1, b ->1)
}
}.reduceByKey(_ + _)
wordPairCount.take(10)
To get the histograms for the (String,String) RDD, I used this code:
val Hist_X = histogram.map(x => (x._1-> 1.0)).reduceByKey(_+_).collect().toMap
val Hist_Y = histogram.map(x => (x._2-> 1.0)).reduceByKey(_+_).collect().toMap
val Hist_XY = histogram.map(x => (x-> 1.0)).reduceByKey(_+_)
where histogram is the (String,String) RDD.
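The toMap confusion in the question comes from the fact that a Map holds one value per key, so toMap keeps only the last value seen for a repeated key instead of summing. On an RDD, reduceByKey does the summing; since countByValue already returns a local Map, the marginal counts can also be computed with plain Scala collections. A sketch assuming Scala 2.13 (for groupMapReduce), with a local Seq and hypothetical sample data standing in for the RDD:

```scala
// local stand-in for the (String, String) RDD; hypothetical data
val pairs: Seq[(String, String)] = Seq("mom" -> "dad", "dad" -> "granny", "mom" -> "dad")

// like countByValue: number of occurrences of each pair
val pairCount: Map[(String, String), Long] =
  pairs.groupMapReduce(identity)(_ => 1L)(_ + _)

// marginal counts: group by one element of the inner tuple and SUM the counts,
// rather than letting toMap keep only the last value per key
val histX: Map[String, Long] = pairCount.toSeq.groupMapReduce(_._1._1)(_._2)(_ + _)
val histY: Map[String, Long] = pairCount.toSeq.groupMapReduce(_._1._2)(_._2)(_ + _)
```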

Best way to filter and sort a Map by set of keys

I have a Map instance (immutable):
val source = Map(
("foo", "spam"),
("bar", "hoge"),
("baz", "eggs"),
("qux", "corge"),
("quux", "grault")
)
and I have a number of keys (a Set or List), in some order, which may or may not exist in the source map:
baz
foo
quuuuux // does not exist in the source map
What is the best and cleanest way, in concise Scala style, to iterate over the source map, filter it by my keys, and place the filtered items into the resulting map in the same order as the keys?
Map(baz -> eggs, foo -> spam)
P.S. To clarify: the order of keys in the resulting map must be the same as in the list of filter keys.
If you have:
val source = Map(
"foo" -> "spam",
"bar" -> "hoge",
"baz" -> "eggs",
"qux" -> "corge",
"quux" -> "grault"
)
and
val keys = List( "baz", "foo", "quuuux" )
Then, you can:
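import scala.collection.immutable.SortedMap
// note: SortedMap orders keys alphabetically, not by their position in keys;
// here the two happen to agree ("baz" < "foo"), but not in general
SortedMap(source.toSeq:_*).filter{ case (k,v) => keys.contains(k) }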
import scala.collection.immutable.ListMap
val keys = List("foo", "bar")
val map = Map("foo" -> "spam", "bar" -> "hoge", "baz" -> "eggs")
keys.foldLeft(ListMap.empty[String, String]){ (acc, k) =>
map.get(k) match {
case Some(v) => acc + (k -> v)
case None => acc
}
}
This will iterate over the keys, building a map containing only the matching keys.
Please note that you need a ListMap to preserve the ordering of keys. Up to Scala 2.11, ListMap returned its elements in the opposite order to insertion (since keys were prepended as the head of the list); since Scala 2.12 it iterates in insertion order.
LinkedHashMap would ensure exact insertion order, but it's a mutable data structure.
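On Scala 2.13, the fold can be condensed by looking up each key in order and building the ListMap in one step with ListMap.from (a 2.13 factory method). A minimal sketch; keep in mind that ListMap lookups are linear, which is fine for a handful of keys.

```scala
import scala.collection.immutable.ListMap

val source = Map("foo" -> "spam", "bar" -> "hoge", "baz" -> "eggs", "qux" -> "corge", "quux" -> "grault")
val keys = List("baz", "foo", "quuuuux")

// walk the keys in their own order; source.get drops keys that are absent
val filtered: ListMap[String, String] =
  ListMap.from(keys.flatMap(k => source.get(k).map(k -> _)))
```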
If you need an ordered Map, you could use something like a TreeMap with a custom key ordering. So given
import scala.collection.immutable.TreeMap
val source = Map(
("foo", "spam"),
("bar", "hoge"),
("baz", "eggs"),
("qux", "corge"),
("quux", "grault")
)
val order: IndexedSeq[String] = IndexedSeq("baz", "foo", "quuuuux")
implicit val keyOrdering: Ordering[String] = Ordering.by(order.indexOf)
You have a choice: either iterate over the ordered keys:
val result1: TreeMap[String, String] = order.collect {
case key if source.contains(key) => key -> source(key)
}(collection.breakOut) // breakOut exists only in Scala 2.12 and earlier
// or a bit shorter
val result2: TreeMap[String, String] = order.flatMap { key => source.get(key).map(key -> _) }(collection.breakOut)
or filter from the source map:
val result3: TreeMap[String, String] = TreeMap.empty ++ source.filterKeys(order.contains)
I am not sure which one is the most efficient, but I suspect the flatMap one might be the fastest, at least for your simple example. Though, IMHO, the last example is more readable than the others.
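Since collection.breakOut was removed in Scala 2.13, the same TreeMap approach can be written there with TreeMap.from, passing the custom ordering explicitly. A sketch under that assumption; passing it explicitly instead of declaring an implicit Ordering[String] avoids shadowing the default String ordering elsewhere in scope. Keys missing from order would all get index -1, but they are filtered out before they reach the TreeMap.

```scala
import scala.collection.immutable.TreeMap

val source = Map("foo" -> "spam", "bar" -> "hoge", "baz" -> "eggs", "qux" -> "corge", "quux" -> "grault")
val order: IndexedSeq[String] = IndexedSeq("baz", "foo", "quuuuux")

// order keys by their position in `order`
val keyOrdering: Ordering[String] = Ordering.by((key: String) => order.indexOf(key))

// iterate over the ordered keys (replaces the breakOut variants)
val result: TreeMap[String, String] =
  TreeMap.from(order.flatMap(key => source.get(key).map(key -> _)))(keyOrdering)

// or filter the source map; filterKeys lives on the view in 2.13
val result3: TreeMap[String, String] =
  TreeMap.from(source.view.filterKeys(k => order.contains(k)))(keyOrdering)
```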