Is there a better way to read values from a map by key when the map has many keys?
Currently I have a Map[String, List[String]] that can have more than 20 keys.
I am using the code below to retrieve the values for each key:
val names= map.getOrElse("Name", List.empty)
.
.
.
val cities = map.getOrElse("City", List.empty)
Please help me write this in a better way.
I very much doubt you're doing yourself any favors by replicating the Map data into local variables.
One thing you could do is employ pattern matching to save some (not much) typing.
val knownKeys = List("Name", "City", "Country") // etc. etc.
val List(names
,cities
,countries
// etc. etc.
) = knownKeys.map(data.getOrElse(_, List()))
A major drawback to this idea is that the list of keys has to be in the exact same order as the order of variables in the extraction.
A better idea is to give your Map its own default.
val data = Map("City" -> List("NY","Rome")
,"Name" -> List("Ed","Al")
// etc. etc.
).withDefaultValue(List.empty[String])
Then you don't need .getOrElse().
data("City") // res0: List[String] = List(NY, Rome)
data("Airport") // res1: List[String] = List()
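One caveat worth knowing, shown here as a minimal sketch with made-up sample data: withDefaultValue only changes what apply returns for a missing key. get and contains still report the key as absent, so you can still tell a defaulted result apart from a key that really maps to an empty list.

```scala
// Sample data is illustrative only.
val data = Map(
  "City" -> List("NY", "Rome"),
  "Name" -> List("Ed", "Al")
).withDefaultValue(List.empty[String])

// apply falls back to the default for an unknown key.
val airports = data("Airport")
// get ignores the default and returns None for an unknown key.
val maybeAirports = data.get("Airport")
// contains is also unaffected by the default.
val hasAirport = data.contains("Airport")
```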
def arrayToMap(fields: Array[CustomClass]): Map[String, CustomClass] = {
val fieldData = fields.map(f => f.name -> CustomClass(f.name)) // This is Array[(String, CustomClass)], and order is fine at this point
fieldData.toMap // order gets jumbled up
/*
What I've also tried
Map(fieldData : _*)
*/
}
Why does converting an Array to a Map mess up the order? Is there a way to retain the order of the Array of tuples when converting to a Map?
The solution is to use ListMap rather than Map, although the question remains why the order matters. Also, a Scala Array is backed by a Java array rather than being a Scala collection, so accept a Seq to allow other Scala collections to be passed as well.
import scala.collection.immutable.ListMap
def arrayToMap(fields: Seq[CustomClass]): ListMap[String, CustomClass] =
ListMap(fields.map(f => f.name -> CustomClass(f.name)):_*)
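As for why the order changes: in current Scala versions the default immutable Map only iterates in insertion order while it is very small (up to four entries); beyond that it is hash-based and makes no ordering promise. Below is a minimal sketch of the ListMap approach. Field is a hypothetical stand-in for CustomClass, and the sample names are invented.

```scala
import scala.collection.immutable.ListMap

// Field is a hypothetical stand-in for CustomClass.
final case class Field(name: String)

val fields = Seq("zeta", "alpha", "mid", "beta", "omega", "gamma").map(Field(_))

// ListMap iterates in insertion order, so the keys come back as given.
val ordered: ListMap[String, Field] = ListMap(fields.map(f => f.name -> f): _*)
val orderedKeys = ordered.keys.toList
```

Keep in mind that ListMap lookups are linear in the size of the map, so for large maps scala.collection.mutable.LinkedHashMap may be a better fit if mutability is acceptable.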
I have a set of Map.Entry, like Set<Map.Entry<String, ConfigValue>>, in Scala. Now I want to get the Set of either the keys (String) or the values (ConfigValue). Please suggest an easy solution for this problem.
Thanks
You can use .map to transform your Set[Map.Entry[String, ConfigValue]] into a Set[String] and/or a Set[ConfigValue]. Note, however, that you might want to convert to a List first to avoid collapsing duplicates.
So if you have
val map: Set[Map[K, V]] = ???
val keys = map.flatMap(_.keySet) // gives you Set[K]
val values = map.flatMap(_.values) // gives you Set[V]
In both cases duplicates will be removed.
You could create a couple of functions that describe that computation, like:
import java.util.{Map => JavaMap} // alias so it does not clash with Scala's Map
val getKeys: Set[JavaMap.Entry[String, ConfigValue]] => Set[String] = _.map(_.getKey)
val getValues: Set[JavaMap.Entry[String, ConfigValue]] => Set[ConfigValue] = _.map(_.getValue)
Then when you need to extract one or the other you can call them like so:
val setOfKeyMap: Set[JavaMap.Entry[String, ConfigValue]] = ???
...
val setOfKeys: Set[String] = getKeys(setOfKeyMap)
val setOfValues: Set[ConfigValue] = getValues(setOfKeyMap)
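To make the point about duplicates concrete, here is a small self-contained sketch. It assumes plain String values in place of ConfigValue and builds the entries with java.util.AbstractMap.SimpleEntry; the sample keys and values are invented.

```scala
import java.util.{Map => JavaMap}
import java.util.AbstractMap.SimpleEntry

// String values stand in for ConfigValue so the sketch has no dependencies.
val entries: Set[JavaMap.Entry[String, String]] = Set(
  new SimpleEntry[String, String]("host", "localhost"),
  new SimpleEntry[String, String]("port", "8080"),
  new SimpleEntry[String, String]("alias", "localhost")
)

// Mapping a Set yields a Set, so equal values collapse into one.
val keys: Set[String] = entries.map(_.getKey)
val values: Set[String] = entries.map(_.getValue)

// Converting to a List first keeps one value per entry, duplicates included.
val allValues: List[String] = entries.toList.map(_.getValue)
```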
val data = List("foo", "bar", "bash")
val selection = List(0, 2)
val selectedData = data.filter(datum => selection.contains(datum.MYINDEX))
// INVALID CODE HERE ^
// selectedData: List("foo", "bash")
Say I want to filter a List given a list of selected indices. If, in the filter method, I could reference the index of a list item then I could solve this as above, but datum.MYINDEX isn't valid in the above case.
How could I do this instead?
How about using zipWithIndex to keep a reference to the item's index, filtering as such, then mapping the index away?
data.zipWithIndex
.filter{ case (datum, index) => selection.contains(index) }
.map(_._1)
It's neater to do it the other way around, although it is potentially slow with Lists because indexing is O(n); Vectors would be better. On the other hand, calling contains for every item in data, as the other solution does, isn't exactly fast either.
val data = List("foo", "bar", "bash")
//> data : List[String] = List(foo, bar, bash)
val selection = List(0, 2)
//> selection : List[Int] = List(0, 2)
selection.map(index=>data(index))
//> res0: List[String] = List(foo, bash)
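If indexing cost is a concern, one option is to convert the data to a Vector once and index into that. A minimal sketch using the same sample values as above:

```scala
val data = List("foo", "bar", "bash")
val selection = List(0, 2)

// Vector offers effectively constant-time indexed access, unlike List.
val indexedData = data.toVector

// A Vector can be used as a function from index to element.
val selectedData = selection.map(indexedData)
```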
The first solution that came to my mind was to create a list of (element, index) pairs, filter every pair by checking whether selection contains that index, then map the resulting list to keep only the raw elements (omitting the index). The code is self-explanatory:
data.zipWithIndex.filter(pair => selection.contains(pair._2)).map(_._1)
or more readable:
val elemsWithIndices = data.zipWithIndex
val filteredPairs = elemsWithIndices.filter(pair => selection.contains(pair._2))
val selectedElements = filteredPairs.map(_._1)
This works, provided data contains no duplicate elements (indexOf returns the index of the first match):
val data = List("foo", "bar", "bash")
val selection = List(0, 2)
val selectedData = data.filter(datum => selection.contains(data.indexOf(datum)))
println (selectedData)
Output:
List(foo, bash)
Since you already have a list of indices, the most direct way is to pick those indices:
val data = List("foo", "bar", "bash")
val selection = List(0, 2)
val selectedData = selection.map(index => data(index))
or even:
val selectedData = selection.map(data)
or if you need to preserve the order of the items in data:
val selectedData = selection.sorted.map(data)
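Note that data(index) throws an IndexOutOfBoundsException when an index is out of range. If the selection may contain such indices, one possible variation (a sketch, not part of the original answer) is to go through lift, which returns an Option instead of throwing:

```scala
val data = List("foo", "bar", "bash")
val selection = List(0, 2, 7) // 7 is out of range on purpose

// lift turns the indexed lookup into Int => Option[String];
// flatMap then drops the indices that have no element.
val selectedData = selection.flatMap(i => data.lift(i))
```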
UPDATED
In the spirit of finding all the possible algorithms, here's the version using collect:
val selectedData = data
.zipWithIndex
.collect {
case (item, index) if selection.contains(index) => item
}
The following is probably the most scalable way to do it in terms of efficiency and, unlike many answers on SO, it actually follows the official Scala style guide exactly.
val selectionSet = selection.toSet // set membership tests are effectively constant time
data.zipWithIndex.collect {
case (datum, index) if selectionSet.contains(index) => datum
}
If the resulting collection is to be passed to additional map, flatMap, etc., I suggest turning data into a lazy sequence. In fact, perhaps you should do this anyway to avoid two passes (one for zipWithIndex, one for collect), but I doubt one would gain much when benchmarked.
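A minimal sketch of the lazy variant, assuming a view is what is meant by a lazy sequence here. The zipWithIndex and collect steps are only evaluated when toList forces the result:

```scala
val data = List("foo", "bar", "bash")
val selection = List(0, 2)
val selectionSet = selection.toSet

// The view defers zipWithIndex and collect until toList is called,
// so no intermediate list of pairs is built.
val selectedData = data.view.zipWithIndex
  .collect { case (datum, index) if selectionSet.contains(index) => datum }
  .toList
```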
There is actually an easier way to select by index using the map method. Here is an example:
val indices = List(0, 2)
val data = List("a", "b", "c")
println(indices.map(data)) // prints List(a, c)
Does Spark support distributed Map collection types?
So if I have a HashMap[String, String] of key-value pairs, can it be converted to a distributed Map collection type? To access an element I could use "filter", but I doubt this performs as well as a Map.
Since I found some new info, I thought I'd turn my comments into an answer. @maasg already covered the standard lookup function. I would like to point out that you should be careful: if the RDD's partitioner is None, lookup just uses a filter anyway. As for a (K, V) store on top of Spark, it looks like this is in progress, but a usable pull request has been made here. Here is an example usage:
import org.apache.spark.rdd.IndexedRDD
// Create an RDD of key-value pairs with Long keys.
val rdd = sc.parallelize((1 to 1000000).map(x => (x.toLong, 0)))
// Construct an IndexedRDD from the pairs, hash-partitioning and indexing
// the entries.
val indexed = IndexedRDD(rdd).cache()
// Perform a point update.
val indexed2 = indexed.put(1234L, 10873).cache()
// Perform a point lookup. Note that the original IndexedRDD remains
// unmodified.
indexed2.get(1234L) // => Some(10873)
indexed.get(1234L) // => Some(0)
// Efficiently join derived IndexedRDD with original.
val indexed3 = indexed.innerJoin(indexed2) { (id, a, b) => b }.filter(_._2 != 0)
indexed3.collect // => Array((1234L, 10873))
// Perform insertions and deletions.
val indexed4 = indexed2.put(-100L, 111).delete(Array(998L, 999L)).cache()
indexed2.get(-100L) // => None
indexed4.get(-100L) // => Some(111)
indexed2.get(999L) // => Some(0)
indexed4.get(999L) // => None
It seems the pull request was well received and will probably be included in future versions of Spark, so it is probably safe to use that pull request in your own code. Here is the JIRA ticket in case you are curious.
The quick answer: Partially.
You can transform a Map[A,B] into an RDD[(A,B)] by first forcing the map into a sequence of (k,v) pairs, but by doing so you lose the constraint that the keys of a map must form a set, i.e. you lose the semantics of the Map structure.
From a practical perspective, you can still resolve an element into its corresponding value using kvRdd.lookup(element), but the result will be a sequence, given that there is no guarantee of a single value per key, as explained above.
A spark-shell example to make things clear:
val englishNumbers = Map(1 -> "one", 2 ->"two" , 3 -> "three")
val englishNumbersRdd = sc.parallelize(englishNumbers.toSeq)
englishNumbersRdd.lookup(1)
res: Seq[String] = WrappedArray(one)
val spanishNumbers = Map(1 -> "uno", 2 -> "dos", 3 -> "tres")
val spanishNumbersRdd = sc.parallelize(spanishNumbers.toList)
val bilingueNumbersRdd = englishNumbersRdd union spanishNumbersRdd
bilingueNumbersRdd.lookup(1)
res: Seq[String] = WrappedArray(one, uno)
Suppose I have a list countries of type List[String] and a map capitals of type Map[String, String]. Now I would like to write a function pairs(countries: List[String], capitals: Map[String, String]): Seq[(String, String)] that returns a sequence of (country, capital) pairs and prints an error if the capital for some country is not found. What is the best way to do that?
To start with, your Map[String,String] is already an Iterable[(String,String)]; you can turn it into a Seq by calling toSeq if you wish:
val xs = Map("UK" -> "London", "France" -> "Paris")
xs.toSeq
// yields a Seq[(String, String)]
So the problem then boils down to finding countries that aren't in the map. You have two ways of getting a collection of those countries that are represented.
The keys method returns an Iterable[String], whilst keySet returns a Set[String]. Let's favour the latter:
val countriesWithCapitals = xs.keySet
val allCountries = List("France", "UK", "Italy")
val countriesWithoutCapitals = allCountries.toSet -- countriesWithCapitals
//yields Set("Italy")
Convert that into an error in whatever way you see fit.
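Putting the pieces together, the requested function could be sketched as follows. Printing to standard error is just one assumed way of reporting the problem, and the message wording is made up.

```scala
// Returns (country, capital) pairs and reports countries with no known capital.
def pairs(countries: List[String], capitals: Map[String, String]): Seq[(String, String)] = {
  // Split the countries into those with a capital and those without.
  val (found, missing) = countries.partition(capitals.contains)
  // Reporting via stderr is an assumption; collecting the errors would also work.
  missing.foreach(c => Console.err.println(s"No capital found for $c"))
  found.map(c => c -> capitals(c))
}

val capitals = Map("UK" -> "London", "France" -> "Paris")
val result = pairs(List("France", "UK", "Italy"), capitals)
```

The result keeps the order of the countries list, which calling toSeq on the map would not guarantee.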
countries.map(x => (x, capitals(x))) // throws NoSuchElementException if a capital is missing