Nested Map withDefaultValue changes default value - scala

I have a mutable map containing another mutable map, both with default values. After I assign a value to one key in the enclosed map, its default value seems to change.
I.e. I expected anotherDefault to have the value Map(1 -> default), NOT Map(1 -> something).
Why is this happening?
scala> import scala.collection.mutable.{Map => MMap}
import scala.collection.mutable.{Map=>MMap}
scala> val amap = Map[Int, MMap[Int, String]]().withDefaultValue(MMap().withDefaultValue("default"))
amap: scala.collection.immutable.Map[Int,scala.collection.mutable.Map[Int,String]] = Map()
scala> val bmap = amap(2)
bmap: scala.collection.mutable.Map[Int,String] = Map()
scala> bmap(1)
res17: String = default
scala> bmap(1) = "something"
scala> val anotherDefault = amap(3)
anotherDefault: scala.collection.mutable.Map[Int,String] = Map(1 -> something)

The outer map (amap) is creating a single instance of the inner map to use as the default. When you access this via val bmap = amap(2), then modify bmap, you are modifying the single default map used by amap. When you call amap(3), you then get back this default map, which is now a map with the key/value pair (1 -> "something").
What you probably want is withDefault, not withDefaultValue, although it needs some extra argument/type specification to work:
val amap = Map[Int, MMap[Int, String]]().withDefault(x => MMap[Int, String]().withDefaultValue("default"))
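To make the difference concrete, here is a minimal sketch that puts both variants side by side. The object and value names (DefaultDemo, shared, fresh, and so on) are made up for illustration. Note that withDefault does not store the map it creates, so a write made through a missing key is lost again on the next lookup.

```scala
import scala.collection.mutable.{Map => MMap}

object DefaultDemo {
  // withDefaultValue: the argument is evaluated once, so every missing key
  // hands back the very same inner map.
  val shared = Map[Int, MMap[Int, String]]()
    .withDefaultValue(MMap[Int, String]().withDefaultValue("default"))
  shared(2)(1) = "something"
  val leaked: String = shared(3)(1)    // sees the write made through key 2

  // withDefault: the function runs on every miss, so each lookup of a
  // missing key gets a brand-new inner map.
  val fresh = Map[Int, MMap[Int, String]]()
    .withDefault(_ => MMap[Int, String]().withDefaultValue("default"))
  fresh(2)(1) = "something"            // written into a throwaway map
  val isolated: String = fresh(3)(1)   // untouched by the write above
  val lost: String = fresh(2)(1)       // the earlier write was not stored
}
```

If the writes need to be kept, the outer map has to store the inner map explicitly, for example by making the outer map mutable as well.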

Related

How to head DataFrame with Map[String,Long] column and preserve types?

I have a data frame to which I have applied a filter condition:
val colNames = customerCountDF
.filter($"fiscal_year" === maxYear && $"fiscal_month" === maxMnth)
Out of all the selected rows, I just want the last column of one row.
The type of the last column is Map[String, Long]. I want all the keys of the map as a List[String].
I tried the syntax below:
val colNames = customerCountDF
.filter($"fiscal_year" === maxYear && $"fiscal_month" === maxMnth)
.head
.getMap(14)
.keySet
.toList
.map(_.toString)
I am using map(_.toString) to convert a List[Nothing] to List[String]. The error that I am getting is:
missing parameter type for expanded function ((x$1) => x$1.toString)
[error] val colNames = customerCountDF.filter($"fiscal_year" === maxYear && $"fiscal_month" === maxMnth).head().getMap(14).keySet.toList.map(_.toString)
The df is as follows:
+-------------+-----+----------+-----------+------------+-------------+--------------------+--------------+--------+----------------+-----------+----------------+-------------+-------------+--------------------+
|division_name| low| call_type|fiscal_year|fiscal_month| region_name|abandon_rate_percent|answered_calls|connects|equiv_week_calls|equiv_weeks|equivalent_calls|num_customers|offered_calls| pv|
+-------------+-----+----------+-----------+------------+-------------+--------------------+--------------+--------+----------------+-----------+----------------+-------------+-------------+--------------------+
| NATIONAL|PHONE|CABLE CARD| 2016| 1|ALL DIVISIONS| 0.02| 10626| 0| 0.0| 0.0| 10649.8| 0| 10864|Map(subscribers_c...|
| NATIONAL|PHONE|CABLE CARD| 2016| 1| CENTRAL| 0.02| 3591| 0| 0.0| 0.0| 3598.6| 0| 3667|Map(subscribers_c...|
+-------------+-----+----------+-----------+------------+-------------+--------------------+--------------+--------+----------------+-----------+----------------+-------------+-------------+--------------------+
one row of just last column selected is
[Map(subscribers_connects -> 5521287, disconnects_hsd -> 7992, subscribers_xfinity home -> 6277491, subscribers_bulk units -> 4978892, connects_cdv -> 41464, connects_disconnects -> 16945, connects_hsd -> 32908, disconnects_internet essentials -> 10319, disconnects_disconnects -> 3506, disconnects_video -> 8960, connects_xfinity home -> 43012)]
I'd like to get the keys of the last column as List[String] after applying the filter condition and taking just one row from the data frame.
The type problem is easy to solve by explicitly specifying the type parameters at the source, which is getMap(14). Since you know that you are expecting a Map of String -> Long key-value pairs, just replace getMap(14) with getMap[String, Long](14).
And as far as getMap[String, Long](14) being an empty Map is concerned, that has to do with your data: you simply have an empty map at index 14 in the head row.
More Details
In Scala when you create a List[A], Scala infers the type by using the information available.
For example,
// Explicitly provide the type parameter info
scala> val l1: List[Int] = List(1, 2)
// l1: List[Int] = List(1, 2)
// Infer the type parameter by using the arguments passed to List constructor,
scala> val l2 = List(1, 2)
// l2: List[Int] = List(1, 2)
So, what happens when you create an empty list,
// Explicitly provide the type parameter info
scala> val l1: List[Int] = List()
// l1: List[Int] = List()
// Infer the type parameter by using the arguments passed to the List constructor,
// but surprise, there are no arguments since you are creating an empty list
scala> val l2 = List()
// l2: List[Nothing] = List()
So, when Scala has nothing to infer the type from, it chooses the most specific type it can find, which is the bottom type Nothing, a type that has no values.
The same thing happens when you call toList on another collection: the element type is taken from the source collection.
scala> val ks1 = Map.empty[Int, Int].keySet
// ks1: scala.collection.immutable.Set[Int] = Set()
scala> val l1 = ks1.toList
// l1: List[Int] = List()
scala> val ks2 = Map.empty.keySet
// ks2: scala.collection.immutable.Set[Nothing] = Set()
scala> val l2 = ks2.toList
// l2: List[Nothing] = List()
Similarly, getMap is declared with two type parameters, getMap[K, V](i: Int). When you call getMap(14) on the head Row of the DataFrame without type arguments, the compiler has nothing at the call site to infer K and V from, so it picks Nothing for both. As far as the types are concerned, the returned map is the same as Map.empty, which is a Map[Nothing, Nothing].
Which means that your whole,
val colNames = customerCountDF.filter($"fiscal_year" === maxYear && $"fiscal_month" === maxMnth).head.getMap(14).keySet.toList.map(_.toString)
is equivalent to,
val colNames = Map.empty.keySet.toList.map(_.toString)
And hence,
scala> val l = List()
// l: List[Nothing] = List()
val colNames = l.map(_.toString)
To summarise the above, any List[Nothing] can only be an empty list.
Now, there are two problems, one is about the type-problem in List[Nothing] the other is about it being empty.
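The effect of giving the type arguments at the source can be reproduced without Spark. The sketch below uses a hypothetical FakeRow class that only mimics the shape of Row.getMap[K, V](i: Int); it is not Spark's implementation, and the sample data is invented for illustration.

```scala
object InferenceDemo {
  // Hypothetical stand-in for a Spark Row: the values are stored untyped and
  // getMap simply casts to whatever K and V the caller asks for.
  final class FakeRow(values: Vector[Any]) {
    def getMap[K, V](i: Int): scala.collection.Map[K, V] =
      values(i).asInstanceOf[scala.collection.Map[K, V]]
  }

  val row = new FakeRow(Vector[Any](
    2016, 1, Map("connects_hsd" -> 32908L, "disconnects_hsd" -> 7992L)))

  // With explicit type arguments the key type is String all the way down
  // the chain, so no map(_.toString) is needed at the end.
  val keys: List[String] = row.getMap[String, Long](2).keySet.toList
}
```

Without the [String, Long] part, K and V would be inferred as Nothing, which is the situation described above.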
After the filter you can read the column by name and get it as a Map, as below:
first().getAs[Map[String, Long]]("pv").keySet
Since you're accessing only a single column (the one at position 14), why not make your developers' lives a bit easier (and help the people who will support your code later)?
Try the following:
val colNames = customerCountDF
.where($"fiscal_year" === maxYear) // Split one long filter into two
.where($"fiscal_month" === maxMnth) // where is a SQL-like alias of filter
.select("pv") // Take just the field you need to work with
.as[Map[String, Long]] // Map it to the proper type
.head // Load just the single field (all others are left aside)
.keySet // That's just a pure Scala
I think the above code states clearly what it does, and I think it should be the fastest of the provided solutions, since it loads just the single pv field into a JVM object on the driver.
A workaround to get the final result in List[String]. Check this out:
scala> val customerCountDF=Seq((2018,12,Map("subscribers_connects" -> 5521287L, "disconnects_hsd" -> 7992L, "subscribers_xfinity home" -> 6277491L, "subscribers_bulk units" -> 4978892L, "connects_cdv" -> 41464L, "connects_disconnects" -> 16945L, "connects_hsd" -> 32908L, "disconnects_internet essentials" -> 10319L, "disconnects_disconnects" -> 3506L, "disconnects_video" -> 8960L, "connects_xfinity home" -> 43012L))).toDF("fiscal_year","fiscal_month","mapc")
customerCountDF: org.apache.spark.sql.DataFrame = [fiscal_year: int, fiscal_month: int ... 1 more field]
scala> val maxYear =2018
maxYear: Int = 2018
scala> val maxMnth = 12
maxMnth: Int = 12
scala> val colNames = customerCountDF.filter($"fiscal_year" === maxYear && $"fiscal_month" === maxMnth).first.getMap(2).keySet.mkString(",").split(",").toList
colNames: List[String] = List(subscribers_connects, disconnects_hsd, subscribers_xfinity home, subscribers_bulk units, connects_cdv, connects_disconnects, connects_hsd, disconnects_internet essentials, disconnects_disconnects, disconnects_video, connects_xfinity home)

Convert query string to map in scala

I have a query string in this form:
val query = "key1=val1&key2=val2&key3=val3"
I want to create a map with the above key/value pairs. So far I'm doing it like this:
//creating an iterator with 2 values in each group. Each index consists of a key/value pair
val pairs = query.split("&|=").grouped(2)
//inserting the key/value pairs into a map
val map = pairs.map { case Array(k, v) => k -> v }.toMap
Are there any problems with doing it like I do? If so, is there some library I could use to do it?
Here is an approach using the URLEncodedUtils:
import java.net.URI
import org.apache.http.client.utils.URLEncodedUtils
import org.apache.http.{NameValuePair => ApacheNameValuePair}
import scala.collection.JavaConverters._
import scala.collection.immutable.Seq
object GetEncodingTest extends App {
val url = "?one=1&two=2&three=3&three=3a"
val params = URLEncodedUtils.parse(new URI(url), "UTF-8") // the charset name is "UTF-8", not "UTF_8"
val convertedParams: Seq[ApacheNameValuePair] = collection.immutable.Seq(params.asScala: _*)
val scalaParams: Seq[(String, String)] = convertedParams.map(pair => pair.getName -> pair.getValue)
val paramsMap: Map[String, String] = scalaParams.toMap // for a repeated key such as "three", the last value wins
paramsMap.foreach(println)
}
Assuming the query string you are working with is as simple as you showed, the use of grouped(2) is a great insight and gives a pretty elegant looking solution.
The next step from where you're at is to use the under-documented Array::toMap method:
val qs = "key=value&foo=bar"
qs.split("&|=") // Array(key, value, foo, bar)
.grouped(2) // <iterator>
.map(a => (a(0), a(1))) // <iterator>
.toMap // Map(key -> value, foo -> bar)
grouped(2) returns an Iterator[Array[String]], which is a little harder to follow because iterators don't print nicely on the Scala console.
Here's the same result, but a bit more step-by-step:
val qs = "key=value&foo=bar"
qs.split("&") // Array(key=value, foo=bar)
.map(kv => (kv.split("=")(0), kv.split("=")(1))) // Array((key,value), (foo,bar))
.toMap // Map(key -> value, foo -> bar)
If you want a more general solution for HTTP query strings, consider using a library for URL parsing.
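If pulling in a library is not an option, the following is a minimal sketch that uses only the standard library and java.net.URLDecoder. The names QueryString and parse are made up for illustration. Compared with the snippets above, it keeps every value of a repeated key, tolerates a key without a value, and decodes percent-escapes. It is not a full implementation of the URL specifications.

```scala
import java.net.URLDecoder

object QueryString {
  // Parses "a=1&b=2&b=3" into Map(a -> List(1), b -> List(2, 3)).
  def parse(query: String): Map[String, List[String]] =
    query.stripPrefix("?")
      .split("&").toList
      .filter(_.nonEmpty)                       // ignore empty segments ("a=1&&b=2")
      .map { pair =>
        val parts = pair.split("=", 2)          // split on the first '=' only
        val key   = URLDecoder.decode(parts(0), "UTF-8")
        val value = if (parts.length > 1) URLDecoder.decode(parts(1), "UTF-8") else ""
        key -> value
      }
      .groupBy(_._1)                            // collect repeated keys
      .map { case (key, pairs) => key -> pairs.map(_._2) }
}
```

Returning Map[String, List[String]] rather than Map[String, String] is a design choice: with a plain toMap, a repeated key silently keeps only its last value.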

scala mutable Map withDefaultValue strange behaviour

I have this example which uses a mutable HashMap.withDefaultValue. withDefaultValue provides a way to return a value even if the key does not exist, but it should not modify the collection. In any case, the behaviour looks inconsistent: map.size returns 0, yet at the same time map(key) returns a value.
How is this possible?
import scala.collection.mutable
val map = mutable.HashMap[String, mutable.Map[Int, String]]()
.withDefaultValue(mutable.HashMap[Int, String]())
map("id1")(2) = "three"
println(map.size) // 0 (expected)
println(map) // Map() (expected)
println(map("id1")) // Map(2 -> three) (unexpected)
println(map("id1")(2)) // three (unexpected)
It's possible to factor out defaultValue because it's passed as a value.
import scala.collection.mutable
val defaultValue = mutable.HashMap[Int, String]()
val map = mutable.HashMap[String, mutable.Map[Int, String]]()
.withDefaultValue(defaultValue)
map("id1")(2) = "three"
Which gives you
println(defaultValue) // Map(2 -> three)
... which should explain the rest of the behaviour. And that's exactly why I recommend immutable data structures ;-)
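If the nested maps do need to stay mutable, one common alternative is to drop the default altogether and create the inner map on first access with getOrElseUpdate, which also stores it in the outer map. This is a sketch of that idea rather than something taken from the answer above; the object name PerKeyDefaults is invented for illustration.

```scala
import scala.collection.mutable

object PerKeyDefaults {
  val map = mutable.HashMap[String, mutable.Map[Int, String]]()

  // getOrElseUpdate evaluates its second argument only when the key is
  // missing, and stores the result under that key before returning it.
  val inner = map.getOrElseUpdate("id1", mutable.HashMap[Int, String]())
  inner(2) = "three"
}
```

With this approach map.size and map("id1") agree with each other, because the inner map is a real entry of the outer map and not a shared default.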

Scala : How to find types of values inside a scala nested collection

Consider the following variables in Scala:
val nestedCollection_1 = Array(
"key_1" -> Map("key_11" -> "value_11"),
"key_2" -> Map("key_22" -> "value_22"))
val nestedCollection_2 = Map(
"key_3" -> Array("key_33", "value_33"),
"key_4" -> Array("key_44", "value_44"))
Following are my questions :
1) I want to read the values of the variables nestedCollection_1 and nestedCollection_2 and ensure that they have the types
Array[Map[String, Map[String, String]]]
and
Map[String, Array[String]]
2) Is it possible to get the detailed type of a variable in Scala? I.e. nestedCollection_1.SOME_METHOD should return Array[Map[String, Map[String, String]]] as the type of its values.
I am not sure what exactly you mean. The compiler can ensure the type of any variable if you just annotate the type:
val nestedCollection_2: Map[String, List[String]] = Map(
"key_3"-> List("key_33", "value_33"),
"key_4"-> List("key_44", "value_44"))
You can see the type of a variable in the Scala REPL when you define it, or by using Alt + = in IntelliJ IDEA.
scala> val nestedCollection_2 = Map(
| "key_3"-> List("key_33", "value_33"),
| "key_4"-> List("key_44", "value_44"))
nestedCollection_2: scala.collection.immutable.Map[String,List[String]] = Map(key_3 -> List(key_33, value_33), key_4 -> List(key_44, value_44))
Edit
I think I get your question now. Here is how you can get type as String:
import scala.reflect.runtime.universe._
def typeAsString[A: TypeTag](elem: A) = {
typeOf[A].toString
}
Test:
scala> typeAsString(nestedCollection_2)
res0: String = Map[String,scala.List[String]]
scala> typeAsString(nestedCollection_1)
res1: String = scala.Array[(String, scala.collection.immutable.Map[String,String])]

Scala :- Gatling :- Concatenation of two Maps stores last value only and ignores all other values

I have two Maps and I want to concatenate them.
I tried almost all of the examples given in "Best way to merge two maps and sum the values of same key?", but they ignore all values for the key metrics and store only the last value.
I downloaded scalaz-full_2.9.1-6.0.3.jar and imported it with import scalaz._, but it doesn't work for me.
How can I concatenate these two maps with multiple values for the same keys?
Edit:
Now I tried
val map = new HashMap[String, Set[String]] with MultiMap[String, String]
map.addBinding("""report_type""" , """performance""")
map.addBinding("""start_date""" ,start_date)
map.addBinding("""end_date""" , end_date)
map.addBinding("metrics" , "plays")
map.addBinding("metrics", "displays")
map.addBinding("metrics" , "video_starts")
map.addBinding("metrics" , "playthrough_25")
map.addBinding("metrics", "playthrough_50")
map.addBinding("metrics", "playthrough_75")
map.addBinding("metrics", "playthrough_100")
val map1 = new HashMap[String, Set[String]] with MultiMap[String, String]
map1.addBinding("""dimensions""" , """asset""")
map1.addBinding("""limit""" , """50""")
And I tried to convert these mutable maps to an immutable type using this link:
val asset_query_string = map ++ map1
val asset_query_string_map =(asset_query_string map { x=> (x._1,x._2.toSet) }).toMap[String, Set[String]]
But still I get
i_ui\config\config.scala:51: Cannot prove that (String, scala.collection.immutable.Set[String]) <:< (String, scala.collection.mutable.Set[String]).
11:10:13.080 [ERROR] i.g.a.ZincCompiler$ - val asset_query_string_map =(asset_query_string map { x=> (x._1,x._2.toSet) }).toMap[String, Set[String]]
Your problem is not related to the concatenation but to the declaration of the metrics map. It's not possible to have multiple values for a single key in a Map. Perhaps you should look at this collection:
http://www.scala-lang.org/api/2.10.3/index.html#scala.collection.mutable.MultiMap
You can't have duplicate keys in a Map.
A simple Map cannot hold duplicate keys. If you add the same key more than once, only the last value is kept.
But you can use a MultiMap:
import collection.mutable.{ HashMap, MultiMap, Set }
val mm = new HashMap[String, Set[String]] with MultiMap[String, String]
mm.addBinding("metrics","plays")
mm.addBinding("metrics","displays")
mm.addBinding("metrics","players")
println(mm,"multimap")//(Map(metrics -> Set(players, plays, displays)),multimap)
I was able to create two MultiMaps, but concatenating them with val final_map = map1 ++ map2 and then applying the answer given in "Mutable MultiMap to immutable Map" did not solve my problem. I got:
config\config.scala:51: Cannot prove that (String, scala.collection.immutable.Set[String]) <:< (String, scala.collection.mutable.Set[String]).
It was finally solved by:
val final_map = map1 ++ map2
val asset_query_string_map = final_map.map(kv => (kv._1,kv._2.toSet)).toMap
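Note that ++ keeps only the right-hand value when both maps contain the same key. That is fine here because the two maps have disjoint keys. If the keys could overlap and the value sets should be combined, something like the following sketch could be used. It works on plain immutable maps, and the names MergeMaps and merge are made up for illustration.

```scala
object MergeMaps {
  // Union of two maps whose values are sets: for a key present in both
  // maps the two value sets are combined, not overwritten.
  def merge(a: Map[String, Set[String]],
            b: Map[String, Set[String]]): Map[String, Set[String]] =
    (a.keySet ++ b.keySet).map { key =>
      key -> (a.getOrElse(key, Set.empty[String]) ++ b.getOrElse(key, Set.empty[String]))
    }.toMap
}
```

For example, merging Map("metrics" -> Set("plays")) with Map("metrics" -> Set("displays"), "limit" -> Set("50")) keeps both metrics values, whereas ++ would keep only "displays".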