Convert query string to map in Scala

I have a query string in this form:
val query = "key1=val1&key2=val2&key3=val3"
I want to create a map with the above key/value pairs. So far I'm doing it like this:
//creating an iterator with 2 values in each group. Each index consists of a key/value pair
val pairs = query.split("&|=").grouped(2)
//inserting the key/value pairs into a map
val map = pairs.map { case Array(k, v) => k -> v }.toMap
Are there any problems with doing it like I do? If so, is there some library I could use to do it?

Here is an approach using URLEncodedUtils from Apache HttpClient:
import java.net.URI
import org.apache.http.client.utils.URLEncodedUtils
import org.apache.http.{NameValuePair => ApacheNameValuePair}
import scala.collection.JavaConverters._
import scala.collection.immutable.Seq
object GetEncodingTest extends App {
val url = "?one=1&two=2&three=3&three=3a"
val params = URLEncodedUtils.parse(new URI(url), "UTF-8") // the charset name is "UTF-8", with a hyphen
val convertedParams: Seq[ApacheNameValuePair] = collection.immutable.Seq(params.asScala: _*)
val scalaParams: Seq[(String, String)] = convertedParams.map(pair => pair.getName -> pair.getValue)
val paramsMap: Map[String, String] = scalaParams.toMap
paramsMap.foreach(println)
}
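
One thing to watch in the example above: toMap keeps only the last value for a repeated name, so three=3 is replaced by three=3a. If you need every value, a minimal sketch could group the pairs instead. It uses plain Scala collections with no Apache dependency, and the object and method names are made up for illustration:

```scala
object MultiValueParams {
  // Group (name, value) pairs by name, keeping every value in encounter order.
  def group(pairs: Seq[(String, String)]): Map[String, Seq[String]] =
    pairs
      .groupBy { case (name, _) => name }
      .map { case (name, kvs) => name -> kvs.map(_._2) }

  // The same pairs the query "?one=1&two=2&three=3&three=3a" contains.
  val sample: Seq[(String, String)] =
    Seq("one" -> "1", "two" -> "2", "three" -> "3", "three" -> "3a")
}
```

Here MultiValueParams.group(MultiValueParams.sample)("three") holds both "3" and "3a", whereas sample.toMap("three") holds only "3a".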

Assuming the query string you are working with is as simple as you showed, the use of grouped(2) is a great insight and gives a pretty elegant looking solution.
The next step from where you're at is to use the under-documented Array::toMap method:
val qs = "key=value&foo=bar"
qs.split("&|=") // Array(key, value, foo, bar)
.grouped(2) // <iterator>
.map(a => (a(0), a(1))) // <iterator>
.toMap // Map(key -> value, foo -> bar)
grouped(2) returns an Iterator[Array[String]], which is a little harder to follow because iterators don't print nicely on the Scala console.
Here's the same result, but a bit more step-by-step:
val qs = "key=value&foo=bar"
qs.split("&") // Array(key=value, foo=bar)
.map(kv => (kv.split("=")(0), kv.split("=")(1))) // Array((key,value), (foo,bar))
.toMap // Map(key -> value, foo -> bar)
If you want a more general solution for HTTP query strings, consider using a library for URL parsing.
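
If pulling in a library is not an option, here is a rough sketch of a slightly more defensive parser using only java.net.URLDecoder from the JDK. The split("&|=") approaches above break on inputs such as "a&b=2" (the key without a value shifts the pairing) or "token=abc==" (the value itself contains "="), and they do not decode percent-escapes. The object name and the policy choices (last value wins, missing value becomes an empty string) are my assumptions, not a standard:

```scala
import java.net.URLDecoder

object QueryString {
  // Decode one percent-encoded component; "+" becomes a space, as in HTML form encoding.
  private def decode(s: String): String = URLDecoder.decode(s, "UTF-8")

  // Parse "a=1&b=2" into a Map.
  // Assumptions: the last value wins for repeated keys, a key without "="
  // maps to the empty string, and empty segments are skipped.
  def parse(query: String): Map[String, String] =
    query
      .stripPrefix("?")
      .split("&")
      .filter(_.nonEmpty)
      .map { pair =>
        // Limit 2 keeps any further "=" inside the value, e.g. "token=abc=="
        pair.split("=", 2) match {
          case Array(k, v) => decode(k) -> decode(v)
          case Array(k)    => decode(k) -> ""
        }
      }
      .toMap
}
```

Note that URLDecoder.decode throws an IllegalArgumentException on malformed escapes such as a lone "%", so real input may still need a Try around the call.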

Related

How to add to an Immutable map : Scala

I have a ResultSet object returned from Hive using JDBC.
I am trying to store the values in a resultset in a Scala Immutable Map.
How can I add these values to an immutable map while iterating over the ResultSet with a while loop?
val m : Map[String, String] = null
while ( resultSet.next() ) {
val col = resultSet.getString("col_name")
val data = resultSet.getString("data_type")
m += (col -> data) // This Gives Reassignment error
}
I propose:
Iterator.continually(resultSet.next()) // advance the cursor before reading anything
.takeWhile(identity) // stop once next() returns false
.map { _ =>
val col = resultSet.getString("col_name")
val data = resultSet.getString("data_type")
col -> data
}.toMap
Instead of thinking "let's init an empty collection and fill it", which is IMHO the mutable way to think, this approach thinks in terms of "let's declare how to build a collection with those elements in it and be done" :-)
You might want to use scala.collection.Iterator[A] so that you can create an immutable map out of your Java ResultSet.
val myMap : Map[String, String] = new Iterator[(String, String)] {
override def hasNext = resultSet.next()
override def next() = {
val col = resultSet.getString("col_name")
val data = resultSet.getString("data_type")
col -> data
}
}.toMap
Otherwise you have to use mutable scala.collection.mutable.Map.
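
To make the ordering concrete without a database, here is a small sketch with a hypothetical FakeCursor standing in for java.sql.ResultSet (only next() and getString are mimicked; everything in this block is illustrative). The point it tries to show is that the cursor has to be advanced before a row is read:

```scala
object RowsToMap {
  // Hypothetical stand-in for java.sql.ResultSet: next() advances the cursor
  // and getString reads a column of the current row.
  final class FakeCursor(rows: Vector[Map[String, String]]) {
    private var index = -1
    def next(): Boolean = { index += 1; index < rows.length }
    def getString(column: String): String = rows(index)(column)
  }

  // Advance first, then read: a pair is built only after next() returned true.
  def toMap(cursor: FakeCursor): Map[String, String] =
    Iterator
      .continually(cursor.next())
      .takeWhile(identity)
      .map(_ => cursor.getString("col_name") -> cursor.getString("data_type"))
      .toMap
}
```

With two rows (id/int and name/string) this yields Map(id -> int, name -> string), and an empty cursor yields an empty map rather than an exception.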

How to create a Future of a map of a different type

I am using com.twitter.util.Future, scala 2.11.11
I have this piece of code that I'm trying to convert into a Future[Map[Long, String]]
val simpleMap: Map[Long, Int] = Map(1L -> 2, 2L -> 4)
val keyToNewFutureMap = Future.collect(simpleMap.map {
case (key, value) =>
val newFuture = getAFutureFromValue(value)
key -> newFuture
}.toSeq.toMap
)
val keyToFutureMap = Map(1L -> Future.value(1))
val futureMap = Future.collect(keyToFutureMap) // converts into a Future[Map[Long, Int]]
Future.collect(Seq(futureMap, keyToNewFutureMap)) // Stuck here
I'm stuck here. I wanted to use the returned maps from both Futures and generate a new map. The new map will contain unique keys that appear in both futureMap and keyToNewFutureMap.
keyToFutureMap is given in the form of a Map[Long, Future[Option[Int]]], which is why I used a collect to turn it into a Future[Map[Long, Int]]
Any help is most appreciated.
If I understood correctly, you want this:
val newFutureMap = Future.traverse(simpleMap) {
case (key, value) =>
getAFutureFromValue(value).map(key -> _)
}.map(_.toMap)
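
For the second half of the question (combining two future maps), here is a sketch that deliberately uses scala.concurrent.Future from the standard library rather than com.twitter.util.Future, so the exact method names may differ on the Twitter side. getAFutureFromValue is a made-up stand-in, both maps are assumed to have String values, and "the right-hand map wins on a key collision" is my assumption about the desired merge:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureMaps {
  // Hypothetical stand-in for the question's getAFutureFromValue.
  def getAFutureFromValue(value: Int): Future[String] = Future.successful(s"v$value")

  // Map[Long, Int] => Future[Map[Long, String]]
  def convert(m: Map[Long, Int]): Future[Map[Long, String]] =
    Future
      .traverse(m.toSeq) { case (key, value) =>
        getAFutureFromValue(value).map(key -> _)
      }
      .map(_.toMap)

  // Wait for both futures, then merge; on a key collision the right-hand map wins.
  def merge(
      fa: Future[Map[Long, String]],
      fb: Future[Map[Long, String]]
  ): Future[Map[Long, String]] =
    for {
      a <- fa
      b <- fb
    } yield a ++ b

  // Blocking helper for demos and tests only.
  def await[A](f: Future[A]): A = Await.result(f, 5.seconds)
}
```

Because both futures are created before the for comprehension runs, they execute concurrently and the comprehension only sequences the final merge.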

Scala flattening embedded list of lists

I have created a Twitter datastream that is displaying hashtag, author, and mentioned users in the below format.
(List(timetofly, hellocake),Shera_Eyra,List(blxcknicotine, kimtheskimm))
I can't do analysis on this format because of the embedded lists. How can I create another datastream that displays the data in this format?
timetofly, Shera_Eyra, blxcknicotine
timetofly, Shera_Eyra, kimtheskimm
hellocake, Shera_Eyra, blxcknicotine
hellocake, Shera_Eyra, kimtheskimm
Here is my code to produce the data:
val sparkConf = new SparkConf().setAppName("TwitterPopularTags")
val ssc = new StreamingContext(sparkConf, Seconds(sampleInterval))
val stream = TwitterUtils.createStream(ssc, None)
val data = stream.map {line =>
(line.getHashtagEntities.map(_.getText),
line.getUser().getScreenName(),
line.getUserMentionEntities.map(_.getScreenName).toList)
}
In your code snippet, data is a DStream[(Array[String], String, List[String])]. To get a DStream[String] in your desired format, you can use flatMap and map:
val data = stream.map { line =>
(line.getHashtagEntities.map(_.getText),
line.getUser().getScreenName(),
line.getUserMentionEntities.map(_.getScreenName).toList)
}
val data2 = data.flatMap(a => a._1.flatMap(b => a._3.map(c => (b, a._2, c))))
.map { case (hash, user, mention) => s"$hash, $user, $mention" }
The flatMap results in a DStream[(String, String, String)] in which each tuple consists of a hash tag entity, user, and mention entity. The subsequent call to map with the pattern matching creates a DStream[String] in which each String consists of the elements in each tuple, separated by a comma and space.
I would use a for comprehension for this:
val data = (List("timetofly", "hellocake"), "Shera_Eyra", List("blxcknicotine", "kimtheskimm"))
val result = for {
hashtag <- data._1
user = data._2
mentionedUser <- data._3
} yield (hashtag, user, mentionedUser)
result.foreach(println)
Output:
(timetofly,Shera_Eyra,blxcknicotine)
(timetofly,Shera_Eyra,kimtheskimm)
(hellocake,Shera_Eyra,blxcknicotine)
(hellocake,Shera_Eyra,kimtheskimm)
If you would prefer a seq of lists of strings, rather than a seq of tuples of strings, then change the yield to give you a list instead: yield List(hashtag, user, mentionedUser)
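
One caveat with both the flatMap and the for-comprehension versions: a tweet with no hashtags or no mentions produces no rows at all, because the cross product with an empty list is empty. If such tweets should still show up, a possible sketch is below. The object name and the empty-string placeholder are my own choices, not part of the original code:

```scala
object FlattenTweet {
  // Cross product of hashtags and mentions for one tweet.
  // Assumption: an empty list is replaced by a single "" placeholder so the
  // tweet still yields at least one row instead of vanishing from the output.
  def rows(
      hashtags: List[String],
      user: String,
      mentions: List[String]
  ): List[(String, String, String)] = {
    val hs = if (hashtags.isEmpty) List("") else hashtags
    val ms = if (mentions.isEmpty) List("") else mentions
    for {
      hashtag <- hs
      mention <- ms
    } yield (hashtag, user, mention)
  }
}
```

In the streaming code this would be used as data.flatMap { case (tags, user, mentions) => FlattenTweet.rows(tags.toList, user, mentions) }, assuming data has the tuple type described above.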

How to join two RDDs by key to get RDD of (String, String)?

I have two paired RDDs of type RDD[(String, mutable.HashSet[String])]:
For example:
rdd1: 332101231222, "320758, 320762, 320760, 320759, 320757, 320761"
rdd2: 332101231222, "220758, 220762, 220760, 220759, 220757, 220761"
I want to combine rdd1 and rdd2 based on common keys, so the output should look like:
332101231222 320758, 320762, 320760, 320759, 320757, 320761 220758, 220762, 220760, 220759, 220757, 220761
Here is my code:
def cogroupTest (rdd1: RDD [(String, mutable.HashSet[String])], rdd2: RDD [(String, mutable.HashSet[String])] ): Unit =
{
val prods_per_user_co_grouped = (rdd1).cogroup(rdd2)
prods_per_user_co_grouped.map { case (key: String, (value1: mutable.HashSet[String], value2: mutable.HashSet[String])) => {
val combinedhs = value1 ++ value2
val sstr = combinedhs.mkString("\t")
val keypadded = key + "\t"
s"$keypadded$sstr"
}
}.saveAsTextFile("/scratch/rdds_joined/")
}
Here is the error that I get when I run my program:
scala.MatchError: (32101231222,(CompactBuffer(Set(320758, 320762, 320760, 320759, 320757, 320761)),CompactBuffer(Set(220758, 220762, 220760, 220759, 220757, 220761)))) (of class scala.Tuple2)
Any help with this will be great!
As you might guess from the name, cogroup groups observations by key. This means that in your case you get:
(String, (Iterable[mutable.HashSet[String]], Iterable[mutable.HashSet[String]]))
not
(String, (mutable.HashSet[String], mutable.HashSet[String]))
This is pretty clear when you look at the error you get. If you want to combine pairs, you should use the join method. If not, you should adjust the pattern to match the structure you actually get and then use something like this:
val combinedhs = value1.reduce(_ ++ _) ++ value2.reduce(_ ++ _)
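
To see the shape without a Spark cluster, here is a sketch in plain Scala collections. The cogroup helper below is a local imitation written for this example, not the Spark method, and the sorting is only there to make the output deterministic. It also flattens instead of calling reduce, since reduce throws on an empty collection and cogroup gives an empty Iterable for a key that exists on only one side:

```scala
import scala.collection.mutable

object CogroupShape {
  type Values = mutable.HashSet[String]
  type Pairs = Seq[(String, Values)]

  // Small helper so callers do not need the mutable import.
  def set(items: String*): Values = mutable.HashSet(items: _*)

  // Local imitation of the result shape of RDD.cogroup: every key maps to
  // ALL values from the left and ALL values from the right.
  def cogroup(left: Pairs, right: Pairs): Map[String, (Iterable[Values], Iterable[Values])] = {
    val keys = (left.map(_._1) ++ right.map(_._1)).distinct
    keys.map { key =>
      val fromLeft: Iterable[Values] = left.filter(_._1 == key).map(_._2)
      val fromRight: Iterable[Values] = right.filter(_._1 == key).map(_._2)
      (key, (fromLeft, fromRight))
    }.toMap
  }

  // Match on the two Iterables, flatten each side, then build "key<TAB>v1<TAB>v2...".
  def combine(left: Pairs, right: Pairs): Map[String, String] =
    cogroup(left, right).map { case (key, (values1, values2)) =>
      val combined = (values1.flatten ++ values2.flatten).toSeq.sorted
      key -> (key + "\t" + combined.mkString("\t"))
    }
}
```

The pattern case (key, (values1, values2)) carries no HashSet type annotation, which is what avoids the MatchError from the question.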

Nested Map withDefaultValue changes default value

I have a mutable map containing another mutable map, both with default values. After I assign a value to one key in the enclosed map, its default value seems to change.
I.e. I expected anotherDefault to have the value Map(1 -> default), NOT Map(1 -> something).
Why is this happening?
scala> import scala.collection.mutable.{Map => MMap}
import scala.collection.mutable.{Map=>MMap}
scala> val amap = Map[Int, MMap[Int, String]]().withDefaultValue(MMap().withDefaultValue("default"))
amap: scala.collection.immutable.Map[Int,scala.collection.mutable.Map[Int,String]] = Map()
scala> val bmap = amap(2)
bmap: scala.collection.mutable.Map[Int,String] = Map()
scala> bmap(1)
res17: String = default
scala> bmap(1) = "something"
scala> val anotherDefault = amap(3)
anotherDefault: scala.collection.mutable.Map[Int,String] = Map(1 -> something)
The outer map (amap) is creating a single instance of the inner map to use as the default. When you access this via val bmap = amap(2), then modify bmap, you are modifying the single default map used by amap. When you call amap(3), you then get back this default map, which is now a map with the key/value pair (1 -> "something").
What you probably want is withDefault, not withDefaultValue, although it needs some extra argument/type specification to work:
val amap = Map[Int, MMap[Int, String]]().withDefault(x => MMap[Int, String]().withDefaultValue("default"))
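
A small sketch comparing the behaviours side by side (the object and method names are just for illustration). Note one consequence of withDefault that may be surprising: the fresh inner map is not stored in the outer map, so a write to it is lost on the next lookup. If the writes should stick, getOrElseUpdate on a mutable outer map is one option:

```scala
import scala.collection.mutable.{Map => MMap}

object NestedDefaults {
  // withDefaultValue: ONE inner map instance is shared by every missing key.
  def shared(): String = {
    val amap = Map[Int, MMap[Int, String]]()
      .withDefaultValue(MMap[Int, String]().withDefaultValue("default"))
    val bmap = amap(2)
    bmap(1) = "something"
    amap(3)(1) // reads the same shared inner map
  }

  // withDefault: the function runs on every miss, so each lookup gets a new inner map.
  def fresh(): (String, String) = {
    val amap = Map[Int, MMap[Int, String]]()
      .withDefault(_ => MMap[Int, String]().withDefaultValue("default"))
    val bmap = amap(2)
    bmap(1) = "something"
    (amap(3)(1), amap(2)(1)) // the write to bmap is not visible through amap
  }

  // getOrElseUpdate on a mutable outer map also stores the inner map, so writes are kept.
  def stored(): String = {
    val amap = MMap[Int, MMap[Int, String]]()
    def inner(k: Int): MMap[Int, String] =
      amap.getOrElseUpdate(k, MMap[Int, String]().withDefaultValue("default"))
    val bmap = inner(2)
    bmap(1) = "something"
    inner(2)(1)
  }
}
```

shared() returns "something", fresh() returns ("default", "default"), and stored() returns "something".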