Flatten a map into pairs (key, value) in Scala [closed]

Say I construct the following map in Scala:
val map = Map.empty[String, Seq[String]] + ("A" -> ("1", "2", "3", "4"), "B" -> ("2", "3"), "C" -> ("3", "4"))
My output should be a sequence of single (key, value) pairs. Namely, it should look like this:
[("A", "1"), ("A", "2"), ("A", "3"), ("A", "4"), ("B", "2"), ("B", "3"), ("B", "2"), ("C", "3"),
("C", "4")]
How can I obtain this using flatMap?

I would guess that your original goal was to create the following map (note the added Seqs):
val map = Map.empty[String, Seq[String]] + ("A" -> Seq("1", "2", "3", "4"), "B" -> Seq("2", "3"), "C" -> Seq("3", "4"))
Then you will be able to transform it easily with:
val result = map.toSeq.flatMap { case (k, v) => v.map((k, _)) }

Also, you can create the map directly; there is no need to append to an empty map.
val map = Map("A" -> Seq("1", "2", "3", "4"), "B" -> Seq("2", "3"), "C" -> Seq("3", "4"))

Related

Create a list of Map from csv feeder in gatling

I am using a Gatling Pebble template in my Gatling simulation; it works with a Map like this:
val mapValuesFeeder = Iterator.continually(Map("mapValues" -> List(
  Map("id" -> "1", "weight" -> "10"),
  Map("id" -> "2", "weight" -> "20"),
)))
But I don't want to hardcode these values in the Map; how can I build a similar Map from CSV feeder data?
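One possible approach (a sketch, not taken from an answer in this thread): with Gatling 3.x a CSV feeder can be read eagerly with readRecords, which returns the rows as a Seq[Map[String, Any]] that can be dropped straight into the feeder. The file name values.csv and its id/weight columns are assumptions here.
// Assumes a file values.csv with header "id,weight" on the Gatling resources path
val records: Seq[Map[String, Any]] = csv("values.csv").readRecords
val mapValuesFeeder = Iterator.continually(Map("mapValues" -> records))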

Spark provide list of all columns in DataFrame groupBy [duplicate]

I need to group the DataFrame by all columns except "tag"
Right now I can do it in the following way:
unionDf.groupBy("name", "email", "phone", "country").agg(collect_set("tag").alias("tags"))
Is it possible to get all columns (except "tag") and pass them to the groupBy method without having to hardcode them as I do now ("name", "email", "phone", "country")?
I tried unionDf.groupBy(unionDf.columns), but it doesn't work.
Here's one approach:
import org.apache.spark.sql.functions._
val df = Seq(
  ("a", "b#c.com", "123", "US", "ab1"),
  ("a", "b#c.com", "123", "US", "ab2"),
  ("d", "e#f.com", "456", "US", "de1")
).toDF("name", "email", "phone", "country", "tag")
// every column except "tag" becomes a grouping column
val groupCols = df.columns.diff(Seq("tag"))
df.groupBy(groupCols.map(col): _*).agg(collect_set("tag").alias("tags")).show
// +----+-------+-----+-------+----------+
// |name| email|phone|country| tags|
// +----+-------+-----+-------+----------+
// | d|e#f.com| 456| US| [de1]|
// | a|b#c.com| 123| US|[ab2, ab1]|
// +----+-------+-----+-------+----------+
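If you prefer not to map the names to Columns, groupBy also has a String-based overload (first column name plus varargs), so the same grouping can be written as:
// Same grouping via groupBy(col1: String, cols: String*)
df.groupBy(groupCols.head, groupCols.tail: _*)
  .agg(collect_set("tag").alias("tags"))
  .show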

GraphFrame: missing or invalid dependency detected while loading class file

I am trying to create a graph using Spark GraphFrames.
Here is the code:
import org.graphframes._
// Node DataFrames
val v = sqlContext.createDataFrame(List(
  ("a", "Alice", 34),
  ("b", "Bob", 36),
  ("c", "Charlie", 30),
  ("d", "David", 29),
  ("e", "Esther", 32),
  ("f", "Fanny", 36),
  ("g", "Gabby", 60)
)).toDF("id", "name", "age")
// Edge DataFrame
val e = sqlContext.createDataFrame(List(
  ("a", "b", "friend"),
  ("b", "c", "follow"),
  ("c", "b", "follow"),
  ("f", "c", "follow"),
  ("e", "f", "follow"),
  ("e", "d", "friend"),
  ("d", "a", "friend"),
  ("a", "e", "friend")
)).toDF("src", "dst", "relationship")
// Create a GraphFrame
val g = GraphFrame(v, e)
But this is the error I am getting:
error: missing or invalid dependency detected while loading class file
'GraphFrame.class'. Could not access type Logging in package
org.apache.spark, because it (or its dependencies) are missing. Check
your build definition for missing or conflicting dependencies. (Re-run
with -Ylog-classpath to see the problematic classpath.) A full
rebuild may help if 'GraphFrame.class' was compiled against an
incompatible version of org.apache.spark.
I am using Apache Spark 2.1 and Scala 2.11. Any suggestions on what the issue might be?
Download the following packages from the Maven central repository:
com.typesafe.scala-logging_scala-logging-api_2.11-2.1.2.jar
graphframes_graphframes-0.5.0-spark2.1-s_2.11.jar
org.slf4j_slf4j-api-1.7.7.jar
com.typesafe.scala-logging_scala-logging-slf4j_2.11-2.1.2.jar
org.scala-lang_scala-reflect-2.11.0.jar
Add the following to your spark-defaults.conf file (a comma-separated list of the absolute paths where the jars downloaded above are located):
spark.jars path_2_jar/org.slf4j_slf4j-api-1.7.7.jar, path_2_jar/org.scala-lang_scala-reflect-2.11.0.jar, path_2_jar/graphframes_graphframes-0.5.0-spark2.1-s_2.11.jar, path_2_jar/com.typesafe.scala-logging_scala-logging-slf4j_2.11-2.1.2.jar, path_2_jar/com.typesafe.scala-logging_scala-logging-api_2.11-2.1.2.jar
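Alternatively (not part of the original answer), Spark can usually resolve GraphFrames and its transitive dependencies for you if you set spark.jars.packages instead of listing the jars by hand; the coordinates below assume Spark 2.1 with Scala 2.11 and that the spark-packages repository is reachable from your environment.
spark.jars.packages graphframes:graphframes:0.5.0-spark2.1-s_2.11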

How to create a sample dataframe in Scala / Spark

I'm trying to create a simple DataFrame as follows:
import sqlContext.implicits._
val lookup = Array("one", "two", "three", "four", "five")
val theRow = Array("1",Array(1,2,3), Array(0.1,0.4,0.5))
val theRdd = sc.makeRDD(theRow)
case class X(id: String, indices: Array[Integer], weights: Array[Float] )
val df = theRdd.map{
case Array(s0,s1,s2) => X(s0.asInstanceOf[String],s1.asInstanceOf[Array[Integer]],s2.asInstanceOf[Array[Float]])
}.toDF()
df.show()
df is defined as
df: org.apache.spark.sql.DataFrame = [id: string, indices: array<int>, weights: array<float>]
which is what I want.
Upon executing, I get
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 13.0 failed 1 times, most recent failure: Lost task 1.0 in stage 13.0 (TID 50, localhost): scala.MatchError: 1 (of class java.lang.String)
Where is this MatchError coming from? And, is there a simpler way to create sample DataFrames programmatically?
First, theRow should be a Row and not an Array: sc.makeRDD(theRow) builds an RDD whose elements are the individual items of that array, so the map sees the String "1" first and the pattern case Array(s0, s1, s2) fails, which is exactly the MatchError you got. If you also adjust the types so that Java/Scala compatibility is respected, your example will work:
import org.apache.spark.sql.Row

val theRow = Row("1", Array[java.lang.Integer](1, 2, 3), Array[Double](0.1, 0.4, 0.5))
val theRdd = sc.makeRDD(Array(theRow))
case class X(id: String, indices: Array[Integer], weights: Array[Double])
val df = theRdd.map {
  case Row(s0, s1, s2) => X(s0.asInstanceOf[String], s1.asInstanceOf[Array[Integer]], s2.asInstanceOf[Array[Double]])
}.toDF()
df.show()
//+---+---------+---------------+
//| id| indices| weights|
//+---+---------+---------------+
//| 1|[1, 2, 3]|[0.1, 0.4, 0.5]|
//+---+---------+---------------+
Here is another example you can refer to:
import spark.implicits._
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val columns = Array("id", "first", "last", "year")
val df1 = sc.parallelize(Seq(
  (1, "John", "Doe", 1986),
  (2, "Ive", "Fish", 1990),
  (4, "John", "Wayne", 1995)
)).toDF(columns: _*)
val df2 = sc.parallelize(Seq(
  (1, "John", "Doe", 1986),
  (2, "IveNew", "Fish", 1990),
  (3, "San", "Simon", 1974)
)).toDF(columns: _*)
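As a further simplification (assuming a Spark 2.x SparkSession named spark, as in the import above), you can call toDF on a plain local Seq and skip the RDD entirely:
// spark.implicits._ (imported above) provides .toDF on local Seqs
val df3 = Seq(
  (1, "John", "Doe", 1986),
  (2, "Ive", "Fish", 1990)
).toDF("id", "first", "last", "year")
df3.show()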

How to add a list of elements to session in Play?

I'm new to the Play framework. How do I add a list of elements to a session?
The compiler always complains about the code:
val cookies: List[(String, String)] = List[("a", "b), ("c", "d")]
Ok(views.html.hello(info)).withSession(request.session + cookies)
You don't need to copy the existing session yourself:
val cookies: List[(String, String)] = List(("a", "b"), ("c", "d"))
Ok(views.html.hello(info)).addingToSession(cookies: _*)
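For context, addingToSession needs an implicit RequestHeader in scope; in a Play 2.6+ controller the surrounding action would look roughly like this (HelloController and the plain-text response are assumptions, since the question does not show that code):
import play.api.mvc._

class HelloController(cc: ControllerComponents) extends AbstractController(cc) {
  // `implicit request` is what makes addingToSession compile
  def hello = Action { implicit request =>
    val extra: List[(String, String)] = List(("a", "b"), ("c", "d"))
    Ok("hello").addingToSession(extra: _*)
  }
}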