I have written the code below:
val maplist=List(Map("id" -> "1", "Name" -> "divya"),
Map("id" -> "2", "Name" -> "gaya")
)
val header=maplist.flatMap(_.keys).distinct
val data=maplist.flatMap(_.values)
println(header)
println(data)
I am getting the following output:
List(id, Name)
List(1, divya, 2, gaya)
However, I am expecting the output below:
id Name
1 divya
2 gaya
In this case there are only two headers, but my maps may contain more than two. How can I display all of them as rows? Please help.
val maplist=List(Map("id" -> "1", "Name" -> "divya"),
Map("id" -> "2", "Name" -> "gaya")
)
val header=maplist.flatMap(_.keys).distinct
val data=maplist.map(_.values)
println(header.mkString(" "))
data.foreach(x => println(x.mkString(" ")))
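The answer above relies on every map returning its values in the same order as the header. A minimal sketch that looks each header up per row instead, so maps with missing keys or a different key order still line up, could look like this. The name `renderTable` and the choice of an empty string for missing keys are assumptions made for illustration, not part of the original answer.

```scala
// Hypothetical helper: renders a list of maps as a header line plus one line per map.
def renderTable(rows: Seq[Map[String, String]], sep: String = " "): String = {
  // Collect every key that appears in any map, keeping first-seen order.
  val header = rows.flatMap(_.keys).distinct
  // Look up each header in each row; a missing key becomes an empty cell (assumption).
  val lines = rows.map(row => header.map(k => row.getOrElse(k, "")).mkString(sep))
  (header.mkString(sep) +: lines).mkString("\n")
}

val maplist = List(
  Map("id" -> "1", "Name" -> "divya"),
  Map("id" -> "2", "Name" -> "gaya")
)

// table holds three lines: "id Name", "1 divya", "2 gaya"
val table = renderTable(maplist)
```

Calling `println(table)` prints the rows in the layout asked for in the question.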
Related
I have a Scala List of Map[String, String] like this:
val data: List[Map[String, String]] = List(Map("key" -> "123", "fname" -> "Alice", "lname" -> "Baker"), Map("key" -> "456", "fname" -> "Bob", "lname" -> "Lotts"))
I want to transform this to a List like this: List(Map(id -> 123, name -> Alice Baker), Map(id -> 456, name -> Bob Lotts)). Basically, I need to change the key to id and concatenate the fname and lname to name.
I tried the below code. It works, but I am sure there should be a better way of doing this. Can anyone please suggest?
val modData: List[Map[String, String]] = data.map(d => Map("id" -> d.getOrElse("key", ""), "name" -> s"${d.getOrElse("fname", "")} ${d.getOrElse("lname", "")}"))
I would do it in steps, and use a default for the map to make it more readable:
val keys = Seq("key", "fname", "lname")
data.iterator
  .map(_.withDefault(_ => "")) // missing keys yield an empty string
  .map(m => keys.map(m)) // look up the three keys in a fixed order
  .collect { case Seq(id, fname, lname) => Map("id" -> id, "name" -> s"$fname $lname") }
  .toList
val mapa = Map("a" -> Array(Map("b" -> "c", "d" -> Array("e"))))
val mapa2 = Map("a" -> Array(Map("b" -> "c", "d" -> Array("e"))))
Is there a way to get the keys and values from both maps and compare them?
Or how can I get all the keys from a map with such a structure?
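One thing to keep in mind here is that Scala arrays compare by reference, so `mapa == mapa2` is false even though the contents match. A rough sketch of a recursive traversal that handles nested maps and arrays could look like this. The helper names `allKeys` and `deepEquals` are hypothetical, and the code assumes the nesting consists only of `Map` and `Array` values.

```scala
val mapa  = Map("a" -> Array(Map("b" -> "c", "d" -> Array("e"))))
val mapa2 = Map("a" -> Array(Map("b" -> "c", "d" -> Array("e"))))

// Hypothetical helper: collects the keys of every map found at any depth.
def allKeys(value: Any): List[String] = value match {
  case m: Map[_, _] => m.toList.flatMap { case (k, v) => k.toString :: allKeys(v) }
  case a: Array[_]  => a.toList.flatMap(allKeys)
  case _            => Nil
}

// Hypothetical helper: structural comparison that descends into maps and arrays.
def deepEquals(x: Any, y: Any): Boolean = (x, y) match {
  case (a: Map[_, _], b: Map[_, _]) =>
    val ma = a.asInstanceOf[Map[Any, Any]]
    val mb = b.asInstanceOf[Map[Any, Any]]
    ma.size == mb.size && ma.forall { case (k, v) => mb.get(k).exists(deepEquals(v, _)) }
  case (a: Array[_], b: Array[_]) =>
    a.length == b.length && a.indices.forall(i => deepEquals(a(i), b(i)))
  case _ => x == y
}

val keysOfMapa = allKeys(mapa)
val sameContent = deepEquals(mapa, mapa2)
```

If the arrays can be replaced by `List` or `Vector` in the input, plain `==` already compares structurally and no helper is needed.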
I have this Seq[Map[String, String]] :
val val1 = Seq(
Map("Name" -> "Heidi",
"City" -> "Paris",
"Age" -> "23"),
Map(("Country" -> "France")),
Map("Color" -> "Blue",
"City" -> "Paris"))
and I have this Seq[String]:
val val2 = Seq("Name", "Country", "City", "Department")
The expected output is val1 with only the keys present in val2 (I want to filter out the (k, v) pairs from val1 whose keys are not present in val2):
val expected = Seq(Map("Name" -> "Heidi", "City" -> "Paris"), Map("Country" -> "France"), Map("City" -> "Paris"))
Age and Color are keys that are not in val2, so I want to omit them from the val1 maps.
I'm not sure whether what you propose is the right approach, but nevertheless it can be done like this:
val1.map(_.filter {
  case (key, _) => val2.contains(key)
})
It seems you want something like this:
(note that I used a Set instead of a List to make contains faster)
def ensureMapsHaveOnlyValidKeys[K, V](validKeys: Set[K])(data: IterableOnce[Map[K, V]]): List[Map[K, V]] =
data
.iterator
.filter(_.keysIterator.forall(validKeys.contains))
.toList
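Note that the function above keeps or drops whole maps: a map survives only if all of its keys are valid. With the sample data that leaves just the Country map, which differs from the expected output in the question. A sketch that instead drops individual entries, under the assumption that this is what was wanted, could look like this (`keepOnlyValidKeys` is a hypothetical name).

```scala
// Hypothetical helper: removes entries whose key is not in validKeys, keeping every map.
def keepOnlyValidKeys[K, V](validKeys: Set[K])(data: Seq[Map[K, V]]): Seq[Map[K, V]] =
  data.map(_.filter { case (key, _) => validKeys.contains(key) })

val val1 = Seq(
  Map("Name" -> "Heidi", "City" -> "Paris", "Age" -> "23"),
  Map("Country" -> "France"),
  Map("Color" -> "Blue", "City" -> "Paris")
)

// A Set is used so that each contains check is fast.
val val2 = Set("Name", "Country", "City", "Department")

val filtered = keepOnlyValidKeys(val2)(val1)
```

Here `filtered` contains the three maps from the question with the Age and Color entries removed.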
I have multiple Map[String, String] in a List (Scala). For example:
val map1 = Map("EMP_NAME" -> "Ahmad", "DOB" -> "01-10-1991", "CITY" -> "Dubai")
val map2 = Map("EMP_NAME" -> "Rahul", "DOB" -> "06-12-1991", "CITY" -> "Mumbai")
val map3 = Map("EMP_NAME" -> "John", "DOB" -> "11-04-1996", "CITY" -> "Toronto")
val list = List(map1, map2, map3)
Now I want to create a single dataframe with something like this:
EMP_NAME DOB CITY
Ahmad 01-10-1991 Dubai
Rahul 06-12-1991 Mumbai
John 11-04-1996 Toronto
How do I achieve this?
You can do it like this:
import spark.implicits._
val df = list
.map( m => (m.get("EMP_NAME"),m.get("DOB"),m.get("CITY")))
.toDF("EMP_NAME","DOB","CITY")
df.show()
+--------+----------+-------+
|EMP_NAME| DOB| CITY|
+--------+----------+-------+
| Ahmad|01-10-1991| Dubai|
| Rahul|06-12-1991| Mumbai|
| John|11-04-1996|Toronto|
+--------+----------+-------+
A slightly less specific approach, for example:
val map1 = Map("EMP_NAME" -> "Ahmad", "DOB" -> "01-10-1991", "CITY" -> "Dubai")
val map2 = Map("EMP_NAME" -> "John", "DOB" -> "01-10-1992", "CITY" -> "Mumbai")
///...
val list = List(map1, map2) // map3, ...
val RDDmap = sc.parallelize(list)
// Get cols dynamically
val cols = RDDmap.take(1).flatMap(x=> x.keys)
// Map is K,V like per Map entry
val df = RDDmap.map{ value=>
val list=value.values.toList
(list(0), list(1), list(2))
}.toDF(cols:_*) // dynamic column names assigned
df.show(false)
returns:
+--------+----------+------+
|EMP_NAME|DOB |CITY |
+--------+----------+------+
|Ahmad |01-10-1991|Dubai |
|John |01-10-1992|Mumbai|
+--------+----------+------+
Or, to answer your sub-question (at least I think this is what you are asking, though I may be wrong):
val RDDmap = sc.parallelize(List(
Map("EMP_NAME" -> "Ahmad", "DOB" -> "01-10-1991", "CITY" -> "Dubai"),
Map("EMP_NAME" -> "John", "DOB" -> "01-10-1992", "CITY" -> "Mumbai")))
...
// Get cols dynamically
val cols = RDDmap.take(1).flatMap(x=> x.keys)
// Map is K,V like per Map entry
val df = RDDmap.map{ value=>
val list=value.values.toList
(list(0), list(1), list(2))
}.toDF(cols:_*) // dynamic column names assigned
You can build a list dynamically of course, but you still need to assign the Map elements. See Appending Data to List or any other collection Dynamically in scala. I would just read in from file and be done with it.
import org.apache.spark.SparkContext
import org.apache.spark.sql._
import org.apache.spark.sql.types.{StringType, StructField, StructType}
object DataFrameTest2 extends Serializable {
  var sparkSession: SparkSession = _
  var sparkContext: SparkContext = _
  var sqlContext: SQLContext = _
  def main(args: Array[String]): Unit = {
    sparkSession = SparkSession.builder().appName("TestMaster").master("local").getOrCreate()
    sparkContext = sparkSession.sparkContext
    val sqlContext = new org.apache.spark.sql.SQLContext(sparkContext)
    val map1 = Map("EMP_NAME" -> "Ahmad", "DOB" -> "01-10-1991", "CITY" -> "Dubai")
    val map2 = Map("EMP_NAME" -> "Rahul", "DOB" -> "06-12-1991", "CITY" -> "Mumbai")
    val map3 = Map("EMP_NAME" -> "John", "DOB" -> "11-04-1996", "CITY" -> "Toronto")
    val list = List(map1, map2, map3)
    //create your rows
    val rows = list.map(m => Row(m.values.toSeq:_*))
    //create the schema from the header
    val header = list.head.keys.toList
    val schema = StructType(header.map(fieldName => StructField(fieldName, StringType, true)))
    //create your rdd
    val rdd = sparkContext.parallelize(rows)
    //create your dataframe using rdd
    val df = sparkSession.createDataFrame(rdd, schema)
    df.show()
  }
}
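The approaches above take column values from `m.values` and column names from the first map's keys, so they line up only when every map iterates its entries in the same order. That holds for small immutable maps built with identical key order, but it is not guaranteed in general. A plain-Scala sketch that aligns each row to one shared header before any Spark call is shown below. The name `alignRows` and the use of `null` for missing keys are assumptions made for illustration.

```scala
// Hypothetical helper: returns a shared header plus one value sequence per map,
// with values looked up by key so that column order is the same in every row.
def alignRows(maps: Seq[Map[String, String]]): (Seq[String], Seq[Seq[String]]) = {
  val header = maps.flatMap(_.keys).distinct
  // A missing key becomes null (assumption), which Spark would show as a null cell.
  val rows = maps.map(m => header.map(k => m.get(k).orNull))
  (header, rows)
}

val aligned = alignRows(List(
  Map("EMP_NAME" -> "Ahmad", "DOB" -> "01-10-1991", "CITY" -> "Dubai"),
  Map("CITY" -> "Mumbai", "EMP_NAME" -> "Rahul") // different order, DOB missing
))
```

Each inner sequence could then be turned into a Spark row with `Row.fromSeq`, and the header into a `StructType` in the same way as in the program above.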
I have written the following code:
val list = List(
Map("empid" -> "12", "empName" -> "Rohan", "depId" -> "201"),
Map("empid" -> "13", "empName" -> "swathi", "depId" -> "202")
).flatten.toMap
val mapRDD= sc.parallelize(Seq(list))
val columns=mapRDD.take(1).flatMap(a=>a.keys)
val columnval=mapRDD.take(2).flatMap(a=>a.keys)
val resultantDF=mapRDD.map{value=>
val list=value.values.toList
(list(0),list(1),list(2))
}.toDF(columns:_*)
resultantDF.show()
I am expecting the output below:
+-----+-------+-----+
|empid|empName|depId|
+-----+-------+-----+
|   12|  Rohan|  201|
|   13| swathi|  202|
+-----+-------+-----+
but I am getting only:
+-----+-------+-----+
|empid|empName|depId|
+-----+-------+-----+
|   13| swathi|  202|
+-----+-------+-----+
Please let me know where I am making the mistake.
The problem lies in your very first statement:
scala> val list = List(
| Map("empid" -> "12", "empName" -> "Rohan", "depId" -> "201"),
| Map("empid" -> "13", "empName" -> "swathi", "depId" -> "202")
| ).flatten.toMap
// list: scala.collection.immutable.Map[String,String] = Map(empid -> 13, empName -> swathi, depId -> 202)
Your list actually ends up becoming a Map, and a Map can hold only one value for each key.
Let's go through the first statement step by step.
First, you created a list of maps:
scala> val listOfMaps = List(
| Map("empid" -> "12", "empName" -> "Rohan", "depId" -> "201"),
| Map("empid" -> "13", "empName" -> "swathi", "depId" -> "202")
| )
// listOfMaps: List[scala.collection.immutable.Map[String,String]] = List(Map(empid -> 12, empName -> Rohan, depId -> 201), Map(empid -> 13, empName -> swathi, depId -> 202))
Then, you flattened the maps inside listOfMaps, which results in a list of key-value pairs:
scala> val flattenedListOfMaps = listOfMaps.flatten
// flattenedListOfMaps: List[(String, String)] = List((empid,12), (empName,Rohan), (depId,201), (empid,13), (empName,swathi), (depId,202))
Now, you are converting it to a Map using toMap. Each later pair overwrites the value of any earlier pair with the same key, so the result is a Map with unique keys:
scala> val yourMap = flattenedListOfMaps.toMap
// yourMap: scala.collection.immutable.Map[String,String] = Map(empid -> 13, empName -> swathi, depId -> 202)
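The overwriting can be reproduced without Spark. The sketch below shows the collapsed map next to a grouped variant that keeps every value per key. The grouped variant is only an illustration of what is lost, not something the original answer proposes, and the names `collapsed` and `grouped` are hypothetical.

```scala
val listOfMaps = List(
  Map("empid" -> "12", "empName" -> "Rohan", "depId" -> "201"),
  Map("empid" -> "13", "empName" -> "swathi", "depId" -> "202")
)

// toMap keeps only the last value seen for each key, so the first employee is lost.
val collapsed: Map[String, String] = listOfMaps.flatten.toMap

// Grouping by key instead keeps all values, in list order, for each key.
val grouped: Map[String, List[String]] =
  listOfMaps.flatten.groupBy(_._1).map { case (key, pairs) => key -> pairs.map(_._2) }
```

Here `collapsed("empid")` is "13", while `grouped("empid")` is `List("12", "13")`.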
As already pointed out in the previous answer and comment, your list variable is currently a Map (which is confusing, to say the least).
What you probably want as input is a list.
Hence what you need is:
1. Get rid of .flatten.toMap:
val list = List(
  Map("empid" -> "12", "empName" -> "Rohan", "depId" -> "201"),
  Map("empid" -> "13", "empName" -> "swathi", "depId" -> "202")
)
2. When calling sc.parallelize, you don't need to wrap the original input in a separate Seq (in fact, doing so would give you a compile error). So you also need to change it like this:
val mapRDD = sc.parallelize(list)
After making just these two changes you will get the expected result, i.e. two records shown in the console output.