I am trying to create an array of maps based on a few conditions. Following is my function. Even though I provide a Map, Scala forces me to have a tuple as the return type. Is there a way I can fix it?
def getSchemaMap(schema: StructType): Array[(String, String)] = {
  schema.fields.flatMap {
    case StructField(name, StringType, _, _) => Map(name -> "String")
    case StructField(name, IntegerType, _, _) => Map(name -> "int")
    case StructField(name, LongType, _, _) => Map(name -> "int")
    case StructField(name, DoubleType, _, _) => Map(name -> "int")
    case StructField(name, TimestampType, _, _) => Map(name -> "timestamp")
    case StructField(name, DateType, _, _) => Map(name -> "date")
    case StructField(name, BooleanType, _, _) => Map(name -> "boolean")
    case StructField(name, _: DecimalType, _, _) => Map(name -> "decimal")
    case StructField(name, _, _, _) => Map(name -> "String")
  }
}
Use toMap to convert Array[(String, String)] to Map[String, String]:
def getSchemaMap(schema: StructType): Map[String, String] = {
  schema.fields.flatMap {
    case StructField(name, StringType, _, _) => Map(name -> "String")
    case StructField(name, IntegerType, _, _) => Map(name -> "int")
    case StructField(name, LongType, _, _) => Map(name -> "int")
    case StructField(name, DoubleType, _, _) => Map(name -> "int")
    case StructField(name, TimestampType, _, _) => Map(name -> "timestamp")
    case StructField(name, DateType, _, _) => Map(name -> "date")
    case StructField(name, BooleanType, _, _) => Map(name -> "boolean")
    case StructField(name, _: DecimalType, _, _) => Map(name -> "decimal")
    case StructField(name, _, _, _) => Map(name -> "String")
  }.toMap
}
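One thing to watch with toMap: when a key repeats, the last pair wins, so duplicate column names would silently collapse. A plain-Scala illustration (no Spark types needed):

```scala
// toMap keeps the *last* pair when a key repeats
val pairs = Array("a" -> "String", "a" -> "int", "b" -> "boolean")
val asMap: Map[String, String] = pairs.toMap
// asMap: Map(a -> int, b -> boolean)
```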
But actually you don't need to use flatMap here, since you map each value to exactly one value, not to multiple values. In that case you can simply map each field to a tuple and then convert the resulting collection of tuples to a Map:
def getSchemaMap(schema: StructType): Map[String, String] = {
  schema.fields.map {
    case StructField(name, StringType, _, _) => name -> "String"
    case StructField(name, IntegerType, _, _) => name -> "int"
    case StructField(name, LongType, _, _) => name -> "int"
    case StructField(name, DoubleType, _, _) => name -> "int"
    case StructField(name, TimestampType, _, _) => name -> "timestamp"
    case StructField(name, DateType, _, _) => name -> "date"
    case StructField(name, BooleanType, _, _) => name -> "boolean"
    case StructField(name, _: DecimalType, _, _) => name -> "decimal"
    case StructField(name, _, _, _) => name -> "String"
  }.toMap
}
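Since the difference doesn't depend on Spark, here is a minimal plain-Scala sketch (the field names, type strings, and the render helper are made up for illustration) showing that map-to-tuple plus toMap produces the same result as flatMap over single-entry Maps:

```scala
// Hypothetical (name, typeName) pairs standing in for StructFields
val fields = Array(("name", "StringType"), ("age", "IntegerType"))

def render(t: String): String = if (t == "IntegerType") "int" else "String"

// One tuple per element: map, then toMap
val viaMap: Map[String, String] =
  fields.map { case (n, t) => n -> render(t) }.toMap

// flatMap over single-entry Maps yields the same pairs, just more verbosely
val viaFlatMap: Map[String, String] =
  fields.flatMap { case (n, t) => Map(n -> render(t)) }.toMap
// both: Map(name -> String, age -> int)
```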
If I understood correctly, you are trying to extract the columns (pairs of name/type) into a dictionary of type Map[String, String]. You can use the built-in functionality for this and leverage the existing API, hence I don't see the need for any custom pattern matching.
You can use df.schema.fields or df.schema.toDDL as explained next:
df.schema.fields.map(f => (f.name, f.dataType.typeName)).toMap // also try out f.dataType.simpleString
// res4: scala.collection.immutable.Map[String,String] = Map(col1 -> string, col2 -> integer, col3 -> string, col4 -> string)
df.schema.toDDL.split(",").map{f => (f.split(" ")(0), f.split(" ")(1))}.toMap
// res8: scala.collection.immutable.Map[String,String] = Map(`col1` -> STRING, `col2` -> INT, `col3` -> STRING, `col4` -> STRING)
And wrapped in a function:
def schemaToMap(schema: StructType): Map[String, String] =
  schema.fields.map(f => (f.name, f.dataType.typeName)).toMap
Everything is correct except flatMap, which will convert (flatten) Array[Map[String, String]] to Array[(String, String)]. Replace flatMap with map.
def getSchemaMap(schema: StructType): Array[Map[String, String]] =
  schema.fields.map {
    // case statements
  }
Related
val adjList = Map("Logging" -> List("Networking", "Game"))
// val adjList: Map[String, List[String]] = Map(Logging -> List(Networking, Game))
adjList.flatMap { case (v, vs) => vs.map(n => (v, n)) }.toList
// val res7: List[(String, String)] = List((Logging,Game))
adjList.map { case (v, vs) => vs.map(n => (v, n)) }.flatten.toList
// val res8: List[(String, String)] = List((Logging,Networking), (Logging,Game))
I am not sure what is happening here. I was expecting the same result from both of them.
.flatMap here is Map's .flatMap: because each element is a pair, it rebuilds a Map directly, whereas .map returns a List per entry, so the result is an Iterable and no Map is rebuilt.
In a Map, "Logging" -> "Networking" and "Logging" -> "Game" collapse to just the latter, "Logging" -> "Game", because the keys are the same.
val adjList: Map[String, List[String]] = Map("Logging" -> List("Networking", "Game"))
val x0: Map[String, String] = adjList.flatMap { case (v, vs) => vs.map(n => (v, n)) }
//Map(Logging -> Game)
val x: List[(String, String)] = x0.toList
//List((Logging,Game))
val adjList: Map[String, List[String]] = Map("Logging" -> List("Networking", "Game"))
val y0: immutable.Iterable[List[(String, String)]] = adjList.map { case (v, vs) => vs.map(n => (v, n)) }
//List(List((Logging,Networking), (Logging,Game)))
val y1: immutable.Iterable[(String, String)] = y0.flatten
//List((Logging,Networking), (Logging,Game))
val y: List[(String, String)] = y1.toList
//List((Logging,Networking), (Logging,Game))
Also https://users.scala-lang.org/t/map-flatten-flatmap/4180
I am trying to reverse a map that has a String as the key and a set of numbers as its value.
My goal is to create a list that contains tuples of a number and a list of the strings that had that number in their value set.
I have this so far:
def flipMap(toFlip: Map[String, Set[Int]]): List[(Int, List[String])] = {
  toFlip.flatMap(_._2).map(x => (x, toFlip.keys.toList)).toList
}
but it is only assigning every String to every Int
val map = Map(
  "A" -> Set(1, 2),
  "B" -> Set(2, 3)
)
should produce:
List((1, List(A)), (2, List(A, B)), (3, List(B)))
but is producing:
List((1, List(A, B)), (2, List(A, B)), (3, List(A, B)))
This works too, but it's not exactly what you asked for, and you may need some conversions to get the exact data type you need:
toFlip.foldLeft(Map.empty[Int, Set[String]]) {
  case (acc, (key, numbersSet)) =>
    numbersSet.foldLeft(acc) { (updatingMap, newNumber) =>
      updatingMap.updatedWith(newNumber) {
        case Some(existingSet) => Some(existingSet + key)
        case None => Some(Set(key))
      }
    }
}
I used Set to avoid duplicate key insertions in the inner collection, and used Map instead of the outer List for better lookup.
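A self-contained usage sketch of this foldLeft approach on the example data (updatedWith requires Scala 2.13+):

```scala
val toFlip = Map("A" -> Set(1, 2), "B" -> Set(2, 3))

// Fold every (key, numbers) entry into an inverted Map[Int, Set[String]]
val flipped: Map[Int, Set[String]] =
  toFlip.foldLeft(Map.empty[Int, Set[String]]) {
    case (acc, (key, numbers)) =>
      numbers.foldLeft(acc) { (m, n) =>
        m.updatedWith(n) {
          case Some(keys) => Some(keys + key)
          case None       => Some(Set(key))
        }
      }
  }
// flipped: Map(1 -> Set(A), 2 -> Set(A, B), 3 -> Set(B))
```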
You can do something like this:
def flipMap(toFlip: Map[String, Set[Int]]): List[(Int, List[String])] =
  toFlip
    .toList
    .flatMap {
      case (key, values) =>
        values.map(value => value -> key)
    }
    .groupMap(_._1)(_._2)
    .view
    .mapValues(_.distinct)
    .toList
Note, I personally would return a Map instead of a List
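A self-contained check of this groupMap version against the example data (groupMap is Scala 2.13+):

```scala
def flipMap(toFlip: Map[String, Set[Int]]): List[(Int, List[String])] =
  toFlip
    .toList
    .flatMap { case (key, values) => values.map(value => value -> key) }
    .groupMap(_._1)(_._2)   // group the (number, name) pairs by number
    .view
    .mapValues(_.distinct)  // drop duplicate names per number
    .toList

val result = flipMap(Map("A" -> Set(1, 2), "B" -> Set(2, 3)))
```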
Or if you have cats in scope.
def flipMap(toFlip: Map[String, Set[Int]]): Map[Int, Set[String]] =
  toFlip.view.flatMap {
    case (key, values) =>
      values.map(value => Map(value -> Set(key)))
  }.toList.combineAll
// both scala2 & scala3
scala> map.flatten{ case(k, s) => s.map(v => (k, v)) }.groupMapReduce{ case(k, v) => v }{case(k, v) => List(k)}{ _ ++ _ }
val res0: Map[Int, List[String]] = Map(1 -> List(A), 2 -> List(A, B), 3 -> List(B))
// scala3 only
scala> map.flatten((k, s) => s.map(v => (k, v))).groupMapReduce((k, v) => v)((k, v) => List(k))( _ ++ _ )
val res1: Map[Int, List[String]] = Map(1 -> List(A), 2 -> List(A, B), 3 -> List(B))
I have a Map like this:
val map: Map[String, Any] = Map(
  "Item Version" -> 1.0,
  "Item Creation Time" -> "2019-04-14 14:15:09",
  "Trade Dictionary" -> Map(
    "Country" -> "India",
    "TradeNumber" -> "1",
    "action" -> Map(
      "Action1" -> false
    ),
    "Value" -> "XXXXXXXXXXXXXXX"
  ),
  "Payments" -> Map(
    "Payment Details" -> List(
      Map(
        "Payment Date" -> "2019-04-11",
        "Payment Type" -> "Rej"
      )
    )
  )
)
I have written a piece of code:
def flattenMap(map: Map[String, Any]): Map[String, Any] = {
  val c = map.flatten {
    case ((key, map: Map[String, Any])) => map
    case ((key, value)) => Map(key -> value)
    // case ((key, List(map))) =>
  }.toMap
  return c
}

def secondFlatten(map: Map[String, Any]): Map[String, Any] = {
  val c = map.flatten {
    case ((key, map: Map[String, Any])) => flattenMap(map)
    case ((key, value)) => Map(key -> value)
  }.toMap
  return c
}
Which gives output like this:
(Country, India)
(Action1, false)
(Value, XXXXXXXXXXXXXXX)
(Item Version, 1.0)
(TradeNumber, 1)
(Item Creation Time, 2019-04-14 14:15:09)
(Payment Details, List(Map(Payment Date -> 2019-04-11, Payment Type -> Rej)))
I want to make some code changes so that the list of maps is converted into plain entries like the rest of the output, i.e. instead of (Payment Details, List(Map(Payment Date -> 2019-04-11, Payment Type -> Rej))) I should get:
(Payment Date , 2019-04-11),
(Payment Type , Rej)
You should handle your list of maps. Based on the example you provided, I've updated your method to this:
def secondFlatten(map: Map[String, Any]): Map[String, Any] = {
  map.flatten {
    case ((key, map: Map[String, Any])) =>
      map.flatten {
        case ((key: String, l: List[Map[String, Any]])) => l.head
        case ((key: String, m: Map[String, Any])) => m
        case ((key: String, value: String)) => Map(key -> value)
      }
    case ((key, value)) => Map(key -> value)
  }.toMap
}
This might be useful for your question in the comment. It appends all keys together and keeps the value in the map. You could extend it further should you need a more complex solution:
def secondFlatten(map: Map[String, Any]): Map[String, Any] = {
  map.flatten {
    case ((key, map: Map[String, Any])) =>
      map.flatten {
        case ((innerKey: String, l: List[Map[String, Any]])) =>
          l.head.map { case (x: String, y: Any) => s"$key--$innerKey--$x" -> y }
        case ((innerKey: String, m: Map[String, Any])) =>
          m.map { case (x: String, y: Any) => s"$key--$innerKey--$x" -> y }
        case (innerKey: String, value: String) => Map(s"$key--$innerKey" -> value)
      }
    case ((key, value)) => Map(key -> value)
  }.toMap
}
For your particular case you can update your flattenMap method as follows. Note that the list case must come before the catch-all case, otherwise it is unreachable, and the matched list element needs a type annotation so the branch result types line up:
def flattenMap(map: Map[String, Any]): Map[String, Any] = {
  val c = map.flatten {
    case ((key, map: Map[String, Any])) => map
    case ((key, (m: Map[String, Any]) :: Nil)) => m
    case ((key, value)) => Map(key -> value)
  }.toMap
  return c
}
If you want to handle a list of multiple maps, you should probably wrap your returned values into lists and so use some kind of flatMap instead of toMap.
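Extending that idea, here is a hedged sketch of my own recursive helper (not the asker's code) that descends into nested maps and lists of maps of any depth. Keys are kept unprefixed, so equal names at different levels would overwrite one another; add a prefix as in the answer above if that matters:

```scala
// Sketch: recursively flatten nested Maps and Lists of Maps.
// @unchecked silences the erasure warning on the Map type pattern.
def deepFlatten(map: Map[String, Any]): Map[String, Any] =
  map.flatMap {
    case (_, m: Map[String, Any] @unchecked) => deepFlatten(m)
    case (_, l: List[_]) =>
      // flatten every map inside the list, merging the results
      l.collect { case m: Map[String, Any] @unchecked => deepFlatten(m) }
        .foldLeft(Map.empty[String, Any])(_ ++ _)
    case (key, value) => Map(key -> value)
  }

val nested: Map[String, Any] = Map(
  "Item Version" -> 1.0,
  "Trade Dictionary" -> Map(
    "Country" -> "India",
    "action" -> Map("Action1" -> false)
  ),
  "Payments" -> Map(
    "Payment Details" -> List(
      Map("Payment Date" -> "2019-04-11", "Payment Type" -> "Rej")
    )
  )
)

val flat = deepFlatten(nested)
// Map(Item Version -> 1.0, Country -> India, Action1 -> false,
//     Payment Date -> 2019-04-11, Payment Type -> Rej)
```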
This is my code
import org.apache.spark.{SparkConf, SparkContext}

def main(args: Array[String]): Unit = {
  val conf = new SparkConf().setMaster("local").setAppName("My app")
  val sc = new SparkContext(conf)
  val inputFile = "D:/test.txt"
  val inputData = sc.textFile(inputFile)
  val DupleRawData = inputData.map(_.split("\\<\\>").toList)
    .map(s => (s(8), s(18)))
    .map(s => (s, 1))
    .reduceByKey(_ + _)
  val UserShopCount = DupleRawData.groupBy(s => s._1._1)
    .map(s => (s._1, s._2.toList.sortBy(z => z._2).reverse))
  val ResultSet = UserShopCount.map(s => (s._1, s._2.take(1000).map(z => (z._1._2, z._2))))
  ResultSet.foreach(println)
  // (aaa,List((100,4), (200,4), (300,3), (800,1)))
  // (bbb,List((100,6), (400,5), (500,4)))
  // (ccc,List((300,7), (400,6), (700,3)))
  // this is how far I've gotten
}
and this is the result I'm getting:
(aaa,List((100,4), (200,4), (300,3), (800,1)))
(bbb,List((100,6), (400,5), (500,4)))
(ccc,List((300,7), (400,6), (700,3)))
I want the final ResultSet RDD to be:
// val ResultSet: org.apache.spark.rdd.RDD[(String, List[(String, Int)])]
(aaa, List((200,4), (800,1))) // 100 and 300 dropped: they also appear under bbb and ccc
(bbb, List((500,4))) // 100 and 400 dropped: they also appear under aaa and ccc
(ccc, List((700,3))) // 300 and 400 dropped: they also appear under aaa and bbb
Please give me a solution or some advice. Sincerely.
Here is my attempt:
val data: Seq[(String, List[(Int, Int)])] = Seq(
  ("aaa", List((1,4), (2,4), (3,3), (8,1))),
  ("bbb", List((1,6), (4,5), (5,4))),
  ("ccc", List((3,7), (6,6), (7,3)))
)

// keys that occur exactly once across all lists
val uniqKeys = data.flatMap {
  case (_, v) => v.map(_._1)
}.groupBy(identity).filter(_._2.size == 1)

// keep only the entries whose key is unique to its list
val result = data.map {
  case (pk, v) =>
    val finalValue = v.filter {
      case (k, _) => uniqKeys.contains(k)
    }
    (pk, finalValue)
}
Output:
result: Seq[(String, List[(Int, Int)])] = List((aaa,List((2,4), (8,1))), (bbb,List((4,5), (5,4))), (ccc,List((6,6), (7,3))))
I am assuming your ResultSet is an RDD[(String, List[(Int, Int)])]:
val zeroVal1: (Long, String, (Int, Int)) = (Long.MaxValue, "", (0, 0))
val zeroVal2: List[(String, (Int, Int))] = List()
val yourNeededRdd = ResultSet
.zipWithIndex()
.flatMap({
((key, list), index) => list.map(t => (t._1, (index, key, t)))
})
.aggregateByKey(zeroVal1)(
(t1, t2) => { if (t1._1 <= t2._1) t1 else t2 },
(t1, t2) => { if (t1._1 <= t2._1) t1 else t2 }
)
.map({ case (t_1, (index, key, t)) => (key, t) })
.aggregateByKey(zeroVal2)(
(l, t) => { t :: l },
(l1, l2) => { l1 ++ l2 }
)
I am trying to build a Map[String, Any] like this:
Map(
  somevalues match {
    case Some(v) => "myvalues" -> v
    case None => ???
  },
  othervalues match {
    case Some(v) => "othervalues" -> v
    case None => ???
  },
  // ...etc
)
Which value should I use for the None case, given that I don't want to insert anything into the map in that case?
Consider
List(
  someValues match {
    case Some(v) => Some("myValues" -> v)
    case None => None
  },
  otherValues match {
    case Some(v) => Some("otherValues" -> v)
    case None => None
  },
  ...
).flatten.toMap
or shortened:
List(
  someValues.map("myValues" -> _),
  otherValues.map("otherValues" -> _),
  ...
).flatten.toMap
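A minimal runnable check of this pattern (the value names are stand-ins): None entries simply vanish after flatten, so nothing is inserted for them.

```scala
val someValues: Option[Int] = Some(1)
val otherValues: Option[Int] = None

// Option.map wraps the pair in Some/None; flatten drops the Nones
val built: Map[String, Int] = List(
  someValues.map("myValues" -> _),
  otherValues.map("otherValues" -> _)
).flatten.toMap
// built: Map(myValues -> 1)
```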