I have a map like this:
{
"user":
{
"name": "Jon Doe",
"age": "6",
"birthdate": {
"timestamp": 1456424096
},
"gender": "M"
}
}
and a function like this
def setUser(user: Map[String, Any]): Map[String, Any]={
var usr = Map("name"-> user.get("name").getOrElse(""),
"gender" -> user.get("gender").getOrElse(""),
"age" -> user.get("age").getOrElse(""),
"birthday" -> user.get("birthdate"))
return usr
}
And I want to have the value of "timestamp" (1456424096) mapped in the "birthday" field.
For now I have this: Some({timestamp=1456424096})
I'm very new to this. Can someone help me get the value of "timestamp"?
Assuming you only want to flatten the nested birthday (and keep the nested user), it could look like this:
val oldData: Map[String, Any] = Map(
"user" -> Map(
"name" -> "John Doe",
"age" -> 6,
"birthday" -> Map("timestamp" -> 1234454666),
"gender" -> "M"
)
)
def flattenBirthday(userMap: Map[String, Any]) = Map(
"user" -> (userMap("user").asInstanceOf[Map[String, Any]] + (
"birthday" -> userMap("user").asInstanceOf[Map[String, Any]]("birthday").asInstanceOf[Map[String, Any]]("timestamp")
))
)
val newData = flattenBirthday(oldData)
But in general, dealing with nested immutable maps gets ugly. If you extract this data from JSON (as in your example), it is better to use a library to deserialize it into case class objects.
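If you want to stay with plain maps and avoid the unchecked asInstanceOf chain, a pattern match can pull the timestamp out safely. This is only a minimal sketch: the helper name birthdayTimestamp is hypothetical, and it assumes the nested "birthdate" value is a Scala Map (if it is really a java.util.Map, the pattern would have to match that type instead).

```scala
object FlattenUser {
  // Hypothetical helper: returns the nested "timestamp" if "birthdate" is a Scala Map.
  def birthdayTimestamp(user: Map[String, Any]): Option[Any] =
    user.get("birthdate").collect {
      case m: Map[_, _] => m.asInstanceOf[Map[String, Any]].get("timestamp")
    }.flatten

  // Same shape as the question's setUser, but "birthday" holds the bare timestamp.
  def setUser(user: Map[String, Any]): Map[String, Any] = Map(
    "name"     -> user.getOrElse("name", ""),
    "gender"   -> user.getOrElse("gender", ""),
    "age"      -> user.getOrElse("age", ""),
    "birthday" -> birthdayTimestamp(user).getOrElse("")
  )
}
```

Missing or differently typed "birthdate" values fall back to an empty string here, mirroring the getOrElse("") defaults in the question.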
Input:
id1 id2 name value epid
"xxx" "yyy" "EAN" "5057723043" "1299"
"xxx" "yyy" "MPN" "EVBD" "1299"
I want:
{ "id1": "xxx",
"id2": "yyy",
"item_specifics": [
{
"name": "EAN",
"value": "5057723043"
},
{
"name": "MPN",
"value": "EVBD"
},
{
"name": "EPID",
"value": "1299"
}
]
}
I tried the following two solutions from How to aggregate columns into json array? and how to merge rows into column of spark dataframe as vaild json to write it in mysql:
pi_df.groupBy(col("id1"), col("id2"))
//.agg(collect_list(to_json(struct(col("name"), col("value"))).alias("item_specifics"))) // => not working
.agg(collect_list(struct(col("name"),col("value"))).alias("item_specifics"))
But I got:
{ "name":"EAN","value":"5057723043", "EPID": "1299", "id1": "xxx", "id2": "yyy" }
How to fix this? Thanks
For Spark < 2.4
You can create two dataframes, one with name and value, and the other with the literal "EPID" as name and the epid column as value, then union them together. After that, aggregate them with collect_set and create the JSON. The code should look like this.
//Creating Test Data
val df = Seq(("xxx","yyy" ,"EAN" ,"5057723043","1299"), ("xxx","yyy" ,"MPN" ,"EVBD", "1299") )
.toDF("id1", "id2", "name", "value", "epid")
df.show(false)
+---+---+----+----------+----+
|id1|id2|name|value |epid|
+---+---+----+----------+----+
|xxx|yyy|EAN |5057723043|1299|
|xxx|yyy|MPN |EVBD |1299|
+---+---+----+----------+----+
val df1 = df.withColumn("map", struct(col("name"), col("value")))
.select("id1", "id2", "map")
val df2 = df.withColumn("map", struct(lit("EPID").as("name"), col("epid").as("value")))
.select("id1", "id2", "map")
val jsonDF = df1.union(df2).groupBy("id1", "id2")
.agg(collect_set("map").as("item_specifics"))
.withColumn("json", to_json(struct("id1", "id2", "item_specifics")))
jsonDF.select("json").show(false)
+---------------------------------------------------------------------------------------------------------------------------------------------+
|json |
+---------------------------------------------------------------------------------------------------------------------------------------------+
|{"id1":"xxx","id2":"yyy","item_specifics":[{"name":"MPN","value":"EVBD"},{"name":"EAN","value":"5057723043"},{"name":"EPID","value":"1299"}]}|
+---------------------------------------------------------------------------------------------------------------------------------------------+
For Spark >= 2.4
Spark 2.4 adds an array_union function, which might let you do this without the DataFrame union. I haven't tried it though.
val jsonDF = df.withColumn("map1", struct(col("name"), col("value")))
.withColumn("map2", struct(lit("EPID").as("name"), col("epid").as("value")))
.groupBy("id1", "id2")
.agg(collect_set("map1").as("item_specifics1"),
collect_set("map2").as("item_specifics2"))
.withColumn("item_specifics", array_union(col("item_specifics1"), col("item_specifics2")))
.withColumn("json", to_json(struct("id1", "id2", "item_specifics")))
You're pretty close. I believe you're looking for something like this:
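// Build the EPID rows directly; renaming epid to "value" would clash with the existing "value" column
val pi_df2 = pi_df.
select(col("id1"), col("id2"), lit("EPID").as("name"), col("epid").as("value")).
distinct // one EPID row per (id1, id2, epid) rather than one per input row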
pi_df.select("id1", "id2", "name","value").
union(pi_df2).withColumn("item_specific", struct(col("name"), col("value"))).
groupBy(col("id1"), col("id2")).
agg(collect_list(col("item_specific")).alias("item_specifics")).
write.json(...)
The union should bring epid back into item_specifics.
Here is what you need to do
import scala.util.parsing.json.JSONObject
import scala.collection.mutable.WrappedArray
//Define udf
val jsonFun = udf((id1 : String, id2 : String, item_specifics: WrappedArray[Map[String, String]], epid: String)=> {
//Add epid to item_specifics json
val item_withEPID = item_specifics :+ Map("epid" -> epid)
val item_specificsArray = item_withEPID.map(m => ( Array(Map("name" -> m.keys.toSeq(0), "value" -> m.values.toSeq(0))))).map(m => m.map( mi => JSONObject(mi).toString().replace("\\",""))).flatten.mkString("[",",","]")
//Add id1 and id2 to output json
val m = Map("id1"-> id1, "id2"-> id2, "item_specifics" -> item_specificsArray.toSeq )
JSONObject(m).toString().replace("\\","")
})
val pi_df = Seq( ("xxx","yyy","EAN","5057723043","1299"), ("xxx","yyy","MPN","EVBD","1299")).toDF("id1","id2","name","value","epid")
//Add epid as part of group by column else the column will not be available after group by and aggregation
val df = pi_df.groupBy(col("id1"), col("id2"), col("epid")).agg(collect_list(map(col("name"), col("value")) as "map").as("item_specifics")).withColumn("item_specifics",jsonFun($"id1",$"id2",$"item_specifics",$"epid"))
df.show(false)
+---+---+----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|id1|id2|epid|item_specifics |
+---+---+----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|xxx|yyy|1299|{"id1" : "xxx", "id2" : "yyy", "item_specifics" : [{"name" : "MPN", "value" : "EVBD"},{"name" : "EAN", "value" : "5057723043"},{"name" : "epid", "value" : "1299"}]}|
+---+---+----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Content of the item_specifics column (the output):
{
"id1": "xxx",
"id2": "yyy",
"item_specifics": [{
"name": "MPN",
"value": "EVBD"
}, {
"name": "EAN",
"value": "5057723043"
}, {
"name": "epid",
"value": "1299"
}]
}
I have a MongoDB collection whose documents look like this:
{
"name" : "fabio",
"items" : [
{
"id" : "1",
"word" : "xxxx"
},
{
"id" : "2",
"word" : "yyyy"
}
]
}
Now, given one name and one id, I want to retrieve "name" and the corresponding "word".
I query it like this and it seems to work:
val query = BSONDocument("name" -> name, "items.id" -> id)
But then, how do I access the value of "word"? I can get the name, and my reader for this object looks like this:
implicit object UserReader extends BSONDocumentReader[User] {
def read(doc: BSONDocument): User = {
val name = doc.getAs[String]("name").get
// how do I retrieve the value of "word"?
User(id, word)
}
}
But I am very confused about "word".
Additionally, because I am only interested in two fields, how should I filter the query? The following doesn't seem to work.
val filter = BSONDocument("name" -> 1, "items.$.word" -> 1)
Thanks for your help!
I'm using Play framework 2.2.2.
I'm trying to handle json request like this one
[
{
"id" : "123",
"language" : "en",
"text" : "This is an example of a text",
"Metadata_IP" : "192.168.20.34",
"Metadata_date" : "2001-07-04T12:08:56.235-0700"
},
{
"id" : "124",
"language" : "en",
"text" : "Some more text here",
"Metadata_IP" : "192.168.20.31",
"Metadata_date" : "2001-07-04T12:09:56.235-0700",
"Metadata_name" : "someone"
}
]
The Metadata_ fields are dynamic, meaning the user can send whatever they want (e.g. Metadata_color, etc.).
What is the best way to handle this?
Can I use Reads to deserialize it into a case class? How can I do this? I guess the dynamic fields will be a Map[String, String], but how should I make the reader parse this?
Thanks
Something like this could work:
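import play.api.libs.json._

// Collects every string-valued field of a JSON object as a (key, value) pair
implicit object jsObjToKeyValueSeq extends Reads[Seq[(String, String)]] {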
override def reads(json: JsValue) = json match {
case js: JsObject =>
JsSuccess(js.fields.collect { case (key, JsString(value)) => key -> value })
case x => JsError(s"Unexpected json: $x")
}
}
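To go one step further and land in a case class, the fixed fields can be read by name and the Metadata_ fields gathered into a map. The following is a sketch rather than a definitive implementation: the Doc case class and its field names are assumptions made for illustration, and it relies only on the Play JSON Reads, JsResult and JsObject.fields APIs.

```scala
import play.api.libs.json._

// Hypothetical model: fixed fields plus a map for the dynamic Metadata_* fields.
final case class Doc(id: String, language: String, text: String, metadata: Map[String, String])

object Doc {
  implicit val docReads: Reads[Doc] = new Reads[Doc] {
    def reads(json: JsValue): JsResult[Doc] =
      for {
        obj      <- json.validate[JsObject]
        id       <- (obj \ "id").validate[String]
        language <- (obj \ "language").validate[String]
        text     <- (obj \ "text").validate[String]
      } yield {
        // Keep only string-valued fields whose key starts with "Metadata_".
        val metadata = obj.fields.collect {
          case (key, JsString(value)) if key.startsWith("Metadata_") => key -> value
        }.toMap
        Doc(id, language, text, metadata)
      }
  }
}
```

With that in scope, the whole request body could be validated with something like Json.parse(body).validate[Seq[Doc]], where body is the raw JSON string.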
We faced exactly this problem and solved it with a custom implementation. The solution is detailed here
Example:
Scala class
case class Person(name: String, age: String, customFields: Map[String,String])
The default JSON representation of the above class will be:
{
"name": "anil",
"age": "30",
"customFields": {
"field1": "value1",
"field2": "value2"
}
}
But what we wanted was:
{
"name": "anil",
"age": "30",
"field1": "value1",
"field2": "value2"
}
This was not very straightforward. While it may be possible with Play Framework alone, we didn't want to complicate things too much. We finally found a way to do it: use reflection to return a Map[String, Any] that represents each class (its fields and values), and handle the custom fields separately.
case class Person(name: String, age: String, customFields:CustomFields)
case class CustomFields(valueMap: Map[String,String])
def covertToMap(ref: AnyRef) =
ref.getClass.getDeclaredFields.foldLeft(Map[String, Any]()){
(map, field) => {
field.setAccessible(true)
val value = field.get(ref)
value match {
case c: CustomFields => {
map ++ c.valueMap
}
case _ => {
map + (field.getName -> value)
}
}
}
}
Use covertToMap() to convert any case class to a Map, then serialize that map to plain JSON with json4s (Jackson backend).
val json = Serialization.write(covertToMap(person))
Complete source code is available here
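If you are on Scala 2.13 or later, the same flattening can be sketched without Java reflection, using productElementNames and productIterator, which every case class provides. This is a swapped-in alternative and not the approach from the linked source; the name toFlatMap is hypothetical.

```scala
object FlattenCaseClass {
  final case class CustomFields(valueMap: Map[String, String])
  final case class Person(name: String, age: String, customFields: CustomFields)

  // Flattens a case class into a Map, inlining the entries of any CustomFields member.
  def toFlatMap(p: Product): Map[String, Any] =
    p.productElementNames.zip(p.productIterator).foldLeft(Map.empty[String, Any]) {
      case (acc, (_, c: CustomFields)) => acc ++ c.valueMap
      case (acc, (name, value))        => acc + (name -> value)
    }
}
```

The resulting map can then be handed to the same JSON serializer as before. Note that a custom field whose key equals a regular field name would overwrite it, just as in the reflection-based version.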
Here's what I want to achieve:
{ "user-list" : {
"user" : [
"username" : "foo"
},
{
"username" : "bar"
}
]
}
}
I'm using Play Framework and Scala.
Thanks!
As previous commenters already pointed out, it is not obvious how to help you, given that your json code is invalid (try JSONLint) and that we don't know where it comes from (string? (case) classes from a database? literals?) and what you want to do with it.
Valid json code close to yours would be:
{
"user-list": {
"user": [
{ "username": "foo" },
{ "username": "bar" }
]
}
}
Depending on how much additional information your structure contains, the following might be sufficient (V1):
{
"user-list": [
{ "username": "foo" },
{ "username": "bar" }
]
}
Or even (V2):
{ "user-list": ["foo", "bar"] }
Following the Play documentation, you should be able to generate V1 with:
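import play.api.libs.json.Json
import play.api.libs.json.Json.toJson // needed for the unqualified toJson calls

val jsonObject = Json.toJson(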
Map(
"user-list" -> Seq(
toJson(Map("username" -> toJson("foo"))),
toJson(Map("username" -> toJson("bar")))
)
)
)
and V2 with:
val jsonObject = Json.toJson(
Map(
"user-list" -> Seq(toJson("foo"), toJson("bar"))
)
)
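If you do need the fully nested structure from the corrected JSON above (with the intermediate "user" key), Json.obj and Json.arr tend to read more directly than nested toJson calls. A minimal sketch, assuming play-json is on the classpath; the object name UserListJson is only for illustration.

```scala
import play.api.libs.json._

object UserListJson {
  // Builds { "user-list": { "user": [ { "username": ... }, ... ] } } from plain names.
  def build(usernames: Seq[String]): JsObject =
    Json.obj(
      "user-list" -> Json.obj(
        "user" -> JsArray(usernames.map(name => Json.obj("username" -> name)))
      )
    )
}
```

Calling UserListJson.build(Seq("foo", "bar")) gives a JsObject that can be rendered with Json.stringify or Json.prettyPrint.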