How to serialize/deserialize dynamic field names using Play's json - scala

I'm using Play framework 2.2.2.
I'm trying to handle a JSON request like this one:
[
  {
    "id" : "123",
    "language" : "en",
    "text" : "This is an example of a text",
    "Metadata_IP" : "192.168.20.34",
    "Metadata_date" : "2001-07-04T12:08:56.235-0700"
  },
  {
    "id" : "124",
    "language" : "en",
    "text" : "Some more text here",
    "Metadata_IP" : "192.168.20.31",
    "Metadata_date" : "2001-07-04T12:09:56.235-0700",
    "Metadata_name" : "someone"
  }
]
The Metadata_* fields are dynamic, meaning the user can send whatever he wants (e.g. Metadata_color, etc.).
What is the best way to handle this?
Can I use Reads to deserialize it into a case class? How can I do this? I guess the dynamic fields will end up in a Map[String, String], but how should I make the Reads parse this?
Thanks

Something like this could work:
implicit object jsObjToKeyValueSeq extends Reads[Seq[(String, String)]] {
  override def reads(json: JsValue) = json match {
    case js: JsObject =>
      // keep every top-level field whose value is a JSON string, as a key/value pair
      JsSuccess(js.fields.collect { case (key, JsString(value)) => key -> value })
    case x => JsError(s"Unexpected json: $x")
  }
}
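If the goal is a case class whose dynamic Metadata_* fields end up in a Map[String, String], a sketch along these lines could be combined with it (the Entry case class and the Metadata_ prefix filter are assumptions, not from the question):
import play.api.libs.json._
import play.api.libs.functional.syntax._

// hypothetical target type: fixed fields plus a map of the dynamic Metadata_* values
case class Entry(id: String, language: String, text: String, metadata: Map[String, String])

implicit val entryReads: Reads[Entry] = (
  (__ \ "id").read[String] and
  (__ \ "language").read[String] and
  (__ \ "text").read[String] and
  __.read[JsObject].map { obj =>
    // collect every top-level string field whose name starts with "Metadata_"
    obj.fields.collect {
      case (key, JsString(value)) if key.startsWith("Metadata_") => key -> value
    }.toMap
  }
)(Entry.apply _)

// the request body is a JSON array, so it can be validated as a Seq[Entry]:
// request.body.validate[Seq[Entry]]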

We faced the exact same problem and solved it with a custom implementation. The solution is detailed here.
Example:
Scala class
case class Person(name: String, age: String, customFields: Map[String,String])
The default JSON representation of the above class will be:
{
  "name": "anil",
  "age": "30",
  "customFields": {
    "field1": "value1",
    "field2": "value2"
  }
}
But what we wanted was:
{
  "name": "anil",
  "age": "30",
  "field1": "value1",
  "field2": "value2"
}
This was not very straightforward. While it may be possible with the Play framework, we didn't want to complicate things too much. Finally we found a way to do it: convert each class (its fields and values) to a Map[String, String] using reflection, and handle the custom fields separately.
case class Person(name: String, age: String, customFields:CustomFields)
case class CustomFields(valueMap: Map[String,String])
def covertToMap(ref: AnyRef) =
  ref.getClass.getDeclaredFields.foldLeft(Map[String, Any]()) { (map, field) =>
    field.setAccessible(true)
    val value = field.get(ref)
    value match {
      // flatten the custom fields into the top-level map
      case c: CustomFields => map ++ c.valueMap
      // regular fields keep their declared name
      case _ => map + (field.getName -> value)
    }
  }
Use covertToMap() to convert any case class to a Map, and then serialize this map to plain JSON using json4s with the Jackson backend.
val json = Serialization.write(covertToMap(person))
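For completeness, Serialization.write needs an implicit Formats in scope; a minimal json4s (Jackson backend) setup might look like this (the sample person value is illustrative):
import org.json4s.DefaultFormats
import org.json4s.jackson.Serialization

implicit val formats: DefaultFormats.type = DefaultFormats

val person = Person("anil", "30", CustomFields(Map("field1" -> "value1", "field2" -> "value2")))
val json = Serialization.write(covertToMap(person))
// expected output (field order may vary): {"name":"anil","age":"30","field1":"value1","field2":"value2"}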
Complete source code is available here

Related

How to use swagger-akka-http annotations for classes containing scala traits?

The naive approach of using swagger-akka-http to annotate a case class containing traits would be
@Schema(description = "identifier of data value")
trait Identifier {
  val id: String
}

@Schema(description = "value")
trait Value {
  val value: Int
}

@Schema(description = "combine identifier and value")
trait Event extends Identifier with Value

@Schema(description = "response to data query")
case class Response(event: Event)
This produces
"Response" : {
"required" : [ "event" ],
"type" : "object",
"properties" : {
"event" : {
"$ref" : "#/components/schemas/Event"
}
},
"description" : "response to data query"
},
"Event" : {
"type" : "object",
"description" : "combine identifier and value"
}
Unfortunately, the Event schema does not contain any information. Is there a way to annotate such a structure successfully?
Minimal example: swagger-akka-http-annotate-traits-test

Parse nested maps in a function (Gatling)

I have a map like this:
{
"user":
{
"name": "Jon Doe",
"age": "6",
"birthdate": {
"timestamp": 1456424096
},
"gender": "M"
}
}
and a function like this
def setUser(user: Map[String, Any]): Map[String, Any] = {
  var usr = Map(
    "name" -> user.get("name").getOrElse(""),
    "gender" -> user.get("gender").getOrElse(""),
    "age" -> user.get("age").getOrElse(""),
    "birthday" -> user.get("birthdate")) // was `patient.get("birthdate")`, which does not compile here
  return usr
}
And I want the value of "timestamp" (1456424096) mapped to the "birthday" field.
For now I get this (URL-encoded): Some%28%7Btimestamp%3D1456424096%7D%29, i.e. Some({timestamp=1456424096}).
I'm very new to this. Can someone help me get the value of "timestamp"?
Assuming that you just want to get rid of the nested birthday (not the nested user), it can look like this:
val oldData: Map[String, Any] = Map(
  "user" -> Map(
    "name" -> "John Doe",
    "age" -> 6,
    "birthday" -> Map("timestamp" -> 1234454666),
    "gender" -> "M"
  )
)

def flattenBirthday(userMap: Map[String, Any]) = Map(
  "user" -> (userMap("user").asInstanceOf[Map[String, Any]] + (
    "birthday" -> userMap("user").asInstanceOf[Map[String, Any]]("birthday").asInstanceOf[Map[String, Any]]("timestamp")
  ))
)

val newData = flattenBirthday(oldData)
But in general, dealing with nested immutable maps will be ugly. If you extract that data from JSON (like in your example), it is better to use a library to deserialize it into case class objects, for example along the lines of the sketch below.
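As a rough illustration of that last point, a hedged sketch with Play JSON (the library choice and the case class names are assumptions) could look like:
import play.api.libs.json._

case class Birthdate(timestamp: Long)
case class User(name: String, age: String, birthdate: Birthdate, gender: String)

implicit val birthdateReads: Reads[Birthdate] = Json.reads[Birthdate]
implicit val userReads: Reads[User] = Json.reads[User]

val json = Json.parse(
  """{"user":{"name":"Jon Doe","age":"6","birthdate":{"timestamp":1456424096},"gender":"M"}}""")

val user = (json \ "user").as[User]
val birthday = user.birthdate.timestamp // 1456424096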

Represent nested parameters for Neo4j query in Scala

I tried to run Neo4j queries with parameters in the form of a Map[String, AnyRef], which works just fine. However, I would like to send the data to Neo4j in batches, so the parameter would be a Map[String, Map[String, AnyRef]], or a Map[String, AnyRef] if the data is converted. Overall, I would like to send the data in this shape:
{
  "nodes": [
    {
      "id": 193331567,
      "lat": 40.7599983215332,
      "lon": -73.98999786376953
    },
    {
      "id": 173062762,
      "lat": 41.959999084472656,
      "lon": -87.66000366210938
    },
    {
      "id": 66276172,
      "lat": 40.72999954223633,
      "lon": -74.01000213623047
    }
  ]
}
I wrote it in Scala using nested maps; however, when I pass these nested maps as a parameter to the query, they cannot be rendered by Neo4j. So how can I represent this nested JSON structure in Scala? Should I use an object instead, or something like that?
Here is the map I set up:
val paramsList = Map("nodes" -> {
  data map { seq =>
    Map(
      "lat" -> seq(1).toDouble.asInstanceOf[AnyRef],
      "lon" -> seq(2).toDouble.asInstanceOf[AnyRef],
      "id" -> seq(0).toInt.asInstanceOf[AnyRef]
    )
  }
}.asInstanceOf[AnyRef])

val queryResults = neo4jSession.run(neo4jQuery, paramsList.asJava)
It was important to convert both maps to java.util.Map so Neo4j would be able to pass this data as a parameter.
import scala.collection.JavaConverters._

val paramsList = data map { seq =>
  Map(
    "lat" -> seq(1).toDouble.asInstanceOf[AnyRef],
    "lon" -> seq(2).toDouble.asInstanceOf[AnyRef],
    "id" -> seq(0).toInt.asInstanceOf[AnyRef]
  ).asJava.asInstanceOf[AnyRef]
}

val queryResults = neo4jSession.run(
  neo4jQueries.searchQueryWithBatchParams,
  Map("nodes" -> paramsList.asJava.asInstanceOf[AnyRef]).asJava
)
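The Cypher query itself is not shown in the question; for illustration only, a batch parameter shaped like this is typically consumed with UNWIND, along these lines (the label and property names are made up):
// hypothetical query consuming the "nodes" batch parameter via UNWIND
val neo4jQuery =
  """UNWIND $nodes AS node
    |MERGE (p:Point {id: node.id})
    |SET p.lat = node.lat, p.lon = node.lon
    |RETURN count(p)
    |""".stripMargin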

How to convert column to vector type?

I have an RDD in Spark where the objects are based on a case class:
ExampleCaseClass(user: User, stuff: Stuff)
I want to use Spark's ML pipeline, so I convert this to a Spark data frame. As part of the pipeline, I want to transform one of the columns into a column whose entries are vectors. Since I want the length of that vector to vary with the model, it should be built into the pipeline as part of the feature transformation.
So I attempted to define a Transformer as follows:
class MyTransformer extends Transformer {
  val uid = ""
  val num: IntParam = new IntParam(this, "", "")

  def setNum(value: Int): this.type = set(num, value)
  setDefault(num -> 50)

  def transform(df: DataFrame): DataFrame = {
    ...
  }

  def transformSchema(schema: StructType): StructType = {
    val inputFields = schema.fields
    StructType(inputFields :+ StructField("colName", ???, true))
  }

  def copy(extra: ParamMap): Transformer = defaultCopy(extra)
}
How do I specify the DataType of the resulting field (i.e. fill in the ???)? It will be a Vector of some simple type (Boolean, Int, Double, etc.). It seems VectorUDT might have worked, but that's private to Spark. Since any RDD can be converted to a DataFrame, any case class can be converted to a custom DataType. However, I can't figure out how to do this conversion manually; otherwise I could apply it to some simple case class wrapping the vector.
Furthermore, if I specify a vector type for the column, will VectorAssembler correctly process the vector into separate features when I go to fit the model?
Still new to Spark and especially to the ML Pipeline, so appreciate any advice.
import org.apache.spark.ml.linalg.SQLDataTypes.VectorType

def transformSchema(schema: StructType): StructType = {
  val inputFields = schema.fields
  StructType(inputFields :+ StructField("colName", VectorType, true))
}
In Spark 2.1, VectorType makes VectorUDT publicly available:
package org.apache.spark.ml.linalg

import org.apache.spark.annotation.{DeveloperApi, Since}
import org.apache.spark.sql.types.DataType

/**
 * :: DeveloperApi ::
 * SQL data types for vectors and matrices.
 */
@Since("2.0.0")
@DeveloperApi
object SQLDataTypes {

  /** Data type for [[Vector]]. */
  val VectorType: DataType = new VectorUDT

  /** Data type for [[Matrix]]. */
  val MatrixType: DataType = new MatrixUDT
}
import org.apache.spark.mllib.linalg.{Vector, Vectors}

case class MyVector(vector: Vector)

val vectorDF = Seq(
  MyVector(Vectors.dense(1.0, 3.4, 4.4)),
  MyVector(Vectors.dense(5.5, 6.7))
).toDF
vectorDF.printSchema
root
|-- vector: vector (nullable = true)
println(vectorDF.schema.fields(0).dataType.prettyJson)
{
  "type" : "udt",
  "class" : "org.apache.spark.mllib.linalg.VectorUDT",
  "pyClass" : "pyspark.mllib.linalg.VectorUDT",
  "sqlType" : {
    "type" : "struct",
    "fields" : [ {
      "name" : "type",
      "type" : "byte",
      "nullable" : false,
      "metadata" : { }
    }, {
      "name" : "size",
      "type" : "integer",
      "nullable" : true,
      "metadata" : { }
    }, {
      "name" : "indices",
      "type" : {
        "type" : "array",
        "elementType" : "integer",
        "containsNull" : false
      },
      "nullable" : true,
      "metadata" : { }
    }, {
      "name" : "values",
      "type" : {
        "type" : "array",
        "elementType" : "double",
        "containsNull" : false
      },
      "nullable" : true,
      "metadata" : { }
    } ]
  }
}

Array query in Scala and ReactiveMongo?

I have a MongoDB collection whose documents look like this:
{
  "name" : "fabio",
  "items" : [
    {
      "id" : "1",
      "word" : "xxxx"
    },
    {
      "id" : "2",
      "word" : "yyyy"
    }
  ]
}
Now, given one name and one id, I want to retrieve "name" and the corresponding "word".
I query it like this and it seems to work:
val query = BSONDocument("name" -> name, "items.id" -> id)
But then, how do I access the value of "word"? I can get the name using the reader for this object, which looks like this:
implicit object UserReader extends BSONDocumentReader[User] {
  def read(doc: BSONDocument): User = {
    val name = doc.getAs[String]("name").get
    // how do I retrieve the value of "word"?
    User(id, word)
  }
}
But I am very confused about "word".
Additionally, because I am only interested in two fields, how should I filter the query? The following doesn't seem to work.
val filter = BSONDocument("name" -> 1, "items.$.word" -> 1)
Thanks for your help!
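Not a definitive answer, but one way to get at "word" inside the reader, sketched against the pre-1.0 ReactiveMongo BSON API and assuming the queried id is available in scope (e.g. by building the reader from a function that takes it), would be:
// sketch only: assumes `id` is in scope and that each item document has "id" and "word" fields
implicit object UserReader extends BSONDocumentReader[User] {
  def read(doc: BSONDocument): User = {
    val word = doc.getAs[List[BSONDocument]]("items").getOrElse(Nil)
      .find(_.getAs[String]("id").exists(_ == id)) // keep only the item matching the queried id
      .flatMap(_.getAs[String]("word"))
      .getOrElse("")
    User(id, word) // same constructor call as in the question
  }
}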