Scala generics usage for methods + json4s parsing

I am not sure if this is achievable, and I have only a basic understanding of how generics work in Scala, but I was wondering if the following is possible.
Say I have a method:
case class Person(id: String, name: String)
case class Student(id: String, name: String, `class`: String)
def convertToJson[A](fileName: String): A = {
  // read the file, then parse its contents as an A (pseudocode)
  parse[A]
}
Is it possible to write this generic method so that it parses the JSON according to the type I pass when I call convertToJson?
Something like:
convertToJson[Student](fileName)
convertToJson[Person](fileName)
By the way, the above code gives me a No Manifest available for A error.
I am using json4s for parsing.
Any help is appreciated.

This will convert a JSON string to a case class:
import org.json4s._
import org.json4s.jackson.JsonMethods._
def convertToJson[T](json: String)(implicit fmt: Formats = DefaultFormats, mf: Manifest[T]): T =
  Extraction.extract(parse(json))
Once this is defined you can parse appropriate strings to the required type:
case class Person(id: String, name: String)
case class Student(id: String, name: String, `class`: String)
val person = convertToJson[Person]("""{"name":"Jane","id":"45"}""")
val student = convertToJson[Student]("""{"name":"John","id":"63","class":"101"}""")
Note that this will ignore JSON data that does not match fields in the case class. If a field is optional in the JSON, make it an Option in the case class and you will get None if the field is not there.
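For example, here is a minimal sketch of the Option behaviour, using a hypothetical Contact class with the convertToJson method defined above:
case class Contact(id: String, email: Option[String])
val withEmail = convertToJson[Contact]("""{"id":"1","email":"a@b.com"}""") // Contact(1,Some(a@b.com))
val noEmail = convertToJson[Contact]("""{"id":"2"}""") // Contact(2,None)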

How to ignore a field from serializing when using circe in scala

I am using circe in Scala and have the following requirement:
Let's say I have a class like the one below, and I want to prevent the password field from being serialised. Is there any way to tell circe that it should not serialise the password field?
Other libraries have annotations like @transient that prevent a field from being serialised; is there any such annotation in circe?
case class Employee(
name: String,
password: String)
You could make a custom encoder that redacts some fields:
import io.circe.{Encoder, Json}
implicit val encodeEmployee: Encoder[Employee] = new Encoder[Employee] {
  final def apply(a: Employee): Json = Json.obj(
    ("name", Json.fromString(a.name)),
    ("password", Json.fromString("[REDACTED]"))
  )
}
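With that encoder in scope, encoding an Employee masks the password (a quick sketch with made-up values, assuming the io.circe.syntax._ import for asJson):
import io.circe.syntax._
Employee("Jane", "hunter2").asJson.noSpaces
// {"name":"Jane","password":"[REDACTED]"}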
LATER UPDATE
To avoid listing every field, you can contramap a semiauto/auto-derived encoder:
import io.circe.Encoder
import io.circe.generic.semiauto._
implicit val encodeEmployee: Encoder[Employee] =
  deriveEncoder[Employee]
    .contramap[Employee](unredacted => unredacted.copy(password = "[REDACTED]"))
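Encoding Employee("Jane", "hunter2") then yields {"name":"Jane","password":"[REDACTED]"}: the derived encoder still emits both fields, but contramap replaces the real password before encoding.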
Although @gatear's answer is useful, it doesn't actually answer the question.
Unfortunately, circe (at least up to version 0.14.2) has no annotation to ignore fields. So far there is only a single annotation (@JsonKey), and it is used to rename field names.
In order to ignore a field when serialising (which Circe calls encoding) you can avoid that field in the Encoder implementation.
So instead of including the password field:
implicit val employeeEncoder: Encoder[Employee] =
  Encoder.forProduct2("name", "password")(employee => (employee.name, employee.password))
you omit it:
implicit val employeeEncoder: Encoder[Employee] =
  Encoder.forProduct1("name")(employee => employee.name)
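With this encoder the password key is absent from the output rather than redacted: for example, Employee("Jane", "hunter2").asJson.noSpaces would yield {"name":"Jane"} (example values made up).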
Alternatively what I've been using is creating a smaller case class which only includes the fields I'm interested in. Then I let Circe's automatic derivation kick in with io.circe.generic.auto._:
import io.circe.generic.auto._
import io.circe.syntax._
case class EmployeeToEncode(name: String)
// Then given an employee object:
EmployeeToEncode(employee.name).asJson
Another answer suggests deriving the encoder and removing the field from the encoded JSON object:
deriveEncoder.mapJsonObject(_.remove("password"))
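Spelled out as a complete definition, that approach looks like this (a sketch assuming circe-generic's semiauto derivation):
import io.circe.Encoder
import io.circe.generic.semiauto._
implicit val employeeEncoder: Encoder.AsObject[Employee] =
  deriveEncoder[Employee].mapJsonObject(_.remove("password"))
// Employee("Jane", "hunter2").asJson then yields {"name":"Jane"}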

Converting Scala case class to PySpark schema

Given a simple Scala case class like this:
package com.foo.storage.schema
case class Person(name: String, age: Int)
it's possible to create a Spark schema from a case class as follows:
import org.apache.spark.sql._
import com.foo.storage.schema.Person
val schema = Encoders.product[Person].schema
I wonder if it's possible to access the schema from a case class in Python/PySpark. I would hope to do something like this [Python]:
jvm = sc._jvm
py4j_class = jvm.com.foo.storage.schema.Person
jvm.org.apache.spark.sql.Encoders.product(py4j_class)
This throws the error com.foo.storage.schema.Person._get_object_id does not exist in the JVM. Encoders.product is generic in Scala, and I'm not entirely sure how to specify the type parameter via Py4J. Is there a way to use the case class to create a PySpark schema?
I've found there's no clean or easy way to do this with generics, nor as a pure Scala function. What I ended up doing is adding a companion object to the case class that can fetch the schema.
Solution
package com.foo.storage.schema
import org.apache.spark.sql.Encoders
case class Person(name: String, age: Int)
object Person {
  def getSchema = Encoders.product[Person].schema
}
This function can be called from Py4J, but will return a JavaObject. It can be converted with a helper function like this:
from pyspark.sql.types import StructType
import json
def java_schema_to_python(j_schema):
    json_schema = json.loads(j_schema.json())
    return StructType.fromJson(json_schema)
Finally, we can extract our schema:
j_schema = jvm.com.foo.storage.schema.Person.getSchema()
java_schema_to_python(j_schema)
Alternative solution
I found there is one more way to do this, but I like the first one better. You can make a generic function that infers the type of its argument in Scala and uses that to build the schema:
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.types.StructType
import scala.reflect.runtime.universe.TypeTag
object SchemaConverter {
  def getSchemaFromType[T <: Product: TypeTag](obj: T): StructType =
    Encoders.product[T].schema
}
Which can be called like this:
val schema = SchemaConverter.getSchemaFromType(Person("Joe", 42))
I didn't like this method since it requires creating a dummy instance of the case class. I haven't tested it, but I think the function above could be called via Py4J too.
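If the dummy instance is the main objection, a variant that takes only a type parameter avoids it on the Scala side (a sketch under the same imports as above; getSchemaFor is a hypothetical name, and note this still cannot be invoked with a type parameter from Py4J):
object SchemaConverter {
  // Hypothetical variant: type parameter only, no value argument needed
  def getSchemaFor[T <: Product: TypeTag]: StructType =
    Encoders.product[T].schema
}
val schema = SchemaConverter.getSchemaFor[Person]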

Json4s custom Serializer doesn't work due to type mismatch

I have a case class containing a password field. For safety I need to mask it when converting to JSON, so I created a custom serializer for it, as below.
import org.json4s._
import org.json4s.jackson.JsonMethods._
import scala.runtime.ScalaRunTime

case class UserInfo(
  userid: Long,
  username: Option[String],
  password: Option[String]
) {
  override def toString: String = {
    val ui = this.copy(password = password.map(_ => "******"))
    ScalaRunTime._toString(ui)
  }
}

case object UserInfoSerializer extends CustomSerializer[UserInfo](format => ({
  case jsonObj: JObject =>
    implicit val formats = DefaultFormats
    jsonObj.extract[UserInfo]
}, {
  case ui: UserInfo =>
    implicit val formats = DefaultFormats
    Extraction.decompose(ui.copy(password = ui.password.map(_ => "******")))
}))

implicit val formats = DefaultFormats + UserInfoSerializer
But when I try to convert val ui = UserInfo(123, Some("anonymous"), Some("xxx")) to a JSON string with write(render(ui)), it always fails with:
scala> render(ui)
<console>:22: error: type mismatch;
found : UserInfo
required: org.json4s.JValue
(which expands to) org.json4s.JsonAST.JValue
render(ui)
I have to use it as render(Extraction.decompose(ui)), or add an implicit conversion from UserInfo to JValue such as implicit def userInfo2JValue(ui: UserInfo) = Extraction.decompose(ui).
What's the right way to make the custom serializer work like the default ones?
The render() method simply renders a JSON AST; it does not know how to convert an instance of your class to a JValue. Look at this diagram, which illustrates data transformations with json4s. Long story short, if you want to render your class as a JSON string, you can first convert it to a JValue and then render it, like you did:
render(Extraction.decompose(ui))
or you can take a shortcut and use Serialization.write which does both operations internally:
Serialization.write(ui)
In either case, it will use your custom serializer as long as it has been added to the implicit formats.
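For completeness, a minimal sketch putting the pieces together (reusing the UserInfo and UserInfoSerializer from the question):
import org.json4s._
import org.json4s.jackson.Serialization
implicit val formats: Formats = DefaultFormats + UserInfoSerializer
// The custom serializer masks the password during decomposition
Serialization.write(UserInfo(123L, Some("anonymous"), Some("xxx")))
// {"userid":123,"username":"anonymous","password":"******"}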

Reverse operation of extract (json4s)

json4s allows users to convert a JSON AST object to a case class using extract.
import org.json4s._
import org.json4s.jackson.JsonMethods._
implicit val formats = DefaultFormats
case class Item(name: String, price: Double)
val json = parse("""{"name": "phone", "price": 1000.0}""") // JObject(List((name,JString(phone)), (price,JDouble(1000.0))))
val item = json.extract[Item] // Item(phone,1000.0)
However, to convert a case class into a JSON AST object, the only way I can think of is to:
serialize the case class to a string using write
parse that string back into a JValue using parse
Like below:
parse(write(item)) // JObject(List((name,JString(phone)), (price,JDouble(1000.0))))
Is there any better way for the conversion? Thank you!
Extraction.decompose converts a case class object into a JsonAST.
Extraction.decompose(item) // JObject(List((name,JString(phone)), (price,JDouble(1000.0))))
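Note that decompose also needs an implicit Formats in scope (the DefaultFormats defined above). As a quick round-trip check:
val ast = Extraction.decompose(item) // JObject(List((name,JString(phone)), (price,JDouble(1000.0))))
ast.extract[Item] // Item(phone,1000.0)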

How to rename nested fields with Json4s

As the title states, I am trying to rename fields in the JSON generated from case classes with Json4s.
If I try to rename fields on a simple case class like:
case class User(name: String, lastName: String)
then following the examples in the json4s documentation, or in How can I rename a field during serialization with Json4s?, works fine.
But the documentation does not mention how to rename fields of nested objects, for example deviceId to did in this example:
case class User(name: String, lastName: String, details: UserDetails)
case class UserDetails(deviceId: String)
I tried using things like:
FieldSerializer.renameFrom("deviceId", "did")
or
FieldSerializer.renameFrom("details.deviceId", "details.did")
or
parse(message) transformField {
  case ("deviceId", value) => ("did", value)
}
or
parse(message) transformField {
  case ("details.deviceId", value) => ("details.did", value)
}
None of them worked, so my question is: is this nested rename possible in json4s? If so, how can I rename, for example, deviceId to did?
For the nested object, you can create a FieldSerializer bound to the nested type, like:
import org.json4s._
import org.json4s.FieldSerializer._
import org.json4s.jackson.Serialization.write
val rename = FieldSerializer[UserDetails](renameTo("deviceId", "did")) // bind the FieldSerializer to UserDetails
implicit val format: Formats = DefaultFormats + rename
val u = User("name", "lastName", UserDetails("deviceId"))
println(write(u))
> {"name":"name","lastName":"lastName","details":{"did":"deviceId"}}
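One caveat: renameTo only affects serialization. If you also need to parse JSON that uses did back into UserDetails, pair it with renameFrom (a sketch based on json4s' FieldSerializer helpers):
val rename = FieldSerializer[UserDetails](
  renameTo("deviceId", "did"),
  renameFrom("did", "deviceId")
)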