Retrieving a particular field inside objects in JsArray - scala

A part of my JSON response looks like this:
"resources": [{
"password": "",
"metadata": {
"updated_at": "20190806172149Z",
"guid": "e1be511a-eb8e-1038-9547-0fff94eeae4b",
"created_at": "20190405013547Z",
"url": ""
},
"iam": false,
"email": "<some mail id>",
"authentication": {
"method": "internal",
"policy_id": "Default"
}
}, {
"password": "",
"metadata": {
"updated_at": "20190416192020Z",
"guid": "6b47118c-f4c8-1038-8d93-ed6d7155964a",
"created_at": "20190416192020Z",
"url": ""
},
"iam": true,
"email": "<some mail id>",
"authentication": {
"method": "internal",
"policy_id": null
}
},
...
]
I am using Json helpers provided by the Play framework to parse this Json like this:
val resources: JsArray = response("resources").as[JsArray]
Now I need to extract the field email from all these objects in the resources JsArray. For this I tried writing a foreach loop like:
for (resource <- resources) {
}
But I'm getting the error Cannot resolve symbol foreach at the <- sign. How do I retrieve a particular field like email from each of the JSON objects inside a JsArray?
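For reference, the foreach fails because a JsArray is not itself a Scala collection; its elements are exposed as an IndexedSeq[JsValue] through .value. A minimal sketch of the direct fix (reusing the resources value defined above):
// iterate the underlying IndexedSeq[JsValue] rather than the JsArray itself
for (resource <- resources.value) {
  println((resource \ "email").as[String])
}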

With Play JSON I always use case classes. So your example would look like:
import play.api.libs.json._
case class Resource(password: String, metadata: JsObject, iam: Boolean, email: String, authentication: JsObject)
object Resource {
implicit val jsonFormat: Format[Resource] = Json.format[Resource]
}
val resources: Seq[Resource] = response("resources").validate[Seq[Resource]] match {
case JsSuccess(res, _) => res
case JsError(errors) => throw new IllegalArgumentException(s"Invalid resources: $errors") // or handle the errors in another way
}
Now you can access any field in a type-safe manner.
Of course you can replace the JsObjects with case classes in the same way - let me know if you need this in my answer.
But in your case, as you only need the email, there is no need:
resources.map(_.email) // returns Seq[String]

So, as #pme said, you should work with case classes; they could look something like this:
import java.util.UUID
import play.api.libs.json._
case class Resource(password: String, metadata: Metadata, iam: Boolean, email: String, authentication: Authentication)
object Resource{
implicit val resourcesImplicit: OFormat[Resource] = Json.format[Resource]
}
case class Metadata(updated_at: String, guid: UUID, created_at: String, url: String)
object Metadata{
implicit val metadataImplicit: OFormat[Metadata] = Json.format[Metadata]
}
case class Authentication(method: String, policy_id: Option[String])
object Authentication{
implicit val authenticationImplicit: OFormat[Authentication] =
Json.format[Authentication]
}
You can also use separate Writes and Reads instead of OFormat, or custom Writes and Reads; I used OFormat because it is less verbose. Note that the case-class field names have to match the JSON keys (hence updated_at, created_at and policy_id), and policy_id is an Option[String] because it can be null in the response.
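As a side note, if you prefer to keep camelCase field names in Scala while the JSON keys stay snake_case, play-json can derive the formats with a naming strategy. A minimal sketch, assuming play-json 2.6 or newer:
import java.util.UUID
import play.api.libs.json._
// With this configuration in scope, Json.format maps camelCase fields to snake_case
// JSON keys ("updatedAt" <-> "updated_at", etc.). This is an alternative Metadata definition.
implicit val config = JsonConfiguration(JsonNaming.SnakeCase)
case class Metadata(updatedAt: String, guid: UUID, createdAt: String, url: String)
object Metadata {
  implicit val metadataFormat: OFormat[Metadata] = Json.format[Metadata]
}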
Then, when you have your response, you validate it. You can validate it the way #pme showed, or the way I do it:
val response_ = response("resources").validate[Seq[Resource]]
response_.fold(
errors => Future.successful(BadRequest(JsError.toJson(errors))),
resources => Future.successful(Ok(Json.toJson(resources.map(_.email)))) // extracting the emails from your objects
)
So here you do one thing when the Json is invalid and another when it is valid; the behavior is the same as in #pme's answer, just a bit more elegant in my opinion.

Assuming that your json looks like this:
val jsonString =
"""
|{
| "resources": [
| {
| "password": "",
| "metadata": {
| "updated_at": "20190806172149Z",
| "guid": "e1be511a-eb8e-1038-9547-0fff94eeae4b",
| "created_at": "20190405013547Z",
| "url": ""
| },
| "iam": false,
| "email": "<some mail id1>",
| "authentication": {
| "method": "internal",
| "policy_id": "Default"
| }
| },
| {
| "password": "",
| "metadata": {
| "updated_at": "20190416192020Z",
| "guid": "6b47118c-f4c8-1038-8d93-ed6d7155964a",
| "created_at": "20190416192020Z",
| "url": ""
| },
| "iam": true,
| "email": "<some mail id2>",
| "authentication": {
| "method": "internal",
| "policy_id": null
| }
| }
| ]
|}
""".stripMargin
you can do:
(Json.parse(jsonString) \ "resources").as[JsValue] match {
case js: JsArray => js.value.foreach(x => println((x \ "email").as[String]))
case x => println((x \ "email").as[String])
}
or:
(Json.parse(jsonString) \ "resources").validate[JsArray] match {
case s: JsSuccess[JsArray] => s.get.value.foreach(x => println((x \ "email").as[String]))
case _: JsError => Json.arr().value // or do something else
}
Both work for me.
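If all you need are the emails and you want to stay with plain JSON values, a more compact variant of the same idea would be (a sketch using the jsonString above):
val emails: Seq[String] =
  (Json.parse(jsonString) \ "resources")
    .as[Seq[JsValue]]
    .map(resource => (resource \ "email").as[String])
println(emails)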

Your resources value is a JsArray, a type that doesn't provide .foreach or .flatMap, so it cannot be used on the right-hand side of <- in a for comprehension. You can instead validate it with a Reads:
val emailReads: Reads[String] = (JsPath \ "email").reads[String]
val resourcesReads: Reads[Seq[String]] = Reads.seq(emailReads)
val r: JsResult[Seq[String]] = resources.validate(resourcesReads)
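The resulting JsResult can then be consumed as usual, for example (a minimal sketch):
r match {
  case JsSuccess(emails, _) => emails.foreach(println)
  case JsError(errors)      => println(s"Invalid resources: $errors")
}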

Related

Generating class in scala from avro schema

I'm trying to generate classes using avrohugger (https://github.com/julianpeeters/avrohugger#description).
Here is my schema:
{
"name": "test1",
"namespace": "test.testaero",
"type": "map",
"values": [
{
"type": "map",
"values": [
"boolean",
{
"type": "map",
"values": [
"null",
"string",
"boolean",
{
"type": "map",
"values": [
"null",
"string",
"boolean",
"int",
{
"type": "map",
"values": [
"null",
"string",
"int"
],
"default": null
}
],
"default": null
}
],
"default": null
}
]
}
]
}
And code :
object AvroParser extends App{
val inputPath = "app/dto/roman/src/main/resources/tests.avsc"
val outPutPath = "src/main/scala"
val schemaFile = new File(inputPath)
private val scalaTypes: AvroScalaTypes = SpecificRecord.defaultTypes.copy(map = avrohugger.types.ScalaMap)
val generator = new Generator(Standard, avroScalaCustomTypes = Some(scalaTypes))
generator.fileToFile(schemaFile, outPutPath)
}
My top-level type in the schema is a map, and it fails in this function:
def getSchemaOrProtocols(
infile: File,
format: SourceFormat,
classStore: ClassStore,
classLoader: ClassLoader,
parser: Parser = schemaParser): List[Either[Schema, Protocol]] = {
def unUnion(schema: Schema) = {
schema.getType match {
case UNION => schema.getTypes().asScala.toList
case RECORD => List(schema)
case ENUM => List(schema)
case FIXED => List(schema)
case _ => sys.error("""Neither a record, enum nor a union of either.
|Nothing to map to a definition.""".trim.stripMargin)
}
}
where the map type doesn't match any of the cases in unUnion. How can I adapt the schema, or am I not passing the right arguments?
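For what it's worth, the unUnion shown above only accepts UNION, RECORD, ENUM and FIXED at the top level, so a schema whose top-level type is "map" falls straight into the sys.error branch. One hedged workaround (an untested sketch; the wrapper and field names are made up) is to wrap the map inside a named record so the generator has a named type to emit:
// The simplified "values" below stands in for the nested map union from the original schema.
val wrappedSchema: String =
  """{
    |  "name": "Test1Wrapper",
    |  "namespace": "test.testaero",
    |  "type": "record",
    |  "fields": [
    |    { "name": "test1", "type": { "type": "map", "values": "string" } }
    |  ]
    |}""".stripMargin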

java.lang.ClassCastException: value 1 (a scala.math.BigInt) cannot be cast to expected type bytes at RecordWithPrimitives.bigInt

I have the below code in Scala that serializes a class into a byte array:
import org.apache.avro.io.EncoderFactory
import org.apache.avro.reflect.ReflectDatumWriter
import java.io.ByteArrayOutputStream
case class RecordWithPrimitives(
string: String,
bool: Boolean,
bigInt: BigInt,
bigDecimal: BigDecimal,
)
object AvroEncodingDemoApp extends App {
val a = new RecordWithPrimitives(string = "???", bool = false, bigInt = BigInt.long2bigInt(1), bigDecimal = BigDecimal.decimal(5))
val parser = new org.apache.avro.Schema.Parser()
val avroSchema = parser.parse(
"""
|{
| "type": "record",
| "name": "RecordWithPrimitives",
| "fields": [{
| "name": "string",
| "type": "string"
| }, {
| "name": "bool",
| "type": "boolean"
| }, {
| "name": "bigInt",
| "type": {
| "type": "bytes",
| "logicalType": "decimal",
| "precision": 24,
| "scale": 24
| }
| }, {
| "name": "bigDecimal",
| "type": {
| "type": "bytes",
| "logicalType": "decimal",
| "precision": 48,
| "scale": 24
| }
| }]
|}
|""".stripMargin)
val writer = new ReflectDatumWriter[RecordWithPrimitives](avroSchema)
val boaStream = new ByteArrayOutputStream()
val jsonEncoder = EncoderFactory.get.jsonEncoder(avroSchema, boaStream)
writer.write(a, jsonEncoder)
jsonEncoder.flush()
}
When I run the above program I get the below error:
Exception in thread "main" java.lang.ClassCastException: value 1 (a scala.math.BigInt) cannot be cast to expected type bytes at RecordWithPrimitives.bigInt
at org.apache.avro.path.TracingClassCastException.summarize(TracingClassCastException.java:79)
at org.apache.avro.path.TracingClassCastException.summarize(TracingClassCastException.java:30)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:84)
at AvroEncodingDemoApp$.delayedEndpoint$AvroEncodingDemoApp$1(AvroEncodingDemoApp.scala:50)
at AvroEncodingDemoApp$delayedInit$body.apply(AvroEncodingDemoApp.scala:12)
at scala.Function0.apply$mcV$sp(Function0.scala:42)
at scala.Function0.apply$mcV$sp$(Function0.scala:42)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)
at scala.App.$anonfun$main$1(App.scala:98)
at scala.App.$anonfun$main$1$adapted(App.scala:98)
at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:575)
at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:573)
at scala.collection.AbstractIterable.foreach(Iterable.scala:933)
at scala.App.main(App.scala:98)
at scala.App.main$(App.scala:96)
at AvroEncodingDemoApp$.main(AvroEncodingDemoApp.scala:12)
at AvroEncodingDemoApp.main(AvroEncodingDemoApp.scala)
Caused by: java.lang.ClassCastException: class scala.math.BigInt cannot be cast to class java.nio.ByteBuffer (scala.math.BigInt is in unnamed module of loader 'app'; java.nio.ByteBuffer is in module java.base of loader 'bootstrap')
at org.apache.avro.generic.GenericDatumWriter.writeBytes(GenericDatumWriter.java:400)
at org.apache.avro.reflect.ReflectDatumWriter.writeBytes(ReflectDatumWriter.java:134)
at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:168)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:93)
at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:158)
at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:245)
at org.apache.avro.specific.SpecificDatumWriter.writeField(SpecificDatumWriter.java:117)
at org.apache.avro.reflect.ReflectDatumWriter.writeField(ReflectDatumWriter.java:184)
at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:234)
at org.apache.avro.specific.SpecificDatumWriter.writeRecord(SpecificDatumWriter.java:92)
at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:145)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:95)
at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:158)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:82)
... 14 more
How to fix this error?
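For reference, Avro's bytes/decimal logical type expects either a java.nio.ByteBuffer or, when a decimal conversion is registered, a java.math.BigDecimal; the reflect writer does not know what to do with scala.math.BigInt. One possible fix, as a sketch assuming you can switch the fields to java.math.BigDecimal and reuse the avroSchema and jsonEncoder from the snippet above:
import java.math.{BigDecimal => JBigDecimal}
import org.apache.avro.Conversions
import org.apache.avro.reflect.{ReflectData, ReflectDatumWriter}
// Hypothetical variant of the record: both decimal fields as java.math.BigDecimal,
// scaled to the schema's scale of 24 (with precision 24 / scale 24 the bigInt field
// can only hold values with magnitude below 1, hence 0.5 here).
case class RecordWithJavaDecimals(string: String, bool: Boolean, bigInt: JBigDecimal, bigDecimal: JBigDecimal)
val record = RecordWithJavaDecimals("???", false, new JBigDecimal("0.5").setScale(24), new JBigDecimal("5").setScale(24))
// Register the decimal conversion so the reflect writer knows how to turn BigDecimal into bytes
val reflectData = new ReflectData
reflectData.addLogicalTypeConversion(new Conversions.DecimalConversion)
// avroSchema and jsonEncoder as in the original snippet
val writer = new ReflectDatumWriter[RecordWithJavaDecimals](avroSchema, reflectData)
writer.write(record, jsonEncoder)
jsonEncoder.flush()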

Scala Playframework - How to modify Json and remove an element from a JsArray (Seq[JsObject]) based on an attribute value

I have a json file something like this.
{
"data": [
{
"id": 1001,
"processes": [
{
"process_id": 301,
"status": "accepted"
},
{
"process_id": 302,
"status": "accepted"
},
{
"process_id": 303,
"status": "failed"
},
{
"process_id": 304,
"status": "failed"
}
]
}
]
}
I want to iterate through the Json and remove all the processes which have a failed status, so my modified Json should be:
{
"data": [
{
"id": 1001,
"processes": [
{
"process_id": 301,
"status": "accepted"
},
{
"process_id": 302,
"status": "accepted"
}
]
}
]
}
I have tried with Scala JSON transformers, but prune and update work on JsObject, not on JsArray.
I did try to use
1)
val inputJsonObj = Json.parse(inputJsonStr).as[Seq[JsObject]]
val modifiedJson = inputJsonObj.map(model => (model \ "processes").as[Seq[JsObject]].filter(info => {
val status = (info \ "status").as[String]
status match {
case "accepted" => true
case _ =>
// I tried to prune/update here, but its not working
false
}
}))
The code filters correctly, but does not modify my actual Json.
I also created a mutable JsObject and tried to overwrite the value of processes, but this does not work either: Json.parse(resp).as[mutable.Seq[JsObject]] ++ Json.obj("processes" -> modifiedJson)
Can someone please help with this, i.e. how to update my Json? I want to modify the Json without using any case classes.
If the object with the processes key weren't nested inside an array, you could use JSON transformers:
import play.api.libs.json._
val inputJsonStr = """{
"id": 1001,
"processes": [
{
"process_id": 301,
"status": "accepted"
},
{
"process_id": 302,
"status": "accepted"
},
{
"process_id": 303,
"status": "failed"
},
{
"process_id": 304,
"status": "failed"
}
]
}"""
val inputJsonObj = Json.parse(inputJsonStr)
val transformer = (__ \ "processes").json.update(
__.read[JsArray].map { case JsArray(arr) =>
JsArray(
arr.filter(
info => (info \ "status").as[String].equals("accepted")
)
)
}
)
inputJsonObj
.transform(transformer)
.map(Json.prettyPrint)
See https://scastie.scala-lang.org/um08R9qTTnSIvrs97FWg3A
However, modifying an object nested in an array may be impossible (or difficult) due to this bug: https://github.com/playframework/play-json/issues/82.
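That said, one way to work around it (a sketch, not taken from the linked issue) is to update the outer data array yourself and apply the same transformer to every element:
// Hypothetical outer document: the object above wrapped in the "data" array from the question
val fullJson = Json.obj("data" -> Json.arr(inputJsonObj))
val dataTransformer = (__ \ "data").json.update(
  __.read[JsArray].map { case JsArray(items) =>
    // apply the inner transformer to every element, keeping the element unchanged if it fails
    JsArray(items.map(item => item.transform(transformer).getOrElse(item)))
  }
)
fullJson.transform(dataTransformer).map(Json.prettyPrint)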

How to validate my data with jsonSchema scala

I have a dataframe which looks like this:
+--------------------+----------------+------+------+
| id | migration|number|string|
+--------------------+----------------+------+------+
|[5e5db036e0403b1a. |mig | 1| str |
+--------------------+----------------+------+------+
and I have a jsonSchema:
{
"title": "Section",
"type": "object",
"additionalProperties": false,
"required": ["migration", "id"],
"properties": {
"migration": {
"type": "string",
"additionalProperties": false
},
"string": {
"type": "string"
},
"number": {
"type": "number",
"min": 0
}
}
}
I would like to validate the schema of my dataframe with my jsonSchema.
Thank you
Please find inline code comments for the explanation
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{DataType, StructField, StructType}
val newSchema : StructType = DataType.fromJson("""{
| "type": "struct",
| "fields": [
| {
| "name": "id",
| "type": "string",
| "nullable": true,
| "metadata": {}
| },
| {
| "name": "migration",
| "type": "string",
| "nullable": true,
| "metadata": {}
| },
| {
| "name": "number",
| "type": "integer",
| "nullable": false,
| "metadata": {}
| },
| {
| "name": "string",
| "type": "string",
| "nullable": true,
| "metadata": {}
| }
| ]
|}""".stripMargin).asInstanceOf[StructType] // Load you schema from JSON string
// println(newSchema)
val spark = Constant.getSparkSess // Create SparkSession object
//Correct data
val correctData: RDD[Row] = spark.sparkContext.parallelize(Seq(Row("5e5db036e0403b1a.","mig",1,"str")))
val dfNew = spark.createDataFrame(correctData, newSchema) // validating the data
dfNew.show()
//InCorrect data
val inCorrectData: RDD[Row] = spark.sparkContext.parallelize(Seq(Row("5e5db036e0403b1a.",1,1,"str")))
val dfInvalid = spark.createDataFrame(inCorrectData, newSchema) // validating the data which will throw RuntimeException: java.lang.Integer is not a valid external type for schema of string
dfInvalid.show()
val res = spark.sql("") // Load the SQL dataframe
val diffColumn : Seq[StructField] = res.schema.diff(newSchema) // compare SQL dataframe with JSON schema
diffColumn.foreach(f => println(f.name)) // Print the diff columns
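If you only care about field names and types (StructField equality also compares nullability and metadata), a looser comparison along these lines may be closer to what you want (a sketch):
// Compare only (name, dataType) pairs, ignoring nullability and metadata
val expectedFields = newSchema.fields.map(f => (f.name, f.dataType)).toSet
val actualFields = res.schema.fields.map(f => (f.name, f.dataType)).toSet
expectedFields.diff(actualFields).foreach { case (name, dataType) =>
  println(s"Missing or mismatched column: $name ($dataType)")
}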

How to set array of records Using GenericRecordBuilder

I'm trying to turn a Scala object (i.e. a case class) into a byte array.
In order to do so, I'm inserting the object content into a GenericRecordBuilder using its specific schema, and eventually, using GenericDatumWriter, I turn it into a byte array.
I have no problem setting primitive types, and arrays of primitive types, into the GenericRecordBuilder.
But I need help with inserting an array of records into the GenericRecordBuilder, and creating a byte array from it.
What is the right way to insert an array of records into the GenericRecordBuilder?
Here is part of what I'm trying to do:
This is the Schema:
{
"type": "record",
"name": "test1",
"namespace": "ns",
"fields": [
{
"name": "t_name",
"type": "string",
"default": "a"
},
{
"name": "t_num",
"type": "int",
"default": 0
},
{"name" : "t_arr", "type":
["null",
{"type": "array", "items": {
"name": "t_arr_a",
"type": "record",
"fields": [
{
"name": "t_arr_f1",
"type": "int",
"default": 0
},
{
"name": "t_arr_f2",
"type": "int",
"default": 0
}
]
}
}
]
}
]
}
This is the Scala class that populates the GenericRecordBuilder and transforms it into a byte array:
package utils
import java.io.ByteArrayOutputStream
import org.apache.avro.{Schema, generic}
import org.apache.avro.generic.{GenericData, GenericDatumWriter}
import org.apache.avro.io.EncoderFactory
import org.apache.avro.generic.GenericRecordBuilder
object CheckRecBuilder extends App {
val avroSchema: Schema = new Schema.Parser().parse(this.getClass.getResourceAsStream("/data/myschema.avsc"))
val recordBuilder = new GenericRecordBuilder(avroSchema)
recordBuilder.set("t_name", "X")
recordBuilder.set("t_num", 100)
recordBuilder.set("t_arr", ???)
val record = recordBuilder.build()
val w = new GenericDatumWriter[GenericData.Record](avroSchema)
val outputStream = new ByteArrayOutputStream()
val e = EncoderFactory.get.binaryEncoder(outputStream, null)
w.write(record, e)
val barr = outputStream.toByteArray
println("End")
}
I managed to set the array of objects.
I wonder if there is a better or more correct way of doing it.
Here is what I did:
Created a case class:
case class t_arr_a(t_arr_f1:Int, t_arr_f2:Int)
Created a method that transforms a case class into a GenericData.Record:
def caseClassToGenericDataRecord(cc:Product, schema:Schema): GenericData.Record = {
val childRecord = new GenericData.Record(schema.getElementType)
val values = cc.productIterator
cc.getClass.getDeclaredFields.map(f => childRecord.put(f.getName, values.next))
childRecord
}
Updated the class CheckRecBuilder above:
replaced:
recordBuilder.set("t_arr", ???)
With:
val childSchema = new GenericData.Record(avroSchema).getSchema.getField("t_arr").schema().getTypes().get(1)
val tArray = Array(t_arr_a(2, 4), t_arr_a(25, 14))
val tArrayGRecords: java.util.List[GenericData.Record] =
Some(tArray.map(x => caseClassToGenericDataRecord(x, childSchema))).map(arr => java.util.Arrays.asList(arr: _*)).orNull
recordBuilder.set("t_arr", tArrayGRecords)
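An alternative that avoids the reflection helper (a sketch, assuming the same avroSchema and recordBuilder as above) is to build each nested record with its own GenericRecordBuilder and set the resulting list directly:
// Element schema of t_arr: index 1 of the ["null", {"type": "array", ...}] union, then its item type
val elementSchema = avroSchema.getField("t_arr").schema().getTypes().get(1).getElementType
val child1 = new GenericRecordBuilder(elementSchema).set("t_arr_f1", 2).set("t_arr_f2", 4).build()
val child2 = new GenericRecordBuilder(elementSchema).set("t_arr_f1", 25).set("t_arr_f2", 14).build()
recordBuilder.set("t_arr", java.util.Arrays.asList(child1, child2))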