avro4s for scala 3 issue when serializing with schemaV1 but deserializing with schemaV2

I've written sample code to try avro4s for Scala 3. I am trying to simulate a schema change; in this case, a Pizza's schema evolved to include a vegan: Option[Boolean] field:
package examples
import java.io.File
import com.sksamuel.avro4s.{AvroInputStream, AvroOutputStream, AvroSchema}
@main def createAvroFile() =
  val pepperoni = PizzaV1("pepperoni", Seq(Ingredient("pepperoni", 12, 4.4), Ingredient("onions", 1, 0.4)), 598)
  val hawaiian = PizzaV1("hawaiian", Seq(Ingredient("ham", 1.5, 5.6), Ingredient("pineapple", 5.2, 0.2)), 391)

  val os = AvroOutputStream.binary[PizzaV1].to(new File("/tmp/pizzas.avro")).build()
  os.write(Seq(pepperoni, hawaiian))
  os.flush()
  os.close()

  val pizzaV1Schema = AvroSchema[PizzaV1]
  val in = AvroInputStream.binary[PizzaV2].from(new File("/tmp/pizzas.avro")).build(pizzaV1Schema)
  println(in.iterator.toList.mkString("\n"))
  in.close()
case class Ingredient(name: String, sugar: Double, fat: Double)
case class PizzaV1(name: String, ingredients: Seq[Ingredient], calories: Int)
case class PizzaV2(name: String, ingredients: Seq[Ingredient], calories: Int, vegan: Option[Boolean])
When I run it, it fails:
Exception in thread "main" java.lang.NullPointerException: Cannot invoke "org.apache.avro.Schema$Field.schema()" because "field" is null
at com.sksamuel.avro4s.decoders.SchemaFieldDecoder.<init>(records.scala:64)
at com.sksamuel.avro4s.decoders.RecordDecoder.$anonfun$1(records.scala:19)
at scala.collection.ArrayOps$.map$extension(ArrayOps.scala:932)
at scala.IArray$package$IArray$.map(IArray.scala:179)
at com.sksamuel.avro4s.decoders.RecordDecoder.decode(records.scala:20)
at com.sksamuel.avro4s.AvroBinaryInputStream.<init>(AvroBinaryInputStream.scala:31)
at com.sksamuel.avro4s.AvroInputStreamBuilderWithSource.build(AvroInputStream.scala:69)
at examples.CreateAvroFile$package$.createAvroFile(CreateAvroFile.scala:17)
at examples.createAvroFile.main(CreateAvroFile.scala:6)
I debugged it, and it seems it tries to find the "vegan" field somewhere (in the V1 schema?) but gets a null.
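(For reference, my understanding of the usual Avro evolution rule is that a field added on the reader side needs a default value so records written without it can still be decoded. A sketch of how I would express that with avro4s, though I'm not sure the derived schema actually picks up the default, which is partly why I'm asking:)

case class PizzaV2(
  name: String,
  ingredients: Seq[Ingredient],
  calories: Int,
  vegan: Option[Boolean] = None // assumed: a default would let old V1 records resolve
)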
Is this a bug or am I doing something wrong?

Related

Map different value to the case class property during serialization and deserialization using Jackson

I am trying to deserialize this JSON using the Jackson library:
{
  "name": "abc",
  "ageInInt": 30
}
to the case class Person:
case class Person(name: String, @JsonProperty(value = "ageInInt") @JsonAlias(Array("ageInInt")) age: Int)
but I am getting:
No usable value for age
Did not find value which can be converted into int
org.json4s.package$MappingException: No usable value for age
Did not find value which can be converted into int
Basically, I want to deserialize the JSON so that the key ageInInt maps to the case class field age.
Here is the complete code:
val json =
"""{
|"name": "Tausif",
|"ageInInt": 30
|}""".stripMargin
implicit val format = DefaultFormats
println(Serialization.read[Person](json))
You need to register the DefaultScalaModule with your JsonMapper.
import com.fasterxml.jackson.databind.json.JsonMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import com.fasterxml.jackson.core.`type`.TypeReference
import com.fasterxml.jackson.annotation.JsonProperty
val mapper = JsonMapper.builder()
.addModule(DefaultScalaModule)
.build()
case class Person(name: String, @JsonProperty(value = "ageInInt") age: Int)
val json =
"""{
|"name": "Tausif",
|"ageInInt": 30
|}""".stripMargin
val person: Person = mapper.readValue(json, new TypeReference[Person]{})
println(person) // Prints Person(Tausif,30)
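Since the snippet in the question actually goes through json4s (Serialization.read with DefaultFormats), a json4s-only alternative is a FieldSerializer that renames the key. A minimal sketch, assuming json4s 3.x and its native Serialization:

import org.json4s.{DefaultFormats, FieldSerializer}
import org.json4s.FieldSerializer.{renameFrom, renameTo}
import org.json4s.native.Serialization

case class Person(name: String, age: Int)

// Rename the field in both directions so the JSON key "ageInInt" maps to age.
val personSerializer = FieldSerializer[Person](
  serializer = renameTo("age", "ageInInt"),    // case class -> JSON
  deserializer = renameFrom("ageInInt", "age") // JSON -> case class
)

implicit val formats = DefaultFormats + personSerializer

println(Serialization.read[Person]("""{"name": "Tausif", "ageInInt": 30}"""))
// Person(Tausif,30)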

Scala dataset map fails with exception No applicable constructor/method found for zero actual parameters

I have the following case classes
case class FeedbackData (prefix : String, position : Int, click : Boolean,
suggestion: Suggestion,
history : List[RequestHistory],
eventTimestamp: Long)
case class Suggestion (clicks : Long, sources : List[String], ctr : Float)
case class RequestHistory (timestamp: Long, url: String)
I use it to perform a map operation on my dataset
val sqlContext = ss.sqlContext
import sqlContext.implicits._
val input: Dataset[FeedbackData] = ss.read.json("filename").as(Encoders.bean(classOf[FeedbackData]))
input.map(row => transformRow(row))
At runtime I see the exception
java.util.concurrent.ExecutionException: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 24, Column 81: failed to compile:
No applicable constructor/method found for zero actual parameters; candidates are: "package.FeedbackData(java.lang.String, int, boolean, package.Suggestion, scala.collection.immutable.List, long)"
What am I doing wrong?
The context is fine here; the issue is with the case class: Scala's Long has to be used instead of Java's long:
case class A(num1 : Long, num2 : Long, num3 : Long)
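Separately, a minimal sketch of the product-encoder route (assuming a SparkSession named ss and an existing transformRow; Encoders.bean expects a Java bean with a no-arg constructor, which a case class is not, whereas case classes get an Encoder from the implicits import):

import org.apache.spark.sql.Dataset
import ss.implicits._

// Read straight into the case class; no bean encoder needed.
val input: Dataset[FeedbackData] = ss.read.json("filename").as[FeedbackData]
val transformed = input.map(row => transformRow(row))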
Inspired by @pasha701, a use case could be:
case class Student(id: Int, name: String)
import spark.implicits._
val df = Seq((1, "james"), (2, "tony")).toDF("id", "name")
df.printSchema()
df.as[Student].rdd.map { stu =>
  stu.id + "\t" + stu.name
}.collect().foreach(println)
output:
root
|-- id: integer (nullable = false)
|-- name: string (nullable = true)
1 james
2 tony
Reference: https://spark.apache.org/docs/2.4.0/sql-getting-started.html

Scala deserialize JSON to Collection

My JSON file contains the below details:
{
  "category": "age, gender,post_code"
}
My Scala code is below:
val filename = args.head
println(s"Reading ${args.head} ...")
val json = Source.fromFile(filename)
val mapper = new ObjectMapper() with ScalaObjectMapper
mapper.registerModule(DefaultScalaModule)
val parsedJson = mapper.readValue[Map[String, Any]](json.reader())
val data = parsedJson.get("category").toSeq
It's returning Seq[Any], for example List(age, gender,post_code), but I need Seq[String] output. If anyone has an idea about this, please help me.
The idea in Scala is to be typesafe whenever possible, which you are giving away by using Map[String, Any].
So, I recommend using a data class that represents your JSON data.
Example,
define a mapper,
scala> import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.databind.ObjectMapper
scala> import com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper
import com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper
scala> import com.fasterxml.jackson.module.scala.DefaultScalaModule
import com.fasterxml.jackson.module.scala.DefaultScalaModule
scala> val mapper = new ObjectMapper() with ScalaObjectMapper
mapper: com.fasterxml.jackson.databind.ObjectMapper with com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper = $anon$1@d486a4d
scala> mapper.registerModule(DefaultScalaModule)
res0: com.fasterxml.jackson.databind.ObjectMapper = $anon$1@d486a4d
Now, when you deserialize to Map[K, V], you cannot specify all the nested data structures:
scala> val jsonString = """{"category": ["metal", "metalcore"], "age": 10, "gender": "M", "postCode": "98109"}"""
jsonString: String = {"category": ["metal", "metalcore"], "age": 10, "gender": "M", "postCode": "98109"}
scala> mapper.readValue[Map[String, Any]](jsonString)
res2: Map[String,Any] = Map(category -> List(metal, metalcore), age -> 10, gender -> M, postCode -> 98109)
Following is a solution casting the key to the desired data structure, but I personally do not recommend it.
scala> mapper.readValue[Map[String, Any]](jsonString).get("category").map(_.asInstanceOf[List[String]]).getOrElse(List.empty[String])
res3: List[String] = List(metal, metalcore)
The best solution is to define a data class, which I'm calling SomeData in the following example, and deserialize to it. SomeData is defined based on your JSON data structure.
scala> final case class SomeData(category: List[String], age: Int, gender: String, postCode: String)
defined class SomeData
scala> mapper.readValue[SomeData](jsonString)
res4: SomeData = SomeData(List(metal, metalcore),10,M,98109)
scala> mapper.readValue[SomeData](jsonString).category
res5: List[String] = List(metal, metalcore)
Just read the JSON as a JsonNode, and access the property directly:
val jsonNode = objectMapper.readTree(json.reader())
val parsedJson = jsonNode.get("category").asText
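Since "category" in the question is a single comma-separated string, one more step turns it into the Seq[String] the asker wants (a small sketch building on the jsonNode above; the categories name is just for illustration):

val categories: Seq[String] =
  jsonNode.get("category").asText.split(",").map(_.trim).toSeq
// categories == Seq("age", "gender", "post_code")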
By using a Scala generic function for converting a JSON string to a case class/object, you can deserialize to anything you want, like:
JSON to Collection,
JSON to Case Class, and
JSON to Case Class with Object as field.
Please find a working and detailed answer which I have provided using generics here.

spark implicit encoder not found in scope

I have a problem with Spark already outlined in "spark custom kryo encoder not providing schema for UDF", but I have now created a minimal sample:
https://gist.github.com/geoHeil/dc9cfb8eca5c06fca01fc9fc03431b2f
class SomeOtherClass(foo: Int)
case class FooWithSomeOtherClass(a: Int, b: String, bar: SomeOtherClass)
case class FooWithoutOtherClass(a: Int, b: String, bar: Int)
case class Foo(a: Int)
implicit val someOtherClassEncoder: Encoder[SomeOtherClass] = Encoders.kryo[SomeOtherClass]
val df2 = Seq(FooWithSomeOtherClass(1, "one", new SomeOtherClass(4))).toDS
val df3 = Seq(FooWithoutOtherClass(1, "one", 1), FooWithoutOtherClass(2, "two", 2)).toDS
val df4 = df3.map(d => FooWithSomeOtherClass(d.a, d.b, new SomeOtherClass(d.bar)))
Here, even the createDataset statement fails due to:
java.lang.UnsupportedOperationException: No Encoder found for SomeOtherClass
- field (class: "SomeOtherClass", name: "bar")
- root class: "FooWithSomeOtherClass"
Why is the encoder not in scope or at least not in the right scope?
Also, trying to specify an explicit encoder like:
df3.map(d => {FooWithSomeOtherClass(d.a, d.b, new SomeOtherClass(d.bar))}, (Int, String, Encoders.kryo[SomeOtherClass]))
does not work.
This happens because you should use the Kryo encoder through the whole serialization stack, meaning that your top-level object should have a Kryo encoder. The following runs successfully on a local Spark shell (the change you are interested in is on the first line):
implicit val topLevelObjectEncoder: Encoder[FooWithSomeOtherClass] = Encoders.kryo[FooWithSomeOtherClass]
val df1 = Seq(Foo(1), Foo(2)).toDF
val df2 = Seq(FooWithSomeOtherClass(1, "one", new SomeOtherClass(4))).toDS
val df3 = Seq(FooWithoutOtherClass(1, "one", 1), FooWithoutOtherClass(2, "two", 2)).toDS
df3.printSchema
df3.show
val df4 = df3.map(d => FooWithSomeOtherClass(d.a, d.b, new SomeOtherClass(d.bar)))
df4.printSchema
df4.show
df4.collect
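One caveat (an assumption about how Encoders.kryo behaves in general, not output captured from this exact run): with a Kryo encoder the whole object is packed into a single binary column, so df4.printSchema is expected to show something like

root
 |-- value: binary (nullable = true)

rather than the per-field columns that df3 has.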

How do I turn a Scala case class into a mongo Document

I'd like to build a generic method for transforming Scala Case Classes to Mongo Documents.
A promising Document constructor is
fromSeq(ts: Seq[(String, BsonValue)]): Document
I can turn a case class into a Map[String, Any], but then I've lost the type information I need to use the implicit conversions to BsonValues. Maybe TypeTags can help with this?
Here's what I've tried:
import org.mongodb.scala.bson.BsonTransformer
import org.mongodb.scala.bson.collection.immutable.Document
import org.mongodb.scala.bson.BsonValue
case class Person(age: Int, name: String)
//transform scala values into BsonValues
def transform[T](v: T)(implicit transformer: BsonTransformer[T]): BsonValue = transformer(v)
// turn any case class into a Map[String, Any]
def caseClassToMap(cc: Product) = {
  val values = cc.productIterator
  cc.getClass.getDeclaredFields.map(_.getName -> values.next).toMap
}

// transform a Person into a Document
def personToDocument(person: Person): Document = {
  val map = caseClassToMap(person)
  val bsonValues = map.toSeq.map { case (key, value) =>
    (key, transform(value))
  }
  Document.fromSeq(bsonValues)
}
<console>:24: error: No bson implicit transformer found for type Any. Implement or import an implicit BsonTransformer for this type.
(key, transform(value))
def personToDocument(person: Person): Document = {
Document("age" -> person.age, "name" -> person.name)
}
The below code works without manual conversion of an object.
import reactivemongo.api.bson.{BSON, BSONDocument, Macros}
case class Person(name:String = "SomeName", age:Int = 20)
implicit val personHandler = Macros.handler[Person]
val bsonPerson = BSON.writeDocument[Person](Person())
println(s"${BSONDocument.pretty(bsonPerson.getOrElse(BSONDocument.empty))}")
You can use Salat https://github.com/salat/salat. A nice example can be found here - https://gist.github.com/bhameyie/8276017. This is the piece of code that will help you -
import salat._
val dBObject = grater[Artist].asDBObject(artist)
artistsCollection.save(dBObject, WriteConcern.Safe)
I was able to serialize a case class to a BsonDocument using the org.bson.BsonDocumentWriter. The below code runs using Scala 2.12 and mongo-scala-driver_2.12 version 2.6.0.
My quest for this solution was aided by this answer (where they are trying to serialize in the opposite direction): Serialize to object using scala mongo driver?
import org.mongodb.scala.bson.codecs.Macros
import org.mongodb.scala.bson.codecs.DEFAULT_CODEC_REGISTRY
import org.bson.codecs.configuration.CodecRegistries.{fromRegistries, fromProviders}
import org.bson.codecs.EncoderContext
import org.bson.BsonDocumentWriter
import org.mongodb.scala.bson.BsonDocument
import org.bson.codecs.configuration.CodecRegistry
import org.bson.codecs.Codec
case class Animal(name: String, species: String, genus: String, weight: Int)

object TempApp {
  def main(args: Array[String]): Unit = {
    val jaguar = Animal("Jenny", "Jaguar", "Panthera", 190)
    val codecProvider = Macros.createCodecProvider[Animal]()
    val codecRegistry: CodecRegistry = fromRegistries(fromProviders(codecProvider), DEFAULT_CODEC_REGISTRY)
    val codec = Macros.createCodec[Animal](codecRegistry)
    val encoderContext = EncoderContext.builder.isEncodingCollectibleDocument(true).build()
    val doc = BsonDocument()
    val writer = new BsonDocumentWriter(doc) // new is needed since this is a Java class without a companion object
    codec.encode(writer, jaguar, encoderContext)
    print(doc)
  }
}