How to set an array of records using GenericRecordBuilder - Scala

I'm trying to turn a Scala object (i.e. a case class) into a byte array.
To do so, I insert the object's content into a GenericRecordBuilder using its specific schema, and eventually turn it into a byte array with a GenericDatumWriter.
I have no problem setting primitive types, and arrays of primitive types, on the GenericRecordBuilder.
But I need help with inserting an array of records into the GenericRecordBuilder and creating a byte array from it.
What is the right way to insert an array of records into the GenericRecordBuilder?
Here is part of what I'm trying to do:
This is the Schema:
{
  "type": "record",
  "name": "test1",
  "namespace": "ns",
  "fields": [
    {
      "name": "t_name",
      "type": "string",
      "default": "a"
    },
    {
      "name": "t_num",
      "type": "int",
      "default": 0
    },
    {
      "name": "t_arr",
      "type": [
        "null",
        {
          "type": "array",
          "items": {
            "name": "t_arr_a",
            "type": "record",
            "fields": [
              {
                "name": "t_arr_f1",
                "type": "int",
                "default": 0
              },
              {
                "name": "t_arr_f2",
                "type": "int",
                "default": 0
              }
            ]
          }
        }
      ]
    }
  ]
}
This is the Scala object that populates the GenericRecordBuilder and turns it into a byte array:
package utils

import java.io.ByteArrayOutputStream
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericDatumWriter, GenericRecordBuilder}
import org.apache.avro.io.EncoderFactory

object CheckRecBuilder extends App {
  val avroSchema: Schema = new Schema.Parser().parse(this.getClass.getResourceAsStream("/data/myschema.avsc"))
  val recordBuilder = new GenericRecordBuilder(avroSchema)
  recordBuilder.set("t_name", "X")
  recordBuilder.set("t_num", 100)
  recordBuilder.set("t_arr", ???)
  val record = recordBuilder.build()

  val w = new GenericDatumWriter[GenericData.Record](avroSchema)
  val outputStream = new ByteArrayOutputStream()
  val e = EncoderFactory.get.binaryEncoder(outputStream, null)
  w.write(record, e)
  e.flush() // flush the encoder, otherwise the output stream may still be empty
  val barr = outputStream.toByteArray
  println("End")
}

I managed to set the array of objects.
I wonder if there is a better or more correct way of doing it.
Here is what I did:
I created a case class:
case class t_arr_a(t_arr_f1: Int, t_arr_f2: Int)
And a method that transforms a case class into a GenericData.Record:
def caseClassToGenericDataRecord(cc: Product, schema: Schema): GenericData.Record = {
  // schema is the array schema of the field, so getElementType yields the record schema
  val childRecord = new GenericData.Record(schema.getElementType)
  val values = cc.productIterator
  cc.getClass.getDeclaredFields.foreach(f => childRecord.put(f.getName, values.next))
  childRecord
}
Then I updated the object CheckRecBuilder above, replacing:
recordBuilder.set("t_arr", ???)
with:
val childSchema = avroSchema.getField("t_arr").schema().getTypes().get(1)
val tArray = Array(t_arr_a(2, 4), t_arr_a(25, 14))
val tArrayGRecords: java.util.List[GenericData.Record] =
  Some(tArray.map(x => caseClassToGenericDataRecord(x, childSchema)))
    .map(arr => java.util.Arrays.asList(arr: _*))
    .orNull
recordBuilder.set("t_arr", tArrayGRecords)

Related

Generating a class in Scala from an Avro schema

I am trying to generate classes using avrohugger (https://github.com/julianpeeters/avrohugger#description).
Here is my schema:
{
  "name": "test1",
  "namespace": "test.testaero",
  "type": "map",
  "values": [
    {
      "type": "map",
      "values": [
        "boolean",
        {
          "type": "map",
          "values": [
            "null",
            "string",
            "boolean",
            {
              "type": "map",
              "values": [
                "null",
                "string",
                "boolean",
                "int",
                {
                  "type": "map",
                  "values": [
                    "null",
                    "string",
                    "int"
                  ],
                  "default": null
                }
              ],
              "default": null
            }
          ],
          "default": null
        }
      ]
    }
  ]
}
And the code:
import java.io.File

import avrohugger.Generator
import avrohugger.format.{SpecificRecord, Standard}
import avrohugger.types.AvroScalaTypes

object AvroParser extends App {
  val inputPath = "app/dto/roman/src/main/resources/tests.avsc"
  val outPutPath = "src/main/scala"
  val schemaFile = new File(inputPath)

  private val scalaTypes: AvroScalaTypes = SpecificRecord.defaultTypes.copy(map = avrohugger.types.ScalaMap)
  val generator = new Generator(Standard, avroScalaCustomTypes = Some(scalaTypes))
  generator.fileToFile(schemaFile, outPutPath)
}
The type in my schema is a map, and I am failing in this function:
def getSchemaOrProtocols(
  infile: File,
  format: SourceFormat,
  classStore: ClassStore,
  classLoader: ClassLoader,
  parser: Parser = schemaParser): List[Either[Schema, Protocol]] = {

  def unUnion(schema: Schema) = {
    schema.getType match {
      case UNION  => schema.getTypes().asScala.toList
      case RECORD => List(schema)
      case ENUM   => List(schema)
      case FIXED  => List(schema)
      case _ => sys.error("""Neither a record, enum nor a union of either.
                            |Nothing to map to a definition.""".trim.stripMargin)
    }
  }
where the map type does not match any of the cases above. How can I adapt the schema, or am I perhaps not passing the right arguments?
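One possible workaround, based only on the pattern match quoted above (it accepts a record, enum, fixed, or a union of those, and errors on anything else), is to wrap the top-level map in a record field so the generator reaches the RECORD case. The wrapper record and field names below are made up for illustration, and the inner map is simplified:

import org.apache.avro.Schema

// Hypothetical wrapper schema: the original map schema becomes a field of a record.
val wrapped: Schema = new Schema.Parser().parse(
  """{
    |  "type": "record",
    |  "name": "Test1Wrapper",
    |  "namespace": "test.testaero",
    |  "fields": [
    |    { "name": "test1", "type": { "type": "map", "values": "string" } }
    |  ]
    |}""".stripMargin)

// Writing this wrapped schema to a file and feeding it to generator.fileToFile,
// as in the snippet above, should reach the RECORD branch instead of sys.error.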

How to produce a tombstone for a Kafka Avro topic

I am trying to produce tombstone messages to a compacted Kafka topic with an Avro schema, using Scala (v2.13.10) and the FS2 Kafka library (v3.0.0-M8) with the Vulcan module.
The app consumes from a topic A and produces a tombstone back to the same topic A for the values that match some condition.
A sample snippet:
val producerSettings =
  ProducerSettings(
    keySerializer = keySerializer,
    valueSerializer = Serializer.unit[IO]
  ).withBootstrapServers("localhost:9092")

def processRecord(committableRecord: CommittableConsumerRecord[IO, KeySchema, ValueSchema],
                  producer: KafkaProducer.Metrics[IO, KeySchema, Unit]
                 ): IO[CommittableOffset[IO]] = {
  val key = committableRecord.record.key
  val value = committableRecord.record.value
  if (value.filterColumn.field1 == "<removable>") {
    val tombStone = ProducerRecord(committableRecord.record.topic, key, ())
    val producerRecord: ProducerRecords[CommittableOffset[IO], KeySchema, Unit] =
      ProducerRecords.one(tombStone, committableRecord.offset)
    producer.produce(producerRecord).flatten.map(_.flatMap(a => IO(a.passthrough)))
  }
  else
    IO(committableRecord.offset)
}
The above snippet works fine if I produce a valid key-value message.
However, I get the below error when I try to produce a null/empty message:
java.lang.IllegalArgumentException: Invalid Avro record: bytes is null or empty
at fs2.kafka.vulcan.AvroDeserializer$.$anonfun$using$4(AvroDeserializer.scala:32)
at defer # fs2.kafka.vulcan.AvroDeserializer$.$anonfun$using$3(AvroDeserializer.scala:29)
at defer # fs2.kafka.vulcan.AvroDeserializer$.$anonfun$using$3(AvroDeserializer.scala:29)
at mapN # fs2.kafka.KafkaProducerConnection$$anon$1.withSerializersFrom(KafkaProducerConnection.scala:141)
at map # fs2.kafka.ConsumerRecord$.fromJava(ConsumerRecord.scala:184)
at map # fs2.kafka.internal.KafkaConsumerActor.$anonfun$records$2(KafkaConsumerActor.scala:265)
at traverse # fs2.kafka.KafkaConsumer$$anon$1.$anonfun$partitionsMapStream$26(KafkaConsumer.scala:267)
at defer # fs2.kafka.vulcan.AvroDeserializer$.$anonfun$using$3(AvroDeserializer.scala:29)
at defer # fs2.kafka.vulcan.AvroDeserializer$.$anonfun$using$3(AvroDeserializer.scala:29)
at mapN # fs2.kafka.KafkaProducerConnection$$anon$1.withSerializersFrom(KafkaProducerConnection.scala:141)
A sample Avro Schema:
{
  "type": "record",
  "name": "SampleOrder",
  "namespace": "com.myschema.global",
  "fields": [
    {
      "name": "cust_id",
      "type": "int"
    },
    {
      "name": "month",
      "type": "int"
    },
    {
      "name": "expenses",
      "type": "double"
    },
    {
      "name": "filterColumn",
      "type": {
        "type": "record",
        "name": "filterColumn",
        "fields": [
          {
            "name": "id",
            "type": "string"
          },
          {
            "name": "field1",
            "type": "string"
          }
        ]
      }
    }
  ]
}
Thanks in advance.
I've tried different serializers for the producer, but they all result in the same exception above.
First, a producer would use a Serializer, yet your stack trace says deserializer. Unless your keys are Avro, you don't need an Avro schema to send null values into a topic: use ByteArraySerializer and simply send a null value...
That said, this looks like a bug: if the incoming record key/value is null, the deserializer should return null rather than explicitly throw an error.
https://github.com/fd4s/fs2-kafka/blob/series/2.x/modules/vulcan/src/main/scala/fs2/kafka/vulcan/AvroDeserializer.scala#L29
Compare to the Confluent implementation.
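For reference, a minimal sketch of that suggestion using the plain Kafka clients API rather than fs2-kafka (the topic name is an assumption, and it presumes a non-Avro, raw-bytes key, as the answer's caveat notes):

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.ByteArraySerializer

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")

// Raw byte-array serializers: no Avro schema is involved on the producer side.
val producer = new KafkaProducer[Array[Byte], Array[Byte]](
  props, new ByteArraySerializer, new ByteArraySerializer)

// A null value is the tombstone marker for this key on a compacted topic.
val keyBytes: Array[Byte] = ??? // bytes of the key whose record should be deleted
producer.send(new ProducerRecord[Array[Byte], Array[Byte]]("topicA", keyBytes, null))
producer.flush()
producer.close()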

Avro Schema: Build Avro Schema from Schema Fields

I am trying to write a function that calculates a diff between two Avro schemas and generates another schema.
schema_one = {
  "type": "record",
  "name": "schema_one",
  "namespace": "test",
  "fields": [
    {
      "name": "type",
      "type": "string"
    },
    {
      "name": "id",
      "type": "string"
    }
  ]
}
schema_two = {
  "type": "record",
  "name": "schema_two",
  "namespace": "test",
  "fields": [
    {
      "name": "type",
      "type": "string"
    }
  ]
}
To get the fields in schema_one that are not in schema_two:
import scala.collection.JavaConverters._
import org.apache.avro.Schema._
import org.apache.avro.{Schema, SchemaBuilder}

val diff: Set[Schema.Field] = schema_one.getFields.asScala.toSet.filterNot(schema_two.getFields.asScala.toSet)
So far, so good.
I want to build a new schema from diff and I expect it to be:
schema_three = {
  "type": "record",
  "name": "schema_three",
  "namespace": "test",
  "fields": [
    {
      "name": "id",
      "type": "string"
    }
  ]
}
I can't seem to find any method within Avro's SchemaBuilder to achieve this without having to explicitly provide named fields, i.e. build a Schema given a collection of Schema.Field.
For example, something like:
SchemaBuilder.record("schema_three").namespace("test").fromFields(diff)
Is there a way to achieve this? Comments appreciated.
I was able to achieve this using the Kite SDK: "org.kitesdk" % "kite-data-core" % "1.1.0"
val schema_namespace = schema_one.getNamespace
val schema_name = schema_one.getName

// Build one single-field record schema per diff field, then merge them.
val schemas = diff.map { f =>
  SchemaBuilder
    .record(schema_name)
    .namespace(schema_namespace)
    .fields()
    .name(f.name())
    .`type`(f.schema())
    .noDefault()
    .endRecord()
}

val schema_three = SchemaUtil.merge(schemas.asJava)
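A hedged alternative that stays within plain Avro (no kite-data-core) is to fold the diff fields into a single SchemaBuilder field assembler; the record name and namespace below reuse schema_one's, mirroring the answer above:

import org.apache.avro.{Schema, SchemaBuilder}

val assembler = SchemaBuilder
  .record(schema_one.getName)        // or "schema_three" if a new name is wanted
  .namespace(schema_one.getNamespace)
  .fields()

// Add each remaining field to the same assembler, then close the record.
val schemaThree: Schema = diff
  .foldLeft(assembler) { (acc, f) =>
    acc.name(f.name()).`type`(f.schema()).noDefault()
  }
  .endRecord()

This avoids building one single-field schema per field and merging them afterwards.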

Scala Map - add new key and copy value from another key

Consider 2 sets of data as follows:
JSON1 => {
  "data": [
    {
      "id": "1-abc",
      "model": "Agile",
      "status": "open",
      "configuration": {
        "state": "running",
        "rootVolumeSize": "0.00000",
        "count": "2",
        "type": "large",
        "platform": "Linux"
      },
      "stateId": "123-567"
    }
  ]
}
JSON2 => {
  "data": [
    {
      "id": "1-abc",
      "model": "Agile",
      "configuration": {
        "state": "running",
        "diskSize": "0",
        "type": "small",
        "platform": "Windows"
      }
    }
  ]
}
I need to compare JSON1 and JSON2 based on the first field, id, and if they match, merge JSON1 into JSON2, retaining the existing values in JSON2 (only appending fields that are not present).
I have coded this as below:
private def merger(JSON1: Seq[JSON], JSON2: Seq[JSON]): Seq[JSON] = {
  val abcKey = JSON1.groupBy(_.id) map { case (k, v) => (k, v.head) }
  val mergedRecords = for {
    xyzJSON <- JSON2
  } yield (
    abcKey.get(xyzJSON.id) match {
      case Some(json1) => xyzJSON.copy(status = json1.status,
                                       stateId = json1.stateId)
      case None => xyzJSON.copy(origin = "N/A")
    }
  )
  mergedRecords
}
I am not able to arrive at a solution for reconciling the fields within the configuration map.
The expected result should look like:
{
  "data": [
    {
      "id": "1-abc",
      "model": "Agile",
      "status": "open",
      "configuration": {
        "state": "running",
        "diskSize": "0",
        "rootVolumeSize": "0.00000",
        "count": "2",
        "type": "small",
        "platform": "Windows"
      },
      "stateId": "123-567"
    }
  ]
}
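For the configuration reconciliation itself, here is a minimal sketch using plain Scala Maps (representing each configuration block as a Map[String, String] is an assumption; the literal values are copied from the example above). ++ is right-biased, so JSON2's values win on duplicate keys while keys present only in JSON1 are appended:

// Hypothetical Map[String, String] views of the two configuration blocks.
val config1 = Map(
  "state" -> "running", "rootVolumeSize" -> "0.00000",
  "count" -> "2", "type" -> "large", "platform" -> "Linux")
val config2 = Map(
  "state" -> "running", "diskSize" -> "0",
  "type" -> "small", "platform" -> "Windows")

// Right operand wins on conflicts: JSON2 values are retained,
// keys missing from JSON2 are appended from JSON1.
val mergedConfig = config1 ++ config2
// Map(state -> running, rootVolumeSize -> 0.00000, count -> 2,
//     type -> small, platform -> Windows, diskSize -> 0)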

Cannot read property '_createInstanceCore' of null error in Breeze since "Datatype" is null for ComplexType

I am trying to create an array complex type in Breeze; since the complex type's dataType comes back as 'null', the following code triggers an exception:
if (prop.isScalar) {
  val = prop.dataType._createInstanceCore(entity, prop);
} else {
  val = breeze.makeComplexArray([], entity, prop);
}
Please help me populate the 'dataType' property of the complex type.
Entities used:
"dataProperties": [
{
"name": "carriers",
"complexTypeName":"Carrier#Test",
"isScalar":false
}]
The Carrier entity is defined as follows:
{
  "shortName": "Carrier",
  "namespace": "Test",
  "isComplexType": true,
  "dataProperties": [
    {
      "name": "Testing",
      "dataType": "String"
    }
  ]
}