Create custom Serializer and Deserializer in Kafka using Scala

I am using kafka_2.10-0.10.0.1 with Scala 2.10.3.
I want to write a custom Serializer and Deserializer in Scala.
I tried the following Serializer (for CustomType) and Deserializer (to obtain a CustomType):
import java.util
import java.lang.reflect.Type
import com.google.gson.Gson
import com.google.gson.reflect.TypeToken
import org.apache.kafka.common.serialization.{Deserializer, Serializer}

class CustomTypeSerializer extends Serializer[CustomType] {
  private val gson: Gson = new Gson()

  override def configure(configs: util.Map[String, _], isKey: Boolean): Unit = {
    // nothing to do
  }

  override def serialize(topic: String, data: CustomType): Array[Byte] = {
    if (data == null)
      null
    else
      gson.toJson(data).getBytes
  }

  override def close(): Unit = {
    // nothing to do
  }
}
class CustomTypeDeserializer extends Deserializer[CustomType] {
  private val gson: Gson = new Gson()

  override def deserialize(topic: String, bytes: Array[Byte]): CustomType = {
    val offerJson = gson.toJson(bytes.toString)
    val psType: Type = new TypeToken[CustomType]() {}.getType()
    val ps: CustomType = gson.fromJson(offerJson, psType)
    ps
  }

  override def configure(configs: util.Map[String, _], isKey: Boolean): Unit = {
    // nothing to do
  }

  override def close(): Unit = {
    // nothing to do
  }
}
But I got this error:
Exception in thread "main" org.apache.kafka.common.errors.SerializationException: Error deserializing key/value for partition topic_0_1-1 at offset 26
Caused by: com.google.gson.JsonSyntaxException: java.lang.IllegalStateException: Expected BEGIN_OBJECT but was BEGIN_ARRAY at line 1 column 2 path $
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:224)
at com.google.gson.Gson.fromJson(Gson.java:887)
at com.google.gson.Gson.fromJson(Gson.java:852)
at com.google.gson.Gson.fromJson(Gson.java:801)
at kafka.PSDeserializer.deserialize(PSDeserializer.scala:24)
at kafka.PSDeserializer.deserialize(PSDeserializer.scala:18)
at org.apache.kafka.clients.consumer.internals.Fetcher.parseRecord(Fetcher.java:627)
at org.apache.kafka.clients.consumer.internals.Fetcher.parseFetchedData(Fetcher.java:548)
at org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:354)
at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1000)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:938)
Can you help me, please?

Below are a custom serializer and deserializer for the case class User(name: String, id: Int). Replace User in the code with your own case class and it will work.
import java.io.{ObjectInputStream, ByteArrayInputStream}
import java.util
import org.apache.kafka.common.serialization.{Deserializer, Serializer}

class CustomDeserializer extends Deserializer[User] {
  override def configure(configs: util.Map[String, _], isKey: Boolean): Unit = {
  }

  override def deserialize(topic: String, bytes: Array[Byte]) = {
    val byteIn = new ByteArrayInputStream(bytes)
    val objIn = new ObjectInputStream(byteIn)
    val obj = objIn.readObject().asInstanceOf[User]
    byteIn.close()
    objIn.close()
    obj
  }

  override def close(): Unit = {
  }
}
import java.io.{ObjectOutputStream, ByteArrayOutputStream}
import java.util
import org.apache.kafka.common.serialization.Serializer

class CustomSerializer extends Serializer[User] {
  override def configure(configs: util.Map[String, _], isKey: Boolean): Unit = {
  }

  override def serialize(topic: String, data: User): Array[Byte] = {
    try {
      val byteOut = new ByteArrayOutputStream()
      val objOut = new ObjectOutputStream(byteOut)
      objOut.writeObject(data)
      objOut.close()
      byteOut.close()
      byteOut.toByteArray
    }
    catch {
      case ex: Exception => throw new Exception(ex.getMessage)
    }
  }

  override def close(): Unit = {
  }
}
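Note that this Java-serialization approach requires User to be java.io.Serializable, which Scala case classes are by default. As for the original Gson-based deserializer: the root cause is most likely that bytes.toString on an Array[Byte] returns the array's default toString (something like [B@6d06d69c) rather than the JSON text, so Gson ends up parsing garbage starting with [, hence "Expected BEGIN_OBJECT but was BEGIN_ARRAY". A minimal sketch of a fix, assuming the same CustomType and Gson instance as in the question, is to decode the bytes as UTF-8 and parse the resulting string directly:

import java.nio.charset.StandardCharsets

override def deserialize(topic: String, bytes: Array[Byte]): CustomType = {
  if (bytes == null)
    null
  else {
    // Recover the JSON text produced by gson.toJson(data).getBytes on the producer side
    val json = new String(bytes, StandardCharsets.UTF_8)
    gson.fromJson(json, classOf[CustomType])
  }
}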

Related

Is it possible to write an upickle Serializer for Akka?

I would like to implement an akka Serializer using upickle, but I'm not sure it's possible. To do so I would need to implement a Serializer, something like the following:
import akka.serialization.Serializer
import upickle.default._

class UpickleSerializer extends Serializer {
  def includeManifest: Boolean = true
  def identifier = 1234567

  def toBinary(obj: AnyRef): Array[Byte] = {
    writeBinary(obj) // ???
  }

  def fromBinary(bytes: Array[Byte], clazz: Option[Class[_]]): AnyRef = {
    readBinary(bytes) // ???
  }
}
The problem is I cannot call writeBinary/readBinary without having the relevant Writer/Reader. Is there a way I can look these up based on the object class?
Take a look at the following files; you should get some ideas!
CborAkkaSerializer.scala
LocationAkkaSerializer.scala
Note: these serializers use CBOR.
I found a way to do it using reflection. I base the solution on the assumption that any object that needs to be serialized has a ReadWriter defined in its companion object:
class UpickleSerializer extends Serializer {
  private var map = Map[Class[_], ReadWriter[AnyRef]]()

  def includeManifest: Boolean = true
  def identifier = 1234567

  def toBinary(obj: AnyRef): Array[Byte] = {
    implicit val rw = getReadWriter(obj.getClass)
    writeBinary(obj)
  }

  def fromBinary(bytes: Array[Byte], clazz: Option[Class[_]]): AnyRef = {
    implicit val rw = lookup(clazz.get)
    readBinary[AnyRef](bytes)
  }

  private def getReadWriter(clazz: Class[_]) = map.get(clazz) match {
    case Some(rw) => rw
    case None =>
      val rw = lookup(clazz)
      map += clazz -> rw
      rw
  }

  private def lookup(clazz: Class[_]) = {
    // universe._ provides runtimeMirror, typeOf, asTerm, etc.
    import scala.reflect.runtime.universe._
    val rootMirror = runtimeMirror(clazz.getClassLoader)
    val classSymbol = rootMirror.classSymbol(clazz)
    val moduleSymbol = classSymbol.companion.asModule
    val moduleMirror = rootMirror.reflectModule(moduleSymbol)
    val instanceMirror = rootMirror.reflect(moduleMirror.instance)
    val members = instanceMirror.symbol.typeSignature.members
    members.find(_.typeSignature <:< typeOf[ReadWriter[_]]) match {
      case Some(rw) =>
        instanceMirror.reflectField(rw.asTerm).get.asInstanceOf[ReadWriter[AnyRef]]
      case None =>
        throw new RuntimeException("Not found")
    }
  }
}
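For the reflective lookup above to succeed, each serializable type needs a ReadWriter field in its companion object. A minimal sketch of what that looks like (the Event type and its fields are illustrative):

import upickle.default._

case class Event(id: Long, payload: String)

object Event {
  // Found by UpickleSerializer.lookup via the companion object's members
  implicit val rw: ReadWriter[Event] = macroRW
}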

Extracting GraphStageLogic into a custom stateful implementation and passing it as a parameter to GraphStage throws an exception

Below is a simplified code snippet, where a GraphStageLogic implementation is passed to a GraphStage as a constructor argument:
package akka.shapes.examples.notworking

import akka.actor.ActorSystem
import akka.stream._
import akka.stream.scaladsl.{GraphDSL, RunnableGraph, Sink, Source}
import akka.stream.stage.{GraphStage, GraphStageLogic, InHandler}

// This is the base graph stage, where GraphStageLogic and SinkShape are passed as constructor parameters
class BaseGraphStage[T](val shape: SinkShape[T], graphStageLogic: GraphStageLogic) extends GraphStage[SinkShape[T]] {
  override def createLogic(inheritedAttributes: Attributes): GraphStageLogic = graphStageLogic
}

// This is a sample stateful extension of GraphStageLogic that accepts the first ten elements only
class CountLogic(sinkShape: SinkShape[Int], maxValue: Int) extends GraphStageLogic(sinkShape) {
  var counter: Long = 0

  override def preStart(): Unit = {
    pull(sinkShape.in)
  }

  setHandler(sinkShape.in, new InHandler {
    override def onPush(): Unit = {
      val e = grab(sinkShape.in)
      println("conditional sink : " + e)
      counter = counter + 1
      counter == maxValue match {
        case true => completeStage()
        case false => pull(sinkShape.in)
      }
    }
  })
}

object SampleSinkNotWorking {
  def main(args: Array[String]): Unit = {
    implicit val actorSystem = ActorSystem("NotWorking")
    implicit val actorMaterializer = ActorMaterializer()

    val inlet = Inlet[Int](name = "sampleInlet")
    val sinkShape = SinkShape(inlet)
    val countGraphStateLogic = new CountLogic(sinkShape, 10)
    val sinkGraphStage = new BaseGraphStage[Int](sinkShape, countGraphStateLogic)
    val sink = Sink.fromGraph(sinkGraphStage)

    val graph = GraphDSL.create() { implicit builder =>
      import GraphDSL.Implicits._
      Source(1 to 100) ~> sink
      ClosedShape
    }
    val runnableGraph = RunnableGraph.fromGraph(graph)
    runnableGraph.run()
  }
}
Running the above code gives an ArrayIndexOutOfBoundsException:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
  at akka.stream.stage.GraphStageLogic.setHandler(GraphStage.scala:439)
  at akka.shapes.examples.notworking.CountLogic.<init>(SampleSinkNotWorking.scala:24)
  at akka.shapes.examples.notworking.SampleSinkNotWorking$.main(SampleSinkNotWorking.scala:46)
  at akka.shapes.examples.notworking.SampleSinkNotWorking.main(SampleSinkNotWorking.scala)
I tried debugging, and it looks like the Inlet id is -1 and it's not getting reset.
But why is it not getting reset when the GraphStageLogic is passed as a constructor argument to the GraphStage?
I refactored your code a bit and the problem is gone; take a look:
import akka.stream.stage.StageLogging

class BaseGraphStage(maxValue: Int) extends GraphStage[SinkShape[Int]] {
  val inlet = Inlet[Int](name = "sampleInlet")

  override def createLogic(inheritedAttributes: Attributes): GraphStageLogic =
    new GraphStageLogic(shape) with StageLogging {
      var counter: Int = 0

      setHandler(inlet, new InHandler {
        override def onPush(): Unit = {
          val e = grab(inlet)
          log.info(s"$e is consumed")
          counter += 1
          if (counter == maxValue) {
            completeStage()
          } else {
            pull(inlet)
          }
        }
      })

      override def preStart(): Unit =
        pull(inlet)

      override def postStop(): Unit =
        counter = 0
    }

  override def shape: SinkShape[Int] = SinkShape(inlet)
}

object SampleSinkNotWorking {
  def main(args: Array[String]): Unit = {
    implicit val actorSystem = ActorSystem("NotWorking")
    implicit val actorMaterializer = ActorMaterializer()

    val sink = Sink.fromGraph(new BaseGraphStage(10))
    Source(1 to 100).runWith(sink)
  }
}
I can't fully answer your last question, but I think the whole trick is to create inlets in the context of the graph stage, not outside of it, and to use the preStart/postStop handlers. As far as I can tell, inlet ids are only assigned when the stage is wired into a graph during materialization, so a GraphStageLogic constructed eagerly in main calls setHandler while the inlet's id is still -1, which is exactly the ArrayIndexOutOfBoundsException above. Hope that helps.
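If you still want to parameterize a stage with external logic, a possible variation (a sketch, untested against your exact setup) is to pass a factory function rather than a pre-built GraphStageLogic, so the logic is created only inside createLogic, against the stage's own shape:

class ParameterizedSink[T](makeLogic: SinkShape[T] => GraphStageLogic)
  extends GraphStage[SinkShape[T]] {

  val in: Inlet[T] = Inlet[T]("ParameterizedSink.in")
  override val shape: SinkShape[T] = SinkShape(in)

  // The factory runs during materialization, when the inlet already has a valid id
  override def createLogic(inheritedAttributes: Attributes): GraphStageLogic =
    makeLogic(shape)
}

// Usage with the CountLogic from the question:
val sink = Sink.fromGraph(new ParameterizedSink[Int](s => new CountLogic(s, 10)))
Source(1 to 100).runWith(sink)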

Alpakka - read Kryo-serialized objects from S3

I have Kryo-serialized binary data stored on S3 (thousands of serialized objects).
Alpakka allows reading the content as data: Source[ByteString, NotUsed]. But the Kryo format doesn't use delimiters, so I can't split each serialized object into a separate ByteString using data.via(Framing.delimiter(...)).
So Kryo actually needs to read the data to know where an object ends, and it doesn't look streaming-friendly.
Is it possible to implement this case in a streaming fashion, so that I get a Source[MyObject, NotUsed] at the end of the day?
Here is a graph stage that does that. It handles the case where a serialized object spans two byte strings. It would need to be improved for objects that are large (not my use case) and can span more than two byte strings in the Source[ByteString, NotUsed].
import akka.NotUsed
import akka.stream._
import akka.stream.scaladsl.Flow
import akka.stream.stage.{GraphStage, GraphStageLogic, InHandler, OutHandler}
import akka.util.ByteString
import com.esotericsoftware.kryo.{Kryo, KryoException, Serializer}
import com.esotericsoftware.kryo.io.Input
import scala.collection.immutable
import scala.collection.mutable.ListBuffer

object KryoReadStage {
  def flow[T](kryoSupport: KryoSupport,
              `class`: Class[T],
              serializer: Serializer[_]): Flow[ByteString, immutable.Seq[T], NotUsed] =
    Flow.fromGraph(new KryoReadStage[T](kryoSupport, `class`, serializer))
}

final class KryoReadStage[T](kryoSupport: KryoSupport,
                             `class`: Class[T],
                             serializer: Serializer[_])
  extends GraphStage[FlowShape[ByteString, immutable.Seq[T]]] {

  override def shape: FlowShape[ByteString, immutable.Seq[T]] = FlowShape.of(in, out)

  override def createLogic(inheritedAttributes: Attributes): GraphStageLogic = {
    new GraphStageLogic(shape) {

      setHandler(in, new InHandler {
        override def onPush(): Unit = {
          val bytes =
            if (previousBytes.length == 0) grab(in)
            else ByteString.fromArrayUnsafe(previousBytes) ++ grab(in)

          Managed(new Input(new ByteBufferBackedInputStream(bytes.asByteBuffer))) { input =>
            var position = 0
            val acc = ListBuffer[T]()
            kryoSupport.withKryo { kryo =>
              var last = false
              while (!last && !input.eof()) {
                tryRead(kryo, input) match {
                  case Some(t) =>
                    acc += t
                    position = input.total().toInt
                    previousBytes = EmptyArray
                  case None =>
                    val bytesLeft = new Array[Byte](bytes.length - position)
                    val bb = bytes.asByteBuffer
                    bb.position(position)
                    bb.get(bytesLeft)
                    last = true
                    previousBytes = bytesLeft
                }
              }
              push(out, acc.toList)
            }
          }
        }

        private def tryRead(kryo: Kryo, input: Input): Option[T] =
          try {
            Some(kryo.readObject(input, `class`, serializer))
          } catch {
            case _: KryoException => None
          }
      })

      setHandler(out, new OutHandler {
        override def onPull(): Unit = {
          pull(in)
        }
      })

      private val EmptyArray: Array[Byte] = Array.empty
      private var previousBytes: Array[Byte] = EmptyArray
    }
  }

  override def toString: String = "KryoReadStage"

  private lazy val in: Inlet[ByteString] = Inlet("KryoReadStage.in")
  private lazy val out: Outlet[immutable.Seq[T]] = Outlet("KryoReadStage.out")
}
Example usage:
client.download(BucketName, key)
  .via(KryoReadStage.flow(kryoSupport, `class`, serializer))
  .flatMapConcat(Source(_))
It uses some additional helpers below.
ByteBufferBackedInputStream:
import java.io.InputStream
import java.nio.ByteBuffer

class ByteBufferBackedInputStream(buf: ByteBuffer) extends InputStream {
  override def read: Int = {
    if (!buf.hasRemaining) -1
    else buf.get & 0xFF
  }

  override def read(bytes: Array[Byte], off: Int, len: Int): Int = {
    if (!buf.hasRemaining) -1
    else {
      val read = Math.min(len, buf.remaining)
      buf.get(bytes, off, read)
      read
    }
  }
}
Managed:
object Managed {
  type AutoCloseableView[T] = T => AutoCloseable

  def apply[T: AutoCloseableView, V](resource: T)(op: T => V): V =
    try {
      op(resource)
    } finally {
      resource.close()
    }
}
KryoSupport:
import com.esotericsoftware.kryo.pool.{KryoCallback, KryoFactory, KryoPool}

trait KryoSupport {
  def withKryo[T](f: Kryo => T): T
}

class PooledKryoSupport(serializers: (Class[_], Serializer[_])*) extends KryoSupport {
  override def withKryo[T](f: Kryo => T): T = {
    pool.run(new KryoCallback[T] {
      override def execute(kryo: Kryo): T = f(kryo)
    })
  }

  private val pool = {
    val factory = new KryoFactory() {
      override def create(): Kryo = {
        val kryo = new Kryo
        (KryoSupport.ScalaSerializers ++ serializers).foreach {
          case ((clazz, serializer)) =>
            kryo.register(clazz, serializer)
        }
        kryo
      }
    }
    new KryoPool.Builder(factory).softReferences().build()
  }
}
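To tie the helpers together, constructing the kryoSupport value assumed by the usage example could look like the following sketch, where MyObject and MyObjectSerializer are hypothetical placeholders for a domain type and its Kryo Serializer:

// Hypothetical: register the domain type with its Kryo serializer
val kryoSupport: KryoSupport =
  new PooledKryoSupport(classOf[MyObject] -> new MyObjectSerializer)

val objects: Source[MyObject, NotUsed] =
  client.download(BucketName, key)
    .via(KryoReadStage.flow(kryoSupport, classOf[MyObject], new MyObjectSerializer))
    .flatMapConcat(Source(_))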

Akka Kafka Custom Serializer

I'm using Akka Kafka (Scala) and want to send custom objects.
class TweetsSerializer extends Serializer[Seq[MyCustomType]] {
  override def configure(configs: util.Map[String, _], isKey: Boolean): Unit = ???

  override def serialize(topic: String, data: Seq[MyCustomType]): Array[Byte] = ???

  override def close(): Unit = ???
}
How can I correctly write my own serializer? And what should I do with the configs argument?
I would use the StringSerializer; I mean, I'd convert all my types to strings before producing them. However, this works:
import java.io.UnsupportedEncodingException
import org.apache.kafka.common.errors.SerializationException
import org.apache.kafka.common.serialization.Serializer

case class MyCustomType(a: Int)

class TweetsSerializer extends Serializer[Seq[MyCustomType]] {
  private var encoding = "UTF8"

  override def configure(configs: java.util.Map[String, _], isKey: Boolean): Unit = {
    val propertyName = if (isKey) "key.serializer.encoding"
    else "value.serializer.encoding"
    var encodingValue = configs.get(propertyName)
    if (encodingValue == null) encodingValue = configs.get("serializer.encoding")
    if (encodingValue != null && encodingValue.isInstanceOf[String]) encoding = encodingValue.asInstanceOf[String]
  }

  override def serialize(topic: String, data: Seq[MyCustomType]): Array[Byte] =
    try {
      if (data == null) null
      else
        data.map { v =>
          v.a.toString
        }.mkString("").getBytes(encoding)
    } catch {
      case e: UnsupportedEncodingException =>
        throw new SerializationException("Error when serializing string to byte[] due to unsupported encoding " + encoding)
    }

  override def close(): Unit = ()
}
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

object testCustomKafkaSerializer extends App {
  implicit val producerConfig = {
    val props = new Properties()
    props.setProperty("bootstrap.servers", "localhost:9092")
    props.setProperty("key.serializer", classOf[StringSerializer].getName)
    props.setProperty("value.serializer", classOf[TweetsSerializer].getName)
    props
  }

  lazy val kafkaProducer = new KafkaProducer[String, Seq[MyCustomType]](producerConfig)

  // Send the record and block on the returned Java future
  private def publishToKafka(id: String, data: Seq[MyCustomType]) = {
    kafkaProducer
      .send(new ProducerRecord("outTopic", id, data))
      .get()
  }

  val input = MyCustomType(1)
  publishToKafka("customSerializerTopic", Seq(input))
}
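As noted at the top of this answer, the simpler route is to keep the stock StringSerializer and convert the values to strings (for example JSON) before producing. A minimal sketch of that alternative, reusing MyCustomType from above (the comma-separated encoding is just illustrative):

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

val props = new Properties()
props.setProperty("bootstrap.servers", "localhost:9092")
props.setProperty("key.serializer", classOf[StringSerializer].getName)
props.setProperty("value.serializer", classOf[StringSerializer].getName)

val producer = new KafkaProducer[String, String](props)

// Encode the payload yourself before handing it to Kafka
val payload = Seq(MyCustomType(1), MyCustomType(2)).map(_.a.toString).mkString(",")
producer.send(new ProducerRecord("outTopic", "someKey", payload)).get()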

Finding the right type class to use

I'm trying to implement a type class solution for error handling in a Play application. What I want is to have some type class instances representing some validated (caught) errors and a default type class instance for any unvalidated (uncaught) errors.
I don't know if this is possible, but here's what I have so far:
trait ResponseError[E] {
  def report(e: E)(implicit logger: Logger): Unit
  def materialize(e: E): Result
}

trait ValidatedError[E <: Throwable] extends ResponseError[E] {
  def report(e: E)(implicit logger: Logger): Unit =
    ResponseError.logError(e)
}

trait UnvalidatedError[E <: Throwable] extends ResponseError[E] {
  def report(e: E)(implicit logger: Logger): Unit = {
    ResponseError.logError(e)
    UnvalidatedError.notify(e)
  }
}

object ResponseError {
  def logError(e: Throwable)(implicit logger: Logger): Unit =
    logger.error(e.getMessage)
}

object ValidatedError {
  import java.util.concurrent.{ExecutionException, TimeoutException}

  implicit val executionError = new ValidatedError[ExecutionException] {
    def materialize(e: ExecutionException): Result =
      play.api.mvc.Results.BadRequest
  }

  implicit val timeoutError = new ValidatedError[TimeoutException] {
    def materialize(e: TimeoutException): Result =
      play.api.mvc.Results.RequestTimeout
  }
}

object UnvalidatedError {
  implicit val uncaughtError = new UnvalidatedError[Throwable] {
    def materialize(e: Throwable): Result =
      play.api.mvc.Results.ServiceUnavailable
  }

  private def notify(e: Throwable) = ??? // send email notification
}
However, how can I make sure my ValidatedError type class instances are tried first, before falling back to my UnvalidatedError instance?
There you go. The key change is that ValidatedError now extends UnvalidatedError and the instances are implicit defs with upper type bounds, so when both a validated instance and the fallback apply, the compiler selects the more specific ValidatedError one.
import java.util.concurrent.{TimeoutException, ExecutionException}

type Result = String

val badRequest: Result = "BadRequest"
val requestTimeout: Result = "RequestTimeout"
val serviceUnavailable: Result = "ServiceUnavailable"

class Logger {
  def error(s: String) = println(s + "\n")
}

trait ResponseError[E] {
  def report(e: E)(implicit logger: Logger): Unit
  def materialize(e: E): Result
}

trait ValidatedError[E <: Throwable] extends UnvalidatedError[E] {
  override def report(e: E)(implicit logger: Logger): Unit =
    ResponseError.logError(e, validated = true)
}

trait UnvalidatedError[E <: Throwable] extends ResponseError[E] {
  def report(e: E)(implicit logger: Logger): Unit = {
    ResponseError.logError(e, validated = false)
    UnvalidatedError.notify(e)
  }
}

object ResponseError {
  def logError(e: Throwable, validated: Boolean)(implicit logger: Logger): Unit =
    logger.error({
      validated match {
        case true => "VALIDATED : "
        case false => "UNVALIDATED : "
      }
    } + e.getMessage)
}

object ValidatedError {
  import java.util.concurrent.{ExecutionException, TimeoutException}

  implicit def executionError[E <: ExecutionException] = new ValidatedError[E] {
    def materialize(e: E): Result =
      badRequest
  }

  implicit def timeoutError[E <: TimeoutException] = new ValidatedError[E] {
    def materialize(e: E): Result =
      requestTimeout
  }
}

object UnvalidatedError {
  implicit def uncaughtError[E <: Throwable] = new UnvalidatedError[E] {
    def materialize(e: E): Result =
      serviceUnavailable
  }

  private def notify(e: Throwable) = println("Sending email: " + e) // send email notification
}

def testTypeclass[E](e: E)(implicit logger: Logger, ev: ResponseError[E]): Unit = {
  ev.report(e)
}

import ValidatedError._
import UnvalidatedError._

implicit val logger: Logger = new Logger

val executionErr = new ExecutionException(new Throwable("execution exception!"))
testTypeclass(executionErr)

val timeoutErr = new TimeoutException("timeout exception!")
testTypeclass(timeoutErr)

val otherErr = new Exception("other exception!")
testTypeclass(otherErr)
Output:
VALIDATED : java.lang.Throwable: execution exception!
VALIDATED : timeout exception!
UNVALIDATED : other exception!
Sending email: java.lang.Exception: other exception!
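For completeness, materialize resolves the same way as report; a small sketch using the stubbed Result values defined above:

def toResult[E](e: E)(implicit ev: ResponseError[E]): Result = ev.materialize(e)

assert(toResult(executionErr) == badRequest)     // ValidatedError[ExecutionException] is more specific
assert(toResult(timeoutErr) == requestTimeout)   // ValidatedError[TimeoutException] is more specific
assert(toResult(otherErr) == serviceUnavailable) // falls back to UnvalidatedError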