How to read a serialized object with a method taking a generic type in Scala/Kryo?

Reading a serialized object with Kryo is easy when I know the specific class type, but how do I write a method that takes a simple generic type?
Here is my code, which does not compile:
def load[T](path: String): T = {
  val instantiator = new ScalaKryoInstantiator
  instantiator.setRegistrationRequired(false)
  val kryo = instantiator.newKryo()
  val input = new Input(FileUtils.readFileToByteArray(new File(path)))
  kryo.readObject[T](input, classOf[T])
}
The error I got is:
class type required but T found
kryo.readObject[T](input, classOf[T])
I know what the error means, but don't know the right way to fix it.
This code is adapted from my original type-specific version:
def load(path: String): SomeClassType = {
  val instantiator = new ScalaKryoInstantiator
  instantiator.setRegistrationRequired(false)
  val kryo = instantiator.newKryo()
  val input = new Input(FileUtils.readFileToByteArray(new File(path)))
  kryo.readObject(input, classOf[SomeClassType])
}

I've found the answer; the key is an implicit ClassTag parameter (with the explicit implicit parameter, a separate context bound is unnecessary):
def load[M](path: String)(implicit tag: ClassTag[M]): M = {
  val instantiator = new ScalaKryoInstantiator
  instantiator.setRegistrationRequired(false)
  val kryo = instantiator.newKryo()
  val input = new Input(FileUtils.readFileToByteArray(new File(path)))
  kryo.readObject(input, tag.runtimeClass.asInstanceOf[Class[M]])
}
In some threads, the last line is:
kryo.readObject(input, tag.runtimeClass)
This doesn't work in my case; it has to be:
tag.runtimeClass.asInstanceOf[Class[M]]
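For completeness, here is a usage sketch; the Person case class, the file path, and the save counterpart are made-up examples rather than part of the original question (Output is com.esotericsoftware.kryo.io.Output and FileOutputStream is java.io.FileOutputStream):
// Hypothetical usage of load; Person and the path are only examples.
case class Person(name: String, age: Int)
val person: Person = load[Person]("/tmp/person.kryo")

// A matching save counterpart, sketched the same way:
def save[M](obj: M, path: String): Unit = {
  val instantiator = new ScalaKryoInstantiator
  instantiator.setRegistrationRequired(false)
  val kryo = instantiator.newKryo()
  val output = new Output(new FileOutputStream(path))
  kryo.writeObject(output, obj)
  output.close()
}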

Related

Adding a name to source processor of Kafka streams app results in serialization exception

I'm trying to name my source processor using the Consumed.as() method (full code below):
val usersOrdersStreams: KStream[UserId, Order] = builder
  .stream[UserId, Order](ordersByUserTopic)(Consumed.as("topic-name"))
However, when I run the application I get the following exception:
org.apache.kafka.common.config.ConfigException: Please specify a value serde or set one through StreamsConfig#DEFAULT_VALUE_SERDE_CLASS_CONFIG
When I looked at the definition of .as() I saw this:
public static <K, V> Consumed<K, V> as(final String processorName) {
    return new Consumed<>(null, null, null, null, processorName);
}
So I guessed the issue was that the key/value serdes were set to null.
I tried to solve it by adding a call to withValueSerde():
val orderSerde = ...
val usersOrdersStreams: KStream[UserId, Order] = builder
  .stream[UserId, Order](ordersByUserTopic)(Consumed.as("topic-name").withValueSerde(orderSerde))
But got the same error. What am I doing wrong?
Note: if I remove the Consumed.as() part, the code works and the exception is not thrown.
Following is the full code (some imports were removed for readability reasons):
import org.apache.kafka.common.serialization.Serde
import org.apache.kafka.streams.kstream.{GlobalKTable, JoinWindows, TimeWindows, Windowed}
import org.apache.kafka.streams.scala.ImplicitConversions._
import org.apache.kafka.streams.scala.serialization.Serdes
import org.apache.kafka.streams.scala.serialization.Serdes._
import scala.concurrent.duration._

object KafkaStreamsApp {
  implicit def serde[A >: Null : Decoder : Encoder]: Serde[A] = {
    val serializer = (a: A) => a.asJson.noSpaces.getBytes
    val deserializer = (aAsBytes: Array[Byte]) => {
      val aAsString = new String(aAsBytes)
      val aOrError = decode[A](aAsString)
      aOrError match {
        case Right(a) => Option(a)
        case Left(error) => Option.empty
      }
    }
    Serdes.fromFn[A](serializer, deserializer)
  }

  implicit val orderSerde: Serde[Order] = serde[Order]

  // Topics
  final val ordersByUserTopic = "orders-by-user"
  final val filterOrders = "filter-low-orders"
  final val applyMapValues = "mapValues-apply-discount"
  final val payedOrdersTopic = "filtered-orders"

  type UserId = String
  case class Order(user: UserId, amount: Double)

  val builder = new StreamsBuilder
  val usersOrdersStreams: KStream[UserId, Order] =
    builder.stream[UserId, Order](ordersByUserTopic)(Consumed.as("vvv").withValueSerde(orderSerde))

  def paidOrdersTopology(): Unit = {
    usersOrdersStreams
      .filter((_, v) => v.amount > 1000.0, named = Named.as(filterOrders))
      .mapValues(v => v.copy(amount = v.amount * 0.85), named = Named.as(applyMapValues))
      .to(payedOrdersTopic)
  }

  def main(args: Array[String]): Unit = {
    val props = new Properties
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "orders-application")
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
    props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.stringSerde.getClass)

    paidOrdersTopology()

    val topology: Topology = builder.build()
    println(topology.describe())

    val application: KafkaStreams = new KafkaStreams(topology, props)
    application.start()
  }
}
So... after some digging I managed to find the issue: the key serde was missing. The following code sets only the value serde, which creates a Consumed object with a null key serde:
val orderSerde = ...
val usersOrdersStreams: KStream[UserId, Order] = builder
  .stream[UserId, Order](ordersByUserTopic)(Consumed.as("topic-name").withValueSerde(orderSerde))
When I added the key serde as well:
val orderSerde = ...
val consumed = Consumed.as("topic-name")
  .withKeySerde(Serdes.stringSerde) // Missing key serde
  .withValueSerde(orderSerde)
val usersOrdersStreams: KStream[UserId, Order] =
  builder.stream[UserId, Order](ordersByUserTopic)(consumed)
The code started working.
The only thing I'm not sure about is why the error said the value serde was missing, when it was actually the key serde that was missing.
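As a side note, the same pitfall presumably applies to any explicitly passed config object that carries serdes. If the sink were also named via the Java Produced.as(), both serdes would need to be attached there as well; the following is only a sketch under that assumption (the "sink-name" label and the Produced import are hypothetical, not part of the original code):
// Hypothetical sketch: naming the sink while keeping both serdes explicit.
// Assumes org.apache.kafka.streams.kstream.Produced is imported.
val produced = Produced.as[UserId, Order]("sink-name")
  .withKeySerde(Serdes.stringSerde)
  .withValueSerde(orderSerde)

usersOrdersStreams.to(payedOrdersTopic)(produced)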

Trying to define serializer using avro4s but getting a missing implicit error

I am using the Flink (1.7) Kafka client and avro4s (2.0.4), and I want to serialize to a byte array:
class AvroSerializationSchema[IN : SchemaFor : FromRecord : ToRecord] extends SerializationSchema[IN] {
  override def serialize(element: IN): Array[Byte] = {
    val str = AvroSchema[IN]
    val schema: Schema = new Parser().parse(str.toString)
    val out = new ByteArrayOutputStream()
    val os = AvroOutputStream.data[IN].to(out).build(schema)
    os.write(element)
    out.close()
    out.flush()
    os.flush()
    os.close()
    out.toByteArray
  }
}
However, I keep getting this exception:
Error:(15, 35) could not find implicit value for evidence parameter of type com.sksamuel.avro4s.Encoder[IN]
val os = AvroOutputStream.data[IN].to(out).build(schema)
and
Error:(15, 35) not enough arguments for method data: (implicit evidence$3: com.sksamuel.avro4s.Encoder[IN])com.sksamuel.avro4s.AvroOutputStreamBuilder[IN].
Unspecified value parameter evidence$3.
val os = AvroOutputStream.data[IN].to(out).build(schema)
According to the code, IN has to have an Encoder instance:
object AvroOutputStream {
  /**
   * An [[AvroOutputStream]] that does not write the schema. Use this when
   * you want the smallest messages possible at the cost of not having the schema available
   * in the messages for downstream clients.
   */
  def binary[T: Encoder] = new AvroOutputStreamBuilder[T](BinaryFormat)
  def json[T: Encoder] = new AvroOutputStreamBuilder[T](JsonFormat)
  def data[T: Encoder] = new AvroOutputStreamBuilder[T](DataFormat)
}
so it should be something like:
class AvroSerializationSchema[IN : Encoder] ...
You don't need to use FromRecord when writing to the output stream. That is for people who want to have a GenericRecord for their own use. You need to use Encoder.
class AvroSerializationSchema[IN : SchemaFor : Encoder] extends SerializationSchema[IN] {
  override def serialize(element: IN): Array[Byte] = {
    val str = AvroSchema[IN]
    val schema: Schema = new Parser().parse(str.toString)
    val out = new ByteArrayOutputStream()
    val os = AvroOutputStream.data[IN].to(out).build(schema)
    os.write(element)
    // flush and close the Avro stream before reading the bytes
    os.flush()
    os.close()
    out.close()
    out.toByteArray
  }
}
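For illustration, a usage sketch of the schema above; the User case class and its values are made up, and in a real job Flink would call serialize for you once the schema is wired into a Kafka producer/sink:
// Hypothetical usage; User and its values are only examples.
case class User(name: String, age: Int)

val schema = new AvroSerializationSchema[User]
val bytes: Array[Byte] = schema.serialize(User("alice", 30))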

Scalacache with redis support

I am trying to integrate Redis with ScalaCache. Keys are usually strings, but values can be objects, Set[String], etc. The cache is initialized like this:
val cache: RedisCache = RedisCache(config.host, config.port)
private implicit val scalaCache: ScalaCache[Array[Byte]] = ScalaCache(cacheService.cache)
But while calling put, I am getting this error: "Could not find any Codecs for type Set[String] and Repr". It looks like I need to provide a codec for my cache input, as suggested here, so I added:
class A extends Codec[Set[String], Array[Byte]] with GZippingBinaryCodec[Set[String]]
Even after adding class A, I am getting the same error. What am I missing?
As you mentioned in the link, you can either serialize values in a binary format:
import scalacache.serialization.binary._
or as JSON using circe:
import scalacache.serialization.circe._
import io.circe.generic.auto._
Looks like it's solved in the next release by the binary and circe serialization imports. I am on version 10 and solved it with the following:
implicit object SetBinaryCodec extends Codec[Any, Array[Byte]] {
  override def serialize(value: Any): Array[Byte] = {
    val stream: ByteArrayOutputStream = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(stream)
    oos.writeObject(value)
    oos.close()
    stream.toByteArray
  }

  override def deserialize(data: Array[Byte]): Any = {
    val ois = new ObjectInputStream(new ByteArrayInputStream(data))
    val value = ois.readObject
    ois.close()
    value
  }
}
Perks of being up to date. I'll upgrade the version; I posted this just in case somebody needs it.
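For what it's worth, the codec can be sanity-checked on its own, without Redis; this small sketch only uses the code above:
// Round-trip check of the Java-serialization codec defined above (no Redis needed).
val bytes: Array[Byte] = SetBinaryCodec.serialize(Set("a", "b", "c"))
val restored: Set[String] = SetBinaryCodec.deserialize(bytes).asInstanceOf[Set[String]]
// restored == Set("a", "b", "c")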

Deserialize binary Thrift message in Scala

I am trying to deserialize a binary message in Scala:
val deserializer = new TDeserializer(new TBinaryProtocol.Factory());
try {
val obj = deserializer.deserialize(new ClientError{}, input._2.toArray)
Where ClientError is the trait generated by Scrooge from a Thrift file. The problem is that deserialize() expects a TBase object, but TBase is an interface. How do I do this? Do I have to create a new class which implements both?
Thanks for any help!
Try this:
def decode(bytes: Array[Byte]): ClientError = {
  val protocolFactory = new TBinaryProtocol.Factory
  val buffer = new TMemoryInputTransport(bytes)
  val proto = protocolFactory.getProtocol(buffer)
  ClientError.decode(proto)
}
def getClientError(binaryData: Array[Byte]): ClientError = {
  val tdser = new TDeserializer()
  val cliErr = new ClientError()
  tdser.deserialize(cliErr, binaryData)
  cliErr
}
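A minimal usage sketch of the first answer's decode helper; the byte array mirrors input._2.toArray from the question and is otherwise hypothetical:
// Hypothetical usage of decode from the first answer above.
val raw: Array[Byte] = input._2.toArray // any Thrift-binary-encoded ClientError bytes
val clientError: ClientError = decode(raw)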

How to convert, group and sort java.util.List[java.util.Map[String, Object]]?

I want to convert this:
import scala.collection.JavaConverters._
val list:java.util.List[java.util.Map[String, Object]] = new java.util.ArrayList[java.util.Map[String, Object]]()
val map1:java.util.Map[String, AnyRef] = new java.util.HashMap[String,AnyRef]()
map1.put("payout", 3.asInstanceOf[AnyRef])
list.add(map1)
val map2:java.util.Map[String, AnyRef] = new java.util.HashMap[String, AnyRef]()
map2.put("payout", 2.asInstanceOf[AnyRef])
list.add(map2)
val map3:java.util.Map[String, AnyRef] = new java.util.HashMap[String, AnyRef]()
map3.put("payout", 2.asInstanceOf[AnyRef])
list.add(map3)
val map4:java.util.Map[String, AnyRef] = new java.util.HashMap[String, AnyRef]()
map4.put("payout", 1.asInstanceOf[AnyRef])
list.add(map4)
println(list)
val result = list.asScala
//result Buffer({payout=3}, {payout=2}, {payout=2}, {payout=1})
And I want:
list.asScala.groupBy(_("payout")).toList to keep an ordering (sorted by payout),
but .toList.sortBy(_._1) throws an error:
error: No implicit Ordering defined for java.lang.Object.
val result = list.groupBy(_("payout")).toList.sortBy(_._1)
This gives a result, but I don't know if it's what you wanted:
val result = list.asScala.map(_.asScala).groupBy(_("payout")).toList.sortWith(_._1.asInstanceOf[Int] > _._1.asInstanceOf[Int])
I added map(.asScala) in order to convert your java maps to scala maps. The group by value is a java.lang.Object which does not have an ordering; using sortWith(._1.asInstanceOf[Int] > _._1.asInstanceOf[Int]) I cast it to Int in order to sort it. This will of course crash if some other object is used, but there is no way to order an object that you don't know anything about.