Deserializing LongMap in Kryo - scala

I have a class with a field of scala.collection.mutable.LongMap type.
After serializing it with Kryo I attempt to deserialize the object and get the following exception:
com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: Can not set final scala.collection.mutable.LongMap field com.name.of.field to scala.collection.mutable.HashMap
Serialization trace:
field (com.name.of)
at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:626) ~[com.esotericsoftware.kryo.kryo-2.21.jar:na]
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221) ~[com.esotericsoftware.kryo.kryo-2.21.jar:na]
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648) ~[com.esotericsoftware.kryo.kryo-2.21.jar:na]
at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605) ~[com.esotericsoftware.kryo.kryo-2.21.jar:na]
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221) ~[com.esotericsoftware.kryo.kryo-2.21.jar:na]
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729) ~[com.esotericsoftware.kryo.kryo-2.21.jar:na]
IIUC, the LongMap is serialized as a HashMap, and deserialization then fails because the HashMap can't be written to the LongMap field.
I manually ran something like https://github.com/romix/akka-kryo-serialization/blob/master/src/test/scala/com/romix/scala/serialization/kryo/MapSerializerTest.scala#L78 and confirmed that a serialized LongMap is deserialized into a HashMap.
Any idea how to properly read/write this object so the LongMap is read back as a LongMap instead of a HashMap?
Do I need to use a proxy class? Write a custom serializer/deserializer?
Alternatively, is there a decent serialization library that handles LongMaps properly?
P.S. I would have tagged the question with LongMap but I don't have enough reputation to create new tags.

Yes, you need to add a custom serializer. There is https://github.com/twitter/chill#serializers-for-scala-classes, which includes serializers for some types in the Scala standard library, but apparently not for LongMap (you may already be using this library, perhaps indirectly). Look at how they do it and write your own; a rough sketch is below.
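For illustration only, here is a minimal sketch of what such a serializer could look like, assuming Kryo 2.x and values written with writeClassAndObject. The class name LongMapSerializer is mine, not chill's, and this is untested:

import com.esotericsoftware.kryo.{Kryo, Serializer}
import com.esotericsoftware.kryo.io.{Input, Output}
import scala.collection.mutable

// Hypothetical serializer: writes the size, then each long key followed by its value.
class LongMapSerializer[V] extends Serializer[mutable.LongMap[V]] {

  override def write(kryo: Kryo, output: Output, map: mutable.LongMap[V]): Unit = {
    output.writeInt(map.size, true)
    map.foreach { case (k, v) =>
      output.writeLong(k)                  // keys are primitive longs
      kryo.writeClassAndObject(output, v)  // values can be of any type
    }
  }

  override def read(kryo: Kryo, input: Input, tpe: Class[mutable.LongMap[V]]): mutable.LongMap[V] = {
    val size = input.readInt(true)
    val map = mutable.LongMap.empty[V]
    var i = 0
    while (i < size) {
      val k = input.readLong()
      map.update(k, kryo.readClassAndObject(input).asInstanceOf[V])
      i += 1
    }
    map
  }
}

// Registration, so Kryo no longer falls back to a generic map serializer:
// kryo.register(classOf[mutable.LongMap[Any]], new LongMapSerializer[Any])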
However, this error shouldn't happen by default. Look for Kryo#register and Kryo#setDefaultSerializer calls in your code (or code you call): are you telling Kryo to serialize/deserialize all scala.collection.mutable.Maps as HashMaps?

Related

Is there a way to name a Processor in Kafka Streams DSL in Scala

I've been trying to name a KStream using the Kafka Streams DSL in Scala, but I cannot find a way to name a processor through org.apache.kafka.streams.scala.kstream.Consumed.
There is a Java method, org.apache.kafka.streams.kstream.Consumed#as, but using it throws an exception. Does anyone know what can be done?
org.apache.kafka.streams.errors.StreamsException: ClassCastException invoking Processor. Do the Processor's input types match the deserialized types? Check the Serde setup and change the default Serdes in StreamConfig or provide correct Serdes via method parameters. Make sure the Processor can accept the deserialized input of type key: [B, and value: [B.
Note that although incorrect Serdes are a common cause of error, the cast exception might have another cause (in user code, for example). For example, if a processor wires in a store, but casts the generics incorrectly, a class cast exception could be raised during processing, but the cause would not be wrong Serdes.
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:146)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:236)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:216)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:168)
at org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:85)
at org.apache.kafka.streams.processor.internals.StreamTask.lambda$process$1(StreamTask.java:678)
at ...
EDIT:
The code that I used:
val someEvents = builder
.stream[String, String]("some_events")
(org.apache.kafka.streams.kstream.Consumed.as("some_event_stream"))
The code I should've used:
val someEvents = builder
.stream[String, String]("some_events")
(org.apache.kafka.streams.kstream.Consumed.as("some_event_stream")
.withKeySerde(Serdes.String())
.withValueSerde(Serdes.String()))
Initial code:
val someEvents = builder
.stream[String, String]("some_events")
(org.apache.kafka.streams.kstream.Consumed.as("some_event_stream"))
I made two mistakes:
I put the Consumed parameter on a separate line. (TODO: this still needs an explanation of why it doesn't work in Scala.)
I used the Java method org.apache.kafka.streams.kstream.Consumed.as and didn't provide serdes, so Kafka Streams fell back to the default byte-array serdes (hence the key: [B and value: [B in the error).
This works:
val someEvents = builder.stream[String, String]("some_events")(org.apache.kafka.streams.kstream.Consumed.as("some_event_stream").withKeySerde(Serdes.String()).withValueSerde(Serdes.String()))
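If the one-liner gets hard to read, an equivalent variant (an untested sketch, same topic and processor names as above) is to build the named Consumed, with its serdes, in a separate value and still pass it inside the same statement:

import org.apache.kafka.common.serialization.Serdes
import org.apache.kafka.streams.kstream.Consumed
import org.apache.kafka.streams.scala.StreamsBuilder

val builder = new StreamsBuilder()

// Build the named Consumed with explicit serdes up front...
val consumed: Consumed[String, String] =
  Consumed.as[String, String]("some_event_stream")
    .withKeySerde(Serdes.String())
    .withValueSerde(Serdes.String())

// ...and pass it as the explicit second argument list, within the same statement.
val someEvents = builder.stream[String, String]("some_events")(consumed)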

Spring stream binds to String instead of to pojo

We just upgraded our Spring to:
Spring boot: 2.1.0.RELEASE
Spring cloud: Greenwich.SR1
Spring integration kafka: 3.1.0.RELEASE
Spring kafka: 2.2.7.RELEASE
and we're using
Kafka 2.1.1
We have a topic to which more than one type of class instance can be sent; all of them extend the same abstract class. Let's say the abstract class is named AbstractMessage and there are two subclasses, MessageImpl1 and MessageImpl2.
We used to receive the message in the consumer as an Object (in order to write a log line if a wrong class was somehow received) and then cast it to the relevant MessageImpl using if (message instanceof MessageImpl) {}.
After the upgrade, all the messages are bound to String instead of to their classes.
I read here that content-type=application/json binds to a POJO, but even though I added it to both the input and output bindings, the payload was still bound to a String:
spring.cloud.stream.bindings.input.contentType=application/json
spring.cloud.stream.bindings.output.contentType=application/json
Trying to receive the MessageImpl directly got this error:
Caused by: com.fasterxml.jackson.databind.exc.InvalidDefinitionException: Cannot construct instance of MessageImpl1 (no Creators, like default construct, exist): cannot deserialize from Object value (no delegate- or property-based Creator)
at [Source: (byte[])
Any idea how to fix it?
What version did you upgrade from? Show your code and configuration properties. Previous versions used Kryo serialization by default, which is now deprecated in favor of JSON, but your POJOs need to be json-friendly.
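For illustration, "json-friendly" here means Jackson can instantiate and populate the class: a no-arg constructor plus accessible getters/setters (or an @JsonCreator constructor). A hypothetical Scala sketch, reusing the class names from the question (the payload field is a placeholder; in Java the idea is the same):

import scala.beans.BeanProperty

abstract class AbstractMessage

// Jackson can build this: a public no-arg constructor plus bean-style accessors.
class MessageImpl1 extends AbstractMessage {
  @BeanProperty var payload: String = _
}

Separately, if you deserialize into the abstract type and then branch on instanceof, Jackson also needs polymorphic type information (for example @JsonTypeInfo on AbstractMessage), but the error shown above is about the missing creator.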
The Kryo serialization is deprecated, but you can add the converter.
See the documentation.
Provided MessageConverters
As mentioned earlier, the framework already provides a stack of MessageConverters to handle most common use cases. The following list describes the provided MessageConverters, in order of precedence (the first MessageConverter that works is used):
ApplicationJsonMessageMarshallingConverter: Variation of the org.springframework.messaging.converter.MappingJackson2MessageConverter. Supports conversion of the payload of the Message to/from POJO for cases when contentType is application/json (DEFAULT).
TupleJsonMessageConverter: DEPRECATED Supports conversion of the payload of the Message to/from org.springframework.tuple.Tuple.
ByteArrayMessageConverter: Supports conversion of the payload of the Message from byte[] to byte[] for cases when contentType is application/octet-stream. It is essentially a pass through and exists primarily for backward compatibility.
ObjectStringMessageConverter: Supports conversion of any type to a String when contentType is text/plain. It invokes Object’s toString() method or, if the payload is byte[], a new String(byte[]).
JavaSerializationMessageConverter: DEPRECATED Supports conversion based on java serialization when contentType is application/x-java-serialized-object.
KryoMessageConverter: DEPRECATED Supports conversion based on Kryo serialization when contentType is application/x-java-object.
JsonUnmarshallingConverter: Similar to the ApplicationJsonMessageMarshallingConverter. It supports conversion of any type when contentType is application/x-java-object. It expects the actual type information to be embedded in the contentType as an attribute (for example, application/x-java-object;type=foo.bar.Cat).
When no appropriate converter is found, the framework throws an exception. When that happens, you should check your code and configuration and ensure you did not miss anything (that is, ensure that you provided a contentType by using a binding or a header). However, most likely, you found some uncommon case (such as a custom contentType perhaps) and the current stack of provided MessageConverters does not know how to convert. If that is the case, you can add custom MessageConverter. See User-defined Message Converters.

How to register StringType$ using kryo serializer in spark

I'm trying to use the Kryo serializer in Spark. I have set spark.kryo.registrationRequired=true to make sure that I'm registering all the necessary classes. Apart from requiring that I register my custom classes, it is asking me to register Spark classes as well, such as StructType.
Although I have registered Spark's StringType, it now crashes saying that I need to register StringType$ as well.
com.esotericsoftware.kryo.KryoException (java.lang.IllegalArgumentException: Class is not registered: org.apache.spark.sql.types.StringType$
Note: To register this class use: kryo.register(org.apache.spark.sql.types.StringType$.class);
Serialization trace:
dataType (org.apache.spark.sql.types.StructField)
fields (org.apache.spark.sql.types.StructType))
I am importing the Spark implicits in order to read in JSON; I'm not sure whether this is contributing to the problem.
import spark.implicits._
val foo = spark.read.json(inPath).as[MyCaseClass]
I do realize that setting registrationRequired to false will stop this error, but I am not seeing any performance gain in that case, so I am trying to make sure that I register every necessary class.
I faced the same issue, and after some experiments I managed to solve it with the following line:
Class.forName("org.apache.spark.sql.types.StringType$")
That way you obtain the Class object for the StringType$ singleton (which you can't express as a classOf literal in Scala), register it with Kryo, and the error goes away; a sketch of how this fits into the SparkConf registration is below.
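As a hedged sketch of the wiring (the exact class list depends on your job; registering Array[StructField] alongside StructType and StructField is an assumption that is often, but not always, needed):

import org.apache.spark.SparkConf
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrationRequired", "true")

// For a Scala `object` such as StringType, the runtime class is StringType$;
// Class.forName("org.apache.spark.sql.types.StringType$") and StringType.getClass
// point at the same class.
conf.registerKryoClasses(Array[Class[_]](
  classOf[StructType],
  classOf[StructField],
  classOf[Array[StructField]],
  StringType.getClass
))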
A good reference: https://mail-archives.apache.org/mod_mbox/spark-user/201603.mbox/%3CCAHCfvsSyUpx78ZFS_A9ycxvtO1=Jp7DfCCAeJKHyHZ1sugqHEQ#mail.gmail.com%3E
Cheers

Serialize linked hash map kryo

I use the Kryo serializer in a project that works with Spark and is written in Scala. I register all the classes I use in the project, and there is one class that does not serialize or deserialize correctly: LinkedHashMap.
LinkedHashMap registration:
kryo.register(classOf[scala.collection.mutable.LinkedHashMap[_, _]])
I get my data from Elasticsearch, and the runtime type of the properties is LinkedHashMap.
For example, a document like person{age: 20, name: "yarden"} maps to a LinkedHashMap[String, Any], and the same applies to any nested object inside person.
When I want to deserialize data that is stored as a LinkedHashMap (for example when collecting an RDD), the result is just an empty object.
I tried using MapSerializer (as the second parameter of the register function), but it fails because it is meant for Java's LinkedHashMap.
I searched for a serializer suitable for Scala's LinkedHashMap but didn't find one.
One workaround is to convert every LinkedHashMap I get into a Map; that works, but it is not good practice.
I also thought about forcing the runtime type to be Map rather than LinkedHashMap, but I didn't find a way to do that.
I believe the best approach would be a serializer suited to Scala's LinkedHashMap, but I haven't found any. Any solution to the things I didn't manage to solve would be welcome.
Try using the TraversableSerializer.
For example:
kryo.register(classOf[mutable.LinkedHashMap[Any, Any]],
  new TraversableSerializer[(Any, Any), mutable.LinkedHashMap[Any, Any]](true))
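A self-contained version of the same registration, assuming the TraversableSerializer that ships with Twitter's chill library is on the classpath (it usually is when chill or Spark's Kryo support is in use) and a pre-2.13 Scala collection library:

import com.esotericsoftware.kryo.Kryo
import com.twitter.chill.TraversableSerializer
import scala.collection.mutable

def registerLinkedHashMap(kryo: Kryo): Unit = {
  // The boolean flag tells Kryo whether the collection is immutable; true mirrors
  // the snippet above, though false is arguably more correct for a mutable map.
  kryo.register(
    classOf[mutable.LinkedHashMap[Any, Any]],
    new TraversableSerializer[(Any, Any), mutable.LinkedHashMap[Any, Any]](true))
}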

What is the alternate for Datastax cassandra core driver DataType serialize/deserialize methods

We are using Scala code to run jobs on Spark (1.5.2) that connect to Cassandra. The new spark-cassandra-connector (1.5) depends on cassandra-driver-core-2.2.0-RC3.
The DataType serialize/deserialize methods were removed in 2.2.0-RC3.
What is the alternative way to serialize/deserialize?
13: error: value serialize is not a member of com.datastax.driver.core.DataType.CollectionType
[ERROR] implicit def ListString2ByteBuffer(list : List[String]): ByteBuffer =
DataType.list(DataType.text()).serialize(list.asJava, ProtocolVersion.NEWEST_SUPPORTED);
See: Upgrade guide
"DataType has no more references to TypeCodec, so methods that dealt with serialization and deserialization of data types have been removed... These methods must now be invoked on TypeCodec directly."
To obtain a TypeCodec you can use something like this:
CodecRegistry.DEFAULT_INSTANCE.codecFor(myDataType)
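A hedged sketch of how the implicit conversion from the question could be rewritten against the codec API (roughly the same names as in the question; it assumes the codec lookup shown above and the TypeCodec.serialize(value, protocolVersion) signature):

import java.nio.ByteBuffer
import scala.collection.JavaConverters._
import com.datastax.driver.core.{CodecRegistry, DataType, ProtocolVersion}

object Conversions {
  implicit def listString2ByteBuffer(list: List[String]): ByteBuffer = {
    val listType = DataType.list(DataType.text())
    // Look up the codec for list<text> and serialize through it instead of DataType.
    val codec = CodecRegistry.DEFAULT_INSTANCE.codecFor[java.util.List[String]](listType)
    codec.serialize(list.asJava, ProtocolVersion.NEWEST_SUPPORTED)
  }
}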