How to save an Akka Stream of a case class to Kafka directly? - scala

I am able to save my data to Kafka in the form of a String like this:
val producerSettings = ProducerSettings(actorSystem, new StringSerializer, new StringSerializer)
  .withBootstrapServers("localhost:9092")

def kafkaSink(source: Source[ModbusMessage, NotUsed]) =
  source
    .map(a => s"Id:${a.sensorId}:Values:${a.values.mkString(",")}")
    .map(m => new ProducerRecord[String, String]("sampleData", m))
    .runWith(Producer.plainSink(producerSettings))
But is there a way to save my case class directly into Kafka, i.e. if I want to save my data in the form of my case class ModbusMessage?
If someone can provide a quick example, that would be great!
Thanks, help is appreciated!

Kafka's model of a data message, which consists of a message key and a message value, is based on raw bytes (byte[]) for keys and values. So you need to provide a serializer that converts your case class into byte[]. In your example above, you have configured the use of StringSerializer for both keys and values, and the serializer converts String -> byte[].
but is there a way to save my case class directly into Kafka. Like if I want to save my data in form of My case class ModbusMessage.
If you have a case class ModbusMessage, then you need to implement + configure a ModbusMessage -> byte[] serializer. You can implement such a serializer yourself, but as others have commented on your question you can also opt for serialization frameworks such as Avro or Protobuf. You may also want to look at e.g. https://github.com/scala/pickling.
Do I need to convert it into JSON using a library like liftJson and then save it as a String inside Kafka?
No, you don't need to. You can -- and most probably also should -- convert directly from your case class ModbusMessage to byte[] (and, for deserialization, in the opposite direction). There's no point in converting your case class first to JSON, because the JSON representation, too, must be serialized to byte[] so that it can be sent to Kafka (so you'd incur twice the conversion cost here).
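For illustration, here is a minimal sketch of such a ModbusMessage -> byte[] serializer, reusing the setup and the text format from your snippet (ProducerSettings, Producer, ProducerRecord, actorSystem and the ModbusMessage fields come from your question); in practice you would most likely plug in Avro, Protobuf, or a JSON codec here instead:

import java.nio.charset.StandardCharsets
import org.apache.kafka.common.serialization.Serializer

// Illustrative value serializer: encodes a ModbusMessage as UTF-8 text.
// Swap the body for Avro/Protobuf/JSON encoding in a real system.
class ModbusMessageSerializer extends Serializer[ModbusMessage] {
  override def configure(configs: java.util.Map[String, _], isKey: Boolean): Unit = ()
  override def serialize(topic: String, data: ModbusMessage): Array[Byte] =
    s"Id:${data.sensorId}:Values:${data.values.mkString(",")}".getBytes(StandardCharsets.UTF_8)
  override def close(): Unit = ()
}

// The stream can then carry ModbusMessage values end to end:
val modbusProducerSettings =
  ProducerSettings(actorSystem, new StringSerializer, new ModbusMessageSerializer)
    .withBootstrapServers("localhost:9092")

def kafkaSink(source: Source[ModbusMessage, NotUsed]) =
  source
    .map(m => new ProducerRecord[String, ModbusMessage]("sampleData", m))
    .runWith(Producer.plainSink(modbusProducerSettings))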

Related

Efficient JSON-to-JSON transformations with spray-json

I have a scenario similar to this: an Akka HTTP service calls another service and performs some transformations on its JSON response. Let's say it replaces "http" with "https" on every "link" attribute's value.
Right now the implementation is something like:
def route: Route =
  callToAnotherService(request) { eventualJsonResponse =>
    complete(
      eventualJsonResponse.flatMap { (jsonResponse: HttpResponse) =>
        Unmarshal(jsonResponse.entity.withContentType(MediaTypes.`application/json`))
          .to[JsValue]
          .map(replaceHttpInLinks)
          .flatMap(Marshal(_).to[ResponseEntity])
          .map(responseEntity => jsonResponse.copy(entity = responseEntity))
      }
    )
  }
The transformation method has the following signature:
def replaceHttpInLinks(jsValue: JsValue): JsValue = {
  // Recursively find "link" attributes and replace protocol
}
As you can see, the called service's JSON response is unmarshalled into a JsValue object and then this object is used to perform the changes.
That response can be huge, and I'm concerned about both performance and memory consumption.
I was looking for a way of making those changes without unmarshalling the whole JSON document, and hopefully without introducing foreign libraries (Play JSON or others). I was thinking of something event based, along the lines of the old SAX API for XML.
Does anyone have an idea of how to achieve this?
I think that with Spray it is more complicated, because it will try to build the JsValue from the body of the HttpRequest. My suggestion is to use Circe and its HCursor to unmarshal manually. Take a look at some examples here.
You can integrate circe with Akka: https://github.com/hseeberger/akka-http-json
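For illustration, here is a rough sketch of the recursive rewrite written against circe's Json API (a fold over the parsed tree rather than HCursor navigation); it still walks the whole document, and it assumes the "link" attributes hold plain URL strings:

import io.circe.{Json, JsonObject}

// Recursively rewrite every "link" field value from http to https.
def replaceHttpInLinks(json: Json): Json =
  json.arrayOrObject(
    json, // numbers, strings, booleans, null: leave untouched
    arr => Json.fromValues(arr.map(replaceHttpInLinks)),
    obj =>
      Json.fromJsonObject(JsonObject.fromIterable(obj.toIterable.map {
        case ("link", value) =>
          // assumption: "link" values are plain strings containing URLs
          "link" -> value.asString.fold(replaceHttpInLinks(value))(s =>
            Json.fromString(s.replaceFirst("^http:", "https:")))
        case (key, value) =>
          key -> replaceHttpInLinks(value)
      }))
  )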

Kafka Streams: POJO serialization/deserialization

What class/method in Kafka Streams can we use to serialize/deserialize Java object to byte array OR vice versa? The following link proposes the usage of ByteArrayOutputStream & ObjectOutputStream but they are not thread safe.
Send Custom Java Objects to Kafka Topic
There is another option of using ObjectMapper/ObjectReader (for thread safety), but that converts POJO -> JSON -> byte array, which seems like an expensive route. I wanted to check if there is a direct, thread-safe way to translate an object into a byte array and vice versa. Please suggest.
import java.util.Map;

import org.apache.kafka.common.serialization.Serializer;

public class HouseSerializer<T> implements Serializer<T> {

    private Class<T> tClass;

    public HouseSerializer() {
    }

    @SuppressWarnings("unchecked")
    @Override
    public void configure(Map configs, boolean isKey) {
        tClass = (Class<T>) configs.get("POJOClass");
    }

    @Override
    public void close() {
    }

    @Override
    public byte[] serialize(String topic, T data) {
        // Object serialization to be performed here
        return null;
    }
}
Note: Kafka version - 0.10.1
Wanted to check if there is a direct way to translate object into bytearray
I would suggest you look at using Avro serialization with the Confluent Schema Registry, if possible (the registry is helpful, but not required). JSON is a good fallback, but it takes more space "on the wire", so MsgPack would be the alternative there.
See Avro code example here
The above example uses the avro-maven-plugin to generate a LogLine class from the schema file under src/main/resources/avro.
Otherwise, it's up to you how to serialize your object into a byte array. For example, a String is commonly packed as
[(length of string) (UTF-8 encoded bytes)]
while a boolean is a single byte holding 0 or 1.
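As a rough sketch of that hand-rolled layout (a 4-byte length prefix followed by the UTF-8 bytes, written in Scala; the helper names are only illustrative):

import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

// Pack a string as [4-byte length][UTF-8 bytes].
def packString(s: String): Array[Byte] = {
  val utf8 = s.getBytes(StandardCharsets.UTF_8)
  ByteBuffer.allocate(4 + utf8.length).putInt(utf8.length).put(utf8).array()
}

// Reverse the layout above.
def unpackString(bytes: Array[Byte]): String = {
  val buf = ByteBuffer.wrap(bytes)
  val utf8 = new Array[Byte](buf.getInt())
  buf.get(utf8)
  new String(utf8, StandardCharsets.UTF_8)
}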
which is threadsafe
I understand the concern, but you aren't commonly sharing deserialized data between threads. Each message is sent/read/processed independently.

(Scala) Recognize the type of an object serialized by Kryo and sent through the network using UDP

Currently I am having a problem recognizing the type of a received serialized object which is sent through the network using UDP.
I have an abstract class called MsgType:
sealed abstract class MsgType
case class Msg(message : String) extends MsgType
case class End() extends MsgType
Here, Msg represents a normal message whilst End represents a termination request from the client side.
===========================================================================
At server side, I have a function call isMessage to detect whether it's a normal message or the termination request:
def isMessage(message: AnyRef): Boolean =
  message match {
    case End()  => false
    case Msg(_) => true
  }
===========================================================================
Here is the code using Kryo for receiving the message sent from client:
val inputString = kryo.readObject(input, classOf[MsgType])
println("incoming Message: " + isMessage(inputString))
However, when I run the code, there is an exception named:
Exception in thread "main" com.esotericsoftware.kryo.KryoException: Error
constructing instance of class: MsgType
I know it's because MsgType is an abstract class....
Could anyone suggest me a better solution to deal with this problem of recognizing the type of received serialized object?
Thanks and Best Regards,
Long.
Unfortunately Kryo doesn't work this way. It needs to know the type of the object being deserialized before deserialization is started. That means you need to save this information in your serialized data as well. Conveniently, Kryo provides writeClassAndObject and readClassAndObject that do exactly that.
Also I expect that you'll have another issue related to the usage of case classes. Unless you provide custom deserializers, Kryo will fail because your Msg case class doesn't have a default constructor (i.e. one with no parameters). You may consider using twitter chill - a Scala wrapper for Kryo. See also Handling case classes in twitter chill (Scala interface to Kryo)?
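To illustrate both points, here is a minimal sketch that uses chill's ScalaKryoInstantiator (so case classes without default constructors are handled) together with writeClassAndObject/readClassAndObject; the byte-array buffering is simplified and not UDP-specific:

import java.io.ByteArrayOutputStream

import com.esotericsoftware.kryo.io.{Input, Output}
import com.twitter.chill.ScalaKryoInstantiator

// chill configures Kryo for Scala classes (including case classes).
val kryo = new ScalaKryoInstantiator().newKryo()

// Sender side: the class identity is written alongside the object.
val bos = new ByteArrayOutputStream()
val output = new Output(bos)
kryo.writeClassAndObject(output, Msg("hello"))
output.close()

// Receiver side: no concrete class needs to be known up front.
val input = new Input(bos.toByteArray)
val received = kryo.readClassAndObject(input)
input.close()

received match {
  case Msg(text) => println(s"normal message: $text")
  case End()     => println("termination request")
}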

Custom field serializer / deserializer

I am able to load some entities into ElasticSearch with out-of-the-box Spring Data ElasticSearch. The thing is that my model classes contain many properties, and for some of those I don't want my representation (typing) to be reflected into ES.
@Field(serializer = MyCustomSerializer, deserializer = MyCustomDeserializer)
private SomeClass someObject;
I'd like, for example, for SomeClass to be serialized as a String, so I can query it as such. Also, when reading data from ES, I want to be able to write a custom deserializer (MyCustomDeserializer) to convert this String into my own model.
Is there any way I can accomplish that?
Thanks
Spring Data ElasticSearch uses Jackson to serialize the fields, so you can achieve custom serialization logic by defining:
@JsonSerialize(using = MyCustomSerializer.class)
@JsonDeserialize(using = MyCustomDeserializer.class)
private SomeClass someObject;
Or configure your mapping globally in a Jackson ObjectMapper, replacing the default EntityMapper from spring-data-elasticsearch. More on that here.
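For illustration, a rough Scala sketch of what MyCustomSerializer and MyCustomDeserializer could look like; SomeClass and its parse factory method are hypothetical stand-ins for whatever string form you want indexed:

import com.fasterxml.jackson.core.{JsonGenerator, JsonParser}
import com.fasterxml.jackson.databind.{DeserializationContext, JsonDeserializer, JsonSerializer, SerializerProvider}

// Serialize SomeClass as a single string so it can be queried as text in ES.
class MyCustomSerializer extends JsonSerializer[SomeClass] {
  override def serialize(value: SomeClass, gen: JsonGenerator, serializers: SerializerProvider): Unit =
    gen.writeString(value.toString) // assumption: toString is the searchable representation
}

// Rebuild SomeClass from that string when reading documents back.
class MyCustomDeserializer extends JsonDeserializer[SomeClass] {
  override def deserialize(p: JsonParser, ctxt: DeserializationContext): SomeClass =
    SomeClass.parse(p.getValueAsString) // hypothetical factory method on SomeClass
}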

Overcoming changes to persistent message classes in Akka Persistence

Let's say I start out with an Akka Persistence system like this:
case class MyMessage(x: Int)

class MyProcessor extends Processor {
  def receive = {
    case Persistent(m: MyMessage, _) => m.x
    //...
  }
}
And then someday I change it to this:
case class MyMessage(x: Int, y: Int)

class MyProcessor extends Processor {
  def receive = {
    case Persistent(m: MyMessage, _) => m.x + m.y
    //...
  }
}
After I deploy my new system, when the instance of MyProcessor tries to restore its state, the journaled messages will be of the former case class. Because it is expecting the latter type, it will throw an OnReplayFailure, rendering the processor useless. The question is: if we were to assume an absent y can equal 0 (or whatever), is there a best practice to overcome this? For example, perhaps using an implicit to convert from the former message to the latter on recovery?
Akka uses Java serialization by default and says that for a long-term project we should use a proper alternative. This is because Java serialization is very difficult to evolve over time. Akka recommends using either Google Protocol Buffers, Apache Thrift or Apache Avro.
With Google Protocol Buffers, for example, in your case you'd be writing something like:
if (p.hasY) p.getY else 0
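As a minimal sketch, assuming a protoc-generated class (hypothetically named MyMessageProto here) whose schema has x as an existing field and y as a later-added optional field:

// Convert the protobuf representation back into the domain case class.
def fromProto(p: MyMessageProto): MyMessage =
  MyMessage(
    x = p.getX,
    y = if (p.hasY) p.getY else 0 // events persisted before y existed fall back to 0
  )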
Akka explains all that in a nice article (admittedly it's not very Google-able):
http://doc.akka.io/docs/akka/current/scala/persistence-schema-evolution.html
and even explains your particular use case of adding a new field to an existing message type:
http://doc.akka.io/docs/akka/current/scala/persistence-schema-evolution.html#Add_fields
A blog post the Akka documentation recommends for a comparison of the different serialization toolkits:
http://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html