Kafka Streams: POJO serialization/deserialization

What class/method in Kafka Streams can we use to serialize/deserialize a Java object to a byte array, or vice versa? The following link proposes using ByteArrayOutputStream and ObjectOutputStream, but those are not thread safe.
Send Custom Java Objects to Kafka Topic
There is another option, ObjectMapper/ObjectReader (which are thread safe), but that converts POJO -> JSON -> byte array, which seems like a roundabout route. I wanted to check whether there is a direct, thread-safe way to translate an object into a byte array and vice versa. Please suggest.
import org.apache.kafka.common.serialization.Serializer;

import java.util.Map;

public class HouseSerializer<T> implements Serializer<T> {

    private Class<T> tClass;

    public HouseSerializer() {
    }

    @SuppressWarnings("unchecked")
    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        tClass = (Class<T>) configs.get("POJOClass");
    }

    @Override
    public void close() {
    }

    @Override
    public byte[] serialize(String topic, T data) {
        // Object serialization to be performed here
        return null;
    }
}
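For illustration only, here is a minimal sketch of one way the serialize body could be completed with plain Java serialization (the class name JavaPojoSerializer is made up for the example). Because the streams are created per call, the thread-safety concern from the question does not apply; the trade-off is that the POJO must implement java.io.Serializable:
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Map;

import org.apache.kafka.common.errors.SerializationException;
import org.apache.kafka.common.serialization.Serializer;

public class JavaPojoSerializer<T extends Serializable> implements Serializer<T> {

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        // nothing to configure
    }

    @Override
    public byte[] serialize(String topic, T data) {
        if (data == null) {
            return null;
        }
        // The streams are local to this call, so the serializer has no shared
        // mutable state and can safely be used from multiple threads.
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
             ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(data);
            oos.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new SerializationException("Error serializing value for topic " + topic, e);
        }
    }

    @Override
    public void close() {
    }
}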
Note: Kafka version - 0.10.1

Wanted to check if there is a direct way to translate object into bytearray
I would suggest you look at using Avro serialization with the Confluent Schema Registry, if possible, but it is not required. JSON is a good fallback, but it takes more space "on the wire", so MsgPack would be an alternative there.
See Avro code example here
The example above uses the avro-maven-plugin to generate a LogLine class from the schema file under src/main/resources/avro.
Otherwise, it's up to you how to serialize your object into a byte array. For example, a String is commonly packed as
[(length of string) (UTF-8 encoded bytes)]
while a boolean is a single byte holding 0 or 1.
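As a rough sketch of that manual layout (not taken from the linked example; the field names are made up), packing a String plus a boolean with a length prefix might look like this:
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public final class FieldPacking {

    // Packs a String as [4-byte length][UTF-8 bytes] and a boolean as a single byte.
    static byte[] pack(String name, boolean active) {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.allocate(4 + nameBytes.length + 1);
        buffer.putInt(nameBytes.length);
        buffer.put(nameBytes);
        buffer.put((byte) (active ? 1 : 0));
        return buffer.array();
    }

    // Reads the same layout back.
    static void unpack(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        byte[] nameBytes = new byte[buffer.getInt()];
        buffer.get(nameBytes);
        String name = new String(nameBytes, StandardCharsets.UTF_8);
        boolean active = buffer.get() == 1;
        System.out.println(name + " " + active);
    }
}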
which is threadsafe
I understand the concern, but you aren't usually sharing deserialized data between threads; each message is sent, read, and processed independently.

Related

How to convert the Core message to a JMS message?

I need to convert org.apache.activemq.artemis.core.message.impl.CoreMessage to javax.jms.Message. How can I do this? Maybe there is a utility method somewhere in the code, or does it need to be done manually?
I want to intercept the following events:
afterSend
afterDeliver
messageExpired
And then send the message to a direct endpoint Camel route which requires a javax.jms.Message instance.
My recommendation would be to simply copy the message and route the copy to the address of your choice, e.g.:
public class MyPlugin implements ActiveMQServerMessagePlugin {

    ActiveMQServer server;

    @Override
    public void registered(ActiveMQServer server) {
        this.server = server;
    }

    @Override
    public void afterSend(ServerSession session,
                          Transaction tx,
                          Message message,
                          boolean direct,
                          boolean noAutoCreateQueue,
                          RoutingStatus result) throws ActiveMQException {
        Message copy = message.copy();
        copy.setAddress("foo");
        try {
            server.getPostOffice().route(copy, false);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Then a Camel consumer can pick up the message and do whatever it needs to with it (a sketch of such a route follows the list below). This approach has a few advantages:
It's simple. It would technically be possible to convert the org.apache.activemq.artemis.api.core.Message instance into a javax.jms.Message instance, but it's not going to be straightforward. javax.jms.Message is a JMS client class. It's not used on the server anywhere, so there is no existing facility to do any kind of conversion to/from it.
It's fast. If you use a javax.jms.Message you'd also have to use a JMS client to send it and that would mean creating and managing JMS resources like a javax.jms.Connection and a javax.jms.Session. This is not really something you want to be doing in a broker plugin as it will add a fair amount of latency. The method shown here uses the broker's own internal API to deal with the message. No client resources are necessary.
It's asynchronous. By sending the message and letting Camel pick it up later you don't have to wait on Camel at all which reduces the latency added by the plugin.
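As a rough sketch of the consuming side (the address "foo" comes from the plugin above; the "jms" component configuration and the direct endpoint name are placeholders, not something prescribed by Artemis or Camel), a route could look like this:
import org.apache.camel.builder.RouteBuilder;

public class CopiedMessageRoute extends RouteBuilder {

    @Override
    public void configure() {
        // Consume the copies that the plugin routed to the "foo" address
        // and hand them to the existing direct endpoint as javax.jms.Message instances.
        from("jms:queue:foo")
            .to("direct:handleCopiedMessage");
    }
}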
org.apache.activemq.artemis.jms.client.ActiveMQMessage
This looks like the implementation of javax.jms.Message with an underlying org.apache.activemq.artemis.api.core.client.ClientMessage which extends CoreMessage

Sleuth tracing is not working for transactional Kafka producers

Currently, we are using transactional Kafka producers. What we have noticed is that the Kafka tracing is missing, which means we don't see the instrumentation of the Kafka producers and the b3 headers are never added.
After going through the code, we found that the post processors are not invoked for transactional producers, which means the TracingProducer is never created by the TraceProducerPostProcessor. Is there a reason for that? Also, what is the workaround for enabling tracing for transactional producers? There does not seem to be a single place where a tracing producer could easily be created (DefaultKafkaProducerFactory#doCreateTxProducer is private).
Screenshot attached (DefaultKafkaProducerFactory class). In the screenshot you can see the post processors are invoked only for the raw producer, not for the transactional producer.
Your help will be much appreciated.
Thanks
DefaultKafkaProducerFactory#createRawProducer
??
createRawProducer() is called for both transactional and non-transactional producers; something else is going on.
EDIT
The problem is that Sleuth replaces the producer with a different one, but the factory discards that and uses the original.
https://github.com/spring-projects/spring-kafka/issues/1778
EDIT2
Actually, it's a good thing that we discard the tracing producer here; Sleuth also wraps the factory in a proxy and wraps the CloseSafeProducer in a TracingProducer; but I see the same result with both transactional and non-transactional producers...
import org.apache.kafka.clients.producer.Producer;
import org.springframework.boot.ApplicationRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.kafka.core.ProducerFactory;

@SpringBootApplication
public class So67194702Application {

    public static void main(String[] args) {
        SpringApplication.run(So67194702Application.class, args);
    }

    @Bean
    public ApplicationRunner runner(ProducerFactory<String, String> pf) {
        return args -> {
            Producer<String, String> prod = pf.createProducer();
            prod.close();
        };
    }
}
Putting a breakpoint on the close()...
Thanks Gary Russell for the very quick response. createRawProducer is indeed called for both transactional and non-transactional producers.
Sleuth uses the TraceProducerPostProcessor to wrap a Kafka producer in a TracingProducer. Since the ProducerPostProcessor interface extends the Function interface, one would expect the result of the function to be used, but the createRawProducer method of DefaultKafkaProducerFactory applies the post processors without using the return value, which causes the issue in this specific case.
So, couldn't the implementation of createRawProducer be modified to assign the result of the post processor? If not, wouldn't it be better to have post processors extend Consumer instead of Function?
A successful test was made by overriding the createRawProducer method as follows:
@Override
protected Producer<K, V> createRawProducer(Map<String, Object> rawConfigs) {
    Producer<K, V> kafkaProducer = new KafkaProducer<>(rawConfigs, getKeySerializerSupplier().get(), getValueSerializerSupplier().get());
    for (ProducerPostProcessor<K, V> pp : getPostProcessors()) {
        kafkaProducer = pp.apply(kafkaProducer);
    }
    return kafkaProducer;
}
Thank you for your help.

Java Object serialization in scala

Pardon me, as I am new to Scala.
I have created a case class which encapsulates some information. One of the fields I want it to hold is an instance of a Java class. As I am using it in Spark, I need it to be serializable. How can I do that?
Java class
public class Currency {
    public Currency(final BigDecimal amount, final CurrencyUnit unit) {
        // Doing Something
    }
}
case class ReconEntity(inputCurrency : Currency, outputCurrency : Currency)
Using an implicit, I want to provide serialization code for Currency so that Spark can work with ReconEntity.
Firstly, have you tried some RDD operations using your Currency and ReconEntity classes? Do you actually get an error? Spark is able to handle RDD operations with apparently non-serializable Scala classes as values, at least (you can try this in the spark-shell, though possibly this might require the Kryo serializer to be enabled).
Since you state that you don't own the Currency class, you can't add extends Serializable, which would be the simplest solution.
Another approach is to wrap the class with a serializable wrapper, as described in this article: Beating Serialization in Spark - example code copied here for convenience:
For simple classes, it is easiest to make a wrapper interface that extends Serializable. This means that even though UnserializableObject cannot be serialized we can pass in the following object without any issue
public interface UnserializableWrapper extends Serializable {
    public UnserializableObject create(String prm1, String prm2);
}
The object can then be passed into an RDD or Map function using the
following approach
UnserializableWrapper usw = new UnserializableWrapper() {
    public UnserializableObject create(String prm1, String prm2) {
        return new UnserializableObject(prm1, prm2);
    }
};
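Applied to the Currency class from the question, the same pattern might look like the sketch below (CurrencyFactory is a made-up name, and the CurrencyUnit import depends on whichever library provides it); only the small factory object has to be serializable, while the Currency itself is created inside the executor:
import java.io.Serializable;
import java.math.BigDecimal;

// Hypothetical serializable factory for the non-serializable Currency class.
public interface CurrencyFactory extends Serializable {
    Currency create(BigDecimal amount, CurrencyUnit unit);
}

// Captured by the Spark closure instead of a Currency instance.
CurrencyFactory factory = new CurrencyFactory() {
    public Currency create(BigDecimal amount, CurrencyUnit unit) {
        return new Currency(amount, unit);
    }
};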
If the class is merely a data structure, without significant methods, then it might be easier to unpack its fields into your RDD types (in your case, ReconEntity) and discard the class itself.
If the class has methods that you need, then your other (ugly) option is to cut-and-paste code into a new serializable class or into helper functions in your Spark code.

How to save a Akka Stream of Case class to Kafka directly?

I am able to save my data to Kafka in the form of a String like this:
val producerSettings = ProducerSettings(actorSystem, new StringSerializer, new StringSerializer)
  .withBootstrapServers("localhost:9092")

def kafkaSink(source: Source[ModbusMessage, NotUsed]) =
  source
    .map(a => s"Id:${a.sensorId}:Values:${a.values.mkString(",")}")
    .map(m => new ProducerRecord[String, String]("sampleData", m))
    .runWith(Producer.plainSink(producerSettings))
But is there a way to save my case class directly into Kafka, i.e. to save my data in the form of my case class ModbusMessage?
If someone can provide a quick example that would be great!
Thanks. Help is appreciated!
Kafka's model of a data message, which consists of a message key and a message value, is based on raw bytes (byte[]) for keys and values. So you need to provide a serializer that converts your case class into byte[]. In your example above, you have configured the use of StringSerializer for both keys and values, and the serializer converts String -> byte[].
But is there a way to save my case class directly into Kafka, i.e. to save my data in the form of my case class ModbusMessage?
If you have a case class ModbusMessage, then you need to implement + configure a ModbusMessage -> byte[] serializer. You can implement such a serializer yourself, but as others have commented on your question you can also opt for serialization frameworks such as Avro or Protobuf. You may also want to look at e.g. https://github.com/scala/pickling.
Do I need to convert it into Json using library like liftJson and then save it as a String inside Kafka ?
No, you don't need to. You can -- and most probably also should -- convert directly from your case class ModbusMessage to byte[] (and, for deserialization, in the opposite direction). There's no point in converting your case class first to JSON, because the JSON representation, too, must be serialized to byte[] so that it can be sent to Kafka (so you'd incur twice the conversion cost here).
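To make that shape concrete, here is a rough sketch of a ModbusMessage -> byte[] serializer written against Kafka's Java Serializer interface; the JSON mapping via Jackson is only one option and assumes the mapper can handle the Scala case class (for example via jackson-module-scala):
import java.util.Map;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.common.errors.SerializationException;
import org.apache.kafka.common.serialization.Serializer;

public class ModbusMessageSerializer implements Serializer<ModbusMessage> {

    // Assumed to be configured so it can map the case class, e.g. by
    // registering jackson-module-scala's DefaultScalaModule.
    private final ObjectMapper mapper = new ObjectMapper();

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
    }

    @Override
    public byte[] serialize(String topic, ModbusMessage data) {
        if (data == null) {
            return null;
        }
        try {
            return mapper.writeValueAsBytes(data);
        } catch (Exception e) {
            throw new SerializationException("Error serializing ModbusMessage", e);
        }
    }

    @Override
    public void close() {
    }
}
An instance of such a serializer would then be passed to ProducerSettings in place of the second StringSerializer, so the sink can accept ModbusMessage elements directly.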

Why are static GWT fields not transferred to the client?

ConfigProperty.idPropertyMap is filled on the server side. (verified via log output)
Accessing it on the client side shows it's empty. :-( (verified via log output)
Is this some default behaviour? (I don't think so)
Is the problem maybe related to the inner class ConfigProperty.IdPropertyMap, java.util.HashMap usage, serialization or some field access modifier issue?
Thanks for your help
// the transfer object
public class ConfigProperty implements IsSerializable, Comparable {
    ...

    static public class IdPropertyMap extends HashMap implements IsSerializable {
        ...
    }

    protected static IdPropertyMap idPropertyMap = new IdPropertyMap();
    ...
}

// the server service
public class ManagerServiceImpl extends RemoteServiceServlet implements ManagerService {
    ...

    public IdPropertyMap getConfigProps(String timeToken) throws ConfiguratorException {
        ...
    }
}
Added from below after some good answers (thanks!):
Answer bottom line: static field sync is not implemented/supported currently; someone (or I) would have to file a feature request.
Just my perspective (a newbie who has fallen in love with GWT :-)):
I understand pretty well (not perfectly! ;-)) the possible implications of "global" variable syncing (a dependency graph or the use of annotations could be useful).
But from a new (otherwise experienced Java EE/web) user it looks like this:
you create some myapp.shared.dto.MyClass class (dto = data transfer objects)
you add some static fields in it that just represent collections of those objects (and maybe some other DTOs)
you can also do this on the client side and all the other static methods work as well
only thing not working is synchronization (which is not sooo bad in the first place)
BUT: some provided annotation, let's say @Transfer static Collection<MyClass> myObjList;, would be handy, since I seem to know the impact and benefits that this would bring.
In my case it's rather simple since the client is mostly static, but I would like to have this data without explicitly implementing it, if the GWT framework could do it.
Static variables are purely class variables; they have nothing to do with individual instances, and serialization applies only to objects.
So you will always get an empty ConfigProperty.idPropertyMap.
The idea of RPC is not that you can act as though the client and the server are exactly the same JVM, but that they can share the objects that you pass over the wire. To send a static field over the wire, from the server to the client, the object stored in that field must be returned from the RPC method.
Static properties are not serialized and sent over the wire, because they do not belong to a single object, but to the class itself.
public class MyData implements Serializable {
    protected String name; // sent over the wire, each MyData has its own name
    protected String key;
    protected static String masterKey; // all objects on the server or client share this,
                                       // it cannot be sent over RPC. Instead, another
                                       // RPC method could access it
}
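For instance, a rough sketch of exposing such a static value explicitly through the service (MyDataService and the method name are made up for the example):
public class MyDataServiceImpl extends RemoteServiceServlet implements MyDataService {

    // Return the static value like any other RPC result; the client receives
    // a copy of the current value, not a live reference to the static field.
    public String getMasterKey() {
        return MyData.masterKey;
    }
}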
Note, however, that it will only be that one instance which will be shared - if something else on the server changes that field, all clients which have asked for a copy will need to be updated