Kafka Streams Spring Cloud Stream join selectKey - apache-kafka

Could you please help me configure a Spring Cloud Stream app based on Kafka? I'm facing an issue with the selectKey operation.
Let me explain what I'm trying to achieve:
2 incoming topics: Person, RefGenre
Person contains the key of RefGenre (in its value).
public class Person {
    String nom;
    String prenom;
    String codeGenre; // <<--- here is the key of the second topic, RefGenre
}
So I'm using the selectKey operator to prepare my stream before the join operation.
A new topic is created by selectKey (my-app-KSTREAM-KEY-SELECT-0000000004-repartition), and then a serialization issue occurs:
Exception in thread "my-app-3c57b31c-28e5-4199-b07d-87f8940425ab-StreamThread-1" org.apache.kafka.streams.errors.StreamsException: ClassCastException while producing data to topic my-app-KSTREAM-KEY-SELECT-0000000004-repartition. A serializer (key: org.apache.kafka.common.serialization.StringSerializer / value: statefull.serde.PersonWithGenreSerde) is not compatible to the actual key or value type (key type: java.lang.String / value type: statefull.model.Person). Change the default Serdes in StreamConfig or provide correct Serdes via method parameters (for example if using the DSL, #to(String topic, Produced<K, V> produced) with Produced.keySerde(WindowedSerdes.timeWindowedSerdeFrom(String.class))).
Where can I specify the Serde for this repartition topic, and can I specify the name of this "internal" topic?
@Bean
public BiFunction<KStream<String, Person>, KTable<String, ReferentielGenre>, KStream<Long, PersonWithGenre>> joinKtable() {
    return (persons, referentielGenres) ->
        persons.selectKey((k, v) -> v.getCodeGenre())
               .join(referentielGenres,
                     (person, genre) -> new PersonWithGenre(person.getNom(), person.getPrenom(), genre),
                     Joined.with(Serdes.String(), new PersonWithGenreSerde(), null));
}
Here is the full code of my non-working job: https://github.com/YohanAlard/joinkstream
Is there a better way to handle this use case?
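Not part of the original question, but a hedged reading of the error: the second argument of Joined.with describes the left stream's values, which after selectKey are still of type Person, not PersonWithGenre, so the repartition producer fails. A minimal sketch of a possible fix, assuming a PersonSerde exists alongside PersonWithGenreSerde (and noting that after selectKey the keys are the String codeGenre, hence the String key type below):
@Bean
public BiFunction<KStream<String, Person>, KTable<String, ReferentielGenre>, KStream<String, PersonWithGenre>> joinKtable() {
    return (persons, referentielGenres) ->
        persons.selectKey((k, v) -> v.getCodeGenre())
               .join(referentielGenres,
                     (person, genre) -> new PersonWithGenre(person.getNom(), person.getPrenom(), genre),
                     // key serde, LEFT value serde (a Person serde, assumed to exist),
                     // right value serde (null = default), and, in recent Kafka versions,
                     // a base name from which Streams derives the repartition topic name,
                     // e.g. <application-id>-person-genre-join-repartition
                     Joined.with(Serdes.String(), new PersonSerde(), null, "person-genre-join"));
}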

Related

Kafka Streams getting into error state constantly and application is unable to start

I have been working with Kafka Streams for quite some time. I am stuck on a problem and haven't been able to solve it; I am hoping for some help from this platform.
The scenario is something like this: there are many customers, and many employees working for those customers. A single topic stores all this data. I want to stream that topic and build a materialized store in which all the employees under the same customer are grouped. I am using open-source ListSerializer and ListDeserializer classes for this work.
The topic layout is something like this, where c represents customer and e represents employee:
key value
c1-e1 e1
c1-e2 e2
c1-e3 e3
c2-e1 e4
c2-e2 e5
c2-e3 e6
I want this
c1 - [e1, e2, e3]
c2 - [e4, e5, e6]
I have used an ArrayListtSerializer class which implements Serializer<ArrayList<T>>, where T is the Employee object, and an ArrayListDeserializer which implements Deserializer<ArrayList<T>>.
I have written the code below to achieve this, and it works fine. It reads from the topic and streams the data into a store with customer-id as the key and the list of that customer's employees as the value:
KStream<String, Employee> source = builder.stream(topicName, Consumed.with(Serdes.String(), getSerdeForObject(Employee.class)));
Serde<Employee> employeeSerde = getSerdeForObject(Employee.class);
Serde<ArrayList<Employee>> employeeArrayListSerde = Serdes.serdeFrom(new ArrayListtSerializer<>(employeeSerde.serializer()), new ArrayListDeserializer<>(employeeSerde.deserializer()));
source.groupBy((k, v) -> v.getCustomer())
      .aggregate(() -> new ArrayList<>(), (key, value, aggregate) -> {
          aggregate.add(value);
          return aggregate;
      }, Materialized.<String, ArrayList<Employee>, KeyValueStore<Bytes, byte[]>>as("NewStore")
              .withValueSerde(employeeArrayListSerde));
Now I am working on making the list serializer and deserializer classes generic, such that ArrayListtSerializer becomes ListtSerializer, implementing Serializer<List<T>> instead of Serializer<ArrayList<T>>, and likewise for the deserializer. I have transformed both ArrayListtSerializer and ArrayListDeserializer into ListtSerializer and ListDeserializer and changed my code to this:
KStream<String, Employee> source = builder.stream(topicName, Consumed.with(Serdes.String(), getSerdeForObject(Employee.class)));
Serde<Employee> employeeSerde = getSerdeForObject(Employee.class);
Serde<List<Employee>> employeeListSerde = Serdes.serdeFrom(new ListtSerializer<>(employeeSerde.serializer()), new ListDeserializer<>(employeeSerde.deserializer()));
source.groupBy((k, v) -> v.getCustomer())
      .aggregate(() -> new ArrayList<>(), (key, value, aggregate) -> {
          aggregate.add(value);
          return aggregate;
      }, Materialized.<String, List<Employee>, KeyValueStore<Bytes, byte[]>>as("NewStore")
              .withValueSerde(employeeListSerde));
After changing my code, the stream constantly gets into an error state and the application fails to start. I am getting the error below.
Caused by: org.apache.kafka.streams.errors.StreamsException: A serializer (key: org.apache.kafka.common.serialization.StringSerializer / value: org.apache.kafka.common.serialization.ByteArraySerializer) is not compatible to the actual key or value type (key type: java.lang.String / value type: Employee). Change the default Serdes in StreamConfig or provide correct Serdes via method parameters.
at org.apache.kafka.streams.processor.internals.SinkNode.process(SinkNode.java:94)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:143)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:126)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:90)
at org.apache.kafka.streams.kstream.internals.KStreamFilter$KStreamFilterProcessor.process(KStreamFilter.java:43)
at org.apache.kafka.streams.processor.internals.ProcessorNode$1.run(ProcessorNode.java:50)
at org.apache.kafka.streams.processor.internals.ProcessorNode.runAndMeasureLatency(ProcessorNode.java:244)
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:133)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:143)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:126)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:90)
at org.apache.kafka.streams.kstream.internals.KStreamMap$KStreamMapProcessor.process(KStreamMap.java:42)
at org.apache.kafka.streams.processor.internals.ProcessorNode$1.run(ProcessorNode.java:50)
at org.apache.kafka.streams.processor.internals.ProcessorNode.runAndMeasureLatency(ProcessorNode.java:244)
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:133)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:143)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:126)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:90)
at org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:87)
at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:302)
... 6 more
Caused by: java.lang.ClassCastException: class Employee cannot be cast to class [B (Employee is in unnamed module of loader 'app'; [B is in module java.base of loader 'bootstrap')
at org.apache.kafka.common.serialization.ByteArraySerializer.serialize(ByteArraySerializer.java:19)
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:157)
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:101)
at org.apache.kafka.streams.processor.internals.SinkNode.process(SinkNode.java:89)
... 25 more
I have spent a good amount of time trying to understand the issue but I am still unable to figure it out. Can someone please help me solve it? Any help will be appreciated.
Caused by: org.apache.kafka.streams.errors.StreamsException: A serializer
key: org.apache.kafka.common.serialization.StringSerializer
value: org.apache.kafka.common.serialization.ByteArraySerializer
is not compatible to the actual key or value type
key type: java.lang.String
value type: Employee
Change the default Serdes in StreamConfig or provide
correct Serdes via method parameters.
What the error says is that you are trying to use the ByteArraySerializer to serialize the Employee class.
The ByteArraySerializer simply takes a byte[] and sends it (there is nothing to serialize here).
You therefore need either to convert the Employee to a byte[] yourself, or to use an appropriate serializer for your Employee class, i.e. something that takes an Employee object and converts it into a byte[].
You can serialize/deserialize your Employee object to JSON or Avro (to name a few options).
For example, you can write a generic JSON serializer and deserializer for all of your custom objects.
See JsonSerializer and JsonDeserializer in Spring (even if you are not using Spring, you can take a look at the code), and then create a Serde out of the serializer and deserializer.
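As a hedged sketch of that last step (not from the original answer), here is what building such a Serde from spring-kafka's JsonSerializer/JsonDeserializer and plugging it into the snippet above might look like; note that groupBy writes through a repartition topic, which is where the default ByteArray serde from the error gets picked up if nothing is supplied:
// Build an Employee serde from spring-kafka's generic JSON (de)serializer.
Serde<Employee> employeeJsonSerde = Serdes.serdeFrom(
        new JsonSerializer<>(),                  // Employee -> byte[] via Jackson
        new JsonDeserializer<>(Employee.class)); // byte[] -> Employee

// groupBy repartitions the stream, so pass the serdes explicitly here too;
// on Kafka < 2.1 use Serialized.with(...) instead of Grouped.with(...).
source.groupBy((k, v) -> v.getCustomer(), Grouped.with(Serdes.String(), employeeJsonSerde))
      .aggregate(ArrayList::new, (key, value, aggregate) -> {
          aggregate.add(value);
          return aggregate;
      }, Materialized.<String, List<Employee>, KeyValueStore<Bytes, byte[]>>as("NewStore")
              .withValueSerde(employeeListSerde));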

Is there any way to publish messages to a particular partition in Kafka without using the message key?

I have millions of records, each with a unique identifier.
All the records are categorized by series number; let's say 10k records belong to series-1, another 10k to series-2, and so on.
Now I want to publish all series-1 records to partition-1, all series-2 records to partition-2, and so on.
To achieve this I don't want to use the message key; is there any other alternative?
I am new to Kafka, so please correct me if the question is wrong or lacks detail.
You can use the methods below to publish a message to a specific partition.
Simple Kafka Producer
/**
 * Creates a record to be sent to a specified topic and partition.
 */
public ProducerRecord(String topic, Integer partition, K key, V value) {
    this(topic, partition, null, key, value, null);
}
A basic example of publishing a message to a specific partition:
Properties properties = new Properties();
properties.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, <bootstrap server detail>);
properties.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
properties.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
// Determine the partition based on your condition: series-1 -> partition 1, series-2 -> partition 2, ...
int partition = getPartitionOnCondition.....
String topic = ""..
Producer<String, String> producer = new KafkaProducer<String, String>(properties);
ProducerRecord<String, String> record = new ProducerRecord<String, String>(topic, partition, key, value);
producer.send(record);
Custom Partitioner
You can also use a custom partitioner for the producer, or a stream partitioner:
https://kafka.apache.org/documentation.html
Custom Stream Partitioner (in case you are using Kafka Streams)
If you are using Kafka Streams, it also provides a way to plug in a custom partitioner, as sketched below:
https://kafka.apache.org/23/javadoc/org/apache/kafka/streams/processor/StreamPartitioner.html
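Not from the original answer, but a minimal sketch of wiring a StreamPartitioner in; the output topic name and the extractSeriesNumber helper are made up for illustration:
// Route each record by a field of the value instead of by the key.
StreamPartitioner<String, String> bySeries =
        (topic, key, value, numPartitions) -> extractSeriesNumber(value) % numPartitions;

stream.to("output-topic", Produced.streamPartitioner(bySeries));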
It's better to create a custom partitioner class for your producer application.
For every record you publish with the producer application, the custom partitioner's partition() method will be invoked with the message's key and value. There you can write your logic to parse the field that determines the partition number to which the message should be written.
Create a custom partitioner class similar to the one below:
public class CustomPartitioner extends DefaultPartitioner {
    @Override
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        Integer partitionNumber = null;
        if (valueBytes != null) {
            String valueStr = new String(valueBytes);
            /* code to extract the partition number from the value */
            /* assign partitionNumber based on the value */
        }
        // partitionNumber must be non-null by this point, or unboxing will throw;
        // consider falling back to super.partition(...) when it cannot be determined
        return partitionNumber;
    }
}
Assign the partitioner in your producer configuration and start publishing messages:
props.put("partitioner.class", "com.example.CustomPartitioner");

Receiving the Kafka key in a Spring Boot Kafka listener

I am new to Spring Kafka. I have a microservice which sends messages with a Kafka key that is a user-defined object.
1) The first microservice sends a message to Kafka with a key which is an instance of the MyKey object.
2) What I need to do is listen to that topic, get the message along with its key, and create a new key from that key.
Let's say the message was sent with the key myKey. What I want to do in the listener is create a new extended key, like this:
@KafkaListener(groupId = Bindings.CONSUMER_GROUP_DATA_CLEANUP, topics = "users")
public void process(@Payload MyMessage myMessage) {
    MyExtendedKey myExtendedKey = new MyExtendedKey(myKey.getX(), myKey.getY());
    ....
    ....
    kafkaTemplate.send(TOPIC, myExtendedKey, message);
}
I do not know how I can get the key of the incoming message inside the listener.
Please read the documentation.
...
Finally, metadata about the message is available from message headers. You can use the following header names to retrieve the headers of the message:
KafkaHeaders.RECEIVED_MESSAGE_KEY
KafkaHeaders.RECEIVED_TOPIC
KafkaHeaders.RECEIVED_PARTITION_ID
KafkaHeaders.RECEIVED_TIMESTAMP
KafkaHeaders.TIMESTAMP_TYPE
The following example shows how to use the headers:
@KafkaListener(id = "qux", topicPattern = "myTopic1")
public void listen(@Payload String foo,
        @Header(KafkaHeaders.RECEIVED_MESSAGE_KEY) Integer key,
        @Header(KafkaHeaders.RECEIVED_PARTITION_ID) int partition,
        @Header(KafkaHeaders.RECEIVED_TOPIC) String topic,
        @Header(KafkaHeaders.RECEIVED_TIMESTAMP) long ts
        ) {
    ...
}
The offset is also available (via the KafkaHeaders.OFFSET header).
The easiest way to get the key, value, and metadata of a message with @KafkaListener is to take a ConsumerRecord in your listener method instead of receiving only the payload:
@KafkaListener(topics = "any-topic")
void listener(ConsumerRecord<String, String> record) {
    System.out.println(record.key());
    System.out.println(record.value());
    System.out.println(record.partition());
    System.out.println(record.topic());
    System.out.println(record.offset());
}
It does not have beautiful annotations, but it works. Also, if you want to receive records from a Kafka topic, process them, and send them on to another Kafka topic, I would recommend you take a look at the Kafka Streams API.

Can I use Kafka Streams to read and write messages of different types?

I'm writing an app that uses Kafka Streams. It reads from topic A, makes some transformations, and writes to topic B. During the transformations the values are grouped by key, so the output key and value types are different from the input types.
Kafka Streams uses a Serde of a specific type (e.g. the String serde serializes and deserializes strings) for both serialization and deserialization, so it won't work after the data is transformed. How can I define different serializers and deserializers within the Streams API?
Sure, you can.
When you create a stream, invoke groupBy, or write output to some topic, you can provide a Serde (wrapped in Consumed, Produced, or Serialized). Example:
Serde<String> stringSerde = Serdes.String();
Consumed<String, String> consumed = Consumed.with(stringSerde, stringSerde);
Produced<String, YourCustomItem> produced = Produced.with(stringSerde, new JsonSerde<>(YourCustomItem.class));
KStream<String, String> kStream = streamsBuilder.stream("sourceTopicName", consumed);
KStream<String, YourCustomItem> transformedKStream = kStream.mapValues((key, value) -> new YourCustomItem());
transformedKStream.to("destinationTopicName", produced);
transformedKStream.groupByKey(Serialized.with(Serdes.String(), new JsonSerde<>(YourCustomItem.class)));
where JsonSerde comes from the spring-kafka dependency.
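Note (not in the original answer): on Kafka 2.1+, Serialized is deprecated in favour of Grouped, so the groupByKey call above would become:
transformedKStream.groupByKey(Grouped.with(Serdes.String(), new JsonSerde<>(YourCustomItem.class)));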
Or you could use the following JsonNode-based Serde:
Serializer<JsonNode> jsonSerializer = new JsonSerializer();
Deserializer<JsonNode> jsonDeserializer = new JsonDeserializer();
Serde<JsonNode> jsonSerde = Serdes.serdeFrom(jsonSerializer, jsonDeserializer);

Apache Kafka: check the existence of a message in a topic

I have a situation where I need to check whether a particular message already exists in a topic; I need absolutely no duplicates in the topic.
Can anyone suggest a graceful way of doing this, rather than consuming all the messages and checking against them?
I do not consider myself an expert in Kafka, but I think what you are trying to do goes against the essence of Kafka.
However, I came up with a solution using the Kafka Streams library for Java. Basically, the process is the following:
Map every message into a new key-value pair, where the key is a combination of the previous key and its value: (key1, message1) -> (key1-message1, message1)
Group the messages using those keys; as a result of this operation you obtain a KGroupedStream.
Apply a reduce function, changing the value to some custom value such as the string "Duplicated message".
Convert the KTable resulting from the reduce into a KStream and push it to a new Kafka topic.
There are many assumptions in the previous explanation, so I am going to provide some code in order to shed some light:
KStreamBuilder builder = new KStreamBuilder();
KStream<String, String> resources = builder.stream("topic-where-the-messages-are-sent");
KeyValueMapper<String, String, KeyValue<String, String>> kvMapper = new KeyValueMapper<String, String, KeyValue<String, String>>() {
    public KeyValue<String, String> apply(String key, String value) {
        return new KeyValue<String, String>(key + "-" + value, value);
    }
};
Reducer<String> reducer = new Reducer<String>() {
    public String apply(String value1, String value2) {
        return "Duplicated message";
    }
};
resources.map(kvMapper)
         .groupByKey()
         .reduce(reducer, "test-store-name")
         .toStream()
         .to("unique-message-output");
KafkaStreams streams = new KafkaStreams(builder, props);
streams.start();
Bear in mind that this is probably not an optimal solution, and you may not consider it a "graceful" way of solving your problem.
I hope it helps.