How to do message filtering at the Kafka record level when using Spring Cloud Stream?

I am using Spring Cloud Stream (SCS) with Kafka as a binder.
I would like to do low-level filtering on records based on the Kafka header. What would be the recommended approach?
The filtered message should be ignored and the offset should be committed.
I was thinking about configuring a RecordFilterStrategy.

RecordFilterStrategy is not supported by Spring Cloud Stream.
You can add a ListenerContainerCustomizer bean and use it to add a RecordInterceptor to the listener container. If the interceptor returns null, the listener is not called and the offset is still committed, as if the listener had been called and exited normally.
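A minimal sketch of that approach, assuming Spring Kafka 2.x (where RecordInterceptor's functional method takes just the record) and a hypothetical "skip" header marking records to drop; the byte[] generics match the binder's default raw payloads:

import org.springframework.cloud.stream.config.ListenerContainerCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.listener.AbstractMessageListenerContainer;

@Configuration
public class RecordFilterConfig {

    // Customizes the listener container that the Kafka binder creates for each binding.
    @Bean
    public ListenerContainerCustomizer<AbstractMessageListenerContainer<byte[], byte[]>> customizer() {
        return (container, destinationName, group) ->
                container.setRecordInterceptor(record ->
                        // Returning null skips the listener for this record; the offset is still committed.
                        record.headers().lastHeader("skip") == null ? record : null);
    }
}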

Related

Consume a record from a given offset at runtime in Kafka with Spring Boot

I want to read a record from Kafka at runtime, passing the offset as a parameter.
I am using a @KafkaListener, but with it I am unable to set the offset at runtime from the user request. And if no offset is passed, it will consume the latest records. Any help is appreciated.
The latest 2.8 release has a new feature where you can use the KafkaTemplate to receive a specific record at a specific offset.
https://docs.spring.io/spring-kafka/docs/current/reference/html/#kafka-template-receive
If you want to receive all records from that offset, use the seek mechanisms provided by the container.
https://docs.spring.io/spring-kafka/docs/current/reference/html/#seek
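A minimal sketch of the receive() call, assuming Spring Kafka 2.8+; the topic name, the String/String types, and the class name are placeholders:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Component;

@Component
public class SingleRecordReader {

    private final KafkaTemplate<String, String> kafkaTemplate;

    public SingleRecordReader(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    // Fetches exactly one record at the requested partition/offset (Spring Kafka 2.8+).
    public String readAt(int partition, long offset) {
        ConsumerRecord<String, String> record = kafkaTemplate.receive("my-topic", partition, offset);
        return record == null ? null : record.value();
    }
}

If I remember correctly, the template also needs a ConsumerFactory set on it for the receive operations; check the linked reference for the exact requirements.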

Fully Transactional Spring Kafka Consumer/Listener

Currently, I have a Kafka Listener configured with a ConcurrentKafkaListenerContainerFactory and a SeekToCurrentErrorHandler (with a DeadLetterPublishingRecoverer configured with 1 retry).
My listener method is annotated with @Transactional (as are all the methods in my services that interact with the DB).
My Listener method does the following:
Receive message from Kafka
Interact with several services that save different parts of the received data to the DB
Ack message in Kafka (i.e., commit offset)
If it fails somewhere in the middle, it should roll back and retry up to the max number of retries.
Then send the message to the DLT.
I'm trying to make this method fully transactional, i.e., if something fails all previous changes are rolled back.
However, the @Transactional annotation on the listener method is not enough.
How can I achieve this?
What configurations should I employ to make the Listener method fully transactional?
If you are not also publishing to Kafka from the listener, there is no need for (and no benefit to) using Kafka transactions; they are just overhead. The STCEH + DLPR is enough.
If you are also publishing to Kafka (and want those publishes to be rolled back too), then see the documentation: configure a KafkaTransactionManager in the listener container.
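A hedged sketch of that wiring, assuming Spring Kafka 2.x and a transactional ProducerFactory (i.e. a transaction-id-prefix is configured); the bean names and Object/Object generics are illustrative:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.core.KafkaTransactionManager;
import org.springframework.kafka.core.ProducerFactory;

@Configuration
public class KafkaTxConfig {

    // Requires a transactional ProducerFactory (a transaction-id-prefix must be configured).
    @Bean
    public KafkaTransactionManager<Object, Object> kafkaTransactionManager(
            ProducerFactory<Object, Object> producerFactory) {
        return new KafkaTransactionManager<>(producerFactory);
    }

    @Bean
    public ConcurrentKafkaListenerContainerFactory<Object, Object> kafkaListenerContainerFactory(
            ConsumerFactory<Object, Object> consumerFactory,
            KafkaTransactionManager<Object, Object> kafkaTransactionManager) {
        ConcurrentKafkaListenerContainerFactory<Object, Object> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);
        // The container then starts a Kafka transaction around each listener invocation,
        // and offsets are sent to that transaction instead of being committed directly.
        factory.getContainerProperties().setTransactionManager(kafkaTransactionManager);
        return factory;
    }
}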

Does GlobalKTable maintain data after application restart?

I'm working with Spring Cloud Stream and I have a BiFunction that receives a KStream and a GlobalKTable. I don't want to lose the GlobalKTable data after my application restarts, but that is what is happening.
@Bean
public BiFunction<KStream<String, MyClass1>, GlobalKTable<String, MyClass2>, KStream<String, MyClass3>> process() {
...
}
I've also configured the "materializedAs" property:
spring.cloud.stream.kafka.streams.bindings.process-in-1.consumer.materializedAs: MYTABLE
I have a topic A that has a retention time of 1 week. So, if a message from topic A was erased due to the retention time and my application restarts, the GlobalKTable doesn't find this message.
Should the GlobalKTable data really be erased when my application restarts?
GlobalKTable always restores from the input topic directly. It builds the state store based on the input topic. If the state store is already there and in sync with the input topic, I believe the restore on startup will be faster (therefore, if you are using Spring for Apache Kafka < 2.7, you need to do what Gary suggested above). However, if the input topic is completely removed, then the state store needs to be rebuilt entirely from scratch against the new input topic. That is the reason why you are not seeing any data restored on startup after deleting the topic. This thread has some more details on this topic.
See the binder documentation.
By default, the KafkaStreams.cleanup() method is called when the binding is stopped. See the Spring Kafka documentation. To modify this behavior, simply add a single CleanupConfig @Bean (configured to clean up on start, stop, or neither) to the application context; the bean will be detected and wired into the factory bean.
Spring for Apache Kafka 2.7 and later does not remove the state by default any longer: https://github.com/spring-projects/spring-kafka/commit/eff205404389b563849fdd4dceb52b23aeb38f20
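For illustration, a minimal sketch of such a bean; the two constructor flags are cleanupOnStart and cleanupOnStop, so false/false keeps the local state in both cases:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.CleanupConfig;

@Configuration
public class StateStoreConfig {

    // Picked up by the StreamsBuilderFactoryBean; never wipes the local state directory.
    @Bean
    public CleanupConfig cleanupConfig() {
        return new CleanupConfig(false, false);
    }
}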

How to Handle Deserialization Exception & Converting to New Schema with Spring Cloud Stream?

I am having trouble understanding how to properly handle a deserialization exception within Spring Cloud Stream, primarily because the framework we implemented does not support headers and the DLQ is supposed to use a separate schema from the original message. So the process flow needs to be: consume message -> deserialization error -> DlqHandler -> serialize with NEW schema -> send to DLQ
The documentation linked below doesn't make clear whether that is even possible. I have seen quite a few SeekToCurrentErrorHandler examples for Spring Kafka, but to my knowledge those are different implementations and don't show how I could properly catch the deserialization error, run custom code to serialize into a new format, and go from there.
My main question is: Is capturing the deserialization exception and reserializing possible with Spring Cloud Stream (Kafka)?
Spring Cloud Documentation for DLQ
Yes, but not using the binding retry or DLQ properties.
Instead, add a ListenerContainerCustomizer bean and customize the binding's listener container with a SeekToCurrentErrorHandler configured for the retries you need and, probably, a subclass of the DeadLetterPublishingRecoverer using an appropriately configured KafkaTemplate and possibly overriding the createProducerRecord method.
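A rough sketch of that customization, assuming Spring Kafka 2.x APIs (SeekToCurrentErrorHandler and the protected createProducerRecord hook) together with Spring Cloud Stream's ListenerContainerCustomizer; the recoverer's mapping to the new DLQ schema, the class names, and the injected template are placeholders:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.header.Headers;
import org.springframework.cloud.stream.config.ListenerContainerCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.KafkaOperations;
import org.springframework.kafka.listener.AbstractMessageListenerContainer;
import org.springframework.kafka.listener.DeadLetterPublishingRecoverer;
import org.springframework.kafka.listener.SeekToCurrentErrorHandler;
import org.springframework.util.backoff.FixedBackOff;

@Configuration
public class DlqConfig {

    // Subclass that re-shapes the failed record into the DLQ's own schema.
    static class ReserializingRecoverer extends DeadLetterPublishingRecoverer {

        ReserializingRecoverer(KafkaOperations<Object, Object> template) {
            super(template);
        }

        @Override
        protected ProducerRecord<Object, Object> createProducerRecord(ConsumerRecord<?, ?> record,
                TopicPartition topicPartition, Headers headers, byte[] key, byte[] value) {
            // Hypothetical mapping: build the new-schema DLQ payload from the raw failed record here.
            Object newValue = "dlq-schema-wrapper-for:" + record.topic();
            return new ProducerRecord<>(topicPartition.topic(), topicPartition.partition(),
                    key, newValue, headers);
        }
    }

    @Bean
    public ListenerContainerCustomizer<AbstractMessageListenerContainer<byte[], byte[]>> customizer(
            KafkaOperations<Object, Object> dlqTemplate) {
        return (container, destination, group) -> container.setErrorHandler(
                // Two retries, then publish to the DLQ via the custom recoverer.
                new SeekToCurrentErrorHandler(new ReserializingRecoverer(dlqTemplate),
                        new FixedBackOff(0L, 2L)));
    }
}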

Streaming database data to Kafka topic without using a connector

I have a use case where I have to push all my MySQL database data to a Kafka topic. Now, I know I can get this up and running using a Kafka connector, but I want to understand how it all works internally without using a connector. In my Spring Boot project I have already created a Kafka producer class where I set all my configuration, create a ProducerRecord, and so on.
Has anyone tried this approach before? Can anyone throw some light on this?
Create entities using Spring Data JPA for the tables and send the data to the topic using findAll(). Use a scheduler for fetching the data and sending it to the topic. You can add your own logic for fetching from the DB and separate logic for sending it to the Kafka topic, e.g. fetch by auto-increment id, fetch by last-updated timestamp, or do a bulk fetch. The same logic as the JDBC connector can be implemented. A sketch of this approach is shown below.
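A minimal sketch of that scheduler approach, under several assumptions: a hypothetical Customer table/entity, repository, and topic name; javax.persistence imports (newer Boot versions use jakarta.persistence); @EnableScheduling on the application; and a JSON-capable KafkaTemplate. A real implementation would pick whichever watermark strategy fits the table (auto-increment id, timestamp, or bulk):

import java.time.Instant;
import java.util.List;
import javax.persistence.Entity;
import javax.persistence.Id;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// Hypothetical entity mapped onto the MySQL table being exported.
@Entity
class Customer {
    @Id
    Long id;
    String name;
    Instant updatedAt;
}

// Hypothetical repository; the derived query fetches only rows changed since the watermark.
interface CustomerRepository extends JpaRepository<Customer, Long> {
    List<Customer> findByUpdatedAtAfter(Instant since);
}

@Component
class CustomerExporter {

    private final CustomerRepository repository;
    private final KafkaTemplate<String, Customer> kafkaTemplate;
    private Instant lastSeen = Instant.EPOCH;

    CustomerExporter(CustomerRepository repository, KafkaTemplate<String, Customer> kafkaTemplate) {
        this.repository = repository;
        this.kafkaTemplate = kafkaTemplate;
    }

    // Polls the table every 30 seconds and publishes changed rows, similar to a JDBC source connector.
    @Scheduled(fixedDelay = 30_000)
    public void export() {
        for (Customer customer : repository.findByUpdatedAtAfter(lastSeen)) {
            kafkaTemplate.send("customers-topic", String.valueOf(customer.id), customer);
            if (customer.updatedAt.isAfter(lastSeen)) {
                lastSeen = customer.updatedAt;
            }
        }
    }
}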
Kafka Connect will do it in an optimized way.