How to Handle a Deserialization Exception and Convert to a New Schema with Spring Cloud Stream? - spring-cloud

I am having trouble understanding how to properly handle a deserialization exception within Spring Cloud Stream, primarily because our setup does not support headers and the DLQ message is supposed to use a different schema than the original message. So the process flow needs to be: consume message -> deserialization error -> DlqHandler -> serialize with NEW schema -> send to DLQ
The documentation linked below doesn't make it clear whether that is even possible. I have seen quite a few SeekToCurrentErrorHandler examples for Spring Kafka, but to my knowledge those are different implementations and do not show how I could catch the deserialization error, run custom code to serialize the record into a new format, and go from there.
My main question is: is capturing the deserialization exception and reserializing the record possible with Spring Cloud Stream (Kafka)?
Spring Cloud Documentation for DLQ

Yes, but not using the binding retry or DLQ properties.
Instead, add a ListenerContainerCustomizer bean and customize the binding's listener container with a SeekToCurrentErrorHandler configured for the retries you need. Pair it with (most likely) a subclass of DeadLetterPublishingRecoverer that uses an appropriately configured KafkaTemplate and possibly overrides the createProducerRecord method.
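For illustration, a minimal sketch of such a customizer (not code from the docs; it assumes spring-kafka 2.3+, a KafkaOperations/KafkaTemplate bean called dlqKafkaTemplate that is already configured with the serializer for the new DLQ schema, and a made-up DLQ topic name payments-dlq):

@Bean
public ListenerContainerCustomizer<AbstractMessageListenerContainer<byte[], byte[]>> dlqCustomizer(
        KafkaOperations<Object, Object> dlqKafkaTemplate) {

    return (container, destinationName, group) -> {
        // Publish failed records to the DLQ topic; a subclass of DeadLetterPublishingRecoverer
        // could instead override createProducerRecord to build a value in the new schema.
        DeadLetterPublishingRecoverer recoverer = new DeadLetterPublishingRecoverer(
                dlqKafkaTemplate,
                (record, exception) -> new TopicPartition("payments-dlq", record.partition()));

        // Retry 3 times with no back-off, then hand the failed record to the recoverer.
        container.setErrorHandler(new SeekToCurrentErrorHandler(recoverer, new FixedBackOff(0L, 3L)));
    };
}

If the DLQ value really has to be rebuilt with a new schema rather than forwarded as-is, that is where the DeadLetterPublishingRecoverer subclass mentioned above comes in.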

Related

Spring cloud stream routing on payload

I want to use Spring Cloud Stream in my microservice to handle events from Kafka.
I read from one topic that can hold several kinds of JSON payload (I have a single topic because all messages arriving on it relate to the same subject).
I have a different cloud function to handle each kind of payload.
How can I route the incoming event to a specific function based on a property in its payload?
Say I have a JSON message that can have the following properties:
{
  "type": "A",
  "content": "xyz"
}
So the input message can have type A or B.
Say I want to call one bean function when the type is A and another bean function when the type is B.
It is not clear from the question whether you are using the message channel-based Kafka binder or Kafka Streams binder. The comments above imply some reference to KStream. Assuming that you are using the message channel-based Kafka binder, you have the option of using the message routing feature in Spring Cloud Stream. The basic usage is explained in this section of the docs: https://docs.spring.io/spring-cloud-stream/docs/3.2.1/reference/html/spring-cloud-stream.html#_event_routing
You can provide a routing-expression, which is a SpEL expression, to route on the relevant property values.
If you want advanced routing capabilities beyond what can be expressed through a SpEL expression, you can also implement a custom MessageRoutingCallback. See this sample application for more details: https://github.com/spring-cloud/spring-cloud-stream-samples/tree/main/routing-samples/message-routing-callback
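As a rough sketch of the routing-expression approach (not from the docs): the function names typeAHandler/typeBHandler and the header used in the expression are made up for illustration. Note that when routing on the payload itself, the payload may still be raw bytes at the point the expression is evaluated, which is where the MessageRoutingCallback from the sample above gives you more control.

// Hypothetical handlers; the expression would go into application properties, e.g.
//   spring.cloud.stream.function.routing.enabled=true
//   spring.cloud.function.routing-expression=headers['type'] == 'A' ? 'typeAHandler' : 'typeBHandler'
@Bean
public Consumer<Message<String>> typeAHandler() {
    return message -> {
        // handle payloads with "type": "A"
    };
}

@Bean
public Consumer<Message<String>> typeBHandler() {
    return message -> {
        // handle payloads with "type": "B"
    };
}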

How to do message filtering at Kafka record level when using SpringCloudStream?

I am using Spring Cloud Stream (SCS) with Kafka as a binder.
I would like to do low-level filtering on records based on the Kafka header. What would be the recommended approach?
The filtered message should be ignored and the offset should be committed.
I was thinking about configuring a RecordFilterStrategy.
RecordFilterStrategy is not supported by Spring Cloud Stream.
You can add a ListenerContainerCustomizer bean and add a RecordInterceptor to the listener container (see the sketch below). If the interceptor returns null, the listener is not called and the offset will be committed, just as if the listener had been called and exited normally.
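A minimal sketch of that combination, assuming spring-kafka 2.x (where RecordInterceptor has a single-argument intercept method), the byte[] deserializers the binder uses by default, and a made-up header name "skip":

@Bean
public ListenerContainerCustomizer<AbstractMessageListenerContainer<byte[], byte[]>> filterCustomizer() {
    return (container, destinationName, group) ->
            // Returning null drops the record before the listener is invoked;
            // its offset is still committed as if the listener had run normally.
            container.setRecordInterceptor(record ->
                    record.headers().lastHeader("skip") != null ? null : record);
}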

Design question around Spring Cloud Stream, Avro, Kafka and losing data along the way

We have implemented a system consisting of several Spring Boot microservices that communicate via messages posted to Kafka topics. We are using Spring Cloud Stream to handle a lot of the heavy lifting of sending and receiving messages via Kafka. We are using Apache Avro as a transport protocol, integrating with a Schema Server (Spring Cloud Stream default implementation for local development, and Confluent for production).


We model our message classes in a common library that every microservice includes as a dependency. We use ‘dynamic schema generation’ to infer the Avro schema from the shape of our message classes before the Avro serialisation occurs when a microservice acts as a producer and sends a message. The consuming microservice can look up the schema from the registry based on the schema version and deserialise into the message class, which it also has as a dependency.


It works well, however there is one big drawback for us that I wonder if anyone has experienced before and could offer any advice on. If we wish to add a new field to one of the model classes, for example, we do it in the common model class library and update the version of that dependency in the microservice. But it means that we need to update the version of that dependency in every microservice along the chain, even if the in-between microservices do not need that new field. Otherwise, the value of that new field will be lost along the way, because the microservice consumers deserialise into an object that might be an out-of-date version of the class.

To give an example, let's say we have a model class in our model-common library called PaymentRequest (the @Data annotation is Lombok and just generates getters and setters from the fields):
@Data
class PaymentRequest {
    String paymentId;
    String customerId;
}


And we have a microservice called PayService which sends a PaymentRequest message onto a Kafka topic:


#Output("payment-broker”)
MessageChannel paymentBrokerTopic();
...

PaymentRequest paymentRequest = getPaymentRequest();

Message<PaymentRequest> message = MessageBuilder.withPayload(paymentRequest).build();
paymentBrokerTopic().(message);

And we have this config in application.yaml in our Spring Boot application:


spring:
  cloud:
    stream:
      schema-registry-client:
        endpoint: http://localhost:8071
      schema:
        avro:
          dynamicSchemaGenerationEnabled: true
      bindings:
        payment-broker:
          destination: paymentBroker
          contentType: application/*+avro

Spring Cloud Stream’s Avro MessageConverter infers the schema from the PaymentRequest object, adds a schema to the schema registry if there is not already a matching one there, and sends the message on Kafka in Avro format.

Then we have another microservice, BrokerService, which has this consumer:


#Output("payment-processor”)
MessageChannel paymentProcessorTopic();


#Input(“payment-request”)
SubscribableChannel paymentRequestTopic();

#StreamListener("payment-request")
public void processNewPayment(Message<PaymentRequest> request) {
// do some processing and then send on…
paymentProcessorTopic().(message);
}


It is able to deserialise that Avro message from Kafka into a PaymentRequest POJO, do some extra processing on it, and send the message onwards to another topic called paymentProcessor. That topic is picked up by another microservice, PaymentProcessor, which has another StreamListener consumer:



@Input("payment-processor")
SubscribableChannel paymentProcessorTopic();

@StreamListener("payment-processor")
public void processNewPayment(Message<PaymentRequest> request) {
    // do some processing and action request…
}


If we wish to update the PaymentRequest class in the model-common library, so that it has a new field:

@Data
class PaymentRequest {
    String paymentId;
    String customerId;
    String processorCommand;
}


If we update the dependency version in each of the microservices, the value of that new field gets deserialised into the field when the message is read, and reserialised into the message when it is sent on to the next topic, each time.


However, if we do not update the version of the model-common library in the second service in the chain, BrokerService for example, it will deserialise the message into a version of the class without that new field, and so when the message is reserialised and sent on to the payment-processor topic, the Avro message will not have the data for that field.
The third microservice, PaymentProcessor, might have the version of the model-common lib that does contain the new field, but when the message is deserialised into the POJO the value for that field will be null.

I know Avro has features for schema evolution where default values can be assigned to new fields to allow backwards and forwards compatibility, but that is not sufficient for us here: we need the real values. And ideally we do not want a situation where we would have to update the dependency version of the model library in every microservice, because that introduces a lot of work and coupling between services. Often a new field is not needed by the services midway along the chain, and might only be relevant in the first service and the final one, for example.
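Purely to illustrate the schema-evolution feature being dismissed here (Avro SchemaBuilder code, not a fix for the problem): a default keeps old and new readers compatible, but an intermediate service that deserialises into an old POJO still drops the value.

// The evolved schema with a default for the new field; compatible for readers on either version,
// but not sufficient here, because a consumer bound to the old PaymentRequest class never sees
// processorCommand at all and so cannot forward its value.
Schema evolved = SchemaBuilder.record("PaymentRequest")
        .fields()
        .requiredString("paymentId")
        .requiredString("customerId")
        .name("processorCommand").type().stringType().stringDefault("")
        .endRecord();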


So has anyone else faced this issue and thought of a good way round it? We are keen not to lose the power of Avro and the convenience of Spring Cloud Stream, but to avoid these dependency issues. Anything around custom serializers/deserializers we could try? Or using GenericRecords? Or an entirely different approach?


Thanks for any help in advance!


Kafka KTable corrupt message handling

We are using KTable for aggregation in Kafka; it's a very basic use, following the Kafka docs.
I am just trying to investigate how, if consuming a message fails while aggregating, we can move such a message to an error topic or DLQ.
I found something similar for KStream but could not find anything for KTable, and I was not able to simply extend the KStream solution to KTable.
Reference for KStream
Handling bad messages using Kafka's Streams API
My use case is very simple: for any kind of exception, just move the message to an error topic and move on to the next message.
There is no built-in support for what you ask at the moment (Kafka 2.2), so you need to make sure that your application code does not throw any exceptions. All the handlers that can be configured are for exceptions thrown by the Kafka Streams runtime. Those handlers are provided because otherwise the user would have no chance at all to react to those exceptions.
Feel free to create a feature-request Jira.
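As a rough sketch of the "make sure your application code does not throw" advice above, assuming String values, made-up topic names, a StreamsBuilder called builder, and a hypothetical isValid(...) check of your own: validate and branch before the aggregation, so the KTable only ever sees records that will not throw.

KStream<String, String> input = builder.stream("input-topic");

// branch(...) returns one stream per predicate; index 0 = valid records, index 1 = everything else.
KStream<String, String>[] branches = input.branch(
        (key, value) -> isValid(value),   // your own check -- it must not throw
        (key, value) -> true);

// Corrupt / unparseable records go to an error topic instead of poisoning the aggregation.
branches[1].to("error-topic");

// Only validated records reach the KTable aggregation.
KTable<String, Long> counts = branches[0]
        .groupByKey()
        .count();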

KafkaAvroSerializer with multiple avro registry urls

We have a KafkaAvroSerde configured with multiple Avro registry URLs. At some point, the serde got a timeout while trying to register a schema against one URL, but since it threw an IO exception up to the stream app, the stream thread closed. From a Kafka Streams app perspective, this kind of defeats the purpose of being able to supply multiple URLs when creating the Avro serdes, since the runtime exception bubbling up the DSL API stack will close the stream thread.
A couple of questions:
Is there a good way to handle this?
Do we need to enforce a retry in the app logic (which can be tricky when you simply materialize a topic into a store)?
Otherwise, is there an Avro serde wrapper that could retry against the actually configured Avro registry URLs?
When materializing into a local RocksDB store, is there added value in registering the schema in the registry, or should we set auto.register.schemas to false?
Exception in thread "mediafirst-npvr-adapter-program-mapping-mtrl02nsbe02.pf.spop.ca-f5e097bd-ff1b-42da-9f7d-2ab9fa5d2b70-GlobalStreamThread" org.apache.kafka.common.errors.SerializationException: Error registering Avro schema: {"type":"record","name":"ProgramMapp
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Register operation timed out; error code: 50002; error code: 50002
at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:191)
at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:218)
at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:307)
at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:299)
at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:294)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.registerAndGetId(CachedSchemaRegistryClient.java:61)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.register(CachedSchemaRegistryClient.java:100)
at io.confluent.kafka.serializers.AbstractKafkaAvroSerializer.serializeImpl(AbstractKafkaAvroSerializer.java:79)
at io.confluent.kafka.serializers.KafkaAvroSerializer.serialize(KafkaAvroSerializer.java:53)
at io.confluent.kafka.streams.serdes.avro.SpecificAvroSerializer.serialize(SpecificAvroSerializer.java:65)
at io.confluent.kafka.streams.serdes.avro.SpecificAvroSerializer.serialize(SpecificAvroSerializer.java:38)
at org.apache.kafka.streams.state.StateSerdes.rawValue(StateSerdes.java:178)
at org.apache.kafka.streams.state.internals.MeteredKeyValueBytesStore$1.innerValue(MeteredKeyValueBytesStore.java:68)
at org.apache.kafka.streams.state.internals.MeteredKeyValueBytesStore$1.innerValue(MeteredKeyValueBytesStore.java:57)
at org.apache.kafka.streams.state.internals.InnerMeteredKeyValueStore.put(InnerMeteredKeyValueStore.java:199)
at org.apache.kafka.streams.state.internals.MeteredKeyValueBytesStore.put(MeteredKeyValueBytesStore.java:121)
at com.bell.cts.commons.kafka.store.custom.CustomStoreProcessor.process(CustomStoreProcessor.java:37)
at org.apache.kafka.streams.processor.internals.ProcessorNode$1.run(ProcessorNode.java:46)
at org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:208)
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:124)
at org.apache.kafka.streams.processor.internals.GlobalProcessorContextImpl.forward(GlobalProcessorContextImpl.java:52)
at org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:80)
at org.apache.kafka.streams.processor.internals.GlobalStateUpdateTask.update(GlobalStateUpdateTask.java:87)
at org.apache.kafka.streams.processor.internals.GlobalStreamThread$StateConsumer.pollAndUpdate(GlobalStreamThread.java:239)
at org.apache.kafka.streams.processor.internals.GlobalStreamThread.run(GlobalStreamThread.java:282)
From a Kafka Streams app perspective, this kind of defeats the purpose of being able to supply multiple URLs when creating the Avro serdes, since the runtime exception bubbling up the DSL API stack will close the stream thread.
I disagree here: from a Kafka Streams perspective, serialization failed and thus the application does need to shut down. Note that Kafka Streams is agnostic to the Serdes you are using, and thus, does not know that your Serde is talking to a schema registry and that it could retry.
Thus, the Serde is responsible for handling retries internally. I am not aware of a wrapper that does this, but it should not be too hard to build yourself. I'll create an internal ticket to track this feature request. I think it makes a lot of sense to add this for the out-of-the-box experience.
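Not an existing wrapper, just a sketch of what such a retrying wrapper could look like on the serializer side (the delegate would be your configured Avro serializer; the attempt count is arbitrary and assumed to be at least 1):

import java.util.Map;

import org.apache.kafka.common.errors.SerializationException;
import org.apache.kafka.common.serialization.Serializer;

public class RetryingSerializer<T> implements Serializer<T> {

    private final Serializer<T> delegate;
    private final int maxAttempts;

    public RetryingSerializer(Serializer<T> delegate, int maxAttempts) {
        this.delegate = delegate;
        this.maxAttempts = maxAttempts;
    }

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        delegate.configure(configs, isKey);
    }

    @Override
    public byte[] serialize(String topic, T data) {
        SerializationException last = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return delegate.serialize(topic, data);
            }
            catch (SerializationException e) {
                last = e; // e.g. a registry timeout; try again (possibly against a failed-over registry)
            }
        }
        throw last;
    }

    @Override
    public void close() {
        delegate.close();
    }
}

The deserializer side could be wrapped the same way, and a Serdes.serdeFrom-style factory could combine the two into a retrying Serde.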
For RocksDB: all records that are written into RocksDB are also written into a changelog topic. Thus, to allow Kafka Streams to read this data to recover state after an error, you need to register the schemas.