Spring Cloud Stream routing on payload - apache-kafka

I want to use Spring Cloud Stream in my microservice to handle events from Kafka.
I read from one topic that can hold several kinds of JSON payload (I have a single topic since all messages arriving on it concern the same subject).
I have a different cloud function to handle each kind of payload.
How can I route the incoming event to a specific function based on a property in its payload?
Say I have a JSON message with the following properties:
{
  "type": "A",
  "content": "xyz"
}
So the type property of the input message can be A or B.
Say I want to call one bean function when the type is A and another bean function when the type is B.

It is not clear from the question whether you are using the message channel-based Kafka binder or Kafka Streams binder. The comments above imply some reference to KStream. Assuming that you are using the message channel-based Kafka binder, you have the option of using the message routing feature in Spring Cloud Stream. The basic usage is explained in this section of the docs: https://docs.spring.io/spring-cloud-stream/docs/3.2.1/reference/html/spring-cloud-stream.html#_event_routing
You can provide a routing-expression, which is a SpEL expression evaluated against the incoming message, to select the target function from the right property values.
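As a rough sketch of that approach (the function names handleA and handleB, the topic name my-topic, and the assumption that the type value is also available as a message header are all illustrative, not taken from the question), the configuration and beans could look roughly like this:

spring:
  cloud:
    function:
      # SpEL evaluated against the incoming Message; if the type only exists
      # inside the JSON payload you would parse the raw payload here or use a
      # MessageRoutingCallback instead
      routing-expression: "headers['type'] == 'A' ? 'handleA' : 'handleB'"
    stream:
      function:
        routing:
          enabled: true
      bindings:
        functionRouter-in-0:
          destination: my-topic   # the single input topic

@Bean
public Consumer<Message<String>> handleA() {
    return message -> { /* handle payloads with "type": "A" */ };
}

@Bean
public Consumer<Message<String>> handleB() {
    return message -> { /* handle payloads with "type": "B" */ };
}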
If you want advanced routing capabilities beyond what can be expressed through a SpEL expression, you can also implement a custom MessageRoutingCallback. See this sample application for more details: https://github.com/spring-cloud/spring-cloud-stream-samples/tree/main/routing-samples/message-routing-callback

Related

Differentiate target consumer in MassTransit

There are a few consumers listening on one Kafka topic. A message has a parameter by which it can be determined which consumer needs to consume it. What mechanism does MassTransit use to implement such a solution?
As explained in the interoperability documentation, MassTransit uses the messageType header to determine which message types are present in the serialized message body. If there are no message types, such as when the RawJson message deserializer is used, it will deliver the message to all registered consumers.
Now, with Kafka, the type itself is part of the TopicEndpoint configuration, so only that message type is dispatched to the endpoint. Depending upon the serialization (AVRO, Json, etc.) the experience depends upon whether or not the message types are available.
You could certainly write your own deserializer that uses that parameter to determine which message types are in the message and properly responds to TryGetMessage<T> with the supported types. The best example of that would be either the JsonConsumeContext, or the recently updated RawJsonConsumeContext that now supports transport headers for message identification.

How to Handle a Deserialization Exception & Convert to a New Schema with Spring Cloud Stream?

I am having trouble understanding how to properly handle a deserialization exception within Spring Cloud Stream, primarily because the framework as implemented does not support headers and the DLQ is supposed to use a separate schema from the original message. So the process flow needs to be: consume message -> deserialization error -> DlqHandler -> serialize with NEW schema -> send to DLQ
The documentation linked below doesn't give a good idea of whether that is even possible. I have seen quite a few examples of SeekToCurrentErrorHandler for Spring-Kafka, but to my knowledge those are different implementations and do not show how I could properly catch the deserialization error and then run custom code to serialize into a new format and go from there.
My main question is: Is capturing the deserialization exception and reserializing possible with Spring Cloud Stream (Kafka)?
Spring Cloud Documentation for DLQ
Yes, but not using the binding retry or DLQ properties.
Instead, add a ListenerContainerCustomizer bean and customize the binding's listener container with a SeekToCurrentErrorHandler configured for the retries you need and, probably, a subclass of the DeadLetterPublishingRecoverer using an appropriately configured KafkaTemplate and possibly overriding the createProducerRecord method.
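A rough sketch of that customizer, assuming the message channel-based binder and Spring Kafka 2.x APIs (newer Spring Kafka versions use DefaultErrorHandler and setCommonErrorHandler instead); the injected KafkaOperations is a hypothetical template configured with the serializer for the new DLQ schema:

import org.springframework.cloud.stream.config.ListenerContainerCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.kafka.core.KafkaOperations;
import org.springframework.kafka.listener.AbstractMessageListenerContainer;
import org.springframework.kafka.listener.DeadLetterPublishingRecoverer;
import org.springframework.kafka.listener.SeekToCurrentErrorHandler;
import org.springframework.util.backoff.FixedBackOff;

@Bean
public ListenerContainerCustomizer<AbstractMessageListenerContainer<byte[], byte[]>> containerCustomizer(
        KafkaOperations<Object, Object> dlqTemplate) {

    // Publishes the failed record to <topic>.DLT by default; subclass it and
    // override createProducerRecord(...) to re-serialize into the new schema first.
    DeadLetterPublishingRecoverer recoverer = new DeadLetterPublishingRecoverer(dlqTemplate);

    // Two retries, one second apart, then hand the record to the recoverer.
    SeekToCurrentErrorHandler errorHandler =
            new SeekToCurrentErrorHandler(recoverer, new FixedBackOff(1000L, 2L));

    return (container, destinationName, group) -> container.setErrorHandler(errorHandler);
}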

Design question around Spring Cloud Stream, Avro, Kafka and losing data along the way

We have implemented a system consisting of several Spring Boot microservices that communicate via messages posted to Kafka topics. We are using Spring Cloud Stream to handle a lot of the heavy lifting of sending and receiving messages via Kafka. We are using Apache Avro as a transport protocol, integrating with a Schema Server (Spring Cloud Stream default implementation for local development, and Confluent for production).


We model our message classes in a common library, that every micro service includes as a dependency. We use ‘dynamic schema generation’ to infer the Avro schema from the shape of our message classes before the Avro serialisation occurs when a microservice acts as a producer and sends a message. The consuming micro service can look up the schema from the registry based on the schema version and deserialise into the message class, which it also has as a dependency.


It works well, however there is one big drawback for us that I wonder if anyone has experienced before and could offer any advice on. If we wish to add a new field, for example, to one of the model classes, we do it in the common model class library and update the version of that dependency in the micro service. But it means that we need to update the version of that dependency in every micro service along the chain, even if the in-between micro services do not need that new field. Otherwise, the data value of that new field will be lost along the way, because of the way the micro service consumers deserialise into an object (which might be an out-of-date version of the class).

To give an example, let's say we have a model class in our model-common library called PaymentRequest (the @Data annotation is Lombok's and just generates getters and setters for the fields):
@Data
class PaymentRequest {
    String paymentId;
    String customerId;
}


And we have a micro service called PayService which sends a PaymentRequest message onto a Kafka topic:


#Output("payment-broker”)
MessageChannel paymentBrokerTopic();
...

PaymentRequest paymentRequest = getPaymentRequest();

Message<PaymentRequest> message = MessageBuilder.withPayload(paymentRequest).build();
paymentBrokerTopic().(message);

And we have this config in application.yaml in our Spring Boot application:


spring:
  cloud:
    stream:
      schema-registry-client:
        endpoint: http://localhost:8071
      schema:
        avro:
          dynamicSchemaGenerationEnabled: true
      bindings:
        payment-broker:
          destination: paymentBroker
          contentType: application/*+avro

Spring Cloud Stream’s Avro MessageConverter infers the schema from the PaymentRequest object, adds a schema to the schema registry if there is not already a matching one there, and sends the message on Kafka in Avro format.

Then another micro service, BrokerService, has this consumer:


#Output("payment-processor”)
MessageChannel paymentProcessorTopic();


#Input(“payment-request”)
SubscribableChannel paymentRequestTopic();

#StreamListener("payment-request")
public void processNewPayment(Message<PaymentRequest> request) {
// do some processing and then send on…
paymentProcessorTopic().(message);
}


It is able to deserialise that Avro message from Kafka into a PaymentRequest POJO, do some extra processing on it, and send the message onwards to another topic called paymentProcessor. That topic in turn gets picked up by another micro service, PaymentProcessor, which has another StreamListener consumer:



@Input("payment-processor")
SubscribableChannel paymentProcessorTopic();


@StreamListener("payment-processor")
public void processNewPayment(Message<PaymentRequest> request) {
    // do some processing and action request…
}


If we wish to update the PaymentRequest class in the model-common library, so that it has a new field:

@Data
class PaymentRequest {
    String paymentId;
    String customerId;
    String processorCommand;
}


If we update the dependency version in each of the micro services, the value of that new field gets deserialised into the field when the message is read, and reserialised into the message when it gets sent on to the next topic, each time.


However, if we do not update the version of the model-common library in the second service in the chain, BrokerService for example, it will deserialise the message into a version of the class without that new field, and so when the message is reserialised and sent on to the payment-processor topic, the Avro message will not have the data for that field.
The third micro service, PaymentProcessor, might have the version of the model-common lib that does contain the new field, but when the message is deserialised into the POJO the value for that field will be null.

I know Avro has features for schema evolution where default values can be assigned to new fields to allow for backwards and forwards compatibility, but that is not sufficient for us here; we need the real values. And ideally we do not want a situation where we have to update the dependency version of the model library in every micro service, because that introduces a lot of work and coupling between services. Often a new field is not needed by the services midway along the chain, and might only be relevant in the first service and the final one, for example.


So has anyone else faced this issue and thought of a good way around it? We are keen not to lose the power of Avro and the convenience of Spring Cloud Stream, but to avoid such dependency issues. Is there anything around custom serializers/deserializers we could try? Or using GenericRecords (see the sketch below)? Or an entirely different approach?


Thanks for any help in advance!
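To make the GenericRecord idea concrete, below is a rough sketch of what a pass-through consumer in an intermediate service like BrokerService could look like. This is only an illustration under assumptions: whether the Avro converter hands the listener a GenericRecord directly depends on how the message converter is configured, and the field access shown is schematic.

import org.apache.avro.generic.GenericRecord;
import org.springframework.cloud.stream.annotation.StreamListener;
import org.springframework.messaging.handler.annotation.SendTo;

@StreamListener("payment-request")
@SendTo("payment-processor")
public GenericRecord passThroughPayment(GenericRecord paymentRecord) {
    // Read only the fields this intermediate service actually needs...
    String paymentId = paymentRecord.get("paymentId").toString();
    // ...do the intermediate processing, then forward the record unchanged so
    // fields added in newer schema versions are not dropped on this hop.
    return paymentRecord;
}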


Are Kafka message headers the right place to put the event type name?

In a scenario where multiple event types from a single domain are produced to a single topic and only a subset of those event types is consumed by a given consumer, I need a good way to read the event type before taking action.
I see 2 options:
Put the event type (for example "ORDER_PUBLISHED") into the message body (payload) itself, which would be a broker-agnostic approach and has other advantages, but would involve parsing every message just to learn the event type.
Utilize Kafka message headers, which would allow consuming messages without extra payload parsing.
The context is event-sourcing. Small commands, small payloads. There are no huge bodies to parse. Golang. All messages are protobufs. gRPC.
What is the typical workflow in such a scenario?
I tried to google this topic, but didn't find much on header use cases and good practices.
It would be great to hear when and how to use Kafka message headers and when not to use them.
Clearly the same topic should be used for different event types that apply to the same entity/aggregate (reference). Example: BookingCreated, BookingConfirmed, BookingCancelled, etc. should all go to the same topic in order to (excuse the pun) guarantee ordering of delivery (in this case the booking ID is the message key).
When the consumer gets one of these events, it needs to identify the event type, parse the payload, and route to the processing logic accordingly. The event type is the piece of message metadata that allows this identification.
Thus, I think a custom Kafka message header is the best place to indicate the type of event. I'm not alone:
Felipe Dutra: "Kafka allow you to put meta-data as header of your message. So use it to put information about the message, version, type, a correlationId. If you have chain of events, you can also add the correlationId of opentracing"
This GE ERP system has a header labeled "event-type" to show "The type of the event that is published" to a kafka topic (e.g., "ProcessOrderEvent").
This other solution mentions that "A header 'event' with the event type is included in each message" in their Kafka integration.
Headers are a relatively recent addition to Kafka (0.11). Also, as far as I've seen, Kafka books focus on the 17 thousand Kafka configuration options and Kafka topology. Unfortunately, we don't easily find much on how an event-driven architecture can be mapped with the proper semantics onto elements of the Kafka message broker.
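Purely for illustration (the question mentions Go, but the mechanics are the same in any client), here is roughly how an event-type header is written and read with the plain Java client. The header name "event-type" and the surrounding variables (bookingId, payloadBytes, records) are placeholders, not anything Kafka mandates:

import java.nio.charset.StandardCharsets;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.header.Header;

// Producer side: put the event type in a header so consumers can filter
// without deserializing the (protobuf) payload.
ProducerRecord<String, byte[]> record =
        new ProducerRecord<>("bookings", bookingId, payloadBytes);
record.headers().add("event-type", "BookingCreated".getBytes(StandardCharsets.UTF_8));

// Consumer side: inspect the header first and skip event types this
// consumer does not care about.
for (ConsumerRecord<String, byte[]> rec : records) {
    Header typeHeader = rec.headers().lastHeader("event-type");
    String eventType = typeHeader == null
            ? ""
            : new String(typeHeader.value(), StandardCharsets.UTF_8);
    if ("BookingCreated".equals(eventType)) {
        // deserialize the protobuf payload and handle the event
    }
}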

How to implement HTTP endpoint in Micronaut to pause consumption of messages from Kafka?

Let me describe the rationale behind my question:
We have a Micronaut-based application consuming messages from Kafka broker.
The consumed messages are processed and fed to another remote "downstream" application.
If this downstream application is going to restart purposely, it will take a while to get ready accepting further messages from our Micronaut-based application.
So we have the idea of sending our Micronaut application a request to SUSPEND/PAUSE consumption of messages from Kafka (e.g. via HTTP to an appropriate endpoint).
The KafkaConsumer interface seems to have appropriate methods to achieve this goal like
public void pause(java.util.Collection<TopicPartition> partitions)
public void resume(java.util.Collection<TopicPartition> partitions)
But how to get a reference to the appropriate KafkaConsumer instance fed in to our HTTP endpoint?
We've tried to get it injected to the constructor of the HTTP endpoint/controller class, but this yields
Error instantiating bean of type [HttpController]
Message: Missing bean arguments for type: org.apache.kafka.clients.consumer.KafkaConsumer. Requires arguments: AbstractKafkaConsumerConfiguration consumerConfiguration
It's possible to get a reference to the KafkaConsumer instance as a method parameter of @Topic-annotated receive methods, as described in the Micronaut Kafka documentation,
but this would mean storing that reference as an instance variable, having it accessed by the HTTP endpoint, and so on, which does not sound very convincing:
You get a reference to the KafkaConsumer ONLY when receiving the next message! This might be appropriate for SUSPENDING/PAUSING, but not for RESUMING!
By the way, calling KafkaConsumer.resume(...) on a reference saved as instance variable yields
java.util.ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access
at org.apache.kafka.clients.consumer.KafkaConsumer.acquire(KafkaConsumer.java:2201)
at org.apache.kafka.clients.consumer.KafkaConsumer.acquireAndEnsureOpen(KafkaConsumer.java:2185)
at org.apache.kafka.clients.consumer.KafkaConsumer.resume(KafkaConsumer.java:1842)
[...]
I think the same holds true when implementing the KafkaConsumerAware interface to store a reference to the freshly created KafkaConsumer instance.
So are there any ideas how to handle this in an appropriate way?
Thanks
Christian
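One direction worth checking, as a sketch only: newer micronaut-kafka versions document a ConsumerRegistry bean that can look up, pause, and resume consumers by their client id from outside the polling thread, which would avoid both the injection problem and the multi-threaded access problem described above. The class and method names, the endpoint paths, and the client id "downstream-feeder" below are assumptions to verify against the documentation of your micronaut-kafka version:

import io.micronaut.configuration.kafka.ConsumerRegistry;
import io.micronaut.http.annotation.Controller;
import io.micronaut.http.annotation.Post;

@Controller("/kafka")
public class ConsumerPauseController {

    private final ConsumerRegistry consumerRegistry;

    public ConsumerPauseController(ConsumerRegistry consumerRegistry) {
        this.consumerRegistry = consumerRegistry;
    }

    @Post("/pause")
    public void pause() {
        // "downstream-feeder" would be the client id of the @KafkaListener to pause
        consumerRegistry.pause("downstream-feeder");
    }

    @Post("/resume")
    public void resume() {
        consumerRegistry.resume("downstream-feeder");
    }
}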