How to implement an HTTP endpoint in Micronaut to pause consumption of messages from Kafka?

Let me describe the rationale behind my question:
We have a Micronaut-based application consuming messages from Kafka broker.
The consumed messages are processed and fed to another remote "downstream" application.
If this downstream application is restarted on purpose, it will take a while before it is ready to accept further messages from our Micronaut-based application.
So we had the idea to send our Micronaut application a request to SUSPEND/PAUSE consumption of messages from Kafka (e.g. via HTTP to an appropriate endpoint).
The KafkaConsumer interface seems to have appropriate methods to achieve this goal like
public void pause(java.util.Collection<TopicPartition> partitions)
public void resume(java.util.Collection<TopicPartition> partitions)
But how do we get a reference to the appropriate KafkaConsumer instance into our HTTP endpoint?
We've tried to get it injected to the constructor of the HTTP endpoint/controller class, but this yields
Error instantiating bean of type [HttpController]
Message: Missing bean arguments for type: org.apache.kafka.clients.consumer.KafkaConsumer. Requires arguments: AbstractKafkaConsumerConfiguration consumerConfiguration
It's possible to get a reference to the KafkaConsumer instance as a method parameter of @Topic-annotated receive methods, as described in the Micronaut Kafka documentation,
but this would mean storing that reference in an instance variable, having the HTTP endpoint access it, and so on... which does not sound very convincing:
you only get a reference to the KafkaConsumer when the NEXT message is received! That might be acceptable for SUSPENDING/PAUSING, but not for RESUMING!
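For illustration, a rough sketch of that approach (the listener class, topic name and flag below are made up for this example); it can pause from the consumer's own polling thread, but as noted it offers no hook for resuming:

import io.micronaut.configuration.kafka.annotation.KafkaListener;
import io.micronaut.configuration.kafka.annotation.Topic;
import org.apache.kafka.clients.consumer.Consumer;

import java.util.concurrent.atomic.AtomicBoolean;

@KafkaListener
public class DownstreamForwarder {

    // flipped from somewhere else, e.g. an HTTP controller
    final AtomicBoolean suspendRequested = new AtomicBoolean(false);

    @Topic("events")
    public void receive(String message, Consumer<?, ?> kafkaConsumer) {
        if (suspendRequested.get()) {
            // safe: we are on the consumer's own polling thread
            kafkaConsumer.pause(kafkaConsumer.assignment());
            return;
        }
        // ... forward the message to the downstream application ...
    }
}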
By the way, calling KafkaConsumer.resume(...) on a reference saved in an instance variable yields
java.util.ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access
at org.apache.kafka.clients.consumer.KafkaConsumer.acquire(KafkaConsumer.java:2201)
at org.apache.kafka.clients.consumer.KafkaConsumer.acquireAndEnsureOpen(KafkaConsumer.java:2185)
at org.apache.kafka.clients.consumer.KafkaConsumer.resume(KafkaConsumer.java:1842)
[...]
I think the same holds true when implementing the KafkaConsumerAware interface to store a reference to the freshly created KafkaConsumer instance.
So are there any ideas how to handle this in an appropriate way?
Thanks
Christian
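One possible direction, sketched here under assumptions rather than as a confirmed answer: newer micronaut-kafka versions provide a ConsumerRegistry bean that tracks all @KafkaListener consumers and can pause/resume them by client id. The endpoint paths and the client id "demo-consumer" below are illustrative; check the exact ConsumerRegistry API against your micronaut-kafka version.

import io.micronaut.configuration.kafka.ConsumerRegistry;
import io.micronaut.http.annotation.Controller;
import io.micronaut.http.annotation.Post;

@Controller("/kafka")
public class ConsumptionController {

    private final ConsumerRegistry consumerRegistry;

    public ConsumptionController(ConsumerRegistry consumerRegistry) {
        this.consumerRegistry = consumerRegistry;
    }

    @Post("/pause")
    public void pause() {
        // pauses all partitions assigned to the listener declared with
        // @KafkaListener(clientId = "demo-consumer") (clientId is assumed here)
        consumerRegistry.pause("demo-consumer");
    }

    @Post("/resume")
    public void resume() {
        consumerRegistry.resume("demo-consumer");
    }
}

Because the registry coordinates with the consumer's polling loop rather than touching the KafkaConsumer from the HTTP thread, it is meant to sidestep the ConcurrentModificationException shown above.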

Related

Spring cloud stream routing on payload

I want to use Spring Cloud Stream in my microservice to handle events from Kafka.
I read from one topic that can hold several kinds of JSON payloads (I have one topic since all messages arriving on it concern the same subject).
I have a different cloud function to handle each kind of payload.
How can I route the incoming event to a specific function based on a property in its payload?
Say I have JSON message that can have the following properties:
{
  "type": "A",
  "content": "xyz"
}
So the "type" property of an incoming message can be A or B.
Say I want to call one bean function when the type is A and another bean function when the type is B.
It is not clear from the question whether you are using the message channel-based Kafka binder or Kafka Streams binder. The comments above imply some reference to KStream. Assuming that you are using the message channel-based Kafka binder, you have the option of using the message routing feature in Spring Cloud Stream. The basic usage is explained in this section of the docs: https://docs.spring.io/spring-cloud-stream/docs/3.2.1/reference/html/spring-cloud-stream.html#_event_routing
You can provide a routing-expression which is a SpEL expression to pass the right property values.
If you want advanced routing capabilities beyond what can be expressed through a SpEL expression, you can also implement a custom MessageRoutingCallback. See this sample application for more details: https://github.com/spring-cloud/spring-cloud-stream-samples/tree/main/routing-samples/message-routing-callback
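For illustration only (the bean names below are made up, not from the question): with the functional model you enable the built-in router and provide one handler bean per payload type; a SpEL routing expression in configuration (typically spring.cloud.function.routing-expression, see the docs linked above) then resolves each incoming message to the right bean name. A rough sketch:

import java.util.function.Consumer;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.messaging.Message;

@SpringBootApplication
public class RoutingApplication {

    public static void main(String[] args) {
        SpringApplication.run(RoutingApplication.class, args);
    }

    // invoked when the routing expression resolves to "typeAHandler"
    @Bean
    public Consumer<Message<String>> typeAHandler() {
        return message -> System.out.println("type A payload: " + message.getPayload());
    }

    // invoked when the routing expression resolves to "typeBHandler"
    @Bean
    public Consumer<Message<String>> typeBHandler() {
        return message -> System.out.println("type B payload: " + message.getPayload());
    }
}

The routing expression itself lives in configuration (evaluated against a message header or a payload property) and must resolve to one of these bean names; see the linked documentation and sample for the exact property names and syntax.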

Design question around Spring Cloud Stream, Avro, Kafka and losing data along the way

We have implemented a system consisting of several Spring Boot microservices that communicate via messages posted to Kafka topics. We are using Spring Cloud Stream to handle a lot of the heavy lifting of sending and receiving messages via Kafka. We are using Apache Avro as a transport protocol, integrating with a Schema Server (Spring Cloud Stream default implementation for local development, and Confluent for production).


We model our message classes in a common library, that every micro service includes as a dependency. We use ‘dynamic schema generation’ to infer the Avro schema from the shape of our message classes before the Avro serialisation occurs when a microservice acts as a producer and sends a message. The consuming micro service can look up the schema from the registry based on the schema version and deserialise into the message class, which it also has as a dependency.


It works well; however, there is one big drawback for us that I wonder if anyone has experienced before and could offer advice on. If we wish to add a new field, for example, to one of the model classes, we do it in the common model class library and update the version of that dependency in the micro service. But it means that we need to update the version of that dependency in every micro service along the chain, even if the in-between micro services do not need that new field. Otherwise, the value of that new field will be lost along the way, because the micro service consumers deserialise into an object (which might be an out-of-date version of the class) at each hop.

To give an example, let's say we have a model class in our model-common library called PaymentRequest (the @Data annotation is from Lombok and just generates getters and setters for the fields):
@Data
class PaymentRequest {
    String paymentId;
    String customerId;
}


And we have a micro service called PayService which sends a PaymentRequest message onto a Kafka topic:


#Output("payment-broker”)
MessageChannel paymentBrokerTopic();
...

PaymentRequest paymentRequest = getPaymentRequest();

Message<PaymentRequest> message = MessageBuilder.withPayload(paymentRequest).build();
paymentBrokerTopic().(message);

And we have this config in application.yaml in our Spring Boot application:


spring:
  cloud:
    stream:
      schema-registry-client:
        endpoint: http://localhost:8071
      schema:
        avro:
          dynamicSchemaGenerationEnabled: true
      bindings:
        payment-broker:
          destination: paymentBroker
          contentType: application/*+avro

Spring Cloud Stream’s Avro MessageConverter infers the schema from the PaymentRequest object, adds a schema to the schema registry if there is not already a matching one there, and sends the message on Kafka in Avro format.

Then another micro service, BrokerService, has this consumer:


#Output("payment-processor”)
MessageChannel paymentProcessorTopic();


#Input(“payment-request”)
SubscribableChannel paymentRequestTopic();

#StreamListener("payment-request")
public void processNewPayment(Message<PaymentRequest> request) {
// do some processing and then send on…
paymentProcessorTopic().(message);
}


It is able to deserialise that Avro message from Kafka into a PaymentRequest POJO, do some extra processing on it, and send the message onwards to another topic, called paymentProcessor. That message then gets picked up by another micro service, called PaymentProcessor, which has another StreamListener consumer:



@Input("payment-processor")
SubscribableChannel paymentProcessorTopic();


@StreamListener("payment-processor")
public void processNewPayment(Message<PaymentRequest> request) {
    // do some processing and action the request...
}


If we wish to update the PaymentRequest class in the model-common library, so that it has a new field:

@Data
class PaymentRequest {
    String paymentId;
    String customerId;
    String processorCommand;
}


If we update the dependency version in each of the micro services, the value of that new field gets deserialised into the field when the message is read, and reserialised into the message when it is sent on to the next topic, at each hop.


However, if we do not update the version of the model-common library in the second service in the chain, BrokerService for example, it will deserialise the message into a version of the class without that new field, and so when the message is reserialised and sent on to the payment-processor topic, the Avro message will not carry the data for that field.
The third micro service, PaymentProcessor, might have the version of the model-common lib that does contain the new field, but when the message is deserialised into the POJO the value for that field will be null.

I know Avro has features for schema evolution where default values can be assigned to new fields to allow for backwards and forwards compatibility, but that is not sufficient for us here: we need the real values. And ideally we do not want a situation where we have to update the dependency version of the model library in every micro service, because that introduces a lot of work and coupling between services. Often a new field is not needed by the services midway along the chain, and might only be relevant in the first service and the final one, for example.


So has anyone else faced this issue and thought of a good way around it? We are keen not to lose the power of Avro and the convenience of Spring Cloud Stream, but also not to have such dependency issues. Is there anything around custom serializers/deserializers we could try? Or using GenericRecords? Or an entirely different approach?


Thanks for any help in advance!
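Purely as a sketch of the GenericRecord idea mentioned above (not a confirmed solution, and whether the Avro converter can target GenericRecord directly depends on the Spring Cloud Stream version in use): an in-between service that only relays the payload could avoid mapping it onto the possibly outdated PaymentRequest POJO, so fields it does not know about survive the hop. Names below are illustrative.

import org.apache.avro.generic.GenericRecord;
import org.springframework.cloud.stream.annotation.StreamListener;
import org.springframework.messaging.Message;
import org.springframework.messaging.support.MessageBuilder;

// Relay listener in the in-between service: the payload is kept as a GenericRecord
// instead of being deserialised into the (possibly outdated) PaymentRequest class.
@StreamListener("payment-request")
public void relayPayment(Message<GenericRecord> request) {
    GenericRecord record = request.getPayload();

    // fields the relay does know about can still be read by name
    Object paymentId = record.get("paymentId");

    // ... processing that does not depend on the new fields ...

    // unknown fields are still present in the record when it is sent on
    paymentProcessorTopic().send(MessageBuilder.withPayload(record).build());
}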


KafkaRestProxy multiple instances issue

I have an architecture of microservices where each service's producer writes to the same topic. I have two instances of Kafka REST Proxy, each listening to that topic, but the problem is this:
Suppose a request comes to instance-1 of the REST proxy and is forwarded to the microservice; the service finishes the job and writes the response to the topic, but the response is consumed by the second instance of the REST proxy, say instance-2.
What should I do to solve this? Is there any kind of application_id we can attach to the request, so that when the microservice is done with the job and another instance of the REST proxy has consumed the response, we can redirect the response to the instance that received the original request?
Your proxies form a Kafka consumer group, just as any other application would. When you request records, you give both the consumer group and the consumer instance name (such as the host of the HTTP client): GET /consumers/(string:group_name)/instances/(string:instance)/records
You should generally not try to strictly control which consumers get which information beyond assigning a unique instance to each request, to allow for parallel consumption (assuming this is what you want).
Also, the rest proxy isn't consuming anything unless you have another application that's requesting that information, e.g. the GET request above.
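To make the group/instance model concrete, here is a rough sketch against the Confluent REST Proxy v2 API (the proxy address, group name "response-readers", instance name and topic are made up; double-check the media types for your proxy version):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RestProxyConsumerSketch {

    static final String PROXY = "http://localhost:8082";

    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();

        // 1. Create a named consumer instance inside the group "response-readers"
        HttpRequest create = HttpRequest.newBuilder()
                .uri(URI.create(PROXY + "/consumers/response-readers"))
                .header("Content-Type", "application/vnd.kafka.v2+json")
                .POST(HttpRequest.BodyPublishers.ofString(
                        "{\"name\":\"instance-1\",\"format\":\"json\",\"auto.offset.reset\":\"earliest\"}"))
                .build();
        System.out.println(http.send(create, HttpResponse.BodyHandlers.ofString()).body());

        // 2. Subscribe that instance to the response topic
        HttpRequest subscribe = HttpRequest.newBuilder()
                .uri(URI.create(PROXY + "/consumers/response-readers/instances/instance-1/subscription"))
                .header("Content-Type", "application/vnd.kafka.v2+json")
                .POST(HttpRequest.BodyPublishers.ofString("{\"topics\":[\"responses\"]}"))
                .build();
        http.send(subscribe, HttpResponse.BodyHandlers.ofString());

        // 3. Fetch records for exactly this group/instance pair
        HttpRequest records = HttpRequest.newBuilder()
                .uri(URI.create(PROXY + "/consumers/response-readers/instances/instance-1/records"))
                .header("Accept", "application/vnd.kafka.json.v2+json")
                .GET()
                .build();
        System.out.println(http.send(records, HttpResponse.BodyHandlers.ofString()).body());
    }
}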

How to configure Kafka RPC caller topic and group

I'm trying to implement an RPC architecture using Kafka as a message broker. The decision to use Kafka instead of another message broker solution is dictated by the current context.
The actual implementation consists of two different types of service:
The receiver: this service consumes messages from a Kafka topic, processes them, and then publishes the response message to a response topic;
The caller: this service receives HTTP requests, publishes messages to the receiver topic, consumes the receiver's response topic looking for the response message, and then returns it as an HTTP response.
The request/response messages published in the topics are related by the message key.
The receiver implementation was fairly simple: at startup, it creates the "request" and "response" topics, then starts consuming the request topic with the service's group id (many instances of the receiver share the same group id in order to balance requests properly). When a request arrives, the service processes it and then publishes the response to the response topic.
My problem is with the caller implementation, in particular while consuming the response from the response queue.
With the following assumptions:
The HTTP requests must be managed concurrently;
There could be more than one instance of this caller service.
Every single thread/service must receive all the messages in the response topic, in order to find the message with the corresponding request key.
As an example, imagine that two caller services produce two messages with keys 1 and 2 respectively. These messages are published to the receiver topic and processed. The responses are then published to the topic receiver-responses. If the two caller services share the same group id, it could be that response 1 arrives at the service that published message 2 and vice versa, resulting in an HTTP timeout.
To avoid this problem, I've managed to think these possible solutions:
Creating a new group for every request (EDIT: but a group cannot be deleted programmatically, so another service would be needed to clean these groups up);
Creating a new topic for every request, then delete it afterwards.
Hoping that I made myself sufficiently clear - I must admit I am a beginner to Kafka - my question would be:
Which of the two solutions is less costly? Or is there another topic/group configuration that could satisfy assumption 3?
Thanks.
I think I've found a possible solution. A group is automatically removed by the broker when its offsets have not been updated for a period of time, determined by the offsets.retention.minutes configuration.
How often that check runs can be tuned via the offsets.retention.check.interval.ms configuration.
This way, when a consumer connects to the response topic searching for the reply message, the group it created can simply be abandoned, and it will be cleaned up later.
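For illustration only (not from the question): a sketch of a caller instance that consumes the whole response topic under its own group id and matches responses to pending HTTP requests by key. The bootstrap address, topic name and poll timeout are assumptions.

import java.time.Duration;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ResponseListener implements Runnable {

    // pending HTTP requests, keyed by the correlation key placed on the request message
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    public CompletableFuture<String> awaitResponse(String correlationKey) {
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.put(correlationKey, future);
        return future;
    }

    @Override
    public void run() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // one group per caller instance, so every instance sees every response
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "caller-" + UUID.randomUUID());
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("receiver-responses"));
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                    // complete the HTTP request waiting for this key, ignore everything else
                    CompletableFuture<String> future = pending.remove(record.key());
                    if (future != null) {
                        future.complete(record.value());
                    }
                }
            }
        }
    }
}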

Spring KafkaEmbedded - problem consuming messages

I have a problem using KafkaEmbedded from https://mvnrepository.com/artifact/org.springframework.kafka/spring-kafka-test/2.1.10.RELEASE
I'm using KafkaEmbedded to create Kafka broker for testing producer/consumer pipelines. These producers/consumers are standard clients from kafka-clients. I'm not using Spring Kafka clients.
Everything works and the code runs fine, but I have to use the consumeFromEmbeddedTopics() method from KafkaEmbedded to make the consumer work. If I don't use this method, the consumer does not get any messages.
There are two problems I have with this method: first, it needs a KafkaConsumer as a parameter (and I don't want to expose that in the class), and second, invoking it leads to a ConcurrentModificationException when an object polls using @Scheduled.
I'm already using the auto.offset.reset property, so that's a different thing.
My question is: how do I correctly consume records from KafkaEmbedded without invoking these consumeFromEmbeddedTopics() methods?
There is nothing special about that method; it simply subscribes the consumer to the topic(s) and polls it.
There is no reason you can't do the same with your Consumer.
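A rough sketch of doing that yourself with a plain kafka-clients consumer pointed at the embedded broker (the group id and topic name are made up, and KafkaEmbedded.getBrokersAsString() is assumed to be available in this spring-kafka-test version):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.springframework.kafka.test.rule.KafkaEmbedded;

public class EmbeddedConsumerSketch {

    public ConsumerRecords<String, String> consumeOnce(KafkaEmbedded embeddedKafka) {
        Properties props = new Properties();
        // point the plain kafka-clients consumer at the embedded broker(s)
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, embeddedKafka.getBrokersAsString());
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "test-group");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // this is essentially what consumeFromEmbeddedTopics() does: subscribe and poll
            consumer.subscribe(Collections.singletonList("test-topic"));
            return consumer.poll(5000); // poll(long) matches the older kafka-clients used here
        }
    }
}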