How to make a Spring Cloud Stream consumer in a Webflux application? - apache-kafka

I have a Webflux based microservice that has a simple reactive repository:
public interface NotificationRepository extends ReactiveMongoRepository<Notification, ObjectId> {
}
Now I would like to extend this microservice to consume event messages from Kafka. The message/event will then be saved into the database.
For the Kafka listener, I used Spring Cloud Stream. I created a simple Consumer and it works great - I'm able to consume the message and save it into the database.
@Bean
public Consumer<KStream<String, Event>> documents(NotificationRepository repository) {
    return input ->
        input.foreach((key, value) -> {
            LOG.info("Received event, Key: {}, value: {}", key, value);
            repository.save(initNotification(value)).subscribe();
        });
}
But is this the correct way to connect a Spring Cloud Stream consumer to a reactive repository? It doesn't look like it when I have to call subscribe() at the end.
I read the Spring Cloud Stream documentation (for the 3.0.0 release) and they say:
Native support for reactive programming - since v3.0.0 we no longer distribute spring-cloud-stream-reactive modules and instead relying on native reactive support provided by spring cloud function. For backward compatibility you can still bring spring-cloud-stream-reactive from previous versions.
and also in this presentation video they mention they have reactive programming support using Project Reactor. So I guess there is a way, I just don't know it. Can you show me how to do it right?
I apologize if this all sounds too stupid but I'm very new to Spring Cloud Stream and reactive programming and haven't found many articles describing this.

Just use Flux as the consumed type, something like this:
@Bean
public Consumer<Flux<Message<Event>>> documents(NotificationRepository repository) {
    return input ->
        input
            .map(message -> /* map the necessary value, e.g.: */ message.getPayload().getEventValue())
            .concatMap(value -> repository.save(initNotification(value)))
            .subscribe();
}
If you use a Function with an empty return type (Function<Flux<Message<Event>>, Mono<Void>>) instead of a Consumer, the framework can subscribe automatically. With a Consumer you have to subscribe manually, because the framework has no reference to the stream. In the Consumer case you subscribe not to the repository but to the whole stream, which is fine.
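For completeness, a minimal sketch of that Function variant, reusing the placeholder mapping (getEventValue()) and initNotification() from the snippets above; the framework subscribes to the returned Mono<Void> for you:
@Bean
public Function<Flux<Message<Event>>, Mono<Void>> documents(NotificationRepository repository) {
    return input ->
        input
            .map(message -> message.getPayload().getEventValue())
            .concatMap(value -> repository.save(initNotification(value)))
            .then(); // Flux<Notification> -> Mono<Void>; no manual subscribe() needed
}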

Related

Spring cloud stream, kafka binder - seek on demand

I use Spring Cloud Stream with Kafka. I have a topic X, with partition Y and consumer group Z. Spring Boot starter parent 2.7.2, Spring Kafka version 2.8.8:
@StreamListener("input-channel-name")
public void processMessage(final DomainObject domainObject) {
    // some processing
}
It works fine.
I would like to have an endpoint in the app that allows me to re-read/re-process (seek, right?) all the messages in X.Y (again) - not after rebalancing (ConsumerSeekAware#onPartitionsAssigned) or after an app restart (KafkaConsumerProperties#resetOffsets), but on demand like this:
@RestController
@Slf4j
@RequiredArgsConstructor
public class SeekController {

    @GetMapping
    public void seekToBeginningForDomainObject() {
        /*
         * seekToBeginning for X, Y, input-channel-name
         */
    }
}
I just can't achieve that. Is it even possible? I understand that I have to do that at the consumer level, probably the one that is created after the @StreamListener("input-channel-name") subscription, right? But I've no clue how to obtain that consumer. How can I execute a seek on demand to make Kafka send the messages to the consumer again? I just want to reset the offset for X.Y.Z to 0 to make the app load and process all the messages again.
https://docs.spring.io/spring-cloud-stream/docs/current/reference/html/spring-cloud-stream-binder-kafka.html#rebalance-listener
KafkaBindingRebalanceListener.onPartitionsAssigned() provides a boolean to indicate whether this is an initial assignment vs. a rebalance assignment.
Spring cloud stream does not currently support arbitrary seeks at runtime, even though the underlying KafkaMessageDrivenChannelAdapter does support getting access to a ConsumerSeekCallback (which allows arbitrary seeks between polls). It would need an enhancement to the binder to allow access to this code.
It is possible, though, to consume idle container events in an event listener; the event contains the consumer, so you could do arbitrary seeks under those conditions.
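A rough sketch of that idle-event approach, assuming idle events are enabled (for example via the binder's idleEventInterval consumer property) and using a hypothetical flag set by the REST endpoint; the idle event is published on the consumer thread, so interacting with the consumer here should be safe:
@RestController
public class SeekController {

    private final AtomicBoolean seekRequested = new AtomicBoolean();

    @GetMapping("/seek-to-beginning")
    public void seekToBeginningForDomainObject() {
        // only records the request; the actual seek happens on the consumer thread below
        seekRequested.set(true);
    }

    @EventListener
    public void onIdle(ListenerContainerIdleEvent event) {
        if (seekRequested.getAndSet(false)) {
            // the event carries the consumer and its assigned partitions
            event.getConsumer().seekToBeginning(event.getTopicPartitions());
        }
    }
}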

Spring Cloud 3.1 documentation on how to use KTable in a Spring boot app

I'm struggling to find any documentation on how I can use Spring Cloud Stream to take a Kafka topic and put it into a KTable.
Having looked for documentation, for example here https://cloud.spring.io/spring-cloud-static/spring-cloud-stream-binder-kafka/3.0.0.RC1/reference/html/spring-cloud-stream-binder-kafka.html#_materializing_ktable_as_a_state_store, nothing is very concrete on the way to do this in Spring Boot using annotations.
I was hoping I could just create a simple KTable using a KStream, where in my application.properties I have this:
spring.cloud.stream.bindings.process-in-0.destination: my-topic
Then in my configuration I was hoping I could do something like this:
@Bean
public Consumer<KStream<String, String>> process() {
    return input -> input.toTable(Materialized.as("my-store"));
}
Please advise what I'm missing.
If all you want to do is consume data from a Kafka topic as a KTable, then you can do it as below.
@Bean
public Consumer<KTable<String, String>> process() {
    return input -> {
    };
}
If you want to materialize the table into a named store, then you can add this to the configuration.
spring.cloud.stream.kafka.streams.bindings.process-in-0.consumer.materializedAs: my-store
You could also do what you had in the question, i.e. receive it as a KStream and then convert it to a KTable. However, if that is all you need to do, you might as well receive it as a KTable in the first place, as I suggest here.
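If the reason for naming the store is to query it later, a rough sketch using the binder's InteractiveQueryService (the injection style and the method name valueFor() are just for illustration) could look like this:
@Autowired
private InteractiveQueryService interactiveQueryService;

public String valueFor(String key) {
    // fetch the state store materialized as "my-store" and query it by key
    ReadOnlyKeyValueStore<String, String> store =
            interactiveQueryService.getQueryableStore("my-store", QueryableStoreTypes.<String, String>keyValueStore());
    return store.get(key);
}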

How does Spring Cloud Stream Kafka consume in bulk?

Native Kafka can use KafkaConsumer.poll() to obtain messages in batches. How do I configure this after integrating spring-cloud-starter-stream-kafka (3.0.10.RELEASE)? I didn't understand the official documentation.
I would appreciate an example, thank you.
See https://docs.spring.io/spring-cloud-stream/docs/3.0.10.RELEASE/reference/html/spring-cloud-stream.html#_batch_consumers
Set the batch-mode consumer binding property and use
@Bean
Consumer<List<Foo>> input() {
    return list -> {
        System.out.println(list);
    };
}
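For example, for the input() function above, the consumer binding property would look something like this (assuming the input-in-0 binding name that the functional model derives):
spring.cloud.stream.bindings.input-in-0.consumer.batch-mode: true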

Design question around Spring Cloud Stream, Avro, Kafka and losing data along the way

We have implemented a system consisting of several Spring Boot microservices that communicate via messages posted to Kafka topics. We are using Spring Cloud Stream to handle a lot of the heavy lifting of sending and receiving messages via Kafka. We are using Apache Avro as a transport protocol, integrating with a Schema Server (Spring Cloud Stream default implementation for local development, and Confluent for production).


We model our message classes in a common library, that every micro service includes as a dependency. We use ‘dynamic schema generation’ to infer the Avro schema from the shape of our message classes before the Avro serialisation occurs when a microservice acts as a producer and sends a message. The consuming micro service can look up the schema from the registry based on the schema version and deserialise into the message class, which it also has as a dependency.


It works well, however there is one big drawback for us that I wonder if anyone has experienced before and could offer any advice on. If we wish to add a new field, for example, to one of the model classes, we do it in the common model class library and update the version of that dependency in the micro service. But it means that we need to update the version of that dependency in every micro service along the chain, even if the in-between micro services do not need that new field. Otherwise, the data value of that new field will be lost along the way, because the micro service consumers deserialise into an object (which might be an out-of-date version of the class) along the way.

To give an example, let's say we have a model class in our model-common library called PaymentRequest (the @Data annotation is Lombok and just generates getters and setters from the fields):
@Data
class PaymentRequest {
    String paymentId;
    String customerId;
}


And we have a micro service called PayService which sends a PaymentRequest message onto a Kafka topic:


#Output("payment-broker”)
MessageChannel paymentBrokerTopic();
...

PaymentRequest paymentRequest = getPaymentRequest();

Message<PaymentRequest> message = MessageBuilder.withPayload(paymentRequest).build();
paymentBrokerTopic().(message);

And we have this config in application.yaml in our Spring Boot application:


spring:
  cloud:
    stream:
      schema-registry-client:
        endpoint: http://localhost:8071
      schema:
        avro:
          dynamicSchemaGenerationEnabled: true
      bindings:
        payment-broker:
          destination: paymentBroker
          contentType: application/*+avro

Spring Cloud Stream's Avro MessageConverter infers the schema from the PaymentRequest object, adds the schema to the schema registry if there is not already a matching one there, and sends the message to Kafka in Avro format.

Then we have another micro service, BrokerService, which has this consumer:


#Output("payment-processor”)
MessageChannel paymentProcessorTopic();


#Input(“payment-request”)
SubscribableChannel paymentRequestTopic();

#StreamListener("payment-request")
public void processNewPayment(Message<PaymentRequest> request) {
// do some processing and then send on…
paymentProcessorTopic().(message);
}


It is able to deserialise that Avro message from Kafka into a PaymentRequest POJO, do some extra processing on it, and send the message onwards to another topic, paymentProcessor, which then gets picked up by another micro service, PaymentProcessor, with another StreamListener consumer:



@Input("payment-processor")
SubscribableChannel paymentProcessorTopic();

@StreamListener("payment-processor")
public void processNewPayment(Message<PaymentRequest> request) {
    // do some processing and action request…
}


If we wish to update the PaymentRequest class in the model-common library, so that it has a new field:

@Data
class PaymentRequest {
    String paymentId;
    String customerId;
    String processorCommand;
}


If we update the dependency version in each of the micro services, the value of that new field gets deserialised into the field when the message is read, and reserialised into the message when it gets sent on to the next topic, each time.


However, if we do not update the version of the model-common library in the second service in the chain, BrokerService for example, it will deserialise the message into a version of the class without that new field, and so when the message is reserialised and sent on to the payment-processor topic, the Avro message will not have the data for that field.
The third micro service, PaymentProcessor, might have the version of the model-common lib that does contain the new field, but when the message is deserialised into the POJO the value for that field will be null.

I know Avro has features for schema evolution where default values can be assigned to new fields to allow for backwards and forwards compatibility, but that is not sufficient for us here; we need the real values. And ideally we do not want a situation where we have to update the dependency version of the model library in every micro service, because that introduces a lot of work and coupling between services. Often a new field is not needed by the services midway along the chain, and might only be relevant in the first service and the final one, for example.


So has anyone else faced this issue and thought of a good way around it? We are keen not to lose the power of Avro and the convenience of Spring Cloud Stream, but also not to have such dependency issues. Is there anything around custom serializers/deserializers we could try? Or using GenericRecords? Or an entirely different approach?


Thanks for any help in advance!


How does Spring Kafka/Spring Cloud Stream guarantee transactionality/atomicity involving a database and Kafka?

Spring Kafka, and thus Spring Cloud Stream, allow us to create transactional Producers and Processors. We can see that functionality in action in one of the sample projects: https://github.com/spring-cloud/spring-cloud-stream-samples/tree/master/transaction-kafka-samples:
@Transactional
@StreamListener(Processor.INPUT)
@SendTo(Processor.OUTPUT)
public PersonEvent process(PersonEvent data) {
    logger.info("Received event={}", data);
    Person person = new Person();
    person.setName(data.getName());
    if (shouldFail.get()) {
        shouldFail.set(false);
        throw new RuntimeException("Simulated network error");
    } else {
        // We fail every other request as a test
        shouldFail.set(true);
    }
    logger.info("Saving person={}", person);
    Person savedPerson = repository.save(person);
    PersonEvent event = new PersonEvent();
    event.setName(savedPerson.getName());
    event.setType("PersonSaved");
    logger.info("Sent event={}", event);
    return event;
}
In this excerpt, there's a read from a Kafka topic, a write to a database and another write to another Kafka topic, all of it transactionally.
What I wonder, and would like to have answered, is how that is technically achieved and implemented.
Since the datasource and Kafka don't participate in a XA transaction (2 phase commit), how does the implementation guarantee that a local transaction can read from Kafka, commit to a database and write to Kafka all of this transactionally?
There is no guarantee, only within Kafka itself.
Spring provides transaction synchronization so the commits are close together, but it is possible for the DB to commit while the Kafka send does not. So you have to deal with the possibility of duplicates.
The correct way to do this, when using spring-kafka directly, is NOT with @Transactional but to use a ChainedKafkaTransactionManager in the listener container.
See Transaction Synchronization.
Also see Distributed transactions in Spring, with and without XA and the "Best Efforts 1PC pattern" for background.
However, with Stream, there is no support for the chained transaction manager, so @Transactional is required (with the DB transaction manager). This will provide similar results to the chained transaction manager, with the DB committing first, just before Kafka.
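For the spring-kafka-direct case, a rough sketch of wiring a ChainedKafkaTransactionManager into the listener container could look like the following (bean names, generics and the choice of DataSourceTransactionManager are assumptions, not taken from the sample):
@Bean
public ChainedKafkaTransactionManager<Object, Object> chainedTransactionManager(
        KafkaTransactionManager<Object, Object> kafkaTransactionManager,
        DataSourceTransactionManager dataSourceTransactionManager) {
    // commits happen in reverse order of this list, so the DB commits first, just before Kafka
    return new ChainedKafkaTransactionManager<>(kafkaTransactionManager, dataSourceTransactionManager);
}

@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
        ConsumerFactory<String, String> consumerFactory,
        ChainedKafkaTransactionManager<Object, Object> chainedTransactionManager) {
    ConcurrentKafkaListenerContainerFactory<String, String> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory);
    // run the listener inside the chained transaction instead of annotating it with @Transactional
    factory.getContainerProperties().setTransactionManager(chainedTransactionManager);
    return factory;
}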