Creating a Kafka aggregator and joining it with an event - apache-kafka

I am trying to create an aggregator wherein I listen for multiple records and consolidate them into one. After consolidation, I wait for a process event by joining a stream and aggregated application in listen() method. On arrival of the process event, some business logic is triggered. I have defined both aggregator and process listener in a single spring boot project.
#Bean
public Function<KStream<FormUUID, FormData>, KStream<UUID, Application>> process()
{
return formEvent -> formEvent.groupByKey()
.reduce((k, v) -> v)
.toStream()
.selectKey((k, v) -> k.getReferenceNo())
.groupByKey()
.aggregate(Application::new, (key, value, aggr) -> aggr.performAggregate(value),
Materialized.<UUID, Application, KeyValueStore<Bytes, byte[]>> as("appStore")
.withKeySerde(new JsonSerde<>(UUID.class))
.withValueSerde(new JsonSerde<>(Application.class)))
.toStream();
}
#Bean
public BiConsumer<KStream<String, ProcessEvent>, KTable<String, Application>> listen()
{
return (eventStream, appTable) ->
{
eventStream.join(appTable, (event, app) -> app)
.foreach((k, app) -> app.createQuote());
};
}
However, now I am facing SerializationException. The first part(aggregation) works fine however the join is failing with exception
java.lang.ClassCastException: com.xxxxx.datamapper.domain.FormData cannot be cast to com.xxxxx.datamapper.domain.Application
at org.apache.kafka.streams.kstream.internals.KStreamPeek$KStreamPeekProcessor.process(KStreamPeek.java:42) ~[kafka-streams-2.3.1.jar:?]
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:117) ~[kafka-streams-2.3.1.jar:?]
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:201) ~[kafka-streams-2.3.1.jar:?]
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:180) ~[kafka-streams-2.3.1.jar:?]
org.apache.kafka.streams.errors.ProcessorStateException: task [0_0] Failed to flush state store APPLICATION_TOPIC-STATE-STORE-0000000001
at org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:280) ~[kafka-streams-2.3.1.jar:?]
at org.apache.kafka.streams.processor.internals.AbstractTask.flushState(AbstractTask.java:204) ~[kafka-streams-2.3.1.jar:?]
at org.apache.kafka.streams.processor.internals.StreamTask.flushState(StreamTask.java:519) ~[kafka-streams-2.3.1.jar:?]
I think, the problem is in my application.yml. Since the "spring.json.key.default.type" property is set as FormUUID the same is being used for Application object present in listen method. I want to configure the type for remaining types UUID, Application and ProcessEvent in my application.yml. but not sure how to configure the mapping type for each consumer and producer defined.
spring.cloud:
function.definition: process;listen
stream:
kafka.streams:
bindings:
process-in-0.consumer.application-id: form-aggregator
listen-in-0.consumer.application-id: event-processor
listen-in-1.consumer.application-id: event-processor
binder.configuration:
default.key.serde: org.springframework.kafka.support.serializer.JsonSerde
default.value.serde: org.springframework.kafka.support.serializer.JsonSerde
spring.json.key.default.type: com.xxxx.datamapper.domain.FormUUID
spring.json.value.default.type: com.xxxx.datamapper.domain.FormData
commit.interval.ms: 1000
bindings:
process-in-0.destination: FORM_DATA_TOPIC
process-out-0.destination: APPLICATION_TOPIC
listen-in-0.destination: PROCESS_TOPIC
listen-in-1:
destination: APPLICATION_TOPIC
consumer:
useNativeDecoding: true

If you are using the latest Horsham versions of Spring Cloud Stream Kafka Streams binder, you do not need to set any explicit Serdes for inbound and outbound. However, you still need to provide them wherever the Kafka Streams API requires them, as in the case of your aggregate method call above. If you are facing this serialization error on the inbound of the second processor, I suggest trying to remove all Serdes from the configuration. You can simplify as it below (given that you are on the latest Horsham release). The binder will infer the correct Serdes to use on the inbound/outbound. One benefit of delegating this to the binder is that you don't need to provide any explicit key/value types through configuration because the binder will introspect for the types. Make sure your POJO types that you are using are JSON friendly. See if that works. If you are still having issues, please create a small sample application where we can reproduce the issue and we will take a look.
spring.cloud:
function.definition: process;listen
stream:
kafka.streams:
bindings:
process-in-0.consumer.application-id: form-aggregator
listen-in-0.consumer.application-id: event-processor
listen-in-1.consumer.application-id: event-processor
binder.configuration:
commit.interval.ms: 1000
bindings:
process-in-0.destination: FORM_DATA_TOPIC
process-out-0.destination: APPLICATION_TOPIC
listen-in-0.destination: PROCESS_TOPIC
listen-in-1.destination: APPLICATION_TOPIC

Related

Error handling in Spring Cloud Kafka Streams

I'm using Spring Cloud Stream with Kafka Streams. Let's say I have a processor which is a Function which converts a KStream of Strings to a KStream of CityProgrammes. It invokes an API to find the City by name and an other transformation which finds any events near that city.
Now the problem is that any error happens during the transformation, the whole application stops. I want to send that one particular message to a DLQ and move along. I've been reading for days and everyone suggests to handle errors within the called services but that is a nonesense in my opinion, plus I still need to return a KStream: how do I do that within a catch?
I also looked at UncaughtExeptionHandler but it is not aware of the message and only able to restart the processing which won't skip this invalid message.
This might sound like an A-B problem so the question rephrased: how do I maintain the flow in a KStream when an exception occurs and send the invalid item to the DLQ?
When it comes to the application-level errors you have, it is up to the application itself how the error is handled. Kafka Streams and the Spring Cloud Stream binder mainly support deserialization and serialization errors at the framework level. Although that is the case, I think your scenario can be handled. If you are using Kafka Client prior to 2.8, here is an SO answer I gave before on something similar: https://stackoverflow.com/a/66749750/2070861
If you are using Kafka/Streams 2.8, here is an idea that you can use. However, the code below should only be used as a starting point. Adjust it according to your use case. Read more on how branching works in Kafka Streams 2.8. The branching API is significantly refactored in 2.8 from the prior versions.
public Function<KStream<?, String>, KStream<?, Foo>> convert() {
Foo[] foo = new Foo[0];
return input -> {
final Map<String, ? extends KStream<?, String>> branches =
input.split(Named.as("foo-")).branch((key, value) -> {
try {
foo[0] = new Foo(); // your API call for CitiProgramme converion here, possibly.
return true;
}
catch (Exception e) {
Message<?> message = MessageBuilder.withPayload(value).build();
streamBridge.send("to-my-dlt", message);
return false;
}
}, Branched.as("bar"))
.defaultBranch();
final KStream<?, String> kStream = branches.get("foo-bar");
return kStream.map((key, value) -> new KeyValue<>("", foo[0]));
};
}
}
The default branch is ignored in this code because that only contains the records that threw exceptions. Those were handled by the catch statement above in which we send the records to a DLT programmatically. Finally, we get the good records and map them to a new KStream and send it through the outbound.

Cannot get custom store connected to a Transformer with Spring Cloud Stream Binder Kafka 3.x

Cannot get custom store connected to my Transformer in Spring Cloud Stream Binder Kafka 3.x (functional style) following examples from here.
I am defining a KeyValueStore as a #Bean with type StoreBuilder<KeyValueStore<String,Long>>:
#Bean
public StoreBuilder<KeyValueStore<String,Long>> myStore() {
return Stores.keyValueStoreBuilder(
Stores.persistentKeyValueStore("my-store"), Serdes.String(),
Serdes.Long());
}
#Bean
#DependsOn({"myStore"})
public MyTransformer myTransformer() {
return new MyTransformer("my-store");
}
In debugger I can see that the beans get initialised.
In my stream processor function then:
return myStream -> {
return myStream
.peek(..)
.transform(() -> myTransformer())
...
MyTransformer is declared as
public class MyTransformer implements Transformer<String, MyEvent, KeyValue<KeyValue<String,Long>, MyEvent>> {
...
#Override
public void init(final ProcessorContext context) {
this.context = context;
this.myStore = context.getStateStore(storeName);
}
Getting the following error when application context starts up from my unit test:
Caused by: org.apache.kafka.streams.errors.StreamsException: Processor KSTREAM-TRANSFORM-0000000002 has no access to StateStore my-store as the store is not connected to the processor. If you add stores manually via '.addStateStore()' make sure to connect the added store to the processor by providing the processor name to '.addStateStore()' or connect them via '.connectProcessorAndStateStores()'. DSL users need to provide the store name to '.process()', '.transform()', or '.transformValues()' to connect the store to the corresponding operator, or they can provide a StoreBuilder by implementing the stores() method on the Supplier itself. If you do not add stores manually, please file a bug report at https://issues.apache.org/jira/projects/KAFKA.
In the application startup logs when running my unit test, I can see that the store seems to get created:
2021-04-06 00:44:43.806 INFO [ main] .k.s.AbstractKafkaStreamsBinderProcessor : state store my-store added to topology
I'm already using pretty much every feature of the Spring Cloud Stream Binder Kafka in my app and from my unit test, everything works very well. Unexpectedly, I got stuck at adding the custom KeyValueStore to my Transformer. It would be great, if you could spot an error in my setup.
The versions I'm using right now:
org.springframework.boot:spring-boot:jar:2.4.4
org.springframework.kafka:spring-kafka:jar:2.6.7
org.springframework.kafka:spring-kafka-test:jar:2.6.7
org.springframework.cloud:spring-cloud-stream-binder-kafka-streams:jar:3.0.4.RELEASE
org.apache.kafka:kafka-streams:jar:2.7.0
I've just tried with
org.springframework.cloud:spring-cloud-stream-binder-kafka-streams:jar:3.1.3-SNAPSHOT
and the issue seems to persist.
In your processor function, when you call .transform(() -> myTransformer()), you also need to provide the state store names in order for this to be connected to that transformer. There are some overloaded transform methods in the KStream API that takes state store names as a vararg. I wonder if this is the issue that you are running into. You may want to change that call to .transform(() -> myTransformer(), "myStore").

Spring boot Kafka request-reply scenario

I am implementing POC of request/reply scenario in order to move event-based microservice stack with using Kafka.
There is 2 options in spring.
I wonder which one is better to use. ReplyingKafkaTemplate or cloud-stream
First is ReplyingKafkaTemplate which can be easily configured to have dedicated channel to reply topics for each instance.
record.headers().add(new RecordHeader(KafkaHeaders.REPLY_TOPIC, provider.getReplyChannelName().getBytes()));
Consumer should not need to know replying topic name, just listens a topic and returns with given reply topic.
#KafkaListener(topics = "${kafka.topic.concat-request}")
#SendTo
public ConcatReply listen(ConcatModel request) {
.....
}
Second option is using combination of StreamListener, spring-integration and IntegrationFlows. Gateway should be configured and reply topics should be filtered.
#MessagingGateway
public interface StreamGateway {
#Gateway(requestChannel = START, replyChannel = FILTER, replyTimeout = 5000, requestTimeout = 2000)
String process(String payload);
}
#Bean
public IntegrationFlow headerEnricherFlow() {
return IntegrationFlows.from(START)
.enrichHeaders(HeaderEnricherSpec::headerChannelsToString)
.enrichHeaders(headerEnricherSpec -> headerEnricherSpec.header(Channels.INSTANCE_ID ,instanceUUID))
.channel(Channels.REQUEST)
.get();
}
#Bean
public IntegrationFlow replyFiltererFlow() {
return IntegrationFlows.from(GatewayChannels.REPLY)
.filter(Message.class, message -> Channels.INSTANCE_ID.equals(message.getHeaders().get("instanceId")) )
.channel(FILTER)
.get();
}
Building reply
#StreamListener(Channels.REQUEST)
#SendTo(Channels.REPLY)
public Message<?> process(Message<String> request) {
Specifying reply channel is mandatory. So receieved reply topics are filtered according to instanceID which a kind of workaround (might bloat up the network). On the other hand, DLQ scenario is enabled with adding
consumer:
enableDlq: true
Using spring cloud streams looks promising in terms of interoperability with RabbitMQ and other features, but not officially supports request reply scenario right away. Issue is still open, not rejected also. (https://github.com/spring-cloud/spring-cloud-stream/issues/1800)
Any suggestions are welcomed.
Spring Cloud Stream is not designed for request/reply; it can be done, it is not straightforward and you have to write code.
With #KafkaListener the framework takes care of everything for you.
If you want it to work with RabbitMQ too, you can annotate it with #RabbitListener as well.

How to inject KafkaTemplate in Quarkus

I'm trying to inject a KafkaTemplate to send a single message. I'm developing a small function that lies outside the reactive approach.
I can only find examples that use #Ingoing and #Outgoing from Smallrye but I don't need a KafkaStream.
I tried with Kafka-CDI but I'm unable to inject the SimpleKafkaProducer.
Any ideas?
For Clement's answer
It seems the right direction, but executing orders.send("hello"); I receive this error:
(vert.x-eventloop-thread-3) Unhandled exception:java.lang.IllegalStateException: Stream not yet connected
I'm consuming from my topic by command line, Kafka is up and running, if I produce manually I can see the consumed messages.
It seems relative to this sentence by the doc:
To use an Emitter for the stream hello, you need a #Incoming("hello")
somewhere in your code (or in your configuration).
I have this code in my class:
#Incoming("orders")
public CompletionStage<Void> consume(KafkaMessage<String, String> msg) {
log.info("Received message (topic: {}, partition: {}) with key {}: {}", msg.getTopic(), msg.getPartition(), msg.getKey(), msg.getPayload());
return msg.ack();
}
Maybe I've forgotten some configurations?
So, you just need to use an Emitter:
#Inject
#Stream("orders") // Emit on the channel 'orders'
Emitter<String> orders;
// ...
orders.send("hello");
And in your application.properties, declare:
## Orders topic (WRITE)
mp.messaging.outgoing.orders.type=io.smallrye.reactive.messaging.kafka.Kafka
mp.messaging.outgoing.orders.topic=orders
mp.messaging.outgoing.orders.bootstrap.servers=localhost:9092
mp.messaging.outgoing.orders.key.serializer=org.apache.kafka.common.serialization.StringSerializer
mp.messaging.outgoing.orders.value.serializer=org.apache.kafka.common.serialization.StringSerializer
mp.messaging.outgoing.orders.acks=1
To avoid Stream not yet connected exception, as suggested by doc:
To use an Emitter for the stream hello, you need a #Incoming("hello")
somewhere in your code (or in your configuration).
Assuming you have something like this in your application.properties:
# Orders topic (READ)
smallrye.messaging.source.orders-r-topic.type=io.smallrye.reactive.messaging.kafka.Kafka
smallrye.messaging.source.orders-r-topic.topic=orders
smallrye.messaging.source.orders-r-topic.bootstrap.servers=0.0.0.0:9092
smallrye.messaging.source.orders-r-topic.key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
smallrye.messaging.source.orders-r-topic.value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
smallrye.messaging.source.orders-r-topic.group.id=my-group-id
Add something like this:
#Incoming("orders-r-topic")
public CompletionStage<Void> consume(KafkaMessage<String, String> msg) {
log.info("Received message (topic: {}, partition: {}) with key {}: {}", msg.getTopic(), msg.getPartition(), msg.getKey(), msg.getPayload());
return msg.ack();
}
Since Clement's answer the #Stream annotation has been deprecated. The #Channel annotation
must be used instead.
You can use an Emitter provided by the quarkus-smallrye-reactive-messaging-kafka dependency to produce message to a Kafka topic.
A simple Kafka producer implementation:
public class MyKafkaProducer {
#Inject
#Channel("my-topic")
Emitter<String> myEmitter;
public void produce(String message) {
myEmitter.send(message);
}
}
And the following configuration must be added to the application.properties file:
mp.messaging.outgoing.my-topic.connector=smallrye-kafka
mp.messaging.outgoing.my-topic.bootstrap.servers=localhost:9092
mp.messaging.outgoing.my-topic.value.serializer=org.apache.kafka.common.serialization.StringSerializer
This will produce string serialized messages to a kafka topic named my-topic.
Note that by default the name of the channel is also the name of the kafka topic in which the data will be produced. This behavior can be changed through the configuration. The supported configuration attributes are described in the reactive Messaging documentation

Kafka Producer : Handle Exception in Async Send with Callback

I need to catch the exceptions in case of Async send to Kafka. The Kafka producer Api comes with a fuction send(ProducerRecord record, Callback callback). But when I tested this against following two scenarios :
Kafka Broker Down
Topic not pre created
The callbacks are not getting called. Rather I am getting warning in the code for unsuccessful send (as shown below).
Questions :
So are the callbacks called only for specific exceptions ?
When does Kafka Client try to connect to Kafka broker while async send : on every batch send or periodically ?
Kafka Warning Image
Note : I am also using linger.ms setting of 25 sec to batch send my records.
public class ProducerDemo {
static KafkaProducer<String, String> producer;
public static void main(String[] args) throws IOException {
final Logger logger = LoggerFactory.getLogger(ProducerDemo.class);
Properties properties = new Properties();
properties.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092");
properties.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
properties.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
properties.setProperty(ProducerConfig.ACKS_CONFIG, "1");
properties.setProperty(ProducerConfig.LINGER_MS_CONFIG, "30000");
producer = new KafkaProducer<String, String>(properties);
String topic = "first_topic";
for (int i = 0; i < 5; i++) {
String value = "hello world " + Integer.toString(i);
String key = "id_" + Integer.toString(i);
ProducerRecord<String, String> record = new ProducerRecord<String, String>(topic, key, value);
producer.send(record, new Callback() {
public void onCompletion(RecordMetadata recordMetadata, Exception e) {
//execute everytime a record is successfully sent or exception is thrown
if(e == null){
// No Exception
}else{
//Exception Handling
}
}
});
}
producer.close();
}
You will get those warning for non-existing topic as a resilience mechanism provided with KafkaProducer. If you wait a bit longer(should be 60 seconds by default), the callback will be called eventually:
Here's my snippet:
So, when something goes wrong and async send is not successful, it will eventually fail with a failed future or/and a callback with exception.
If you are not running it transactionally, it can still mean that some messages from the batch have found their way to the broker, while others haven't.
It will most certainly be a problem if you need a blocking-style acknowledgement to the upstream system(like http ingestion interface, etc.) per every message that is sent to Kafka. The only way to do that is by blocking every message with the future's get, as described in the documentation:
In general, I've noticed a lot of question related to KafkaProducer delivery semantics and guarantees. It can definitely be documented better.
One more thing, since you mentioned linger.ms:
Note that records that arrive close together in time will generally
batch together even with linger.ms=0 so under heavy load batching will
occur regardless of the linger configuration
For the first question, here is the answer.
As per the apache kafka documentation, you can capture below exceptions using onCompletion method when you are implementing Callback interface
https://kafka.apache.org/25/javadoc/org/apache/kafka/clients/producer/Callback.html
For the second question, the combination of below properties control when to send the records and as far as i understand, it's same for synchronous or asynchronous call.
linger.ms
max.block.ms
https://kafka.apache.org/documentation/#linger.ms
So are the callbacks called only for specific exceptions ?
Yes, that's how it works. From documentation (2.5.0):
* Fully non-blocking usage can make use of the {#link Callback} parameter to provide a callback that
* will be invoked when the request is complete.
Notice the important part: when the request is complete, what means that the producer must have accepted the record and sent the ProduceRequest to Kafka Broker. Without digging too deep into internals, this means that broker metadata must be present and the partition must exist.
When it comes to formal specification, you'd need to take a good look at send()'s Javadoc and possibly at KafkaProducer's implementation of doSend method. Out there you're going to see that multiple exceptions can be thrown at the in submitting call (instead of returning a future and invoking callback), e.g. :
if broker metadata is not available in timeout given,
if data could not be serialized,
if serialized form was too large, etc.