Cannot get custom store connected to a Transformer with Spring Cloud Stream Binder Kafka 3.x - apache-kafka

Cannot get custom store connected to my Transformer in Spring Cloud Stream Binder Kafka 3.x (functional style) following examples from here.
I am defining a KeyValueStore as a @Bean with type StoreBuilder<KeyValueStore<String,Long>>:
@Bean
public StoreBuilder<KeyValueStore<String,Long>> myStore() {
    return Stores.keyValueStoreBuilder(
            Stores.persistentKeyValueStore("my-store"), Serdes.String(),
            Serdes.Long());
}
@Bean
@DependsOn({"myStore"})
public MyTransformer myTransformer() {
    return new MyTransformer("my-store");
}
In debugger I can see that the beans get initialised.
In my stream processor function then:
return myStream -> {
return myStream
.peek(..)
.transform(() -> myTransformer())
...
MyTransformer is declared as
public class MyTransformer implements Transformer<String, MyEvent, KeyValue<KeyValue<String,Long>, MyEvent>> {
...
@Override
public void init(final ProcessorContext context) {
this.context = context;
this.myStore = context.getStateStore(storeName);
}
Getting the following error when application context starts up from my unit test:
Caused by: org.apache.kafka.streams.errors.StreamsException: Processor KSTREAM-TRANSFORM-0000000002 has no access to StateStore my-store as the store is not connected to the processor. If you add stores manually via '.addStateStore()' make sure to connect the added store to the processor by providing the processor name to '.addStateStore()' or connect them via '.connectProcessorAndStateStores()'. DSL users need to provide the store name to '.process()', '.transform()', or '.transformValues()' to connect the store to the corresponding operator, or they can provide a StoreBuilder by implementing the stores() method on the Supplier itself. If you do not add stores manually, please file a bug report at https://issues.apache.org/jira/projects/KAFKA.
In the application startup logs when running my unit test, I can see that the store seems to get created:
2021-04-06 00:44:43.806 INFO [ main] .k.s.AbstractKafkaStreamsBinderProcessor : state store my-store added to topology
I'm already using pretty much every feature of the Spring Cloud Stream Binder Kafka in my app and from my unit test, everything works very well. Unexpectedly, I got stuck at adding the custom KeyValueStore to my Transformer. It would be great, if you could spot an error in my setup.
The versions I'm using right now:
org.springframework.boot:spring-boot:jar:2.4.4
org.springframework.kafka:spring-kafka:jar:2.6.7
org.springframework.kafka:spring-kafka-test:jar:2.6.7
org.springframework.cloud:spring-cloud-stream-binder-kafka-streams:jar:3.0.4.RELEASE
org.apache.kafka:kafka-streams:jar:2.7.0
I've just tried with
org.springframework.cloud:spring-cloud-stream-binder-kafka-streams:jar:3.1.3-SNAPSHOT
and the issue seems to persist.

In your processor function, when you call .transform(() -> myTransformer()), you also need to provide the state store names so that the store is connected to that transformer. The KStream API has overloaded transform methods that take state store names as a vararg. I suspect this is the issue you are running into. The name to pass is the store's own name (the one given to Stores.persistentKeyValueStore and looked up in init()), not the bean name, so you may want to change that call to .transform(() -> myTransformer(), "my-store").
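To see why the log can report the store as "added to topology" while the processor still cannot reach it, here is a minimal plain-Java sketch of the rule. It is a toy model only: ToyTopology and its methods are hypothetical names invented for illustration, not Kafka Streams classes, and they mimic nothing beyond the distinction between adding a store and connecting it to a processor.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model, NOT Kafka Streams code. It only mimics the
// difference between "store added" and "store connected to a processor".
class ToyTopology {
    private final Set<String> stores = new HashSet<>();
    private final Map<String, Set<String>> connections = new HashMap<>();

    // Corresponds to the store being added to the topology.
    public void addStateStore(String storeName) {
        stores.add(storeName);
    }

    // Corresponds to passing store names as trailing arguments of transform().
    public void addProcessor(String processorName, String... storeNames) {
        connections.put(processorName, new HashSet<>(Arrays.asList(storeNames)));
    }

    // Corresponds to the store lookup done inside the transformer's init().
    public String getStateStore(String processorName, String storeName) {
        Set<String> connected =
                connections.getOrDefault(processorName, Collections.<String>emptySet());
        if (!stores.contains(storeName) || !connected.contains(storeName)) {
            throw new IllegalStateException("Processor " + processorName
                    + " has no access to StateStore " + storeName);
        }
        return storeName;
    }

    public static void main(String[] args) {
        ToyTopology topology = new ToyTopology();
        topology.addStateStore("my-store");
        topology.addProcessor("transform-without-store");
        topology.addProcessor("transform-with-store", "my-store");

        // Connected processor: the lookup succeeds.
        System.out.println(topology.getStateStore("transform-with-store", "my-store"));

        // Store exists, but this processor was never connected to it.
        try {
            topology.getStateStore("transform-without-store", "my-store");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In the toy model, as in the question, the store existing is not enough: the lookup succeeds only for the processor that was registered with the store's name.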

Related

Error handling in Spring Cloud Kafka Streams

I'm using Spring Cloud Stream with Kafka Streams. Let's say I have a processor which is a Function which converts a KStream of Strings to a KStream of CityProgrammes. It invokes an API to find the City by name and an other transformation which finds any events near that city.
Now the problem is that if any error happens during the transformation, the whole application stops. I want to send that one particular message to a DLQ and move along. I've been reading for days and everyone suggests handling errors within the called services, but that is nonsense in my opinion. Plus, I still need to return a KStream: how do I do that within a catch?
I also looked at UncaughtExceptionHandler, but it is not aware of the message and is only able to restart the processing, which won't skip this invalid message.
This might sound like an A-B problem so the question rephrased: how do I maintain the flow in a KStream when an exception occurs and send the invalid item to the DLQ?
When it comes to the application-level errors you have, it is up to the application itself how the error is handled. Kafka Streams and the Spring Cloud Stream binder mainly support deserialization and serialization errors at the framework level. Although that is the case, I think your scenario can be handled. If you are using Kafka Client prior to 2.8, here is an SO answer I gave before on something similar: https://stackoverflow.com/a/66749750/2070861
If you are using Kafka/Streams 2.8, here is an idea that you can use. However, the code below should only be used as a starting point. Adjust it according to your use case. Read more on how branching works in Kafka Streams 2.8. The branching API is significantly refactored in 2.8 from the prior versions.
public Function<KStream<?, String>, KStream<?, Foo>> convert() {
    // One-element holder so the branch predicate can hand the converted value to map() below.
    Foo[] foo = new Foo[1];
    return input -> {
        final Map<String, ? extends KStream<?, String>> branches =
                input.split(Named.as("foo-")).branch((key, value) -> {
                    try {
                        foo[0] = new Foo(); // your API call for the CityProgramme conversion here, possibly.
                        return true;
                    }
                    catch (Exception e) {
                        // Conversion failed: send the original record to the DLT and drop it from this branch.
                        Message<?> message = MessageBuilder.withPayload(value).build();
                        streamBridge.send("to-my-dlt", message);
                        return false;
                    }
                }, Branched.as("bar"))
                .defaultBranch();
        // Branch names are the Named prefix plus the Branched name: "foo-" + "bar".
        final KStream<?, String> kStream = branches.get("foo-bar");
        return kStream.map((key, value) -> new KeyValue<>("", foo[0]));
    };
}
The default branch is ignored in this code because that only contains the records that threw exceptions. Those were handled by the catch statement above in which we send the records to a DLT programmatically. Finally, we get the good records and map them to a new KStream and send it through the outbound.
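The divert-and-continue idea itself can be sketched in plain Java without Kafka. This is an illustration under stated assumptions, not the answer's method: DeadLetterSketch, tryConvert, process and the deadLetters list are hypothetical names, and the list merely stands in for the dead-letter topic. Each record carries its own conversion result in an Optional, so nothing is shared between the step that converts and the step that forwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collectors;

// Plain-Java sketch (no Kafka): convert each record, divert failures to a
// dead-letter list, and keep processing the rest. All names are hypothetical.
class DeadLetterSketch {

    // Stand-in for the dead-letter topic.
    public final List<String> deadLetters = new ArrayList<>();

    // Wraps a conversion so that an exception diverts this one record
    // instead of stopping the whole pipeline.
    public <T> Optional<T> tryConvert(String value, Function<String, T> converter) {
        try {
            return Optional.of(converter.apply(value));
        } catch (RuntimeException e) {
            deadLetters.add(value); // a real application would publish to the DLT here
            return Optional.empty();
        }
    }

    // Keeps only the records that converted successfully.
    public <T> List<T> process(List<String> input, Function<String, T> converter) {
        return input.stream()
                .map(value -> tryConvert(value, converter))
                .filter(Optional::isPresent)
                .map(Optional::get)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        DeadLetterSketch sketch = new DeadLetterSketch();
        List<Integer> good =
                sketch.process(Arrays.asList("1", "x", "3"), v -> Integer.parseInt(v));
        System.out.println(good + " / " + sketch.deadLetters);
    }
}
```

Here the record "x" fails to parse and lands in the dead-letter list, while "1" and "3" continue through the pipeline.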

Migrate Spring cloud stream listener (kafka) from declarative to functional model

I'm trying to migrate an implementation of spring cloud streams (kafka) declarative way to the recommended functional model
In this blog post they say :
...a functional programming model in Spring Cloud Stream (SCSt). It’s
less code, less configuration. Most importantly, though, your code is
completely decoupled and independent from the internals of SCSt
My current implementation:
Declaring the MessageChannel
@Input(PRODUCT_INPUT_TOPIC)
MessageChannel productInputChannel();
Using @StreamListener, which is now deprecated
@StreamListener(StreamConfig.PRODUCT_INPUT_TOPIC)
public void addProduct(@Payload Product product, @Header Long header1, @Header String header2)
Here it is
@Bean
public Consumer<Product> addProduct() {
    return product -> {
        // your code
    };
}
I am not sure what is the value of PRODUCT_INPUT_TOPIC, but let's assume input.
So the s-c-stream will automatically create a binding for you with name addProduct-in-0. Here are the details. You can use it as is, but if you still want to use the custom name, you can use spring.cloud.stream.function.bindings.addProduct-in-0=input. - see more here.
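The naming convention behind addProduct-in-0 can be written out as a small helper. This is only a sketch of the convention (function name, then -in- or -out-, then a zero-based index); BindingNames and its methods are hypothetical names for illustration and are not part of Spring Cloud Stream.

```java
// Sketch of the default binding-name convention for functional beans.
// BindingNames is a hypothetical helper, not a Spring Cloud Stream class.
class BindingNames {

    // Input bindings: <functionName>-in-<index>
    public static String input(String functionName, int index) {
        return functionName + "-in-" + index;
    }

    // Output bindings: <functionName>-out-<index>
    public static String output(String functionName, int index) {
        return functionName + "-out-" + index;
    }

    // Property line that maps a generated binding name to a custom name.
    public static String renameProperty(String bindingName, String customName) {
        return "spring.cloud.stream.function.bindings." + bindingName + "=" + customName;
    }

    public static void main(String[] args) {
        String binding = input("addProduct", 0);
        System.out.println(binding);
        System.out.println(renameProperty(binding, "input"));
    }
}
```

A Consumer has only an input binding, a Function gets one of each, and functions with several inputs get increasing indices.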
If you need access to headers, you can just pass a Message as input argument
Here it is
@Bean
public Consumer<Message<Product>> addProduct() {
    return message -> {
        Product product = message.getPayload();
        // your code
    };
}

Creating a Kafka aggregator and joining it with an event

I am trying to create an aggregator wherein I listen for multiple records and consolidate them into one. After consolidation, I wait for a process event by joining a stream and aggregated application in listen() method. On arrival of the process event, some business logic is triggered. I have defined both aggregator and process listener in a single spring boot project.
@Bean
public Function<KStream<FormUUID, FormData>, KStream<UUID, Application>> process()
{
    return formEvent -> formEvent.groupByKey()
            .reduce((k, v) -> v)
            .toStream()
            .selectKey((k, v) -> k.getReferenceNo())
            .groupByKey()
            .aggregate(Application::new, (key, value, aggr) -> aggr.performAggregate(value),
                    Materialized.<UUID, Application, KeyValueStore<Bytes, byte[]>> as("appStore")
                            .withKeySerde(new JsonSerde<>(UUID.class))
                            .withValueSerde(new JsonSerde<>(Application.class)))
            .toStream();
}
@Bean
public BiConsumer<KStream<String, ProcessEvent>, KTable<String, Application>> listen()
{
    return (eventStream, appTable) ->
    {
        eventStream.join(appTable, (event, app) -> app)
                .foreach((k, app) -> app.createQuote());
    };
}
However, I am now facing a SerializationException. The first part (aggregation) works fine, but the join fails with the following exception:
java.lang.ClassCastException: com.xxxxx.datamapper.domain.FormData cannot be cast to com.xxxxx.datamapper.domain.Application
at org.apache.kafka.streams.kstream.internals.KStreamPeek$KStreamPeekProcessor.process(KStreamPeek.java:42) ~[kafka-streams-2.3.1.jar:?]
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:117) ~[kafka-streams-2.3.1.jar:?]
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:201) ~[kafka-streams-2.3.1.jar:?]
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:180) ~[kafka-streams-2.3.1.jar:?]
org.apache.kafka.streams.errors.ProcessorStateException: task [0_0] Failed to flush state store APPLICATION_TOPIC-STATE-STORE-0000000001
at org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:280) ~[kafka-streams-2.3.1.jar:?]
at org.apache.kafka.streams.processor.internals.AbstractTask.flushState(AbstractTask.java:204) ~[kafka-streams-2.3.1.jar:?]
at org.apache.kafka.streams.processor.internals.StreamTask.flushState(StreamTask.java:519) ~[kafka-streams-2.3.1.jar:?]
I think the problem is in my application.yml. Since the "spring.json.key.default.type" property is set to FormUUID, the same type is being used for the Application object in the listen method. I want to configure the types for the remaining classes (UUID, Application and ProcessEvent) in my application.yml, but I am not sure how to configure the type mapping for each consumer and producer defined.
spring.cloud:
  function.definition: process;listen
  stream:
    kafka.streams:
      bindings:
        process-in-0.consumer.application-id: form-aggregator
        listen-in-0.consumer.application-id: event-processor
        listen-in-1.consumer.application-id: event-processor
      binder.configuration:
        default.key.serde: org.springframework.kafka.support.serializer.JsonSerde
        default.value.serde: org.springframework.kafka.support.serializer.JsonSerde
        spring.json.key.default.type: com.xxxx.datamapper.domain.FormUUID
        spring.json.value.default.type: com.xxxx.datamapper.domain.FormData
        commit.interval.ms: 1000
    bindings:
      process-in-0.destination: FORM_DATA_TOPIC
      process-out-0.destination: APPLICATION_TOPIC
      listen-in-0.destination: PROCESS_TOPIC
      listen-in-1:
        destination: APPLICATION_TOPIC
        consumer:
          useNativeDecoding: true
If you are using the latest Horsham versions of the Spring Cloud Stream Kafka Streams binder, you do not need to set any explicit Serdes for inbound and outbound. However, you still need to provide them wherever the Kafka Streams API requires them, as in the case of your aggregate method call above. If you are facing this serialization error on the inbound of the second processor, I suggest trying to remove all Serdes from the configuration. You can simplify it as shown below (given that you are on the latest Horsham release). The binder will infer the correct Serdes to use on the inbound/outbound. One benefit of delegating this to the binder is that you don't need to provide any explicit key/value types through configuration, because the binder will introspect the types. Make sure the POJO types you are using are JSON friendly. See if that works. If you are still having issues, please create a small sample application where we can reproduce the issue and we will take a look.
spring.cloud:
  function.definition: process;listen
  stream:
    kafka.streams:
      bindings:
        process-in-0.consumer.application-id: form-aggregator
        listen-in-0.consumer.application-id: event-processor
        listen-in-1.consumer.application-id: event-processor
      binder.configuration:
        commit.interval.ms: 1000
    bindings:
      process-in-0.destination: FORM_DATA_TOPIC
      process-out-0.destination: APPLICATION_TOPIC
      listen-in-0.destination: PROCESS_TOPIC
      listen-in-1.destination: APPLICATION_TOPIC

How to inject KafkaTemplate in Quarkus

I'm trying to inject a KafkaTemplate to send a single message. I'm developing a small function that lies outside the reactive approach.
I can only find examples that use @Incoming and @Outgoing from SmallRye, but I don't need a KafkaStream.
I tried with Kafka-CDI but I'm unable to inject the SimpleKafkaProducer.
Any ideas?
For Clement's answer
It seems the right direction, but executing orders.send("hello"); I receive this error:
(vert.x-eventloop-thread-3) Unhandled exception:java.lang.IllegalStateException: Stream not yet connected
I'm consuming from my topic by command line, Kafka is up and running, if I produce manually I can see the consumed messages.
It seems related to this sentence in the docs:
To use an Emitter for the stream hello, you need a @Incoming("hello") somewhere in your code (or in your configuration).
I have this code in my class:
@Incoming("orders")
public CompletionStage<Void> consume(KafkaMessage<String, String> msg) {
    log.info("Received message (topic: {}, partition: {}) with key {}: {}", msg.getTopic(), msg.getPartition(), msg.getKey(), msg.getPayload());
    return msg.ack();
}
Maybe I've forgotten some configurations?
So, you just need to use an Emitter:
@Inject
@Stream("orders") // Emit on the channel 'orders'
Emitter<String> orders;
// ...
orders.send("hello");
And in your application.properties, declare:
## Orders topic (WRITE)
mp.messaging.outgoing.orders.type=io.smallrye.reactive.messaging.kafka.Kafka
mp.messaging.outgoing.orders.topic=orders
mp.messaging.outgoing.orders.bootstrap.servers=localhost:9092
mp.messaging.outgoing.orders.key.serializer=org.apache.kafka.common.serialization.StringSerializer
mp.messaging.outgoing.orders.value.serializer=org.apache.kafka.common.serialization.StringSerializer
mp.messaging.outgoing.orders.acks=1
To avoid the "Stream not yet connected" exception, as suggested by the docs:
To use an Emitter for the stream hello, you need a @Incoming("hello") somewhere in your code (or in your configuration).
Assuming you have something like this in your application.properties:
# Orders topic (READ)
smallrye.messaging.source.orders-r-topic.type=io.smallrye.reactive.messaging.kafka.Kafka
smallrye.messaging.source.orders-r-topic.topic=orders
smallrye.messaging.source.orders-r-topic.bootstrap.servers=0.0.0.0:9092
smallrye.messaging.source.orders-r-topic.key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
smallrye.messaging.source.orders-r-topic.value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
smallrye.messaging.source.orders-r-topic.group.id=my-group-id
Add something like this:
@Incoming("orders-r-topic")
public CompletionStage<Void> consume(KafkaMessage<String, String> msg) {
    log.info("Received message (topic: {}, partition: {}) with key {}: {}", msg.getTopic(), msg.getPartition(), msg.getKey(), msg.getPayload());
    return msg.ack();
}
Since Clement's answer, the @Stream annotation has been deprecated. The @Channel annotation must be used instead.
You can use an Emitter provided by the quarkus-smallrye-reactive-messaging-kafka dependency to produce messages to a Kafka topic.
A simple Kafka producer implementation:
public class MyKafkaProducer {
    @Inject
    @Channel("my-topic")
    Emitter<String> myEmitter;

    public void produce(String message) {
        myEmitter.send(message);
    }
}
And the following configuration must be added to the application.properties file:
mp.messaging.outgoing.my-topic.connector=smallrye-kafka
mp.messaging.outgoing.my-topic.bootstrap.servers=localhost:9092
mp.messaging.outgoing.my-topic.value.serializer=org.apache.kafka.common.serialization.StringSerializer
This will produce string-serialized messages to a Kafka topic named my-topic.
Note that by default the name of the channel is also the name of the Kafka topic to which the data will be produced. This behavior can be changed through the configuration. The supported configuration attributes are described in the Reactive Messaging documentation.

Any pointers for creating Kafka endpoint support?

I have a fairly immediate need to support Kafka integration testing using the Citrus Framework. I was thinking of taking the existing jms module as an example/framework and using Spring Kafka. Any pointers or gotchas that I should be aware of? I am willing, assuming I'm successful, to donate the module back to the project.
Here is an example of how you can use a Kafka Camel component with Citrus:
@Bean
public CamelContext camelKafkaAdapterContext() throws Exception {
    SpringCamelContext context = new SpringCamelContext();
    context.addRouteDefinition(new RouteDefinition()
            .from("kafka:localhost:9092?topic=test&zookeeperHost=localhost&zookeeperPort=2181&serializerClass=kafka.serializer.StringEncoder")
            .to("log:com.consol.citrus.camel?level=DEBUG")
            .to("seda:kafka-buffer"));
    return context;
}
@Bean
public CamelEndpoint kafkaEndpoint(CamelContext camelContext) {
    CamelEndpoint endpoint = new CamelEndpoint();
    endpoint.getEndpointConfiguration().setCamelContext(camelContext);
    endpoint.getEndpointConfiguration().setEndpointUri("seda:kafka-buffer");
    return endpoint;
}
You first define a Camel context, which will be started when you run any test with Citrus. Once it is instantiated, this Camel route reads from the configured topic and sends all messages into a buffer, seda:kafka-buffer (seda is used only as an example). You can then use a Citrus CamelEndpoint to read messages from that buffer inside any test.
receive(action -> action.endpoint(kafkaEndpoint)
        .messageType(MessageType.JSON)
        .payload(...));
Note that this is just an example I have assembled. I haven't tested this exact setup, but it should work once you configure the Camel context correctly.