Spring KafkaEmbedded - problem consuming messages

I have a problem using KafkaEmbedded from https://mvnrepository.com/artifact/org.springframework.kafka/spring-kafka-test/2.1.10.RELEASE
I'm using KafkaEmbedded to create a Kafka broker for testing producer/consumer pipelines. These producers/consumers are standard clients from kafka-clients; I'm not using Spring Kafka clients.
Everything works fine, but I have to use the consumeFromEmbeddedTopics() method from KafkaEmbedded to make the consumer work. If I don't use this method, the consumer does not get any messages.
There are two problems I have with this method: first, it needs the KafkaConsumer as a parameter (and I don't want to expose it in the class), and second, invoking it causes a ConcurrentModificationException when an object calls poll() via @Scheduled.
I'm already setting the auto.offset.reset property, so that's a different issue.
My question is: how do I correctly consume records from KafkaEmbedded without invoking these consumeFromEmbeddedTopics() methods?

There is nothing special about that method, it simply subscribes the consumer to the topic(s) and polls it.
There is no reason you can't do the same with your Consumer.
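For illustration, here is a rough sketch of doing that subscribe-and-poll yourself with a plain kafka-clients consumer pointed at the embedded broker. The embeddedKafka variable and the topic name are placeholders for your own test setup, and poll(long) is used because spring-kafka-test 2.1.x pairs with the 1.x clients (use poll(Duration) on 2.x clients):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();
// Point the consumer at the broker started by KafkaEmbedded
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, embeddedKafka.getBrokersAsString());
props.put(ConsumerConfig.GROUP_ID_CONFIG, "test-group");
// Read records that were produced before this consumer joined
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(Collections.singletonList("test-topic"));
    for (ConsumerRecord<String, String> record : consumer.poll(10_000L)) {
        // assert on the received records here
    }
}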

Related

How to disable a Spring Cloud Stream Kafka consumer

Here is my situation:
We have a Spring Cloud Stream 3 Kafka service connected to multiple topics on the same broker, but I want to control whether it connects to a specific topic based on properties.
Every topic has its own binder and binding, but the broker is the same for all.
I tried disabling the binding (the only solution I have found so far) with the property below. That stops the StreamListener from receiving messages, but the connection to the topic and the rebalancing still happen.
spring:
  cloud:
    stream:
      bindings:
        ...
        anotherBinding:
          consumer:
            ...
            autoStartup: false
I wonder if there is any setting at the binder level that prevents it from starting. The consumer for one of the topics should only be active in one of the environments.
Thanks
Disabling the bindings by setting autoStartup to false should work; I am not sure what the issue is.
It doesn't look like you are using the new functional model, but rather the StreamListener. If you are using the functional model, here is another thing you can try: you can disable the bindings by not including the corresponding functions at runtime. For example, assume you have the following two consumers.
@Bean
public Consumer<String> one() { return msg -> { /* handle topic one */ }; }

@Bean
public Consumer<String> two() { return msg -> { /* handle topic two */ }; }
When running this app, you can provide the property spring.cloud.function.definition to include/exclude functions. For instance, when you run it with spring.cloud.function.definition=one, then the consumer two will not be activated at all. When running with spring.cloud.function.definition=two, then the consumer one will not be activated.
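For example, in application.yml this might look like the following (the binding and topic names here are only illustrative):

spring:
  cloud:
    function:
      definition: one
    stream:
      bindings:
        one-in-0:
          destination: topic-one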
The downside to this approach is that if you decide to start the other function once the app has started (given that autoStartup is false on it), that will not work, because it was not part of the original bindings created through spring.cloud.function.definition. However, based on your requirements, this is probably not an issue, since you know which environments are targeted for the corresponding topics. In other words, if you know that consumer one always needs to consume from topic one, then you simply don't include consumer two in the definition.

Fully Transactional Spring Kafka Consumer/Listener

Currently, I have a Kafka Listener configured with a ConcurrentKafkaListenerContainerFactory and a SeekToCurrentErrorHandler (with a DeadLetterPublishingRecoverer configured with 1 retry).
My Listener method is annotated with @Transactional (and so are all the methods in my services that interact with the DB).
My Listener method does the following:
1. Receive the message from Kafka
2. Interact with several services that save different parts of the received data to the DB
3. Ack the message in Kafka (i.e., commit the offset)
If it fails somewhere in the middle, it should roll back and retry until the max retries are reached, and then send the message to the DLT.
I'm trying to make this method fully transactional, i.e., if something fails all previous changes are rolled back.
However, the @Transactional annotation on the Listener method is not enough.
How can I achieve this?
What configurations should I employ to make the Listener method fully transactional?
If you are not also publishing to Kafka from the listener, there is no need for (or benefit to) using Kafka transactions; it is just overhead. The STCEH + DLPR is enough.
If you are also publishing to Kafka (and want those to be rolled back too), then see the documentation - configure a KafkaTransactionManager in the listener container.
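As a rough sketch of that wiring, assuming Spring Kafka 2.3-2.7 era APIs (SeekToCurrentErrorHandler was later replaced by DefaultErrorHandler) and with illustrative bean names and generics; imports come from org.springframework.kafka.* and org.springframework.util.backoff:

@Bean
public KafkaTransactionManager<Object, Object> kafkaTransactionManager(ProducerFactory<Object, Object> pf) {
    // The producer factory must have a transaction id prefix configured for this to work
    return new KafkaTransactionManager<>(pf);
}

@Bean
public ConcurrentKafkaListenerContainerFactory<Object, Object> kafkaListenerContainerFactory(
        ConsumerFactory<Object, Object> cf,
        KafkaTemplate<Object, Object> template,
        KafkaTransactionManager<Object, Object> ktm) {
    ConcurrentKafkaListenerContainerFactory<Object, Object> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(cf);
    // Retry the delivery once, then publish the failed record to the <topic>.DLT dead-letter topic
    factory.setErrorHandler(new SeekToCurrentErrorHandler(
            new DeadLetterPublishingRecoverer(template), new FixedBackOff(0L, 1L)));
    // Only needed when the listener also publishes to Kafka and those sends must roll back too
    factory.getContainerProperties().setTransactionManager(ktm);
    return factory;
}

Note that this gives best-effort coordination between the database transaction and the Kafka transaction, not a true XA/two-phase commit.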

Is it better to keep a Kafka Producer open or to create a new one for each message?

I have data coming in through RabbitMQ. The data is coming in constantly, multiple messages per second.
I need to forward that data to Kafka.
In my RabbitMQ delivery callback, where I receive the data from RabbitMQ, I have a Kafka producer that immediately sends the received messages to Kafka.
My question is very simple. Is it better to create a Kafka producer outside of the callback method and use that one producer for all messages or should I create the producer inside the callback method and close it after the message is sent, which means that I am creating a new producer for each message?
It might be a naive question but I am new to Kafka and so far I did not find a definitive answer on the internet.
EDIT : I am using a Java Kafka client.
Creating a Kafka producer is an expensive operation, so reusing a single producer instance is good practice for both performance and resource utilization.
For Java clients, this is from the docs:
The producer is thread safe and should generally be shared among all threads for best performance.
For librdkafka-based clients (confluent-dotnet, confluent-python, etc.), there is a related issue with this quote:
Yes, creating a singleton service like that is a good pattern. you definitely should not create a producer each time you want to produce a message - it is approximately 500,000 times less efficient.
The Kafka producer is stateful: it holds metadata (periodically synced from the brokers), a send buffer, etc., so creating a producer for each message is impractical.
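To make the singleton approach concrete, here is a minimal sketch with the Java client; the class, method, and topic names are made up for the example:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class RabbitToKafkaForwarder {

    // One producer for the whole application; KafkaProducer is thread safe
    private final KafkaProducer<String, String> producer;

    public RabbitToKafkaForwarder(String bootstrapServers) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        this.producer = new KafkaProducer<>(props);
    }

    // Called from the RabbitMQ delivery callback; reuses the shared producer for every message
    public void onRabbitMessage(String payload) {
        producer.send(new ProducerRecord<>("target-topic", payload));
    }

    // Close once, on application shutdown, to flush any buffered records
    public void shutdown() {
        producer.close();
    }
}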

How do I stop attempting to consume messages off of Kafka when at the end of the log?

I have a Kafka consumer that I create on a schedule. It attempts to consume all of the new messages that have been added since the last commit was made.
I would like to shut the consumer down once it consumes all of the new messages in the log instead of waiting indefinitely for new messages to come in.
I'm having trouble finding a solution via Kafka's documentation.
I see a number of timeout related properties available in the Confluent.Kafka.ConsumerConfig and ClientConfig classes, including FetchWaitMaxMs, but am unable to decipher which to use. I'm using the .NET client.
Any advice would be appreciated.
I have found a solution. Version 1.0.0-beta2 of Confluent's .NET Kafka library provides a method called .Consume(TimeSpan timeSpan). This will return null if there are no new messages to consume or if we're at the partition EOF. I was previously using the .Consume(CancellationToken cancellationToken) overload which was blocking and preventing me from shutting down the consumer. More here: https://github.com/confluentinc/confluent-kafka-dotnet/issues/614#issuecomment-433848857
Another option was to upgrade to version 1.0.0-beta3 which provides a boolean flag on the ConsumeResult object called IsPartitionEOF. This is what I was initially looking for - a way to know when I've reached the end of the partition.
I have never used the .NET client, but assuming it cannot be all that different from the Java client, the poll() method should accept a timeout value in milliseconds, so setting that to 5000 should work in most cases. No need to fiddle with config classes.
Another approach is to find the maximum offset at the time that your consumer is created, and only read up until that offset. This would theoretically prevent your consumer from running indefinitely if, by any chance, it is not consuming as fast as producers produce. But I have never tried that approach.
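Since the answer above reasons in terms of the Java client, here is a rough sketch of that second idea - read only up to the end offsets captured after assignment - in Java (the same pattern translates to the .NET client; topic and method names are illustrative):

import java.time.Duration;
import java.util.Collections;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class DrainToEndOffsets {

    // Consume until every assigned partition has reached the end offset captured after assignment
    static void drain(KafkaConsumer<String, String> consumer) {
        consumer.subscribe(Collections.singletonList("my-topic"));

        // Poll until the group coordinator hands out partitions; records returned here are processed as well
        while (consumer.assignment().isEmpty()) {
            consumer.poll(Duration.ofMillis(200)).forEach(DrainToEndOffsets::process);
        }

        // Snapshot of "the end of the log" at this moment; records produced after this point are ignored
        Map<TopicPartition, Long> endOffsets = consumer.endOffsets(consumer.assignment());

        while (endOffsets.entrySet().stream()
                .anyMatch(e -> consumer.position(e.getKey()) < e.getValue())) {
            consumer.poll(Duration.ofSeconds(1)).forEach(DrainToEndOffsets::process);
        }
        consumer.commitSync();
    }

    static void process(ConsumerRecord<String, String> record) {
        System.out.printf("%s-%d@%d: %s%n", record.topic(), record.partition(), record.offset(), record.value());
    }
}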

Kafka stream api - how to test processing with embedded kafka

I'd like to test my processor with embedded kafka. Is it even possible?
When I run the app locally with Kafka & ZK it works perfectly - my example listener receives the message and so does the processor (great, both listen to the same topic) - but when I test it with embedded Kafka, only the method annotated with @KafkaListener gets the message and the processor doesn't receive anything.
I would like to send message to the processor's topic, then check if it sent the result to the other topic.
Is there any solution for such a use case?
It's recommended to test your code using TopologyTestDriver: https://kafka.apache.org/11/documentation/streams/developer-guide/testing.html
You can also use KafkaEmbedded, or maybe better EmbeddedKafkaCluster. For examples, check out the Kafka Streams integration tests: https://github.com/apache/kafka/tree/trunk/streams/src/test/java/org/apache/kafka/streams/integration
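For the TopologyTestDriver route, a minimal sketch might look like the following, assuming Kafka 2.4+ where TestInputTopic/TestOutputTopic are available (the linked 1.1 docs use ConsumerRecordFactory instead); the upper-casing topology simply stands in for your own processor:

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.TestInputTopic;
import org.apache.kafka.streams.TestOutputTopic;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.TopologyTestDriver;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class ProcessorTest {

    public void processorForwardsToOutputTopic() {
        // Build the topology under test; this trivial upper-casing processor stands in for yours
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic", Consumed.with(Serdes.String(), Serdes.String()))
               .mapValues(v -> v.toUpperCase())
               .to("output-topic", Produced.with(Serdes.String(), Serdes.String()));
        Topology topology = builder.build();

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "test");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234"); // never contacted by the test driver

        try (TopologyTestDriver driver = new TopologyTestDriver(topology, props)) {
            TestInputTopic<String, String> in =
                    driver.createInputTopic("input-topic", new StringSerializer(), new StringSerializer());
            TestOutputTopic<String, String> out =
                    driver.createOutputTopic("output-topic", new StringDeserializer(), new StringDeserializer());

            // Send a record to the processor's input topic and check what lands on the output topic
            in.pipeInput("key", "hello");
            String value = out.readValue(); // "HELLO" - replace with your test framework's assertion
            System.out.println("processor produced: " + value);
        }
    }
}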