Handling Kafka rebalances with Spring configurations - apache-kafka

We are using spring-kafka (1.3.2.RELEASE) in our application.
Right now we are using auto-commit=true in our configurations.
We faced some problems because of this, such as the same offset being read multiple times, so we are now planning to do manual commits and possibly save the read offsets in an external repository.
We need to handle Kafka rebalances as well.
I have read the documentation; in plain Java the rebalance listener is configured using ContainerProperties.setConsumerRebalanceListener(rebalanceListener);
https://docs.spring.io/spring-kafka/reference/htmlsingle/#_very_very_quick
I am searching for a way to configure rebalance listeners using Spring Java configuration, but I am unable to find one.
Kindly let me know.
Thanks

If I understand you correctly, you want to have something like this:
@Bean
ContainerProperties containerProperties() {
    ContainerProperties containerProperties = new ContainerProperties(SOME_TOPIC);
    containerProperties.setConsumerRebalanceListener(myConsumerRebalanceListener());
    // Other properties set
    return containerProperties;
}

@Bean
ConsumerRebalanceListener myConsumerRebalanceListener() {
    return new ConsumerRebalanceListener() {

        @Override
        public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        }

        @Override
        public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        }

    };
}
You can use that containerProperties bean to build a KafkaMessageListenerContainer instance, or you can set myConsumerRebalanceListener on the properties returned by AbstractKafkaListenerContainerFactory.getContainerProperties().
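For example, a minimal sketch of wiring it into a container (assuming a ConsumerFactory bean is already defined elsewhere):
@Bean
KafkaMessageListenerContainer<String, String> messageListenerContainer(ConsumerFactory<String, String> consumerFactory) {
    // The container picks up myConsumerRebalanceListener through the containerProperties() bean above.
    return new KafkaMessageListenerContainer<>(consumerFactory, containerProperties());
}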

Related

Kafka state-store sharing across sub-topologies

I am trying to create a custom joining consumer to join multiple events.
I have created a topology which has four sub-topologies (subtopology-0, subtopology-1, subtopology-2, subtopology-3), not necessarily in the exact order described by topology.describe().
I have created a state store in three of the sub-topologies (subtopology-0, subtopology-1, subtopology-2) and I am trying to attach all of the created state stores using .connectProcessorAndStateStores("PROCESS2", "COUNTS"), as per the Kafka developer guide https://kafka.apache.org/0110/documentation/streams/developer-guide
Here is a code snippet of how I am creating and attaching processors to the topology.
class StreamCustomizer implements KafkaStreamsInfrastructureCustomizer {

    public void someMethod(StreamsBuilder builder) {
        Topology topology = builder.build();
        topology.addProcessor("Processor1", new Processor() {...}, "state-store-1")
                .addStateStore(store1, ...);
        topology.addProcessor("Processor2", new Processor() {...}, "state-store-1")
                .addStateStore(store1, ...);
        topology.addProcessor("Processor3", new Processor() {...}, "state-store-1")
                .addStateStore(store1, ...);
        topology.addProcessor("Processor4", new Processor4() {...}, "Processor1", "Processor2", "Processor3")
                .connectProcessorAndStateStores("Processor4", "state-store-1", "state-store-2", "state-store-3");
    }
}
This is how the processor is defined for all of the sub-topologies described above:
new Processor() {
    private ProcessorContext context;
    private KeyValueStore<K, V> store;

    public void init(ProcessorContext context) {
        this.context = context;
        store = (KeyValueStore<K, V>) context.getStateStore("store-name");
    }
}
This is how Processor4 is written, with all the state stores retrieved in the init method from the context:
new Processor4() {
    private KeyValueStore<K, V> store1;
    private KeyValueStore<K, V> store2;
    private KeyValueStore<K, V> store3;
}
I am observing strange behaviour with the above code: store1, store2, and store3 are all re-initialized, and none of the keys stored in their respective sub-topologies (1, 2, 3) are preserved. However, the same code works, i.e. all state stores preserve the key-values stored in their respective sub-topologies, when the state stores are declared at class level:
class StreamCustomizer implements KafkaStreamsInfrastructureCustomizer {
    private KeyValueStore<K, V> store1;
    private KeyValueStore<K, V> store2;
    private KeyValueStore<K, V> store3;
}
and then, in the processor implementation, the state store is simply initialized in the init method:
new Processor() {
    private ProcessorContext context;

    public void init(ProcessorContext context) {
        this.context = context;
        store1 = (KeyValueStore<K, V>) context.getStateStore("store-name-1");
    }
}
Can someone please assist in finding the reason, or point out if there is anything wrong in this topology? Also, I have read that state stores can be shared within the same sub-topology.
Hard to say (the code snippets are not really clear); however, if you share state you effectively merge sub-topologies. Thus, if you do it correctly, you end up with a single sub-topology containing all your processors.
As long as you see four sub-topologies, the state stores are not shared yet, i.e. they are not connected correctly.
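For illustration only (all names below are made up), registering multiple processors as users of the same store is what merges their sub-topologies:
// Sketch: both processors are declared as users of the same store builder,
// so Kafka Streams places them in a single sub-topology.
topology.addSource("source", "input-topic")
        .addProcessor("Processor1", MyProcessor1::new, "source")
        .addProcessor("Processor2", MyProcessor2::new, "source")
        .addStateStore(storeBuilder, "Processor1", "Processor2");
// topology.describe() should now report one sub-topology containing both processors.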

Quarkus Kafka - Batch/Bulk message consumer

I want to batch process. In my use case, the Kafka producer sends messages one by one, and I want to read them as a list in the consumer application. I can do that with the Spring Kafka library (Spring Kafka batch listener).
Is there any way to do this with the quarkus-smallrye-reactive-messaging-kafka library?
I tried the example below but got an error.
ERROR [io.sma.rea.mes.provider] (vert.x-eventloop-thread-3) SRMSG00200: The method org.MyConsumer#aggregate has thrown an exception: java.lang.ClassCastException: class org.TestConsumer cannot be cast to class io.smallrye.mutiny.Multi (org.TestConsumer is in unnamed module of loader io.quarkus.bootstrap.classloading.QuarkusClassLoader #6f2c0754; io.smallrye.mutiny.Multi is in unnamed module of loader io.quarkus.bootstrap.classloading.QuarkusClassLoader #4c1638b)
application.properties:
kafka.bootstrap.servers=hosts
mp.messaging.connector.smallrye-kafka.group.id=KafkaQuick
mp.messaging.connector.smallrye-kafka.auto.offset.reset=earliest
mp.messaging.incoming.test-consumer.connector=smallrye-kafka
mp.messaging.incoming.test-consumer.value.deserializer=org.TestConsumerDeserializer
TestConsumerDeserializer:
public class TestConsumerDeserializer extends JsonbDeserializer<TestConsumer> {
    public TestConsumerDeserializer() {
        // pass the class to the parent.
        super(TestConsumer.class);
    }
}
MyConsumer:
@ApplicationScoped
public class MyConsumer {

    @Incoming("test-consumer")
    //@Outgoing("aggregated-channel")
    public void aggregate(Multi<Message<TestConsumer>> in) {
        System.out.println(in);
    }
}
Batch support has been added to the Quarkus Kafka connector.
See https://quarkus.io/guides/kafka#receiving-kafka-records-in-batches.
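Roughly, per that guide, you enable batch mode on the channel and receive a list of payloads (a sketch; the channel and type names follow the question):
mp.messaging.incoming.test-consumer.batch=true
@Incoming("test-consumer")
public void consume(List<TestConsumer> records) {
    // records holds the payloads of one polled batch
    System.out.println("batch size: " + records.size());
}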
I don't understand the reason for the ClassCastException in the question.
But I found solutions for reading bulk/batch messages using quarkus-smallrye-reactive-messaging-kafka.
Solution 1:
@Incoming("test-consumer-topic")
@Outgoing("aggregated-channel")
public Multi<List<TestConsumer>> aggregate(Multi<TestConsumer> in) {
    return in.groupItems().intoLists().every(Duration.ofSeconds(5));
}

@Incoming("aggregated-channel")
public void test(List<TestConsumer> test) {
    System.out.println("size: " + test.size());
}
Solution 2:
@Incoming("test-consumer-topic")
@Outgoing("events-persisted")
public Multi<Message<TestConsumer>> processPayloadStream(Multi<Message<TestConsumer>> messages) {
    return messages
            .groupItems().intoLists().of(4)
            .emitOn(Infrastructure.getDefaultWorkerPool())
            .flatMap(messages1 -> {
                persist(messages1);
                return Multi.createFrom().items(messages1.stream());
            }).emitOn(Infrastructure.getDefaultExecutor());
}

public void persist(List<Message<TestConsumer>> messages) {
    System.out.println("messages size: " + messages.size());
}

@Incoming("events-persisted")
public CompletionStage<Void> messageAcknowledging(Message<TestConsumer> message) {
    return message.ack();
}
Note: this uses the application.properties config from the question.
references:
Support subscribing with Multi<Message<>>...
Get Bulk polled message from kafka

Spring Cloud Stream Kafka Commit Failed since the group is rebalanced

I am getting a CommitFailedException in some time-consuming Spring Cloud Stream applications. I know that to fix this issue I need to set max.poll.records and max.poll.interval.ms to match my expectations for the time it takes to process a batch. However, I am not quite sure how to set them for consumers in Spring Cloud Stream.
Exception:
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records. at
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:808) at
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:691) at
org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1416) at
org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1377) at
org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.commitIfNecessary(KafkaMessageListenerContainer.java:1554) at
org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.processCommits(KafkaMessageListenerContainer.java:1418) at
org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.pollAndInvoke(KafkaMessageListenerContainer.java:739) at
org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:700) at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at
java.lang.Thread.run(Thread.java:748)
Moreover, how can I ensure this situation won't happen at all? Or, alternatively, how can I inject some sort of rollback in the case of this exception? The reason is that I am doing some other external work, and once it is finished I publish the output message accordingly. Therefore, if the message cannot be published due to an issue after the work was done on the external system, I have to revert it (some sort of atomic transaction over the Kafka publish and the other external systems).
You can set arbitrary Kafka properties either at the binder level (documentation here):
spring.cloud.stream.kafka.binder.consumerProperties
Key/Value map of arbitrary Kafka client consumer properties. In addition to support known Kafka consumer properties, unknown consumer properties are allowed here as well. Properties here supersede any properties set in boot and in the configuration property above.
Default: Empty map.
e.g. spring.cloud.stream.kafka.binder.consumerProperties.max.poll.records=10
Or at the binding level (documentation here):
spring.cloud.stream.kafka.bindings.<channelName>.consumer.configuration
Map with a key/value pair containing generic Kafka consumer properties. In addition to having Kafka consumer properties, other configuration properties can be passed here. For example some properties needed by the application such as spring.cloud.stream.kafka.bindings.input.consumer.configuration.foo=bar.
Default: Empty map.
e.g. spring.cloud.stream.kafka.bindings.input.consumer.configuration.max.poll.records=10
You can get notified of commit failures by adding an OffsetCommitCallback to the listener container's ContainerProperties and setting syncCommits to false. To customize the container and its properties, add a ListenerContainerCustomizer bean to the application.
EDIT
Async commit callback...
@SpringBootApplication
@EnableBinding(Sink.class)
public class So57970152Application {

    public static void main(String[] args) {
        SpringApplication.run(So57970152Application.class, args);
    }

    @Bean
    public ListenerContainerCustomizer<AbstractMessageListenerContainer<byte[], byte[]>> customizer() {
        return (container, dest, group) -> {
            container.getContainerProperties().setAckMode(AckMode.RECORD);
            container.getContainerProperties().setSyncCommits(false);
            container.getContainerProperties().setCommitCallback((map, ex) -> {
                if (ex == null) {
                    System.out.println("Successful commit for " + map);
                }
                else {
                    System.out.println("Commit failed for " + map + ": " + ex.getMessage());
                }
            });
            container.getContainerProperties().setClientId("so57970152");
        };
    }

    @StreamListener(Sink.INPUT)
    public void listen(String in) {
        System.out.println(in);
    }

    @Bean
    public ApplicationRunner runner(KafkaTemplate<byte[], byte[]> template) {
        return args -> {
            template.send("input", "foo".getBytes());
        };
    }

}
Manual commits (sync)...
@SpringBootApplication
@EnableBinding(Sink.class)
public class So57970152Application {

    public static void main(String[] args) {
        SpringApplication.run(So57970152Application.class, args);
    }

    @Bean
    public ListenerContainerCustomizer<AbstractMessageListenerContainer<byte[], byte[]>> customizer() {
        return (container, dest, group) -> {
            container.getContainerProperties().setAckMode(AckMode.MANUAL_IMMEDIATE);
            container.getContainerProperties().setClientId("so57970152");
        };
    }

    @StreamListener(Sink.INPUT)
    public void listen(String in, @Header(KafkaHeaders.ACKNOWLEDGMENT) Acknowledgment ack) {
        System.out.println(in);
        try {
            ack.acknowledge(); // MUST USE MANUAL_IMMEDIATE for this to work.
            System.out.println("Commit successful");
        }
        catch (Exception e) {
            System.out.println("Commit failed " + e.getMessage());
        }
    }

    @Bean
    public ApplicationRunner runner(KafkaTemplate<byte[], byte[]> template) {
        return args -> {
            template.send("input", "foo".getBytes());
        };
    }

}
Set your heartbeat interval to less than one third of your session timeout. If the broker cannot determine whether your consumer is alive, it will initiate a partition rebalance among the remaining consumers. The heartbeat thread informs the broker that the consumer is alive even when the application is taking a bit longer to process. Change these in your consumer configs:
heartbeat.interval.ms
session.timeout.ms
If that does not work, try increasing the session timeout. You have to fiddle around with these values.
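In Spring Cloud Stream these can be passed through the binder-level property style shown in the other answer; a sketch with purely illustrative values:
spring.cloud.stream.kafka.binder.consumerProperties.heartbeat.interval.ms=3000
spring.cloud.stream.kafka.binder.consumerProperties.session.timeout.ms=10000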

Spring Kafka - access offsetsForTimes to start consuming from specific offset

I have a fairly straightforward Kafka consumer:
MessageListener<String, T> messageListener = record -> {
    doStuff(record.value());
};

startConsumer(messageListener);

protected void startConsumer(MessageListener<String, T> messageListener) {
    ConcurrentMessageListenerContainer<String, T> container = new ConcurrentMessageListenerContainer<>(
            consumerFactory(this.brokerAddress, this.groupId),
            containerProperties(this.topic, messageListener));
    container.start();
}
I can consume messages without any issue.
Now, I have the requirement to seek from a specific offset based on the result of a call to offsetsForTimes on the Kafka Consumer.
I understand that I can seek to a certain position using the ConsumerSeekAware interface:
@Override
public void onPartitionsAssigned(Map<TopicPartition, Long> assignments,
        ConsumerSeekCallback callback) {
    assignments.forEach((t, o) -> callback.seek(t.topic(), t.partition(), ?????));
}
The problem now, is that I do not have access to the Kafka Consumer inside the callback, therefore I have no way to call offsetsForTimes.
Is there any other way to achieve this?
Use a ConsumerAwareRebalanceListener to do the initial seeks (introduced in 2.0).
The current version is 2.2.0.
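A rough sketch of the idea, set on the container's ContainerProperties (the timestamp variable here is only illustrative):
containerProperties.setConsumerRebalanceListener(new ConsumerAwareRebalanceListener() {

    @Override
    public void onPartitionsAssigned(Consumer<?, ?> consumer, Collection<TopicPartition> partitions) {
        // Ask the consumer for the offsets at a given timestamp and seek each assigned partition there.
        Map<TopicPartition, Long> query = new HashMap<>();
        partitions.forEach(tp -> query.put(tp, someTimestampMs)); // someTimestampMs is illustrative
        consumer.offsetsForTimes(query).forEach((tp, offsetAndTimestamp) -> {
            if (offsetAndTimestamp != null) {
                consumer.seek(tp, offsetAndTimestamp.offset());
            }
        });
    }
});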

How can I create a state store that is restorable from an existing changelog topic?

I am using the streams DSL to deduplicate a topic called users:
topology.addStateStore(Stores.keyValueStoreBuilder(Stores.persistentKeyValueStore("users"), byteStringSerde, userSerde));
KStream<ByteString, User> users = topology.stream("users", Consumed.with(byteStringSerde, userSerde));

users.transform(() -> new Transformer<ByteString, User, KeyValue<ByteString, User>>() {
    private KeyValueStore<ByteString, User> store;

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        store = (KeyValueStore<ByteString, User>) context.getStateStore("users");
    }

    @Override
    public KeyValue<ByteString, User> transform(ByteString key, User value) {
        User user = store.get(key);
        if (user != null) {
            store.put(key, value);
            return new KeyValue<>(key, value);
        }
        return null;
    }

    @Override
    public KeyValue<ByteString, User> punctuate(long timestamp) {
        return null;
    }

    @Override
    public void close() {
    }
}, "users");
Given this code, Kafka Streams creates an internal changelog topic for the users store. I am wondering, is there some way I can use the existing users topic instead of creating an essentially identical changelog topic?
PS. I see that StreamsBuilder says this is possible:
However, no internal changelog topic is created since the original input topic can be used for recovery
But following the code to InternalStreamsBuilder#table() and InternalStreamsBuilder#createKTable(), I am not seeing how it's achieving this effect.
Not all things the DSL does are possible at the Processor API level -- it uses some internals that are not part of the public API to achieve what you describe.
It's the call to InternalTopologyBuilder#connectSourceStoreAndTopic() that does the trick (cf. InternalStreamsBuilder#table()).
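For comparison, the DSL route that yields this behavior looks roughly like the following (a sketch; the store name passed to Materialized.as is an assumption):
// builder is a StreamsBuilder; the "users" topic itself is used to restore the store,
// so no internal changelog topic is created.
KTable<ByteString, User> usersTable = builder.table(
        "users",
        Consumed.with(byteStringSerde, userSerde),
        Materialized.<ByteString, User, KeyValueStore<Bytes, byte[]>>as("users-store"));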
For your de-duplication use case, it seems that you need two topics though (depending on what de-duplication logic you apply). Restoring via a changelog topic does key-based updates and thus does not consider values (which might be part of your de-duplication logic, too).