I have a fairly straightforward Kafka consumer:
MessageListener<String, T> messageListener = record -> {
    doStuff(record.value());
};
startConsumer(messageListener);
protected void startConsumer(MessageListener<String, T> messageListener) {
    ConcurrentMessageListenerContainer<String, T> container = new ConcurrentMessageListenerContainer<>(
            consumerFactory(this.brokerAddress, this.groupId),
            containerProperties(this.topic, messageListener));
    container.start();
}
I can consume messages without any issue.
Now, I have the requirement to seek from a specific offset based on the result of a call to offsetsForTimes on the Kafka Consumer.
I understand that I can seek to a certain position using the ConsumerSeekAware interface:
@Override
public void onPartitionsAssigned(Map<TopicPartition, Long> assignments,
                                 ConsumerSeekCallback callback) {
    assignments.forEach((t, o) -> callback.seek(t.topic(), t.partition(), ?????));
}
The problem now, is that I do not have access to the Kafka Consumer inside the callback, therefore I have no way to call offsetsForTimes.
Is there any other way to achieve this?
Use a ConsumerAwareRebalanceListener to do the initial seeks (introduced in 2.0).
The current version is 2.2.0.
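Something like this, for example (a minimal sketch; the startTimestamp value is a placeholder, and it assumes the ContainerProperties built in your startConsumer() is available here as containerProperties):
containerProperties.setConsumerRebalanceListener(new ConsumerAwareRebalanceListener() {

    @Override
    public void onPartitionsAssigned(Consumer<?, ?> consumer, Collection<TopicPartition> partitions) {
        // ask the broker which offsets correspond to the desired timestamp
        // (startTimestamp is a placeholder for the epoch-millis value you want to seek from)
        Map<TopicPartition, Long> query = partitions.stream()
                .collect(Collectors.toMap(tp -> tp, tp -> startTimestamp));
        consumer.offsetsForTimes(query).forEach((tp, offsetAndTimestamp) -> {
            if (offsetAndTimestamp != null) {   // null when no record exists at/after that timestamp
                consumer.seek(tp, offsetAndTimestamp.offset());
            }
        });
    }
});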
Scenario:
We are using the Kafka Processor API (not the DSL) for reading records from a source topic; the stream
processor will write records to one or more target topics.
We know exactly-once can be enabled for the entire processor by using:
props.put("isolation.level", "read_committed");
But we want to decide, based on the incoming record's key, whether we want exactly-once or at-least-once semantics.
import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;

public class StreamRouterProcessor implements Processor<String, Object> {

    private ProcessorContext context;

    @Override
    public void init(ProcessorContext context) {
        this.context = context;
    }

    @Override
    public void process(String eventName, Object eventMessage) { // this is called for each record
    }

    @Override
    public void close() {
    }
}
Is there a way to select exactly-once or at-least-once on the fly for each record
being processed (perhaps per record handled by the process() method above)?
To enable exactly-once semantics you should use the StreamsConfig.PROCESSING_GUARANTEE_CONFIG property. ConsumerConfig.ISOLATION_LEVEL_CONFIG (isolation.level) is a consumer config and should only be used with a raw Consumer.
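For example (a minimal sketch; the application id and bootstrap servers are placeholders):
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "stream-router-app");     // placeholder
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");     // placeholder
// exactly-once for the whole topology; it applies to every record, not per message
props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);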
It is not possible to choose processing guarantees (exactly-once or at-least-once) at the message level.
I'm using Reactor Kafka to both consume and produce Kafka events. When consuming events my consumer is slow, and therefore I need to handle backpressure.
However, I experience that no matter what I call Subscription.request() with, the publisher publishes all events from the topic immediately, overwhelming the consumer.
To do this I use a custom Subscriber that requests a small initial number of items by calling Subscription.request() when I subscribe to KafkaReceiver.receive(). To my understanding this is how I tell the publisher how many events my consumer initially wants.
My subscriber:
public class KafkaEventSubscriber extends BaseSubscriber<EnrichedMetadata> {

    private final int numberOfItemsToRequestOnSubscribe;
    private final int numberOfItemsToRequestOnNext;

    public KafkaEventSubscriber(int numberOfItemsToRequestOnSubscribe,
                                int numberOfItemsToRequestOnNext) {
        this.numberOfItemsToRequestOnSubscribe = numberOfItemsToRequestOnSubscribe;
        this.numberOfItemsToRequestOnNext = numberOfItemsToRequestOnNext;
    }

    @Override
    protected void hookOnSubscribe(Subscription subscription) {
        subscription.request(numberOfItemsToRequestOnSubscribe);
    }

    @Override
    protected void hookOnNext(EnrichedMetadata value) {
        request(numberOfItemsToRequestOnNext);
    }
}
How I use the subscriber:
kafkaReceiver.receive().map(ReceiverRecord::value).map(KafkaConsumer::acknowledge).subscribe(new KafkaEventSubscriber(10, 1));
I expect the KafkaReceiver to output 10 events before any call to the subscriber's onNext() method is made, but the KafkaReceiver outputs all events from the topic that have not already been acknowledged.
No matter what I call Subscription.request() with, the publisher publishes all events from the topic immediately, not respecting the backpressure measures I've been taking.
I have this case: users collect orders as order lines. I implemented this with a Kafka topic containing order-change events; they are merged, stored in a local key-value store, and broadcast to a second topic as order versions.
I need to somehow react to abandoned orders, i.e. ones that were started but have had no change for at least the last x hours.
A simple solution could be to scan the local store every y minutes and post an event changing the order status to Abandoned. It seems I cannot access the store from outside a processor... It is also not very elegant coding. Any suggestions are welcome.
--edit
I cannot just add punctuation to the merge/validation transformer, because its output is different and should be routed elsewhere, like on this image (single Kafka Streams app):
so the "abandoned orders processor/transformer" will be a no-op for its input (the only trigger here is time). Another thing is that in such a case (as on the image) my transformer gets a ForwardingDisabledProcessorContext upon initialization, so I cannot emit any messages in the punctuator. I could just pass a kafkaTemplate bean into it and produce new messages directly (sketched after the snippet below), but then the whole processor/transformer is just an empty shell used only to access the local store...
this is snippet of code I used:
public class AbandonedOrdersTransformer implements ValueTransformer<OrderEvent, OrderEvent> {

    private ProcessorContext context;
    private KeyValueStore<String, Order> stateStore;

    @Override
    public void init(ProcessorContext processorContext) {
        this.context = processorContext;
        stateStore = (KeyValueStore<String, Order>) processorContext.getStateStore(KafkaConfig.OPENED_ORDERS_STORE);

        //main scheduler
        this.context.schedule(TimeUnit.MINUTES.toMillis(5), PunctuationType.WALL_CLOCK_TIME, (timestamp) -> {
            KeyValueIterator<String, Order> iter = this.stateStore.all();
            while (iter.hasNext()) {
                KeyValue<String, Order> entry = iter.next();
                if (OrderStatuses.NEW.equals(entry.value.getStatus()) &&
                        (timestamp - entry.value.getLastChanged().getTime()) > TimeUnit.HOURS.toMillis(4)) {
                    //SEND ABANDON EVENT "event"
                    context.forward(entry.key, event);
                }
            }
            iter.close();
            context.commit();
        });
    }

    @Override
    public OrderEvent transform(OrderEvent orderEvent) {
        //do nothing
        return null;
    }

    @Override
    public void close() {
        //do nothing
    }
}
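For completeness, this is roughly what the KafkaTemplate workaround mentioned above would look like (a sketch only; the injected kafkaTemplate bean and the "abandoned-orders" topic name are assumptions):
public class AbandonedOrdersTransformer implements ValueTransformer<OrderEvent, OrderEvent> {

    private final KafkaTemplate<String, OrderEvent> kafkaTemplate;   // injected when the transformer is created
    private KeyValueStore<String, Order> stateStore;

    public AbandonedOrdersTransformer(KafkaTemplate<String, OrderEvent> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    @Override
    public void init(ProcessorContext processorContext) {
        stateStore = (KeyValueStore<String, Order>) processorContext.getStateStore(KafkaConfig.OPENED_ORDERS_STORE);
        processorContext.schedule(TimeUnit.MINUTES.toMillis(5), PunctuationType.WALL_CLOCK_TIME, (timestamp) -> {
            KeyValueIterator<String, Order> iter = stateStore.all();
            while (iter.hasNext()) {
                KeyValue<String, Order> entry = iter.next();
                if (OrderStatuses.NEW.equals(entry.value.getStatus()) &&
                        (timestamp - entry.value.getLastChanged().getTime()) > TimeUnit.HOURS.toMillis(4)) {
                    //BUILD ABANDON EVENT "event" as before, then produce it outside the topology
                    kafkaTemplate.send("abandoned-orders", entry.key, event);   // topic name is an assumption
                }
            }
            iter.close();
        });
    }

    // transform() and close() stay no-ops as in the snippet above
}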
We are using spring-kafka (1.3.2.RELEASE) in our application.
Right now we are using auto-commit=true in our configurations.
We faced some problems because of that, like the same offset getting read multiple times, so we are now planning to do manual commits and possibly save the read offsets in some external repository.
We need to handle Kafka rebalances as well.
I have read the documentation; in plain Java, the rebalance listener is configured via ContainerProperties:
setConsumerRebalanceListener(rebalanceListener);
https://docs.spring.io/spring-kafka/reference/htmlsingle/#_very_very_quick
I am searching for how to configure rebalance listeners using Spring Java configuration, but I am unable to find an example.
Kindly let me know.
Thanks
If I understand you correctly, you want to have something like this:
@Bean
ContainerProperties containerProperties() {
    ContainerProperties containerProperties = new ContainerProperties(SOME_TOPIC);
    containerProperties.setConsumerRebalanceListener(myConsumerRebalanceListener());
    // Other properties set
    return containerProperties;
}

@Bean
ConsumerRebalanceListener myConsumerRebalanceListener() {
    return new ConsumerRebalanceListener() {

        @Override
        public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        }

        @Override
        public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        }
    };
}
You can use that containerProperties bean in a KafkaMessageListenerContainer instance, or you can set the myConsumerRebalanceListener on the properties returned by AbstractKafkaListenerContainerFactory.getContainerProperties().
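For example, wiring it into a ConcurrentKafkaListenerContainerFactory (a sketch; the consumerFactory() bean is assumed to exist in your configuration):
@Bean
ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, String> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory());   // assumed existing ConsumerFactory bean
    // attach the rebalance listener to the factory's container properties
    factory.getContainerProperties().setConsumerRebalanceListener(myConsumerRebalanceListener());
    return factory;
}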
I am trying to achieve concurrent processing of Kafka Topic-Partitions using Reactor Kafka with auto-acknowledgement. The documentation here makes it seem like this is possible:
http://projectreactor.io/docs/kafka/milestone/reference/#concurrent-ordered
The only difference between that and what I am attempting is I am using auto-acknowledgement.
I have the following code (relevant method is receiveAuto):
public class KafkaFluxFactory<K, V> {

    private final Map<String, Object> properties;

    public KafkaFluxFactory(Map<String, Object> properties) {
        this.properties = properties;
    }

    public Flux<ConsumerRecord<K, V>> receiveAuto(Collection<String> topics, Scheduler scheduler) {
        return KafkaReceiver.create(ReceiverOptions.create(properties).subscription(topics))
                .receiveAutoAck()
                .flatMap(flux -> flux.groupBy(this::extractTopicPartition))
                .flatMap(topicPartitionFlux -> topicPartitionFlux.publishOn(scheduler));
    }

    private TopicPartition extractTopicPartition(ConsumerRecord<K, V> record) {
        return new TopicPartition(record.topic(), record.partition());
    }
}
When I use this to create a Flux of Consumer Records from Kafka with a parallel Scheduler (Schedulers.newParallel("debug", 10)), I see that they all end up getting processed on the same Thread.
Any thoughts on what I may be doing wrong?
After quite a bit of trial and error, plus some rethinking of what I want to accomplish, I realized I was trying to solve two problems in one bit of code.
The two things I need are:
In-order processing of Kafka Partitions
Ability to parallelize the processing of each partition
In trying to solve both with this piece of code, I was limiting downstream users' abilities to configure the level of parallelization. I therefore changed the method to return a Flux of GroupedFluxes which provides downstream users with the correct granularity of determining what is parallelizable:
public Flux<GroupedFlux<TopicPartition, ConsumerRecord<K, V>>> receiveAuto(Collection<String> topics) {
    return KafkaReceiver.create(createReceiverOptions(topics))
            .receiveAutoAck()
            .flatMap(flux -> flux.groupBy(this::extractTopicPartition));
}
Downstream, users are able to parallelize each emitted GroupedFlux using whatever Scheduler they wish:
public <V> void work(Flux<GroupedFlux<TopicPartition, V>> flux) {
    flux.doOnNext(groupPublisher -> groupPublisher
            .publishOn(Schedulers.elastic())
            .subscribe(this::doWork))
        .subscribe();
}
This has the desired behavior: each TopicPartition GroupedFlux is processed in order and in parallel with the other GroupedFluxes.
I guess it executes sequentially, at least in your consumer. To consume in parallel you should convert your Flux to a ParallelFlux:
public ParallelFlux<ConsumerRecord<K, V>> receiveAuto(Collection<String> topics, Scheduler scheduler) {
    return KafkaReceiver.create(ReceiverOptions.<K, V>create(properties).subscription(topics))
            .receiveAutoAck()
            .flatMap(flux -> flux.groupBy(this::extractTopicPartition))
            .flatMap(topicPartitionFlux -> topicPartitionFlux)
            // split the flux into rails and run each rail on the given scheduler
            .parallel()
            .runOn(scheduler);
}
Then, in your consumer function, if you want to consume in a parallel way you should use a subscribe method such as:
void subscribe(Consumer<? super T> onNext, Consumer<? super Throwable> onError,
               Runnable onComplete, Consumer<? super Subscription> onSubscribe)
or any other overloaded method with a Consumer<? super T> onNext argument.
If you just use the method below, you will consume the flux in a sequential way:
void subscribe(Subscriber<? super T> s)
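For example, a minimal usage sketch (the topic name and the doWork handler are placeholders):
receiveAuto(Collections.singleton("my-topic"), Schedulers.parallel())
        .subscribe(record -> doWork(record));   // onNext runs on the parallel rails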