I am using Kafka 0.9.0.1.
The first time I start up my application, it takes 20-30 seconds to retrieve the "latest" message from the topic.
I've used different Kafka brokers (with different configs), yet I still see this behaviour. There is usually no slowness for subsequent messages.
Is this expected behaviour? You can see it clearly by running the sample application below and changing the broker/topic names to your own settings:
public class KafkaProducerConsumerTest {

    public static final String KAFKA_BROKERS = "...";
    public static final String TOPIC = "...";

    public static void main(String[] args) throws ExecutionException, InterruptedException {
        new KafkaProducerConsumerTest().run();
    }

    public void run() throws ExecutionException, InterruptedException {
        Properties consumerProperties = new Properties();
        consumerProperties.setProperty("bootstrap.servers", KAFKA_BROKERS);
        consumerProperties.setProperty("group.id", "Test");
        consumerProperties.setProperty("auto.offset.reset", "latest");
        consumerProperties.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProperties.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        MyKafkaConsumer kafkaConsumer = new MyKafkaConsumer(consumerProperties, TOPIC);
        Executors.newFixedThreadPool(1).submit(() -> kafkaConsumer.consume());

        Properties producerProperties = new Properties();
        producerProperties.setProperty("bootstrap.servers", KAFKA_BROKERS);
        producerProperties.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProperties.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        MyKafkaProducer kafkaProducer = new MyKafkaProducer(producerProperties, TOPIC);
        kafkaProducer.publish("Test Message");
    }
}

class MyKafkaConsumer {

    private final Logger logger = LoggerFactory.getLogger(MyKafkaConsumer.class);
    private KafkaConsumer<String, Object> kafkaConsumer;

    public MyKafkaConsumer(Properties properties, String topic) {
        kafkaConsumer = new KafkaConsumer<String, Object>(properties);
        kafkaConsumer.subscribe(Lists.newArrayList(topic));
    }

    public void consume() {
        while (true) {
            logger.info("Started listening...");
            ConsumerRecords<String, Object> consumerRecords = kafkaConsumer.poll(Long.MAX_VALUE);
            logger.info("Received records {}", consumerRecords.iterator().next().value());
        }
    }
}

class MyKafkaProducer {

    private KafkaProducer<String, Object> kafkaProducer;
    private String topic;

    public MyKafkaProducer(Properties properties, String topic) {
        this.kafkaProducer = new KafkaProducer<String, Object>(properties);
        this.topic = topic;
    }

    public void publish(Object object) throws ExecutionException, InterruptedException {
        ProducerRecord<String, Object> producerRecord = new ProducerRecord<>(topic, "key", object);
        Future<RecordMetadata> response = kafkaProducer.send(producerRecord);
        response.get();
    }
}
The first message should take longer than the rest, because when you start a new consumer in the consumer group specified by consumerProperties.setProperty("group.id", "Test");, Kafka will rebalance the partitions so that each partition is consumed by at most one consumer, distributing the topic's partitions across the consumer processes.
Also, with Kafka 0.9 there is a separate __consumer_offsets topic, which Kafka uses to manage the offsets for each consumer in a consumer group. It is likely that when you start the consumer for the first time, it looks at this topic to fetch the latest offset (a consumer may have been consuming from this topic earlier and got killed, so it is necessary to resume from the correct offset).
These two factors cause higher latency for the first batch of messages. I can't comment on the exact 20-30 second figure, but I expect this first-poll delay is the default behaviour.
PS: The exact number may also depend on secondary factors, such as whether the broker and the consumers run on the same machine (no network latency) or on different machines communicating over TCP.
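One way to see where those 20-30 seconds go is to register a ConsumerRebalanceListener and log when the assignment actually arrives; the group join, rebalance, and offset fetch all happen before onPartitionsAssigned fires. A minimal sketch against the 0.9 consumer API (broker, group id, and topic name are placeholders):

import java.util.Collection;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class AssignmentTimingDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        props.setProperty("group.id", "Test");
        props.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        final long start = System.currentTimeMillis();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test-topic"), new ConsumerRebalanceListener() {
                @Override
                public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                    // nothing to do in this demo
                }

                @Override
                public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                    // the group join/rebalance and offset fetch complete before this fires
                    System.out.printf("Assigned %s after %d ms%n",
                            partitions, System.currentTimeMillis() - start);
                }
            });
            for (int i = 0; i < 10; i++) {
                consumer.poll(1000); // the first poll carries the group-join latency
            }
        }
    }
}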
According to this link:
Try setting group_id=None in your consumer, or call consumer.close() before ending the script, or use assign(), not subscribe(). Otherwise you are rejoining an existing group that has known but unresponsive members. The group coordinator will wait until those members check in, leave, or time out. Since the consumers no longer exist (they are your prior script runs), they have to time out.
And consumer.poll() blocks during a group rebalance.
So this is correct behaviour if you join a group with unresponsive members (perhaps you terminated the application ungracefully).
Please confirm you call consumer.close() before exiting your application.
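For reference, a shutdown pattern along those lines could look like the sketch below (adapted from the consume() method in the question; wakeup() is the supported way to abort a blocking poll() from another thread, so the coordinator sees the member leave instead of waiting out the session timeout):

// Sketch: shut the consumer down cleanly so the coordinator sees it leave
// instead of waiting for the session timeout on the next run.
public void consume() {
    try {
        while (true) {
            ConsumerRecords<String, Object> records = kafkaConsumer.poll(Long.MAX_VALUE);
            records.forEach(record -> logger.info("Received record {}", record.value()));
        }
    } catch (org.apache.kafka.common.errors.WakeupException e) {
        // expected: shutdown() was called while poll() was blocking
    } finally {
        kafkaConsumer.close(); // notifies the group coordinator that this member is leaving
    }
}

// Call from a JVM shutdown hook or another thread.
public void shutdown() {
    kafkaConsumer.wakeup();
}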
I just tried your code, with minimal logging additions, many times now. Here is a typical log output:
2016-07-24 15:12:51,417 Start polling...|INFO|KafkaProducerConsumerTest
2016-07-24 15:12:51,604 producer has send message|INFO|KafkaProducerConsumerTest
2016-07-24 15:12:51,619 producer got response, exiting|INFO|KafkaProducerConsumerTest
2016-07-24 15:12:51,679 Received records [Test Message]|INFO|KafkaProducerConsumerTest
2016-07-24 15:12:51,679 Start polling...|INFO|KafkaProducerConsumerTest
2016-07-24 15:12:54,680 returning on empty poll result|INFO|KafkaProducerConsumerTest
The sequence of events is as expected and timely: the consumer starts polling, the producer sends the message and receives a result, and the consumer receives the message, all within about 300 ms. Then the consumer starts polling again and returns on an empty result 3 seconds later, since I changed the poll timeout accordingly.
I am using Kafka 0.9.0.1 for both the broker and the client libraries. The connection is on localhost, and it is a test environment with no load at all.
For completeness, here is the log from the server that was triggered by the exchange above.
[2016-07-24 15:12:51,599] INFO [GroupCoordinator 0]: Preparing to restabilize group Test with old generation 0 (kafka.coordinator.GroupCoordinator)
[2016-07-24 15:12:51,599] INFO [GroupCoordinator 0]: Stabilized group Test generation 1 (kafka.coordinator.GroupCoordinator)
[2016-07-24 15:12:51,617] INFO [GroupCoordinator 0]: Assignment received from leader for group Test for generation 1 (kafka.coordinator.GroupCoordinator)
[2016-07-24 15:13:24,635] INFO [GroupCoordinator 0]: Preparing to restabilize group Test with old generation 1 (kafka.coordinator.GroupCoordinator)
[2016-07-24 15:13:24,637] INFO [GroupCoordinator 0]: Group Test generation 1 is dead and removed (kafka.coordinator.GroupCoordinator)
You may want to compare with your server logs for the same exchange.
Related
I have a Kafka Streams application that reads from a set of topics, enriches the messages with some extra data and then outputs onto another set of topics eg:
topic.blue.unprocessed -> Kafka Streams App -> topic.blue.processed
topic.yellow.unprocessed -> Kafka Streams App -> topic.yellow.processed
The consumer group is set up with a regex topic pattern and will read topics whose names start with the prefix "topic.".
This was working just fine for some time, but I recently noticed it had stopped reading messages from some of the topics, e.g. no messages from topic.yellow.unprocessed are being read, while topic.blue.unprocessed is still functioning fine.
I investigated the logs and could see that the app was still reading from topic.yellow.unprocessed a month ago; however, there was a large delay of 5 days between a message appearing on the topic and being read by the Streams application. Now it is not reading them at all.
I'm wondering if anyone has an idea why this may be occurring for only some topics. I would expect an issue with the app or a consumer ACL to affect all topics, but that is not what I'm seeing.
I have confirmed topic.yellow.unprocessed is deployed and is receiving messages - they just are not being consumed by the application. Debug logs are enabled but are showing nothing.
See below consumer code:
#Value("${kafka.configuration.inputTopicRegex}")
private String inputTopicRegex;
#Value("${kafka.configuration.deadLetterTopic}")
private String deadLetterTopic;
#Value("${kafka.configuration.brokerAddress}")
private String brokerAddress;
#Autowired
AvroRecodingSerde avroSerde;
public KafkaStreams createStreams() {
return new KafkaStreams(createTopology(), createKakfaProperties());
}
private Properties createKakfaProperties() {
Properties config = new Properties();
config.put(StreamsConfig.APPLICATION_ID_CONFIG, "topic.color.app");
config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, brokerAddress);
config.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
config.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
config.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG, LogAndContinueExceptionHandler.class.getName());
config.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
config.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 86400000);
return config;
}
public Topology createTopology() {
StreamsBuilder builder = new StreamsBuilder();
// stream of records
KStream<String, GenericRecord> ingressStream = builder.stream(Pattern.compile(inputTopicRegex), Consumed.with(Serdes.String(), avroSerde));
KStream<String, GenericRecord> processedStream = ingressStream.transformValues(enrichMessage);
processedStream.to(destinationOrDeadletter, Produced.with(Serdes.String(), avroSerde));
return builder.build();
}
I am new to writing Kafka consumers. I have a scenario with two consumers running under the same group id and two partitions.
Suppose that:
Consumer 1 ===> linked to ===> Partition 1
Consumer 2 ===> linked to ===> Partition 2
In case my consumer-2 goes down, how can I ensure that my consumer-1 re-reads all the events that came to partition-2? I came across ConsumerRebalanceListener, so I have set my container property for it, and in the onPartitionsAssigned method I am calling consumer.seekToBeginning(consumer.assignment()).
Is this correct? Does this mean my consumer-1 will read all the events from partition-2 as well once consumer-2 is down and partition-2 is reassigned to it?
I would also appreciate it if someone could share some good links where I can read up on the basics of ConsumerRebalanceListener.
public ConcurrentKafkaListenerContainerFactory<String, MultiTenancyOrgDataMessage> kafkaListenerContainerFactory() {
    LOG.debug("ConcurrentKafkaListenerContainerFactory executing");
    ConcurrentKafkaListenerContainerFactory<String, MultiTenancyOrgDataMessage> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory());
    factory.getContainerProperties().setAckMode(AckMode.MANUAL_IMMEDIATE);
    factory.getContainerProperties().setConsumerRebalanceListener(new ConsumerAwareRebalanceListener() {
        @Override
        public void onPartitionsAssigned(Consumer<?, ?> consumer, Collection<TopicPartition> partitions) {
            consumer.seekToBeginning(consumer.assignment()); // read topic from beginning on service restart
        }
    });
    return factory;
}
This is what committing is for: if consumer 2 goes down, then any records it has consumed but not committed will be picked up by consumer 1 after a rebalance.
This is one reason that Kafka supports at-least-once semantics: after a rebalance, consumer 1 will pick up from the last committed offset, and hence may process records that consumer 2 had already processed successfully if it died before committing.
An example of why you might use a ConsumerRebalanceListener is to deal with pausing across a rebalance - I have written about this at https://chrisg23.blogspot.com/2020/02/why-is-pausing-kafka-consumer-so.html?m=1
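To make the at-least-once handoff concrete, here is a hedged sketch of committing processed offsets from a ConsumerRebalanceListener (broker, group, and topic names are placeholders, and the offsets map is bookkeeping your processing loop would maintain):

import java.time.Duration;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class CommitOnRebalance {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        props.setProperty("group.id", "demo-group");              // placeholder
        props.setProperty("enable.auto.commit", "false");
        props.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        Map<TopicPartition, OffsetAndMetadata> processed = new HashMap<>();

        consumer.subscribe(Collections.singletonList("demo-topic"), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                // Hand off clean offsets before the partitions move to another consumer.
                consumer.commitSync(processed);
            }

            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                // No seekToBeginning() here: resume from the last committed offset,
                // so only uncommitted records are reprocessed (at-least-once).
            }
        });

        while (true) {
            for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                // process(record) ... then remember the next offset to read
                processed.put(new TopicPartition(record.topic(), record.partition()),
                        new OffsetAndMetadata(record.offset() + 1));
            }
            consumer.commitSync(processed);
        }
    }
}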
I'm using @KafkaListener and ConcurrentKafkaListenerContainerFactory to listen to 3 Kafka topics, each with 10 partitions. I have a few questions about how this works.
ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
        ConsumerFactory<String, String> consumerFactory) {
    ConcurrentKafkaListenerContainerFactory<String, String> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory);
    factory.setConcurrency(30);
    factory.getContainerProperties().setSyncCommits(true);
    return factory;
}

@KafkaListener(topics = "topic1", containerFactory = "kafkaListenerContainerFactory")
public void handleMessage(final ConsumerRecord<Object, String> arg0) throws Exception {
}

@KafkaListener(topics = "topic2", containerFactory = "kafkaListenerContainerFactory")
public void handleMessage(final ConsumerRecord<Object, String> arg0) throws Exception {
}

@KafkaListener(topics = "topic3", containerFactory = "kafkaListenerContainerFactory")
public void handleMessage(final ConsumerRecord<Object, String> arg0) throws Exception {
}
My listener.ackmode is return, enable.auto.commit is set to false, and partition.assignment.strategy is org.apache.kafka.clients.consumer.RoundRobinAssignor.
1) My understanding of concurrency is: since I set the concurrency (at the factory level) to 30, and I have a total of 30 partitions (across all three topics) to read from, each thread will be assigned one partition. Is my understanding correct? And what is the effect of overriding the concurrency inside the @KafkaListener annotation?
2) When Spring calls the poll() method, does it poll from all three topics?
3) Since listener.ackmode is set to return, will it wait until all of the records returned by a single poll() are completed before issuing the next poll()? Also, what happens if my records take longer than max.poll.interval.ms to process? Say offsets 1-100 are returned in a single poll() and my code has only processed 50 of them when max.poll.interval.ms is hit: will Spring issue another poll at that point because max.poll.interval.ms has elapsed? If so, will the next poll() return records from offset 51?
I really appreciate your time and help.
my listener.ackmode is return
There is no such ack mode; since you don't set one on the factory, your actual ack mode is BATCH (the default). To use ack mode RECORD (if that's what you mean), you must configure it on the factory's container properties.
my understanding about concurrency is ...
Your understanding is incorrect; the concurrency cannot be greater than the number of partitions in the topic with the most partitions (when a listener listens to multiple topics). Since you only have 10 partitions in each topic, your actual concurrency is 10.
Overriding the concurrency on a listener simply overrides the factory setting; you always need at least as many partitions as the concurrency.
When spring call the poll() method, does it poll from all three topics?
Not with that configuration; you have 3 concurrent containers, each with 30 consumers, each listening to one topic. You have 90 consumers.
If you had a single listener for all 3 topics, the poll would return records from all 3; but you may still have 20 idle consumers, depending on how the partition assignor allocates the partitions - see the "partitions assigned" log messages for exactly how the partitions are allocated. The round robin assignor should distribute them ok.
will spring issue another poll at this time
Spring has no control here - if you take too long, the consumer thread is in your listener, and since the Consumer is not thread-safe we can't issue an asynchronous poll.
You must process max.poll.records within max.poll.interval.ms to prevent Kafka from rebalancing the partitions.
The ack mode makes no difference; it's all about processing the results of the poll in a timely fashion.
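If per-record acking is what was intended, the sketch below shows one way to configure it on the factory (assuming a recent spring-kafka where AckMode is nested in ContainerProperties; the rest follows the question's code):

import org.springframework.context.annotation.Bean;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.listener.ContainerProperties;

@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
        ConsumerFactory<String, String> consumerFactory) {
    ConcurrentKafkaListenerContainerFactory<String, String> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory);
    // 10 partitions per topic: concurrency above 10 only produces idle consumers
    factory.setConcurrency(10);
    factory.getContainerProperties().setSyncCommits(true);
    // commit after each record rather than after each batch returned by poll()
    factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.RECORD);
    return factory;
}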
I am using Kafka version 2.0 and the Java consumer API to consume messages from a topic. We are using a single-node Kafka server with one consumer per partition. I have observed that the consumer is losing some of the messages.
The scenario is:
Consumer polls the topic.
I have created one consumer per thread.
It fetches the messages and gives them to a handler to handle the message.
Then it commits the offsets, using at-least-once Kafka consumer semantics.
In parallel, I have another consumer running with a different group id. In this consumer I simply increase a message counter and commit the offset. There is no message loss in this consumer.
try {
    //kafkaConsumer.registerTopic();

    consumerThread = new Thread(() -> {
        final String topicName1 = "topic-0";
        final String topicName2 = "topic-1";
        final String topicName3 = "topic-2";
        final String topicName4 = "topic-3";
        String groupId = "group-0";

        final Properties consumerProperties = new Properties();
        consumerProperties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.13.49:9092");
        consumerProperties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        consumerProperties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        consumerProperties.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
        consumerProperties.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "100");
        consumerProperties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        consumerProperties.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, 1000);

        try {
            consumer = new KafkaConsumer<>(consumerProperties);
            consumer.subscribe(Arrays.asList(topicName1, topicName2, topicName3, topicName4));
        } catch (KafkaException ke) {
            logTrace(MODULE, ke);
        }

        while (service.isServiceStateRunning()) {
            ConsumerRecords<String, byte[]> records = consumer.poll(Duration.ofMillis(100));
            for (TopicPartition partition : records.partitions()) {
                List<ConsumerRecord<String, byte[]>> partitionRecords = records.records(partition);
                for (ConsumerRecord<String, byte[]> record : partitionRecords) {
                    processMessage(record);
                }
            }
            consumer.commitSync();
        }
        kafkaConsumer.closeResource();
    }, "KAFKA_CONSUMER");
} catch (Exception e) {
}
There seems to be a problem with the usage of subscribe() here.
subscribe() is used to subscribe to topics, not to partitions. To consume from specific partitions you need to use assign(). Here is the relevant extract from the documentation:
public void subscribe(java.util.Collection topics)
Subscribe to the given list of topics to get dynamically assigned
partitions. Topic subscriptions are not incremental. This list will
replace the current assignment (if there is one). It is not possible
to combine topic subscription with group management with manual
partition assignment through assign(Collection). If the given list of
topics is empty, it is treated the same as unsubscribe(). This is a
short-hand for subscribe(Collection, ConsumerRebalanceListener), which
uses a noop listener. If you need the ability to seek to particular
offsets, you should prefer subscribe(Collection,
ConsumerRebalanceListener), since group rebalances will cause
partition offsets to be reset. You should also provide your own
listener if you are doing your own offset management since the
listener gives you an opportunity to commit offsets before a rebalance
finishes.
public void assign(java.util.Collection partitions)
Manually assign a list of partitions to this consumer. This interface
does not allow for incremental assignment and will replace the
previous assignment (if there is one). If the given list of topic
partitions is empty, it is treated the same as unsubscribe(). Manual
topic assignment through this method does not use the consumer's group
management functionality. As such, there will be no rebalance
operation triggered when group membership or cluster and topic
metadata change. Note that it is not possible to use both manual
partition assignment with assign(Collection) and group assignment with
subscribe(Collection, ConsumerRebalanceListener).
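To make the distinction concrete, here is a minimal sketch of the two styles (broker address, group id, and topic/partition names are placeholders):

import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class SubscribeVsAssign {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");              // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        // subscribe(): group management; the coordinator assigns and rebalances partitions.
        KafkaConsumer<byte[], byte[]> subscribed = new KafkaConsumer<>(props);
        subscribed.subscribe(Arrays.asList("topic-0", "topic-1"));
        subscribed.poll(Duration.ofMillis(100));

        // assign(): manual assignment; no group management and no rebalances.
        KafkaConsumer<byte[], byte[]> assigned = new KafkaConsumer<>(props);
        assigned.assign(Arrays.asList(
                new TopicPartition("topic-0", 0),
                new TopicPartition("topic-1", 0)));
        assigned.poll(Duration.ofMillis(100));

        // The two styles cannot be mixed on the same consumer instance.
        subscribed.close();
        assigned.close();
    }
}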
You probably shouldn't do what you're doing. You should use subscribe(), with multiple partitions per topic and multiple consumers in the group for high availability, and let the consumer handle the offsets for you.
You don't describe why you're trying to process your topics in this custom way. It's advanced and leads to issues.
The timestamps on your instances should not have to be synchronised to do normal topic processing.
If you're looking for more performance, or to isolate records more carefully to avoid "head of line blocking", consider something like Parallel Consumer (PC).
It also tracks per-record acknowledgement, among other things. Check out Parallel Consumer on GitHub (it's open source, BTW, and I'm the author).
We have a problem where calls to the poll method of the new KafkaConsumer sometimes hang for as long as 20 to 30 minutes after one of our three Kafka brokers is restarted!
We are using a 3-broker Kafka setup (0.9.0.1).
Our consumer processes use the new Java KafkaConsumer API, and we assign specific TopicPartitions.
For various reasons I can't show the real code here, but basically our code works like this:
Properties consumerProps = loadConsumerProperties();
// bootstrap.servers=<IP1>:9092,<IP2>:9092,<IP3>:9092
// group.id=consumer_group_gwbc2
// enable.auto.commit=false
// auto.offset.reset=latest
// session.timeout.ms=30000
// key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
// value.deserializer=org.apache.kafka.common.serialization.ByteArrayDeserializer

KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(consumerProps);
consumer.assign(Arrays.asList(new TopicPartition("someTopic", 0)));

while (true) {
    // THIS CALL sometimes blocks for a very long time after a broker restart
    ConsumerRecords<String, byte[]> records = consumer.poll(200);

    Iterator<ConsumerRecord<String, byte[]>> recordIter = records.iterator();
    while (recordIter.hasNext()) {
        ConsumerRecord<String, byte[]> record = recordIter.next();

        // very fast, actually just sending a UDP packet via Netty
        processRecord(record);

        if (lastCommitHappenedFiveOrMoreSecondsAgo()) {
            consumer.commitAsync();
        }
    }
}
kafka-topics.sh describes the __consumer_offsets topic as follows:
Topic:__consumer_offsets  PartitionCount:50  ReplicationFactor:3  Configs:segment.bytes=104857600,cleanup.policy=compact,compression.type=uncompressed
The server.log of the restarted broker shows that loading the offsets from a specific partition of the __consumer_offsets topic takes a long time (in this case about 22 minutes). This correlates with the time the consumer's poll call is blocked.
[2016-07-25 16:02:40,846] INFO [Group Metadata Manager on Broker 1]: Loading offsets and group metadata from [__consumer_offsets,15] (kafka.coordinator.GroupMetadataManager)
[2016-07-25 16:25:36,697] INFO [Group Metadata Manager on Broker 1]: Finished loading offsets from [__consumer_offsets,15] in 1375851 milliseconds.
I'm wondering what makes the loading process so slow and what can be done about it.
Found the reason.
The server.properties configuration files for our brokers contain the property
log.cleaner.enable=false
(by default this property is true as of version 0.9.0.1).
This means that Kafka's internal compacted __consumer_offsets topic is not actually compacted, since the log cleaner is disabled.
In effect, some partitions of this topic grew to a size of several gigabytes, which explains the amount of time needed to read through all of the consumer-offsets data when a new group coordinator needs to refill its cache.
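Given that, the remedy is to re-enable the log cleaner in the broker configuration and restart the brokers, so the compacted __consumer_offsets topic can shrink again (assuming the stock broker config file):

# server.properties
log.cleaner.enable=true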