Messages vanish between kafka producer and consumer - apache-kafka

I have a very simple embedded Kafka application: one producer thread and two consumer threads that write to a Postgres DB. The three threads run in a single process. I am using librdkafka to implement my consumer and producer, and run apache-kafka as the broker. Message size is approximately 2 kB. I have two counters: one incremented when I write (rd_kafka_produce) and another incremented when I read (rd_kafka_consume_batch). If I run my producer fast enough (over 30000 messages/second), the producer counter ends up much larger than the consumer counter (by 15% or so if I run for 30 seconds). So I am losing messages somewhere. My first question is: how do I debug such a problem? The second is: what is the most probable cause, and how can I fix it?
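One thing worth checking first: rd_kafka_produce() is asynchronous and only enqueues the message locally, so failures (for example a full local queue, RD_KAFKA_RESP_ERR__QUEUE_FULL, when producing faster than the broker accepts) surface later through the delivery report callback, which only fires from rd_kafka_poll(). A useful first debugging step is to count successful delivery reports instead of produce calls, and to turn on the client's own diagnostics. A minimal config sketch (the property names are real librdkafka settings; the values are illustrative):

```properties
# librdkafka client diagnostics (illustrative values)
debug=broker,topic,msg          # verbose client-side logging
statistics.interval.ms=5000     # periodic JSON stats with per-partition message counters
```

Comparing the stats counters against your two application counters should narrow down whether messages are dropped on the produce side or the consume side.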

Related

Kafka consumer can't read from multiple partitions at the same time

I use the Confluent.Kafka 1.9.2 C# library to create a single Kafka consumer listening to a topic with several partitions. Currently the consumer drains all messages from the first partition and only then moves to the next. As I understand from the KIP, I can avoid this behavior and achieve round-robin by lowering the max.partition.fetch.bytes parameter. I changed this value to 5000 bytes and pushed 10000 messages to the first partition and 1000 to the second; the average message size is 2000 bytes, so the consumer should move between partitions every 2-3 messages (if I understand correctly). But it still drains the first partition before consuming the second one. My only guess as to why it doesn't work as it should is the latest comment here, that such an approach can't work with several brokers; the Kafka cluster I use has 6 brokers. Could that be the reason, or is it something else?

Can offsets be skipped (almost 12000 in some cases) for a particular partition in the kafka broker?

Kafka image: confluentinc/cp-kafka:5.2.1
Kafka client: apache.kafka.client 2.5.0
Load: one thread is handling 4 partitions
We are noticing that some offsets are getting skipped (missing) in a partition. At first we thought it was an issue on the consumer side, but offsets are skipped in the logs of both consumer groups' threads. (Another observation: the consumer thread takes a significant amount of time to jump over the skipped offsets, which causes lag.) This happened when the load on the cluster was high. We are not using transactional Kafka or idempotent producer configurations. What can be the possible reasons for this?
Producer properties:
ACKS_CONFIG=1
RETRIES_CONFIG=1
BATCH_SIZE_CONFIG=16384
LINGER_MS_CONFIG=1
BUFFER_MEMORY_CONFIG=33554432
KEY_SERIALIZER_CLASS_CONFIG= StringSerializer
VALUE_SERIALIZER_CLASS_CONFIG= ByteArraySerializer
ProducerConfig.RECONNECT_BACKOFF_MS_CONFIG, 1000
Consumer properties:
HEARTBEAT_INTERVAL_MS_CONFIG=6 seconds
SESSION_TIMEOUT_MS_CONFIG=30 sec
ConsumerConfig.MAX_POLL_RECORDS_CONFIG=10
FETCH_MAX_WAIT_MS_CONFIG=200
MAX_PARTITION_FETCH_BYTES_CONFIG=1048576
FETCH_MAX_BYTES_CONFIG=31457280
AUTO_OFFSET_RESET_CONFIG=latest
ENABLE_AUTO_COMMIT_CONFIG=false
Edit
There were no consumer rebalances; we checked the logs during this time and found none.
We have two consumer groups (in different data centers), and both groups skipped these offsets, so we are ruling out any issue on the consumer side.
- Both showed the same pattern: both consumer threads stopped consuming at offset 112300 and, after 30 minutes, resumed after skipping ~12k offsets. The threads kept consuming other partitions during that time. This happened for only 3-4 partitions.
- So what I'm wondering is: is it normal to have such huge offset gaps (during high load)? I didn't find anything concrete when going through the docs. And what can cause these ghost offsets - the broker side or the producer side?

Kafka topic lag keeps increasing gradually when message size is huge

I am using the Kafka Streams Processor API to build a Kafka Streams application that retrieves messages from a Kafka topic. I have two consumer applications with the same Kafka Streams configuration; the only difference is the message size. The first one has messages of 2000 characters (3 KB) while the second has messages of 34000 characters (60 KB).
In my second consumer application I am seeing too much lag, which increases gradually with the traffic, while my first application is able to process its messages in the same time without any lag.
My Stream configuration parameters are as below,
application.id=Application1
default.key.serde=org.apache.kafka.common.serialization.Serdes$StringSerde
default.value.serde=org.apache.kafka.common.serialization.Serdes$StringSerde
num.stream.threads=1
commit.interval.ms=10
topology.optimization=all
Thanks
To consume messages faster, you need to increase the number of partitions (if that's not yet done, depending on the current value), and then do one of the following two things:
1) increase the value for the config num.stream.threads within your application
or
2) start several applications with the same consumer group (the same application.id).
As for me, increasing num.stream.threads is preferable (until you reach the number of CPUs of the machine your app runs on). Try gradually increasing this value, e.g. go from 4 to 6 to 8, and monitor the consumer lag of your application.
By increasing num.stream.threads, your app will be able to consume messages in parallel, assuming you have enough partitions.
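The interplay this answer relies on (threads and instances only add parallelism up to the partition count) can be sketched as follows; the helper and the partition count of 12 are hypothetical, not part of the Kafka Streams API:

```java
// Sketch of the Kafka Streams parallelism rule: roughly one task is created
// per input partition, and tasks are spread over all stream threads across
// all instances, so threads beyond the partition count sit idle.
public class StreamsParallelism {
    // Hypothetical helper, not a Kafka Streams API call.
    static int activeThreads(int partitions, int instances, int threadsPerInstance) {
        return Math.min(partitions, instances * threadsPerInstance);
    }

    public static void main(String[] args) {
        System.out.println(activeThreads(12, 1, 1)); // 1  (num.stream.threads=1, as in the question)
        System.out.println(activeThreads(12, 1, 8)); // 8  (option 1: more threads)
        System.out.println(activeThreads(12, 3, 8)); // 12 (option 2: more instances, capped by partitions)
    }
}
```

This is why the answer says to increase partitions first: once every partition has a thread, adding more threads or instances changes nothing.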

kafka: increase session timeout so that consumers will not get the same message

I have a Java app that consumes and produces messages from Kafka.
I have a thread pool of 5 threads; each thread creates a consumer, and since I have 5 partitions the work is divided between them.
The problem is that 2 threads get the same message, because no heartbeat reaches the broker: each message takes about an hour to process.
I tried to increase session.timeout.ms and also changed group.min.session.timeout.ms on the broker so that the larger value would be allowed.
With this change the consumer cannot be started.
Any ideas?
The keep-alive is not sent, so that doesn't seem to be true.
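For reference, in Kafka clients 0.10.1 and later (KIP-62) processing time and liveness are decoupled: a background thread keeps sending heartbeats within session.timeout.ms, while max.poll.interval.ms separately bounds the time between poll() calls. A hedged consumer-config sketch for hour-long processing (values are illustrative only and must be tuned to your actual processing time):

```properties
# Illustrative values for ~1 h per message (Kafka clients >= 0.10.1)
max.poll.interval.ms=7200000   # allow up to 2 h between poll() calls
session.timeout.ms=30000       # liveness is covered by background heartbeats
max.poll.records=1             # take one message per poll
```

With this split, a slow processor no longer has to inflate the session timeout, so dead consumers are still detected quickly.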

kafka consumer sessions timing out

We have an application in which a consumer reads a message and the thread does a number of things, including database accesses, before a message is produced to another topic. The time between consuming and producing the message on the thread can be several minutes. Once the message is produced to the new topic, a commit is done to indicate we are done with the work on the consumer-queue message. Auto commit is disabled for this reason.
I'm using the high-level consumer, and what I'm noticing is that the ZooKeeper and Kafka sessions time out because it takes too long before we do anything with the consumer queue, so Kafka ends up rebalancing every time the thread goes back to read more from the consumer queue, and after a while it starts to take a long time before a consumer reads a new message.
I can set the ZooKeeper session timeout very high so that isn't a problem, but then I have to adjust the rebalance parameters accordingly, and Kafka won't pick up a new consumer for a while, among other side effects.
What are my options to solve this problem? Is there a way to heartbeat to Kafka and ZooKeeper to keep both happy? Would I still have these same issues if I used the simple consumer?
It sounds like your problems boil down to relying on the high-level consumer to manage the last-read offset. Using a simple consumer would solve that problem since you control the persistence of that offset. Note that all the high-level consumer commit does is store the last read offset in zookeeper. There's no other action taken and the message you just read is still there in the partition and is readable by other consumers.
With the Kafka simple consumer, you have much more control over when and how that offset storage takes place. You can even persist that offset somewhere other than ZooKeeper (a database, for example).
The bad news is that while the simple consumer itself is simpler than the high-level consumer, there's a lot more work you have to do code-wise to make it work. You'll also have to write code to access multiple partitions - something the high-level consumer does quite nicely for you.
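The self-managed offset storage this answer describes can be sketched with a stand-in store (a plain map here, in place of ZooKeeper or a database; all names are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of application-managed offset storage: instead of letting the
// high-level consumer write offsets to ZooKeeper, the application records the
// last processed offset itself and decides where to resume after a restart.
public class OffsetStore {
    private final Map<Integer, Long> lastProcessed = new HashMap<>(); // partition -> offset

    // Record that work on this offset finished (the "commit" step).
    public void commit(int partition, long offset) {
        lastProcessed.put(partition, offset);
    }

    // Where to resume reading: one past the last processed offset.
    public long resumeOffset(int partition) {
        return lastProcessed.getOrDefault(partition, -1L) + 1;
    }

    public static void main(String[] args) {
        OffsetStore store = new OffsetStore();
        store.commit(0, 41L);                       // finished offset 41 on partition 0
        System.out.println(store.resumeOffset(0));  // 42
        System.out.println(store.resumeOffset(1));  // 0 (partition never processed)
    }
}
```

The design point is that commit() is only called after the slow processing finishes, so a crash mid-processing replays the message instead of losing it.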
I think the issue is that the consumer's poll() call is what triggers the heartbeat request, so if processing takes longer than the session timeout, the heartbeat does not reach the coordinator in time. Because of this missed heartbeat, the coordinator marks the consumer dead. Consumer rejoining is also very slow, especially in the case of a single consumer.
I faced a similar issue; to solve it I had to change the following parameters in the consumer config properties:
session.timeout.ms=
request.timeout.ms= (more than session.timeout.ms)
You also have to add the following property to server.properties on the Kafka broker node:
group.max.session.timeout.ms =
See the following link for more detail:
http://grokbase.com/t/kafka/users/16324waa50/session-timeout-ms-limit