New Kafka consumer ignoring earliest offset

I have a Kafka topic (XYZ) with just one partition and one consumer (C1) running on a RHEL 6 machine. I copied the same setup onto a RHEL 7 machine, stopped C1, and started a new consumer (C2) using the same group id as C1. C2 is able to connect, exchanges heartbeat messages, and prints the following debug statements repeatedly, but it does not consume any messages: it receives zero records. It uses "earliest" for the offset reset but seems to ignore it. Note: all consumers share the same code version.
Logs:
Fetcher: [Consumer clientId=testclient-0, groupId=mcu] Fetch READ_UNCOMMITTED request at offset 1001 for partition XYZ-0 returned fetch data (error=none, highWaterMark=1001, lastStableOffset=1001, logStartOffset=10 ....
Adding READ_UNCOMMITTED request for partition...
Sending READ_UNCOMMITTED request for partition...
Resetting offset for partition XYZ-0 to the committed offset 1001
My question is: why is it resetting the offset to the latest committed offset instead of consuming messages from the beginning?
Following is what I have tried to resolve the issue:
I am able to consume messages with the C2 consumer from another topic on the same cluster.
I am able to consume messages from the same topic XYZ with another consumer C3 on a different RHEL 6 machine with exactly the same configuration.
I used a new group id, but no success.
I produced messages after starting the consumer first, but no success.
Note: Kafka client version 1.0
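For reference, the consumer is set up roughly like this (a minimal sketch; the broker address and deserializers are assumptions, while group.id, client.id, auto.offset.reset, and the topic name match the setup described above):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class XyzConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // assumed broker address
        props.put("group.id", "mcu");                      // same group id as C1
        props.put("client.id", "testclient-0");
        props.put("auto.offset.reset", "earliest");        // only applies when no committed offset exists
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("XYZ"));
        while (true) {
            // on C2 this keeps returning zero records
            ConsumerRecords<String, String> records = consumer.poll(1000);
            records.forEach(r -> System.out.printf("offset=%d value=%s%n", r.offset(), r.value()));
        }
    }
}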
Any help would be highly appreciated. Thank you.

I was able to fix it after setting the following JVM argument:
-Dorg.xerial.snappy.tempdir=/some/other/path/with/execpermissions/
The new host where I set up the consumer didn't have write permission to the /tmp directory (the default), which is needed to unpack the snappy native library. Also make sure /tmp has enough space left for these libraries. The RHEL 6 hosts with the old consumers had write permission. The compression type was not set in the topic configuration, so it used snappy by default. The other topics I could consume from had the compression type set to producer, so they didn't need the snappy libraries and hence worked. The issue had nothing to do with the RHEL 6 or RHEL 7 version.
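The same property can also be set programmatically instead of as a JVM flag, as long as it runs before the snappy native library is first loaded (i.e. before the first compressed record is decompressed); a minimal sketch, with the path being an assumption:

// must run before snappy-java extracts its native library
System.setProperty("org.xerial.snappy.tempdir", "/opt/kafka/snappy-tmp");  // any dir with write and exec permissions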
More info: Kafka Broker throws error for clients that produce snappy compressed messages

Related

Kafka Streams consumer skipping a few offsets, no log compaction enabled

Kafka server version: 3.2.0
Kafka Streams version: 2.7.2
I have a producer producing to topic foo, and I can see the offsets from the producer in the logs.
We have a Kafka Streams application reading from the same topic foo. What I am observing is that the consumer skips offsets, sometimes by 30 to 40 offsets at a time. I am printing the offset in the process method using the ProcessorContext.offset() method.
Skipping offsets seems to be very common. Will using ProcessorContext.offset() result in every offset being printed?
Some points:
No Kafka rebalance has occurred.
No restarts of the container.
We have 3 state stores defined in the Streams application, and the changelog topic has a replication factor of 1.
We had a Kafka broker outage where a few brokers were down for an extended period of time, about 3 weeks back. I don't know how that impacts the messages I should consume today.
We have NOT set processing.guarantee, so the default should be AT_LEAST_ONCE. We do not have transactions enabled, so it can't be transactional messages that are being skipped.
The log line that prints the offset is the first line in the process method.
Questions:
What internal Kafka Streams logs can I look at to see whether messages are being consumed?
Any reason why the messages could be skipped?
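For context, the offset logging described above would look roughly like this with the 2.7 Processor API (the processor class itself is hypothetical; only ProcessorContext.offset() comes from the question):

import org.apache.kafka.streams.processor.AbstractProcessor;

public class OffsetLoggingProcessor extends AbstractProcessor<String, String> {
    @Override
    public void process(String key, String value) {
        // first line of process(): print the offset of the record currently being handled
        System.out.printf("topic=%s partition=%d offset=%d%n",
                context().topic(), context().partition(), context().offset());
        // ... actual processing ...
    }
}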

Kafka: Failed to update metadata after 60000 ms with only one broker down

We have a Kafka producer configured as:
metadata.broker.list=broker1:9092,broker2:9092,broker3:9092,broker4:9092
serializer.class=kafka.serializer.StringEncoder
request.required.acks=1
request.timeout.ms=30000
batch.num.messages=25
message.send.max.retries=3
producer.type=async
compression.codec=snappy
The replication factor is 3 and the total number of partitions is currently 108.
The rest of the properties are defaults.
This producer was running absolutely fine. Then, for some reason, one of the brokers went down and our producer started logging
"Failed to update metadata after 60000 ms". Nothing else was in the log; we kept seeing only this error. At intervals, a few requests were getting blocked, even though the producer was async.
This issue was resolved when the broker was up and running again.
What can be the reason for this? As per my understanding, one broker going down should not affect the system as a whole.
Posting the answer for someone who might face this issue:
The reason is an older version of the Kafka producer. Kafka producers take the bootstrap servers as a list. In older versions, to fetch metadata, the producer tries to connect to the servers in that list in round-robin fashion. So if one of the brokers is down, the requests going to that broker fail and this message appears.
Solution:
Upgrade to a newer producer version.
Alternatively, reduce the metadata.fetch.timeout.ms setting: this ensures the main thread does not stay blocked and send fails sooner. The default value is 60000 ms. This is not needed in newer versions.
Note: the Kafka send method blocks until the producer is able to write to its buffer.
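For comparison, the equivalent setup with the newer Java producer would look roughly like this (a sketch only; the topic name is an assumption, and the broker list, acks, retries and compression settings are carried over from the old config above):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class NewProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092,broker3:9092,broker4:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "1");
        props.put("retries", 3);
        props.put("compression.type", "snappy");
        props.put("max.block.ms", 10000);   // fail faster than the 60 s default when metadata is unavailable

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.send(new ProducerRecord<>("my-topic", "key", "value"));   // topic name assumed
        producer.close();
    }
}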
I got the same error because I forgot to create the topic. Once I created the topic the issue was resolved.

Kafka Connect offset.storage.topic not receiving messages (i.e. how to access Kafka Connect offset metadata?)

I am working on setting up a Kafka Connect Distributed Mode application which will be a Kafka to S3 pipeline. I am using Kafka 0.10.1.0-1 and Kafka Connect 3.1.1-1. So far things are going smoothly but one aspect that is important to the larger system I am working with requires knowing offset information of the Kafka -> FileSystem pipeline. According to the documentation, the offset.storage.topic configuration will be the location the distributed mode application uses for storing offset information. This makes sense given how Kafka stores consumer offsets in the 'new' Kafka. However, after doing some testing with the FileStreamSinkConnector, nothing is being written to my offset.storage.topic which is the default value: connect-offsets.
To be specific, I am using a Python Kafka producer to push data to a topic and using Kafka Connect with the FileStreamSinkConnector to output the data from the topic to a file. This works and behaves as I expect the connector to behave. Additionally, when I stop and restart the connector, the application remembers its position in the topic and there is no data duplication. However, when I go to the offset.storage.topic to see what offset metadata is stored, there is nothing in the topic.
This is the command that I use:
kafka-console-consumer --bootstrap-server kafka1:9092,kafka2:9092,kafka3:9092 --topic connect-offsets --from-beginning
I receive this message after letting this command run for a minute or so:
Processed a total of 0 messages
So to summarize, I have 2 questions:
Why is offset metadata not being written to the topic that should be storing this even though my distributed application is keeping state correctly?
How do I access offset metadata information for a Kafka Connect distributed mode application? This is 100% necessary for my team's Lambda Architecture implementation of our system.
Thanks for the help.
Liju is correct: connect-offsets is used to track offsets for source connectors (which have a producer but not a consumer). Sink connectors have a consumer and track offsets the usual way, in the __consumer_offsets topic.
The best way to look at last committed offsets is with the consumer group tool:
bin/kafka-consumer-groups.sh --group connect-elastic-login-connector --bootstrap-server localhost:9092 --describe
The group name is always "connect-" plus the connector name (in my case, elastic-login-connector). This will show the latest offset committed by the group, which basically acknowledges that all messages up to that offset were written to Elasticsearch.
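If you prefer to read those committed offsets programmatically rather than through the CLI, something along these lines works with a newer Java client (the AdminClient call shown here is not available in 0.10.1; the broker address is an assumption and the group name follows the same "connect-" + connector-name convention):

import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class ConnectSinkOffsets {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka1:9092");   // assumed
        try (AdminClient admin = AdminClient.create(props)) {
            Map<TopicPartition, OffsetAndMetadata> offsets =
                    admin.listConsumerGroupOffsets("connect-elastic-login-connector")
                         .partitionsToOffsetAndMetadata()
                         .get();
            offsets.forEach((tp, om) -> System.out.printf("%s -> offset %d%n", tp, om.offset()));
        }
    }
}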
The offsets might be committed to Kafka's default offset commit topic, i.e. __consumer_offsets.
The new S3 connector released by Confluent might be of interest to you.
From what you describe, it might significantly simplify your goal of exporting records from Kafka to your S3 buckets.

Kafka message monitoring to show the actual messages published or consumed

I have Kafka installed on my local server, and through some other application running on the server, producers are publishing messages to the brokers inside my Kafka server. Through ZooKeeper I can easily see the health of my Kafka setup: it shows all the topics created inside my Kafka server, the offsets inside topics, etc. The only thing ZooKeeper cannot show is the messages inside the individual topics. Someone recommended the kafka-manager tool, so I installed and ran it; it worked fine and showed a lot of information from my Kafka server, but it still could not show the actual messages published or consumed by the respective consumers. So my question is: is there a way/tool/code to see the messages published or consumed, in addition to kafka-manager, or do I have to install some plugin into kafka-manager so that it also shows the messages? Thanks in advance!
A Kafka broker cannot tell you how many messages have been consumed by a given consumer on a given topic. The only things a Kafka broker knows about are the current log offset of the consumer and the current max offset of the log. It cannot, however, tell you how many messages before the current offset the consumer actually received, as it keeps no counters around this, and the consumer defines its own initial position (as well as being able to seek to various places in the log).
You can get both of these numbers using the $KAFKA_HOME/bin/kafka-consumer-offset-checker.sh script.
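If you only need those two numbers from code rather than from the script, a plain Java consumer can fetch them directly; a minimal sketch (broker address, group id, topic and partition are all assumptions):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class OffsetChecker {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed
        props.put("group.id", "my-group");                   // the consumer group to inspect (assumed)
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("my-topic", 0);                    // assumed topic/partition
            long endOffset = consumer.endOffsets(Collections.singleton(tp)).get(tp);  // current max offset of the log
            OffsetAndMetadata committed = consumer.committed(tp);                     // the group's committed position
            System.out.printf("end offset=%d, committed=%s%n", endOffset, committed);
        }
    }
}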

Kafka new producer is not able to update metadata after one of the brokers is down

I have a Kafka environment with 2 brokers and 1 ZooKeeper.
While producing messages to Kafka, if I stop broker 1 (which is the leader), the client stops producing messages and gives me the error below, although broker 2 is elected as the new leader for the topic and its partitions.
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
After 10 minutes, since broker 2 is the new leader, I expected the producer to send data to broker 2, but it kept failing with the above exception. lastRefreshMs and lastSuccessfullRefreshMs are still the same, although metadataExpireMs is 300000 for the producer.
I am using the new Kafka producer implementation on the producer side.
It seems that when the producer is initiated, it binds to one broker, and if that broker goes down it does not even try to connect to the other brokers in the cluster.
But my expectation is that if a broker goes down, the producer should check the metadata for the other available brokers and send data to them directly.
By the way, my topic has 4 partitions and a replication factor of 2. Mentioning this in case it is relevant.
Configuration params:
{request.timeout.ms=30000, retry.backoff.ms=100, buffer.memory=33554432, ssl.truststore.password=null, batch.size=16384, ssl.keymanager.algorithm=SunX509, receive.buffer.bytes=32768, ssl.cipher.suites=null, ssl.key.password=null, sasl.kerberos.ticket.renew.jitter=0.05, ssl.provider=null, sasl.kerberos.service.name=null, max.in.flight.requests.per.connection=5, sasl.kerberos.ticket.renew.window.factor=0.8, bootstrap.servers=[10.201.83.166:9500, 10.201.83.167:9500], client.id=rest-interface, max.request.size=1048576, acks=1, linger.ms=0, sasl.kerberos.kinit.cmd=/usr/bin/kinit, ssl.enabled.protocols=[TLSv1.2, TLSv1.1, TLSv1], metadata.fetch.timeout.ms=60000, ssl.endpoint.identification.algorithm=null, ssl.keystore.location=null, value.serializer=class org.apache.kafka.common.serialization.ByteArraySerializer, ssl.truststore.location=null, ssl.keystore.password=null, key.serializer=class org.apache.kafka.common.serialization.ByteArraySerializer, block.on.buffer.full=false, metrics.sample.window.ms=30000, metadata.max.age.ms=300000, security.protocol=PLAINTEXT, ssl.protocol=TLS, sasl.kerberos.min.time.before.relogin=60000, timeout.ms=30000, connections.max.idle.ms=540000, ssl.trustmanager.algorithm=PKIX, metric.reporters=[], compression.type=none, ssl.truststore.type=JKS, max.block.ms=60000, retries=0, send.buffer.bytes=131072, partitioner.class=class org.apache.kafka.clients.producer.internals.DefaultPartitioner, reconnect.backoff.ms=50, metrics.num.samples=2, ssl.keystore.type=JKS}
Use Case:
1- Start BR1 and BR2, produce data (leader is BR1)
2- Stop BR2, produce data (fine)
3- Stop BR1 (which means there is no active broker in the cluster at this point), then start BR2 and produce data (failed, although the leader is BR2)
4- Start BR1, produce data (leader is still BR2, but data is produced fine)
5- Stop BR2 (now BR1 is the leader)
6- Stop BR1 (BR1 is still the leader)
7- Start BR1, produce data (messages are produced fine again)
If the producer sent the last successful data to BR1 and then all brokers go down, the producer expects BR1 to come back up, even though BR2 is up and is the new leader. Is this expected behaviour?
After spending hours I figured out Kafka's behaviour in my situation. Maybe this is a bug, or maybe it needs to be done this way for reasons that lie under the hood, but if I were implementing it I wouldn't do it this way :)
When all brokers go down, if you are only able to bring up one broker, it must be the broker that went down last in order to produce messages successfully.
Let's say you have 5 brokers: BR1, BR2, BR3, BR4 and BR5. If all go down and the broker that died last is BR3 (which was the last leader), then even if you start BR1, BR2, BR4 and BR5, it will not help unless you start BR3.
You need to increase the number of retries.
In your case you need to set it to >=5.
That is the only way for your producer to know that your cluster has a new leader.
Besides that, make sure that all your brokers have a copy of your partition(s). Otherwise you aren't going to get a new leader.
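As a sketch, the relevant producer settings for that would look something like this (only retries and retry.backoff.ms matter here; the broker list and serializers are taken from the question, the rest stay at their defaults):

Properties props = new Properties();
props.put("bootstrap.servers", "10.201.83.166:9500,10.201.83.167:9500");
props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
props.put("acks", "1");
props.put("retries", 5);              // retry long enough to pick up the new leader from refreshed metadata
props.put("retry.backoff.ms", 100);   // pause between retries, giving the leader election time to finish
KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props);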
In the latest Kafka versions, when a broker goes down while it hosts a leader partition used by a producer, the producer will retry until it hits a retriable exception, at which point it needs to update its metadata. The new metadata can be fetched from the least-loaded node, so the new leader is discovered and the producer can write to it.