Apache Kafka Producer.send randomly hangs - scala

I was trying to get a basic Kafka stream working. I created a producer and used
producer.send(record).get()
to send a ProducerRecord to the Kafka stream. It was hanging, so I changed a few config settings and eventually it worked. I tried again with the EXACT same code and the EXACT same server configuration, and it hung. I tried about 10 more times, and it hung every time. Why is this happening?
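One way to make a hang like this diagnosable is to bound both the producer's internal blocking and the get() call, so the failure surfaces as a TimeoutException instead of blocking forever. A minimal sketch in Java (the broker address and topic name are placeholders, not taken from the question):

import java.util.Properties;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class BoundedSend {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Bound how long send() itself may block waiting for metadata or buffer space.
        props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, 10000);
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("test-topic", "key", "value"); // placeholder topic
            // Bound the wait for the broker acknowledgement as well, so a hang
            // surfaces as a TimeoutException you can log and inspect.
            producer.send(record).get(10, TimeUnit.SECONDS);
        }
    }
}

If the timeout fires while metadata is still being fetched, a common culprit is an advertised listener address that the client cannot reach.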

Related

Kafka log lag negative

I was writing a Java program to consume messages from Kafka.
The Kafka version was 2.4.0.
I used spring-kafka 2.5.0.RELEASE.
I was using CMAK 3.0.0.5 to monitor Kafka.
All seemed well when I first deployed the service. But after I killed the Java program and restarted it, I saw that many of the lags of the topic partitions were negative, and I have no clue why
these lags could be negative. We have no requirements on message consistency, so I was not sure whether the messages were lost or consumed repeatedly when I restarted the Java program, but I guess they were not lost but consumed repeatedly.
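For what it's worth, lag is just the log-end offset minus the group's last committed offset, so you can compute it yourself and compare it with what CMAK shows; a negative value in a monitor is often just the two offsets being sampled at slightly different moments. A diagnostic sketch (broker address, group id, and topic name are placeholders):

import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class LagCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "my-group");                // placeholder: the group whose lag looks negative
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            List<TopicPartition> parts = consumer.partitionsFor("my-topic").stream() // placeholder topic
                    .map(pi -> new TopicPartition(pi.topic(), pi.partition()))
                    .collect(Collectors.toList());
            Map<TopicPartition, Long> end = consumer.endOffsets(parts);
            for (TopicPartition tp : parts) {
                // lag = log-end offset minus the group's committed offset
                OffsetAndMetadata committed = consumer.committed(tp);
                long lag = end.get(tp) - (committed == null ? 0L : committed.offset());
                System.out.printf("%s lag=%d%n", tp, lag);
            }
        }
    }
}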

Springboot kafka streams Application failed if one kafka broker went down

We are using a Spring Boot application to develop a Kafka Streams application. Until recently we were using a single broker only, so we were not facing any issues.
But a week ago we set up cluster mode with 3 ZooKeeper nodes and 3 Kafka brokers for higher availability.
We configured our application like the following:
spring.kafka.bootstrap-servers=x.x.x.x:9093,x.x.x.x:9093,x.x.x.x:9093
leader-1
leader-2
leader-3
So we are testing the broker-down behaviour; below are the results.
Expected behavior: it should keep running without any trouble, consuming and producing the data.
Actual behavior: if we take down any one broker, it throws a broker-not-available exception, and after some time the application stops.
While analysing the cause, we found that the topic we consume from has its leader on leader-1 and the topic we produce to has its leader on leader-2, so when I stop leader-1 we expected leadership to move to the next broker, but it does not.
Is this the default behaviour, or are we doing something wrong?
Can anyone please suggest how to overcome this issue?
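One thing worth checking (a guess from the symptoms, not a confirmed diagnosis): any topic created while you still had a single broker, including Kafka Streams' internal changelog/repartition topics, keeps replication factor 1, and a partition with a single replica has no other broker to fail over to. Kafka Streams creates its internal topics with the replication.factor config, which defaults to 1. A configuration sketch for a Spring Boot app:

spring.kafka.streams.replication-factor=3

Existing topics do not pick up the new setting automatically; check them with kafka-topics.sh --describe and re-create or reassign them if their replication factor is still 1.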

Kafka : Failed to update metadata after 60000 ms with only one broker down

We have a kafka producer configured as -
metadata.broker.list=broker1:9092,broker2:9092,broker3:9092,broker4:9092
serializer.class=kafka.serializer.StringEncoder
request.required.acks=1
request.timeout.ms=30000
batch.num.messages=25
message.send.max.retries=3
producer.type=async
compression.codec=snappy
The replication factor is 3, and the total number of partitions is currently 108.
The rest of the properties are defaults.
This producer was running absolutely fine. Then, for some reason, one of the brokers went down, and our producer started to log
"Failed to update metadata after 60000 ms". Nothing else was in the log; we were seeing only this error. At intervals, a few requests were getting blocked, even though the producer was async.
The issue was resolved when the broker was up and running again.
What can be the reason for this? As per my understanding, one broker going down should not affect the system as a whole.
Posting the answer for anyone who might face this issue:
The reason is the older version of the Kafka producer. Kafka producers take the bootstrap servers as a list. In older versions, when fetching metadata, the producer tries to connect to all the servers in round-robin fashion, so if one of the brokers is down, the requests going to that server fail with this message.
Solution:
Upgrade to a newer producer version.
Alternatively, reduce the metadata.fetch.timeout.ms setting: this ensures the main thread is not blocked and send fails sooner. The default value is 60000 ms. This is not needed in newer versions.
Note: the Kafka send method blocks until the producer is able to write to its buffer.
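For anyone upgrading, the old (Scala) producer properties above map roughly onto the new (Java) producer configs as follows (a sketch from memory; double-check against your client version):

metadata.broker.list        ->  bootstrap.servers
serializer.class            ->  key.serializer / value.serializer
request.required.acks       ->  acks
message.send.max.retries    ->  retries
compression.codec           ->  compression.type
batch.num.messages          ->  batch.size (note: measured in bytes, not messages)
producer.type=async         ->  dropped; the new producer is always async (call send().get() for sync behaviour)
metadata.fetch.timeout.ms   ->  max.block.ms (bounds how long send() may block)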
I got the same error because I forgot to create the topic. Once I created the topic, the issue was resolved.

Spring Cloud Stream Kafka Binder autoCommitOnError=false get unexpected behavior

I am using Spring Boot 2.1.1.RELEASE and Spring Cloud Greenwich.RC2, and the managed version of spring-cloud-stream-binder-kafka is 2.1.0.RC4. The Kafka version is 1.1.0. I have set the following properties because messages should not be consumed if there is an error.
spring.cloud.stream.bindings.input.group=consumer-gp-1
...
spring.cloud.stream.kafka.bindings.input.consumer.autoCommitOnError=false
spring.cloud.stream.kafka.bindings.input.consumer.enableDlq=false
spring.cloud.stream.bindings.input.consumer.max-attempts=3
spring.cloud.stream.bindings.input.consumer.back-off-initial-interval=1000
spring.cloud.stream.bindings.input.consumer.back-off-max-interval=3000
spring.cloud.stream.bindings.input.consumer.back-off-multiplier=2.0
....
There are 20 partitions in the Kafka topic and Kerberos is used for authentication (not sure if this is relevant).
The Kafka consumer calls a web service for every message it processes, and if the web service is unavailable I expect the consumer to try to process the message 3 times before moving on to the next one. So for my test I disabled the web service, so that none of the messages could be processed correctly. From the logs I can see that this is happening.
After a while I stopped and then restarted the Kafka consumer (with the web service still disabled). I was expecting that after the restart the consumer would attempt to process the messages that were not successfully processed the first time around. From the logs (I printed out each message with its fields) I couldn't see this happening after the restart. I thought the partitions might be influencing something, but I checked the logs and all 20 partitions were assigned to this single consumer.
Is there a property I have missed? I thought the expected behavior on restarting the consumer the second time was that the Kafka broker would deliver the records that were not successfully processed to the consumer again.
Thanks
Parameters working as expected. See comment.
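To see what the binder actually committed between the two runs, you can describe the group (a diagnostic sketch; the bootstrap address is a placeholder, the group name is taken from the config above):

kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group consumer-gp-1

If CURRENT-OFFSET has advanced past the failed records, the restart picking up after them is consistent with how Kafka works: on restart a consumer resumes from the last committed offset, regardless of whether the messages before it were processed successfully.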

Kafka Consumer's poll() method gets blocked

I'm new to Kafka 0.9, and while testing some features I noticed strange behaviour in the Java consumer (KafkaConsumer).
The Kafka broker is located on an external Ambari machine.
Even though I could implement a producer and start sending messages to the external broker, I have no clue why, when the consumer tries to read the events (poll), it gets stuck.
I know the producer is working well, since I can consume messages through the console consumer (which runs locally on Ambari). But when I execute the Java consumer, nothing happens; it just gets stuck. Debugging the code, I could see that it blocks at the poll() line:
ConsumerRecords<String, String> records = consumer.poll(100);
The timeout does nothing, by the way. It doesn't matter whether you pass 0, 100, or 1000 ms; the consumer blocks on this line and neither times out nor throws an exception.
I tried all kinds of alternative properties, such as advertised.host.name, advertised.listeners, and so on, with zero luck.
Any help would be highly appreciated. Thanks in advance!
The reason might be that the machine where your consumer code is running is unable to connect to ZooKeeper. Try running the same consumer code on the machine where Kafka is installed (I tried this and it worked for me). I also solved the problem by setting the properties below in the server.properties file:
advertised.host.name=<the IP address you want to expose>
# In my case, it is the public IP of the EC2 machine; I have Kafka and ZooKeeper installed on the same EC2 machine.
advertised.port=9092
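On newer brokers the same fix is expressed with advertised.listeners (a one-line sketch; the IP is a placeholder from the documentation range):

advertised.listeners=PLAINTEXT://203.0.113.10:9092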
Regarding the statement:
ConsumerRecords<String, String> records = consumer.poll(100);
This doesn't mean the consumer will time out after 100 ms; rather, 100 ms is the polling period: whatever data arrives within those 100 ms is read into the records collection.
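Note also that on the 0.9 client the timeout only bounds the fetch itself; poll() can still block indefinitely before that, for example while looking up the group coordinator. Newer clients (Kafka 2.0+) added poll(Duration), which applies the timeout to those steps too. A minimal loop sketch (broker address, group id, and topic are placeholders):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class PollLoop {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "test-group");              // placeholder
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test-topic")); // placeholder topic
            while (true) {
                // poll(Duration) (Kafka 2.0+) also bounds coordinator lookup and metadata
                // fetches, so an unreachable broker fails fast instead of hanging here.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
            }
        }
    }
}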
In my case, the poll() method ended up stuck in the endless loop ensureCoordinatorReady(). The word "coordinator" reminded me that the coordinator can run on another host (for testing purposes I had added only one broker host to my /etc/hosts, while there are three brokers in total), so the consumer could not reach the group coordinator.
So the solution:
configure all the hosts running Kafka brokers correctly in the /etc/hosts file.