Messages sent synchronously to Kafka fail when the controller broker is restarted - apache-kafka

We need to send messages to Kafka synchronously, as we cannot afford to lose messages, and we can only wait a few seconds for a write to complete. We are using the following producer config. During a rolling restart we see request timeouts when the controller broker is restarted at the end.
acks=all
request timeout = 200
retries = 3
Why are we seeing timeouts when the controller broker is restarted, but not when the other brokers were restarted earlier in the rolling restart?
How long does it take for a new controller to be elected, considering it is not a big deployment?
Can these timeouts be avoided, given the time constraints?
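For context, a minimal sketch of how a synchronous send with a bounded wait might look with the plain Java producer; the broker addresses, topic name and the 5-second cap on the blocking get are hypothetical, while acks, retries and request.timeout.ms mirror the config above:

import java.util.Properties;
import java.util.concurrent.TimeUnit;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

public class SyncSendSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092,broker3:9092"); // hypothetical addresses
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("acks", "all");               // as in the question
        props.put("request.timeout.ms", "200"); // as in the question
        props.put("retries", "3");              // as in the question

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("events", "key-1", "value-1"); // hypothetical topic
            // Block on the returned future with an upper bound, so the caller never
            // waits longer than a few seconds regardless of client-side retries.
            RecordMetadata md = producer.send(record).get(5, TimeUnit.SECONDS);
            System.out.printf("Written to %s-%d@%d%n", md.topic(), md.partition(), md.offset());
        }
    }
}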

Related

Kafka Cluster, Producer pauses when one node fails

I have a 3 node Kafka cluster with a replication factor of 3, min in sync replicas of 2 and required acks set to all.
Everything works fine until I turn off one broker node; then the Java producer stops producing for ~15 seconds until it recovers. I have figured out that this delay matches the zookeeper.session.timeout.ms value. It looks like the producer is waiting until new partition leaders are elected, and this apparently only happens after the zookeeper session has timed out.
Does anyone know if this behavior is normal and whether it's possible to reduce this delay to a few ms? I would like the producer to switch over instantly after one node fails.

Kafka Streams Apps Threads fail transaction and are fenced and restarted after Kafka broker restart

We are noticing Streams Apps threads fail transactions during rolling restarts of our Kafka Brokers. The transaction failure causes stream thread fencing which in turn causes a restart of the thread and re-balancing. The re-balancing causes some delay in processing. Our goal is to make broker restarts as smooth as possible and prevent processing delays as much as possible.
For our rolling Broker restarts we use the controlled.shutdown=true configuration, and before each restart we wait for all partitions to be in-sync across all replicas.
For our Streams Apps we have properly configured group.instance.id and an appropriate session.timeout.ms so that rolling restarts of the streams apps themselves are smooth and without re-balances.
From the Kafka Streams app logs I have identified a sequence of events leading up to the fencing:
Broker starts shutting down
App logs errors producing to a topic due to NOT_LEADER_OR_FOLLOWER
App heartbeats fail because the group coordinator is the restarting broker
App discovers a new group coordinator (this bounces a bit between the restarting broker and the live brokers)
App stabilizes
Broker starts up again
App fails a fetch request to the starting broker due to FETCH_SESSION_ID_NOT_FOUND
App discovers the starting broker as its transaction coordinator
App transaction fails due to one of two reasons:
InvalidProducerEpochException: Producer attempted to produce with an old epoch.
ProducerFencedException: There is a newer producer with the same transactionalId which fences the current one
Stream threads end up in a fatal error state, get fenced and restarted, which causes a rebalance.
What could be causing the two exceptions that make the stream thread transactions fail? My intuition is that the broker starting up is assigned as transaction coordinator before it has synced its transaction state with the in-sync brokers. This could explain why old epochs or different transactional IDs are known to that broker.
How can we further identify what is going wrong here and how it can be improved?
You can increase request.timeout.ms in Kafka Streams, which will make the Streams API wait for a longer period of time. Only if the Kafka broker is not up within that period will it throw an exception, which can be handled using a ProductionExceptionHandler as described in Handling exceptions in Kafka streams.
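As a rough illustration of that suggestion (a sketch only: the class name is made up, the two-argument handle signature is the classic one, and whether CONTINUE is acceptable depends on whether dropping the failed record is tolerable for your data):

import java.util.Map;

import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.RetriableException;
import org.apache.kafka.streams.errors.ProductionExceptionHandler;

// Hypothetical handler: keep the stream thread alive on retriable send failures
// (e.g. during broker restarts), fail for everything else.
public class BrokerRestartTolerantHandler implements ProductionExceptionHandler {

    @Override
    public ProductionExceptionHandlerResponse handle(ProducerRecord<byte[], byte[]> record,
                                                     Exception exception) {
        if (exception instanceof RetriableException) {
            // Note: CONTINUE drops this record and moves on.
            return ProductionExceptionHandlerResponse.CONTINUE;
        }
        return ProductionExceptionHandlerResponse.FAIL;
    }

    @Override
    public void configure(Map<String, ?> configs) {
        // no configuration needed for this sketch
    }
}

It would then be registered via the default.production.exception.handler Streams config (StreamsConfig.DEFAULT_PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG), alongside the raised request.timeout.ms.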

Kafka is trying to send messages to a broker in "recovery mode"

I have the following setup
3 Kafka (v2.1.1) Brokers
5 Zookeeper instances
Kafka brokers have the following configuration:
auto.create.topics.enable: 'false'
default.replication.factor: 1
delete.topic.enable: 'false'
log.cleaner.threads: 1
log.message.format.version: '2.1'
log.retention.hours: 168
num.partitions: 1
offsets.topic.replication.factor: 1
transaction.state.log.min.isr: '2'
transaction.state.log.replication.factor: '3'
zookeeper.connection.timeout.ms: 10000
zookeeper.session.timeout.ms: 10000
min.insync.replicas: '2'
request.timeout.ms: 30000
Producer configuration (using Spring Kafka) is more or less as follows:
...
acks: all
retries: Integer.MAX_VALUE
deployment.timeout.ms: 360000ms
enable.idempotence: true
...
I read this configuration as follows: there are three Kafka brokers, but once one of them dies, it is fine as long as at least two replicas persist the data before the ack is sent back (= min in-sync replicas). In case of failure, the Kafka producer will keep retrying for 6 minutes, but then gives up.
This is the scenario which causes me headache:
All Kafka and Zookeeper instances are up and alive
I start sending messages in chunks (500 pcs each)
In the middle of the processing, one of the Brokers dies (hard kill)
Immediately, I see logs like 2019-08-09 13:06:39.805 WARN 1 --- [b6b45bb5c-7dxh7] o.a.k.c.NetworkClient : [Producer clientId=bla-6b6b45bb5c-7dxh7, transactionalId=bla-6b6b45bb5c-7dxh70] 4 partitions have leader brokers without a matching listener, including [...] (question 1: I do not see any further messages coming in; does this really mean the whole cluster is now stuck, waiting for the dead Broker to come back?)
After the dead Broker starts to boot up again, it begins recovering its corrupted index. This operation takes more than 10 minutes, as I have a lot of data on the Kafka cluster
Every 30 s, the producer tries to send the message again (due to the request.timeout.ms property being set to 30 s)
Since my deployment.timeout.ms is set to 6 minutes and the Broker needs 10 minutes to recover and does not persist the data until then, the producer gives up and stops retrying = I potentially lose the data
The questions are
Why does the Kafka cluster wait until the dead Broker comes back?
When the producer realizes the Broker does not respond, why does it not try to connect to another Broker?
The thread is completely stuck for 6 minutes, waiting until the dead Broker recovers; how can I tell the producer to try another Broker instead?
Am I missing something or is there any good practice to avoid such scenario?
You have a number of questions, I'll take a shot at providing our experience which will hopefully shed light on some of them.
In my product, IBM IDR Replication, we had to provide guidance on robustness to customers whose topics were being rebalanced, or who had lost a broker in their cluster. The result of some of our testing was that simply setting the request timeout was not sufficient, because in certain circumstances the request would decide not to wait the entire time and would instead perform another retry almost instantly. This burned through the configured number of retries, i.e. there are circumstances where the timeout period is circumvented.
As such we instructed users to utilize a formula like the following...
https://www.ibm.com/support/knowledgecenter/en/SSTRGZ_11.4.0/com.ibm.cdcdoc.cdckafka.doc/tasks/robust.html
"To tune the values for your environment, adjust the Kafka producer properties retry.backoff.ms and retries according to the following formula:
retry.backoff.ms * retries > the anticipated maximum time for leader change metadata to propagate in the cluster
For example, you might wish to configure retry.backoff.ms=300, retries=150 and max.in.flight.requests.per.connection=1."
So maybe try utilizing retries and retry.backoff.ms. Note that utilizing retries without idempotence can cause batches to be written out of order if you have more than one in flight... so choose accordingly based on your business logic.
It was our experience that the Kafka producer writes to the broker which is the leader for the topic partition, and so you have to wait for the new leader to be elected. When it is, if the retry process is still ongoing, the producer transparently determines the new leader and writes data accordingly.
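As a rough sketch of that formula expressed as producer properties (the 300 ms / 150 values are the ones from the quoted example; enabling idempotence is my assumption, added so that retries cannot reorder batches):

import java.util.Properties;

import org.apache.kafka.clients.producer.ProducerConfig;

public final class RetryTunedProducerProps {

    // retry.backoff.ms * retries should exceed the time for leader-change
    // metadata to propagate; here 300 ms * 150 = 45 s of total retrying.
    // bootstrap.servers and serializers are omitted for brevity.
    public static Properties build() {
        Properties props = new Properties();
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.RETRIES_CONFIG, 150);
        props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, 300);
        props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1);
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true); // assumption, see note above
        return props;
    }
}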

Kafka Streams app does NOT fail when the Kafka cluster goes down

I have a Kafka Streams app running (0.10.2.1). When I shut down the Kafka cluster, the Streams app continues to wait for the next message; when the cluster is brought back up, it resumes consuming messages. For the duration that the cluster is down, the app appears to be working fine. I have tested this for over 45 minutes.
I would expect Kafka to throw an exception or stop. I have configured a StateListener to log when KafkaStreams shuts down, however it is never invoked.
kafkaStreams.setStateListener((newState, _) => {
  if (newState == KafkaStreams.State.NOT_RUNNING) {
    Log.error("Kafka died unexpectedly.")
  }
})
How do I get Kafka to throw an exception or shut down when it cannot connect to the cluster?
Note: this assumes that cluster goes down after the app has started
Why would you want the Kafka Streams app to go down?
The app should be resilient to broker failures, that is, keep going patiently until the broker recovers, and it seems that this is what it's doing. If you have multiple instances of the Kafka Streams application and one of them loses connectivity to the broker, the load will be rebalanced onto the remaining instances. If each instance that lost connectivity just shut itself down, you would be losing instances, and with them redundancy and parallelism, even if the broker connectivity recovered. As it stands, Kafka Streams is designed for resilience. I'd argue that this is the correct behaviour.
IMHO if you want to detect broker (or connectivity) failures, that's a use case for monitoring, not for introducing failures into Kafka Streams applications.
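If you go the monitoring route, one option is to probe the cluster from a separate monitoring job rather than from inside the Streams topology. A sketch only, and it assumes a client version that ships AdminClient (0.11+), which is newer than the 0.10.2.1 app in the question:

import java.util.Properties;
import java.util.concurrent.TimeUnit;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;

// Hypothetical side check: report broker reachability instead of failing the Streams app.
public final class ClusterReachabilityCheck {

    public static boolean brokersReachable(String bootstrapServers) {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(AdminClientConfig.REQUEST_TIMEOUT_MS_CONFIG, 5000);
        try (AdminClient admin = AdminClient.create(props)) {
            // Any broker answering the metadata request within 10 s counts as reachable.
            return !admin.describeCluster().nodes().get(10, TimeUnit.SECONDS).isEmpty();
        } catch (Exception e) {
            return false; // unreachable or timed out: raise an alert from the monitoring job
        }
    }
}

How often to run such a check and where it sends its alerts is up to your monitoring stack.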

Maximum value for zookeeper.connection.timeout.ms

Right now we are running Kafka on AWS EC2 servers, and Zookeeper is also running on separate EC2 instances.
We have created services (systemd units) for Kafka and Zookeeper to make sure they are started in case the server gets rebooted.
The problem is that the Zookeeper servers are sometimes a little late in starting, and by that time the Kafka brokers have already terminated.
So to deal with this issue we are planning to increase zookeeper.connection.timeout.ms to some high number like 10 minutes, on the broker side. Is this a good approach?
Are there any side effects of increasing the zookeeper.connection.timeout.ms timeout?
Increasing zookeeper.connection.timeout.ms may or may not solve the problem at hand, but there is a possibility that it will take longer to detect a broker soft failure.
A couple of things you can do:
1) Alter the systemd unit to delay launching Kafka by 10 minutes (the time you wanted to put into the zookeeper timeout).
2) We are using an HDP cluster, which automatically takes care of such scenarios.
Here is an explanation from Kafka FAQs:
During a broker soft failure, e.g., a long GC, its session on ZooKeeper may timeout and hence be treated as failed. Upon detecting this situation, Kafka will migrate all the partition leaderships it currently hosts to other replicas. And once the broker resumes from the soft failure, it can only act as the follower replica of the partitions it originally leads.
To move the leadership back to the brokers, one can use the preferred-leader-election tool here. Also, in 0.8.2 a new feature will be added which periodically trigger this functionality (details here).
To reduce Zookeeper session expiration, either tune the GC or increase zookeeper.session.timeout.ms in the broker config.
https://cwiki.apache.org/confluence/display/KAFKA/FAQ
Hope this helps